How do I use preg_match to extract the Chinese text from a web page?
As the title says. Here is a piece of HTML:
$lstr='<table width="100%" border="0" cellpadding="2" cellspacing="0" class="style1">
<tr>
<td width="28%">型号: </td>
<td width="72%" class="style6">2K3324</td>
</tr>
<tr>
<td>名称: <br></td>
<td class="style6">商品1</td>
</tr>
<tr>
<td>价格<br></td>
<td class="style6">200</td>
</tr>
<tr>
<td>出品: <br></td>
<td class="style6">中国生产</td>
</tr>
<tr>
<td>材质与标准: <br></td>
<td class="style6">ppc</td>
</tr>
<tr>
<td>使用电池: <br></td>
<td class="style6">AAA×3</td>
</tr>
<tr>
<td>规格: <br></td>
<td class="style6">15cm</td>
</tr>
<tr>
<td>装箱: <br></td>
<td class="style6">200件</td>
</tr>
<tr>
<td>毛重: <br></td>
<td class="style6">0.5</td>
</tr>
<tr>
<td>备注: </td>
<td class="style6">该商品必须配备6A电池使用。</td>
</tr>
<tr>
<td>描述: </td>
<td class="style6">该商品可以替代目前市面上的其他同类产品。</td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</table>';
How can I extract the values that follow '型号: ' (model), '名称: ' (name), '价格:' (price) and so on in the code above?
Thanks.....
[Last edited by waitfy at 2008-6-23 04:35 PM] |
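If only a few labels are needed, one option is a separate preg_match per label. A minimal sketch, assuming UTF-8 input and the cell layout shown above (label cell, optional `<br>`, then the value cell); `value_after_label()` is a hypothetical helper name, and `$lstr` here is a shortened copy of the question's table:

```php
<?php
// value_after_label() is a hypothetical helper, not a PHP built-in.
// Returns the text of the <td> that follows the cell starting with $label, or null.
function value_after_label(string $html, string $label): ?string
{
    // Label cell: the label, anything up to the next tag (e.g. ": "), an optional <br>.
    // Value cell: the next <td ...>, captured up to its first "<".
    $pattern = '/<td[^>]*>\s*' . preg_quote($label, '/')
             . '[^<]*(?:<br>)?\s*<\/td>\s*<td[^>]*>([^<]*)<\/td>/u';
    return preg_match($pattern, $html, $m) ? trim($m[1]) : null;
}

// Shortened copy of the question's table.
$lstr = '<tr>
<td width="28%">型号: </td>
<td width="72%" class="style6">2K3324</td>
</tr>
<tr>
<td>价格<br></td>
<td class="style6">200</td>
</tr>';

echo value_after_label($lstr, '型号'), "\n"; // 2K3324
echo value_after_label($lstr, '价格'), "\n"; // 200
```

The trade-off is one regex run per label, which is fine for a handful of fields but scans the page repeatedly for many.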
|
Could anyone give me some pointers?
With the statement below I've already filtered out part of it, but quite a few HTML tags are still left.
preg_match_all ("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/",$makeFile,$str);
Result:
Array
(
[0] => Array
(
[0] => <td width="28%">型号: </td>
[1] => <td width="72%" class="style6">2K3324</td>
[2] => <td>名称: <br></td>
[3] => <td class="style6">商品1</td>
[4] => <td>价格<br></td>
[5] => <td class="style6">200</td>
[6] => <td>出品: <br></td>
[7] => <td class="style6">中国生产</td>
[8] => <td>材质与标准: <br></td>
[9] => <td class="style6">ppc</td>
[10] => <td>使用电池: <br></td>
[11] => <td class="style6">AAA×3</td>
[12] => <td>规格: <br></td>
[13] => <td class="style6">15cm</td>
[14] => <td>装箱: <br></td>
[15] => <td class="style6">200件</td>
[16] => <td>毛重: <br></td>
[17] => <td class="style6">0.5</td>
[18] => <td>备注: </td>
[19] => <td class="style6">该商品必须配备6A电池使用。</td>
[20] => <td>描述: </td>
[21] => <td class="style6">该商品可以替代目前市面上的其他同类产品。</td>
[22] => <td> </td>
[23] => <td> </td>
)
[1] => Array
(
[0] => <td width="28%">
[1] => <td width="72%" class="style6">
[2] => <td>
[3] => <td class="style6">
[4] => <td>
[5] => <td class="style6">
[6] => <td>
[7] => <td class="style6">
[8] => <td>
[9] => <td class="style6">
[10] => <td>
[11] => <td class="style6">
[12] => <td>
[13] => <td class="style6">
[14] => <td>
[15] => <td class="style6">
[16] => <td>
[17] => <td class="style6">
[18] => <td>
[19] => <td class="style6">
[20] => <td>
[21] => <td class="style6">
[22] => <td>
[23] => <td>
)
[2] => Array
(
[0] => td
[1] => td
[2] => td
[3] => td
[4] => td
[5] => td
[6] => td
[7] => td
[8] => td
[9] => td
[10] => td
[11] => td
[12] => td
[13] => td
[14] => td
[15] => td
[16] => td
[17] => td
[18] => td
[19] => td
[20] => td
[21] => td
[22] => td
[23] => td
)
[3] => Array
(
[0] => 型号:
[1] => 2K3324
[2] => 名称: <br>
[3] => 商品1
[4] => 价格<br>
[5] => 200
[6] => 出品: <br>
[7] => 中国生产
[8] => 材质与标准: <br>
[9] => ppc
[10] => 使用电池: <br>
[11] => AAA×3
[12] => 规格: <br>
[13] => 15cm
[14] => 装箱: <br>
[15] => 200件
[16] => 毛重: <br>
[17] => 0.5
[18] => 备注:
[19] => 该商品必须配备6A电池使用。
[20] => 描述:
[21] => 该商品可以替代目前市面上的其他同类产品。
[22] =>
[23] =>
)
[4] => Array
(
[0] => </td>
[1] => </td>
[2] => </td>
[3] => </td>
[4] => </td>
[5] => </td>
[6] => </td>
[7] => </td>
[8] => </td>
[9] => </td>
[10] => </td>
[11] => </td>
[12] => </td>
[13] => </td>
[14] => </td>
[15] => </td>
[16] => </td>
[17] => </td>
[18] => </td>
[19] => </td>
[20] => </td>
[21] => </td>
[22] => </td>
[23] => </td>
)
)
How should I change this pattern, or is there another way to do the match?
I'd appreciate any advice.
Thanks in advance. |
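One possible change to the pattern in the post above, as a sketch: replace `(.*)` with `([^<]*)` so the capture stops at the first tag inside the cell, and match an optional `<br>` in a non-capturing group outside it. Group numbers stay the same (1 = opening tag, 2 = tag name, 3 = text, 4 = closing tag). It assumes any `<br>` sits right before `</td>`, as in the sample; `$lstr` is a shortened copy of the table:

```php
<?php
// Shortened copy of the question's table.
$lstr = '<tr>
<td>名称: <br></td>
<td class="style6">商品1</td>
</tr>
<tr>
<td>价格<br></td>
<td class="style6">200</td>
</tr>';

// ([^<]*) cannot run into <br>, and (?:<br>)? consumes it without adding a group.
preg_match_all('/(<(\w+)[^>]*>)([^<]*)(?:<br>)?(<\/\2>)/', $lstr, $str);

// Group 3 now holds bare text; trim() removes the space left after the colon.
$cells = array_map('trim', $str[3]);
print_r($cells);
```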
|
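If this exact format is all you need to parse,
you can remove every HTML tag with strip_tags and then split the text with explode on "\r\n" (or "\n" for Unix line endings); the only drawback is that you get a lot of empty values.
With preg_match, on the other hand, many items would need their own separate match. |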
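The strip_tags idea above can be sketched like this. It swaps `explode("\r\n", ...)` for `preg_split('/\R/', ...)` so both Windows and Unix line endings work, then filters out the empty values. It assumes every label and value sit on their own lines and that no value cell is empty while its label is not; otherwise the label/value pairing would drift. `$lstr` is a shortened copy of the table:

```php
<?php
// Shortened copy of the question's table.
$lstr = '<table class="style1">
<tr>
<td width="28%">型号: </td>
<td width="72%" class="style6">2K3324</td>
</tr>
<tr>
<td>价格<br></td>
<td class="style6">200</td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</table>';

// Drop all tags; each cell's text is left on its own line.
$text = strip_tags($lstr);

// Split on any line ending (\R covers \r\n, \n and \r), trim, discard empty values.
$lines = array_values(array_filter(
    array_map('trim', preg_split('/\R/', $text)),
    fn ($s) => $s !== ''
));

// The remaining lines alternate label, value; strip the trailing ": " from labels.
$data = array();
for ($i = 0; $i + 1 < count($lines); $i += 2) {
    $data[rtrim($lines[$i], ': ')] = $lines[$i + 1];
}
print_r($data);
```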
|
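Sorry, I took another look myself and it can be matched after all :) but each item has to be cleaned up one at a time:
preg_match_all("/<td[^>]*>(.*?)<\/td>/",$lstr,$preg); // group 1 = text inside each <td ...>...</td>
$values = array();
for($i=0;$i<count($preg[1]);$i++)
{
// remove leftover tags such as <br>, then the surrounding spaces; collect every item instead of overwriting
$values[] = trim(preg_replace("/<[^>]*>/","",$preg[1][$i]));
} |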
| fjyxian | 2008-6-24 09:10 AM |
|
<?php
$lstr='<table width="100%" border="0" cellpadding="2" cellspacing="0" class="style1">
<tr>
<td width="28%">型号: </td>
<td width="72%" class="style6">2K3324</td>
</tr>
<tr>
<td>名称: <br></td>
<td class="style6">商品1</td>
</tr>
<tr>
<td>价格<br></td>
<td class="style6">200</td>
</tr>
<tr>
<td>出品: <br></td>
<td class="style6">中国生产</td>
</tr>
<tr>
<td>材质与标准: <br></td>
<td class="style6">ppc</td>
</tr>
<tr>
<td>使用电池: <br></td>
<td class="style6">AAA×3</td>
</tr>
<tr>
<td>规格: <br></td>
<td class="style6">15cm</td>
</tr>
<tr>
<td>装箱: <br></td>
<td class="style6">200件</td>
</tr>
<tr>
<td>毛重: <br></td>
<td class="style6">0.5</td>
</tr>
<tr>
<td>备注: </td>
<td class="style6">该商品必须配备6A电池使用。</td>
</tr>
<tr>
<td>描述: </td>
<td class="style6">该商品可以替代目前市面上的其他同类产品。</td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</table>';
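$tc = explode("6\">",$lstr); // split right after each class="style6">
$txt = "";
for ($i=1;$i<count($tc);$i++){ // piece 0 is the text before the first value, so start at 1
$td = explode("<",$tc[$i]); // the value runs up to the next "<"
$txt .= str_replace(" ","",$td[0])."<br>";
}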
echo $txt;
?>
This seems to work |
| fjyxian | 2008-6-24 09:35 AM |
|
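// $matches is assumed to come from a call like this one (it is not shown in the post):
preg_match_all("/6\">[^<]*</",$lstr,$matches);
$txt = "";
foreach ($matches[0] as $value) { // each() was removed in PHP 8; foreach walks the array the same way
$value = str_replace("6\">","",$value);
$txt .= str_replace("<","<br>",$value);
}
echo $txt;
Same effect as the code above. |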
|
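Well, if you want to use a regex,
this will do it:
$lstr = str_replace(array("<br>","\n"),"",$lstr); // drop <br> and line breaks so each row is one run of text
preg_match_all("/<tr>\s*<td[^>]*>([^<]*)<\/td>\s*<td[^>]*>([^<]*)<\/td>\s*<\/tr>/",$lstr,$data); // $data[1] = labels, $data[2] = values
print_r($data); |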
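To turn the two capture groups of the regex above into label => value pairs, `array_combine()` can zip them together. A sketch, assuming every row has exactly two cells; it also strips `"\r"` in case of Windows line endings, trims the trailing `": "` from the labels and drops the empty spacer row. `$lstr` is a shortened copy of the table:

```php
<?php
// Shortened copy of the question's table.
$lstr = '<table class="style1">
<tr>
<td width="28%">型号: </td>
<td width="72%" class="style6">2K3324</td>
</tr>
<tr>
<td>名称: <br></td>
<td class="style6">商品1</td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
</table>';

// Same idea as the post: remove <br> and line breaks, then match whole rows.
$flat = str_replace(array("<br>", "\r", "\n"), "", $lstr);
preg_match_all('/<tr>\s*<td[^>]*>([^<]*)<\/td>\s*<td[^>]*>([^<]*)<\/td>\s*<\/tr>/', $flat, $data);

// $data[1] holds the labels and $data[2] the values.
$labels = array_map(fn ($s) => rtrim($s, ': '), $data[1]);
$values = array_map('trim', $data[2]);
$pairs = array_combine($labels, $values);
unset($pairs['']); // the empty spacer row produces an empty label
print_r($pairs);
```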
|
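You could also do it with DOM:
$doc = new DOMDocument();
// loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so declare UTF-8 to keep the Chinese intact
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'.$lstr);
$data_arr = array();
$tr_node_list = $doc->getElementsByTagName("tr");
foreach($tr_node_list as $node){
$td_node_list = $node->getElementsByTagName("td");
if ($td_node_list->length < 2) continue; // skip rows without a label/value pair
$k = trim($td_node_list->item(0)->textContent);
$v = trim($td_node_list->item(1)->textContent);
$data_arr[$k] = $v;
} |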
|
| Take a good look at JS regular expressions; the parts you want are captured with something like (.*?......). |
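The `(.*?)` hint refers to a lazy quantifier, which behaves the same in PHP's PCRE-based preg functions as in JavaScript. A short comparison on a single line holding two cells (a made-up sample built from the question's data) shows why it matters:

```php
<?php
$line = '<td>名称: <br></td><td class="style6">商品1</td>';

// Greedy: .* runs as far as it can, so the capture stretches to the LAST </td>.
preg_match('/<td[^>]*>(.*)<\/td>/', $line, $greedy);

// Lazy: .*? stops at the FIRST </td>, so each match covers one cell.
preg_match_all('/<td[^>]*>(.*?)<\/td>/', $line, $lazy);

echo $greedy[1], "\n"; // 名称: <br></td><td class="style6">商品1
print_r($lazy[1]);     // ['名称: <br>', '商品1']
```

Note that the lazy capture still contains `<br>`; it only fixes over-matching across cells, so the tag has to be stripped separately.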