Parsing HTML in Python with BeautifulSoup

Posted on 2018-03-16

pip install beautifulsoup4

pip install lxml

pip install html5lib

lxml and html5lib are parsers; BeautifulSoup uses one of them to turn the HTML text into a tree it can search.
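As a rough sketch of how the parser gets chosen: the second argument to bs4.BeautifulSoup names the parser to use. The helper name parse_with_fallback and the sample markup below are made up for illustration, and 'html.parser' (Python's built-in parser) is assumed here only as a fallback for machines where neither lxml nor html5lib is installed.

```python
import bs4

# Sample markup invented for this sketch
SAMPLE = '<p>By <span id="author">Al Sweigart</span></p>'


def parse_with_fallback(markup, preferred=('lxml', 'html5lib')):
    """Hypothetical helper: try each preferred parser, then fall back to html.parser."""
    for name in preferred:
        try:
            return name, bs4.BeautifulSoup(markup, name)
        except bs4.FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is not installed
            continue
    return 'html.parser', bs4.BeautifulSoup(markup, 'html.parser')


parser_name, soup = parse_with_fallback(SAMPLE)
author = soup.select('#author')[0].getText()
print(parser_name, author)
```

Whichever parser ends up being used, the select() call works the same way; the parsers differ mainly in speed and in how they repair malformed HTML.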

<html><head><title>The Website Title</title></head>

<body>

Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.

Learn Python the easy way!

By <span id="author">Al Sweigart</span>

</body></html>

Save the HTML above to a file named example.html.
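If you would rather create the file from Python than from a text editor, something like the following should work. It is only a sketch: it assumes the file should land in the current working directory, and the variable name EXAMPLE_HTML is chosen just for this example.

```python
from pathlib import Path

# The same markup as shown above, kept in a string
EXAMPLE_HTML = """<html><head><title>The Website Title</title></head>
<body>
Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.
Learn Python the easy way!
By <span id="author">Al Sweigart</span>
</body></html>
"""

# Write the markup into the current working directory
path = Path('example.html')
path.write_text(EXAMPLE_HTML, encoding='utf-8')
print(path.resolve())
```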

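import bs4

# Open the saved file and parse it with the html5lib parser
with open('example.html') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')

# select() returns a list-like collection of the matching Tag objects
elems = exampleSoup.select('#author')

print(type(elems))

print(elems[0].getText())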

The last line prints: Al Sweigart

BeautifulSoup finds elements with the select() method, which takes CSS selectors much like jQuery does.

soup.select('div'): all <div> elements

soup.select('#author'): the element whose id is author

soup.select('.notice'): all elements whose class is notice
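The three selectors above can be tried out on a small in-memory document. This is a minimal sketch: the sample markup is invented for the example, and it uses Python's built-in 'html.parser' so that it runs without lxml or html5lib.

```python
import bs4

# Sample markup is invented for this example; only the selectors come from the text
html = """
<div class="notice">Site maintenance tonight</div>
<div>By <span id="author">Al Sweigart</span></div>
<p class="notice">Learn Python the easy way!</p>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

divs = soup.select('div')          # every <div> element
author = soup.select('#author')    # the element whose id is "author"
notices = soup.select('.notice')   # every element with class "notice"

print(len(divs))                       # 2
print(author[0].getText())             # Al Sweigart
print([tag.name for tag in notices])   # ['div', 'p']
```

Note that select() always returns a collection, even for an id selector, so a single match still has to be indexed with [0].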

Reference: 《Python 编程快速上手：让繁琐工作自动化》 (the Chinese edition of Al Sweigart's "Automate the Boring Stuff with Python")
