Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I got a ResultSet from Python's BeautifulSoup, but I don't know how to fetch the NavigableString inside its elements.

from bs4 import BeautifulSoup

html_text = driver.page_source
soup = BeautifulSoup(html_text, "html.parser")
get_details = soup.find_all('li', attrs={"class": "news"})
# get_details is the ResultSet returned by BeautifulSoup's find_all() method

One element of the ResultSet looks like this:

<li class="news">blah blah blah what i want blah blah blah  <a href="/graphic/graphicInfoData/000002230030421305">View details</a></li>

What I want is "blah blah blah what i want blah blah blah", the so-called NavigableString in BeautifulSoup. But I cannot use the .string attribute on a list, and even print(get_details[0].string) gives None. Why?

By the way, as a comparison, the code below works:

print(get_details[0].a.string)

But this raises an error:

>>> print(get_details[0].li.string)
Traceback (most recent call last):
  File "<pyshell#57>", line 1, in <module>
    print(get_details[0].li.string)
AttributeError: 'NoneType' object has no attribute 'string'

Any thoughts would be highly appreciated!



1 Reply


Use .get_text() instead of .string:

print(get_details[0].a.get_text())

Output: View details

print(get_details[0].get_text())

Output: blah blah blah what i want blah blah blah View details

Be aware that get_details[0].get_text() returns all the text inside the li, including the link text.
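
As for why .string gave None in the question: as far as I know, .string only returns a value when a tag has exactly one child (or a single chain of such children). The li here has two children, a text node and an a tag, so .string is None. Likewise, .li looks for an li nested inside get_details[0], and there is none. A minimal sketch using the markup from the question as a plain string (so no driver is needed):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; parsed directly instead of via a driver.
html = ('<li class="news">blah blah blah what i want blah blah blah  '
        '<a href="/graphic/graphicInfoData/000002230030421305">View details</a></li>')

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", attrs={"class": "news"})

print(len(li.contents))  # number of direct children: the text node and the <a> tag
print(li.string)         # None, because the tag has more than one child
print(li.a.string)       # works, because <a> has a single text child
print(li.li)             # None: no <li> is nested inside this <li>
print(li.get_text())     # all text in the <li>, link text included
```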

The following gets only the first part:

get_details[0].contents[0].strip()

Output: blah blah blah what i want blah blah blah
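
Since find_all() returns a list-like ResultSet, the same idea has to be applied per element. Here is a rough sketch of one way to do that; the sample markup and the helper name own_text are made up for illustration and are not part of the original code. It keeps only the text nodes that are direct children of each li, so it does not depend on the text being the first child:

```python
from bs4 import BeautifulSoup, NavigableString

# Invented sample markup with two items, shaped like the one in the question.
html = """
<ul>
  <li class="news">first headline  <a href="/a">View details</a></li>
  <li class="news">second headline  <a href="/b">View details</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
get_details = soup.find_all("li", attrs={"class": "news"})


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    return "".join(
        child for child in tag.children if isinstance(child, NavigableString)
    ).strip()


# Apply the helper to every element of the ResultSet.
headlines = [own_text(li) for li in get_details]
print(headlines)
```

If the wanted text is always the first child, [li.contents[0].strip() for li in get_details] should do the same job with less code.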

