Tuesday, November 4, 2008

Beautiful Soup is a cruel mistress

So I came upon a much-lauded Python HTML/XML parser named Beautiful Soup that is "designed for quick turnaround projects like screen-scraping". Perfect for my project, right? Probably, except that I'm a complete novice at Python and am having extreme difficulty wading through the module's documentation, extensive and detailed as it is. As it turns out, the interweb is a very small place: the creator of Beautiful Soup is my friend Sumana's husband, so if need be I suppose I could make a personal plea for help (albeit at the risk of revealing my total ignorance and losing their respect forever).

Behold, the result of hours of trying to fiddle with Beautiful Soup:

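import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and hand the response to Beautiful Soup for parsing
page = urllib2.urlopen("http://icasualties.org/oif/USDeaths.aspx")
soup = BeautifulSoup(page)

# Print every <p> element (tags and all), with a blank line after each
for para in soup('p'):
    print para
    print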

D'oh.
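
For anyone who wants to get past raw tags, here is a minimal sketch of the same loop that pulls out just the text of each paragraph. It rests on a few assumptions that are not part of the script above: it uses the newer bs4 package on Python 3 rather than the BeautifulSoup module imported earlier, the SAMPLE_HTML string is a made-up stand-in for the downloaded page so that it runs offline, and paragraph_texts is a hypothetical helper name.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package, not the older BeautifulSoup module

# Hypothetical stand-in for the downloaded page, so the sketch runs offline
SAMPLE_HTML = """
<html><body>
<p>First <b>paragraph</b>.</p>
<div>Not a paragraph.</div>
<p>Second paragraph.</p>
</body></html>
"""


def paragraph_texts(html):
    """Return the text of each <p> element with the markup removed."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") collects every <p> tag; get_text() drops nested tags like <b>
    return [p.get_text().strip() for p in soup.find_all("p")]


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE_HTML):
        print(text)
        print()
```

Swapping SAMPLE_HTML for the body of a fetched page should work the same way, though a real page may keep its data in tables rather than paragraphs, in which case a different tag name would be needed.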
