Tuesday, November 20, 2007

Scraping Web Pages with Python

Beautiful Soup is a Python library for parsing HTML and XML. Three features make it well suited to screen scraping: (1) it tolerates web pages with badly formed markup; (2) it offers simple methods for navigating, searching, and extracting data from a parsed page; (3) it automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
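To make those three points concrete, here is a minimal sketch. It assumes the current `bs4` package (installed with `pip install beautifulsoup4`) and Python's built-in `html.parser` backend, which may differ from the release this post was written against. The sample markup and the `extract_links` helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, given as raw UTF-8 bytes. The markup is
# deliberately sloppy: the <p> and <li> tags are never closed, and the
# closing </body> and </html> tags are missing.
BAD_HTML = b"""
<html><head><meta charset="utf-8"><title>Caf\xc3\xa9 menu</title></head>
<body>
<p class="intro">Today's specials
<ul>
  <li><a href="/espresso">Espresso</a>
  <li><a href="/latte">Latte</a>
</ul>
"""


def extract_links(markup):
    """Return the page title and a list of (link text, href) pairs."""
    # Beautiful Soup decodes the bytes to Unicode and builds a tree
    # even though the markup is not well formed.
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.title.string
    # find_all() searches the whole tree; href=True skips anchors
    # that have no href attribute.
    links = [(a.get_text(strip=True), a["href"])
             for a in soup.find_all("a", href=True)]
    return title, links


title, links = extract_links(BAD_HTML)
print(title)
print(links)
```

The title comes back as an ordinary Unicode string, so the accented character needs no manual decoding. Calling `soup.prettify()` would give a tidied version of the same page.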
