Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
As I was preparing for my talk tonight on web scraping, I came across a class library that has proved invaluable. The HTML Agility Pack is awesome. It lets you download the HTML from a website and navigate it like an XML document, either by walking the node tree or by running XPath queries. You could do this before by hosting an IE browser control in your scraping app and walking the document through the DOM. However, the IE browser control has trouble with badly formed HTML, and unfortunately most of the HTML on the web is not well formed. The HTML Agility Pack handles badly formed HTML just as easily as well-formed HTML. That cut the time I needed to write a scraper from a couple of days to a couple of hours.
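To make the "badly formed HTML" point concrete, here is a minimal sketch of the same idea. Note that this is not the HTML Agility Pack, which is a .NET library. It is an analogous illustration using Python's standard-library `html.parser`, which likewise does not insist on well-formed markup. The `LinkCollector` class, the `extract_links` helper, and the sample markup are all hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()    # treat an unclosed <a> as ended by the next <a>
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def _flush(self):
        # Record the anchor in progress (if any) and reset the state.
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href = None
        self._text = []

    def close(self):
        super().close()
        self._flush()        # pick up an anchor that was never closed


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical, deliberately broken markup: unclosed <td> and <b>,
# an unquoted attribute value, and a missing </a>.
BAD_HTML = """
<table><tr><td><a href="/players/1">Player One</a>
<td><b><a href=/players/2>Player Two
</table>
"""

links = extract_links(BAD_HTML)
for href, text in links:
    print(href, text)
```

Both links come out even though the second anchor is never closed. The HTML Agility Pack goes a step further than this sketch: it builds a full node tree from the broken markup, so you can query it with XPath instead of tracking state by hand.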
People in my fantasy baseball league, beware. I've downloaded a significant amount of baseball data and I know how to use it!