I've written quite a few HTML scrapers (programs that read an HTML page and parse out the information it contains), and the biggest part of these programs is the string manipulation. I usually break the HTML page up into string arrays and run through each array looking for keywords. In .NET, you can break strings up using the Split method of a String object, or you can use regular expressions. I find regular expressions powerful but cryptic to write and maintain, so I use the Split method more often than not. Darren Niemke has benchmarked different methods of splitting strings in .NET.
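
To make the two approaches concrete, here is a minimal sketch in C#. The class name SplitDemo, the helper names SplitOnLiteral and SplitOnPattern, and the sample HTML are all made up for illustration; only String.Split, Regex.Split, and String.Contains are real framework calls. It is a rough picture of the "split, then scan for keywords" idea, not a full scraper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: split on a literal delimiter with String.Split.
    // The string[] overload accepts multi-character delimiters.
    public static string[] SplitOnLiteral(string html, string delimiter)
    {
        return html.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Hypothetical helper: split on a regular expression with Regex.Split.
    // Useful when the delimiter varies (attributes, mixed case).
    // Note that Regex.Split keeps empty entries.
    public static string[] SplitOnPattern(string html, string pattern)
    {
        return Regex.Split(html, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Made-up sample markup, standing in for a downloaded page.
        string html = "<tr><td>Name</td><td>Price: 42</td></tr>";

        // Break the page into chunks, then run through them looking for a keyword.
        foreach (string chunk in SplitOnLiteral(html, "<td>"))
        {
            if (chunk.Contains("Price:"))
            {
                Console.WriteLine(chunk);
            }
        }
    }
}
```

The literal split is easier to read and maintain, but it only matches the exact text "<td>". A pattern such as "<td[^>]*>" passed to SplitOnPattern would also match tags with attributes or different casing, at the cost of the more cryptic syntax.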