
Set Machine screen shot: web scraper / HTML parser

First, a simple example: scraping stock quotes from http://finance.yahoo.com. The web pages generated by this site contain the text "Last Trade:" followed by a number. Actually, in the page source it is "Last Trade:", then other stuff, then the number.

The "other stuff" is invisible HTML code that can simply be ignored for this application.

Here's what the "Last Trade:" scraper looks like:

[Screen shot: stock quote parser]

This parser first locates the text "Last Trade:" on the web page, then looks for a Price pattern, i.e. a decimal number. When it finds one, it executes a sequence of actions called NewQuote that transmit the number, LastPrice, to the database.
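
For readers who want to see the same three steps in ordinary code, here is a minimal Python sketch. It is not how Set Machine works internally. The sample markup, the new_quote function and the list standing in for the database are all assumptions made for illustration, and the Price pattern is assumed to be digits, a decimal point, then more digits.

```python
import re

# Assumed sample markup; the real page's "other stuff" is not reproduced here.
SAMPLE_PAGE = "<td>Last Trade:</td><td><b>27.45</b></td>"

# Assumed Price pattern: digits, a decimal point, more digits.
PRICE = re.compile(r"\d+\.\d+")


def new_quote(last_price, database):
    """Hypothetical stand-in for the NewQuote actions: store the price."""
    database.append(last_price)


def scrape_last_trade(page, database):
    label = "Last Trade:"
    # Step 1: locate the label text on the page.
    start = page.find(label)
    if start == -1:
        return None
    # Step 2: search forward from the end of the label for the first
    # Price match, which skips over the intervening HTML.
    match = PRICE.search(page, start + len(label))
    if match is None:
        return None
    last_price = float(match.group())
    # Step 3: hand the number to the database.
    new_quote(last_price, database)
    return last_price


quotes = []
scrape_last_trade(SAMPLE_PAGE, quotes)
print(quotes)
```

One caveat of this sketch: if the HTML between the label and the price contained a decimal number of its own (in a tag attribute, say), that number would be matched first.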

This example can easily be expanded to extract any number of prices from the web page.

Now for something more complex:


[Screen shot: HTML table parser]

This parser first searches the web page for any of the entries in the "TableLocator" string set, which lets it skip forward to the desired table. It then looks for the first table row (TR tag) and extracts the contents of the first three table data (TD) tags, proceeding row by row until the end of the table.
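
The same flow can be approximated in Python with the standard library's html.parser module. This is only a sketch under assumptions: the class name TableRows, the function parse_table, the sample page and the locator strings are all made up for illustration and are not part of Set Machine.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of the first max_cols cells of each row of one table."""

    def __init__(self, max_cols=3):
        super().__init__()
        self.max_cols = max_cols
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None
        self.done = False   # set once the first </table> has been seen

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "td" and self._cell is not None:
            # Keep only the first max_cols cells of the row.
            if len(self._row) < self.max_cols:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            # End of the table: ignore everything after it.
            self._row = self._cell = None
            self.done = True


def parse_table(page, table_locator, max_cols=3):
    # Skip forward to the earliest locator string found on the page.
    hits = [p for p in (page.find(s) for s in table_locator) if p != -1]
    if not hits:
        return []
    parser = TableRows(max_cols)
    parser.feed(page[min(hits):])
    parser.close()
    return parser.rows


# Assumed sample page: the wanted table follows the "Price History" heading.
SAMPLE = (
    "<h2>Intro</h2><table><tr><td>skip</td></tr></table>"
    "<h2>Price History</h2><table>"
    "<tr><td>Mon</td><td>10.5</td><td>11.0</td><td>extra</td></tr>"
    "<tr><td>Tue</td><td>11.0</td><td>11.2</td><td>extra</td></tr>"
    "</table><table><tr><td>after</td></tr></table>"
)
print(parse_table(SAMPLE, {"Price History", "Historical Prices"}))
```

Passing a different max_cols value changes how many cells are kept per row.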

This parser is simplified by the use of several node groups:

TR - HTML table row parser
TD - HTML table cell parser
HTMLTag - HTML tag parser
HTMLElement - HTML element parser
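
The idea of building a parser from small reusable pieces can be illustrated in Python with composable regular expressions. The names below only echo the node groups listed above. Their definitions are assumptions for illustration and say nothing about how the actual node groups are built.

```python
import re

# Hypothetical counterpart of HTMLTag: any single tag, e.g. <b> or </td>.
HTML_TAG = re.compile(r"<[^>]*>")


def html_element(name):
    """Hypothetical counterpart of HTMLElement: one <name ...>...</name>
    element, with the element's contents captured as group 1."""
    return re.compile(
        rf"<{name}\b[^>]*>(.*?)</{name}\s*>", re.IGNORECASE | re.DOTALL
    )


# TR and TD are just HTMLElement specialised to a tag name.
TR = html_element("tr")
TD = html_element("td")


def cells(row_html):
    # Strip nested tags from each cell so only the visible text remains.
    return [HTML_TAG.sub("", c).strip() for c in TD.findall(row_html)]


def rows(table_html):
    return [cells(r) for r in TR.findall(table_html)]


print(rows("<TR><TD><b>IBM</b></TD><TD>81.20</TD></TR>"))
```

Because the patterns are not recursive, this sketch does not handle tables nested inside cells.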

This parser can be adapted to parse HTML tables with any number of columns.   It would typically be used with WWWGrab.




Copyright © 2002-2008 WWWGrab.com.   All Rights Reserved.