First, a simple example: scraping stock quotes from http://finance.yahoo.com. The web pages generated by this site contain the text "Last Trade:" followed by a number. More precisely, the text is followed by some other stuff and then the number.
The "other stuff" is invisible HTML code that can simply be ignored for this application.
Here's what the "Last Trade:" scraper looks like:
This parser first locates the text "Last Trade:" on the web page, then looks for a Price pattern, i.e., a decimal number. When it finds one, it executes a sequence of actions called NewQuote, which transmits the number, LastPrice, to the database.
This example can easily be expanded to extract any number of prices from the web page.
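For readers who prefer to see the same steps as text, here is a rough Python sketch of the logic described above. It is not Set Machine code. The sample markup, the regular expression standing in for the Price pattern, and the names `scrape_price` and `new_quote` are all assumptions made for illustration, and a list stands in for the database.

```python
import re

# Hypothetical sample; the real page markup is not shown here.
SAMPLE_PAGE = "<td>Last Trade:</td><td><b>123.45</b></td>"

# Assumed stand-in for the Price pattern: digits, a decimal point, digits.
PRICE = re.compile(r"\d+\.\d+")


def scrape_price(page, label, on_quote):
    """Locate `label`, then find the first decimal number after it.

    Any markup between the label and the number (the "other stuff")
    is skipped, because the search simply scans forward.
    """
    start = page.find(label)
    if start == -1:
        return None  # label not on this page
    match = PRICE.search(page, start + len(label))
    if match is None:
        return None  # no number after the label
    price = float(match.group())
    on_quote(label, price)  # plays the role of the NewQuote actions
    return price


quotes = []  # stands in for the database


def new_quote(label, price):
    quotes.append((label, price))


last_price = scrape_price(SAMPLE_PAGE, "Last Trade:", new_quote)
```

Calling `scrape_price` again with a different label would collect further prices from the same page, under the same assumptions. A scan this simple can be fooled by decimal numbers inside tag attributes, so treat it as a sketch only.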
Now for something more complex:
This parser first searches the web page for one of the entries in the "TableLocator" string set. This allows skipping forward to the desired table. The parser then looks for the first table row (TR tag), then extracts the contents of the first three table data (TD) tags, proceeding row-by-row until the end of the table.
This parser is simplified by the use of several node groups:
| Node group | Description |
|---|---|
| TR | HTML table row parser |
| TD | HTML table cell parser |
| HTMLTag | HTML tag parser |
| HTMLElement | HTML element parser |
This parser can be adapted to parse HTML tables with any number of columns. It would typically be used with WWWGrab.
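The table-extraction steps described above can be approximated in Python with the standard library's `html.parser`. This is a sketch under assumptions, not a description of how Set Machine works internally: the sample page, the locator strings, and the names `RowCollector` and `extract_table` are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the first `max_columns` TD cells of each TR until </table>."""

    def __init__(self, max_columns):
        super().__init__()
        self.max_columns = max_columns
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None
        self.done = False  # set once the table has ended

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "td" and self._cell is not None:
            if len(self._row) < self.max_columns:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table":
            self.done = True


def extract_table(page, locators, max_columns=3):
    """Skip to the earliest locator string, then read the next table."""
    positions = [p for p in (page.find(s) for s in locators) if p != -1]
    if not positions:
        return []
    collector = RowCollector(max_columns)
    collector.feed(page[min(positions):])
    return collector.rows


# Hypothetical page with a table before and after the one we want.
SAMPLE_PAGE = """
<table><tr><td>ignore</td></tr></table>
<h2>Price History</h2>
<table>
<tr><td>Mon</td><td>10.5</td><td>11.0</td><td>extra</td></tr>
<tr><td>Tue</td><td>11.0</td><td>11.2</td><td>extra</td></tr>
</table>
<table><tr><td>after</td></tr></table>
"""

rows = extract_table(SAMPLE_PAGE, {"Price History", "Historical Prices"})
```

Changing `max_columns` adapts the sketch to tables with a different number of columns.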