WebScraper (Web Data Extraction)

  • Version: 0.1b
  • Publisher:
    websitescraper.sourceforge.net
  • File Size: 1.35 MB
  • Date: Mar 05, 2010
  • License: Free
  • Category:
    Web Search Tool
    Internet

WebScraper is a web scraper to help you with your work. The project sets out to create a powerful, robust, quick-to-deploy web scraper. A web scraper is a tool that extracts specific parts of web pages, rather than the entire HTML page as a crawler would.

The outcome of this project will be a tool, written in Java, that provides the following:
1. A powerful, fast, reliable way of scraping the web.
2. Simple setup, with no cumbersome XML configuration to write.
3. Full automation.
4. Extension points for parsing and exporting.

Syntax
1. Parser - the extension that parses the HTML and creates a subset of that HTML (a chunk) for export.
2. Chunk Name - the label assigned to an HTML chunk.
3. Chunk - a part of the HTML we wish to capture.

The configuration file syntax defines which parts of the HTML are captured. Each entry starts with the chunk name chosen for the chunk to capture, followed by the separator <|>. Next comes the name of the parser (here, let's say the token parser), then the parameters for that parser, and a closing separator <||>. This pattern repeats for each piece of information you want to capture per page.
For example, to capture the title of an HTML page, the following entry will work for most pages:
title <|>token:<title>####</title><||>
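
To make the entry format concrete, here is a minimal Java sketch of how such a configuration line could be split and applied. It is not WebScraper's own code: the class and method names are hypothetical, the colon between the parser name and its parameters is inferred from the example above, and treating #### as the placeholder for the captured text is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; names and behavior are assumptions, not WebScraper's API.
class ChunkConfigSketch {
    static final String FIELD_SEP = "<|>";    // separates chunk name from parser spec
    static final String ENTRY_SEP = "<||>";   // terminates one entry
    static final String PLACEHOLDER = "####"; // assumed marker for the text to capture

    // Splits a configuration string into chunk name -> "parser:parameters".
    static Map<String, String> parseConfig(String config) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String entry : config.split(Pattern.quote(ENTRY_SEP))) {
            if (entry.trim().isEmpty()) continue;
            int sep = entry.indexOf(FIELD_SEP);
            if (sep < 0) throw new IllegalArgumentException("Missing <|> in: " + entry);
            String name = entry.substring(0, sep).trim();
            String spec = entry.substring(sep + FIELD_SEP.length()).trim();
            entries.put(name, spec);
        }
        return entries;
    }

    // Assumed token-parser behavior: return the text found between the parts of
    // the template before and after ####, or null if either part is not found.
    static String applyToken(String template, String html) {
        int at = template.indexOf(PLACEHOLDER);
        if (at < 0) throw new IllegalArgumentException("No #### in: " + template);
        String before = template.substring(0, at);
        String after = template.substring(at + PLACEHOLDER.length());
        int start = html.indexOf(before);
        if (start < 0) return null;
        start += before.length();
        int end = html.indexOf(after, start);
        return end < 0 ? null : html.substring(start, end);
    }

    // Applies every entry in the configuration to one HTML page.
    static Map<String, String> scrape(String config, String html) {
        Map<String, String> chunks = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : parseConfig(config).entrySet()) {
            String spec = e.getValue();
            int colon = spec.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("Missing parser name in: " + spec);
            String parser = spec.substring(0, colon);
            if (!parser.equals("token")) {
                throw new UnsupportedOperationException("Only the token parser is sketched: " + parser);
            }
            chunks.put(e.getKey(), applyToken(spec.substring(colon + 1), html));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String config = "title <|>token:<title>####</title><||>";
        String html = "<html><head><title>Hello</title></head><body></body></html>";
        System.out.println(scrape(config, html)); // prints {title=Hello}
    }
}
```

A plain substring search like this misses titles whose opening tag carries attributes or different letter case, which may be why the example is described as capturing most titles rather than all of them.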

Requirements:
* Java

The license of this software is Free: you can download and use this web search tool at no cost.
