Data extraction from the web based on pre-defined schema
Abstract:
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available on the Web are in HTML, which was originally designed for document formatting with little regard for the structure of the content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under this approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes an HTML page as input and produces the required data as XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also produces wrappers that extract the required data from input pages with high accuracy.
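To make the wrapper idea concrete, the following is a minimal sketch of the final step only: applying mapping rules to an HTML page and emitting XML that follows a user-defined schema. It is not the authors' system. The schema layout (`SCHEMA`), the rule format (`RULES`, a tag name plus a class attribute), the class `FieldCollector`, the function `run_wrapper`, and the sample page are all hypothetical, and the rules are written by hand here rather than induced from sample mappings as the abstract describes.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical schema: a record element and the fields to extract into it.
SCHEMA = {"record": "book", "fields": ["title", "price"]}

# Hypothetical mapping rules: schema field -> (HTML tag, class attribute).
# The paper induces such rules from user samples; here they are hand-written.
RULES = {"title": ("h1", "title"), "price": ("span", "price")}


class FieldCollector(HTMLParser):
    """Collects the text of the first element that matches each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.values = {}
        self._active = None  # schema field currently being captured

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            return  # already inside a matched element
        classes = (dict(attrs).get("class") or "").split()
        for field, (rule_tag, rule_class) in self.rules.items():
            if field not in self.values and tag == rule_tag and rule_class in classes:
                self._active = field
                self.values[field] = ""
                break

    def handle_data(self, data):
        if self._active is not None:
            self.values[self._active] += data

    def handle_endtag(self, tag):
        # Stop capturing when the matched element's tag closes.
        if self._active is not None and tag == self.rules[self._active][0]:
            self.values[self._active] = self.values[self._active].strip()
            self._active = None


def run_wrapper(html, schema=SCHEMA, rules=RULES):
    """Apply the mapping rules to an HTML page and return schema-shaped XML."""
    collector = FieldCollector(rules)
    collector.feed(html)
    collector.close()
    root = ET.Element(schema["record"])
    for field in schema["fields"]:
        ET.SubElement(root, field).text = collector.values.get(field, "")
    return ET.tostring(root, encoding="unicode")


SAMPLE_PAGE = (
    '<html><body><h1 class="title">Data on the Web</h1>'
    '<p>Price: <span class="price">42.00</span></p></body></html>'
)

if __name__ == "__main__":
    print(run_wrapper(SAMPLE_PAGE))
```

Keeping the schema and the rules as separate data structures mirrors the split the abstract draws between what to extract and where to find it on the page; a rule-induction step would only need to produce a `RULES`-like table for the same wrapper to run.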
Base-Catalyzed Dehydrogenative Si-O Coupling of Dihydrosilanes: Silylene Protection of Diols
Authors: Martin Oestreich, Agnieszka Grajewska
DOI:10.1055/s-0030-1258055
Date: 2010.10
The direct dehydrogenative coupling of 1,3- and 1,4-diols and dihydrosilanes is efficiently catalyzed by Cs2CO3 (10 mol%), cleanly affording six- and seven-membered 1,3-dioxo-2-silacycles with dihydrogen as the sole by-product. Conversely, 1,2-diols do not yield the expected 1,3-dioxo-2-silacyclopentanes, essentially forming cyclic disiloxanes instead. Aside from the synthetic convenience, the procedure