Data extraction from the web based on pre-defined schema
Abstract:
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available on the Web are in HTML, which was originally designed for document formatting with little consideration of content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under this approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes an HTML page as input and produces the required data as XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts the required data from input pages with high accuracy.
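The workflow described in the abstract (schema, sample mappings, induced rules, generated wrapper) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes well-formed, XML-parsable HTML, treats the schema as a flat set of field names, and induces each rule as a simple child-index path to the element holding the sample value. The names `find_path`, `induce_rules`, and `make_wrapper` are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_path(root, text):
    """Return the child-index path from root to the first element
    whose stripped text equals `text`, or None if there is none."""
    def walk(node, path):
        if (node.text or "").strip() == text:
            return path
        for i, child in enumerate(node):
            found = walk(child, path + [i])
            if found is not None:
                return found
        return None
    return walk(root, [])


def induce_rules(sample_html, sample_mapping):
    """Turn sample mappings {schema field: example value} into
    mapping rules {schema field: child-index path}."""
    root = ET.fromstring(sample_html)  # assumption: well-formed markup
    rules = {}
    for field, example in sample_mapping.items():
        path = find_path(root, example)
        if path is None:
            raise ValueError(f"sample value for {field!r} not found")
        rules[field] = path
    return rules


def make_wrapper(record_tag, rules):
    """Build a wrapper: HTML string in, XML string out, with one
    child element per schema field under `record_tag`."""
    def wrapper(html):
        root = ET.fromstring(html)
        record = ET.Element(record_tag)
        for field, path in rules.items():
            node = root
            for i in path:  # follow the induced path
                node = node[i]
            ET.SubElement(record, field).text = (node.text or "").strip()
        return ET.tostring(record, encoding="unicode")
    return wrapper


# Hypothetical sample page and mapping supplied by the user.
sample = "<html><body><h1>Widget</h1><p>9.99</p></body></html>"
rules = induce_rules(sample, {"name": "Widget", "price": "9.99"})
wrap = make_wrapper("product", rules)
print(wrap("<html><body><h1>Gadget</h1><p>19.50</p></body></html>"))
```

A real wrapper would need a tolerant HTML parser and rules that survive layout variation between pages; fixed index paths are used here only to keep the sketch short.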
The KHMDS-catalyzed tertiary alkylation of aldehydes, ketones or imines using tertiary benzylic organoboronates is reported. This protocol permitted the use of tertiary benzylic alkylboronates as t...
Carbonium ion rearrangement in the synthesis of γ-lactones from (E)-β-alkylcinnamic acids
Author: Lars Jalander
DOI:10.1016/s0040-4039(00)99910-6
Date: 1984.1
The possible formation of a bridged carbonium ion intermediate in the lactonization of (E)-β-t-butylcinnamic acid is discussed on the basis of deuteriation and 13C NMR experiments.
JALANDER, L., TETRAHEDRON LETT., 1984, 25, N 4, 457-460