Volume No. :   6

Issue No. :  3

Year :  2015

ISSN Print :  0976-2973

ISSN Online :  2321-581X



Web Data Extraction and Alignment Tools: A survey

Authors: Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli
Student, Department of I.T., DYPIET, Pune
DOI No: 10.5958/2321-581X.2015.00060.4

Data extraction from web pages is the process of analyzing and retrieving relevant data from data sources (usually unstructured or poorly structured) in a specific pattern for further processing. It involves adding metadata and data integration details for later stages of the data workflow. This survey gives an overview of different web data extraction and data alignment techniques. The extraction techniques covered are DeLa, DEPTA, ViPER, and ViNT; the data alignment techniques are pairwise QRR alignment, holistic alignment, and nested structure processing. Query result pages are generated by a web database in response to a user's query. Automatically extracting the data from these query result pages is important for many applications, such as data integration, that need to cooperate with multiple web databases. A new data extraction method is proposed that combines both tag and value similarity. It automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the query result pages and then aligning the segmented QRRs into a table, in which the data values of the same attribute are put into the same column. Its data region identification method identifies noncontiguous QRRs that have the same parent according to their tag similarity. Specifically, we propose new techniques to handle the case when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement, and to handle any nested structure that may exist in the QRRs.
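The two steps described above, finding QRRs by tag similarity even when an advertisement interrupts them and then aligning their values into columns using both tag and value similarity, can be illustrated with a minimal Python sketch. It is not the CTVS algorithm itself. The (tag, text) record representation, the `difflib`-based similarity measures, the weight `alpha`, the thresholds, and all function names are hypothetical choices made for illustration, and nested structures are not handled.

```python
from difflib import SequenceMatcher

# Hypothetical representation: a field is a (tag, text) pair taken from the
# DOM, and a query result record (QRR) is the list of fields under one node.
Field = tuple[str, str]
Record = list[Field]


def tag_sim(a: Record, b: Record) -> float:
    """Similarity (0..1) of two records' tag sequences."""
    return SequenceMatcher(None, [t for t, _ in a], [t for t, _ in b]).ratio()


def find_qrrs(siblings: list[Record], threshold: float = 0.7) -> list[Record]:
    """Return the largest group of sibling subtrees with similar tag structure.

    Records need not be contiguous: an advertisement sitting between two
    records is simply left out of the group instead of ending the region."""
    best: list[Record] = []
    for seed in siblings:
        group = [s for s in siblings if tag_sim(seed, s) >= threshold]
        if len(group) > len(best):
            best = group
    return best


def _shape(s: str) -> str:
    """Map digits to '9' and letters to 'a', so '$45.00' and '$30.50' look alike."""
    return "".join("9" if c.isdigit() else "a" if c.isalpha() else c for c in s)


def field_sim(f: Field, g: Field, alpha: float = 0.5) -> float:
    """Weighted combination of tag similarity and value similarity."""
    tag = 1.0 if f[0] == g[0] else 0.0
    value = (0.5 * SequenceMatcher(None, _shape(f[1]), _shape(g[1])).ratio()
             + 0.5 * SequenceMatcher(None, f[1], g[1]).ratio())
    return alpha * tag + (1 - alpha) * value


def align(ref: Record, rec: Record, min_sim: float = 0.4) -> list[str]:
    """Pairwise global alignment (Needleman-Wunsch style) of rec against ref.

    Returns one cell per column of ref, '' where rec has no matching field.
    Gaps cost 0 and a match scores field_sim - min_sim, so only fields that
    are more than min_sim similar are worth aligning."""
    n, m = len(ref), len(rec)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + field_sim(ref[i - 1], rec[j - 1]) - min_sim
            score[i][j] = max(diag, score[i - 1][j], score[i][j - 1])
    row = [""] * n
    i, j = n, m
    while i > 0 and j > 0:  # trace back along an optimal path
        if score[i][j] == score[i - 1][j]:
            i -= 1  # column i has no value in this record
        elif score[i][j] == score[i][j - 1]:
            j -= 1  # field j matches no column (dropped in this sketch)
        else:
            row[i - 1] = rec[j - 1][1]
            i, j = i - 1, j - 1
    return row


def extract_table(siblings: list[Record]) -> tuple[list[str], list[list[str]]]:
    """Identify the QRRs, then align each one against the first to build a table."""
    records = find_qrrs(siblings)
    if not records:
        return [], []
    ref = records[0]
    return [t for t, _ in ref], [align(ref, r) for r in records]


siblings = [
    [("a", "Data Mining"), ("span", "J. Han"), ("b", "$45.00")],
    [("div", "Sponsored: buy now"), ("img", "")],  # ad splits the region
    [("a", "Web Crawling"), ("b", "$30.50")],      # author missing
    [("a", "Databases"), ("span", "R. Elmasri"), ("b", "$60.00")],
]
header, rows = extract_table(siblings)
print(header)
for row in rows:
    print(row)
```

On this input the advertisement is excluded from the data region, and the record without an author gets an empty cell in the `span` column instead of having its price shifted into the wrong column. A fuller treatment would open new columns for fields that match no reference column and would recurse into nested structures; this sketch does neither.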
Keywords: Combining Tag and Value Similarity (CTVS), Query Result Record (QRR), Data Extraction and Label Assignment for Web Databases (DeLa), Data Extraction Based on Partial Tree Alignment (DEPTA), Visual Perception-based Extraction of Records (ViPER), Visual Information and Tag Structure based wrapper generator (ViNT)
Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli. Web Data Extraction and Alignment Tools: A survey. Research J. Engineering and Tech. 6(3): July-Sept., 2015, pages 381-386.
