Author(s): Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli

Email(s): ghogare.vidhya@gmail.com

DOI: 10.5958/2321-581X.2015.00060.4   

Address: Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli
Student, Department of I.T., DYPIET, Pune
*Corresponding Author

Published In: Volume - 6, Issue - 3, Year - 2015


ABSTRACT:
Web data extraction is the process of analyzing data sources (usually unstructured or poorly structured) and retrieving relevant data from them in a specific pattern for further processing; it also involves adding metadata and data integration details for later stages of the data workflow. This survey gives an overview of different web data extraction and data alignment techniques. The extraction techniques covered are DeLa, DEPTA, ViPER, and ViNT. The data alignment techniques covered are pairwise QRR alignment, holistic alignment, and nested structure processing. Query result pages are generated by a web database in response to a user's query. Automatically extracting the data from these query result pages is very important for many applications, such as data integration, that need to cooperate with multiple web databases. A new data extraction method is proposed that combines both tag and value similarity. It automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the pages and then aligning the segmented QRRs into a table in which the data values from the same attribute are put into the same column. The data region identification method identifies noncontiguous QRRs that have the same parent according to their tag similarities. Specifically, we propose new techniques to handle the case when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement, and to handle any nested structure that may exist in the QRRs.
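
As a rough illustration of the data region identification idea described in the abstract, the sketch below groups sibling subtrees under one parent by the similarity of their tag sequences, so that records separated by auxiliary content (for example an advertisement) still end up in the same group. It is a minimal sketch and not the authors' algorithm: the function names `tag_similarity` and `find_records`, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def tag_similarity(a, b):
    """Similarity in [0, 1] between two tag sequences (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def find_records(children, threshold=0.8):
    """Group sibling subtrees by tag similarity.

    `children` is a list of tag sequences, one per child of a common
    parent node. Returns the indices of the largest group, which is
    taken here to be the query result records (QRRs). The group members
    need not be contiguous.
    """
    groups = []  # each group is a list of child indices
    for i, tags in enumerate(children):
        for group in groups:
            # Compare against the first member of each existing group.
            if tag_similarity(children[group[0]], tags) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return max(groups, key=len)


# Hypothetical siblings: three records with an advertisement in between.
siblings = [
    ("div", "a", "span", "span"),
    ("div", "a", "span", "span"),
    ("p", "img"),                          # auxiliary content
    ("div", "a", "span", "span", "span"),  # record with an extra field
]
record_indices = find_records(siblings)
```

In this toy input the advertisement at index 2 falls into its own group, while the three record-like siblings are grouped together even though they are not adjacent. A fuller treatment would also compare data values and then align the grouped records into table columns, as the abstract describes.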


Cite this article:
Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli. Web Data Extraction and Alignment Tools: A survey. Research J. Engineering and Tech. 6(3): July-Sept., 2015 page 381-386. doi: 10.5958/2321-581X.2015.00060.4

Cite(Electronic):
Pranali Nikam, Yogita Gote, Vidhya Ghogare, Jyothi Rapalli. Web Data Extraction and Alignment Tools: A survey. Research J. Engineering and Tech. 6(3): July-Sept., 2015 page 381-386. doi: 10.5958/2321-581X.2015.00060.4 Available on: https://ijersonline.org/AbstractView.aspx?PID=2015-6-3-13

