Semi-structured data extraction and modelling: the WIA Project

Colombo, Gianluca; Colombo, Ettore; Bonomi, Andrea; Mosca, Alessandro; Bassis, Simone

doi:10.4204/EPTCS.130.16

arXiv:1309.7697 (cs)

[Submitted on 30 Sep 2013]

Authors: Gianluca Colombo (Technological Transfer Consortium - C2T, Milan, Italy), Ettore Colombo (Technological Transfer Consortium - C2T, Milan, Italy), Andrea Bonomi (Technological Transfer Consortium - C2T, Milan, Italy), Alessandro Mosca (KRDB Research Centre - Free University of Bozen-Bolzano, Italy), Simone Bassis (Department of Computer Science - University of Milano, Italy)

Abstract: Over the last decades, the amount of data of all kinds available electronically has increased dramatically. Data are accessible through a range of interfaces, including Web browsers, database query languages, and application-specific interfaces, built on top of a number of different data exchange formats. These data range from unstructured to highly structured. Very often they have some structure, even if that structure is implicit and not as rigid or regular as that found in standard database systems. Spreadsheet documents are prototypical in this respect. Spreadsheets are a lightweight technology that supplies companies with easy-to-build business management and business intelligence applications, and business people widely adopt spreadsheets as convenient vehicles for generating and sharing data files. Indeed, the more spreadsheets grow in complexity (e.g., when used for product development plans and quoting), the more their arrangement, maintenance, and analysis become a knowledge-driven activity. The WIA project (Worksheets Intelligent Analyser) proposes an algorithmic approach to the problem of automatically extracting data structure from spreadsheet documents (i.e., grid-structured and free topological-related data). The WIA-algorithm shows how to describe spreadsheet contents in terms of higher levels of abstraction or conceptualisation. In particular, the WIA-algorithm targets the extraction of i) the calculation workflow implemented in the spreadsheet formulas and ii) the logical role played by the data that take part in the calculation. The aim of the resulting conceptualisations is to provide spreadsheets with abstract representations useful for further model refinement and optimization through evolutionary algorithms.
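
To make the two extraction targets named in the abstract more concrete, the following is a minimal illustrative sketch in Python. It is not the WIA-algorithm, whose details are not given on this page. It assumes a spreadsheet is available as a plain dictionary from A1-style cell addresses to contents, that formulas are strings starting with "=", and that references are simple single-cell references (no ranges, sheet names, or function names that look like cell addresses). The function names, the role labels, and the sample sheet are all hypothetical.

```python
import re

# Matches simple A1-style references such as B2 or AA10.
# Assumption: no ranges (A1:A5), no sheet-qualified references.
CELL_REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")


def build_dependency_graph(cells):
    """Map each formula cell to the set of cells its formula reads."""
    graph = {}
    for address, content in cells.items():
        if isinstance(content, str) and content.startswith("="):
            graph[address] = set(CELL_REF.findall(content))
    return graph


def classify_roles(cells):
    """Label each cell 'input', 'intermediate', 'output', or 'isolated'.

    These labels are a hypothetical stand-in for the "logical role"
    mentioned in the abstract.
    """
    graph = build_dependency_graph(cells)
    referenced = set().union(*graph.values()) if graph else set()
    roles = {}
    for address in cells:
        is_formula = address in graph
        is_read = address in referenced
        if is_formula and is_read:
            roles[address] = "intermediate"
        elif is_formula:
            roles[address] = "output"
        elif is_read:
            roles[address] = "input"
        else:
            roles[address] = "isolated"
    return roles


def evaluation_order(cells):
    """Order formula cells so each comes after the formulas it depends on."""
    graph = build_dependency_graph(cells)
    order, visited = [], set()

    def visit(address):
        # Skip cells already handled and cells that hold plain values.
        if address in visited or address not in graph:
            return
        visited.add(address)
        for dependency in sorted(graph[address]):
            visit(dependency)
        order.append(address)

    for address in sorted(graph):
        visit(address)
    return order


# Hypothetical quoting sheet used only for illustration.
sheet = {
    "A1": 120.0,         # unit price
    "A2": 3,             # quantity
    "A3": "=A1*A2",      # subtotal
    "A4": 0.22,          # tax rate
    "A5": "=A3*(1+A4)",  # total
    "B1": "Quote",       # label, not read by any formula
}
roles = classify_roles(sheet)
order = evaluation_order(sheet)
```

On this sample sheet, the dependency graph plays the part of the calculation workflow, and the role labels separate raw inputs from derived values. A real extractor would need a proper formula parser and would also have to use the spatial layout of the grid, which this sketch ignores.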

Comments:	In Proceedings Wivace 2013, arXiv:1309.7122
Subjects:	Software Engineering (cs.SE); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE)
ACM classes:	H3; I.2; H.1.2
Cite as:	arXiv:1309.7697 [cs.SE]
	(or arXiv:1309.7697v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.1309.7697
Journal reference:	EPTCS 130, 2013, pp. 98-103
Related DOI:	https://doi.org/10.4204/EPTCS.130.16

Submission history

From: EPTCS [via EPTCS proxy]
[v1] Mon, 30 Sep 2013 01:06:56 UTC (93 KB)
