Efficient Discovery of Ontology Functional Dependencies

Baskaran, Sridevi; Keller, Alexander; Chiang, Fei; Golab, Lukasz; Szlichta, Jaroslaw

arXiv:1611.02737 (cs)

[Submitted on 8 Nov 2016 (v1), last revised 24 May 2017 (this version, v3)]

Abstract: Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms.
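
The contrast between a syntactic FD and an ontology-aware dependency can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy ontology, the attribute names (symptom, drug), and the helper functions are all hypothetical, and the check assumes one plausible reading of an OFD, namely that tuples agreeing on the left-hand side must have right-hand-side values that share at least one common concept in the ontology.

```python
from collections import defaultdict

# Hypothetical toy ontology (an assumption for illustration):
# each concept maps to the set of names that refer to it.
ONTOLOGY = {
    "ibuprofen": {"ibuprofen", "Advil", "Motrin"},
    "acetaminophen": {"acetaminophen", "Tylenol"},
}


def senses_of(value):
    """Return every concept under which `value` is a known name."""
    return {concept for concept, names in ONTOLOGY.items() if value in names}


def group_by(rows, lhs):
    """Partition rows into groups that agree on all attributes in `lhs`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups


def fd_holds(rows, lhs, rhs):
    """Classical FD: rows equal on `lhs` must be syntactically equal on `rhs`."""
    return all(len({r[rhs] for r in g}) == 1 for g in group_by(rows, lhs).values())


def ofd_holds(rows, lhs, rhs):
    """Assumed OFD reading: rows equal on `lhs` must share a concept on `rhs`."""
    for group in group_by(rows, lhs).values():
        values = {r[rhs] for r in group}
        if len(values) == 1:
            continue  # identical values trivially agree
        common = set.intersection(*(senses_of(v) for v in values))
        if not common:
            return False
    return True


rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
]

print(fd_holds(rows, ["symptom"], "drug"))   # the plain FD is violated
print(ofd_holds(rows, ["symptom"], "drug"))  # the ontology-aware check passes
```

On this toy data the plain FD flags 'Advil' and 'ibuprofen' as a conflict, while the ontology-aware check accepts them because both names fall under the same concept.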

Comments:	12 pages
Subjects:	Databases (cs.DB)
Cite as:	arXiv:1611.02737 [cs.DB]
	(or arXiv:1611.02737v3 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1611.02737

Submission history

From: Jaroslaw Szlichta
[v1] Tue, 8 Nov 2016 22:03:35 UTC (4,257 KB)
[v2] Wed, 16 Nov 2016 05:13:36 UTC (5,150 KB)
[v3] Wed, 24 May 2017 01:44:45 UTC (4,506 KB)
