Unsupervised Construction of Knowledge Graphs From Text and Code

Cao, Kun; Fairbanks, James

arXiv:1908.09354 (cs)

[Submitted on 25 Aug 2019]

Abstract: The scientific literature is a rich source of information for data mining with conceptual knowledge graphs; the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit this new resource, we construct a knowledge graph using unsupervised learning methods to identify conceptual entities. We associate source code entities with these natural language concepts using word embedding and clustering techniques. Practical naming conventions for methods and functions tend to reflect the concept(s) they implement. We take advantage of this specificity by presenting a novel process for jointly clustering text concepts that combines word embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem. With our pipeline, we aim to assist scientists in building on existing models in their discipline when making novel models for new phenomena. By combining source code and conceptual information, our knowledge graph enhances corpus-wide understanding of scientific literature.
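The idea that identifier names reflect the concepts they implement can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes character-trigram counts as a crude stand-in for learned word embeddings, it omits the nonlinear dimensionality reduction and clustering steps entirely, and the helper names (`split_identifier`, `trigram_vector`, `cosine`, `nearest_concept`) and the example concepts are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return spaced.lower().split()


def trigram_vector(words):
    """Count character trigrams per word (assumed stand-in for a word embedding)."""
    counts = Counter()
    for word in words:
        padded = f"<{word}>"  # mark word boundaries
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_concept(identifier, concepts):
    """Return the text concept whose vector is most similar to the identifier's."""
    id_vec = trigram_vector(split_identifier(identifier))
    scored = [(cosine(id_vec, trigram_vector(c.lower().split())), c) for c in concepts]
    return max(scored)[1]


if __name__ == "__main__":
    concepts = ["infection rate", "population size"]  # hypothetical text concepts
    for ident in ["infectionRate", "update_population"]:
        print(ident, "->", nearest_concept(ident, concepts))
```

A real system would presumably replace the trigram vectors with embeddings trained on the corpus and group the resulting vectors with a clustering algorithm instead of a nearest-neighbor lookup.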

Comments:	25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 15th International Workshop On Mining and Learning with Graphs
Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Cite as:	arXiv:1908.09354 [cs.LG]
	(or arXiv:1908.09354v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1908.09354

Submission history

From: Kun Cao
[v1] Sun, 25 Aug 2019 16:10:31 UTC (1,214 KB)
