Ground truth? Concept-based communities versus the external classification of physics manuscripts

Palchykov, Vasyl; Gemmetto, Valerio; Boyarsky, Alexey; Garlaschelli, Diego

doi:10.1140/epjds/s13688-016-0090-4

arXiv:1602.08451 (cs)

[Submitted on 6 Feb 2016]

Abstract: Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification for many real-world systems with a significant level of discrepancy. A widely accepted reason behind such an outcome is the unavoidable loss of non-topological information (such as node attributes) encountered when the original complex system is represented as a network. In this article we emphasize that the observed discrepancies may also be caused by a different reason: the external classification itself. To this end we use scientific publication data which (i) exhibit a well-defined modular structure and (ii) hold an expert-made classification of research articles. Having represented the articles and the extracted scientific concepts both as a bipartite network and as its unipartite projection, we applied modularity optimization to uncover the inner thematic structure. The resulting clusters are shown to partly reflect the author-made classification, although some significant discrepancies are observed. A detailed analysis of these discrepancies shows that they carry essential information about the system, mainly related to the use of similar techniques and methods across different (sub)disciplines, that is otherwise omitted when only the external classification is considered.
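
The pipeline named in the abstract (article-concept bipartite network, unipartite projection, modularity as the quality measure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy articles and concepts are invented, the projection weight (number of shared concepts) and the standard Newman-Girvan modularity are stand-ins, since the abstract does not specify the edge weighting, null model, or optimization algorithm the authors used. The sketch only scores two given partitions; it does not perform the optimization itself.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: article -> set of extracted concepts (not from the paper).
article_concepts = {
    "a1": {"ising model", "phase transition"},
    "a2": {"ising model", "monte carlo"},
    "a3": {"phase transition", "monte carlo"},
    "a4": {"dark matter", "galaxy"},
    "a5": {"dark matter", "neutrino"},
    "a6": {"galaxy", "neutrino"},
}


def project_onto_articles(article_concepts):
    """Unipartite projection: link two articles if they share concepts.

    Assumed weighting: edge weight = number of shared concepts.
    """
    weights = {}
    for u, v in combinations(sorted(article_concepts), 2):
        shared = len(article_concepts[u] & article_concepts[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def modularity(weights, partition):
    """Newman-Girvan modularity Q of a weighted undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight inside c,
    and d_c the summed node strength of c.
    """
    strength = defaultdict(float)
    total = 0.0
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        total += w
    if total == 0:
        return 0.0

    internal = defaultdict(float)
    for (u, v), w in weights.items():
        if partition[u] == partition[v]:
            internal[partition[u]] += w

    community_strength = defaultdict(float)
    for node, s in strength.items():
        community_strength[partition[node]] += s

    return sum(
        internal[c] / total - (community_strength[c] / (2 * total)) ** 2
        for c in community_strength
    )


weights = project_onto_articles(article_concepts)

# Partition following the concept structure (two triangles in the toy graph).
detected = {"a1": 0, "a2": 0, "a3": 0, "a4": 1, "a5": 1, "a6": 1}
# Hypothetical external labels that cut across the concept structure.
external = {"a1": "X", "a2": "X", "a4": "X", "a3": "Y", "a5": "Y", "a6": "Y"}

q_detected = modularity(weights, detected)
q_external = modularity(weights, external)
print(f"Q(detected) = {q_detected:.3f}, Q(external) = {q_external:.3f}")
```

In this toy setting the concept-based partition scores higher than the hypothetical external labels, which is the kind of gap between detected communities and an external classification that the abstract discusses.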

Comments:	15 pages, 2 figures
Subjects:	Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
Cite as:	arXiv:1602.08451 [cs.DL]
	(or arXiv:1602.08451v1 [cs.DL] for this version)
	https://doi.org/10.48550/arXiv.1602.08451
Journal reference:	EPJ Data Science 2016 5:28
Related DOI:	https://doi.org/10.1140/epjds/s13688-016-0090-4

Submission history

From: Vasyl Palchykov
[v1] Sat, 6 Feb 2016 13:49:16 UTC (294 KB)
