On the role of words in the network structure of texts: application to authorship attribution

Akimushkin, Camilo; Amancio, Diego R.; Oliveira Jr, Osvaldo N.

doi:10.1016/j.physa.2017.12.054


arXiv:1705.04187 (cs)

[Submitted on 11 May 2017]

Abstract: Well-established automatic analyses of texts mainly consider frequencies of linguistic units, e.g. letters, words and bigrams, while methods based on co-occurrence networks consider the structure of texts regardless of the node labels (i.e. the word semantics). In this paper, we reconcile these distinct viewpoints by introducing a generalized similarity measure to compare texts which accounts for both the network structure of texts and the role of individual words in the networks. We use the similarity measure for authorship attribution of three collections of books, each composed of 8 authors and 10 books per author. High accuracy rates were obtained, with typical values from 90% to 98.75%, much higher than with the traditional TF-IDF approach for the same collections. These accuracies are also higher than those obtained by taking only the topology of networks into account. We conclude that the different properties of specific words on the macroscopic scale structure of a whole text are as relevant as their frequency of appearance; conversely, considering the identity of nodes brings further knowledge about a piece of text represented as a network.
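To make the idea of combining network structure with word identity concrete, here is a minimal Python sketch. It is not the similarity measure defined in the paper, which the abstract does not specify. It assumes a simple adjacency co-occurrence network (two words are linked when they appear next to each other), uses node degree as the only structural property, and compares two texts by the cosine similarity of their word-labeled degree vectors. The function names `cooccurrence_degrees` and `labeled_similarity` are hypothetical and chosen only for this illustration.

```python
import math
import re
from collections import defaultdict


def cooccurrence_degrees(text):
    """Map each word to its degree in an adjacency co-occurrence network.

    Assumption: two distinct words are linked if they appear next to each
    other in the text; the paper may build and measure networks differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # ignore self-loops from repeated words
            neighbors[left].add(right)
            neighbors[right].add(left)
    return {word: len(adj) for word, adj in neighbors.items()}


def labeled_similarity(text_a, text_b):
    """Cosine similarity between word-labeled degree vectors (illustrative)."""
    deg_a = cooccurrence_degrees(text_a)
    deg_b = cooccurrence_degrees(text_b)
    # Only words present in both networks contribute to the dot product,
    # so the identity of each node matters, not just the degree distribution.
    dot = sum(deg_a[w] * deg_b[w] for w in deg_a.keys() & deg_b.keys())
    norm_a = math.sqrt(sum(v * v for v in deg_a.values()))
    norm_b = math.sqrt(sum(v * v for v in deg_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch, two texts whose networks have the same shape but no words in common score 0, whereas a purely topological comparison would treat them as identical. That is the distinction between node-agnostic and node-aware comparison that the abstract draws.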

Subjects:	Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Cite as:	arXiv:1705.04187 [cs.CL]
	(or arXiv:1705.04187v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1705.04187
Journal reference:	Physica A v. 495, p. 49-58, 2018
Related DOI:	https://doi.org/10.1016/j.physa.2017.12.054

Submission history

From: Diego Amancio Dr.
[v1] Thu, 11 May 2017 14:00:10 UTC (513 KB)
