Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences

Ta, Hoang Thang; Gelbukha, Alexander; Sidorov, Grigori

arXiv:2210.12659 (cs)

[Submitted on 23 Oct 2022]

Abstract: Acknowledged as one of the most successful online cooperative projects, Wikipedia has grown rapidly in recent years and continuously seeks to expand its content and disseminate knowledge to everyone globally. A shortage of volunteers causes Wikipedia many problems, including the difficulty of developing content for over 300 languages at present. Machines that automatically generate content could therefore considerably reduce human effort on Wikipedia language projects. In this paper, we propose a mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as groups of quadruples and triples, and then map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in several respects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other related work in the field.
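
To make the idea of mapping statements to sentences concrete, the following is a minimal illustrative sketch and not the authors' pipeline. It assumes a statement can be held as a triple (subject, property, value) or, with one qualifier, as a quadruple, and it uses naive case-insensitive label matching as a stand-in for whatever matching the paper actually applies. The names `Statement`, `matches`, and `map_statements` are hypothetical.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """A Wikidata-style statement (hypothetical container for illustration)."""
    subject: str
    prop: str
    value: str
    qualifier: Optional[str] = None  # set -> quadruple, None -> triple


def matches(statement: Statement, sentence: str) -> bool:
    """Return True if the subject, value, and qualifier labels all occur in the sentence.

    Assumption: plain substring matching; the property label is not required,
    since sentences rarely repeat it verbatim.
    """
    text = sentence.lower()
    labels = [statement.subject, statement.value]
    if statement.qualifier is not None:
        labels.append(statement.qualifier)
    return all(label.lower() in text for label in labels)


def map_statements(statements, sentences):
    """Pair each sentence with the statements whose labels it mentions."""
    mapping = {}
    for sentence in sentences:
        hits = [s for s in statements if matches(s, sentence)]
        if hits:
            mapping[sentence] = hits
    return mapping


statements = [
    Statement("Marie Curie", "place of birth", "Warsaw"),  # triple
    Statement("Marie Curie", "award received",
              "Nobel Prize in Physics", qualifier="1903"),  # quadruple
]
sentences = [
    "Marie Curie was born in Warsaw.",
    "In 1903, Marie Curie won the Nobel Prize in Physics.",
    "She later moved to Paris.",
]

result = map_statements(statements, sentences)
for sentence, hits in result.items():
    print(sentence, "<-", [(h.prop, h.value) for h in hits])
```

Exact substring matching misses pronouns, aliases, and paraphrases (the third sentence above maps to nothing), which is one reason a real corpus of this kind would need the noise filtering and sentence-level analysis the abstract mentions.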

Comments:	29 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.12659 [cs.CL]
	(or arXiv:2210.12659v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.12659

Submission history

From: Hoang Thang Ta Mr.
[v1] Sun, 23 Oct 2022 08:34:33 UTC (2,153 KB)
