Information Extraction based on Named Entity for Tourism Corpus

Chantrapornchai, Chantana; Tunsakul, Aphisit

doi:10.1109/JCSSE.2019.8864166


arXiv:2001.01588 (cs)

[Submitted on 3 Jan 2020]

Title: Information Extraction based on Named Entity for Tourism Corpus

Authors: Chantana Chantrapornchai, Aphisit Tunsakul


Abstract: Tourism information is scattered across many sources. Searching for it is usually time-consuming: users must browse the results from a search engine, then select and view the details of each accommodation. In this paper, we present a methodology for extracting particular information from the full text returned by the search engine, so that users can go directly to the relevant information they want. The approach can be used for the same task in other domains. The main steps are 1) building the training data and 2) building the recognition model. First, the tourism data is gathered and the vocabularies are built. The raw corpus is used to train the vocabulary embedding and to create the annotated data. The process of creating the named entity annotation is presented. Then, the recognition model for a given entity type can be built. In the experiments, given a hotel description, the model can extract the desired entities, i.e., name, location, and facility. The extracted data can further be stored as structured information, e.g., in an ontology format, for future querying and inference. The model for automatic named entity identification, based on machine learning, yields an error ranging from 8% to 25%.
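The annotation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the annotation procedure or the recognition model, so everything here is an assumption: the vocabulary entries, the entity labels (NAME, LOCATION, FACILITY), the BIO tagging scheme, and the helper names `bio_annotate` and `extract_entities` are hypothetical. The sketch replaces the paper's machine-learning recognizer with a simple dictionary lookup, only to show how a vocabulary could be turned into token-level training labels.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary (phrase -> entity type); entries are invented
# for illustration and do not come from the paper's corpus.
VOCAB: Dict[str, str] = {
    "grand palace hotel": "NAME",
    "bangkok": "LOCATION",
    "swimming pool": "FACILITY",
    "free wifi": "FACILITY",
}


def bio_annotate(text: str, vocab: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each token with a BIO label using greedy longest-match lookup."""
    # Naive tokenization: lowercase, drop commas and periods, split on spaces.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(phrase.split()) for phrase in vocab)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, ""
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocab:
                match_len, label = n, vocab[phrase]
                break
        if match_len:
            tagged.append((tokens[i], "B-" + label))
            for tok in tokens[i + 1:i + match_len]:
                tagged.append((tok, "I-" + label))
            i += match_len
        else:
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


def extract_entities(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group BIO-tagged tokens into {entity_type: [phrases]}."""
    entities: Dict[str, List[str]] = {}
    current: List[str] = []
    current_type = ""
    # A sentinel "O" token at the end flushes the last open entity.
    for tok, tag in tagged + [("", "O")]:
        if tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(tok)
            continue
        if current:
            entities.setdefault(current_type, []).append(" ".join(current))
            current, current_type = [], ""
        if tag.startswith("B-"):
            current, current_type = [tok], tag[2:]
    return entities


description = (
    "Grand Palace Hotel is located in Bangkok. "
    "It offers a swimming pool and free WiFi."
)
tagged = bio_annotate(description, VOCAB)
entities = extract_entities(tagged)
```

In this sketch, `tagged` plays the role of annotated training data and `entities` the role of the structured record that could later be stored, e.g., in an ontology. A learned recognizer would replace the dictionary lookup so that names not in the vocabulary can also be found.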

Comments:	6 pages, 9 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
MSC classes:	I.2, I.7
ACM classes:	I.2; I.7
Cite as:	arXiv:2001.01588 [cs.CL]
	(or arXiv:2001.01588v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2001.01588
Journal reference:	16th International Joint Conference on Computer Science and Software Engineering (JCSSE), 2019, pp. 187-192
Related DOI:	https://doi.org/10.1109/JCSSE.2019.8864166

Submission history

From: Chantana Chantrapornchai
[v1] Fri, 3 Jan 2020 17:16:28 UTC (751 KB)
