Context tree selection and linguistic rhythm retrieval from written texts

Galves, Antonio; Galves, Charlotte; Garcia, Nancy L.; Leonardi, Florencia

arXiv:0902.3619v2 (stat)

[Submitted on 20 Feb 2009 (v1), revised 13 Mar 2009 (this version, v2), latest version 19 Mar 2012 (v4)]

Abstract: We introduce a new criterion to select in a consistent way the probabilistic context tree generating a sample. The basic idea is to construct a totally ordered set of candidate trees. This set is composed of the "champion trees", the ones that maximize the likelihood of the sample for each number of degrees of freedom. The smallest maximizer criterion selects the infimum of the subset of champion trees whose gain in likelihood is negligible. In addition, we propose a new algorithm based on resampling to implement this criterion. This study was motivated by the linguistic challenge of retrieving rhythmic features from written texts. Applied to a data set consisting of texts extracted from daily newspapers, our algorithm identifies different context trees for European Portuguese and Brazilian Portuguese. This is compatible with the long-standing conjecture that European Portuguese and Brazilian Portuguese belong to different rhythmic classes. Moreover, these context trees have several interesting, linguistically meaningful properties.

Comments: 28 pages, 7 figures
Subjects: Machine Learning (stat.ML); Applications (stat.AP)
Cite as: arXiv:0902.3619 [stat.ML] (or arXiv:0902.3619v2 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.0902.3619

Submission history

From: Florencia Leonardi
[v1] Fri, 20 Feb 2009 16:45:58 UTC (336 KB)
[v2] Fri, 13 Mar 2009 15:57:51 UTC (336 KB)
[v3] Wed, 26 Oct 2011 13:05:08 UTC (412 KB)
[v4] Mon, 19 Mar 2012 09:08:43 UTC (133 KB)
