Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

Moreo, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio

arXiv:1810.09311 (cs)

[Submitted on 19 Oct 2018]

Abstract: This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification, for which we had previously provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn and the SciPy stack. We report on new experiments that we have carried out to test PyDCI, in which we use as baselines high-performing methods that have appeared since DCI was originally proposed. These experiments show that, thanks to a few subtle improvements to DCI, PyDCI outperforms both JaDCI and the above-mentioned high-performing methods, and delivers the best known results on the two popular benchmarks on which we had tested DCI, i.e., MultiDomainSentiment (a.k.a. MDS, for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code needed to replicate our experiments, is available at this https URL.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1810.09311 [cs.CL]
	(or arXiv:1810.09311v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1810.09311

Submission history

From: Alejandro Moreo Fernández
[v1] Fri, 19 Oct 2018 07:27:24 UTC (172 KB)
