negativas: a prototype for searching and classifying sentential negation in speech data

de Gois, Túlio Sousa; Cardoso, Paloma Batista

Abstract: Negation is a universal feature of natural languages. In Brazilian Portuguese, the most commonly used negation particle is não, which can scope over nouns or verbs. When it scopes over a verb, não can occur in three positions: pre-verbal (NEG1), double negation (NEG2), or post-verbal (NEG3), e.g., não gosto, não gosto não, gosto não ("I do not like it"). From a variationist perspective, these structures are different forms of expressing negation. Pragmatically, they serve distinct communicative functions, such as politeness and modal evaluation. Although all three forms are grammatically acceptable, they differ in frequency. NEG1 dominates across Brazilian regions, while NEG2 and NEG3 are much rarer, suggesting that their use is contextually restricted. This low frequency poses a challenge for research and often leads to subjective, non-generalizable interpretations of verbal negation with não. To address this, we developed negativas, a tool for automatically identifying NEG1, NEG2, and NEG3 in transcribed data. The tool's development involved four stages: i) analyzing a dataset of 22 interviews from the Falares Sergipanos database, annotated by three linguists; ii) writing code using natural language processing (NLP) techniques; iii) running the tool; and iv) evaluating its accuracy. Inter-annotator agreement, measured with Fleiss' Kappa, was moderate (0.57). The tool identified 3,338 instances of não and classified 2,085 of them as NEG1, NEG2, or NEG3, with a 93% success rate. However, negativas has limitations. NEG1 accounted for 91.5% of the identified structures, while NEG2 and NEG3 represented 7.2% and 1.2%, respectively. The tool struggled with NEG2, sometimes misclassifying instances as overlapping structures (NEG1/NEG2/NEG3).
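The position-based distinction between NEG1, NEG2, and NEG3 can be illustrated with a minimal rule-based sketch. The abstract does not describe how negativas works internally, so everything below is an assumption: the input is a single clause that has already been tokenized and POS-tagged with Universal Dependencies tags (VERB, AUX, PRON), and the helpers `classify_clause` and `fleiss_kappa` are hypothetical names, not the tool's API. Under these assumed rules, a verb is labeled NEG1 when não precedes it (clitic pronouns such as me may intervene), NEG2 when não both precedes and follows it, and NEG3 when não only follows it. The block also implements the standard Fleiss' Kappa formula over an item-by-category count matrix, the statistic used to report annotator agreement.

```python
from typing import List, Optional, Tuple

# A token is (word form, POS tag); Universal Dependencies tags are ASSUMED.
Token = Tuple[str, str]
VERB_TAGS = {"VERB", "AUX"}


def _is_neg(token: Token) -> bool:
    return token[0].lower() == "não"


def _pre_neg_index(tokens: List[Token], v: int) -> Optional[int]:
    """Index of a não right before verb v, skipping clitic pronouns; else None."""
    j = v - 1
    while j >= 0 and tokens[j][1] == "PRON":
        j -= 1
    return j if j >= 0 and _is_neg(tokens[j]) else None


def classify_clause(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Hypothetical positional rules, not the authors' implementation.

    Returns (verb index, label) for every verb negated by não.
    Verbs with no adjacent não are skipped.
    """
    verbs = [i for i, (_, tag) in enumerate(tokens) if tag in VERB_TAGS]
    results = []
    for n, v in enumerate(verbs):
        pre = _pre_neg_index(tokens, v) is not None
        # Search right only up to the next verb, and ignore a não that
        # belongs to that next verb as its pre-verbal negator.
        if n + 1 < len(verbs):
            end = verbs[n + 1]
            claimed = _pre_neg_index(tokens, end)
        else:
            end, claimed = len(tokens), None
        post = any(_is_neg(tokens[k]) and k != claimed for k in range(v + 1, end))
        if pre and post:
            results.append((v, "NEG2"))
        elif pre:
            results.append((v, "NEG1"))
        elif post:
            results.append((v, "NEG3"))
    return results


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' Kappa; counts[i][j] = number of raters who put item i in category j."""
    if not counts:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items, n_cats = len(counts), len(counts[0])
    total = n_items * n_raters
    # Share of all ratings that fall into each category.
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # Observed agreement per item: proportion of rater pairs that agree.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For example, `classify_clause([("não", "ADV"), ("gosto", "VERB"), ("não", "ADV")])` returns `[(1, "NEG2")]`, and `fleiss_kappa([[3, 0], [0, 3]])` returns `1.0` for three raters in perfect agreement. Purely positional rules like these cannot tell a post-verbal não from a tag question (gosto, não?) or from a não that opens the next utterance, so a real tool would likely need punctuation or other cues from the transcription.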

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2504.04275 [cs.CL]
	(or arXiv:2504.04275v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.04275
