MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of Indian Legal Case Judgments

Datta, Debtanu; Soni, Shubham; Mukherjee, Rajdeep; Ghosh, Saptarshi

arXiv:2310.18600 (cs)

[Submitted on 28 Oct 2023]

Abstract: Automatic summarization of legal case judgments is a practically important problem that has attracted substantial research efforts in many countries. In the context of the Indian judiciary, there is an additional complexity: Indian legal case judgments are mostly written in complex English, but a significant portion of India's population lacks command of the English language. Hence, it is crucial to summarize the legal documents in Indian languages to ensure equitable access to justice. While prior research primarily focuses on summarizing legal case judgments in their source languages, this study presents a pioneering effort toward cross-lingual summarization of English legal documents into Hindi, the most frequently spoken Indian language. We construct the first high-quality legal corpus comprising 3,122 case judgments from prominent Indian courts in English, along with their summaries in both English and Hindi, drafted by legal practitioners. We benchmark the performance of several diverse summarization approaches on our corpus and demonstrate the need for further research in cross-lingual summarization in the legal domain.

Comments:	Accepted at EMNLP 2023 (Main Conference)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2310.18600 [cs.CL]
	(or arXiv:2310.18600v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.18600

Submission history

From: Debtanu Datta
[v1] Sat, 28 Oct 2023 05:51:57 UTC (168 KB)
