A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents

Prasad, Nishchal; Boughanem, Mohand; Dkaki, Taoufik

arXiv:2309.10563 (cs)

[Submitted on 19 Sep 2023 (v1), last revised 27 Jun 2024 (this version, v3)]

Abstract: Automatic legal judgment prediction and its explanation are hampered by long case documents, which commonly exceed tens of thousands of words and have a non-uniform structure. Predicting judgments from such documents and extracting explanations for them is challenging, more so when the documents carry no structural annotation. We define this problem as "scarce annotated legal documents" and address the lack of structural information and the long lengths with a deep-learning-based classification framework for judgment prediction, which we call MESc ("Multi-stage Encoder-based Supervised with-clustering"). We explore the adaptability of LLMs with multi-billion parameters (GPT-Neo and GPT-J) to legal texts and their intra-domain (legal) transfer learning capacity. Alongside this, we compare their performance and adaptability with MESc and examine the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor), which uses the input-occlusion sensitivity of the model to explain predictions with the most relevant sentences from the document. We test the effectiveness of these methods with extensive experiments and ablation studies on legal documents from India, the European Union, and the United States, using the ILDC dataset and a subset of the LexGLUE dataset. MESc achieves a minimum total performance gain of approximately 2 points over previous state-of-the-art methods, while ORSE applied on MESc achieves a total average gain of 50% over the baseline explainability scores.
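
The abstract describes ORSE only at a high level, so the following is a minimal sketch of the general idea of input-occlusion sensitivity at sentence granularity, not the authors' algorithm. The names `occlusion_relevance`, `toy_score`, and `example_doc` are hypothetical, and `toy_score` is a cue-word counter standing in for a trained classifier's output score.

```python
from typing import Callable, List, Sequence, Tuple


def occlusion_relevance(
    sentences: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> List[Tuple[int, float]]:
    """Rank sentences by how much the score drops when each one is removed."""
    base = score_fn(sentences)
    drops = []
    for i in range(len(sentences)):
        # Occlude sentence i by leaving it out of the input.
        occluded = [s for j, s in enumerate(sentences) if j != i]
        drops.append((i, base - score_fn(occluded)))
    # Largest drop first; sorted() is stable, so ties keep document order.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


# Hypothetical cue words; a real system would query a trained model instead.
CUE_WORDS = {"allowed", "granted"}


def toy_score(sentences: Sequence[str]) -> float:
    """Assumed stand-in for a model score: the number of cue-word occurrences."""
    count = 0
    for sentence in sentences:
        for word in sentence.split():
            if word.strip(".,;:").lower() in CUE_WORDS:
                count += 1
    return float(count)


example_doc = [
    "The appellant filed a petition.",
    "The appeal is allowed.",
    "Costs are granted and the appeal is allowed.",
]

# Indices of the sentences, most relevant first, under the toy score.
ranking = occlusion_relevance(example_doc, toy_score)
top_sentences = [example_doc[i] for i, _ in ranking[:2]]
```

Leaving one sentence out at a time costs one extra scoring pass per sentence, which is why a sketch like this would normally be applied to chunk or sentence units rather than individual tokens in very long documents.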

Comments:	Published as a non-archival paper in the 3rd International Workshop on Mining and Learning in the Legal Domain (MLLD-2023) at CIKM 2023, Birmingham, United Kingdom.
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2309.10563 [cs.IR]
	(or arXiv:2309.10563v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2309.10563

Submission history

From: Nishchal Prasad
[v1] Tue, 19 Sep 2023 12:18:28 UTC (193 KB)
[v2] Mon, 25 Sep 2023 15:10:37 UTC (193 KB)
[v3] Thu, 27 Jun 2024 22:40:45 UTC (193 KB)
