#11433. Unsupervised approaches for measuring textual similarity between legal court case reports
August 2026 | publication date |
Proposal available till | 19-05-2025 |
4 total number of authors per manuscript | 0 $ |
The title of the journal is available only for the authors who have already paid for |
|
|
Journal’s subject area: |
Law;
Artificial Intelligence; |
Places in the authors’ list:
1 place - free (for sale)
2 place - free (for sale)
3 place - free (for sale)
4 place - free (for sale)
Abstract:
In the domain of legal information retrieval, an important challenge is to compute similarity between two legal documents. Precedents (statements from prior cases) play an important role in The Common Law system, where lawyers need to frequently refer to relevant prior cases. Till date, only a few text-based and predominant embedding based methods have been employed, for instance, TF-IDF based approaches, Word2Vec (Mikolov et al. 20XX) and Doc2Vec (Le and Mikolov 20XX) based approaches. We investigate the performance of 56 different methodologies for computing textual similarity across court case statements when applied on a dataset of Supreme Court Cases. Among the 56 different methods, thirty are adaptations of existing methods and twenty-six are our proposed methods. The methods studied include models such as BERT (Devlin et al. 20XX) and Law2Vec (Ilias 20XX). It is observed that the more traditional methods (such as the TF-IDF and LDA) that rely on a bag-of-words representation performs better than the more advanced context-aware methods (like BERT and Law2Vec) for computing document-level similarity.
Keywords:
BERT; Court case reports; Court case similarity; Doc2vec; Law2vec; Legal information retrieval; Topic modeling; Word2vec
Contacts :