Abstract
Because a large amount of legal information is available on the internet and from other
sources, the research community needs to study legal text processing more extensively,
so that this vast amount of available data can be made sense of. This growth in
information has created a need for systems that help legal professionals as well as
ordinary citizens obtain relevant legal information with very little effort. In this
survey paper, different text summarization
techniques are surveyed, with a specific focus on legal document summarization, as
this is one of the most important areas in the legal field, which can help with the quick
understanding of legal documents. This paper starts with a general introduction to text
summarization, after which various legal text summarization techniques are
discussed. Various available tools that are used for the summarization of legal text
are also described. Two case studies are also presented in this work, where the
automatic summarization of heterogeneous legal documents from two countries is
considered. With its detailed review of state-of-the-art approaches, the
comparative analysis from the case studies, and discussions of several important
research questions, this work is expected to provide a good starting point for
researchers to perform a more in-depth exploration of the area of legal document
summarization, more specifically with respect to the key future research directions
identified in this work.
Introduction
This age of data deluge has led to very rapid daily growth in online information.
The same growth is seen in the field of law in the form of
legal documents [1]. A document is legal if the intention behind creating it is enforcement
in a court of law. Austin [2] calls legal documents ‘written performatives’.
These documents include constitutions, contracts, deeds,
orders/judgements/decrees, pleadings, statutes, and wills. Compared with general
documents, they are quite elaborate in structure and are very long to read and
understand [3]. It would be better if shorter versions of these long
documents were available in the form of summaries. Summaries are shorter versions of long
documents that include all relevant information.
The main motivation for carrying out an extensive literature review of legal
document summarization is that, across the world, legal information is produced in large
amounts by numerous legal institutions. In India alone, there are 25 High Courts [4]
and 672 District Courts [5] that publish their legal reports publicly. This is of supreme
importance because many cases are pending in Indian courts (as of 2019, 87.5 percent
in District and Subordinate courts) [6]. Since legal notes are long documents, legal
institutions engage legal experts to produce headnotes, which serve as summaries.
However, this is a remarkably time-consuming task, as it requires extensive human
participation. Automatic summarization of legal documents can therefore significantly help
legal practitioners and reduce human effort [3]. With the use of
automatic text summarization techniques, legal document headnotes (summaries) can
be generated.
One way to obtain such automatic summaries is to use automatic text
summarization techniques, which can produce summaries without losing the relevant
information of the document under consideration [7]. Such automatic
summarization techniques have very high utility in the field of law, which has led to the
introduction of the automatic legal document summarization domain, a sub-domain of
text summarization in general [8].
Currently, generating a legal document summary is a process that involves a considerable
amount of human effort. This process is labour-intensive, time-consuming and
expensive. For example, legal professionals like lawyers and judges need to send the
cases to legal experts for creating summaries for them. Apart from this, these legal
professionals need to refer to previous similar cases in order to prepare their own
defences as well as provide verdicts [8]. Novice readers of legal documents also
want to get an idea of a current case as well as previous related cases, without having
to go through a huge number of complex legal documents [9]. Nowadays, legal
documents are often easily available through online sources [10], [11], [12], so
ordinary citizens can also access them. Automatic summarization tools are also very
helpful for ordinary citizens because, with such a system, a summary of any case can
easily be accessed. This also leads to a very high degree of transparency [13], since
such tools help get rid of a lot of hard-to-understand legal jargon. Thus,
automatic summarization tools can prove to be of very high utility in the field of law,
thereby facilitating fast processing of legal cases by legal professionals, quick
understanding of past cases by all the stakeholders, as well as a very high level of
transparency.
The domain of automatic legal document summarization differs from text summarization
in general because these documents are presented in many different structures,
depending on the country of origin of the case. The heavy use of
information-carrying citations makes summarization even more challenging in
this domain [3]. For example, consider Figs. 1(a) and 1(b) below, which show the
general structure of a legal document from the United States (US) and from India,
respectively. The two documents are very different in terms of their structure, which can
introduce significant difficulties in developing a general legal document summarization
tool.
Due to the peculiarities of legal documents, some key research questions arise in the
field of legal document summarization, which are given below:
RQ1: Legal documents from different countries vary vastly in a number of
factors, such as document structure and length. How do these factors affect the quality of
automatic summarization?
RQ2: What metrics are available to check the quality of a summary? Are these evaluation
metrics effective enough to always give good results?
RQ3: How can the quality of legal summarization be improved by performing other
upstream Natural Language Processing (NLP) tasks?
RQ4: How can better structuring of legal document summaries be achieved?
RQ5: Why has there been a lot of work on extractive legal summarization, but little or
no work on abstractive legal summarization?
In this work, a detailed survey is conducted to enhance the reader's understanding of
legal document summarization and to help the reader find answers
to some of the most important research questions in this domain. The main contributions
of this survey can be summarized in the following points:
In order to understand the current state of the automatic legal document summarization
domain, an extensive literature survey is performed.
Several important research questions have been identified which point towards the
need for research in specific areas of legal document summarization.
A comparative analysis of several country-specific legal documents is performed for the
task of text summarization.
After performing comparative analysis, several key observations are drawn that help
understand the current state of the techniques for summarization. Also, multiple
limitations have been identified which motivate specific potential future research
directions in this domain.
The paper is divided into 8 sections. Section 1 gives a general introduction to
summaries and then explains the importance of summarization in general and in the legal field.
Section 2 comprises a discussion of text summarization in general, where some of
the state-of-the-art works in extractive and abstractive text summarization
are discussed. Evaluation metrics are discussed in Section 3. Section 4 discusses
various domain-independent and domain-specific legal document summarization
techniques. Then, in Section 5, some of the available legal document summarization
tools are discussed. Two legal document summarization case studies are presented in
Section 6, considering legal documents from the US and India. The case studies are
enriched with a detailed comparative analysis of several summarization techniques in
the domain. In Section 7, the findings of the literature survey are then used
to address the research questions identified in the introduction section. Moreover, the
limitations of the current work in the domain are identified in the discussion section, and
these are used to propose future research directions. Finally, the paper is
concluded in Section 8 with a summary of the findings of the literature
survey.