Legal Information Retrieval and Inquiry Tool

* Note: a project based on data scraping and NLP models to answer questions

Abstract—In the practice of law, mining and understanding case law and legal documents is a foremost priority for both practicing lawyers and students. This research therefore proposes a solution that employs current web scraping and natural language processing techniques to address the problem. Using Selenium, BeautifulSoup and Requests, data from well-known legal websites such as Indian Kanoon and Advocate Khoji are collected and saved to disk systematically in CSV format. Users can then choose specific datasets and ask questions about their content. To provide semantically meaningful and contextually correct answers, the system issues API calls to two language models, Gemini and Lama 7B-Instruct. The entire workflow, including the user interface, is implemented with the Streamlit module in Python. The purpose of this tool is to cut the time and effort that legal research demands to a bare minimum, increasing efficiency and improving the experience of lawyers and law students.

I. INTRODUCTION

A crucial requirement in legal research is the ability to search quickly and efficiently through hundreds of cases and other legal sources. The existing literature notes that lawyers and law students spend a great deal of time searching for relevant information across large databases and websites. This process can be slow and laborious, and is consequently very time-consuming.

To overcome this hurdle, this work presents a system that utilises contemporary web scraping and natural language processing methodologies. Selenium, BeautifulSoup, and Requests, all regarded as effective web scraping tools, help our system extract information from popular legal websites like Indian Kanoon and Advocate Khoji. These data are then organized and saved in CSV files, which makes them easier to locate and analyze.

Users of this system can choose particular sets of data and ask questions about their contents. To make the answers both meaningful and relevant, the system uses API calls to the Gemini and Lama 7B-Instruct language models, which the user interacts with through Streamlit, a Python module that offers an easy-to-use front-end interface. This study seeks to bring efficiency to the process of legal research so as to improve the performance and education of legal professionals and learners. Without such assistance, the time and effort needed to extract legal information in an encapsulated manner increases; hence the aim of the system.

A. Literature Review

1) Artificial intelligence and law: An overview: The intersection of artificial intelligence (AI) and law is a rapidly evolving field with profound implications for legal practice and administration. In "Artificial Intelligence and Law: An Overview," Harry Surden explores how AI technologies, particularly machine learning, are transforming the legal domain. AI excels in automating tasks that require human-like intelligence, such as legal research, document review, and predictive analytics (Surden, 2020).

AI tools in legal practice enhance efficiency by quickly analyzing vast amounts of case law and legal texts, aiding in research and document drafting. Predictive analytics help lawyers forecast litigation outcomes based on historical data. In judicial administration, AI provides data-driven insights to assist judges and improve decision-making consistency. In policing, AI is used for predictive policing to identify potential criminal activities (Surden, 2020).

Despite its benefits, the integration of AI in law raises ethical and practical concerns, including data privacy, algorithmic bias, and transparency. Addressing these challenges is crucial to ensure that AI systems uphold legal and ethical standards, paving the way for their successful adoption in the legal field (Surden, 2020).

2) Cross-Domain Generalization and Knowledge Transfer in Transformers Trained on Legal Data: The application of artificial intelligence (AI) in the legal field has been extensively explored in recent years. Initially, AI systems in law relied heavily on knowledge representation techniques, but there has been a shift towards machine learning approaches since
2000. This evolution parallels advancements in the broader AI field. The use of AI has expanded across various legal tasks, including document review and predictive coding. For instance, in litigation discovery, machine learning algorithms can analyze large volumes of documents to identify relevant ones, thereby increasing efficiency and accuracy.

Legal technology startups have played a significant role in this transformation, using machine learning to enhance the efficiency and effectiveness of legal processes. University research centers, such as Stanford's CodeX, have also contributed to the development and implementation of AI in legal contexts. Despite these advancements, AI in law is not without limitations. Current AI systems excel in structured tasks but struggle with abstract reasoning and open-ended tasks.

The integration of AI in legal practice is conceptualized into three primary categories: administrators of law, practitioners of law, and those governed by law. Each group benefits differently from AI applications, whether through automating administrative tasks, assisting in legal practice, or aiding individuals and organizations in navigating legal systems.

3) From google gemini to openai q*(q-star): A survey of reshaping the generative artificial intelligence (ai) research landscape: Artificial Intelligence (AI) has undergone significant evolution since its inception, rooted in foundational concepts such as Alan Turing's "Imitation Game" and early computational theories. The development of neural networks and machine learning frameworks in the mid-20th century laid the groundwork for today's advanced AI models. Key milestones, including the rise of deep learning and reinforcement learning, have shaped contemporary AI trends, culminating in the development of sophisticated models like Mixture of Experts (MoE) and multimodal AI systems.

Recent advancements in AI, exemplified by Google's Gemini and OpenAI's ChatGPT, highlight the field's dynamic nature and its impact on both industry and academia. Gemini, leveraging innovations like the "spike-and-slab" attention method, represents a significant leap in AI's capability for multi-turn conversational applications. Moreover, speculation surrounding projects like OpenAI's Q* underscores ongoing efforts to integrate advanced algorithms such as Q-learning and A* into large language models (LLMs), potentially advancing towards Artificial General Intelligence (AGI).

The popularity and research focus within LLMs have shifted towards multimodal capabilities and ethical considerations. Ethical implications remain a persistent concern, reflecting the community's commitment to aligning AI advancements with societal values. The dissemination of AI research has expanded rapidly, as seen in the proliferation of preprints on platforms like arXiv, facilitating swift knowledge exchange while posing challenges in terms of validation and academic scrutiny.

Technological advancements continue to drive research priorities in AI, with generative models like GPT and ChatGPT-3.5 influencing industry milestones. The advent of the "Transformer" model in 2017 marked a pivotal moment, catalyzing interest in deep learning and natural language processing (NLP). These developments underscore a symbiotic relationship between technological innovation and academic inquiry, shaping the future trajectory of AI research.

Looking ahead, emerging research domains such as MoE, multimodality, and AGI are poised to redefine the generative AI landscape. These areas not only promise enhanced capabilities in language understanding and synthesis but also raise profound questions about the societal and economic impacts of AI technologies. The ongoing discourse surrounding projects like Q* reflects a transformative phase in AI research, prompting critical evaluation of existing paradigms and exploring new avenues for innovation.

In conclusion, the trajectory of AI from its theoretical foundations to contemporary applications like Gemini and potential projects like Q* exemplifies the field's dynamic evolution. This literature review synthesizes key developments and future directions in generative AI, emphasizing the interdisciplinary nature of AI research and its profound implications for society.

4) Google DeepMind's gemini AI versus ChatGPT: A comparative analysis in ophthalmology: Google's Gemini AI represents a significant leap in chatbot technology, showcasing advanced capabilities and innovative features. Central to Gemini's design is its status as a "native multimodal" model, enabling it to process and learn from various data types, including text, audio, and video. Gemini's technical capabilities are evident in its ability to analyze complex datasets such as charts and images, which is a substantial advancement over earlier AI models like Bard. This capability is particularly relevant for applications in medicine and ophthalmology, where data often comes in visual formats like medical images/scans. By analyzing these images, Gemini could potentially be a useful tool for healthcare professionals in diagnosing and treating a wide range of conditions.

Moreover, Gemini's potential in medicine extends beyond image analysis. Its advanced language processing abilities enable it to understand and interpret medical literature, patient histories, and research data, providing valuable insights for medical professionals. In ophthalmology, Gemini could assist in diagnosing eye conditions, analyzing patient-reported symptoms, and suggesting treatment plans based on the latest research and clinical guidelines. Previous attempts by models like ChatGPT have shown promising results but have not yet reached levels suitable for clinical use. Large language models such as ChatGPT can make errors in understanding context or provide outdated information, complicating their use in clinical contexts.

Comparing AI responses, both Bard and ChatGPT provided thorough and practical advice for scenarios like waking up with painful red eyes and flashes of light in one eye, correctly recommending urgent medical evaluation for potential serious conditions like retinal detachment. They also highlighted the importance of regular eye exams tailored to different age groups and risk factors. However, when it came to floaters or black dots in vision, ChatGPT demonstrated additional capability by discussing treatment options and suggesting urgent consultation in specific circumstances.

In testing image analysis capabilities, Gemini struggled with
processing specific medical images, while GPT-4 correctly identified an image of a human eye but missed clinical details like identifying a hyphema. This highlights ongoing challenges and opportunities for improvement in AI models for precise medical applications.

Overall, Gemini AI represents a notable improvement in text-based outputs over predecessor models. The comparative analysis between Gemini AI and other models reveals distinct attributes and capabilities, with Gemini showing promise in areas like language understanding and multimodal data processing. These advancements suggest a dynamic and evolving landscape in AI language models, each with unique strengths and weaknesses suited to different applications and use cases. Continued research and development are crucial to enhancing the reliability and applicability of AI chatbots in clinical settings.

5) Chatgpt vs gemini vs llama on multilingual sentiment analysis: In an era marked by the constant influx of digital information and communication, the ability to comprehend and utilize human sentiments has become increasingly crucial across diverse applications. Sentiment analysis, also known as opinion mining or emotion AI, intersects Natural Language Processing (NLP), Machine Learning (ML), and computational linguistics. It encompasses automated techniques to identify and analyze emotions, opinions, and attitudes expressed in various data formats, primarily focusing on textual data. The primary objective of sentiment analysis is to discern the sentiment or emotional tone conveyed in text—whether positive, negative, or neutral. This process aims to unravel the complex web of emotions, opinions, and attitudes inherent in textual data, thereby revealing latent sentiments at individual, group, or societal levels.

Beyond academic realms, sentiment analysis finds extensive application in industries such as marketing, customer feedback analysis, social media monitoring, and product reviews. These applications leverage sentiment analysis to gain insights into public opinion, enabling informed decision-making based on the sentiments expressed in textual content. This widespread adoption underscores the value of sentiment-aware technologies in driving organizational strategies and enhancing user engagement.

The advent of Large Language Models (LLMs), exemplified by models like ChatGPT, GPT-4, Gemini AI, and LLaMA2, has significantly advanced sentiment analysis capabilities. These models, equipped with neural networks comprising millions to billions of parameters, excel in capturing intricate linguistic patterns and dependencies. They undergo extensive pre-training on vast datasets, enabling them to understand and generate human-like language. LLMs leverage contextual understanding to interpret words and phrases based on their surrounding context within sentences or paragraphs, thereby generating coherent and contextually appropriate responses.

Despite their remarkable capabilities, evaluating LLMs outside standard validation frameworks remains challenging. Assessing their performance in discerning nuanced and ambiguous instances, particularly across diverse human languages, is complex. Language models can struggle with understanding contextual ambiguities, irony, and sarcasm, which are prevalent in human conversations. These challenges can lead to biases in model outputs, impacting interpretability and trust in human-machine interactions.

Recent research has focused on comparing the performance of prominent LLMs like ChatGPT 3.5, ChatGPT 4, Gemini AI, and LLaMA2 across diverse sentiment analysis tasks and linguistic contexts. These evaluations aim to uncover variations or biases influenced by language choice, thereby enhancing the generalizability and robustness of sentiment analysis models in real-world applications. By testing these models on multilingual datasets and culturally diverse scenarios, researchers seek to refine sentiment analysis tools for broader applicability and reliability across global markets and communities.

Overall, the integration of LLMs in sentiment analysis represents a transformative step towards leveraging computational tools to decode human emotions and opinions embedded in textual data. Future advancements will continue to address challenges such as bias mitigation, cross-domain generalization, and adaptability to evolving languages, further enhancing the effectiveness and ethical considerations of sentiment analysis technologies.

6) Multilingual LAMA: Investigating knowledge in multilingual pretrained language models: Pretrained language models (LMs) such as BERT (Devlin et al., 2019) and its multilingual variant, mBERT, have revolutionized natural language processing (NLP) tasks by leveraging large-scale pretraining on diverse text corpora. These models exhibit robust performance across various NLP benchmarks when fine-tuned, demonstrating their capability to generalize well to different tasks and languages (Peters et al., 2018; Howard and Ruder, 2018; Devlin et al., 2019).

Recent research, motivated by the effectiveness of LMs in generating text based on input templates without specific fine-tuning, explores the extent of world knowledge encoded in these models (Brown et al., 2020; Petroni et al., 2019). For instance, Petroni et al. (2019) demonstrated that LMs can accurately complete fill-in-the-blank tasks like "Paris is the capital of [MASK]" using knowledge gleaned from their pretraining on large datasets like Wikipedia.

However, much of this research has historically focused on English, prompting a shift towards multilingual applications. Kassner et al. (2021) investigate mBERT's capacity as a multilingual knowledge base by evaluating its performance across 53 languages using the LAMA dataset. They translate templates and entities into multiple languages, demonstrating that while mBERT generally performs well across some languages, its effectiveness varies significantly depending on the language. This variability is attributed to language-specific biases inherent in the model, which influence its predictions when queried in different languages (Kassner et al., 2021).

The study also highlights the benefit of pooling predictions across languages to mitigate these biases and improve overall performance. By leveraging its training on multiple Wikipedias, mBERT pooled predictions show enhancements
over monolingual approaches, even outperforming models like BERT in certain scenarios (Kassner et al., 2021).

Overall, this research underscores the potential and challenges of using multilingual pretrained LMs as versatile knowledge bases, advocating for further exploration into their language-independent capabilities and strategies to enhance cross-linguistic performance.

7) On the Limitations of Large Language Models (LLMs): False Attribution: Recent research into authorship attribution using Large Language Models (LLMs) has underscored significant challenges related to false attribution and hallucination. LLMs such as LLaMA-2-13B, Mixtral 8x7B, and Gemma-7B offer automated annotation capabilities but often struggle with accurately identifying the true author of a text, leading to ethical and legal implications. False attribution occurs when an LLM incorrectly attributes authorship to someone who did not create the text, while hallucination refers to generating content that appears plausible but is not grounded in actual authorship. To address these concerns, novel metrics like the Simple Hallucination Index (SHI) have been proposed to quantify and evaluate these errors systematically. SHI measures the proportion of chunks where LLMs provide incorrect attributions or fail to attribute correctly, providing insights into their reliability in authorship tasks. Evaluations across multiple popular books have shown mixed results, with models like Mixtral 8x7B demonstrating high accuracy overall but significant instances of hallucination, particularly in specific contexts such as works by certain authors like Smollett. Future research aims to refine LLMs' capabilities in authorship attribution by improving training methodologies and evaluation frameworks, ultimately enhancing their trustworthiness in automated content generation for applications ranging from literary analysis to legal documentation.

8) Visualizing and understanding neural models in NLP: Recent advancements in natural language processing (NLP) have seen neural models surpassing traditional feature-based classifiers across various tasks, yet their interpretability remains a challenge. Unlike conventional models that use human-interpretable features such as parts-of-speech and syntactic parse features, neural models like LSTMs and Seq2Seq operate on low-dimensional word embeddings through complex multi-layer architectures. Understanding how these models handle semantic composition, including functions like negation and intensification, is crucial but not straightforward due to their opaque internal mechanisms.

This paper explores several strategies to interpret meaning composition in neural models. Methods include representation plotting and salience measurement using derivatives to assess the contribution of neural units to composition. Visualization techniques reveal insights such as LSTM's ability to focus sharply on key words and the competitive performance of composition across multiple clauses. Moreover, neural models exhibit sharp dimensional locality, with specific dimensions indicating negation and quantification, highlighting their ability to capture semantic nuances.

While inspired by visualizing techniques in computer vision, adapting these methods to NLP involves unique challenges due to the structured nature of language. Future research aims to refine these interpretability strategies, offering deeper insights into how neural models achieve meaning composition in natural language understanding tasks.

9) KAMEL: Knowledge Analysis with Multitoken Entities in Language Models: In recent years, there has been significant exploration into leveraging large language models (LMs) to store and retrieve relational knowledge akin to traditional knowledge bases. The seminal work by Petroni et al. (2019) demonstrated that pre-trained LMs like BERT and RoBERTa can be probed using cloze-style prompts to infer factual triples from knowledge graphs, such as (Paris, capital, France). The evaluation primarily focused on the T-REx subset of the LAMA dataset, which contains 41 relations from Wikidata. BERT-large achieved a Precision@1 of 32.3%.

Following the original LAMA benchmark, subsequent research introduced domain-specific probing datasets like BioLAMA (Sung et al., 2021), MedLAMA (Meng et al., 2022), and KMIR (Gao et al., 2022). These datasets extend the evaluation to diverse domains and highlight the growing interest in understanding the extent of relational knowledge embedded within LMs. Techniques for prompt learning and fine-tuning have also been explored to enhance LM performance on these tasks, showing improvements up to 48.6%.

However, recent studies have underscored challenges in LM evaluation, suggesting that high performance on LAMA may often stem from memorization of training data rather than true understanding of relational knowledge (Cao et al., 2021; Zhong et al., 2021). To address these limitations and expand the scope of evaluation, the KAMEL dataset is introduced. KAMEL incorporates 234 relations from Wikidata, offering a broader variety of relations, including those with literals and multi-token entities. It aims to provide a more rigorous evaluation framework, revealing that state-of-the-art LMs achieve lower F1 scores (17.7%).

Overall, while LMs demonstrate promising capabilities in storing and retrieving factual knowledge, challenges such as prompt design, dataset biases, and the distinction between memorization and true understanding continue to shape research directions in this evolving field. The insights gained from datasets like KAMEL contribute to a deeper understanding of how LMs can effectively complement traditional knowledge bases in various NLP applications.

10) Attacks on Third-Party APIs of Large Language Models: Recent advancements in Large Language Models (LLMs) such as GPT, Gemini, and Llama have shown significant promise across various sectors like finance, healthcare, and marketing. These models excel in tasks such as summarization, question answering, data analysis, and content generation, enhancing operational efficiency and decision-making processes. However, integrating LLMs into real-world applications poses challenges, including reliance on outdated or inaccurate data, difficulty in customization for specialized domains, and limitations in broadening their applicability for complex reasoning tasks.
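The Simple Hallucination Index discussed in item 7 above is described as the proportion of text chunks for which a model's authorship attribution is incorrect or missing. As a purely illustrative sketch of that kind of proportion (the function name and sample data below are hypothetical, not taken from the cited paper), it might be computed as:

```python
# Illustrative SHI-style score: the fraction of text chunks whose
# predicted author is wrong or missing. Names and data are hypothetical.

def simple_hallucination_index(predictions, true_author):
    """predictions: author name per chunk, or None if no attribution."""
    if not predictions:
        raise ValueError("need at least one chunk")
    errors = sum(1 for p in predictions if p is None or p != true_author)
    return errors / len(predictions)

# Four chunks of one book: two correct, one wrong, one unattributed.
preds = ["Smollett", "Smollett", "Defoe", None]
print(simple_hallucination_index(preds, "Smollett"))  # 0.5
```

A score of 0 would mean every chunk was attributed correctly, while 1 would mean every attribution was wrong or absent.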
To mitigate these challenges, integrating third-party APIs with LLMs has become crucial. This approach leverages real-time data access, complex calculations, and specialized functionalities (like image recognition) to expand the capabilities of LLMs without requiring extensive retraining. For instance, OpenAI's GPT Store hosts millions of customized ChatGPT variants, enhancing operational flexibility through plugin-based APIs.

However, integrating third-party APIs introduces security risks by expanding the attack surface, potentially leading to data breaches or manipulation of LLM outputs. Attacks such as insertion-based, deletion-based, and substitution-based attacks on API responses can compromise the integrity and reliability of LLM-generated outputs. Addressing these security concerns requires robust verification mechanisms and secure API integration protocols.

II. METHODOLOGY

A. Modules/Algorithms/Functionalities/Protocols

Our project consists of four main modules:
1) Data scraping and information retrieval: Selenium and BeautifulSoup (bs4) are used to search for and scrape data.
2) Question and inquiry tool: the Gemini and third-party Lama APIs are used to send questions and retrieve content grounded in the data.
3) User interface and GUI: the Streamlit library renders the UI that presents the analyzed data.
4) Database module: the collected scraped data is stored in CSV files.

B. Data Collection Approaches/Strategies

We've developed a project focused on extracting data from law websites like Indian Kanoon and Advocate Khoji. Using Selenium, BeautifulSoup, and Requests, we automate the retrieval of legal information, storing it efficiently in CSV format. Users can select specific data points and pose inquiries, facilitated by API calls. Our approach harnesses the Gemini and Lama 7B-Instruct models to generate tailored responses to user questions. This streamlined tool, built with the Streamlit module in Python, aims to assist lawyers and students in swiftly accessing and exploring legal content.

1) Advantages of the strategy: A strength of our project is that it can crawl data from various law websites like Indian Kanoon and Advocate Khoji with the help of Selenium, BeautifulSoup, and Requests. This makes it easier and faster to provide relevant data, because the system retrieves information that matches the specific user request. The Gemini and Lama 7B-Instruct models generate responses through the API calls of the proposed solution, personalizing the answers to the questions users ask. This approach not only improves the browsing of legal content but also minimizes the dependency on costly data scraping APIs, keeping the tool affordable for lawyers as well as students.

2) Limitation of the strategy: A weakness of our project is its reliance on scraping data from websites like Indian Kanoon and Advocate Khoji. Any modification to the structure of these websites, or the application of measures to prevent data scraping, could threaten the extraction process. The method therefore needs constant review, since the reliability and efficiency with which it delivers legal information that the user can process is crucial.

3) Ethical issues about collection from the subjects/participants: Several ethical issues can be identified regarding the collection of information from legal web resources such as Indian Kanoon and Advocate Khoji. Privacy issues emerge because people may not be aware that legal data concerning them published on these sites is being collected in bulk. Transparency and consent therefore become central questions for ethical data collection, respecting users' rights and their preferences concerning the use of this legal information. Given the possible relationship between researcher and subject, it is crucial to minimize ethical issues in this kind of data extraction by being precise and adhering to the established rules of case handling in the legal context, ensuring the accuracy and professionalism of the identified data.

Fig. 1. Accuracy comparison of different algorithms

III. PROPOSED SYSTEM

A. Proposed System Introduction

- The described system applies an innovative scheme for scraping data from legal web resources, such as Indian Kanoon and Advocate Khoji, using Selenium for targeted data searches according to the user's criteria. This makes it possible to obtain real-time, accurate information and therefore increases the flexibility of the data gathering process.

- To further increase efficiency and decrease expenditure, the system does not depend on paid data APIs and instead uses web scraping. The orchestrated interaction between Selenium and BeautifulSoup offers enhanced adaptability in scraping legal data from the source websites themselves.

- After extraction, the obtained legal data is stored consistently in CSV format. Encoding and preprocessing operations are then run on the data within the CSV files.

- In the last step, the system issues API calls to Gemini and Lama to form replies to the questions posed by the user, grounded in the identified legal materials. Incorporating these natural language processing models ensures fast data analysis and the timely generation of content.
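The storage and preprocessing step described above can be sketched with Python's standard csv module. The field names below mirror the Database module description (case title, author, case details), but the exact schema and file name are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of the CSV storage step. Field names follow the
# Database module description (title, author, case details); the schema
# and file name here are illustrative assumptions.
import csv

FIELDS = ["title", "author", "details"]

def save_cases(path, cases):
    """Write scraped case records (a list of dicts) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(cases)

def load_cases(path):
    """Read the stored records back for preprocessing and querying."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

cases = [{"title": "Example v. State", "author": "J. Doe",
          "details": "Appeal against conviction ..."}]
save_cases("cases.csv", cases)
print(load_cases("cases.csv")[0]["title"])  # Example v. State
```

Reading the records back as dictionaries keeps the later inquiry step simple: a selected subset can be serialized directly into the prompt sent to the language model.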
to the users thus increasing the reach and comprehension of websites. Besides that, this not only reduces the difficulties
the legal information searched. towards it but also helps in saving a lot of money. This way is
free from official API restrictions and allows obtaining more
legal data for further analysis or for implementation in various
applications, which in turn makes the workflow more adaptive
and efficient.
3) 3) Query and inquiry Module: The data collected from
previous step is used as base knowledge for the model to work
on. The user selects the data to further question among the
given options. Then the user can choose the model to use for
inquires and questioning.
the whole data along with the question is sent as an API
request. and after the response is displayed to the user.
the question and response are stored on display using
session state from streamlit module
4) 4) Database module: The CSV files serve as the storage
medium for the information collected. Initially, the database,
created from data scraping, contains details such as title of the
case,the author and the data regarding the case.
B. List of Modules
1) User Interface and GUI Module
2) Data Extraction Module
3) Query and Inquiry Module
4) Database Module
1) User Interface and GUI Module: Streamlit is the GUI framework used in our project to tie the components together and to give users advanced functionality for data manipulation. Designed with user-friendliness in mind, it guarantees simple operation for all users.
There are two basic interactions with the legal data: the search query through which the data is extracted, and the questions posed to the data. During extraction, the combination of collected user inputs and system responses guarantees an elaborate yet streamlined method of data handling.
Consequently, Streamlit elevates our project's user interface to conventional standards while serving to collect user inputs, handle queries, and display results. This comprehensive tool enables efficient qualitative analysis and visualization of legal data for the benefit of the legal professionals and students concerned.
2) Data Extraction Module: The procedure starts with examining the input string for hashtags and isolating them. An automated Edge driver is then created with the help of the Selenium library to run the web-scraping script. This arrangement supports targeted searches for legal information on sites such as Indian Kanoon and Advocate Khoji. Using the parsing library BeautifulSoup, the HTML of the result pages is analyzed, and the legal text along with its metadata is systematically saved to a CSV file. This extraction method proves beneficial because it does not depend on the official APIs of these legal websites.
IV. MODEL COMPARISON: GEMINI VS LLAMA
Fig. 3. Proposed system diagram
In the ever-changing world of AI, models like LLaMA and Gemini are pioneering developments in conversational AI. Facebook AI Research (FAIR) has demonstrated its experience with task-oriented dialogue systems that can maintain multi-turn interactions using language signals, presented this year in different venues. FAIR's LLaMA, implemented with fine-tuned architectures that outperform BERT, draws on external knowledge sources such as databases and APIs to provide accurate information from various fields. Gemini is a multi-turn dialogue understanding and generation system developed by Google using a dual-encoder architecture with a memory-augmented mechanism. This allows it to store information across conversations so that it can give relevant responses over multiple turns.
The analysis of LLaMA and Gemini shows how the two models complement each other in improving the coherence and productivity of conversational flow. Owing to LLaMA's efficiency in task-oriented interactions and its integration of knowledge sources, it can be recommended for tasks that require accurate and informative responses. Gemini's architecture, in contrast, handles intricate conversations and detailed dialogue-state management, making it appropriate for engaging users over longer periods. These established strengths enable informed deployment of either model for conversational-AI requirements, matched to the desired user engagement and interaction.
A. Performance Metrics and Benchmarks:
Evaluating LLaMA 3 and Gemini against these metrics and benchmarks, several differences can be observed. As seen in Table 3, LLaMA 3 has high accuracy and is faster than Gemini for all the evaluated
metrics. This efficiency gives LLaMA 3 a cutting edge: its precision in delivering high-value outputs is unmatched on these benchmarks.
On the other hand, where computational resource requirements are concerned, Gemini presents a well-optimized resource-consumption profile. Although LLaMA 3 is more accurate in its responses, Gemini manages resources better for operations that demand less computational power while still requiring high-quality performance.
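A comparison of this kind can be reproduced with a small harness that runs both models over a shared question set and records exact-match accuracy and per-question latency. The sketch below stubs out the model calls; in practice they would wrap the Gemini and LLaMA 3 APIs, and the questions and scoring rule here are purely illustrative.

```python
# Hypothetical benchmark harness: accuracy and seconds-per-question
# for two model callables over the same evaluation set.
import time

def stub_llama(question: str) -> str:
    # Stand-in for a LLaMA 3 API call.
    return "yes" if "precedent" in question else "no"

def stub_gemini(question: str) -> str:
    # Stand-in for a Gemini API call.
    return "yes"

def benchmark(model, dataset):
    correct, start = 0, time.perf_counter()
    for question, expected in dataset:
        if model(question) == expected:
            correct += 1
    latency = (time.perf_counter() - start) / len(dataset)
    return correct / len(dataset), latency

dataset = [
    ("Does the judgment cite precedent?", "yes"),
    ("Was the appeal dismissed?", "no"),
]
for name, model in [("LLaMA 3", stub_llama), ("Gemini", stub_gemini)]:
    accuracy, latency = benchmark(model, dataset)
    print(f"{name}: accuracy={accuracy:.2f}, sec/question={latency:.6f}")
```

Replacing the stubs with real API wrappers would make latency dominated by network round-trips, which is where the resource-consumption differences discussed above become visible.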
B. Application Scenarios:
There are situations where LLaMA 3 or Gemini is the more appropriate choice, given the specific strengths and weaknesses discovered for each model.
Fig. 4. Home page for data extraction
C. Strengths:
LLaMA 3: Excels at jobs that involve complicated natural language processing and generation. Owing to its small parameter count and MoE architecture, it delivers high performance without requiring much memory, which suits deployment on mobile devices.
Gemini: Remarkable at processing multi-modal data effectively across various data types. Its more advanced vision capabilities make Gemini strong in tasks that require analyzing images and merging them with text data, which is very useful in multi-modal AI development.
D. Limitations:
LLaMA 3: May have difficulty where data needs to be processed in real time, because it is oriented toward achieving maximum accuracy.
Gemini: Faces problems in relatively specific language-processing tasks that require rich contextual analysis.
E. Optimal Use Cases for Each Model:
As seen, LLaMA 3 is best used where accuracy is paramount: specialized business domains, fluent language modeling, document evaluation, and linguistic work carried out by qualified legal personnel or researchers.
Fig. 5. Extracted links and data
Gemini is valuable for use in situations requiring the fast
analysis of information, for example, assessing trends in
opinions on social networks or evaluating customers’ attitudes
towards certain products.
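The use-case split above amounts to a simple routing rule; a hypothetical sketch (the task labels are illustrative assumptions, not part of the system):

```python
# Route accuracy-critical legal work to LLaMA 3 and fast, high-volume
# analysis to Gemini, per the strengths discussed above.
def choose_model(task: str) -> str:
    accuracy_critical = {"document_evaluation", "legal_drafting", "linguistic_analysis"}
    high_throughput = {"social_trends", "sentiment", "image_and_text"}
    if task in accuracy_critical:
        return "LLaMA 3"
    if task in high_throughput:
        return "Gemini"
    return "Gemini"  # default to the lighter model

print(choose_model("document_evaluation"))  # LLaMA 3
print(choose_model("sentiment"))            # Gemini
```

In a deployed system this choice could also weigh latency budgets and available compute alongside the task type.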
V. RESULTS AND OUTPUTS
VI. CONCLUSION
This study outlines an efficient approach to scraping and analyzing legal content from websites such as Indian Kanoon and Advocate Khoji. Selenium for dynamic content and BeautifulSoup for structured parsing allow accurate extraction of legal texts and metadata. The incorporation of highly refined natural language processing models, accessed through API calls to Gemini and LLaMA, further improves the system's capacity to produce customized insights and output from the accumulated mass of legal material.
This approach not only simplifies work with legal data extraction but also provides legal professionals and students with advanced instruments for information categorization and analysis. Through models such as RoBERTa and BERT, the system offers deeper analysis and accurate recommendations for the legal sphere. These advancements will improve the accessibility, speed, and depth of legal information search, case analysis, and policy evaluation, setting the stage for the future development of further innovations in legal data management.