
Gemini (Google DeepMind): An Advanced AI Model Designed for Multimodal

Reasoning and Assistance

By:

Bello Abdulsalam 2021/sc/00538

Mathew Jude 2021/sc/00554

Umaru Muhammed Ndaman 2021/sc/00426

Being:

A “Seminar in Computer Application (CSC 322)” paper presented

In Department of Computer Science,

School of Secondary Education- Science Programmes,

Federal College of Education,

Niger State, Nigeria.

July, 2025

INTRODUCTION

Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals (Russell & Norvig, 2021).

High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., language models and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore" (Kaplan & Haenlein, 2019).

One advanced artificial intelligence model developed by Google is Gemini. Gemini, formerly known as Bard, is a generative artificial intelligence chatbot based on the large language model (LLM) of the same name. It was launched in 2023 in response to the rise of OpenAI's ChatGPT, and was previously based on the LaMDA and PaLM LLMs (Olson, 2024).

HOW GEMINI WORKS

Pre-training

Gemini is powered by Google’s most capable AI models, designed with varying
capabilities and use cases. Like most LLMs today, these models are pre-trained on a
variety of data from publicly available sources.

This pre-training allows the model to pick up on patterns in language and use them to predict the next probable word or words in a sequence. For example, as an LLM learns, it can predict that the next word after "peanut butter and" is more likely to be "jelly" than "shoelace." However, if an LLM picks only the most probable next word, it will produce less creative responses. So LLMs are often given flexibility to pick from reasonable, albeit slightly less probable, choices (say, "banana") in order to generate more interesting responses.
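The next-word sampling described above can be sketched with a toy "temperature" parameter, which controls how strongly the model favours its most probable word. The word probabilities below are invented for illustration; Gemini's actual vocabulary, probability estimates, and sampling strategy are far more complex.

```python
import math
import random

def sample_next_word(probs, temperature=1.0):
    """Sample the next word from a probability table.

    A low temperature sharpens the distribution (the model almost
    always picks the most probable word); a high temperature flattens
    it, allowing more creative, less predictable choices.
    """
    # Raise each probability to the power 1/temperature, then renormalize.
    weights = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    words = list(weights)
    return random.choices(words, weights=[weights[w] / total for w in words])[0]

# Toy distribution for the word following "peanut butter and".
next_word_probs = {"jelly": 0.80, "banana": 0.15, "shoelace": 0.05}

# Near-zero temperature: almost always "jelly".
print(sample_next_word(next_word_probs, temperature=0.05))
```

Raising temperature toward 1.0 and beyond lets "banana" and even "shoelace" appear occasionally, which is exactly the flexibility that makes responses more interesting.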

Post-training

After the initial training, LLMs go through additional steps to refine their responses.
One of these is called Supervised Fine-Tuning (SFT), which trains the model on
carefully selected examples of excellent answers. It's like teaching children to write by
showing them well-written stories and essays (Hollister, 2023).

Next comes Reinforcement Learning from Human Feedback (RLHF). Here, the
model learns to generate even better responses based on scores or feedback from a
special Reward Model. This Reward Model is trained on human preference data, where
responses have been rated relative to one another, teaching it what people prefer.
Preference data may sometimes include and expose models to offensive or incorrect
data so that they learn how to recognize it and avoid it. You can think of preference
data like rewarding a child for a job well done; the model is rewarded for creating
answers that people like (Hollister, 2023).
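The idea of a Reward Model trained on relative ratings can be illustrated with a Bradley-Terry style preference probability, a common formulation in the RLHF literature. The scalar reward scores below are invented for illustration and are not Gemini's actual reward values.

```python
import math

def preference_probability(reward_a, reward_b):
    """Bradley-Terry style probability that response A is preferred
    over response B, given scalar scores from a reward model.
    Larger score gaps mean more confident preferences."""
    return 1 / (1 + math.exp(-(reward_a - reward_b)))

# Hypothetical scores: a helpful answer rated 2.0, an unhelpful one -1.0.
p = preference_probability(2.0, -1.0)
print(round(p, 3))  # 0.953: the helpful answer is strongly preferred
```

During RLHF, the policy model is nudged toward responses the reward model scores highly, which is the "rewarding a child for a job well done" analogy in formal terms.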

Responses to user prompts

Response generation is similar to how a human might brainstorm different approaches
to answering a question. Once a user provides a prompt, Gemini uses the post-trained
LLM, the context in the prompt and the interaction with the user to draft several
versions of a response. It also relies on external sources such as Google Search, and/or
one of its several extensions, and recently uploaded files (Gemini Advanced only) to
generate its responses. This process is known as retrieval augmentation. Given a
prompt, Gemini strives to retrieve the most pertinent information from these external
sources (e.g., Google Search) and represent them accurately in its response.
Augmenting LLMs with external tools is an active area of research. There are a
number of ways errors can be introduced, including the query Gemini uses to invoke
these external tools, how Gemini interprets the results returned by the tools, and the
manner in which these returned results are used to generate the final response. For this reason, responses generated by Gemini should not be taken as a reflection of the performance of the individual tools used to create them (De Vynck et al., 2024).
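The retrieval-augmentation step described above can be sketched in miniature: rank external documents against the prompt, then prepend the best match as grounding context. A real system like Gemini queries Google Search or its extensions rather than the naive word-overlap scoring and two-document toy corpus invented here.

```python
def retrieve(query, documents, k=1):
    """Rank documents by naive word overlap with the query and return
    the top k. A production system would use a search engine or a
    vector index; this scoring is deliberately simplistic."""
    query_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_augmented_prompt(query, documents):
    # Prepend the retrieved context so the model can ground its answer.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# A two-document toy corpus, invented for illustration.
corpus = [
    "The Golden Gate Bridge is painted International Orange.",
    "Python is a popular programming language.",
]
print(build_augmented_prompt("What color is the Golden Gate Bridge?", corpus))
```

Errors can enter at each stage here: a bad query, a mis-ranked document, or a prompt that misuses the retrieved text, which is why retrieval-augmented responses do not directly reflect the quality of the underlying search tool.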

Lastly, before the final response is displayed, each potential response undergoes
a safety check to ensure it adheres to predetermined policy guidelines. This process
provides a double-check to filter out harmful or offensive information. The remaining
responses are then ranked based on their quality, with the highest-scoring version(s)
presented back to the user (De Vynck et al., 2024).
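The final safety check and quality ranking can be sketched as a filter-then-rank step over candidate drafts. The blocklist and length-based quality score below are invented stand-ins; Gemini's real policy checks and ranking models are far more sophisticated.

```python
def choose_response(candidates, blocked_terms, score):
    """Drop candidate responses that violate a (toy) policy blocklist,
    then return the highest-scoring survivor, or None if every
    candidate was filtered out."""
    safe = [c for c in candidates
            if not any(term in c.lower() for term in blocked_terms)]
    return max(safe, key=score, default=None)

drafts = [
    "A reply containing an insult.",
    "A short answer.",
    "A clear and more detailed answer.",
]
# Toy quality score: prefer longer (more detailed) answers.
best = choose_response(drafts, blocked_terms={"insult"}, score=len)
print(best)  # "A clear and more detailed answer."
```

The offensive draft is removed by the safety filter regardless of its score, mirroring the "double-check" before any response reaches the user.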

Human feedback and evaluation

Even with safety checks, some errors may occur, and Gemini's responses may not always fully meet your expectations. That's where human feedback comes in.
Evaluators assess the quality of responses, identifying areas for improvement and
suggesting solutions. This feedback becomes part of the Gemini learning process,
described in the “Post-training” section above (Stern, 2024).

Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. Google believes the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities from the everyday to the extraordinary for people everywhere (Stern, 2024). It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity and productivity on a scale we haven't seen before.

FEATURES AND CAPABILITIES

Until now, the standard approach to creating multimodal models involved training
separate components for different modalities and then stitching them together to
roughly mimic some of this functionality. These models can sometimes be good at
performing certain tasks, like describing images, but struggle with more conceptual
and complex reasoning.

Gemini was designed to be natively multimodal, pre-trained from the start on different modalities. It was then fine-tuned with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state of the art in nearly every domain. Gemini's main capabilities are as follows:

Sophisticated reasoning

Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of
complex written and visual information. This makes it uniquely skilled at uncovering
knowledge that can be difficult to discern amid vast amounts of data (Langley, 2024).

Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields, from science to finance (Grant, 2024).

Understanding text, images, audio and more

Gemini 1.0 was trained to recognize and understand text, images, audio and more at
the same time, so it better understands nuanced information and can answer questions
relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics (Cerullo, 2024).

Advanced coding

The first version of Gemini can understand, explain and generate high-quality code in the
world’s most popular programming languages, like Python, Java, C++, and Go. Its
ability to work across languages and reason about complex information makes it one of
the leading foundation models for coding in the world (Wittenstein, 2024).

Gemini Ultra excels in several coding benchmarks, including HumanEval, an important industry standard for evaluating performance on coding tasks, and Natural2Code, an internal held-out dataset that uses author-generated sources instead of web-based information (Wittenstein, 2024).

Gemini can also be used as the engine for more advanced coding systems. In 2022, Google DeepMind presented AlphaCode, the first AI code generation system to reach a competitive level of performance in programming competitions (Wittenstein, 2024).

Using a specialized version of Gemini, Google DeepMind created a more advanced code generation system, AlphaCode 2, which excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science (Wittenstein, 2024).

More reliable, scalable and efficient

Google trained Gemini 1.0 at scale on AI-optimized infrastructure using its in-house designed Tensor Processing Units (TPUs) v4 and v5e, and designed it to be its most reliable and scalable model to train, and its most efficient to serve (Petri, 2024).

On TPUs, Gemini runs significantly faster than earlier, smaller and less-capable
models. These custom-designed AI accelerators have been at the heart of Google's AI-
powered products that serve billions of users like Search, YouTube, Gmail, Google
Maps, Google Play and Android. They’ve also enabled companies around the world to
train large-scale AI models cost-efficiently (Petri, 2024).

Alongside Gemini, Google announced its most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models. This next-generation TPU will accelerate Gemini's development and help developers and enterprise customers train large-scale generative AI models faster, allowing new products and capabilities to reach customers sooner (Petri, 2024).

BENEFITS OF GEMINI

Productivity

For starters, Gemini can save you time. For example, say you are looking to summarize
a long research document; Gemini lets you upload it and gives you a useful synthesis.
Gemini can also help with coding tasks, and coding has quickly become one of its most
popular applications (Weprin, 2025).

Creativity

Gemini can also help bring your ideas to life and spark your creativity. For example, if
you’re writing a blog post, Gemini can create an outline and generate images that help
illustrate your post. And coming soon with Gems, you will be able to customize
Gemini with specific instructions and have it act as a subject matter expert to help you
accomplish your personal goals (Barnett, 2025).

Curiosity

Gemini can be a jumping off point for exploring your ideas and things you’d like to
learn more about. For instance, it can explain a complex concept simply or surface
relevant insights on a topic or image. And soon, it will pair these insights with
recommended content from across the web to learn more about specific topics
(Titcomb, 2024).

Gemini's capabilities are rapidly expanding -- soon, you’ll be able to point your
phone's camera at an object, say, for example, the Golden Gate Bridge, and ask Gemini
to tell you about its paint color (if you’re wondering, it’s “International Orange”).
You’ll also be able to ask Gemini to help you navigate a restaurant’s menu in another
language and recommend a dish you’re likely to enjoy. These are just two examples of
the new capabilities coming soon to Gemini (Titcomb, 2024).

LIMITATIONS OF GEMINI

Gemini is just one part of Google's continuing effort to develop LLMs responsibly. Throughout the course of this work, Google has discovered and discussed several limitations associated with LLMs. The focus here is on six areas of continuing research:

Accuracy: Gemini’s responses might be inaccurate, especially when it’s asked about
complex or factual topics (Hall, 2025).

Gemini is grounded in Google's understanding of authoritative information, and is trained to generate responses that are relevant to the context of your prompt and in line with what you're looking for. But like all LLMs, Gemini can sometimes confidently and convincingly generate responses that contain inaccurate or misleading information (Hall, 2025).

Since LLMs work by predicting the next word or sequences of words, they are not yet fully capable of distinguishing between accurate and inaccurate information on their own. Google has seen Gemini present responses that contain or even invent inaccurate information (e.g., misrepresenting how it was trained or suggesting the name of a book that doesn't exist). In response, Google has created features like "double check", which uses Google Search to find content that helps you assess Gemini's responses, and gives you links to sources to help you corroborate the information you get from Gemini (Hall, 2025).

Bias: Gemini’s responses might reflect biases present in its training data.

Training data, including from publicly available sources, reflects a diversity of perspectives and opinions. Google continues to research how to use this data in a way that ensures that an LLM's response incorporates a wide range of viewpoints, while minimizing inaccurate overgeneralizations and biases (Hall, 2025).

Gaps, biases, and overgeneralizations in training data can be reflected in a model's outputs as it tries to predict likely responses to a prompt. These issues manifest in a number of ways (e.g., responses that reflect only one culture or demographic, reference problematic overgeneralizations, exhibit gender, religious, or ethnic biases, or promote only one point of view). For some topics, there are data voids (in other words, not enough reliable information about a given subject for the LLM to learn about it and then make good predictions), which can result in low-quality or inaccurate responses. Google continues to work with domain experts and a diversity of communities to draw on deep expertise outside the company (Hall, 2025).

Multiple Perspectives: Gemini’s responses might fail to show a range of views.

For subjective topics, Gemini is designed to provide users with multiple perspectives if
the user does not request a specific point of view. For example, if prompted for
information on something that cannot be verified by primary source facts or
authoritative sources — like a subjective opinion on “best” or “worst” — Gemini
should respond in a way that reflects a wide range of viewpoints. But since LLMs like
Gemini train on the content publicly available on the internet, they can reflect positive
or negative views of specific politicians, celebrities, or other public figures, or even
incorporate views on just one side of controversial social or political issues. Gemini
should not respond in a way that endorses a particular viewpoint on these topics, and
we will use feedback on these types of responses to train Gemini to better address them
(Hall, 2025).

Personal: Gemini's responses might incorrectly suggest it has personal opinions or feelings.

Gemini might at times generate responses that seem to suggest it has opinions or emotions, like love or sadness, since it has been trained on language that people use to reflect the human experience. Google has developed a set of guidelines around how Gemini might represent itself (i.e., its persona) and continues to fine-tune the model to provide objective responses (Hall, 2025).

False positives and false negatives: Gemini might not respond to some appropriate
prompts and provide inappropriate responses to others.

Google has put in place a set of policy guidelines to help train Gemini and avoid generating problematic responses. Gemini can sometimes misinterpret these guidelines, producing "false positives" and "false negatives." In a "false positive," Gemini might not provide a response to a reasonable prompt, misinterpreting the prompt as inappropriate; and in a "false negative," Gemini might generate an inappropriate response, despite the guidelines in place. Sometimes, the occurrence of false positives or false negatives may give the impression that Gemini is biased: for example, a false positive might cause Gemini to not respond to a question about one side of an issue, while it will respond to the same question about the other side. Google continues to tune these models to better understand and categorize inputs and outputs as language, events and society rapidly evolve (Hall, 2025).

Vulnerability to adversarial prompting: users will find ways to stress test Gemini
with nonsensical prompts or questions rarely asked in the real world.

Google expects users to test the limits of what Gemini can do and attempt to break its protections, including trying to get it to divulge its training protocols or other information, or trying to get around its safety mechanisms. Google has tested and continues to test Gemini rigorously, but users will find unique, complex ways to stress-test it further. This is an important part of refining Gemini. Indeed, since Gemini launched in 2023, users have challenged it with prompts that range from the philosophical to the nonsensical, and in some cases Gemini has responded with answers that are equally nonsensical or not aligned with its stated approach. Figuring out methods to help Gemini respond to these sorts of prompts is an ongoing challenge, and Google continues to expand its internal evaluations and red-teaming to strive toward continued improvement in accuracy, objectivity and nuance (Hall, 2025).

CONCLUSION

In conclusion, Gemini represents a significant milestone in the development of artificial intelligence. Its advanced multimodal capabilities, sophisticated reasoning, and ability to understand and generate high-quality code make it a powerful tool for a wide range of applications.

Gemini's potential to drive innovation and progress is vast, from uncovering new
knowledge in science and finance to assisting developers with coding tasks. Its ability
to seamlessly understand and reason about different types of inputs, including text,
images, and audio, makes it a truly versatile model. As Google continues to refine and
improve Gemini, it is clear that this model will play a significant role in shaping the
future of AI. With its robust safety protocols and commitment to responsible AI
development, Gemini is poised to make a positive impact on various industries and
aspects of our lives.

The future of AI is exciting and full of possibilities, and Gemini is at the forefront of this revolution. As we continue to explore the potential of AI, models like Gemini will be instrumental in driving progress and innovation, enabling us to achieve new heights of creativity, productivity, and understanding.

REFERENCES

Barnett, Kendra (February 8, 2025). "Google's Heartfelt Super Bowl Ad Promos
Gemini-Enabled Talk Features on Pixel". Adweek. Archived from the original
on February 7, 2025. Retrieved February 8, 2025.
Cerullo, Megan (August 29, 2024). "Google relaunches Gemini AI tool that lets users
create images of people". CBS News. Archived from the original on August 29,
2024. Retrieved September 2, 2024.
De Vynck, Gerrit; Tiku, Nitasha (February 23, 2024). "Google takes down Gemini AI
image generator. Here's what you need to know". The Washington Post. ISSN
0190-8286. Archived from the original on February 23, 2024. Retrieved
February 25, 2024.
Grant, Nico (March 21, 2023). "Google Releases Bard, Its Competitor in the Race to
Create A.I. Chatbots". The New York Times. ISSN 0362-4331. Archived from
the original on March 21, 2023. Retrieved March 21, 2023.
Hall, Rachel (February 6, 2025). "Google edits Super Bowl ad for AI that featured
false information". The Guardian. ISSN 0261-3077. Archived from the original
on February 7, 2025. Retrieved February 8, 2025.
Hollister, Sean (March 29, 2023). "Google denies Bard was trained with ChatGPT
data". The Verge. Archived from the original on March 30, 2023. Retrieved
April 1, 2023.

Kaplan, Andreas; Haenlein, Michael (2019). "Siri, Siri, in my hand: Who's the fairest
in the land? On the interpretations, illustrations, and implications of artificial
intelligence". Business Horizons. 62: 15–25. doi:10.1016/j.bushor.2018.08.004.
ISSN 0007-6813. S2CID 158433736.
Langley, Hugh (February 26, 2024). "Google DeepMind CEO addresses the Gemini
debacle and says image generator could be back in a 'couple of weeks'".
Business Insider. Archived from the original on February 26, 2024. Retrieved
February 27, 2024.
Olson, Parmy (February 28, 2024). "Google's AI Isn't Too Woke. It's Too Rushed".
Bloomberg News. Archived from the original on March 3, 2024. Retrieved March
3, 2024.

Petri, Alexandra (July 31, 2024). "Opinion | I hate the Gemini 'Dear Sydney' ad more
every passing moment". The Washington Post. ISSN 0190-8286. Archived from
the original on August 2, 2024. Retrieved August 13, 2024.
Russell, Stuart J.; Norvig, Peter (2021). Artificial Intelligence: A Modern
Approach (4th ed.). Pearson. p. 24.
Stern, Joanna (September 2, 2024). "Google's Gemini Live AI Sounds So Human, I
Almost Forgot It Was a Bot". The Wall Street Journal. ISSN 0099-9660.
Archived from the original on August 13, 2024. Retrieved September 2, 2024.
Titcomb, James (February 26, 2024). "Elon Musk equated with Hitler in latest Google
AI gaffe". The Daily Telegraph. ISSN 0307-1235. Archived from the original on
February 26, 2024. Retrieved February 28, 2024.
Weprin, Alex (February 6, 2025). "Google Bets That Super Bowl Can Turbocharge
Gemini AI Business". The Hollywood Reporter. Archived from the original on
February 6, 2025. Retrieved February 8, 2025.
Wittenstein, Jeran (February 26, 2024). "Alphabet Drops During Renewed Fears About
Google's AI Offerings". Bloomberg News. Archived from the original on
February 26, 2024. Retrieved February 28, 2024.
