Generative AI Basics
In contrast, some AI models can produce an incredible variety of text, images, and
sounds that we’ve never read, seen, or heard before. This kind of AI is known as
generative AI. It holds within it massive potential for change, both in and out of
the workplace.
Possibilities of Language Models
Generative AI might seem like this hot new thing, but in reality researchers have
been training generative AI models for decades. Some have even made the news in the
past few years. Maybe you remember articles from 2018 when a company named Nvidia
unveiled an AI model that could produce random photorealistic images of human
faces. The pictures were surprisingly convincing. And although they weren’t
perfect, they were definitely a conversation-starter. Generative AI was slowly
beginning to enter the public consciousness.
As researchers worked on AI that could make specific kinds of
images, others were focused on AI related to language. They were training AI models
to perform all sorts of tasks that involved interpreting text. For example, you
might want to categorize reviews of one of your products as positive, negative, or
neutral. That’s a task that requires an understanding of how words are combined in
everyday use, and it’s a great example of what experts call natural language
processing (NLP). Because there are so many ways to “process” language, NLP
describes a broad category of AI. (For more on NLP, see Natural Language Processing
Basics.)
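To make the review-categorization task concrete, here's a toy, rule-based sketch. Real NLP models learn these patterns from data rather than from hand-written word lists; the keyword sets below are invented purely for illustration, and only the input/output shape of the task matches what a trained model would do.

```python
# Toy sentiment categorization: label a review positive, negative, or neutral.
# The keyword lists are made up for illustration; a real NLP model learns
# these associations from training data instead.

POSITIVE = {"great", "love", "excellent", "perfect"}
NEGATIVE = {"broken", "terrible", "awful", "disappointed"}

def categorize_review(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for a review."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(categorize_review("I love this product, it works great!"))   # positive
print(categorize_review("Arrived broken. Terrible experience."))   # negative
```

A keyword count like this falls apart on sentences such as "not great," which is exactly why the task needs models that understand how words combine in context.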
Some AIs that perform NLP are trained on huge amounts of data, which in
this case means samples of text written by real people. The internet, with its
billions of web pages, is a great source of sample data. Because these AI models are
trained on such massive amounts of data, they’re known as large language models
(LLMs). LLMs capture, in incredible detail, the language rules humans take years to
learn. These large language models make it possible to do some incredibly advanced
language-related tasks.
Summarization: If you’re given a sentence and you understand how all the words come
together to make a point, you can probably rewrite the sentence to express the same
idea. Since AI models know the rules for syntax, and they’ve learned which words
can be swapped for others, they can make a remix too. Taking a whole paragraph and
condensing it into one or two sentences is just another kind of remix. This kind of
AI-assisted summarization can be very helpful in the real world. It can create
meeting notes from an hour-long recording. Or write an abstract of a scientific
paper. It’s the ultimate elevator-pitch generator.
Translation: LLMs are like a collection of rules for how a language structures
words into ideas. Each language has its own rules. In English, we typically put
adjectives before nouns, but in French it’s usually the other way around. AI
translators are trained to learn both sets of rules. So when it’s time for a
sentence remix, AI can use a second set of rules to express the same idea. Voilà,
you have yourself a great translation. And programming languages are languages too.
They have their own set of rules, so AI can translate a loose set of instructions
into actual code. A personal pocket programmer can open a lot of doors for a lot of
people.
Error correction: Even the most experienced writers make the occasional grammatical
or spelling mistake. Now AI can detect (and sometimes auto-correct) anything
amiss. Patching up errors also matters when listening to someone
speak. You might miss a word or two because you’re in a noisy environment, but you
use context to fill in the gap. AI can do this too, making speech-to-text tasks
like closed captioning even more accurate.
Question answering: This is the task that launched generative AI into the
limelight. AIs such as ChatGPT are capable of interpreting the intention of a
question or request. Then they can generate a large amount of text based on the
request. For example, you could ask for a one-sentence summary of each of the
three most popular works of William Shakespeare, and you’d get:
"Romeo and Juliet" - A tragic tale of two young lovers from feuding families whose
love ultimately leads to their untimely deaths.
"Hamlet" - The story of a prince haunted by his father’s ghost, grappling with
revenge and the existential questions of life and death.
"Macbeth" - A chilling drama of ambition and moral decline as a nobleman, driven by
his wife’s ambition, succumbs to a bloody path of murder to seize the throne.
Then, you could continue the conversation by asking for more information about
Hamlet, as though you were talking with your Language Arts teacher. This kind of
interaction is a great example of getting just-in-time information with a simple
request.
Guided image generation: LLMs can be used in tandem with image generation models so
that you can describe the image you want, and AI will attempt to make it for you.
Here’s an example of asking for “a 2D line art drawing of Juliet standing in the
window of an old castle.” Because there are so many descriptions and images of
Romeo & Juliet on the internet, the AI generator didn’t need any further
information to make a guess at an appropriate image.
[AI-generated image using DreamStudio at stability.ai with the prompt, “A 2D line
art drawing of Juliet standing in the window of an old castle.”]
Related to guided image generation, some AI models can add new content into
existing images. For example, you could extend the borders of a picture, allowing
the AI to draw in what is likely to appear based on the context of the original
picture.
Text-to-speech: Similar to how AI can convert a string of words into a picture,
there are AI models that can convert text to speech. Some models can analyze audio
samples of a person speaking, learn that person’s unique speech patterns, and
reproduce them when converting text to new audio. To the casual listener, it’s
hard to tell the difference.
These are just a few examples of how LLMs are used to create new text, images, and
sounds. Almost any task that relies on an understanding of how language works can
be augmented by an AI. It’s an incredibly powerful tool that you can use for both
work and play.
Impressive Predictions
Now that you have an idea of what generative AI is capable of, it’s important to
make something very clear. The text that generative AI produces is really just
another form of prediction. But instead of predicting the value of a home, it
predicts a sequence of words that are likely to have meaning and relevance to the
reader.
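The idea of "predicting a sequence of words" can be sketched at a tiny scale. Real LLMs use neural networks trained on enormous corpora; this example just counts which word follows which in a few made-up sentences, then predicts the most common continuation. It is only meant to show the shape of next-word prediction, not how any production model works.

```python
# A miniature "next word" predictor built from bigram counts.
# The corpus is a few invented sentences; real models learn from
# billions of words with far richer statistics.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count which word follows each word in the sample text.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))  # prints "on" — both sample sentences continue "sat on"
```

Generating a whole passage is just this step repeated: predict a word, append it, and predict again, each choice shaped by everything that came before.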
The predictions are impressive, to be sure, but they are not a sign that the
computer is “thinking.” It doesn’t have an opinion about the topic you ask about,
nor does it have intentions or desires of its own. If it ever sounds like it has an
opinion, that’s because it’s making the best prediction of what you expect as a
response. For example, asking someone “Do you prefer coffee or tea?” elicits a
certain kind of expected response. A well-trained model can predict a response,
even if it doesn’t make sense for a computer to want any kind of drink.