LLM Multimodal Guidelines

The document outlines guidelines for creating prompts for a Multimodal Large Language Model (LLM) that processes multiple data types, such as text and images. It emphasizes the importance of crafting relevant, varied, and complex queries that require the model to analyze the content of images for accurate responses. Additionally, it provides specific instructions and considerations for writing effective queries, including avoiding basic prompts and ensuring proper language use.

Multimodal Guidelines
Date Created: 08/09/2024

Table of Contents
Introduction to LLMs (Large Language Models)
Project Overview
Introduction to Multimodal
Diagram for Multimodal processing
Instructions to the Task
DO's and DON'Ts

Introduction to LLMs (Large Language Models)

Large Language Models (LLMs) are advanced artificial intelligence (AI) systems trained on vast amounts of data
to understand and generate human-like text. These models are capable of performing various tasks such as text
generation, summarization, translation, and more.

Project Overview
A Multimodal LLM is a type of AI system that can process multiple types of data simultaneously. Consider the
different ways we learn, such as reading, seeing pictures, and hearing sounds. LLMs can now learn from all
these sources, which helps them perform tasks in a more human-like way.

For example, imagine you have an LLM chatbot built into a personal device such as smart glasses, which
lets you interact with it while on the go. Instead of being limited to typing text into a web browser, you
can ask it about something you see in the real world as you “show” it to the model through the device's
visual or audio input. By combining these modes, multimodal LLMs can improve the accuracy and naturalness of
their responses.

Introduction to Multimodal
In this task, you will be creating prompts/queries about a provided image. A prompt or query is the instruction
or question given to the chatbot. These queries will be used to test and train a multimodal LLM chatbot. By
collecting a large variety of prompts/queries, we can teach the model to understand the context of an image as
well as the best ways to respond to various instructions and questions about it. These prompts need to be
questions that people might feasibly ask this model about the image.

Prompts need to be relevant to the image, sound natural and conversational, and be varied in form and
content. This requires the contributor to use good reasoning and creative thinking skills to produce a variety of
prompts that are neither repetitive nor overly simple. Basic prompts such as “describe this photo” or “how
many jars are in this picture?” are not helpful when trying to train models to process more involved
knowledge-based queries because these can be answered with basic image recognition technology. Instead,
we want to teach the model to “think” about what is in an image before it responds.

This content is for internal use only

For example, if looking at an image of the Statue of Liberty, a good prompt in this case could be, “What time
does it open for visitors?” Another prompt for the image could be, “Is there a gift shop inside?”

As another example, if shown an image of a plant, a good prompt in this case might be, “How much sunlight
should this get?”

Diagram for Multimodal processing

This simulator uses the “Text and Image as input” functionality of the multimodal model and produces “Text”
as output, as shown in the diagram.

Instructions to the Task

1. Review the image provided on the left-hand side under “Judgment.” Consider what kind of real-world
situation you might be in when looking at the contents of the image.
2. Type three independent queries about the image into the prompt boxes. Each query must adhere to
the criteria outlined in the “Important Considerations when Writing Queries” section below.
3. After proofreading your three queries, click the “Test Validation” button.
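
The guidelines do not describe what the “Test Validation” button actually checks. As a rough illustration only, a pre-submission check for step 2 might look like the sketch below, which assumes that "three independent queries" means exactly three non-empty queries with no duplicates. The function names `normalize` and `check_submission` are hypothetical and not part of the simulator.

```python
def normalize(query: str) -> str:
    """Lowercase and drop punctuation so trivially different copies compare equal."""
    kept = "".join(ch for ch in query.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def check_submission(queries: list[str]) -> list[str]:
    """Return a list of problems with a set of queries (empty if none found).

    Assumption: this only catches exact or near-exact duplicates. It cannot
    judge whether queries are meaningfully varied; that still needs a human.
    """
    problems = []
    if len(queries) != 3:
        problems.append(f"expected 3 queries, got {len(queries)}")
    if any(not q.strip() for q in queries):
        problems.append("empty query")
    normalized = [normalize(q) for q in queries if q.strip()]
    if len(set(normalized)) < len(normalized):
        problems.append("duplicate queries")
    return problems
```

A check like this only flags mechanical problems. Whether the three queries differ enough in form and content remains a judgment call for the contributor.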

Important Considerations when Writing Queries

1. Queries must be relevant to the image. In other words, they must require the LLM to
analyze/consider the content of the image to provide a valid response. For example, if the image
shows a kitchen table with several ingredients on it, a valid query would be, "What main dish could I
make using the ingredients on this table?"

Queries that the LLM could answer without referencing the image, such as "What is the best cut of
meat for pot roast?" would not be acceptable because the LLM could answer them without
considering the image's contents.

2. Queries should be no longer than 40 words.

3. Queries should differ from one another as much as possible.

4. Phrasing should be natural and realistic.

5. Queries should be sufficiently complex—they should not be answerable through basic image
recognition software.

6. Queries should aim to elicit brief responses—realistic questions about the contents of an image do
not typically require long responses. For example, an effective query about an image of the Eiffel
Tower would not be, “Provide a detailed description of the history of this monument.” This would
require the model to provide a lengthy response.

7. Queries should be written with proper punctuation and capitalization and free from spelling/grammar
errors, profanity, or otherwise objectionable text or content.

8. Queries should be natural questions that could be asked in the moment and should not be overly
formal, nor should they address the model by beginning with “hey chatbot.”

9. Queries about the scene or objects in the image should not contain the actual words "in this image."
Imagine you are seeing the image in person, not asking about an image on a screen.

10. Queries need to elicit a text response only. The “multimodal” part of this model refers to the form of
inputs it can receive, not the kind of responses it can create. It cannot perform actions like creating a
reminder, note, or contact, nor can it place calls or send a message. However, it can translate,
summarize, or rewrite a block of text if one exists in the image.
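
Several of the considerations above are mechanical enough to check automatically. The sketch below is a minimal, hypothetical self-check covering only items 2, 7 (partially), 8, and 9. The function name `check_query` and the exact checks are assumptions made for illustration, not the simulator's validation logic. Relevance, complexity, and naturalness cannot be tested this way.

```python
import re

MAX_WORDS = 40                   # item 2: no longer than 40 words
BANNED_PHRASE = "in this image"  # item 9
BANNED_OPENER = "hey chatbot"    # item 8


def check_query(query: str) -> list[str]:
    """Return a list of rule violations for one query (empty if none found)."""
    problems = []
    text = query.strip()
    lowered = text.lower()

    if len(text.split()) > MAX_WORDS:
        problems.append("longer than 40 words")
    if BANNED_PHRASE in lowered:
        problems.append('contains "in this image"')
    if lowered.startswith(BANNED_OPENER):
        problems.append('starts with "hey chatbot"')
    # Rough proxy for item 7 (assumption): the query starts with a capital
    # letter or digit and ends with ".", "?" or "!". Spelling is not checked.
    starts_ok = re.match(r"^[A-Z0-9]", text) is not None
    ends_ok = re.search(r"[.?!]$", text) is not None
    if not (starts_ok and ends_ok):
        problems.append("check capitalization and end punctuation")
    return problems
```

For example, `check_query("What main dish could I make using the ingredients on this table?")` returns an empty list, while a query that begins with "hey chatbot" and mentions "in this image" is flagged for both.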

DO's and DON'Ts

Based on the Task Instructions, here are some practical tips for performing this task effectively:

DO:
• Write prompts that rely on the image to be answered.
• Vary your prompts. Repetitive prompts that ask the same or very similar questions are not helpful.
• Create queries that elicit brief responses.
• Use proper punctuation, capitalization, and spelling.
• Write a query that elicits a text response.

DON’T:
• Write basic prompts. Queries should be sufficiently complex.
• Write prompts that are overly formal or that informally address the model by starting with “hey
chatbot.”
• Include the words “in this image.”

This simulator and these guidelines are intended to introduce you to LLMs. Guidelines and requirements will
vary for every project.
