Module 3 & 4 – Important Questions with Answers
Q1. (7 Marks)
What are the typical data structures used to represent nested data obtained from GPT-4 outputs? With
an example, describe the steps to generate a hierarchical list.
Answer:
- GPT-4 often returns nested outputs in formats like:
1. Lists/Arrays → Ordered data.
2. Dictionaries (JSON objects) → Key-value hierarchical data.
3. Tuples → Group of related values.
4. Tree-like structures → Represent parent-child relationships.
Steps to generate hierarchical list:
1. Prompt GPT for structured output (JSON/dict).
2. Parse the response with json.loads() into a Python dict.
3. Identify keys and nested values (parent-child).
4. Convert into hierarchy/tree.
5. Store/visualize as nested list/dict.
Example:
{
  "Course": "GenAI",
  "Modules": [
    {"Module": 3, "Topics": ["Prompts", "Summarization", "Chunking"]},
    {"Module": 4, "Topics": ["LangChain", "RAG", "Prompt Templates"]}
  ]
}
Hierarchy: Course → Modules → Topics
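A minimal Python sketch of steps 2-5, assuming the JSON above is returned by the model as a plain string:
import json
# Model response assumed to be the JSON shown above, received as one string
raw = '{"Course": "GenAI", "Modules": [{"Module": 3, "Topics": ["Prompts", "Summarization", "Chunking"]}, {"Module": 4, "Topics": ["LangChain", "RAG", "Prompt Templates"]}]}'
data = json.loads(raw)  # step 2: parse into a Python dict
# steps 3-5: walk parent-child keys and store as a nested list
hierarchy = [data["Course"], [[m["Module"], m["Topics"]] for m in data["Modules"]]]
print(hierarchy)
# ['GenAI', [[3, ['Prompts', 'Summarization', 'Chunking']], [4, ['LangChain', 'RAG', 'Prompt Templates']]]]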
Extra Points:
- Useful for knowledge graphs, chatbot reasoning, RAG pipelines.
- Ensures clean structured outputs from LLMs.
Q2. (7 Marks)
What are the data formats supported by ChatGPT? Compare and differentiate between JSON
and YAML files with their use cases.
Answer:
- ChatGPT supports: Plain text, Markdown, CSV, JSON, YAML, XML, HTML tables.
- Common formats: JSON & YAML.
Comparison:
JSON:
- Syntax: Strict ({} [])
- Readability: Machine-friendly
- Data types: Strings, numbers, arrays, objects
- Use case: Web APIs, data exchange
- Example: {"name":"AI"}
YAML:
- Syntax: Indentation-based
- Readability: Human-friendly
- Data types: Same as JSON, plus multi-line strings and comments
- Use case: Config files, ML pipelines
- Example: name: AI
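A short Python sketch showing both formats parsing to the same dict; it assumes the third-party PyYAML package is installed (json is in the standard library):
import json
import yaml  # PyYAML, assumed installed: pip install pyyaml
json_text = '{"name": "AI", "modules": [3, 4]}'
yaml_text = """
name: AI        # YAML allows comments
modules:
  - 3
  - 4
"""
print(json.loads(json_text))      # {'name': 'AI', 'modules': [3, 4]}
print(yaml.safe_load(yaml_text))  # {'name': 'AI', 'modules': [3, 4]}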
Use Cases:
- JSON → API responses, chatbot outputs, web apps.
- YAML → Configurations (Kubernetes, ML models, LangChain configs).
Extra Points:
- JSON → Standardized, platform-independent.
- YAML → Easier for developers to edit manually.
Q3. (8 Marks)
Discuss the importance/benefits of chunking text for LLMs with a suitable diagram. How do you
identify which portions to chunk and which not to chunk?
Answer:
Chunking = splitting large documents into smaller parts.
Importance:
1. LLMs have context limits (e.g., GPT-4 Turbo ~128k tokens).
2. Avoids memory overflow.
3. Improves retrieval accuracy in RAG.
4. Preserves meaning & context.
5. Supports efficient embedding + vector search.
How to chunk properly:
- ✅ Chunk by semantic units (paragraphs, sections).
- ✅ Maintain sentence completeness.
- ✅ Use a sliding window (overlap for context); see the sketch after the diagram.
- ❌ Avoid random token splits.
- ❌ Avoid breaking tables/code blocks midway.
Diagram:
Large Document → Chunking (Intro | Method | Results | Conclusion)
→ Embedding → Stored in Vector DB → Retrieved Chunks → GPT
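Code sketch (a minimal example using LangChain's RecursiveCharacterTextSplitter; the import path shown is for the classic langchain package and the file name is a placeholder):
from langchain.text_splitter import RecursiveCharacterTextSplitter
text = open("large_document.txt").read()  # placeholder source document
# chunk_overlap implements the sliding-window idea: context carries across boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)
print(len(chunks), chunks[0][:100])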
Extra Points:
- Essential in document QA systems, chatbots, legal/medical AI.
- Prevents hallucination by giving exact context.
Q4. (8 Marks)
Identify the various chunking strategies with their advantages and disadvantages.
Answer:
1. Fixed-size Chunking
- Divide text into equal-sized pieces (e.g., 500 tokens each).
- ✅ Easy, fast.
- ❌ May cut sentences, lose meaning.
2. Semantic Chunking
- Split based on topics/paragraphs.
- ✅ Preserves meaning and context.
- ❌ Needs NLP/pre-processing.
3. Recursive Chunking
- Break into sections → then subsections.
- ✅ Works well for books/reports.
- ❌ Higher compute cost.
4. Sliding Window Chunking
- Overlapping chunks (context preserved); see the sketch after this list.
- ✅ Best for QA, continuity.
- ❌ Redundant storage.
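A plain-Python sketch of strategies 1 and 4, splitting on words for simplicity (real systems usually count tokens):
def fixed_chunks(words, size):
    # Strategy 1: equal-sized, non-overlapping chunks
    return [words[i:i + size] for i in range(0, len(words), size)]
def sliding_chunks(words, size, overlap):
    # Strategy 4: each chunk repeats the last `overlap` words of the previous one
    step = size - overlap
    return [words[i:i + size] for i in range(0, len(words), step)]
words = "LLMs have context limits so long documents must be split".split()
print(fixed_chunks(words, 4))
print(sliding_chunks(words, 4, 2))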
Extra Points:
- Hybrid approach (Semantic + Sliding Window) is best.
- Choice depends on application (QA, summarization, retrieval).
Q5. (7 Marks)
How can LLMs be used for sentiment analysis? Discuss techniques to improve sentiment
analysis and limitations of LLMs in this task.
Answer:
LLMs in Sentiment Analysis:
- Classify text as positive, negative, or neutral.
Techniques:
1. Zero-shot → Direct classification via prompt.
2. Few-shot → Provide examples before query.
3. Fine-tuning → Train on sentiment datasets.
4. RAG-based classification → Add domain-specific context.
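A minimal sketch of techniques 1 and 2 as prompt strings (no particular SDK assumed; the review text is made up):
review = "The battery dies within an hour, very disappointing."
# Zero-shot: direct instruction, no examples
zero_shot = f"Classify the sentiment of this review as positive, negative, or neutral:\n{review}"
# Few-shot: labelled examples before the query
few_shot = f"""Classify the sentiment as positive, negative, or neutral.
Review: "Loved the fast delivery." -> positive
Review: "Packaging was okay, nothing special." -> neutral
Review: "{review}" ->"""
print(zero_shot)
print(few_shot)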
Improvements:
- Role prompting → (“You are a sentiment analyzer”).
- Chain-of-thought → step-by-step reasoning.
- Ensemble methods → LLM + ML models.
Limitations:
- Bias from training data.
- Weak in sarcasm/irony detection.
- Confusion in specialized domains (finance, medical).
- Sensitive to prompt wording.
Extra Points:
- Best results by combining LLMs with supervised classifiers.
Q6. (7 Marks)
What is Role Prompting? Explain its Benefits, Challenges, and When to use it.
Answer:
Role Prompting = Assigning an LLM a role/identity to control responses.
Example: “You are a doctor explaining treatment.”
Benefits:
1. Produces structured, domain-specific answers.
2. Reduces ambiguity in responses.
3. Increases accuracy and relevance.
Challenges:
1. Too rigid → limits creativity.
2. Poor role definition → misleading output.
3. Over-dependence → model less flexible.
When to use:
- Customer support chatbots.
- Legal/Medical/Financial summarization.
- Teaching/training assistants.
- Domain-specific QA.
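A minimal sketch using the openai Python SDK (v1+) chat format, assuming an API key is set in the environment; the model name and support persona are placeholders:
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[
        # The system message pins the model to a role/persona
        {"role": "system", "content": "You are a customer support agent for an online bookstore."},
        {"role": "user", "content": "My order arrived damaged. What should I do?"},
    ],
)
print(response.choices[0].message.content)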
Extra Points:
- Works best when combined with few-shot prompting.
Q7. (8 Marks)
Apply the LangChain PromptTemplate to design a few-shot prompt for a customer service
chatbot.
Answer:
LangChain PromptTemplate allows reusable, structured prompts with variables.
Few-shot → provide example Q&A pairs before the user query.
Code Example:
from langchain.prompts import PromptTemplate
# Reusable template: the few-shot examples are fixed, {query} is filled in at runtime
template = """
You are a customer service assistant.
Examples:
Q: How to reset my password?
A: Go to settings → security → reset password.
Q: How to check order status?
A: Visit 'My Orders' section.
Now answer:
Q: {query}
"""
prompt = PromptTemplate(template=template, input_variables=["query"])
print(prompt.format(query="How to return a product?"))
Output:
A: Please go to “My Orders → Return section” and follow the instructions.
Extra Points:
- Ensures consistent tone & format.
- Reduces hallucination by grounding in examples.
Q8. (8 Marks)
Apply the RAG (Retrieval Augmented Generation) pattern using LangChain to enhance a
chatbot’s ability to answer domain-specific questions from custom documents.
Answer:
RAG = Retrieval + Generation
Steps:
1. Split documents → TextSplitter.
2. Embed chunks → OpenAIEmbeddings.
3. Store in vector DB (FAISS/Pinecone).
4. Retrieve relevant context.
5. Query + retrieved text → LLM → Answer.
Code Flow:
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
# docs = list of text chunks produced by the TextSplitter in step 1 (illustrative placeholders)
docs = ["Refunds are processed within 7 days of return approval.",
        "Returns require the original receipt and undamaged packaging."]
embeddings = OpenAIEmbeddings()          # step 2: embedding model
db = FAISS.from_texts(docs, embeddings)  # step 3: store vectors in FAISS
retriever = db.as_retriever()            # step 4: retrieval interface
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=retriever)  # step 5
response = qa.run("Explain refund policy from manual")
print(response)
Advantages:
- Domain-specific accurate answers.
- Reduces hallucinations.
- Scales to large datasets.
- Improves trustworthiness of chatbot.
Extra Points:
- Used in enterprise bots, healthcare assistants, legal AI.