Lecture 19

The document outlines the process of synthetic data generation using Large Language Models (LLMs) and the components involved in a Retrieval-Augmented Generation (RAG) architecture. It details the steps from data loading and indexing to retrieval and generation, emphasizing the importance of embedding models and vector databases for efficient information processing. Additionally, it highlights quality assurance measures like safety checks and post-processing techniques to enhance the final output generated by the LLM.

Generative AI Fundamentals

©2023 Databricks Inc. — All rights reserved


QUIZ 04

What is meant by Synthetic Data Generation, and how are LLMs aiding this technique?

Start: 3:40  End: 3:50


Infrastructural Components of a RAG (Retrieval-Augmented Generation) Architecture
Indexing
•The process begins with data loaders, which retrieve data from various sources: unstructured documents (e.g., PDFs, docs), semi-structured data (e.g., XML, JSON, CSV), and even structured data residing in SQL databases, accessed via data connectors.

•Document splitters organize the data and prepare it for efficient processing by the embedding model.

•They achieve this by segmenting the documents into logical units – sentences or paragraphs – based
on predefined rules. This segmentation ensures that information remains semantically intact while
preparing it for further processing.

•The tokenizer takes each logical unit (e.g., a paragraph) from the document splitter and breaks it into tokens, depending on the chosen embedding model and the desired level of granularity. Using a single tokenizer ensures consistency throughout the system (a minimal sketch of these steps follows below).
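To make the loading, splitting, and tokenizing stages concrete, here is a minimal Python sketch. It is not from the lecture: load_document, split_into_paragraphs, and the whitespace-based tokenize below are illustrative stand-ins for real format-specific loaders, rule-based splitters, and the embedding model's own tokenizer, and "example.txt" is a hypothetical input file.

```python
def load_document(path: str) -> str:
    """Data loader stand-in: reads raw text from a single file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def split_into_paragraphs(text: str) -> list[str]:
    """Document splitter: segments text into logical units (paragraphs)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def tokenize(chunk: str) -> list[str]:
    """Tokenizer stand-in: one consistent tokenizer for the whole system."""
    return chunk.lower().split()

if __name__ == "__main__":
    doc = load_document("example.txt")          # hypothetical input file
    chunks = split_into_paragraphs(doc)
    tokens_per_chunk = [tokenize(c) for c in chunks]
    print(f"{len(chunks)} chunks; first has {len(tokens_per_chunk[0])} tokens")
```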
Indexing (continued)

•The embedding model converts each token into a numerical vector representation, capturing its semantic meaning within the context of the surrounding text.

•Pre-trained embedding models, either word embeddings or contextual embeddings, achieve this by
mapping the tokens into these vector representations.

•Finally, an indexing component takes over: it packages the generated embedding vectors along with any associated metadata (e.g., document source information) and sends them to a specialized embedding database – the vector database (vector DB) – for efficient storage.

•This database becomes the foundation for the retrieval stage, where the RAG architecture searches for relevant information based on user queries (a toy embedding-and-indexing sketch follows below).
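The sketch below continues the earlier one and is illustrative only: embed() is a fake, deterministic hashing stand-in for a pre-trained embedding model (and, for simplicity, it maps a whole chunk to one vector rather than each token), VectorDB is a toy in-memory stand-in for a real vector database, and DIM is an arbitrary dimensionality. Only the shape of the pipeline mirrors the slides.

```python
import hashlib
import numpy as np

DIM = 64  # embedding dimensionality (arbitrary for this sketch)

def embed(text: str) -> np.ndarray:
    """Stand-in for a pre-trained embedding model: hashes words into a unit vector."""
    vec = np.zeros(DIM)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorDB:
    """Toy in-memory vector database: stores (vector, metadata) pairs."""
    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.metadata: list[dict] = []

    def index(self, chunk: str, meta: dict) -> None:
        """Indexing component: package the vector with its metadata and store both."""
        self.vectors.append(embed(chunk))
        self.metadata.append({**meta, "text": chunk})
```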
Retrieval
•The user submits a prompt that needs to be processed. The prompt is prepared to match the same structure (embeddings) used during the indexing phase.

•Safety, ethical, and quality checks are applied. These checks ensure the prompt aligns with guidelines and prevent misuse.

•The prompt is tokenized and converted to embeddings.

•The system searches a vector database for embeddings similar to the prompt’s vector. Retrieved data
chunks represent relevant content linked to the user’s query.

•A ranking service assigns scores to each chunk based on similarity to the prompt’s vector. The system prioritizes the most relevant chunks for the response (a retrieval-and-ranking sketch follows below).
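Continuing the toy sketch above (reusing embed(), VectorDB, and the chunks from the first sketch), this shows retrieval: the prompt is embedded with the same pipeline used at indexing time, every stored vector is scored by cosine similarity (a dot product, since the vectors are unit-norm), and the chunks are ranked. top_k is an arbitrary illustrative parameter.

```python
def search(db: VectorDB, prompt: str, top_k: int = 3) -> list[dict]:
    """Retriever + ranking service: score every chunk, return the best."""
    query_vec = embed(prompt)       # same embedding pipeline as indexing
    scores = [float(v @ query_vec) for v in db.vectors]
    ranked = sorted(zip(scores, db.metadata), key=lambda pair: pair[0], reverse=True)
    return [{"score": score, **meta} for score, meta in ranked[:top_k]]

# Example usage, building on the earlier sketches:
db = VectorDB()
for i, chunk in enumerate(chunks):  # chunks from the first sketch
    db.index(chunk, {"source": "example.txt", "chunk_id": i})
print(search(db, "What is a vector database?"))
```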
Generation
•The top-ranked chunks and the embedded prompt are passed to the LLM (Large Language Model). The LLM processes the information to generate a coherent, informative response.
•The LLM produces the final output, ensuring it is context-aware and aligned with user expectations, and the response is presented to the user.
The raw output from the LLM might undergo some post-processing steps to enhance its quality (a sketch follows below). This could involve tasks like:
Text Normalization: Ensuring consistency in formatting, such as converting all numbers to a standard format or handling special characters.
Spell Checking: Identifying and correcting any potential typos or spelling errors.
Grammar Correction: Refining the grammatical structure of the generated text for clarity and coherence.
Redundancy Removal: Eliminating unnecessary repetition or irrelevant information that may clutter the response.
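To round out the sketch (again reusing VectorDB and search() from above): a generate() function assembles the top-ranked chunks and the user prompt into an augmented prompt, calls call_llm() – a hypothetical placeholder, not a real API, which you would replace with your model's actual client – and then applies two of the post-processing steps named above, text normalization and redundancy removal.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: substitute a real LLM API call here."""
    return f"(model answer conditioned on: {prompt[:60]}...)"

def post_process(text: str) -> str:
    """Post-processing: normalize whitespace and drop repeated sentences."""
    text = re.sub(r"\s+", " ", text).strip()      # text normalization
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if sentence.lower() not in seen:           # redundancy removal
            seen.add(sentence.lower())
            kept.append(sentence)
    return " ".join(kept)

def generate(db: VectorDB, user_prompt: str) -> str:
    """Generation stage: augment the prompt with retrieved context, then post-process."""
    context = "\n\n".join(c["text"] for c in search(db, user_prompt, top_k=3))
    augmented = f"Context:\n{context}\n\nQuestion: {user_prompt}\nAnswer:"
    return post_process(call_llm(augmented))
```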
