
Multimodal Report Generation (from a Slide Deck)

In this cookbook we show you how to build a multimodal report generator. The pipeline parses a slide deck and stores both text and image chunks, then generates a detailed response containing interleaved text and images.

NOTE: This pipeline operates over the entire document and does not do retrieval, so the model always sees the full context. You can of course add a higher-level retrieval layer where you retrieve the relevant document(s) first before feeding them to a multimodal model (a minimal sketch of this appears after the index-building code below).

Setup
import nest_asyncio

nest_asyncio.apply()

Setup Observability
We set up an integration with LlamaTrace (an integration with Arize).

If you haven't already done so, make sure to create an account here:
https://llamatrace.com/login. Then create an API key and put it in the PHOENIX_API_KEY
variable below.

!pip install -U llama-index-callbacks-arize-phoenix

# setup Arize Phoenix for logging/observability


import llama_index.core
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)

Load Data
Here we load the Conoco Phillips 2023 investor meeting slide deck.

!mkdir data
!mkdir data_images
!wget "https://static.conocophillips.com/files/2023-conocophillips-
aim-presentation.pdf" -O data/conocophillips.pdf

Model Setup
Setup models that will be used for downstream orchestration.

from llama_index.core import Settings


from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-4o")

Settings.embed_model = embed_model
Settings.llm = llm

Use LlamaParse to Parse Text and Images


In this example, we use LlamaParse to parse both the text and images from the document, using our multimodal mode (+ Sonnet 3.5).

This returns both the parsed document (via Sonnet) and the rendered page image chunks saved locally.

from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
)

print(f"Parsing slide deck...")


md_json_objs = parser.get_json_result("data/conocophillips.pdf")
md_json_list = md_json_objs[0]["pages"]

Parsing slide deck...


Started parsing the file under job_id 412ac275-abe2-4585-be43-5680e7754740

print(md_json_list[10]["md"])

# Commitment to Disciplined Reinvestment Rate

Disciplined Reinvestment Rate is the Foundation for Superior Returns on and of Capital, while Driving Durable CFO Growth

| Metric | Value |
|--------|-------|
| 10-Year Reinvestment Rate | ~50% |
| CFO CAGR 2024-2032 | ~6% |
| Mid-Cycle Planning Price | at $60/BBL WTI |

| Period | Industry Growth Focus | ConocoPhillips Strategy Reset | Reinvestment Rate |
|--------|------------------------|-------------------------------|-------------------|
| 2012-2016 | >100% Reinvestment Rate | - | ~$75/BBL WTI Average |
| 2017-2022 | - | <60% Reinvestment Rate | ~$63/BBL WTI Average |
| 2023E | - | - | at $80/BBL WTI |
| 2024-2028 | - | - | at $60/BBL WTI (with $80/BBL WTI option shown) |
| 2029-2032 | - | - | at $60/BBL WTI (with $80/BBL WTI option shown) |

*Chart shows ConocoPhillips Average Annual Reinvestment Rate (%) over time, with historic rates in grey and projected rates in blue.*

Reinvestment rate and cash from operations (CFO) are non-GAAP measures. Definitions and reconciliations are included in the Appendix.

print(md_json_list[1].keys())

dict_keys(['page', 'md', 'images', 'items'])

image_dicts = parser.get_images(md_json_objs, download_path="data_images")

Setup and Build Index


In this section we create a set of nodes from the slide deck, one per page, and attach the corresponding rendered image file path as metadata to each parsed page chunk.

We then build a simple summary index over the parsed deck. NOTE: We could do vector indexing too (see the sketch after the index-building code below), but here we want to produce comprehensive reports, which often requires access to the entire document.

Get Text Nodes


from llama_index.core.schema import TextNode
from typing import Optional

# get pages loaded through llamaparse

import re


def get_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Get image files sorted by page."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_page_number)
    return sorted_files
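For instance, these helpers can be sanity-checked against LlamaParse's job_id-page-N.jpg naming convention (the job id below is the one from the parse output above):

# quick sanity check of the page-number helper
print(get_page_number("412ac275-abe2-4585-be43-5680e7754740-page-10.jpg"))
# 10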

from copy import deepcopy


from pathlib import Path

# attach image metadata to the text nodes


def get_text_nodes(json_dicts, image_dir=None):
    """Split docs into nodes, by separator."""
    nodes = []

    image_files = (
        _get_sorted_image_files(image_dir) if image_dir is not None else None
    )
    md_texts = [d["md"] for d in json_dicts]

    for idx, md_text in enumerate(md_texts):
        chunk_metadata = {"page_num": idx + 1}
        if image_files is not None:
            image_file = image_files[idx]
            chunk_metadata["image_path"] = str(image_file)
        chunk_metadata["parsed_text_markdown"] = md_text
        node = TextNode(
            text="",
            metadata=chunk_metadata,
        )
        nodes.append(node)

    return nodes

# this will split into pages


text_nodes = get_text_nodes(md_json_list, image_dir="data_images")

print(text_nodes[10].get_content(metadata_mode="all"))

page_num: 11
image_path: data_images/412ac275-abe2-4585-be43-5680e7754740-page-10.jpg
parsed_text_markdown: # Commitment to Disciplined Reinvestment Rate

Disciplined Reinvestment Rate is the Foundation for Superior Returns on and of Capital, while Driving Durable CFO Growth

| Metric | Value |
|--------|-------|
| 10-Year Reinvestment Rate | ~50% |
| CFO CAGR 2024-2032 | ~6% |
| Mid-Cycle Planning Price | at $60/BBL WTI |

| Period | Industry Growth Focus | ConocoPhillips Strategy Reset | Reinvestment Rate |
|--------|------------------------|-------------------------------|-------------------|
| 2012-2016 | >100% Reinvestment Rate | - | ~$75/BBL WTI Average |
| 2017-2022 | - | <60% Reinvestment Rate | ~$63/BBL WTI Average |
| 2023E | - | - | at $80/BBL WTI |
| 2024-2028 | - | - | at $60/BBL WTI (with $80/BBL WTI option shown) |
| 2029-2032 | - | - | at $60/BBL WTI (with $80/BBL WTI option shown) |

*Chart shows ConocoPhillips Average Annual Reinvestment Rate (%) over time, with historic rates in grey and projected rates in blue.*

Reinvestment rate and cash from operations (CFO) are non-GAAP measures. Definitions and reconciliations are included in the Appendix.

import os
from llama_index.core import (
StorageContext,
SummaryIndex,
load_index_from_storage,
)

if not os.path.exists("storage_nodes_summary"):
    index = SummaryIndex(text_nodes)
    # save index to disk
    index.set_index_id("summary_index")
    index.storage_context.persist("./storage_nodes_summary")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(
        persist_dir="storage_nodes_summary"
    )
    # load index
    index = load_index_from_storage(storage_context, index_id="summary_index")

Build Query Engine


We now use LlamaIndex abstractions to build a structured query engine. In contrast to a standard RAG query engine, which just outputs plain text, here we define a structured output schema (ReportOutput) and attach it to the LLM. With this structured LLM, the query engine returns output that conforms to the schema.

from llama_index.llms.openai import OpenAI
from pydantic.v1 import BaseModel, Field
from typing import List
from IPython.display import display, Markdown, Image

class TextBlock(BaseModel):
    """Text block."""

    text: str = Field(..., description="The text for this block.")


class ImageBlock(BaseModel):
    """Image block."""

    file_path: str = Field(..., description="File path to the image.")


class ReportOutput(BaseModel):
    """Data model for a report.

    Can contain a mix of text and image blocks. MUST contain at least
    one image block.
    """

    blocks: List[TextBlock | ImageBlock] = Field(
        ..., description="A list of text and image blocks."
    )

    def render(self) -> None:
        """Render as HTML on the page."""
        for b in self.blocks:
            if isinstance(b, TextBlock):
                display(Markdown(b.text))
            else:
                display(Image(filename=b.file_path))
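To illustrate the data model, here is a toy report (the heading text is made up; the image path reuses the page-11 render from earlier) showing how blocks interleave and render:

# toy example of the schema; the image path is the page-11 render from above
report = ReportOutput(
    blocks=[
        TextBlock(text="## Reinvestment Rate Overview"),
        ImageBlock(
            file_path="data_images/412ac275-abe2-4585-be43-5680e7754740-page-10.jpg"
        ),
    ]
)
report.render()  # displays the markdown text, then the image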

system_prompt = """\
You are a report generation assistant tasked with producing a well-
formatted context given parsed context.

You will be given context from one or more reports that take the form
of parsed text.

You are responsible for producing a report with interleaving text and
images - in the format of interleaving text and "image" blocks.
Since you cannot directly produce an image, the image block takes in a
file path - you should write in the file path of the image instead.
How do you know which image to generate? Each context chunk will
contain metadata including an image render of the source chunk, given
as a file path.
Include ONLY the images from the chunks that have heavy visual
elements (you can get a hint of this if the parsed text contains a lot
of tables).
You MUST include at least one image block in the output.

You MUST output your response as a tool call in order to adhere to the
required output format. Do NOT give back normal text.

"""

llm = OpenAI(model="gpt-4o", system_prompt=system_prompt)


sllm = llm.as_structured_llm(output_cls=ReportOutput)

query_engine = index.as_query_engine(
similarity_top_k=10,
llm=sllm,
# response_mode="tree_summarize"
response_mode="compact",
)

response = query_engine.query(
    "Give me a summary of the financial performance of the Alaska/International segment vs. the lower 48 segment"
)

response.response.render()

<IPython.core.display.Markdown object>

<IPython.core.display.Markdown object>
response = query_engine.query(
    "Give me a summary of whether you think the financial projections are stable, and if not, what are the potential risk factors. "
    "Support your research with sources."
)

response.response.render()
<IPython.core.display.Markdown object>

<IPython.core.display.Markdown object>

<IPython.core.display.Markdown object>
<IPython.core.display.Markdown object>
