
PromptCaching_RAG_Workflow

October 27, 2024

1 Can OpenAI’s new prompt caching feature boost the effectiveness of RAG applications?
• Prompt caching is applied automatically by OpenAI: it detects when a prompt shares a prefix with one recently sent to the API and reuses the cached prefix instead of reprocessing it from scratch.
• The benefits of using prompt caching include “reducing latency (up to 80%) and lower costs (up to 50%) for longer prompts”.
• Constraints: for caching to apply, prompts must be at least 1,024 tokens long and should place static content (such as instructions and examples) at the beginning, with variable content at the end, so that cache hits stay consistent. Cached prefixes typically remain active for 5 to 10 minutes of inactivity and can persist for up to one hour during off-peak periods.
• A cache hit occurs when the system finds a matching prompt prefix, so the cached tokens are reused. In contrast, a cache miss happens when no matching prefix is found and the prompt has to be processed from scratch.
When there is no cache hit (either because it is your first call or because no matching prefix was found), the number of cached tokens is 0.
You can find this value in the completion response object returned by the API.
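As a quick reference, here is a minimal sketch of where that value appears in the response (assuming an OpenAI API key is configured; the placeholder prompt is mine, not part of the notebook):

[ ]: from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# placeholder: any prompt of at least 1024 tokens, with static content first
prompt = "<long static instructions and examples> ... <variable content at the end>"

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# 0 on a cache miss (e.g., the first call); > 0 when a matching prefix was reused
print(completion.usage.prompt_tokens_details.cached_tokens)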
Can prompt caching work in RAG apps? For the key takeaways, have a look at the end of the notebook.

To explore prompt caching in a RAG workflow:
• I analyzed Amazon’s 10-K report using the LlamaIndex framework.
• A simple reader/parser was used to extract data from the financial report.
• Instead of using the query engine built on top of the vector store, I directly accessed the template prompts generated by LlamaIndex.
• I filled this template with the retrieved context first and the user query at the end.
• The calls used OpenAI’s GPT-4o-mini.
• I gathered the final answers, the cached token counts, and the total number of tokens sent in each prompt.

Discover the whole process below.


Hanane DUPOUY

[ ]: !pip install llama-index llama-index-core openai llama_index.embeddings.huggingface -q

[3]: import nest_asyncio


nest_asyncio.apply()

from google.colab import userdata


OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
LLAMAPARSE_API_KEY = userdata.get('LLAMACLOUD_API_KEY')

[ ]: !wget "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf" -O amzn_2023_10k.pdf

2 VectorStore with embedding and SimpleDirectoryReader


[ ]: # from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import nest_asyncio
nest_asyncio.apply()

pdf_name = "amzn_2023_10k.pdf"
# use SimpleDirectoryReader to parse our file
documents = SimpleDirectoryReader(input_files=[pdf_name]).load_data()

# https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d
embed_model = "local:BAAI/bge-small-en-v1.5"

vector_index_std = VectorStoreIndex(documents, embed_model=embed_model)
# relies on the default chunking (chunk_size = 1024 tokens)

2.1 Compute tokens in the retrieved documents


[6]: import tiktoken

[ ]: encoding = tiktoken.encoding_for_model("gpt-4o-mini")
print(encoding)

idx_keys = vector_index_std.storage_context.vector_stores['default'].data.embedding_dict.keys()

for key in idx_keys:
    text = vector_index_std.docstore.get_node(key).get_text()
    tokens_integer = encoding.encode(text)
    print(len(tokens_integer))

[ ]: # To modify the chunking size:
# from llama_index.core import Settings
# from llama_index.core.node_parser import SentenceSplitter

# Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

2.2 LlamaIndex Prompt Templates

To understand how the templates are built in LlamaIndex, you can use the approach below (or go directly to the documentation).
I created a query engine built on top of the vector store index:
[ ]: from llama_index.llms.openai import OpenAI
llm_gpt4o_mini = OpenAI(model="gpt-4o-mini", api_key = OPENAI_API_KEY)
query_engine_gpt4o_mini = vector_index_std.as_query_engine(similarity_top_k=3, llm=llm_gpt4o_mini)

[ ]: # Calling the LLM here isn't necessary, but I’m doing it to verify the reliability of the chunking.

query1 = "What was the net income in 2023?"


response = query_engine_gpt4o_mini.query(query1)
print(str(response))

The net income for 2023 is not explicitly provided in the context information.
However, the income (loss) before income taxes for 2023 is reported as $37,557
million, and the provision for income taxes is $7,120 million. To determine the
net income, one would typically subtract the provision for income taxes from the
income before income taxes. Therefore, the net income for 2023 can be calculated
as follows:

Net Income = Income (loss) before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million = $30,437 million.

Thus, the net income for 2023 is approximately $30,437 million.


From the query engine, you can get the prompt templates used by LlamaIndex for the QA answer and for the answer refinement. We’ll use only the text_qa_template:
[ ]: query_engine_gpt4o_mini.get_prompts().keys()

[ ]: dict_keys(['response_synthesizer:text_qa_template',
'response_synthesizer:refine_template'])

2.2.1 text_qa_template

[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template']

[ ]: SelectorPromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER:
'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={},
output_parser=None, template_var_mappings={}, function_mappings={},
default_template=PromptTemplate(metadata={'prompt_type':
<PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str',
'query_str'], kwargs={}, output_parser=None, template_var_mappings=None,
function_mappings=None, template='Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '), conditionals=[(<function is_chat_model at
0x7c522deb88b0>, ChatPromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM:
'custom'>}, template_vars=['context_str', 'query_str'], kwargs={},
output_parser=None, template_var_mappings=None, function_mappings=None,
message_templates=[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content="You
are an expert Q&A system that is trusted around the world.\nAlways answer the
query using the provided context information, and not prior knowledge.\nSome
rules to follow:\n1. Never directly reference the given context in your
answer.\n2. Avoid statements like 'Based on the context, …' or 'The context
information …' or anything along those lines.", additional_kwargs={}),
ChatMessage(role=<MessageRole.USER: 'user'>, content='Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: ', additional_kwargs={})]))])

[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template

[ ]: 'Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '

[ ]: prompt_llamaindex = query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template

prompt_llamaindex

[ ]: 'Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '

2.2.2 refine_template
I’m not using this template, but I’m showing it here in case you need it for your own project:
[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:refine_template'].default_template.template

[ ]: "The original query is as follows: {query_str}\nWe have provided an existing
answer: {existing_answer}\nWe have the opportunity to refine the existing answer
(only if needed) with some more context
below.\n------------\n{context_msg}\n------------\nGiven the new context, refine
the original answer to better answer the query. If the context isn't useful,
return the original answer.\nRefined Answer: "

2.3 Vector store as retriever

Create a retriever from the vector store index, so we can retrieve the context related to our query and use it later in the process:
[ ]: retriever = vector_index_std.as_retriever(similarity_top_k=3)
query_str = "What was the net income in 2023?"
response = retriever.retrieve(query_str)
print(str(response))

[8]: for res in response:
    print(res.node.metadata)

{'page_label': '65', 'file_name': 'amzn_2023_10k.pdf', 'file_path':


'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
{'page_label': '28', 'file_name': 'amzn_2023_10k.pdf', 'file_path':
'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
{'page_label': '67', 'file_name': 'amzn_2023_10k.pdf', 'file_path':
'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
Collecting the page labels:
[ ]: page_labels = []
for res in response:
    if res.node.metadata != {}:
        print(res.node.metadata['page_label'])
        page_labels.append(res.node.metadata['page_label'])

[ ]: page_labels

[ ]: ['65', '28', '67']

Showing the context:


[ ]: context_str = ""
for resp in response:
    text = resp.node.get_text()
    print(text)
    context_str += text + " \n\n"

2.4 Simple call to the chat completion to see cached tokens


2.4.1 First query:

[ ]: query_str = "What was the net income in 2023?"

[ ]: prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """

[ ]: from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a financial analyst expert."},
        {"role": "user", "content": prompt_llamaindex}
    ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content='To calculate the net income for the year 2023, we


can start with the income (loss) before income taxes and adjust it by the
provision (benefit) for income taxes.\n\nFrom the provided information:\n-
Income (loss) before income taxes for 2023: $37,557 million\n- Provision
(benefit) for income taxes, net for 2023: $7,120 million\n\nThe formula for net
income is:\n\n\\[ \\text{Net Income} = \\text{Income (loss) before income taxes}
- \\text{Provision (benefit) for income taxes} \\]\n\nSubstituting in the
values:\n\n\\[ \\text{Net Income} = 37,557 - 7,120 \\]\n\nCalculating this
gives:\n\n\\[ \\text{Net Income} = 30,437 \\]\n\nThus, the net income for 2023
was **$30,437 million** or **$30.437 billion**.', refusal=None,
role='assistant', audio=None, function_call=None, tool_calls=None)

[ ]: print(completion.choices[0].message.content)

To calculate the net income for the year 2023, we can start with the income
(loss) before income taxes and adjust it by the provision (benefit) for income
taxes.

From the provided information:

- Income (loss) before income taxes for 2023: $37,557 million
- Provision (benefit) for income taxes, net for 2023: $7,120 million

The formula for net income is:

\[ \text{Net Income} = \text{Income (loss) before income taxes} -


\text{Provision (benefit) for income taxes} \]

Substituting in the values:

\[ \text{Net Income} = 37,557 - 7,120 \]

Calculating this gives:

\[ \text{Net Income} = 30,437 \]

Thus, the net income for 2023 was **$30,437 million** or **$30.437 billion**.

[ ]: completion.usage.prompt_tokens_details  # ==> cached_tokens = 0 ==> first call ==> normal

[ ]: PromptTokensDetails(audio_tokens=None, cached_tokens=0)

2.4.2 Second query:

[ ]: query_str = "What was the revenue in 2023?"


prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """

[ ]: completion2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a financial analyst expert."},
        {"role": "user", "content": prompt_llamaindex}
    ]
)

print(completion2.choices[0].message.content)

The provided context information does not include details about revenue for the
year 2023. Therefore, I cannot determine the revenue for that year based on the
information given. If additional data on revenue is available, it would be
necessary to review that information to provide an answer.

[ ]: completion2.usage.prompt_tokens_details  # ==> cached_tokens = 2688 ==> second call: 2688 prompt tokens were served from the cache.

[ ]: PromptTokensDetails(audio_tokens=None, cached_tokens=2688)

[ ]: completion2.usage.prompt_tokens

[ ]: 2851
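Of the 2,851 prompt tokens sent in this second call, 2,688 (about 94%) were read from the cache; only the remaining ~160 tokens at the end of the prompt had to be processed from scratch.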

3 All together: Caching Tokens


[11]: import tiktoken

[14]: MODEL = "gpt-4o-mini"


encoding = tiktoken.encoding_for_model(MODEL)
print(encoding)

from openai import OpenAI


client = OpenAI(api_key = OPENAI_API_KEY)

<Encoding 'o200k_base'>

[15]: def get_retrieved_context(query_str, retriever):
    # retrieve the top-k chunks and concatenate their text
    response = retriever.retrieve(query_str)
    context_str = ""
    for resp in response:
        text = resp.node.get_text()
        context_str += text + " \n\n"

    page_labels = []
    for res in response:
        if res.node.metadata != {}:
            page_labels.append(res.node.metadata['page_label'])
    return context_str, page_labels

def get_template(query_str, context_str):
    # same layout as LlamaIndex's text_qa_template: context first, query at the end
    prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """
    return prompt_llamaindex

def call_gpt_4o(prompt):
    completion = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a financial analyst expert."},
            {"role": "user", "content": prompt}
        ]
    )
    llm_answer = completion.choices[0].message.content
    cached_tokens_nbr = completion.usage.prompt_tokens_details.cached_tokens
    # prompt_input_nbr_tokens = completion.usage.prompt_tokens
    return llm_answer, cached_tokens_nbr

def compute_nb_tokens(text):
    tokens_integer = encoding.encode(text)
    return len(tokens_integer)

def get_final_answer(query_str, retriever):
    context_str, page_labels = get_retrieved_context(query_str, retriever)
    prompt_llamaindex = get_template(query_str, context_str)
    llm_answer, cached_tokens_nbr = call_gpt_4o(prompt_llamaindex)
    # You can also use completion.usage.prompt_tokens inside call_gpt_4o
    prompt_nbr_tokens = compute_nb_tokens(prompt_llamaindex)
    return llm_answer, cached_tokens_nbr, prompt_nbr_tokens, page_labels

With the SimpleDirectoryReader retriever

In the following, I’ll ask different questions to see whether prompt caching is triggered:

3.1 Query 1
This is the first call, so we’ll see 0 cached tokens:
[16]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
                "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To calculate the net income for the year 2023, we need to start with the income
before income taxes and subtract the provision (benefit) for income taxes.

From the provided information for the year ended December 31, 2023:

- Income before income taxes: $37,557 million


- Provision for income taxes: $7,120 million

Now, we can calculate the net income as follows:

Net Income = Income before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million
Net Income = $30,437 million

Therefore, the net income in 2023 was **$30,437 million**.

nbr_tokens in the prompt = 2840

cached_tokens = 0

page_labels = ['65', '28', '67']

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The context provided does not explicitly state the total revenue for the year
2023. To determine the revenue, we would typically look for specified financial
results in the company's income statement or performance reports, which is
missing in the provided information. Based on this context alone, I cannot
provide a specific figure for the revenue in 2023.

nbr_tokens in the prompt = 2599

cached_tokens = 0

page_labels = ['51', '67', '66']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating income in 2022?

response:
The operating income in 2022 was $12.2 billion.

nbr_tokens in the prompt = 2079

cached_tokens = 0

page_labels = ['25', '26', '28']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses in 2021 were not provided in the context information.
Only the operating expenses for the years 2022 and 2023 were included.
Therefore, based on the information available, we cannot determine the operating
expenses for 2021.

nbr_tokens in the prompt = 2121

cached_tokens = 0

page_labels = ['26', '55', '37']

--------------------------------------------------------------------------------
--------------------

3.2 Query 2
In this second call, cached tokens appear: because only the years were changed in the queries, (almost) the same context is retrieved, so the prompt prefixes match and cached tokens are used:

[17]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
                "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To calculate the net income for 2023, we need to consider the income (loss)
before income taxes and the provision for income taxes.

From the data provided:


- Income (loss) before income taxes in 2023: $37,557 million
- Provision (benefit) for income taxes in 2023: $7,120 million

Net income can be calculated as follows:

Net Income = Income (loss) before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million
Net Income = $30,437 million

Therefore, the net income in 2023 was **$30,437 million**.

nbr_tokens in the prompt = 2840

cached_tokens = 2688

page_labels = ['65', '28', '67']

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The provided context does not explicitly state the total revenue for the year
2023. However, it mentions that $12.4 billion of unearned revenue was recognized
as revenue during the year ended December 31, 2023. To determine the total
revenue for 2023, additional information regarding other revenue streams or
total revenue figures for the year would be needed, which is not included in the
provided context. Therefore, based solely on the information present, I cannot
provide the total revenue for 2023.

nbr_tokens in the prompt = 2599

cached_tokens = 2432

page_labels = ['51', '67', '66']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating income in 2022?

response:
The operating income in 2022 for each segment is as follows (in millions):

- North America: $(2,847)


- International: $(7,746)
- AWS: $22,841

The consolidated operating income for the entire company in 2022 was $12,248
million.

nbr_tokens in the prompt = 2079

cached_tokens = 1920

page_labels = ['25', '26', '28']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses for the year ended December 31, 2021, can be derived from
the information provided. However, the specific breakdown of operating expenses
for 2021 is not included in the context you provided.

The operating expenses mentioned for 2022 and 2023 are as follows:

- For 2022, the total operating expenses are $501,735 million.


- For 2023, the total operating expenses are $537,933 million.

To answer your query accurately, we would need the specific operating expenses
for 2021, which are not part of the provided context.

If you have any additional information on the operating expenses for 2021 or
need further assistance, please let me know!

nbr_tokens in the prompt = 2121

cached_tokens = 1920

page_labels = ['26', '55', '37']

--------------------------------------------------------------------------------
--------------------

3.3 Query 3
In this query, I’m asking completely different questions from Query 1 and Query 2. It is the first time this context is retrieved, so the cached token count is 0:
[18]: queries_list = ["What are the total assets in 2022?", "What are the current liabilities in 2023?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What are the total assets in 2022?

response:
The total assets in 2022 are $462,675 million.

nbr_tokens in the prompt = 2634

cached_tokens = 0

page_labels = ['70', '40', '23']

--------------------------------------------------------------------------------
--------------------
query:
What are the current liabilities in 2023?

response:
The current liabilities in 2023 are $164,917 million.

nbr_tokens in the prompt = 2359

cached_tokens = 0

page_labels = ['67', '40', '66']

--------------------------------------------------------------------------------
--------------------

3.4 Query 4
Even though I modified only the years, the retrieved context here (see page_labels) does not come back in the same order as in the previous call (Query 3). The prompt prefix therefore differs, so no tokens are cached.
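To see why the ordering matters, you can compare the shared token prefix of two prompts: OpenAI only reuses a cached prefix, so as soon as the first retrieved chunk differs (or the chunks come back in a different order), everything from that point on is a cache miss. Below is a minimal sketch, reusing `encoding`, `retriever`, `get_retrieved_context`, and `get_template` defined above; the helper `common_prefix_len` is mine, not part of the original notebook, and it ignores the short system message, which is identical across calls:

[ ]: def common_prefix_len(prompt_a, prompt_b):
    """Number of leading tokens shared by two prompts (an upper bound on cacheable tokens)."""
    tokens_a = encoding.encode(prompt_a)
    tokens_b = encoding.encode(prompt_b)
    n = 0
    for ta, tb in zip(tokens_a, tokens_b):
        if ta != tb:
            break
        n += 1
    return n

# Query 3 vs. Query 4: only the year changes, but the retrieved chunks differ / are reordered
ctx_a, _ = get_retrieved_context("What are the total assets in 2022?", retriever)
ctx_b, _ = get_retrieved_context("What are the total assets in 2023?", retriever)
prompt_a = get_template("What are the total assets in 2022?", ctx_a)
prompt_b = get_template("What are the total assets in 2023?", ctx_b)

# If this stays below the 1024-token minimum, no tokens will be served from the cache.
print(common_prefix_len(prompt_a, prompt_b))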

[19]: queries_list = ["What are the total assets in 2023?", "What are the current liabilities in 2022?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What are the total assets in 2023?

response:
The total assets in 2023 amount to $527,854 million.

nbr_tokens in the prompt = 2159

cached_tokens = 0

page_labels = ['70', '66', '40']

--------------------------------------------------------------------------------
--------------------
query:
What are the current liabilities in 2022?

response:
The current liabilities for the year ended December 31, 2022, were $155,393
million.

nbr_tokens in the prompt = 2320

cached_tokens = 0

page_labels = ['67', '63', '40']

--------------------------------------------------------------------------------
--------------------

[ ]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
               "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To determine the net income for the year 2023, we can use the provided income
before income taxes and the provision (benefit) for income taxes.

Given:

- Income before income taxes for 2023: $37,557 million


- Provision for income taxes for 2023: $7,120 million

Net income is calculated as follows:

Net Income = Income before income taxes - Provision for income taxes

Thus:

Net Income = $37,557 million - $7,120 million = $30,437 million

Therefore, the net income in 2023 was $30,437 million.

cached_tokens = 2688

nbr_tokens in the prompt = 2840

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The context information provided does not specify the total revenue for 2023.
However, it does mention that $12.4 billion of unearned revenue was recognized
as revenue during the year ended December 31, 2023. To obtain the total revenue
for 2023, we would need additional information or financial statements detailing
the overall revenue figure for that year.

cached_tokens = 0

nbr_tokens in the prompt = 2599

--------------------------------------------------------------------------------
--------------------

query:
What are the operating income in 2022?

response:
The operating income in 2022 was $12.2 billion.

cached_tokens = 1920

nbr_tokens in the prompt = 2079

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses for the year ended December 31, 2021, are not explicitly
listed in the provided information. However, a breakdown of the total operating
expenses and specific categories for the years 2022 and 2023 are provided. To
answer the query, we would need the data for 2021, which is not included in the
context. Therefore, we cannot provide the operating expenses for 2021 based on
the information available.

cached_tokens = 1920

nbr_tokens in the prompt = 2121

--------------------------------------------------------------------------------
--------------------

4 Key Takeaways
Can prompt caching work in RAG apps?
• It depends on the prompt. If it begins with lengthy, static instructions or examples (e.g., few-shot examples), caching can be effective (see the sketch below).
• For prompts with brief instructions followed by dynamic retrieved context and user-specific queries, caching is unlikely to help, because the context changes with every query (the typical case for small RAG apps that are not widely used).
• Caching can still work if users repeatedly ask similar questions that pull up the same context.
• For shared RAG systems, especially within organizations, caching frequent queries can help reduce latency and costs.
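To illustrate the first point, here is a minimal sketch of a more cache-friendly layout (not taken from the notebook above; the instruction text is a made-up placeholder): the long, static part of the prompt is placed first so it can be reused across calls, while the retrieved context and the user query stay at the end.

[ ]: # Hypothetical sketch: static, reusable prefix first; variable RAG content last.
STATIC_INSTRUCTIONS = (
    "You are a financial analyst expert.\n"
    "Answer only from the provided context.\n"
    # ... imagine several hundred more tokens of rules and few-shot examples here,
    # identical for every call, so the prefix can be cached once the prompt
    # exceeds the 1024-token minimum ...
)

def build_messages(context_str, query_str):
    return [
        # static prefix: identical across requests ==> eligible for caching
        {"role": "system", "content": STATIC_INSTRUCTIONS},
        # variable suffix: retrieved context and user query change per request
        {"role": "user", "content": f"Context:\n{context_str}\n\nQuery: {query_str}\nAnswer: "},
    ]

completion = client.chat.completions.create(model=MODEL, messages=build_messages(context_str, query_str))
print(completion.usage.prompt_tokens_details.cached_tokens)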
