
PromptCaching_RAG_Workflow

October 27, 2024

1 Can OpenAI’s new prompt caching feature boost the effectiveness of RAG applications?
• Prompt caching is applied automatically by OpenAI: it detects when a prompt shares a prefix with one recently sent to the API and reuses the cached prefix instead of reprocessing it from scratch.
• The benefits of using prompt caching include “reducing latency (up to 80%) and lower costs (up to 50%) for longer prompts”.
• Constraints: for caching to apply, prompts must be at least 1,024 tokens long and should place static content (such as instructions and examples) at the beginning, with variable content at the end, so that cache hits stay consistent. Cached prefixes typically remain active for 5 to 10 minutes of inactivity and can persist for up to one hour during off-peak periods.
• A cache hit occurs when the system finds a matching prompt prefix, so the cached tokens are reused. In contrast, a cache miss happens when no matching prefix is found and the prompt has to be processed from scratch.
When there is no cache hit (either because it is your first call or because no matching prefix was found), the number of cached tokens is 0.
You can find this value in the completion response object returned by the API.
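As a quick reference, here is a minimal sketch of where that value appears in the response (assuming an OpenAI API key is configured; the placeholder prompt is mine, not part of the notebook):

[ ]: from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# placeholder: any prompt of at least 1024 tokens, with static content first
prompt = "<long static instructions and examples> ... <variable content at the end>"

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# 0 on a cache miss (e.g., the first call); > 0 when a matching prefix was reused
print(completion.usage.prompt_tokens_details.cached_tokens)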
Can prompt caching work in RAG apps? For the key takeaways, have a look at the end of the notebook.

To explore prompt caching in a RAG workflow:
• I analyzed Amazon’s 10-K report using the LlamaIndex framework.
• A simple reader/parser was used to extract data from the financial report.
• Instead of using the query engine built on top of the vector store, I directly accessed the template prompts generated by LlamaIndex.
• I filled this template with the retrieved context first and the user query at the end.
• The calls used OpenAI’s GPT-4o-mini.
• I gathered the final answers, the cached token counts, and the total number of tokens sent in each prompt.

Discover the whole process below.


Hanane DUPOUY

[ ]: !pip install llama-index llama-index-core openai llama_index.embeddings.huggingface -q

[3]: import nest_asyncio


nest_asyncio.apply()

from google.colab import userdata


OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
LLAMAPARSE_API_KEY = userdata.get('LLAMACLOUD_API_KEY')

[ ]: !wget "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf" -O amzn_2023_10k.pdf

2 VectorStore with embedding and SimpleDirectoryReader


[ ]: # from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import nest_asyncio
nest_asyncio.apply()

pdf_name = "amzn_2023_10k.pdf"
# use SimpleDirectoryReader to parse our file
documents = SimpleDirectoryReader(input_files=[pdf_name]).load_data()

# https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d
embed_model = "local:BAAI/bge-small-en-v1.5"

vector_index_std = VectorStoreIndex(documents, embed_model=embed_model)
# relies on the default chunking (chunk_size = 1024 tokens)

2.1 Compute tokens in the retrieved documents


[6]: import tiktoken

[ ]: encoding = tiktoken.encoding_for_model("gpt-4o-mini")
print(encoding)

idx_keys = vector_index_std.storage_context.vector_stores['default'].data.embedding_dict.keys()

for key in idx_keys:
    text = vector_index_std.docstore.get_node(key).get_text()
    tokens_integer = encoding.encode(text)
    print(len(tokens_integer))

[ ]: # To modify the chunking size:
# from llama_index.core import Settings
# from llama_index.core.node_parser import SentenceSplitter

# Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

2.2 LlamaIndex Prompt Templates

To understand how the templates are built in LlamaIndex, you can use the approach below (or go directly to the documentation).
I created a query engine built on top of the vector store index:
[ ]: from llama_index.llms.openai import OpenAI
llm_gpt4o_mini = OpenAI(model="gpt-4o-mini", api_key = OPENAI_API_KEY)
query_engine_gpt4o_mini = vector_index_std.as_query_engine(similarity_top_k=3, llm=llm_gpt4o_mini)

[ ]: # Calling the LLM here isn't necessary, but I’m doing it to verify the reliability of the chunking.

query1 = "What was the net income in 2023?"


response = query_engine_gpt4o_mini.query(query1)
print(str(response))

The net income for 2023 is not explicitly provided in the context information.
However, the income (loss) before income taxes for 2023 is reported as $37,557
million, and the provision for income taxes is $7,120 million. To determine the
net income, one would typically subtract the provision for income taxes from the
income before income taxes. Therefore, the net income for 2023 can be calculated
as follows:

Net Income = Income (loss) before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million = $30,437 million.

Thus, the net income for 2023 is approximately $30,437 million.


From the query engine, you can get the prompt templates used by LlamaIndex for the QA answer and for the answer refinement. We’ll use only the text_qa_template:
[ ]: query_engine_gpt4o_mini.get_prompts().keys()

[ ]: dict_keys(['response_synthesizer:text_qa_template',
'response_synthesizer:refine_template'])

2.2.1 text_qa_template

[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template']

[ ]: SelectorPromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER:
'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={},
output_parser=None, template_var_mappings={}, function_mappings={},
default_template=PromptTemplate(metadata={'prompt_type':
<PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str',
'query_str'], kwargs={}, output_parser=None, template_var_mappings=None,
function_mappings=None, template='Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '), conditionals=[(<function is_chat_model at
0x7c522deb88b0>, ChatPromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM:
'custom'>}, template_vars=['context_str', 'query_str'], kwargs={},
output_parser=None, template_var_mappings=None, function_mappings=None,
message_templates=[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content="You
are an expert Q&A system that is trusted around the world.\nAlways answer the
query using the provided context information, and not prior knowledge.\nSome
rules to follow:\n1. Never directly reference the given context in your
answer.\n2. Avoid statements like 'Based on the context, …' or 'The context
information …' or anything along those lines.", additional_kwargs={}),
ChatMessage(role=<MessageRole.USER: 'user'>, content='Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: ', additional_kwargs={})]))])

[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template

[ ]: 'Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '

[ ]: prompt_llamaindex = query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template

prompt_llamaindex

[ ]: 'Context information is
below.\n---------------------\n{context_str}\n---------------------\nGiven the
context information and not prior knowledge, answer the query.\nQuery:
{query_str}\nAnswer: '

2.2.2 refine_template
I’m not using this template, but I’m showing it here in case you need it for your own project:
[ ]: query_engine_gpt4o_mini.get_prompts()['response_synthesizer:refine_template'].default_template.template

[ ]: "The original query is as follows: {query_str}\nWe have provided an existing
answer: {existing_answer}\nWe have the opportunity to refine the existing answer
(only if needed) with some more context
below.\n------------\n{context_msg}\n------------\nGiven the new context, refine
the original answer to better answer the query. If the context isn't useful,
return the original answer.\nRefined Answer: "

2.3 Vector store as retriever

Create a retriever from the vector store index, so we can retrieve the context related to our query and use it later in the process:
[ ]: retriever = vector_index_std.as_retriever(similarity_top_k=3)
query_str = "What was the net income in 2023?"
response = retriever.retrieve(query_str)
print(str(response))

[8]: for res in response:
    print(res.node.metadata)

{'page_label': '65', 'file_name': 'amzn_2023_10k.pdf', 'file_path':


'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
{'page_label': '28', 'file_name': 'amzn_2023_10k.pdf', 'file_path':
'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
{'page_label': '67', 'file_name': 'amzn_2023_10k.pdf', 'file_path':
'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598,
'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}
Collecting the page labels:
[ ]: page_labels = []
for res in response:
    if res.node.metadata != {}:
        print(res.node.metadata['page_label'])
        page_labels.append(res.node.metadata['page_label'])

[ ]: page_labels

[ ]: ['65', '28', '67']

Showing the context:


[ ]: context_str = ""
for resp in response:
    text = resp.node.get_text()
    print(text)
    context_str += text + " \n\n"

2.4 Simple call to the chat completion to see cached tokens


2.4.1 First query:

[ ]: query_str = "What was the net income in 2023?"

[ ]: prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """

[ ]: from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a financial analyst expert."},
        {"role": "user", "content": prompt_llamaindex}
    ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content='To calculate the net income for the year 2023, we


can start with the income (loss) before income taxes and adjust it by the
provision (benefit) for income taxes.\n\nFrom the provided information:\n-
Income (loss) before income taxes for 2023: $37,557 million\n- Provision
(benefit) for income taxes, net for 2023: $7,120 million\n\nThe formula for net
income is:\n\n\\[ \\text{Net Income} = \\text{Income (loss) before income taxes}
- \\text{Provision (benefit) for income taxes} \\]\n\nSubstituting in the
values:\n\n\\[ \\text{Net Income} = 37,557 - 7,120 \\]\n\nCalculating this
gives:\n\n\\[ \\text{Net Income} = 30,437 \\]\n\nThus, the net income for 2023
was **$30,437 million** or **$30.437 billion**.', refusal=None,
role='assistant', audio=None, function_call=None, tool_calls=None)

[ ]: print(completion.choices[0].message.content)

To calculate the net income for the year 2023, we can start with the income
(loss) before income taxes and adjust it by the provision (benefit) for income
taxes.

From the provided information:

- Income (loss) before income taxes for 2023: $37,557 million
- Provision (benefit) for income taxes, net for 2023: $7,120 million

The formula for net income is:

\[ \text{Net Income} = \text{Income (loss) before income taxes} -


\text{Provision (benefit) for income taxes} \]

Substituting in the values:

\[ \text{Net Income} = 37,557 - 7,120 \]

Calculating this gives:

\[ \text{Net Income} = 30,437 \]

Thus, the net income for 2023 was **$30,437 million** or **$30.437 billion**.

[ ]: completion.usage.prompt_tokens_details  # ==> cached_tokens = 0 ==> first call ==> normal

[ ]: PromptTokensDetails(audio_tokens=None, cached_tokens=0)

2.4.2 Second query:

[ ]: query_str = "What was the revenue in 2023?"


prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """

[ ]: completion2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a financial analyst expert."},
        {"role": "user", "content": prompt_llamaindex}
    ]
)

print(completion2.choices[0].message.content)

The provided context information does not include details about revenue for the
year 2023. Therefore, I cannot determine the revenue for that year based on the
information given. If additional data on revenue is available, it would be
necessary to review that information to provide an answer.

[ ]: completion2.usage.prompt_tokens_details  # ==> cached_tokens = 2688 ==> second call: 2688 prompt tokens were served from the cache.

[ ]: PromptTokensDetails(audio_tokens=None, cached_tokens=2688)

[ ]: completion2.usage.prompt_tokens

[ ]: 2851
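Of the 2,851 prompt tokens sent in this second call, 2,688 (about 94%) were read from the cache; only the remaining ~160 tokens at the end of the prompt had to be processed from scratch.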

3 All together: Caching Tokens


[11]: import tiktoken

[14]: MODEL = "gpt-4o-mini"


encoding = tiktoken.encoding_for_model(MODEL)
print(encoding)

from openai import OpenAI


client = OpenAI(api_key = OPENAI_API_KEY)

<Encoding 'o200k_base'>

[15]: def get_retrieved_context(query_str, retriever):
    # retrieve the top-k chunks and concatenate their text
    response = retriever.retrieve(query_str)
    context_str = ""
    for resp in response:
        text = resp.node.get_text()
        context_str += text + " \n\n"

    page_labels = []
    for res in response:
        if res.node.metadata != {}:
            page_labels.append(res.node.metadata['page_label'])
    return context_str, page_labels

def get_template(query_str, context_str):
    # same layout as LlamaIndex's text_qa_template: context first, query at the end
    prompt_llamaindex = f"""Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: """
    return prompt_llamaindex

def call_gpt_4o(prompt):
    completion = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a financial analyst expert."},
            {"role": "user", "content": prompt}
        ]
    )
    llm_answer = completion.choices[0].message.content
    cached_tokens_nbr = completion.usage.prompt_tokens_details.cached_tokens
    # prompt_input_nbr_tokens = completion.usage.prompt_tokens
    return llm_answer, cached_tokens_nbr

def compute_nb_tokens(text):
    tokens_integer = encoding.encode(text)
    return len(tokens_integer)

def get_final_answer(query_str, retriever):
    context_str, page_labels = get_retrieved_context(query_str, retriever)
    prompt_llamaindex = get_template(query_str, context_str)
    llm_answer, cached_tokens_nbr = call_gpt_4o(prompt_llamaindex)
    # You can also use completion.usage.prompt_tokens inside call_gpt_4o
    prompt_nbr_tokens = compute_nb_tokens(prompt_llamaindex)
    return llm_answer, cached_tokens_nbr, prompt_nbr_tokens, page_labels

With the SimpleDirectoryReader retriever

In the following, I’ll ask different questions to see whether prompt caching is triggered:

3.1 Query 1
This is the first call, so we’ll see 0 cached tokens:
[16]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
                "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To calculate the net income for the year 2023, we need to start with the income
before income taxes and subtract the provision (benefit) for income taxes.

From the provided information for the year ended December 31, 2023:

- Income before income taxes: $37,557 million


- Provision for income taxes: $7,120 million

Now, we can calculate the net income as follows:

Net Income = Income before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million
Net Income = $30,437 million

Therefore, the net income in 2023 was **$30,437 million**.

nbr_tokens in the prompt = 2840

cached_tokens = 0

page_labels = ['65', '28', '67']

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The context provided does not explicitly state the total revenue for the year
2023. To determine the revenue, we would typically look for specified financial
results in the company's income statement or performance reports, which is
missing in the provided information. Based on this context alone, I cannot
provide a specific figure for the revenue in 2023.

nbr_tokens in the prompt = 2599

cached_tokens = 0

page_labels = ['51', '67', '66']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating income in 2022?

response:
The operating income in 2022 was $12.2 billion.

nbr_tokens in the prompt = 2079

cached_tokens = 0

page_labels = ['25', '26', '28']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses in 2021 were not provided in the context information.
Only the operating expenses for the years 2022 and 2023 were included.
Therefore, based on the information available, we cannot determine the operating
expenses for 2021.

nbr_tokens in the prompt = 2121

cached_tokens = 0

page_labels = ['26', '55', '37']

--------------------------------------------------------------------------------
--------------------

3.2 Query 2
In this second call, cached tokens appear: because only the years were changed in the queries, (almost) the same context is retrieved, so the prompt prefixes match and cached tokens are used:

[17]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
                "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To calculate the net income for 2023, we need to consider the income (loss)
before income taxes and the provision for income taxes.

From the data provided:


- Income (loss) before income taxes in 2023: $37,557 million
- Provision (benefit) for income taxes in 2023: $7,120 million

Net income can be calculated as follows:

Net Income = Income (loss) before income taxes - Provision for income taxes
Net Income = $37,557 million - $7,120 million
Net Income = $30,437 million

Therefore, the net income in 2023 was **$30,437 million**.

nbr_tokens in the prompt = 2840

cached_tokens = 2688

page_labels = ['65', '28', '67']

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The provided context does not explicitly state the total revenue for the year
2023. However, it mentions that $12.4 billion of unearned revenue was recognized
as revenue during the year ended December 31, 2023. To determine the total
revenue for 2023, additional information regarding other revenue streams or
total revenue figures for the year would be needed, which is not included in the
provided context. Therefore, based solely on the information present, I cannot
provide the total revenue for 2023.

nbr_tokens in the prompt = 2599

cached_tokens = 2432

page_labels = ['51', '67', '66']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating income in 2022?

response:
The operating income in 2022 for each segment is as follows (in millions):

- North America: $(2,847)


- International: $(7,746)
- AWS: $22,841

The consolidated operating income for the entire company in 2022 was $12,248
million.

nbr_tokens in the prompt = 2079

cached_tokens = 1920

page_labels = ['25', '26', '28']

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses for the year ended December 31, 2021, can be derived from
the information provided. However, the specific breakdown of operating expenses
for 2021 is not included in the context you provided.

The operating expenses mentioned for 2022 and 2023 are as follows:

- For 2022, the total operating expenses are $501,735 million.


- For 2023, the total operating expenses are $537,933 million.

To answer your query accurately, we would need the specific operating expenses
for 2021, which are not part of the provided context.

If you have any additional information on the operating expenses for 2021 or
need further assistance, please let me know!

nbr_tokens in the prompt = 2121

cached_tokens = 1920

page_labels = ['26', '55', '37']

--------------------------------------------------------------------------------
--------------------

3.3 Query 3
In this query, I’m asking completely different questions from Query 1 and Query 2. It is the first time this context is retrieved, so the cached token count is 0:
[18]: queries_list = ["What are the total assets in 2022?", "What are the current liabilities in 2023?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What are the total assets in 2022?

response:
The total assets in 2022 are $462,675 million.

nbr_tokens in the prompt = 2634

cached_tokens = 0

page_labels = ['70', '40', '23']

--------------------------------------------------------------------------------
--------------------
query:
What are the current liabilities in 2023?

response:
The current liabilities in 2023 are $164,917 million.

nbr_tokens in the prompt = 2359

cached_tokens = 0

page_labels = ['67', '40', '66']

--------------------------------------------------------------------------------
--------------------

3.4 Query 4
Even though I modified only the years, the retrieved context here (see page_labels) does not come back in the same order as in the previous call (Query 3). The prompt prefix therefore differs, so no tokens are cached.
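To see why the ordering matters, you can compare the shared token prefix of two prompts: OpenAI only reuses a cached prefix, so as soon as the first retrieved chunk differs (or the chunks come back in a different order), everything from that point on is a cache miss. Below is a minimal sketch, reusing `encoding`, `retriever`, `get_retrieved_context`, and `get_template` defined above; the helper `common_prefix_len` is mine, not part of the original notebook, and it ignores the short system message, which is identical across calls:

[ ]: def common_prefix_len(prompt_a, prompt_b):
    """Number of leading tokens shared by two prompts (an upper bound on cacheable tokens)."""
    tokens_a = encoding.encode(prompt_a)
    tokens_b = encoding.encode(prompt_b)
    n = 0
    for ta, tb in zip(tokens_a, tokens_b):
        if ta != tb:
            break
        n += 1
    return n

# Query 3 vs. Query 4: only the year changes, but the retrieved chunks differ / are reordered
ctx_a, _ = get_retrieved_context("What are the total assets in 2022?", retriever)
ctx_b, _ = get_retrieved_context("What are the total assets in 2023?", retriever)
prompt_a = get_template("What are the total assets in 2022?", ctx_a)
prompt_b = get_template("What are the total assets in 2023?", ctx_b)

# If this stays below the 1024-token minimum, no tokens will be served from the cache.
print(common_prefix_len(prompt_a, prompt_b))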

[19]: queries_list = ["What are the total assets in 2023?", "What are the current liabilities in 2022?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What are the total assets in 2023?

response:
The total assets in 2023 amount to $527,854 million.

nbr_tokens in the prompt = 2159

cached_tokens = 0

page_labels = ['70', '66', '40']

--------------------------------------------------------------------------------
--------------------
query:
What are the current liabilities in 2022?

response:
The current liabilities for the year ended December 31, 2022, were $155,393
million.

nbr_tokens in the prompt = 2320

cached_tokens = 0

page_labels = ['67', '63', '40']

--------------------------------------------------------------------------------
--------------------

[ ]: queries_list = ["What was the net income in 2023?", "What was the revenue in 2023?",
               "What are the operating income in 2022?", "What are the operating expenses in 2021?"]

for query in queries_list:
    resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query, retriever)

    print(f"query:\n{query}" + "\n\n")
    print(f"response:\n{resp}" + "\n\n")
    print(f"nbr_tokens in the prompt = {prompt_nbr_tokens}" + "\n")
    print(f"cached_tokens = {cached_tokens}" + "\n")
    print(f"page_labels = {page_labels}" + "\n")
    print("--" * 50)

query:
What was the net income in 2023?

response:
To determine the net income for the year 2023, we can use the provided income
before income taxes and the provision (benefit) for income taxes.

Given:

- Income before income taxes for 2023: $37,557 million


- Provision for income taxes for 2023: $7,120 million

Net income is calculated as follows:

Net Income = Income before income taxes - Provision for income taxes

Thus:

Net Income = $37,557 million - $7,120 million = $30,437 million

Therefore, the net income in 2023 was $30,437 million.

cached_tokens = 2688

nbr_tokens in the prompt = 2840

--------------------------------------------------------------------------------
--------------------
query:
What was the revenue in 2023?

response:
The context information provided does not specify the total revenue for 2023.
However, it does mention that $12.4 billion of unearned revenue was recognized
as revenue during the year ended December 31, 2023. To obtain the total revenue
for 2023, we would need additional information or financial statements detailing
the overall revenue figure for that year.

cached_tokens = 0

nbr_tokens in the prompt = 2599

--------------------------------------------------------------------------------
--------------------

query:
What are the operating income in 2022?

response:
The operating income in 2022 was $12.2 billion.

cached_tokens = 1920

nbr_tokens in the prompt = 2079

--------------------------------------------------------------------------------
--------------------
query:
What are the operating expenses in 2021?

response:
The operating expenses for the year ended December 31, 2021, are not explicitly
listed in the provided information. However, a breakdown of the total operating
expenses and specific categories for the years 2022 and 2023 are provided. To
answer the query, we would need the data for 2021, which is not included in the
context. Therefore, we cannot provide the operating expenses for 2021 based on
the information available.

cached_tokens = 1920

nbr_tokens in the prompt = 2121

--------------------------------------------------------------------------------
--------------------

4 Key Takeaways
Can prompt caching work in RAG apps?
• It depends on the prompt. If it begins with lengthy, static instructions or examples (e.g., few-shot examples), caching can be effective (see the sketch below).
• For prompts with brief instructions followed by dynamic retrieved context and user-specific queries, caching is unlikely to help, because the context changes with every query (the typical case for small RAG apps that are not widely used).
• Caching can still work if users repeatedly ask similar questions that pull up the same context.
• For shared RAG systems, especially within organizations, caching frequent queries can help reduce latency and costs.
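To illustrate the first point, here is a minimal sketch of a more cache-friendly layout (not taken from the notebook above; the instruction text is a made-up placeholder): the long, static part of the prompt is placed first so it can be reused across calls, while the retrieved context and the user query stay at the end.

[ ]: # Hypothetical sketch: static, reusable prefix first; variable RAG content last.
STATIC_INSTRUCTIONS = (
    "You are a financial analyst expert.\n"
    "Answer only from the provided context.\n"
    # ... imagine several hundred more tokens of rules and few-shot examples here,
    # identical for every call, so the prefix can be cached once the prompt
    # exceeds the 1024-token minimum ...
)

def build_messages(context_str, query_str):
    return [
        # static prefix: identical across requests ==> eligible for caching
        {"role": "system", "content": STATIC_INSTRUCTIONS},
        # variable suffix: retrieved context and user query change per request
        {"role": "user", "content": f"Context:\n{context_str}\n\nQuery: {query_str}\nAnswer: "},
    ]

completion = client.chat.completions.create(model=MODEL, messages=build_messages(context_str, query_str))
print(completion.usage.prompt_tokens_details.cached_tokens)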
