A simple demonstration of using Redis as a vector database to semantically cache LLM responses, reducing API cost and latency by reusing answers to similar prompts.
llm-semantic-cache/
├── .env # Environment variables (API keys)
├── .gitignore # Git ignore file
├── main.py # Main application code
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Before running this application, you need:
- Python 3.8 or higher
- An OpenAI API key
- Docker (for running Redis Stack)
- Clone this repository:
git clone https://github.com/yourusername/llm-semantic-cache.git
cd llm-semantic-cache
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up your environment variables:
Copy the .env.example file to .env and add your OpenAI API key:
cp .env.example .env
# Then edit .env and add your API key
- Start Redis Stack with Docker:
docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
This exposes Redis (with vector search) on port 6379 and the RedisInsight UI on port 8001; a quick way to verify the connection is shown below.
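If you want to confirm that Redis is reachable before running the app, a minimal check with redis-py looks like this (illustrative only, not part of the repository):

```python
import redis

# Connect to the Redis Stack container started above
r = redis.Redis(host="localhost", port=6379)

# ping() returns True when the server is reachable
print(r.ping())
```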
Run the main script:
python main.py
This will:
- Try to find a semantically similar cached answer for "What is the capital of France?"
- If not found, query the OpenAI API
- Store the result in the semantic cache (this flow is sketched below)
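As a rough sketch of that flow, here is what the check-then-call-then-store logic could look like using redisvl's SemanticCache. This is an assumption about the implementation, not the exact contents of main.py; the model name and parameter values are illustrative.

```python
# Illustrative sketch; the real main.py may differ.
from openai import OpenAI
from redisvl.extensions.llmcache import SemanticCache

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Semantic cache backed by the local Redis Stack container
cache = SemanticCache(
    name="llmcache",
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,  # lower = stricter semantic match
)

def ask(prompt: str) -> str:
    # 1. Look for a semantically similar cached prompt
    hits = cache.check(prompt=prompt)
    if hits:
        return hits[0]["response"]

    # 2. Cache miss: call the LLM
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    # 3. Store the pair so similar future prompts hit the cache
    cache.store(prompt=prompt, response=answer)
    return answer

print(ask("What is the capital of France?"))
```

Because the lookup compares embeddings rather than exact strings, a later prompt such as "What's France's capital city?" can be served from the cache without another API call.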
Traditional caching systems rely on exact key matches. Semantic caching instead:
- Converts user prompts to vector embeddings
- Checks if similar questions (by vector distance) were already asked (illustrated below)
- Returns cached responses for semantically similar questions
- Only calls the LLM API when truly novel questions are asked
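To make the "vector distance" check concrete, here is a self-contained toy example using cosine distance. The three-dimensional vectors are made up purely for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for two phrasings of the same question (made up for illustration)
cached_prompt = np.array([0.90, 0.10, 0.20])
new_prompt    = np.array([0.85, 0.15, 0.25])

DISTANCE_THRESHOLD = 0.1  # plays the same role as distance_threshold in main.py

if cosine_distance(cached_prompt, new_prompt) <= DISTANCE_THRESHOLD:
    print("Cache hit: return the stored response")
else:
    print("Cache miss: call the LLM and cache the new answer")
```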
Benefits:
- Reduced API costs
- Lower latency for repeated or similar queries
- Consistent responses for similar questions
You can modify the following parameters in main.py:
- distance_threshold: Lower values require closer semantic matches (see the example below)
- ttl: How long cached entries remain valid (in seconds)
- Change the embedding model or LLM model as needed
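For instance, if main.py builds on redisvl's SemanticCache (an assumption; adjust the names to whatever the code actually uses), a stricter match with a one-hour expiry could be configured like this:

```python
from redisvl.extensions.llmcache import SemanticCache

cache = SemanticCache(
    name="llmcache",
    redis_url="redis://localhost:6379",
    distance_threshold=0.05,  # stricter: only very close prompts count as cache hits
    ttl=3600,                 # cached entries expire after one hour
)
```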
MIT