I hacked this project together in a few hours. It's a simple retrieval-augmented generation (RAG) system that uses Weaviate and OpenAI to answer questions about the recently released JFK assassination files.
This project uses the JFK Files dataset originally compiled by Amjad Masad.
- Python 3.8+
- Weaviate Cloud account
- OpenAI API key
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/jfk-files-rag.git
  cd jfk-files-rag
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your environment variables:
  - Copy `.env.example` to `.env`
  - Fill in your Weaviate and OpenAI credentials:

    ```
    WEAVIATE_URL="your_weaviate_url"
    WEAVIATE_API_KEY="your_weaviate_api_key"
    WEAVIATE_COLLECTION_NAME="JFKFiles"
    OPENAI_API_KEY="your_openai_api_key"
    OPENAI_MODEL="gpt-4.5-preview"  # or any other OpenAI model; 4.5 was very impressive for this
    ```
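A missing or blank credential usually shows up later as an unhelpful connection or authentication error. A quick check like the one below can catch that early. It is a minimal sketch: the variable names come from the example above, but the function name `missing_env_vars` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment (for example by the python-dotenv package, which is an assumption about how this project reads the file).

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example. Whether main.py needs all of
# them is an assumption; adjust the tuple to match the actual code.
REQUIRED_VARS = (
    "WEAVIATE_URL",
    "WEAVIATE_API_KEY",
    "WEAVIATE_COLLECTION_NAME",
    "OPENAI_API_KEY",
    "OPENAI_MODEL",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or blank (hypothetical helper)."""
    if env is None:
        # Only sees .env values if something has already loaded them into os.environ.
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` before connecting to Weaviate or OpenAI turns a vague failure into an explicit list of the settings you still need to fill in.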
To get your Weaviate credentials:

- Create a Weaviate Cloud account
- Create a new cluster (a sandbox cluster is fine)
- Get your cluster URL and API key
- Add them to your `.env` file
To get your OpenAI API key:

- Sign up for an OpenAI API account
- Generate an API key
- Add the key to your `.env` file
- Make sure you've activated your virtual environment and configured your `.env` file.
- Run the application:

  ```bash
  python main.py
  ```

- On first run, the application will:
  - Create a Weaviate collection
  - Process and chunk the JFK files
  - Index the chunks in the vector database
- After initialization, get your tinfoil hat on and start asking questions of your personal AI conspiracy theorist.
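The chunking step that runs on first launch can be pictured as a sliding window over words, using the default sizes from the configuration settings (200-word chunks with 50 words of overlap). The sketch below is an illustration under that assumption only: the function name `chunk_words` and the exact windowing rules (splitting on whitespace, stopping once a window reaches the end of the text) are hypothetical and not necessarily what `main.py` does.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size words; adjacent windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()  # whitespace split; punctuation stays attached to words
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 450-word document yields three chunks starting at words 0, 150, and 300. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.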
You can adjust the following settings in the .env file:
- `DELETE_COLLECTION`: Set to `1` to recreate the Weaviate collection on each run
- `CHUNK_SIZE_WORDS`: Size of text chunks in words (default: 200)
- `OVERLAP_SIZE_WORDS`: Overlap between adjacent chunks in words (default: 50)
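For reference, reading these optional settings with their documented defaults might look roughly like this. It is a sketch, not the project's actual code: the `Settings` class and `load_settings` function are hypothetical, and treating `DELETE_COLLECTION` as off unless it is exactly `1` is an assumption, since no default is stated for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    delete_collection: bool
    chunk_size_words: int
    overlap_size_words: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the optional tuning variables, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    settings = Settings(
        # Assumption: only the value "1" means "recreate the collection".
        delete_collection=env.get("DELETE_COLLECTION", "0").strip() == "1",
        chunk_size_words=int(env.get("CHUNK_SIZE_WORDS", "200")),
        overlap_size_words=int(env.get("OVERLAP_SIZE_WORDS", "50")),
    )
    # An overlap as large as the chunk size would stop the window from advancing.
    if not 0 <= settings.overlap_size_words < settings.chunk_size_words:
        raise ValueError("OVERLAP_SIZE_WORDS must be >= 0 and smaller than CHUNK_SIZE_WORDS")
    return settings
```

Validating the overlap up front is a design choice: a bad combination fails immediately with a clear message instead of surfacing later during indexing.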
This project is licensed under the MIT License - see the LICENSE file for details.