Retrieval-augmented generation (RAG) pipeline for climate literature using open models (Llama 3), local embeddings stored in a FAISS index, and semantic HTML chunking. Built as part of my BA thesis.
-
RAG.ipynb and onlyLLM.ipynb are the two main workflows: the LLM with RAG and the LLM without RAG
-
graphRAG.dot and graphLLM.dot contain the Graphviz DOT source for the two workflow diagrams included in the BA thesis
-
requirements.txt lists all libraries used in this code; install them with pip install -r requirements.txt
-
goldstandard_answers.md contains the hand-written reference answers for testing the LLM with and without RAG
-
simplified_example.txt contains an example of what is passed to the LLM in the RAG workflow (retrieved context chunks, pre-set prompt, user prompt)
-
faiss_index contains the generated FAISS index
-
html contains all individual chapter files downloaded from #semanticClimate (in XHTML)
-
txt contains the tagged and filtered paragraphs extracted from the XHTML files
-
eval contains the logs for both the LLM-only and the RAG runs
-
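
As a rough illustration of the structure described for simplified_example.txt (retrieved context chunks, pre-set prompt, user prompt), a prompt could be assembled along the lines below. This is a minimal sketch, not the code from RAG.ipynb: the function name build_prompt, the instruction wording, the chunk numbering, and the sample strings are all hypothetical.

```python
# Hypothetical sketch: names, wording, and layout are assumptions,
# not taken from RAG.ipynb or simplified_example.txt.

PRESET_PROMPT = "Answer the question using only the context above."  # assumed wording


def build_prompt(context_chunks, user_prompt, preset_prompt=PRESET_PROMPT):
    """Join retrieved chunks, the pre-set prompt, and the user prompt into one string."""
    # Number each chunk so the chunks stay distinguishable in the prompt and in logs.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    # Order follows the README description: context, pre-set prompt, user prompt.
    return f"Context:\n{context}\n\n{preset_prompt}\n\nQuestion: {user_prompt.strip()}"


if __name__ == "__main__":
    # Placeholder chunks and question, for illustration only.
    chunks = ["First retrieved paragraph.", "Second retrieved paragraph."]
    print(build_prompt(chunks, "What does the chapter say about sea level rise?"))
```

In the LLM-only workflow the context part would simply be absent, which is what makes the two logs in eval comparable.
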
This code is licensed for noncommercial use only.
You may use, share, and adapt it freely for research, academic, or personal projects.
Commercial use is strictly prohibited without written permission.
See LICENSE for full terms.