This repository contains code and resources for evaluating the performance of language models in a Retrieval-Augmented Generation (RAG) setting, specifically focusing on question answering with context and hallucination detection. The project encompasses prompt augmentation, model inference, and comprehensive statistical evaluation.
The repository is structured as follows:
- `augmented_prompt.py`: Generates augmented prompts by enriching questions with relevant context retrieved from a ChromaDB instance. It leverages sentence embeddings for semantic search (a sketch of this flow follows this list).
- `results_4o-mini.py`: Executes inference using the OpenAI API with the `gpt-4o-mini-2024-07-18` model. It processes the augmented prompts and saves the model's responses. Includes rate limiting to avoid API errors.
- `results_slim_raft.py`: Performs inference with a locally hosted `TeenyTinyLlama-160m-CEP-ft` model, using the Hugging Face `transformers` library.
- `evaluate.ipynb`: A Jupyter Notebook for in-depth evaluation of model outputs. It calculates key metrics such as quality, agreement, accuracy, and hallucination rate; statistical significance is assessed with Kruskal-Wallis and Dunn's post-hoc tests.
- `evaluate_simple.ipynb`: A streamlined Jupyter Notebook for extracting evaluation scores and justifications directly from model-generated JSON outputs.
- `Amostra100AvalCEP.txt`: Contains the original set of prompts and baseline answers used for prompt augmentation.
- `augmented_prompt.csv`: Stores the augmented prompts generated by `augmented_prompt.py`, combining the original prompts with retrieved context.
- `final_evaluation.csv`: Stores the final evaluation results, including all calculated metrics.
- `results/`: A directory for the output CSV files from the model inference scripts.
- `chroma_db/`: (Potentially) a directory containing the persistent ChromaDB database files, depending on your ChromaDB configuration.
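For orientation, the retrieval-and-augmentation flow in `augmented_prompt.py` roughly follows the pattern below. This is a minimal sketch, assuming the `cep` collection and the `paraphrase-multilingual-MiniLM-L12-v2` embedding model described in the setup steps further down; the function name, prompt layout, and `chroma_db` path are illustrative, not the script's exact code.

```python
# Minimal sketch of retrieval + prompt augmentation (illustrative names).
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("cep")

def augment(question: str, n_results: int = 3) -> str:
    # Embed the question and retrieve the most similar chunks from ChromaDB.
    query_embedding = embedder.encode(question).tolist()
    hits = collection.query(query_embeddings=[query_embedding], n_results=n_results)
    context = "\n".join(hits["documents"][0])
    # Prepend the retrieved context to the original question.
    return f"Contexto:\n{context}\n\nPergunta: {question}"
```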
To set up the project:

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd webist_cep
  ```
- Create and activate a virtual environment (recommended):

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate   # Linux/macOS
  .venv\Scripts\activate      # Windows
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure environment variables:

  Create a `.env` file in the project root with the following content:

  ```
  OPENAI_API_KEY=YOUR_OPENAI_API_KEY
  ```

  Replace `YOUR_OPENAI_API_KEY` with your actual OpenAI API key and ensure the scripts correctly reference this `.env` file.
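  The scripts are expected to read this key from the environment; a minimal sketch of how it can be loaded with `python-dotenv` and the `openai` client (both assumed to be listed in `requirements.txt`):

  ```python
  # Minimal sketch: load OPENAI_API_KEY from .env and create an OpenAI client.
  import os
  from dotenv import load_dotenv
  from openai import OpenAI

  load_dotenv()  # reads .env from the project root / current working directory
  client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
  ```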
- Download the `TeenyTinyLlama-160m-CEP-ft` model (if using):

  If you plan to use `results_slim_raft.py`, download the model from Hugging Face and update the `model_path` variable in the script:

  ```bash
  huggingface-cli download vinidiol/TeenyTinyLlama-160m-CEP-ft --local-dir /path/to/your/model/directory --local-dir-use-symlinks False
  ```
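  Once downloaded, the model can be loaded from the local directory with `transformers`. The sketch below is only illustrative; the example prompt and generation parameters are not the exact ones used in `results_slim_raft.py`:

  ```python
  # Minimal sketch: load the locally downloaded model and generate a completion.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_path = "/path/to/your/model/directory"  # same directory used above
  tokenizer = AutoTokenizer.from_pretrained(model_path)
  model = AutoModelForCausalLM.from_pretrained(model_path)

  inputs = tokenizer("Exemplo de pergunta", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=128)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```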
- ChromaDB Setup:

  - Ensure ChromaDB is installed and running. The simplest way is to run it in-process.
  - The `augmented_prompt.py` script connects to a ChromaDB collection named `cep`. Make sure this collection exists and is populated with relevant data: text chunks suitable for retrieval given the prompts in `Amostra100AvalCEP.txt`. The embedding model used to index the data in ChromaDB must be the same as the one used in `augmented_prompt.py` (currently `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`). A sketch of how the collection might be populated is shown after this list.
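As a starting point, the `cep` collection can be populated in-process before running `augmented_prompt.py`. This is a minimal sketch under the assumptions above; the chunk contents, IDs, and the `chroma_db` path are illustrative:

```python
# Minimal sketch: embed text chunks and add them to the "cep" collection.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("cep")

chunks = ["...text chunk 1...", "...text chunk 2..."]  # your pre-chunked source documents
embeddings = embedder.encode(chunks).tolist()
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```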
To run the pipeline:

- Generate Augmented Prompts:

  Run `augmented_prompt.py` to create the augmented prompts and save them to `augmented_prompt.csv`:

  ```bash
  python augmented_prompt.py
  ```
- Run Model Inference:

  - gpt-4o-mini: Execute `results_4o-mini.py` to generate responses using the OpenAI API. This script includes basic rate limiting; a sketch of the pattern appears at the end of this step.

    ```bash
    python results_4o-mini.py
    ```
  - TeenyTinyLlama-160m-CEP-ft: Run `results_slim_raft.py` for local inference with the `TeenyTinyLlama` model:

    ```bash
    python results_slim_raft.py
    ```
  The generated results will be saved as CSV files in the `results/` directory.
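  The rate limiting in `results_4o-mini.py` amounts to pausing between API requests. A minimal sketch of the overall loop is shown below; the column selection, delay, and output file name are assumptions, not the script's exact behaviour:

  ```python
  # Minimal sketch: query gpt-4o-mini for each augmented prompt, pausing
  # between requests as basic rate limiting (delay value is illustrative).
  import time
  import pandas as pd
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment
  prompts = pd.read_csv("augmented_prompt.csv")

  responses = []
  for prompt in prompts.iloc[:, 0]:  # assumes prompts are in the first column
      completion = client.chat.completions.create(
          model="gpt-4o-mini-2024-07-18",
          messages=[{"role": "user", "content": prompt}],
      )
      responses.append(completion.choices[0].message.content)
      time.sleep(1)  # simple fixed pause between requests

  pd.DataFrame({"response": responses}).to_csv("results/results_4o-mini.csv", index=False)
  ```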
- Evaluate Model Performance:

  Open and run either `evaluate.ipynb` (for detailed analysis) or `evaluate_simple.ipynb` (for a simplified view) to evaluate the model outputs (the statistical tests in `evaluate.ipynb` are sketched after these steps):

  ```bash
  jupyter notebook evaluate.ipynb
  ```

  or

  ```bash
  jupyter notebook evaluate_simple.ipynb
  ```
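The Kruskal-Wallis and Dunn's post-hoc comparison in `evaluate.ipynb` follows the standard SciPy pattern, here paired with `scikit-posthocs` for Dunn's test. The sketch below uses illustrative column names (`model`, `score`) rather than the notebook's actual variables:

```python
# Minimal sketch: Kruskal-Wallis across model groups, then Dunn's post-hoc test.
import pandas as pd
from scipy.stats import kruskal
import scikit_posthocs as sp

df = pd.read_csv("final_evaluation.csv")  # assumes "model" and "score" columns
groups = [g["score"].values for _, g in df.groupby("model")]

h_stat, p_value = kruskal(*groups)
print(f"Kruskal-Wallis H={h_stat:.3f}, p={p_value:.4f}")

if p_value < 0.05:
    dunn = sp.posthoc_dunn(df, val_col="score", group_col="model", p_adjust="bonferroni")
    print(dunn)
```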
In summary, the key files are:

- `augmented_prompt.py`: Generates augmented prompts by combining questions with context from ChromaDB.
- `results_4o-mini.py`: Performs inference with OpenAI's `gpt-4o-mini-2024-07-18` model.
- `results_slim_raft.py`: Runs inference locally with the `TeenyTinyLlama-160m-CEP-ft` model.
- `evaluate.ipynb`: Provides a comprehensive evaluation of model performance, including statistical analysis.
- `evaluate_simple.ipynb`: Offers a simplified evaluation approach focused on JSON parsing.
- `Amostra100AvalCEP.txt`: Contains the original prompts and baselines.
- `augmented_prompt.csv`: Stores the generated augmented prompts.
- `final_evaluation.csv`: Stores the final evaluation metrics.
- `requirements.txt`: Lists the Python package dependencies.
Contributions are welcome! Please submit pull requests with detailed descriptions of your changes. Consider adding unit tests for new functionality.
This project is licensed under the MIT License.