webist_cep

This repository contains code and resources for evaluating the performance of language models in a Retrieval-Augmented Generation (RAG) setting, specifically focusing on question answering with context and hallucination detection. The project encompasses prompt augmentation, model inference, and comprehensive statistical evaluation.

Project Structure

The repository is structured as follows:

  • augmented_prompt.py: Generates augmented prompts by enriching questions with relevant context retrieved from a ChromaDB instance, using sentence embeddings for semantic search (see the retrieval sketch after this list).
  • results_4o-mini.py: Executes inference using the OpenAI API with the gpt-4o-mini-2024-07-18 model. It processes augmented prompts and saves the model's responses. Includes rate limiting to avoid API errors.
  • results_slim_raft.py: Performs inference with a locally hosted TeenyTinyLlama-160m-CEP-ft model, utilizing the Hugging Face transformers library.
  • evaluate.ipynb: A Jupyter Notebook designed for in-depth evaluation of model outputs. It calculates key metrics such as quality, agreement, accuracy, and hallucination rate. Statistical significance is assessed using Kruskal-Wallis and Dunn's post-hoc tests.
  • evaluate_simple.ipynb: A streamlined Jupyter Notebook for extracting evaluation scores and justifications directly from model-generated JSON outputs.
  • Amostra100AvalCEP.txt: Contains the original set of prompts and baseline answers used for prompt augmentation.
  • augmented_prompt.csv: Stores the augmented prompts generated by augmented_prompt.py, combining the original prompts with retrieved context.
  • final_evaluation.csv: Stores the final evaluation results, including all calculated metrics.
  • results/: A directory to store the output CSV files from the model inference scripts.
  • chroma_db/: Directory containing the persistent ChromaDB database files; whether it exists and where it lives depends on your ChromaDB configuration.
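
For orientation, below is a minimal sketch of the retrieval step that augmented_prompt.py performs. The function name, the number of retrieved chunks, the on-disk path, and the prompt template are illustrative assumptions; the script itself may differ in detail.

    # Sketch of semantic retrieval from ChromaDB (illustrative, not the script's exact code)
    import chromadb
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    client = chromadb.PersistentClient(path="chroma_db")   # assumed persistent on-disk location
    collection = client.get_collection("cep")              # collection name used by the script

    def augment(question: str, k: int = 3) -> str:
        """Embed the question, retrieve the k most similar chunks, and prepend them as context."""
        query_embedding = embedder.encode(question).tolist()
        hits = collection.query(query_embeddings=[query_embedding], n_results=k)
        context = "\n".join(hits["documents"][0])
        return f"Contexto:\n{context}\n\nPergunta: {question}"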

Setup and Installation

  1. Clone the repository:

    git clone <repository_url>
    cd webist_cep
  2. Create and activate a virtual environment (recommended):

    python3 -m venv .venv
    source .venv/bin/activate  # Linux/macOS
    .venv\Scripts\activate  # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables:

    Create a .env file in the project root with the following content:

    OPENAI_API_KEY=YOUR_OPENAI_API_KEY
    

    Replace YOUR_OPENAI_API_KEY with your actual OpenAI API key. Ensure the scripts correctly reference this .env file.
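
    The scripts are expected to read this key from the environment. A minimal sketch of the usual pattern, assuming python-dotenv is available (check the scripts for the exact mechanism they use):

    # Load the API key from .env (assumes python-dotenv; illustrative only)
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current working directory
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")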

  5. Download the TeenyTinyLlama-160m-CEP-ft model (if using):

    If you plan to use results_slim_raft.py, download the model from Hugging Face and update the model_path variable in the script.

    huggingface-cli download vinidiol/TeenyTinyLlama-160m-CEP-ft --local-dir /path/to/your/model/directory --local-dir-use-symlinks False
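
    To verify the download, you can load the model from the local directory with transformers. This is a minimal sketch (the path is whatever you passed to --local-dir; the prompt and generation settings are illustrative), not the inference logic of results_slim_raft.py:

    # Quick smoke test of the locally downloaded model (illustrative)
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "/path/to/your/model/directory"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    inputs = tokenizer("Pergunta de exemplo:", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
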
  6. ChromaDB Setup:

    • Ensure ChromaDB is installed. The simplest setup is to run it in-process with a persistent local client rather than as a separate server.
    • The augmented_prompt.py script connects to a ChromaDB collection named "cep". Make sure this collection exists and is populated with relevant data. The data should be text chunks suitable for retrieval given the prompts in Amostra100AvalCEP.txt. The embedding model used to embed the data in ChromaDB must be the same as the one used in augmented_prompt.py (currently sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).
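
    If you need to create and populate the collection yourself, the sketch below shows one way to do it with the same embedding model. The on-disk path, chunking, and IDs are assumptions; adapt them to your source data.

    # Populate the "cep" collection so augmented_prompt.py can query it (illustrative)
    import chromadb
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_or_create_collection("cep")

    chunks = ["First text chunk ...", "Second text chunk ..."]  # your retrieval corpus
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )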

Usage

  1. Generate Augmented Prompts:

    Run augmented_prompt.py to create augmented prompts and save them to augmented_prompt.csv.

    python augmented_prompt.py
  2. Run Model Inference:

    • gpt-4o-mini: Execute results_4o-mini.py to generate responses using the OpenAI API. This script includes basic rate limiting (see the sketch after this step).

      python results_4o-mini.py
    • TeenyTinyLlama-160m-CEP-ft: Run results_slim_raft.py for local inference with the TeenyTinyLlama model.

      python results_slim_raft.py

    The generated results will be saved as CSV files in the results/ directory.
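
    For orientation, a minimal sketch of the request pattern described for results_4o-mini.py follows. The helper name, message layout, and sleep interval are assumptions, not the script's exact code.

    # Illustrative OpenAI call with naive rate limiting (not the script's exact code)
    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def answer(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=[{"role": "user", "content": prompt}],
        )
        time.sleep(1)  # simple pause between requests to stay under rate limits
        return response.choices[0].message.content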

  3. Evaluate Model Performance:

    Open and run either evaluate.ipynb (for detailed analysis) or evaluate_simple.ipynb (for a simplified view) to evaluate the model outputs.

    jupyter notebook evaluate.ipynb

    or

    jupyter notebook evaluate_simple.ipynb
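
    For reference, here is a minimal sketch of the statistical comparison that evaluate.ipynb performs (Kruskal-Wallis followed by Dunn's post-hoc test), using scipy and scikit-posthocs. The column names "model" and "score" are assumptions; the notebook defines its own metrics and layout.

    # Illustrative Kruskal-Wallis + Dunn's post-hoc over the evaluation table
    import pandas as pd
    import scikit_posthocs as sp
    from scipy.stats import kruskal

    df = pd.read_csv("final_evaluation.csv")
    groups = [g["score"].values for _, g in df.groupby("model")]

    stat, p_value = kruskal(*groups)
    print(f"Kruskal-Wallis H = {stat:.3f}, p = {p_value:.4f}")

    # Pairwise Dunn's test with Bonferroni correction
    dunn = sp.posthoc_dunn(df, val_col="score", group_col="model", p_adjust="bonferroni")
    print(dunn)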

Key Files

  • augmented_prompt.py: Generates augmented prompts by combining questions with context from ChromaDB.
  • results_4o-mini.py: Performs inference with OpenAI's gpt-4o-mini-2024-07-18 model.
  • results_slim_raft.py: Runs inference locally with the TeenyTinyLlama-160m-CEP-ft model.
  • evaluate.ipynb: Provides a comprehensive evaluation of model performance, including statistical analysis.
  • evaluate_simple.ipynb: Offers a simplified evaluation approach focused on JSON parsing.
  • Amostra100AvalCEP.txt: Contains the original prompts and baselines.
  • augmented_prompt.csv: Stores the generated augmented prompts.
  • final_evaluation.csv: Stores the final evaluation metrics.
  • requirements.txt: Lists the Python package dependencies.

Contributing

Contributions are welcome! Please submit pull requests with detailed descriptions of your changes. Consider adding unit tests for new functionality.

License

This project is licensed under the MIT License.
