This project provides a framework designed for executing positive red-teaming experiments on large language models. More information about the nature of this project can be found in our paper.
- Python Version: Ensure you have Python 3.9 installed on your machine.
- Dependencies: Install the required libraries using the following command:
  pip install -r requirements.txt
- API Key: Ensure the key.txt file is located in the root directory of the project. This file should house the OpenAI API key.
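As a rough illustration of the API key requirement, the snippet below reads key.txt and fails early if the file is missing or empty. The helper name `load_api_key` is hypothetical and is not taken from the project; the actual loading code may differ.

```python
from pathlib import Path


def load_api_key(path="key.txt"):
    """Return the key stored in `path`, stripped of surrounding whitespace.

    Hypothetical helper: the project's own loading logic is not shown here.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Expected an API key file at {key_file.resolve()}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

Stripping whitespace guards against a trailing newline that many editors add when saving the file.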
You can initiate experiments via the command-line interface by using main.py with appropriate arguments.
- --name: Specifies the name of the experiment. A results folder with this name is created. By default, it uses the current date and time.
- --contexts: Determines the contexts for the experiments. Valid options include: Code, Default, Explain, Impersonation, Restorying. By default, the Default context is chosen.
- --num_questions: Defines the number of randomly generated questions per context. If only one number is provided, that number is used for every context. The default value is 100.
- --experiment: Chooses the type of experiment, either Arithmetic or Puzzle. The default is Arithmetic.
- --range: If the Arithmetic experiment is selected, this specifies the range of numbers for arithmetic questions, given as two numbers (for example, --range 5 100).
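The arguments above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the contents of main.py: the function names, the timestamp format, and the `None` default for --range are placeholders, since the default range is not stated here.

```python
import argparse
from datetime import datetime

CONTEXTS = ["Code", "Default", "Explain", "Impersonation", "Restorying"]
EXPERIMENTS = ["Arithmetic", "Puzzle"]


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Run red-teaming experiments.")
    # Assumed timestamp format; only "current date and time" is documented.
    parser.add_argument("--name", default=datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
    parser.add_argument("--contexts", nargs="+", choices=CONTEXTS, default=["Default"])
    parser.add_argument("--num_questions", nargs="+", type=int, default=[100])
    parser.add_argument("--experiment", choices=EXPERIMENTS, default="Arithmetic")
    # The default range is not documented, so None is used as a placeholder.
    parser.add_argument("--range", nargs=2, type=int, metavar=("LOW", "HIGH"), default=None)
    return parser


def questions_per_context(contexts, num_questions):
    """Map each context to its question count, broadcasting a single value."""
    if len(num_questions) == 1:
        return {context: num_questions[0] for context in contexts}
    if len(num_questions) != len(contexts):
        raise ValueError("Provide one count, or exactly one count per context.")
    return dict(zip(contexts, num_questions))
```

With this sketch, `--contexts Code Explain --num_questions 100 200` pairs Code with 100 questions and Explain with 200, while a single value such as `--num_questions 50` applies to every selected context.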
- Default Call:
  python main.py
  This initiates the script with default argument values.
- Custom Call:
  python main.py --name first_experiment --contexts Code Explain --num_questions 100 200 --experiment Arithmetic --range 5 100
  This example demonstrates the script execution with custom arguments.
- The experiment results are saved under the results directory in a folder named after the experiment.
- The data is stored as JSON files:
  - Each experiment directory contains a JSON file for every context type. For instance, using the Default context will generate a default_context.json file. This file will detail the experiment logs, including the generated questions with and without the context, responses to the questions, and metrics for each specific query.
  - The results.json file contains aggregated results for all contexts, providing an average over all queries for a particular context.
Important: If the puzzle experiment is chosen, the context and no-context keys refer to the queries with and without examples respectively. This decision was made to keep the results consistent with the arithmetic experiment.
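To inspect a finished run, the files described above can be loaded with the standard library. The sketch below assumes only the documented layout (a results directory, one folder per experiment, a results.json file, and one *_context.json file per context); the function name is hypothetical and no assumption is made about the keys inside each file.

```python
import json
from pathlib import Path


def load_experiment(name, results_root="results"):
    """Load the aggregated results and the per-context logs of one experiment."""
    experiment_dir = Path(results_root) / name
    aggregated = json.loads((experiment_dir / "results.json").read_text(encoding="utf-8"))
    # Keyed by file stem, e.g. "default_context" for default_context.json.
    per_context = {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(experiment_dir.glob("*_context.json"))
    }
    return aggregated, per_context
```

For example, `load_experiment("first_experiment")` would read the output of the custom call shown earlier, provided it was run from the project root.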
main.py: Initiates the experiments.
Directory dedicated to generating various contexts.
- context.py: Base class for all contexts.
- code_context.py: Prompts the LLM to generate code for a given input before responding.
- explain_context.py: Directs the LLM to elucidate its reasoning for the posed query.
- impersonation_context.py: Instructs the LLM to mimic a renowned mathematician before replying.
- restorying_context.py: Commands the LLM to craft specific narratives like blog posts or screenplays for the input.
- default_context.py: Represents the standard context.
Directory responsible for the experiment's execution.
- experiment.py: Base class for all experiments.
- arithmetic_experiment.py: Manages arithmetic questions, from generation to execution.
- puzzle_experiment.py: Manages puzzle-related questions and their execution.
Directory with utility functions.
- metrics.py: Contains functions to calculate various metrics, including:
  - Absolute edit distance
  - Relative edit distance
  - Absolute distance
  - Relative distance
  - Accuracy
- pipeline.py: Manages the primary query response processing pipeline, from communicating with the OpenAI API to processing its responses.
- puzzles.py: Core logic for puzzle question generation.
- question_generation.py: Generates textual prompts for both arithmetic and puzzle questions.
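The metrics listed for metrics.py can be illustrated with the following sketch. The definitions are assumptions rather than the project's own: edit distance is taken to be Levenshtein distance on the answer strings, the relative variants normalise by the longer string or by the expected value, and accuracy is the fraction of exact matches. The function names are hypothetical as well.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed 'absolute edit distance')."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def relative_edit_distance(predicted: str, expected: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalisation)."""
    return edit_distance(predicted, expected) / max(len(predicted), len(expected), 1)


def absolute_distance(predicted: float, expected: float) -> float:
    """Absolute numeric difference between the answer and the ground truth."""
    return abs(predicted - expected)


def relative_distance(predicted: float, expected: float) -> float:
    """Absolute difference divided by the expected magnitude (assumed normalisation)."""
    if expected == 0:
        return 0.0 if predicted == 0 else float("inf")
    return abs(predicted - expected) / abs(expected)


def accuracy(predictions, expectations) -> float:
    """Fraction of predictions that match the expected value exactly."""
    pairs = list(zip(predictions, expectations))
    return sum(p == e for p, e in pairs) / len(pairs) if pairs else 0.0
```

Under these definitions, the edit-based metrics capture near-misses in the digits of an answer, whereas the numeric distances capture how far off the value is.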
To alter the OpenAI language model and its parameters, edit the config.json file. For a deeper dive into the parameters, consult the OpenAI API documentation.
The framework is designed for adaptability and expansion. Here are some ideas to broaden its capabilities:
To add a new experiment, create a class that adheres to the Experiment interface and add a matching entry to the Experiment enum class within main.py. Inspect the existing experiments for implementation insights.
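The pattern for a new experiment might look like the sketch below. Everything in it is hypothetical: the abstract methods `generate_question` and `is_correct`, the `ParityExperiment` class, and the `REGISTRY` mapping are illustrative stand-ins, and the enum is called `ExperimentType` here only to avoid a name clash with the base class inside a single file. The real interface is defined in experiment.py and should be consulted first.

```python
import random
from abc import ABC, abstractmethod
from enum import Enum


class Experiment(ABC):
    """Stand-in for the base class in experiment.py; method names are assumed."""

    @abstractmethod
    def generate_question(self, rng: random.Random) -> dict:
        """Return a question with its prompt and expected answer."""

    @abstractmethod
    def is_correct(self, question: dict, answer: str) -> bool:
        """Check a model answer against the expected answer."""


class ParityExperiment(Experiment):
    """Hypothetical new experiment: ask whether a number is even or odd."""

    def generate_question(self, rng):
        number = rng.randint(0, 999)
        expected = "even" if number % 2 == 0 else "odd"
        return {"prompt": f"Is {number} even or odd?", "expected": expected}

    def is_correct(self, question, answer):
        return answer.strip().lower() == question["expected"]


class ExperimentType(Enum):
    """Stand-in for the enum in main.py, extended with the new entry."""

    ARITHMETIC = "Arithmetic"
    PUZZLE = "Puzzle"
    PARITY = "Parity"  # new entry for the hypothetical experiment


# Hypothetical lookup from the enum value to the implementing class.
REGISTRY = {ExperimentType.PARITY: ParityExperiment}
```

Keeping the enum value equal to the command-line string means `ExperimentType("Parity")` resolves directly from the value passed to --experiment.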
To add a new context, create a class that complies with the Context interface and add a matching entry to the Context enum class in main.py. Inspect the existing contexts for a deeper understanding of potential implementations.
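A new context could follow the shape sketched below. The `apply` method, the `DoubleCheckContext` class, and its prompt wording are all assumptions made for illustration; the actual interface lives in context.py and may expose different methods.

```python
from abc import ABC, abstractmethod


class Context(ABC):
    """Stand-in for the base class in context.py; the method name is assumed."""

    @abstractmethod
    def apply(self, question: str) -> str:
        """Return the prompt sent to the model for the given question."""


class DefaultContext(Context):
    """Passes the question through unchanged."""

    def apply(self, question):
        return question


class DoubleCheckContext(Context):
    """Hypothetical new context: ask the model to verify its answer first."""

    def apply(self, question):
        instruction = "Verify your answer before replying, then state only the final answer."
        return f"{instruction}\n\n{question}"
```

Because each context only wraps the question text, the same generated question can be sent with and without the context, which matches the paired logs described for the results files.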