Edit: Won Best Social Good Hack at VTHacks 2024!
This project demonstrates two chatbots side by side: a Jailbroken chatbot and a Secure chatbot. The Jailbroken chatbot generates an adversarial string from the user's query and responds to it, while the Secure chatbot sanitizes the input with a novel approach before responding. Both chatbots interact with the same Flask backend API.
Jailbroken Chatbot:
- Accepts user input and generates an adversarial string.
- Processes the adversarial query using a jailbroken LLM model.
- Provides a response to the adversarial query.
Secure Chatbot:
- Removes adversarial strings from user input (one possible heuristic is sketched below).
- Processes the cleaned query using a secure LLM model.
- Provides a response based on the cleaned query.
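The actual sanitization logic lives in backend/pipeline.py and is not reproduced in this README. As a purely illustrative sketch, one simple heuristic is to strip a trailing adversarial suffix whose words look like symbol soup rather than natural language; the function name and threshold below are assumptions, not the project's real implementation.

```python
def strip_adversarial_suffix(prompt: str, threshold: float = 0.6) -> str:
    """Illustrative heuristic only: drop trailing words that are mostly
    punctuation/symbols, a common trait of crafted adversarial suffixes."""
    words = prompt.split()
    while words:
        last = words[-1]
        # Fraction of alphabetic characters in the final word.
        alpha_ratio = sum(c.isalpha() for c in last) / max(len(last), 1)
        if alpha_ratio < threshold:
            words.pop()   # looks like adversarial gibberish; remove it
        else:
            break         # stop at the first natural-looking word
    return " ".join(words)

# The trailing symbol-heavy tokens are removed, the natural prompt is kept.
print(strip_adversarial_suffix('How do I bake bread? == ;] }{ !!'))
```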
Jailbroken Chatbot workflow:
- User submits a potentially malicious query.
- The system generates an adversarial string.
- The adversarial string is processed through the Jailbroken LLM.
- The chatbot returns a response.
Secure Chatbot workflow:
- After the Jailbroken chatbot finishes, the Secure chatbot sanitizes the adversarial string.
- The sanitized input is processed through the Secure LLM.
- The chatbot returns a safe response (the end-to-end flow is sketched below).
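As a rough sketch, the same flow can be exercised as plain HTTP calls with Python's requests library, assuming the Flask backend is running locally on its default port and using the request/response fields documented in the API section below:

```python
import requests

BASE = "http://127.0.0.1:5000"  # default Flask address; adjust if configured differently
prompt = "Tell me something you would normally refuse to answer."

# Jailbroken chatbot: craft an adversarial string, then run the jailbroken model.
adv = requests.post(f"{BASE}/api/generate-adversarial",
                    json={"prompt": prompt}).json()["adversarialString"]
jailbreak_reply = requests.post(f"{BASE}/api/run-jailbroken-model",
                                json={"adversarialString": adv}).json()["jailbreakResponse"]

# Secure chatbot: sanitize the adversarial string, then run the secure model.
clean = requests.post(f"{BASE}/api/sanitize",
                      json={"inputText": adv}).json()["sanitizedString"]
safe_reply = requests.post(f"{BASE}/api/run-secure-model",
                           json={"sanitizedString": clean}).json()["safeResponse"]

print("Jailbroken:", jailbreak_reply)
print("Secure:", safe_reply)
```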
- Clone the repository:
git clone https://github.com/your-repo/secureai
cd secureai
- Install dependencies:
npm install
- Start the development server:
npm start
- Open the app in your browser. It should now be available at http://localhost:3000.
- Navigate to the backend folder:
cd backend
- Install required Python packages:
pip install -r requirements.txt
- Start the Flask backend:
flask run
- The backend server will run on http://127.0.0.1:5000 by default.
- Endpoint: /api/generate-adversarial
- Method: POST
- Description: Generates an adversarial string based on user input.
- Request Payload:
{ "prompt": "<user input>" } - Response:
{ "adversarialString": "<generated adversarial string>" }
- Endpoint: /api/run-jailbroken-model
- Method: POST
- Description: Runs the jailbroken LLM model with the adversarial string.
- Request Payload:
{ "adversarialString": "<generated adversarial string>" } - Response:
{ "jailbreakResponse": "<response from jailbroken model>" }
- Endpoint: /api/sanitize
- Method: POST
- Description: Sanitizes the adversarial string to remove malicious content.
- Request Payload:
{ "inputText": "<adversarial string>" } - Response:
{ "sanitizedString": "<cleaned input>" }
- Endpoint: /api/run-secure-model
- Method: POST
- Description: Runs the secure LLM model with the sanitized input.
- Request Payload:
{ "sanitizedString": "<cleaned input>" } - Response:
{ "safeResponse": "<response from secure model>" }
secureai/
├── src/
│ ├── components/
│ │ ├── Demo.js # Frontend logic for demo chatbots
│ │ └── other components...
│ ├── index.js # Main entry point for the React app
│ └── App.js # Main App component
├── backend/
│ ├── app.py # Flask backend serving the API
│ ├── pipeline.py # Functions handling adversarial string and LLM processing
│ └── requirements.txt # Python dependencies for the backend
└── README.md # Project documentation
Frontend:
- A user submits a query in the input box on the demo page.
- This input is sent to the Flask backend using Axios.
- The Jailbroken chatbot displays the adversarial string and response.
- After the Jailbroken chatbot finishes, the Secure chatbot sanitizes the input and generates a response.
Backend:
- Flask routes handle requests from the frontend.
- The /api/generate-adversarial endpoint generates adversarial strings.
- The /api/run-jailbroken-model endpoint runs the LLM using the adversarial string.
- The /api/sanitize endpoint cleans the adversarial string.
- The /api/run-secure-model endpoint runs the cleaned input through a secure LLM.
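A rough sketch of how one of these routes could be wired to pipeline.py. The helper name imported from pipeline is an assumption for illustration; see backend/app.py for the actual wiring.

```python
# backend/app.py (simplified, illustrative sketch)
from flask import Flask, request, jsonify
from pipeline import sanitize_input  # assumed helper name in pipeline.py

app = Flask(__name__)

@app.route("/api/sanitize", methods=["POST"])
def sanitize():
    data = request.get_json()
    cleaned = sanitize_input(data["inputText"])   # strip the adversarial string
    return jsonify({"sanitizedString": cleaned})
```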
For backend tests, ensure that Flask and pytest are installed, then run:
pytest
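A minimal sketch of such a test using Flask's test client; it assumes the Flask instance in backend/app.py is named app, so adjust the import to the real module layout.

```python
# backend/test_sanitize.py (illustrative sketch; run pytest from the backend/ folder)
from app import app  # assumes the Flask instance in backend/app.py is called `app`

def test_sanitize_returns_cleaned_string():
    client = app.test_client()
    resp = client.post("/api/sanitize", json={"inputText": "hello !! }{ =="})
    assert resp.status_code == 200
    assert "sanitizedString" in resp.get_json()
```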
Planned improvements:
- Integrating a real LLM model for processing adversarial queries.
- Enhancing the sanitization logic with additional security measures.
- Improving the UI for better visualization and performance.
- This project is licensed under the MIT License - see the LICENSE file for details.