An AI-powered tool for retrieving and interpreting CDISC controlled terminology codelists through natural language queries.
The CDISC AI Assistant combines the power of a CDISC controlled terminology retrieval tool with advanced AI capabilities to:
- Retrieve standardized CDISC codelists across multiple standards (SDTM, ADaM, CDASH, SEND)
- Interpret natural language queries about codelists
- Provide enhanced responses about codelist extensibility and valid term membership
- Present results through both a command-line interface and a web UI
This tool is designed to simplify access to CDISC controlled terminology for clinical data professionals, allowing them to quickly find and understand standardized terms.
- Natural Language Query Processing: Ask questions in plain English about CDISC codelists
- Multi-Standard Support: Access terminology from SDTM, ADaM, CDASH, and SEND
- Advanced AI Interpretation: Get specific answers about:
- Codelist extensibility (Can I add my own terms?)
- Term membership (Is X a valid term in this codelist?)
- Comprehensive codelist information
- Dual Interface: Use via command-line or web browser
- OpenRouter Integration: Leverages the Mistral AI model for natural language understanding
- Python 3.6 or higher
- An OpenRouter API key (for AI capabilities)
git clone https://github.com/yourusername/cdisc-ai-assistant.git
cd cdisc-ai-assistantpip install -r requirements.txtCreate a .env file in the project root directory with your OpenRouter API key:
OPENROUTER_API_KEY=your_openrouter_api_key_here
You can obtain an API key by signing up at OpenRouter.
The command-line interface allows you to quickly query CDISC codelists:
python cdisc_ai_assistant.py --query "Is DECADE a valid term in the AGEU codelist?"You can also run in interactive mode:
python cdisc_ai_assistant.pyTo start the web interface:
python simple_web.pyThen open a browser and navigate to:
http://localhost:5001
Here are some examples of questions you can ask:
- "Show me the SEX codelist"
- "What are the valid terms in the AGEU codelist?"
- "Is the RACE codelist extensible?"
- "Is OTHER a valid term in the SEX codelist?"
- "Get DTYPE from ADaM standard"
The CDISC AI Assistant consists of three main components:
-
CDISC Codelist Retrieval Tool (
cdisc_codelist.py): A Python implementation of a tool for retrieving CDISC controlled terminology, similar to the SAS macro GetCDISCCodelist -
AI Assistant (
cdisc_ai_assistant.py): Uses OpenRouter's Mistral AI model to:- Extract parameters from natural language queries
- Run the codelist retrieval tool
- Analyze the results to provide enhanced responses
-
Web Interface (
simple_web.py): A Flask web application that provides a user-friendly chat interface
The AI assistant will provide:
- A direct answer to specific questions (e.g., "Yes, 'OTHER' is a valid term in the RACE codelist")
- Additional context for invalid terms:
- For extensible codelists: "This term is not valid, but the codelist is extensible so you could use it with proper documentation"
- For non-extensible codelists: "This term is not valid, and the codelist is not extensible. You must use one of these values: X, Y, Z"
- The complete codelist output for reference
cdisc-ai-assistant/
├── cdisc_codelist.py # Core CDISC codelist retrieval tool
├── cdisc_ai_assistant.py # AI-enhanced interface to the tool
├── simple_web.py # Web UI implementation
├── templates/ # HTML templates for web UI
│ └── index.html # Main chat interface
├── .env # Environment variables (API key)
├── requirements.txt # Python dependencies
└── README.md # This documentation
A: Yes, the AI capabilities require an OpenRouter API key. The base codelist retrieval functionality will work without it, but you'll lose the natural language processing features.
A: The tool supports SDTM, ADaM, CDASH, and SEND standards.
A: Update the DEFAULT_VERSIONS dictionary in cdisc_codelist.py with the new version dates.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- CDISC for providing standardized controlled terminology for clinical research
- OpenRouter for API access to powerful language models
Created with ❤️ for the clinical data standards community