# VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities
VISION (Virtual Scientific Companion) is an AI-driven assistant designed to streamline operations at synchrotron beamlines and scientific facilities through natural language interaction. Built on a modular architecture, VISION is an assembly of Cognitive Blocks (Cogs): specialized AI components tailored for tasks like transcription, classification, code generation, data analysis, and scientific querying. These cogs operate in predefined workflows, enabling seamless communication between users and complex instrumentation.
Key workflows include natural language-controlled (audio, text, or both) beamline operations, where commands are classified, converted to executable code, and deployed for data acquisition or analysis; custom function addition, where custom functions defined by the user in natural language are dynamically integrated into the system; and a domain-specific chatbot, capable of answering scientific queries with precision.
Demo video (with audio): https://www.youtube.com/watch?v=NiMLmYVKiQA
Figure 1: Overview of the VISION architecture
Figure 2: VISION deployment at NSLS-II 11-BM CMS. The GUI is launched at the beamline workstation, with backend processing performed on HAL. LLM-based cog results are displayed for user confirmation, followed by execution on Bluesky or other software.
For an implementation of the EnvTrace method (written before the package was created), see backend/tests/test_op_cog.py.
This file contains the new testing methodology for our operator cog: it uses a beamline simulator to track PV changes and produce a more accurate coding score.
The experiments generated by evaluating LLMs on the VISION datasets (backend/tests/datasets/op_cog_dataset.json and backend/tests/datasets/archived/original_op_cog_dataset.json) using EnvTrace are uploaded on https://zenodo.org/records/17526264 (10.5281/zenodo.17526264).
These results are analyzed and described in the paper "EnvTrace: Simulation-Based Semantic Evaluation of LLM Code via Execution Trace Alignment - Demonstrated at Synchrotron Beamlines" (pending release).
For the publicly usable package, which is written to be modular, please visit: https://github.com/CFN-softbio/EnvTrace
Looking for the project's public release as it was during the MLST 2025 release? Please switch to the 2025-mlst branch.
Highlights of the most important files:

```text
📁 VISION/
├── 📁 backend/
│   ├── 📁 src/
│   │   └── 📁 hal_beam_com/
│   │       └── cog_manager.py   (main entry-point for the backend)
│   └── 📁 tests/   (testing framework to evaluate LLMs on our datasets)
└── 📁 frontend/
    └── 📁 UI/
        └── 📁 program/
            └── executable.py   (main entry-point for the frontend)
```
This guide includes I. Installation, II. Running the System, III. Configuration, and IV. Adding Functionality.
## I. Installation

Both frontend and backend require Python 3.12.7. Different versions might work but are mostly untested.

### Frontend
- Navigate to the frontend directory:

  ```bash
  cd frontend
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv
  ```

- Activate the virtual environment:

  - Windows:

    ```powershell
    .\.venv\Scripts\Activate.ps1  # or .\.venv\Scripts\activate.bat
    ```

  - Linux/MacOS:

    ```bash
    source .venv/bin/activate
    ```

- Install required packages:

  ```bash
  pip install -r requirements.txt
  ```

- For beamline key insertion functionality (Linux only):

  ```bash
  sudo apt-get install xdotool
  ```
### Backend

We have created a Docker container to make this process easier. The container takes longer to set up but does not require supervision. It also makes it much easier to get the simulator working, which is required if you want to press the simulate button in the UI. Please see backend/README_SIMULATOR.md for simulator information.

If you do not care about running simulations and just want to try out the tool, the guide below will get you up and running much faster:
- Navigate to the backend directory:

  ```bash
  cd backend
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv
  ```

- Activate the virtual environment:

  - Windows:

    ```powershell
    .\.venv\Scripts\Activate.ps1  # or .\.venv\Scripts\activate.bat
    ```

  - Linux/MacOS:

    ```bash
    source .venv/bin/activate
    ```

- Install required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Set up the Anthropic API key (a quick verification sketch follows after this list):

  - Windows:

    ```powershell
    setx ANTHROPIC_API_KEY "your_secret_key_value"
    ```

    Restart your IDE for the change to take effect.

  - Linux/MacOS:

    ```bash
    echo 'export ANTHROPIC_API_KEY="your_secret_key_value"' >> ~/.bashrc
    source ~/.bashrc  # or ~/.zshrc for zsh users
    ```

- Alternative Model Configuration: In this public repository, Claude models are used by default, so you do not need a GPU to run large open-source models locally. If you prefer not to use Anthropic's Claude models, you can switch the model setup by changing `ACTIVE_CONFIG` in ./backend/src/hal_beam_com/utils.py. Note that most models (except GPT-4o) require having Ollama installed.
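To confirm the key is picked up before launching the backend, a quick sanity check along these lines can help. This is only a sketch: it assumes the `anthropic` Python package (installed via requirements.txt), and the model name shown is an example that may differ from what `ACTIVE_CONFIG` actually selects.

```python
import os

import anthropic  # installed via requirements.txt

# The Anthropic SDK reads ANTHROPIC_API_KEY from the environment automatically.
assert os.environ.get("ANTHROPIC_API_KEY"), "ANTHROPIC_API_KEY is not set"

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model name, not necessarily the one VISION uses
    max_tokens=32,
    messages=[{"role": "user", "content": "Reply with 'ok'."}],
)
print(response.content[0].text)
```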
## II. Running the System

After installation, run the frontend with:

```bash
cd frontend
python ./UI/program/executable.py
```

and the backend with:

```bash
cd backend
python ./src/hal_beam_com/cog_manager.py
```

## III. Configuration

VISION supports three methods for communication between the frontend and backend:
- **Local Filesystem (Default)**
  - The simplest option for local testing and development
  - This is the default setting in this public repository; no additional configuration required
  - Data is transferred through files in a shared filesystem (a conceptual sketch follows after this list)
  - If the commands become desynced, you can clear the temp folder to resolve it
  - Works with the backend Docker container (recommended)

- **SSH Connection**
  - Untested with the new MultiAgentQueue update; might require some debugging
  - For connecting to a remote backend server
  - Requires SSH access to the backend machine
  - All communication goes through the server; the credentials in both files should be those of the server
  - To enable, replace `CustomS3.py` with `SSH<Backend|Frontend>CustomS3.py` in both frontend and backend
  - Configure in both the frontend and backend `CustomS3.py` files:

    ```python
    username = "your_username"
    endpoint = "backend_server_address"
    password = "your_password"  # Or use SSH keys instead
    ```

- **Minio S3**
  - Untested with the new MultiAgentQueue abstraction; might require some debugging
  - For scalable, production deployments
  - Requires a Minio S3 server
  - To enable, replace `CustomS3.py` with `MinioCustomS3.py` in both frontend and backend
  - Configure connection details in the `MinioCustomS3.py` files
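For intuition, the default local-filesystem transport boils down to exchanging files through a shared folder. The snippet below is only a conceptual sketch, not the actual `CustomS3.py` implementation; the real file names, folder layout, and serialization are defined in the frontend and backend `CustomS3.py` files, and the `temp` folder name here mirrors the temp folder mentioned above.

```python
import json
import time
from pathlib import Path

TEMP_DIR = Path("temp")  # shared folder; clearing it resolves desynced commands
TEMP_DIR.mkdir(exist_ok=True)

def send_command(payload: dict, name: str = "command.json") -> None:
    """Frontend side: drop a JSON file into the shared folder."""
    (TEMP_DIR / name).write_text(json.dumps(payload))

def wait_for_command(name: str = "command.json", poll_s: float = 0.5) -> dict:
    """Backend side: poll the shared folder until the file appears, then consume it."""
    path = TEMP_DIR / name
    while not path.exists():
        time.sleep(poll_s)
    payload = json.loads(path.read_text())
    path.unlink()  # remove the message so it is processed only once
    return payload
```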
Currently, VISION uses keyboard injection for simple deployment; proper integration with the instrument control framework is recommended (work in progress).

- **Keyboard Injection (xdotool)**
  - Simulates keyboard inputs to control existing instrument interfaces
  - Requires xdotool on Linux (installed with `sudo apt-get install xdotool`)
  - Used by default in the `executeCommand` method in frontend/UI/program/executable.py (a rough sketch of the idea follows below)
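As a rough illustration of what keyboard injection amounts to (not the exact code in `executeCommand`), the generated command string is typed into the currently focused terminal and submitted with Enter. The helper name below is hypothetical; the `sam.snap(5)` command is the usage example from the JSON entry further down.

```python
import subprocess

def inject_command(command: str) -> None:
    """Type `command` into the focused window and press Enter via xdotool (Linux only)."""
    subprocess.run(["xdotool", "type", "--delay", "50", command], check=True)
    subprocess.run(["xdotool", "key", "Return"], check=True)

inject_command("sam.snap(5)")
```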
However, integration with the instrument control framework should be used instead, for example:

- **Direct API Integration**
  - For instruments with programmable interfaces
  - Configure custom command handlers in the backend
  - Requires specific knowledge of the instrument's API (an illustrative sketch follows below)
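As a hedged illustration of what direct integration could look like at a Bluesky-controlled beamline (this is not part of VISION; the simulated detector stands in for real hardware, and an actual handler would translate the operator cog's generated code into plans):

```python
from bluesky import RunEngine
from bluesky.plans import count
from ophyd.sim import det  # simulated detector; a real beamline would use its own devices

RE = RunEngine({})

# A custom command handler could hand generated acquisition plans directly to the
# RunEngine instead of injecting keystrokes into a terminal.
RE(count([det], num=5))
```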
## IV. Adding Functionality

To add support for new instruments or to add new functions to the current instrument, you only need to create or modify JSON entries (in /backend/src/hal_beam_com/beamline_prompts/11BM/command_examples.json) that define the respective commands.
For each instrument, a corresponding folder and JSON file is needed, e.g. /11BM/command_examples.json, /12ID/command_examples.json.
- Define a new function in JSON format (see the validation sketch after this list):

  ```json
  {
    "class": "Sample Measurement Commands",
    "title": "Snap (Measure Sample without Saving)",
    "function": "sam.snap(exposure_time)",
    "params": [
      { "name": "exposure_time", "type": "int", "description": "seconds" }
    ],
    "notes": [
      "This command measures the sample but does not save the data."
    ],
    "usage": [
      {
        "input": "Measure sample for 5 seconds but don't save the data.",
        "code": "sam.snap(5)"
      }
    ],
    "example_inputs": [
      "Measure sample 2 seconds, no save."
    ],
    "default": true,
    "cog": "Op"
  }
  ```

  Only the `title` and `function` fields are required. If `default` is `false`, the function will be put under "Miscellaneous Commands". However, using the fields offered allows for more specificity, which in turn can yield higher performance. The above JSON entry is formatted as follows in the prompt:

  ```text
  - **Sample Measurement Commands:**
    - **Snap (Measure Sample without Saving):** `sam.snap(exposure_time)`
      - Params:
        - exposure_time: int (seconds)
      - Notes:
        - This command measures the sample but does not save the data.
      - Usage:
        - "Measure sample for 5 seconds but don't save the data."
        - `sam.snap(5)`
      - Example phrases:
        - "Measure sample 2 seconds, no save."
  ```
- Using the dynamic function creation workflow:
  - You can also add functions through the natural language UI. This is for quick-adding functionality during online operation. Note that for deploying VISION to a new instrument, it is recommended to create/modify command_examples.json instead of using the UI.
  - Describe the function and its parameters in natural language, and the system will generate the appropriate JSON and code implementations.
  - Cons: sophisticated formats are not supported, and it requires access to a specific model (GPT-4o, used for JSON support).
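When editing command_examples.json by hand, a small check along these lines can catch missing required fields before the backend loads the file. This is a sketch only: it assumes the file is a flat list of entries, and the actual loader may enforce more than this.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"title", "function"}  # per the format above, only these are mandatory

def validate_command_examples(path: str) -> None:
    """Raise if any entry in the command-examples JSON is missing a required field."""
    entries = json.loads(Path(path).read_text())
    for i, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"entry {i} ({entry.get('title', '?')}) is missing {sorted(missing)}")
    print(f"{len(entries)} entries look OK")

validate_command_examples(
    "backend/src/hal_beam_com/beamline_prompts/11BM/command_examples.json"
)
```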
Feel free to reach out to us for any questions or installation issues. Suggestions/feedback are also appreciated! Esther Tsai ([email protected])
```bibtex
@article{Mathur_2025,
  doi = {10.1088/2632-2153/add9e4},
  url = {https://dx.doi.org/10.1088/2632-2153/add9e4},
  year = {2025},
  month = {jun},
  publisher = {IOP Publishing},
  volume = {6},
  number = {2},
  pages = {025051},
  author = {Mathur, Shray and der Vleuten, Noah van and Yager, Kevin G and Tsai, Esther H R},
  title = {VISION: a modular AI assistant for natural human-instrument interaction at scientific user facilities},
  journal = {Machine Learning: Science and Technology},
}
```