GNX CLI is a next-generation AI agent capable of perceiving and manipulating real-world interfaces. Built on a modular architecture, it combines Native Tool Calling (Llama 4 Scout/Groq) for rapid logic with a specialized Vision Agent (Qwen3-VL/Novita) for high-fidelity UI automation on both desktop and mobile. Developed by Gokulbarath.
GNX_CLI_MOBILE_DEMO.mov
This clip shows GNX CLI running a full mobile automation sequence from the latest build.
GNX_CLI_COM_DEMO.mov
- 🧠 Hybrid Intelligence: Fast orchestrator LLM (Llama 4, Gemini, or GLM) plus specialized VLM (Qwen3-VL) for sight.
- 👁️ Autonomous Vision Agent: Sub-agent loop that can see screens, reason about UI, and act (click, swipe, type).
- 🔌 MCP Support: Works with the Model Context Protocol (GitHub, Filesystem, Memory servers).
- 📱 Mobile Automation: Deep ADB integration for taps, swipes, and text input.
- 💻 Desktop Automation: Mouse/keyboard control via PyAutoGUI with visual feedback loops.
- 📁 Modular Tooling: Atomic tools for file ops, web search, system control, and UI automation.
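To make the mobile automation bullet concrete, here is a minimal sketch of how taps, swipes, and text input map onto `adb shell input` commands. The helper names (`adb_tap`, `adb_swipe`, `adb_text`) are hypothetical and are not taken from the GNX CLI codebase; only the underlying `adb shell input tap|swipe|text` commands are standard ADB.

```python
from typing import List, Optional


def _adb_prefix(serial: Optional[str]) -> List[str]:
    """Return the base argv, optionally targeting one device by serial."""
    return ["adb", "-s", serial] if serial else ["adb"]


def adb_tap(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Build the argv for a tap at screen coordinates (x, y)."""
    return _adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def adb_swipe(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300,
              serial: Optional[str] = None) -> List[str]:
    """Build the argv for a swipe from (x1, y1) to (x2, y2)."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return _adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def adb_text(text: str, serial: Optional[str] = None) -> List[str]:
    """Build the argv for typing text; `input text` reads %s as a space.

    Assumption: shell metacharacters in `text` are not escaped here.
    """
    return _adb_prefix(serial) + ["shell", "input", "text",
                                  text.replace(" ", "%s")]
```

Each helper only builds the argument list, so it can be inspected or logged before being passed to something like `subprocess.run(adb_tap(540, 1200), check=True)` on a machine with a connected device.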
graph TD;
User[User Input] --> Engine[GNX Engine];
Engine -->|Selects Tool| Router{Tool Router};
subgraph "Standard Tools"
Router -->|File Ops| Files[FileSystem / Search];
Router -->|Web| Web[DuckDuckGo / Jina];
Router -->|MCP| MCP[MCP Servers];
end
subgraph "Automation & Vision"
Router -->|Simple| Atomic[Atomic Actions];
Atomic --> Desktop[Desktop Control];
Atomic --> Mobile[Mobile/ADB];
Router -->|Complex UI Tasks| Handoff[activate_vision_agent];
Handoff --> VisionLoop((Vision Agent Loop));
end
Files --> Output[Result];
Web --> Output;
MCP --> Output;
Desktop --> Output;
Mobile --> Output;
VisionLoop --> Output;
Output --> Engine;
Engine --> User;
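The routing step in the diagram above can be sketched as a name-to-function registry. This is an illustrative sketch, not the actual GNX Engine code: `ToolRouter`, `register`, `dispatch`, and the `list_files` tool are hypothetical names, and only `activate_vision_agent` comes from this README.

```python
from typing import Callable, Dict

ToolFn = Callable[..., str]


class ToolRouter:
    """Maps a tool name chosen by the orchestrator LLM to a Python callable."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def dispatch(self, name: str, **kwargs) -> str:
        fn = self._tools.get(name)
        if fn is None:
            # Return the error as text so the engine can show it to the model.
            return f"error: unknown tool '{name}'"
        return fn(**kwargs)


router = ToolRouter()
# Hypothetical standard tool.
router.register("list_files", lambda path=".": f"listing {path}")
# Handoff tool: complex UI tasks are delegated to the vision agent loop.
router.register("activate_vision_agent",
                lambda task: f"vision agent started: {task}")

print(router.dispatch("list_files", path="src/tools"))
```

Returning errors as strings rather than raising is one possible design choice: it lets every branch feed a result back to the engine, as the `Output --> Engine` edge suggests.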
When activate_vision_agent is called, the system switches to a VLM-driven feedback loop:
graph TD;
Start([Task Received]) --> Capture[Capture High-Res Screenshot];
Capture --> VLM[Qwen3-VL Analysis];
VLM -->|Reasoning + JSON| Decision{Decision};
Decision -->|Action| Executor[Execute Action];
Executor -->|Wait for UI| Capture;
Decision -->|Terminate| Success([Task Complete]);
Decision -->|Error| Fail([Report Failure]);
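The feedback loop above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the project's implementation: `run_vision_loop`, the `capture`/`analyze`/`execute` callables, the `max_steps` limit, and the JSON shape (`{"type": "action" | "terminate" | "error", ...}`) are all hypothetical stand-ins for the screenshot tool, the Qwen3-VL client, and the action executor.

```python
import json
import time
from typing import Any, Callable, Dict


def run_vision_loop(task: str,
                    capture: Callable[[], bytes],
                    analyze: Callable[[str, bytes], str],
                    execute: Callable[[Dict[str, Any]], None],
                    max_steps: int = 10,
                    wait_seconds: float = 1.0) -> Dict[str, Any]:
    """Capture -> analyze -> decide -> act, until terminate, error, or limit."""
    for step in range(1, max_steps + 1):
        screenshot = capture()
        raw = analyze(task, screenshot)  # model reply, expected to be JSON text
        try:
            decision = json.loads(raw)
        except json.JSONDecodeError:
            return {"status": "error", "steps": step,
                    "detail": "model reply was not valid JSON"}
        if not isinstance(decision, dict):
            return {"status": "error", "steps": step,
                    "detail": "model reply was not a JSON object"}

        kind = decision.get("type")
        if kind == "terminate":
            return {"status": "success", "steps": step,
                    "detail": decision.get("result", "")}
        if kind == "action":
            execute(decision)
            time.sleep(wait_seconds)  # let the UI settle before re-capturing
            continue
        # Explicit "error" decisions and unknown types both report failure.
        return {"status": "error", "steps": step,
                "detail": decision.get("reason", f"unexpected decision: {kind}")}

    return {"status": "error", "steps": max_steps, "detail": "step limit reached"}
```

Passing the three stages in as callables keeps the loop testable without a device or an API key: a scripted `analyze` that returns canned JSON strings is enough to exercise every branch.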
- Python 3.10+
- Windows
- For mobile: ADB (Android Debug Bridge) and a connected Android device
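The prerequisites above can be checked before installing. The following is a hypothetical preflight sketch (GNX CLI does not necessarily ship such a check); `check_prerequisites` and its parameters are illustrative names, and it relies only on the standard library.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def check_prerequisites(need_mobile: bool = False,
                        version: Optional[Tuple[int, int]] = None,
                        platform: Optional[str] = None,
                        which: Callable[[str], Optional[str]] = shutil.which
                        ) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    version = version or sys.version_info[:2]
    platform = platform or sys.platform
    problems: List[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if platform != "win32":
        problems.append(f"Windows is the listed platform, found {platform}")
    if need_mobile and which("adb") is None:
        problems.append("adb not found on PATH (needed for mobile automation)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(need_mobile=True):
        print("WARNING:", problem)
```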
git clone https://github.com/Gokulbarath/GNX-CLI.git
cd GNX-CLI
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
# Configure environment
copy .env.example .env
Start the CLI:
python main.py
- General reasoning & files
  - "List all python files in src/tools and tell me what they do."
- Web search
  - "Search for the latest features in Python 3.13."
- Vision Agent (mobile)
  - Ensure your Android device is connected via ADB.
  - "Open Settings, find 'Display', and turn on Dark Mode." (Agent navigates, scrolls, and taps based on visual cues.)
- Vision Agent (desktop)
  - "Open Calculator, calculate 55 * 12, and tell me the result."
GNX CLI/
├── main.py # Entry point
├── requirements.txt # Dependencies
├── README.md # This file
├── imgs/ # Assets (demo, architecture, LAMx)
├── src/
│ ├── agents/
│ │ └── vision/ # Vision agent loop & prompts
│ ├── gnx_engine/ # Orchestrator, adapters, prompts
│ ├── mcp/ # Model Context Protocol client
│ ├── tools/
│ │ ├── desktop/ # Mouse/keyboard/screenshot
│ │ ├── mobile/ # ADB/touch/system
│ │ ├── handoff/ # Sub-agent triggers
│ │ ├── file_ops.py # File operations
│ │ ├── filesystem.py # Directory listing
│ │ ├── system.py # System utilities
│ │ ├── search.py # File search
│ │ ├── todos.py # TODO management
│ │ ├── web_search.py # Web search
│ │ └── ui_automation.py # UI automation helpers
│ ├── ui/ # Display utilities
│ ├── utils/ # Logging, token counting
│ └── vision_client/ # VLM API client and types
└── .env.example # Environment template
# GNX CLI Environment Variables
# Groq API Key (primary orchestrator)
GROQ_API_KEY=your_groq_api_key_here
# Google Gemini API Key (fallback/alternative)
GOOGLE_API_KEY=your_google_api_key_here
# HuggingFace Token (for V_action vision model)
HF_TOKEN=your_huggingface_token_here
# ZhipuAI API Key (GLM-4.5 text-only series)
ZHIPUAI_API_KEY=your_zhipuai_api_key_here
# Default provider: glm | groq | gemini
GNX_DEFAULT_PROVIDER=glm
# Optional model overrides
# GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
# GEMINI_MODEL=gemini-1.5-flash
# GLM_MODEL=glm-4.5
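As an illustration of how these variables might be consumed, the sketch below resolves the default provider and its API key from the environment. It is an assumption about usage, not the project's actual loader: `resolve_provider` and `PROVIDER_KEYS` are hypothetical names, while the variable names and the `glm | groq | gemini` values come from the template above.

```python
import os
from typing import Dict, Mapping, Tuple

# Which API-key variable each provider needs, per the template above.
PROVIDER_KEYS: Dict[str, str] = {
    "glm": "ZHIPUAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def resolve_provider(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (provider, api_key), or raise ValueError with a clear message."""
    provider = env.get("GNX_DEFAULT_PROVIDER", "glm").strip().lower()
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"unknown provider '{provider}', "
                         f"expected one of {sorted(PROVIDER_KEYS)}")
    key_name = PROVIDER_KEYS[provider]
    api_key = env.get(key_name, "")
    # Treat the untouched placeholder from .env.example as missing.
    if not api_key or api_key.startswith("your_"):
        raise ValueError(f"{key_name} is not set for provider '{provider}'")
    return provider, api_key
```

Rejecting the `your_..._here` placeholders early is a small design choice that turns a confusing authentication failure into a direct configuration message.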
- Web UI dashboard
- Performance optimization and caching
- Personalization
GNX CLI is a rewritten and evolved version of Axolot OS, now optimized as a core component of the LAMx project, an integrated ecosystem for general AI-powered intelligence.
Contributions are welcome! Please open an issue or submit a PR.
MIT License — see the LICENSE file for details.
Built with ❤️ after a lot of 💔