Enable AI agents to control your desktop (Windows, MacOS, Linux), mobile (Android, iOS) and HMI devices
Join the AskUI Discord.
- Introduction
- Installation
- Quickstart
- Further Documentation
- Contributing
- License
AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks, with support for multiple AI models, multi-platform compatibility, and enterprise-ready features.
(Demo video: AskUI Vision Agents for Enterprise)
Key Features
- Support for Windows, Linux, MacOS, Android and iOS device automation (Citrix supported)
- Support for single-step UI automation commands (RPA-like) as well as agentic intent-based instructions
- In-background automation on Windows machines (agent can create a second session; you do not have to watch it take over mouse and keyboard)
- Flexible model use (hot swap of models, see the sketch below) and infrastructure for reteaching of models (available on-premise)
- Secure deployment of agents in enterprise environments
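For example, the model hot-swapping mentioned in the list above can be done per command. The sketch below is only an illustration: the `model` keyword argument and the placeholder model names are assumptions here; see the Using Models guide linked further down for what is actually supported.

```python
from askui import VisionAgent

with VisionAgent() as agent:
    # Placeholder model identifiers; look up the supported names in the
    # "Using Models" guide before copying this.
    agent.click("Login button", model="<locator-model-name>")
    agent.act("Fill in the login form with the test credentials", model="<agentic-model-name>")
```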
```
pip install askui[all]
```

Requires Python >=3.10.
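To double-check that the package ended up in the environment you expect, a plain pip query is enough (standard pip command, nothing AskUI-specific):

```
pip show askui
```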
Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system. It is installed on a desktop OS but can also control connected mobile and HMI devices.
It offers powerful features like
- multi-screen support,
- support for all major operating systems (incl. Windows, MacOS and Linux),
- process visualizations,
- real Unicode character typing (see the sketch after this list),
- and more. Exciting features like application selection, in-background automation, and video streaming will be released soon.
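As a concrete illustration of the Unicode typing feature, the same `type` call used in the quickstart further down accepts non-ASCII text directly. The target element in this sketch is hypothetical:

```python
from askui import VisionAgent

with VisionAgent() as agent:
    # Hypothetical target element; the point is that the text below is typed
    # as real Unicode characters rather than keyboard-layout approximations.
    agent.click("Message input field")
    agent.type("Grüße aus München, 价格: 25 €, done ✓")
```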
Linux

AMD64:

```
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
```

ARM64:

```
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
```

MacOS

```
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
```

Double-click wherever the cursor currently is:
```python
from askui import VisionAgent

with VisionAgent() as agent:
    agent.click(button="left", repeat=2)
```

By default, the agent works within the context of the selected display, which defaults to the primary display.
Run the script with `python <file path>`, e.g. `python test.py`, to see if it works.
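If you work with multiple screens, you can point the agent at a specific one. The `display` argument below is an assumption based on the display-selection behavior described above; verify the exact parameter name in the API reference.

```python
from askui import VisionAgent

# Assumption: the constructor accepts a display index (1 = primary display).
# Check the API reference for the exact parameter name before relying on it.
with VisionAgent(display=2) as agent:
    agent.click(button="left", repeat=2)
```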
In order to let AI agents control your devices, you need to be able to connect to an AI model (provider). We host some models ourselves and support several others, e.g., Anthropic, OpenRouter, and Hugging Face, out of the box. If you want to use a model provider or model that is not supported, you can easily plug in your own (see Custom Models).
For this example, we will use AskUI as the model provider to get started quickly.
Sign up at hub.askui.com to:
- Activate your free trial by signing up (no credit card required)
- Get your workspace ID and access token
Linux & MacOS

```
export ASKUI_WORKSPACE_ID=<your-workspace-id-here>
export ASKUI_TOKEN=<your-token-here>
```

Windows PowerShell

```
$env:ASKUI_WORKSPACE_ID="<your-workspace-id-here>"
$env:ASKUI_TOKEN="<your-token-here>"
```
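If you prefer to load credentials at runtime (for example from a secrets manager), you can set the same environment variables from Python before constructing the agent. This is plain Python, not an AskUI-specific API:

```python
import os

# Placeholder values; in practice these would come from your secrets store.
# They must be set before the agent is constructed.
os.environ["ASKUI_WORKSPACE_ID"] = "<your-workspace-id-here>"
os.environ["ASKUI_TOKEN"] = "<your-token-here>"
```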
```python
from askui import VisionAgent

with VisionAgent() as agent:
    # Give complex instructions to the agent (may have problems with virtual displays out
    # of the box, so make sure there is no browser opened on a virtual display that the
    # agent may not see)
    agent.act(
        "Look for a browser on the current device (checking all available displays, "
        "making sure window has focus),"
        " open a new window or tab and navigate to https://docs.askui.com"
        " and click on 'Search...' to open search panel. If the search panel is already "
        "opened, empty the search field so I can start a fresh search."
    )
    agent.type("Introduction")
    # Locates elements by text (you can also use images, natural language descriptions,
    # coordinates, etc. to describe what to click on)
    agent.click(
        "Documentation > Tutorial > Introduction",
    )
    first_paragraph = agent.get(
        "What does the first paragraph of the introduction say?"
    )
    print("\n--------------------------------")
    print("FIRST PARAGRAPH:\n")
    print(first_paragraph)
    print("--------------------------------\n\n")
```

Run the script with `python <file path>`, e.g. `python test.py`.
If you see a lot of logs and the first paragraph of the introduction in the console, congratulations! You've successfully let AI agents control your device to automate a task! If you have any issues, please check the documentation or join our Discord for support.
Aside from our official documentation, we also have some additional guides and examples under the docs folder that you may find useful, for example:
- Chat - How to interact with agents through a chat
- Direct Tool Use - How to use the tools, e.g., clipboard, the Agent OS etc.
- Extracting Data - How to extract data from the screen and documents
- MCP - How to use MCP servers to extend the capabilities of an agent
- Observability - Logging and reporting
- Telemetry - Which data we gather and how to disable it
- Using Models - How to use different models including how to register your own custom models
We'd love your help! Contributions, ideas, and feedback are always welcome. A proper contribution guide is coming soon; stay tuned!
This project is licensed under the MIT License - see the LICENSE file for details.