PromptOps

PromptOps is a local-first LLM automation system that mimics how humans operate a computer—via reasoning, vision, and keystrokes. It interprets natural language goals and executes them like a real user would, using keyboard inputs and visual feedback to interact with applications.


🚀 What It Does

PromptOps takes natural language prompts, plans the necessary steps, and simulates human-like actions—typing, scrolling, reading screen content—to execute the task on a desktop autonomously.


🛠️ How We Built It

We used Python for the core logic, integrating PyAutoGUI/pynput for UI simulation and Gemini for LLM reasoning. The system includes a planner, a skill execution engine, and a vision layer that parses screen content to guide decisions.
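
As a rough illustration of these building blocks (not the repo's actual code), the sketch below types a query with PyAutoGUI, captures a screenshot, and asks a Gemini vision model to interpret it. The function names and model choice are assumptions.

```python
# Illustrative sketch of the keystroke + vision building blocks.
# Assumes the google-generativeai SDK; names here are not the repo's API.
import pyautogui
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

def type_text(text: str) -> None:
    """Simulate human typing, one character at a time."""
    pyautogui.typewrite(text, interval=0.05)
    pyautogui.press("enter")

def describe_screen() -> str:
    """Capture the screen and ask the LLM to summarize the visible UI state."""
    screenshot = pyautogui.screenshot()  # returns a PIL Image
    response = model.generate_content(
        ["Describe the visible application state in one sentence.", screenshot]
    )
    return response.text

type_text("latest tech news")
print(describe_screen())
```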


🧗 Challenges We Ran Into

  • Reliable UI control without clicking
  • Parsing dynamic screen content contextually
  • Balancing flexibility with deterministic execution
  • Designing prompt interpretation without rigid skill trees

🏆 Accomplishments That We're Proud Of

  • A modular LLM-agent pipeline with screen-grounded actions
  • Local-first design that requires no app-specific APIs or automation hooks
  • Real-time execution based on visible UI context
  • Planner that adapts actions based on outcomes

📚 What We Learned

  • LLMs can simulate goal-directed human behavior when grounded in visual input
  • Skill-based design is brittle early on; prompt-based planning is more flexible
  • Abstracting actions into reusable modules improves maintainability and extensibility

🔮 What’s Next for PromptOps

  • Add support for dynamic skill generation using LLMs
  • Integrate full vision-based UI navigation
  • Build memory and long-term goal management
  • Extend to goal-based software creation from prompts

⚙️ Architecture Overview

  • main.py: Entry point that loads the model and initializes all agents and the controller
  • PlannerAgent: Converts the user prompt into a structured plan (a dict of steps)
  • EvaluatorAgent: Validates execution outcomes and identifies failures
  • FixerAgent: Attempts to replan or fix issues if execution fails
  • ClarifierAgent: Requests clarification from the user if the prompt is ambiguous
  • VisionAgent: Takes screenshots and interprets screen state using an LLM vision analyzer
  • Memory: Tracks plan steps, history, and prior context
  • Controller: Central executor coordinating the planner, vision, and evaluator to run the task (see the sketch below)

(Architecture diagram: PromptOps agent pipeline)
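
To make the control flow concrete, here is a minimal, runnable sketch of the plan → execute → evaluate → fix loop. All names, stubs, and the plan format are illustrative assumptions; the repo's actual class interfaces may differ.

```python
# Minimal sketch of the agent loop; every name here is an illustrative stub.

def make_plan(prompt: str) -> dict[int, dict]:
    """PlannerAgent role: turn a prompt into a dict of keyboard-level steps."""
    return {
        1: {"action": "hotkey", "keys": ["win", "r"]},   # open the Run dialog
        2: {"action": "type", "text": "notepad"},
        3: {"action": "type", "text": "A three-line summary..."},
    }

def execute(step: dict) -> None:
    """Skill-execution role: keyboard-only actions (stand-in for pyautogui calls)."""
    print(f"executing: {step}")

def describe_screen() -> str:
    """VisionAgent role: screenshot + LLM interpretation (stubbed here)."""
    return "Notepad is focused with an empty document."

def succeeded(step: dict, state: str) -> bool:
    """EvaluatorAgent role: compare observed state to the step's goal (stubbed)."""
    return True

def run_task(prompt: str) -> None:
    plan, history = make_plan(prompt), []          # history plays the Memory role
    for step_id in sorted(plan):
        execute(plan[step_id])
        state = describe_screen()                  # ground decisions in the UI
        history.append((plan[step_id], state))
        if not succeeded(plan[step_id], state):
            plan = make_plan(prompt)               # FixerAgent role: replan

run_task("Write a three-line summary in Notepad")
```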


✅ Example Use Cases

  • "Search Google for latest tech news"
  • "Write a three-line summary in Notepad"
  • "List all files in Downloads folder via terminal"

All executed via reasoning + keyboard input, with no element-level UI automation or app-specific APIs.
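
For instance, the first prompt might decompose into a keyboard-only plan along these lines. The step schema is an assumption for illustration, not the planner's actual output format.

```python
# Hypothetical keyboard-only decomposition of "Search Google for latest tech news".
plan = {
    1: {"action": "hotkey", "keys": ["win", "r"]},                  # open Run dialog
    2: {"action": "type", "text": "https://www.google.com", "enter": True},
    3: {"action": "type", "text": "latest tech news", "enter": True},
    4: {"action": "read_screen", "goal": "extract the top headlines"},  # vision step
}
```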


🧠 Why It’s Different

  • Doesn’t rely on hardcoded scripts, XPath selectors, or app-specific APIs
  • Doesn’t use robotic mouse control—fully keyboard driven
  • Uses vision as a feedback mechanism to emulate human perception

📦 Tech Stack

  • Python (core logic)
  • OpenAI Vision or Gemini Vision (LLM-based screen reading)
  • PyAutoGUI / pynput (keyboard control)
  • FastAPI (optional API hooks; see the sketch below)
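
When the optional FastAPI hook is enabled, it might look like the minimal sketch below. The endpoint path and wiring are assumptions, and run_task stands in for the controller loop.

```python
# Hypothetical HTTP hook for submitting prompts; not the repo's actual endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Task(BaseModel):
    prompt: str

def run_task(prompt: str) -> None:
    """Stand-in for the controller loop described above."""
    print(f"running: {prompt}")

@app.post("/run")
def run(task: Task) -> dict:
    # Hand the prompt to the controller; a real version would run asynchronously.
    run_task(task.prompt)
    return {"status": "started", "prompt": task.prompt}
```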

🔭 Roadmap

  • Full multi-agent loop (planner, executor, evaluator, fixer)
  • File system awareness + context memory
  • Human-like web browsing & data extraction
  • Task persistence + retry logs

📄 License

MIT License

