PromptOps

PromptOps is a local-first LLM automation system that mimics how humans operate a computer—via reasoning, vision, and keystrokes. It interprets natural language goals and executes them like a real user would, using keyboard inputs and visual feedback to interact with applications.


🚀 What It Does

PromptOps takes natural language prompts, plans the necessary steps, and simulates human-like actions—typing, scrolling, reading screen content—to execute the task on a desktop autonomously.


🛠️ How We Built It

We used Python for the core logic, integrating PyAutoGUI/pynput for UI simulation and Gemini for LLM reasoning. The system includes a planner, a skill execution engine, and a vision layer that parses screen content to guide decisions.
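
As a rough illustration of these building blocks (not the repo's actual code), the sketch below types a query with PyAutoGUI, captures a screenshot, and asks a Gemini vision model to interpret it. The function names and model choice are assumptions.

```python
# Illustrative sketch of the keystroke + vision building blocks.
# Assumes the google-generativeai SDK; names here are not the repo's API.
import pyautogui
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

def type_text(text: str) -> None:
    """Simulate human typing, one character at a time."""
    pyautogui.typewrite(text, interval=0.05)
    pyautogui.press("enter")

def describe_screen() -> str:
    """Capture the screen and ask the LLM to summarize the visible UI state."""
    screenshot = pyautogui.screenshot()  # returns a PIL Image
    response = model.generate_content(
        ["Describe the visible application state in one sentence.", screenshot]
    )
    return response.text

type_text("latest tech news")
print(describe_screen())
```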


🧗 Challenges We Ran Into

  • Reliable UI control without clicking
  • Parsing dynamic screen content contextually
  • Balancing flexibility with deterministic execution
  • Designing prompt interpretation without rigid skill trees

🏆 Accomplishments That We're Proud Of

  • A modular LLM-agent pipeline with screen-grounded actions
  • Local-first design that requires no app-specific APIs or automation hooks
  • Real-time execution based on visible UI context
  • Planner that adapts actions based on outcomes

📚 What We Learned

  • LLMs can simulate goal-directed human behavior when grounded in visual input
  • Skill-based design is brittle early on; prompt-based planning is more flexible
  • Abstracting actions into reusable modules improves maintainability and extensibility

🔮 What’s Next for PromptOps

  • Add support for dynamic skill generation using LLMs
  • Integrate full vision-based UI navigation
  • Build memory and long-term goal management
  • Extend to goal-based software creation from prompts

⚙️ Architecture Overview

  • main.py: Entry point that loads the model and initializes all agents and the controller
  • PlannerAgent: Converts the user prompt into a structured plan (a dict of steps)
  • EvaluatorAgent: Validates execution outcomes and identifies failures
  • FixerAgent: Attempts to replan or fix issues if execution fails
  • ClarifierAgent: Requests clarification from the user if the prompt is ambiguous
  • VisionAgent: Takes screenshots and interprets screen state using an LLM vision analyzer
  • Memory: Tracks plan steps, history, and prior context
  • Controller: Central executor coordinating the planner, vision, and evaluator to run the task (see the sketch below)

(Architecture diagram: PromptOps agent pipeline)
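
To make the control flow concrete, here is a minimal, runnable sketch of the plan → execute → evaluate → fix loop. All names, stubs, and the plan format are illustrative assumptions; the repo's actual class interfaces may differ.

```python
# Minimal sketch of the agent loop; every name here is an illustrative stub.

def make_plan(prompt: str) -> dict[int, dict]:
    """PlannerAgent role: turn a prompt into a dict of keyboard-level steps."""
    return {
        1: {"action": "hotkey", "keys": ["win", "r"]},   # open the Run dialog
        2: {"action": "type", "text": "notepad"},
        3: {"action": "type", "text": "A three-line summary..."},
    }

def execute(step: dict) -> None:
    """Skill-execution role: keyboard-only actions (stand-in for pyautogui calls)."""
    print(f"executing: {step}")

def describe_screen() -> str:
    """VisionAgent role: screenshot + LLM interpretation (stubbed here)."""
    return "Notepad is focused with an empty document."

def succeeded(step: dict, state: str) -> bool:
    """EvaluatorAgent role: compare observed state to the step's goal (stubbed)."""
    return True

def run_task(prompt: str) -> None:
    plan, history = make_plan(prompt), []          # history plays the Memory role
    for step_id in sorted(plan):
        execute(plan[step_id])
        state = describe_screen()                  # ground decisions in the UI
        history.append((plan[step_id], state))
        if not succeeded(plan[step_id], state):
            plan = make_plan(prompt)               # FixerAgent role: replan

run_task("Write a three-line summary in Notepad")
```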


✅ Example Use Cases

  • "Search Google for latest tech news"
  • "Write a three-line summary in Notepad"
  • "List all files in Downloads folder via terminal"

All executed via reasoning + keyboard input, with no element-level UI automation or app-specific APIs.
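
For instance, the first prompt might decompose into a keyboard-only plan along these lines. The step schema is an assumption for illustration, not the planner's actual output format.

```python
# Hypothetical keyboard-only decomposition of "Search Google for latest tech news".
plan = {
    1: {"action": "hotkey", "keys": ["win", "r"]},                  # open Run dialog
    2: {"action": "type", "text": "https://www.google.com", "enter": True},
    3: {"action": "type", "text": "latest tech news", "enter": True},
    4: {"action": "read_screen", "goal": "extract the top headlines"},  # vision step
}
```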


🧠 Why It’s Different

  • Doesn’t rely on hardcoded scripts, XPath selectors, or app-specific APIs
  • Doesn’t use robotic mouse control—fully keyboard driven
  • Uses vision as a feedback mechanism to emulate human perception

📦 Tech Stack

  • Python (core logic)
  • OpenAI Vision or Gemini Vision (LLM-based screen reading)
  • PyAutoGUI / pynput (keyboard control)
  • FastAPI (optional API hooks; see the sketch below)
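
When the optional FastAPI hook is enabled, it might look like the minimal sketch below. The endpoint path and wiring are assumptions, and run_task stands in for the controller loop.

```python
# Hypothetical HTTP hook for submitting prompts; not the repo's actual endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Task(BaseModel):
    prompt: str

def run_task(prompt: str) -> None:
    """Stand-in for the controller loop described above."""
    print(f"running: {prompt}")

@app.post("/run")
def run(task: Task) -> dict:
    # Hand the prompt to the controller; a real version would run asynchronously.
    run_task(task.prompt)
    return {"status": "started", "prompt": task.prompt}
```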

🔭 Roadmap

  • Full multi-agent loop (planner, executor, evaluator, fixer)
  • File system awareness + context memory
  • Human-like web browsing & data extraction
  • Task persistence + retry logs

📄 License

MIT License

