Lookout is an AI-powered screen guardian that detects scams the instant they appear. It runs entirely on-device, using OmniParser to understand what’s on screen and a quantized Phi 3 Mini model to reason about context. No cloud upload, no data sharing, just instant, private protection.
Online scams have become nearly indistinguishable from legitimate system alerts and popups. They are especially harmful to seniors and young users who trust what they see. We wanted to build a tool that could protect people in real time by understanding the screen the same way a human would.
Lookout continuously monitors the screen and uses AI to identify potential scams.
It parses on-screen text and layout with OmniParser and sends that structured data to Phi 3 Mini, which determines whether the content the user is seeing is legitimate or malicious.
All processing happens fully on-device, keeping user data private while providing real-time alerts.
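The core loop is simple: capture, parse, classify, alert. Here is a minimal sketch in Python; the callables `capture`, `parse`, `classify`, and `alert` are illustrative stand-ins, not Lookout's actual module names.

```python
import time
from typing import Callable, Sequence

def monitor(capture: Callable[[], bytes],
            parse: Callable[[bytes], Sequence[dict]],
            classify: Callable[[Sequence[dict]], str],
            alert: Callable[[str], None],
            interval_s: float = 5.0) -> None:
    """Capture -> parse -> classify loop; the callable names are illustrative."""
    while True:
        frame = capture()              # screenshot of the current screen
        elements = parse(frame)        # OmniParser output: text plus bounding boxes
        verdict = classify(elements)   # local Phi 3 Mini decision, e.g. "SCAM" or "SAFE"
        if verdict == "SCAM":
            alert("Possible scam detected on screen")
        time.sleep(interval_s)         # rescan interval; tune for latency vs. load
```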
The backend is built in Flask using MLX for on-device inference on Apple Silicon.
OmniParser performs OCR and layout parsing, producing structured text and positional data.
This data is passed to a quantized Phi 3 Mini model, which runs locally to analyze and classify the visual context.
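A sketch of that classification step, assuming the `mlx_lm` package's `load`/`generate` helpers and a community 4-bit Phi 3 Mini conversion; the model id, prompt, and element schema are assumptions for illustration, not Lookout's exact ones.

```python
from mlx_lm import load, generate  # MLX text-generation helpers for Apple Silicon

# Assumed 4-bit community build of Phi 3 Mini; substitute the actual model path.
model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-4bit")

def classify_elements(elements: list[dict]) -> str:
    """Ask the local model whether the parsed screen content looks like a scam."""
    screen_text = "\n".join(e["text"] for e in elements if e.get("text"))
    prompt = (
        "You are a scam detector. Given the text visible on a user's screen, "
        "answer with exactly one word, SCAM or SAFE.\n\n" + screen_text
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=5).strip()
```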
The frontend is built with Electron and React Native, styled with Tailwind CSS. The Electron app communicates with the Flask server to trigger scans, visualize results, and show detections instantly, all running locally on the user’s machine.
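A minimal sketch of the Flask side of that contract; the `/scan` route, port, and response shape are assumptions chosen for illustration.

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.post("/scan")
def scan():
    # The real handler would run the OmniParser + Phi 3 Mini pipeline described
    # above; this stub only shows the response shape the Electron UI consumes.
    return jsonify({"verdict": "SAFE", "elements": []})

if __name__ == "__main__":
    app.run(port=5000)  # local only; the Electron renderer would call http://localhost:5000/scan
```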
- Follow the instructions from the OmniParser repository to start the omnitool server (disable Florence-2 descriptions).
- Backend: `cd backend`, `python3 -m venv env`, `source env/bin/activate`, `pip install -r requirements.txt`, `python src/app.py`
- Frontend: `cd ../frontend`, `npm install`, `npm start`
Enjoy private security monitoring.
At first, we relied only on basic OCR, which lacked layout context. We later adopted OmniParser for richer parsing, but it was initially slow.
We profiled it, identified bottlenecks, and experimented with different vision backbones before simplifying the pipeline to use YOLO for detection and OCR for text.
We also improved performance by filtering out empty text boxes and redundant regions. The system now runs much faster, though we are still refining it to prevent self-detection when Lookout reads its own warnings.
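That filtering step amounts to discarding boxes with no recognized text and collapsing regions that overlap almost entirely. A sketch of the idea, with an illustrative IoU threshold and element schema:

```python
def filter_boxes(elements: list[dict], iou_thresh: float = 0.9) -> list[dict]:
    """Drop empty-text boxes and near-duplicate regions (illustrative thresholds)."""
    def iou(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        ix0, iy0 = max(ax0, bx0), max(ay0, by0)
        ix1, iy1 = min(ax1, bx1), min(ay1, by1)
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union else 0.0

    kept: list[dict] = []
    for el in elements:
        if not el.get("text", "").strip():
            continue                                   # skip empty OCR boxes
        if any(iou(el["bbox"], k["bbox"]) > iou_thresh for k in kept):
            continue                                   # skip redundant, overlapping regions
        kept.append(el)
    return kept
```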
We are proud that Lookout can process a full screen in seconds while keeping all data on-device.
It performs OCR, layout parsing, and AI reasoning locally, giving users real-time protection without sacrificing privacy.
The seamless integration between the ML pipeline and user interface is something we are especially proud of.
We learned how to optimize multi-model AI pipelines for real-time, on-device performance.
Combining OmniParser, MLX, and quantized language models showed us how effective local inference can be.
We also gained a deeper understanding of the trade-offs between accuracy, latency, and privacy in security applications.
With continuous monitoring already in place, our next steps focus on expanding what Lookout can detect and understand.
- Cross-modal awareness: Combine on-screen data with system context like active apps or network activity for deeper insight.
- Adaptive protection: Learn user habits and recognize when something deviates from normal patterns.
- Multi-language detection: Support scams written in other languages or mixed-language interfaces.
- Lookout Lite: Develop a browser and mobile version with smaller models for lightweight, real-time protection anywhere.
Backend: Python, Flask, MLX, Transformers
Frontend: Electron, React Native, Tailwind CSS
AI and Tools: OmniParser, Phi 3 Mini, YOLO, OCR, Node.js