
A conversation between a user and Agentic
A high level overview of our backend processes.

Inspiration

42.5M Americans live with a disability. At the same time, 97.4% of websites fail WCAG (Web Content Accessibility Guidelines). This has resulted in disabled Americans being 3x more likely not to use the Internet.

And navigating the web isn't always straightforward -- especially if you're living with visual or physical impairments. Existing accessibility tools like JAWS or NVDA are useful -- but not intelligent. It's difficult for users to translate their intents into actions.

Agentic flips this paradigm on its head, enabling users to go from intent directly to action using natural language. Beyond text-to-text, text-to-image, and text-to-video, we believe Agentic represents the next step forward in AI: text-to-action.

What it does

Using just natural language, users are able to use Agentic to navigate the web, perform multi-step operations, and interact with webpages. Through a complex pipeline built around HTML (more on that below), raw HTML is transformed into a format AI can more easily interact with, allowing Agentic to truly "see" and "act" with HTML as the source of truth.
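As a rough illustration of what "a format AI can more easily interact with" might look like, the sketch below walks a simplified page tree and emits one numbered line per interactive element, so a model can refer to elements by index. This is not Agentic's actual pipeline: the `DomNode` shape, the `INTERACTIVE_TAGS` list, and the `summarizeInteractive` function are all hypothetical names chosen for this example.

```typescript
// Hypothetical, simplified node shape; a real pipeline would build this from the live page.
interface DomNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: DomNode[];
}

// Tags treated as actionable in this sketch (an assumption, not Agentic's actual list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and emit one numbered line per interactive element.
function summarizeInteractive(root: DomNode): string[] {
  const lines: string[] = [];
  const visit = (node: DomNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      // Prefer visible text, then fall back to common labeling attributes.
      const label =
        node.text?.trim() ||
        node.attrs?.["aria-label"] ||
        node.attrs?.["placeholder"] ||
        "(unlabeled)";
      lines.push(`[${lines.length}] <${node.tag}> ${label}`);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// Usage: headings and other non-interactive nodes are dropped from the summary.
const page: DomNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Search" },
    { tag: "input", attrs: { placeholder: "Search the web" } },
    { tag: "button", text: "Go" },
  ],
};
console.log(summarizeInteractive(page).join("\n"));
// [0] <input> Search the web
// [1] <button> Go
```

A compact, indexed summary like this keeps prompts short and gives the model stable handles ("click element 1") that can be mapped back to the real page.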

How we built it

On the frontend, Agentic is built with React, TypeScript, and Next.js. We leveraged OpenAI's Whisper with Transformers.js and ElevenLabs' API to create an immersive speech-to-text and text-to-speech interface.

On the Backend, we used Node.js, Selenium, and Google AI (Gemini Pro).

Challenges we ran into

Interacting with the large language model proved to be our largest challenge. Gemini would frequently hallucinate, respond to itself, and fail to follow instructions. We mitigated this with intensive prompt engineering and by introducing system checks that verify proper operation and attempt to automatically catch and correct bad output. LLM interaction would have benefited from more development time, but this system noticeably improves output quality.
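One common way to build checks of this kind is a validate-and-retry loop: parse the model's reply, reject anything that does not match an expected action format, and re-prompt with the reason for rejection. The sketch below shows that general pattern only. The `Action` format, `parseAction`, `nextAction`, and the `Model` type are hypothetical and are not taken from Agentic's code; the model is passed in as a plain function so the loop does not depend on any particular LLM API.

```typescript
// Hypothetical action format; the real schema used by Agentic is not described here.
type Action =
  | { type: "click"; target: number }
  | { type: "type"; target: number; text: string }
  | { type: "navigate"; url: string };

// A model is anything that maps a prompt to raw text (e.g. a wrapper around an LLM API).
type Model = (prompt: string) => Promise<string>;

// Return a validated Action, or a string describing what was wrong with the reply.
function parseAction(raw: string): Action | string {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return "Output was not valid JSON.";
  }
  if (typeof data !== "object" || data === null) return "Output was not a JSON object.";
  const { type, target, text, url } = data as Record<string, unknown>;
  if (type === "click" && Number.isInteger(target)) {
    return { type: "click", target: target as number };
  }
  if (type === "type" && Number.isInteger(target) && typeof text === "string") {
    return { type: "type", target: target as number, text };
  }
  if (type === "navigate" && typeof url === "string") {
    return { type: "navigate", url };
  }
  return "Unknown action type or missing fields.";
}

// Ask the model, check the reply, and re-prompt with the error until it passes or retries run out.
async function nextAction(model: Model, prompt: string, maxRetries = 2): Promise<Action> {
  let currentPrompt = prompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = parseAction(await model(currentPrompt));
    if (typeof result !== "string") return result;
    currentPrompt = `${prompt}\n\nYour previous reply was rejected: ${result} Reply with one JSON action only.`;
  }
  throw new Error("Model did not produce a valid action.");
}

// Usage with a stub model that fails once, then returns a valid action.
const replies = ["Sure! I will click it.", '{"type":"click","target":1}'];
const stubModel: Model = async () => replies.shift() ?? "";
nextAction(stubModel, "Click the Go button.").then((action) => console.log(action));
// logs { type: 'click', target: 1 }
```

Feeding the rejection reason back into the prompt gives the model a chance to self-correct, while the hard retry limit keeps a misbehaving model from stalling the session.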

Additionally, this was, for a majority of our team, our first (‼️) hackathon, so navigating this new environment was an absolute thrill as well.

Also, it got cold at times, so sleeping was not always very fun.

Accomplishments that we're proud of

Finishing a lot of candy.
Learning to like Subway.
Building a minimum viable model that demonstrates the incredible capabilities of Large Action Models.
Consuming 1300+ mg of caffeine in total.

What's next for Agentic

While we're very proud of our work on Agentic, we know there's much more technical work to be done in service of creating a general purpose model that can take on every aspect of the web. But beyond advancements in the model, we plan to implement and maintain a much broader set of accessibility options (dyslexic font support, high contrast color options, easy font resizing, etc.) to further increase access.

Built With

chalk
javascript
kinde
next
prisma
react
selenium
typescript

Submitted to

SB Hacks X
- Winner Grand Prize

Created by

I worked on the frontend with React, TypeScript, and Next.js. I implemented client-side transcription using OpenAI's Whisper model running in the browser with Transformers.js. I also worked with ElevenLabs' AI voices for TTS and with WebSockets to communicate with the backend.

Kevin Wu
Software Keveloper
I worked on the backend. I designed the LLM interaction system, browser connection, and WebSocket communication. I also added some WebSocket code to the frontend along with some minor UI changes.

Uno Pasadhika
Professional Oreo enjoyer
I worked on the WebSocket backend and performed extensive testing for our large action model.

Andrew Wang
andrew wang