Hi, I’m Christoph. For over 20 years I’ve been building open source tools that shape how millions of developers work, including projects I created like Jest, Metro, Yarn, jscodeshift and MooTools. I previously led JavaScript Infrastructure and managed the React Native team in Menlo Park and London.
Today, at Nakazawa Tech in Tokyo, I’m building new developer tooling and indie games in the open, including:
I write about Frontend Development, Engineering Management, Leadership, User Experience, and more.
2025 will be looked back on as the most transformative time in software engineering. Previously, LLMs could build simple toy apps but weren’t good enough to build anything substantial. I’m convinced that has now changed, and I’m sharing my thoughts and pro tips.1
People who doubt the value of AI are usually quick to point out that people building with AI haven’t shipped anything. Let’s get that out of the way first. Here is what I worked on in the past 12 months, sorted by recency:
People who know me will say I’ve always worked on multiple projects concurrently. With AI, I’m working on more projects concurrently. I worked fewer hours in 2025 compared to 2024, but the hours I worked were more intense.
I primarily use ChatGPT or Codex on the web. I used Claude Code and tried some open models, but I never vibed with any of them. For Claude Code specifically, I just cannot handle the way it communicates without rolling my eyes.2
My AI usage increased significantly in September. I was working on an app that used tRPC and a popular React data library that led to bugs, boilerplate, and other problematic patterns.5 I was frustrated and driven to build something new, but didn’t know much about tRPC or whether it was possible to build what I wanted: a data framework like Relay that works with tRPC. I started building fate6 with this exact prompt:
I have recently started working with trpc. I’m a heavy graphql and relay user as I was part of the team that created those technologies. However, many people use trpc nowadays. The problem with trpc is that they gave up on their own query client and all the other options suck imho. The genius of Relay is its normalized cache and query fragments that allow you to specify what data each component needs to render, and then with the Relay compiler it “hoists” everything up to the root query so only one graphql query is necessary at a time.
I would like to figure out what a trpc client inspired by Relay would look like. It would need normalized ids (probably ${objectType}-${object.id}) for a normalized cache, and some way to define fragments for JavaScript components. Can you research if this is possible, what syntax/dsl we could use for fragments, and whether a system like this is possible and lay out how to implement it?
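To make that concrete, here is roughly the shape I had in mind at the time. This is an illustrative sketch with made-up names, not fate’s actual API: a normalized cache keyed by `${objectType}-${object.id}`, plus component-level fragments that a compiler could hoist into a single root tRPC call.

```ts
// Illustrative sketch only: a Relay-inspired fragment layer over tRPC.
// None of these names are fate's real API; they just make the idea concrete.

// Normalized cache: each object is stored once, keyed by `${objectType}-${object.id}`
// (for example "User-42"), so two queries returning the same user share one record.
const cache = new Map<string, Record<string, unknown>>();

function normalize(objectType: string, object: { id: string } & Record<string, unknown>): string {
  const key = `${objectType}-${object.id}`;
  cache.set(key, { ...(cache.get(key) ?? {}), ...object });
  return key;
}

// A "fragment" declares which fields a single component needs from one entity type.
type Fragment<T> = { objectType: string; fields: ReadonlyArray<keyof T & string> };

interface User {
  id: string;
  name: string;
  avatarUrl: string;
}

const UserAvatarFragment: Fragment<User> = {
  objectType: 'User',
  fields: ['id', 'avatarUrl'],
};

// Components read their data out of the normalized cache through their fragment.
// A compiler (or a runtime collector) would "hoist" every fragment on a screen into
// one root tRPC call, then write the response back into the cache via normalize().
// Usage: after normalize('User', user) somewhere up the tree, a component can do:
//   const avatarData = readFragment(UserAvatarFragment, '42');
function readFragment<T>(fragment: Fragment<T>, id: string): Partial<T> | undefined {
  const entity = cache.get(`${fragment.objectType}-${id}`);
  if (entity === undefined) return undefined;
  const picked = Object.fromEntries(fragment.fields.map((field) => [field, entity[field]]));
  return picked as unknown as Partial<T>;
}
```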
Prior to GPT-5 I concluded that LLMs weren’t good enough for serious coding, and I actually almost gave up early while building fate! I pushed through, and open sourced the library after 205 Codex sessions, with more than half of them generating 4 solutions for each problem.
Some folks ask if I’m slower or faster with AI. That’s the wrong question to ask. Without LLMs, I wouldn’t even have started a side project like this!
My goal was to use Codex to build a Proof of Concept as quickly as possible to validate the idea, which went great at first, and then terribly wrong. Here is a brief log of what happened early on and how it felt:
About 100 of the Codex sessions produced 4 solutions. Sometimes it was challenging to review and identify the best out of four solutions, but it almost always resulted in a better outcome.
Generating four solutions to a problem usually leads to key insights more quickly. For example, on hard problems one or two sessions would regularly give up, one would quit 10 minutes in, and another would go on for 40 minutes and come up with a great solution. Imagine asking for only one solution: you’d get one of the first three answers and give up completely.
Similarly, when two or three of the solutions look almost identical and elegant, that probably means they are the ideal solutions. And sometimes it’s just great to see different approaches to solving the same problem. This workflow actually made it hard for me to work with LLMs that don’t produce multiple solutions: How do I know we aren’t just stuck at a local maximum?
Once Codex and I were done with the code, examples and app template, I built the website and wrote the docs by hand, without LLMs. For building the docs site, I wanted to try VitePress, Paper Shaders and Squircles. It’s fun to build something new!
LLMs are great for adding inline API docs, but they lack clarity and craftsmanship when there is no way to easily verify correctness. Writing the docs for fate also helped me gain clarity, and allowed me to simplify and improve fate.
But there is something deeper. You can immediately tell whether something was built with care just by looking at a project’s README, or whether someone vibe coded both the design and the content of their blog. I basically pay no attention to long-form AI content, similar to how I “don’t see” sponsored links on a Google search results page. If my brain pattern matches that something is written by AI, I will usually ignore it. After all, I can just ask my LLM to give me the same or better content.
Details matter, and artifacts like documentation or meeting notes need to be precise and correct. AI-generated content and code often look like they could be real, but tend to end up as nothing more than the noise they were generated from.
While the space for “fast-code” exists now, LLMs haven’t quite figured out craftsmanship and there is an equally big space for human editing to build great software.
The current setup is not ideal, and there are many gaps and pitfalls. There are obviously many better ways to work with LLMs, but this workflow is the one that works best for me so far.
Here are some of the issues I frequently run into:
- Even when the repo instructions say to “run pnpm build and pnpm test to verify your changes”, Codex often forgets to build, or only runs the linter when making TypeScript changes. It comes back with a checkmark emoji validating an unrelated subsystem, while the actual change is broken. I currently always explain the exact commands to run in the prompt, which usually works but is not a consistent fix either. Why do the models get confused about this?
- Codex also keeps trying to run jest or use Jest CLI flags with Vitest. It’s funny, because it’s kinda my fault for building the industry-leading test framework, but man, Codex, can you just stick to the repo rules?
- An ideal approach might be one agent that orchestrates ~4 agents to solve the same problem independently, then have a set of models evaluate the solutions and only present the best one. I hope someone builds this! (A rough sketch of what I mean is below.)
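Here is a minimal sketch of that fan-out/fan-in idea, assuming hypothetical `runAgent` and `judge` functions that stand in for whatever agent runner and evaluator models you use:

```ts
// Hypothetical orchestration sketch. runAgent and judge are placeholders for
// real agent/evaluator APIs; nothing here is an existing SDK call.

type Candidate = { attempt: number; patch: string };

// Kick off one independent coding-agent session for the task.
async function runAgent(task: string, attempt: number): Promise<Candidate> {
  throw new Error('plug in your agent runner here');
}

// Have one or more evaluator models rank the candidates and pick a winner.
async function judge(task: string, candidates: Candidate[]): Promise<Candidate> {
  throw new Error('plug in your evaluator here');
}

export async function solveWithFanOut(task: string, attempts = 4): Promise<Candidate> {
  // Fan out: several agents solve the same problem independently, in parallel.
  const results = await Promise.allSettled(
    Array.from({ length: attempts }, (_, i) => runAgent(task, i)),
  );
  const candidates = results
    .filter((r): r is PromiseFulfilledResult<Candidate> => r.status === 'fulfilled')
    .map((r) => r.value);
  if (candidates.length === 0) {
    throw new Error('every attempt failed');
  }
  // Fan in: only the best candidate is surfaced for human review.
  return judge(task, candidates);
}
```

The plumbing is trivial; the part I’d love someone to build well is the judging step.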
If you have suggestions on how to improve my workflow, please reach out!
I assume everyone is observing these two types of people:
It seems binary, but it’s more like a spectrum. I see a few people moving across this spectrum faster than me, and a few are slower, but I barely know anyone who is not using LLMs at this point. Software engineering is accelerating rapidly and it’s no longer defensible to avoid LLMs if you want to have a future in software engineering.
As with any disruptive technology, I imagine large organizations will remain bloated for a while, but startups are now enabled to stay leaner for longer and hire more slowly. Going forward it’ll be the expectation that everyone is a 10x engineer – for the first time by literally orchestrating 10 agents at a time.
Right now, you still need experts next to the vibe coders. The code that is generated is often slop. You can create five years of tech debt in a few weeks. Maybe you can steer AI to fix that tech debt, but it’s exhausting. A good ratio might be one domain expert for every three vibe coders, and the expert’s job is to keep the vibe coders moving fast.
Here are a few dimensions that I’m thinking about:
I’m actually not convinced there is still value in building frameworks. Defaults matter more than ever, and I don’t know if it’s “too late” and “so over” for creating anything new.
After all, why adopt a new library written by somebody else instead of one-shotting a custom library perfect for the problem you are trying to solve? Are the existing popular programming languages and frameworks “the beginning of the end of software”?
I don’t think we are there yet, but it might be coming sooner than we think.
I’m most excited about true personal computing. Apple is great, but it’s so frustrating how locked up everything is and how you have to stick to a strict set of defaults. Sometimes I just wanna add a button to a closed source app. At the same time, open systems are still kind of a mess and require a lot of hacking and maintenance.
Can LLMs enable high-quality personal software on-demand and reshape the entire software industry?
Right now, most media comes in a single format: books, movies, video games, etc. I’m convinced that anything digital will be generable by LLMs eventually and in the future all entertainment will become multimodal.
You’ll be able to say: “Take this movie, and turn it into a top-down Game Boy Advance RPG”, “change this TV show into a book trilogy”, or “make a high-quality 90-minute movie about how Matt Damon got stuck on a remote planet, again.”
It’s not even that far off, just check out how much the AI-generated videos on this YouTube channel improved in the past 12 months.
I used to have about 50 ideas for blog posts based on my frontend, management or JS infra experience. Now that LLMs exist, 40 of them are no longer worth writing. If I were to write them, it wouldn’t be to teach people about my knowledge, but instead it would be about influencing based on my perspective.
I think about this a lot when I read somebody else’s posts: What am I learning here that an LLM couldn’t have also told me? Is it all just about stories and discoverability now?
I’m not sure it is solvable, but I wish there was an easier way to trace what is generated by people (and by who) compared to LLMs. I’m actually not too sure about this one: While I prefer to know if I’m talking to a human or AI, I might care more about consuming high-quality thoughtful content and it’s becoming harder to identify that.
I don’t know what happens next, but I do know that you are absolutely right!