
Christoph Nakazawa

Hi, I’m Christoph. For over 20 years I’ve been building open source tools that shape how millions of developers work, including projects I created like Jest, Metro, Yarn, jscodeshift and MooTools. I previously led JavaScript Infrastructure and managed the React Native team in Menlo Park and London.

Today, at Nakazawa Tech in Tokyo, I’m building new developer tooling and indie games in the open.

I write about Frontend Development, Engineering Management, Leadership, User Experience, and more.

Latest Article

You are absolutely right!?

2025 will be looked back on as the most transformative time in software engineering. Previously, LLMs could build simple toy apps but weren’t good enough to build anything substantial. I’m convinced that has now changed, and I’m sharing my thoughts and pro tips.1

People who doubt the value of AI are usually quick to point out that people building with AI haven’t shipped anything. Let’s get that out of the way first. Here is what I worked on in the past 12 months, sorted by recency:

  • fate, a modern data library for React, 80% AI usage
  • JS Infra for an AI Startup, 3 months, 4 people, 10% AI usage
  • Relang.dev, a service to auto-translate JavaScript apps, private alpha, 50% AI usage
  • A React Native app built with Relay & GraphQL, 4 months, 3 people, 10% AI usage
  • Open Source across Nakazawa Tech (fbtee, templates, …) and Jest, 10% AI usage
  • Post-launch features for Athena Crisis, a turn-based strategy game, 10% AI usage

People who know me will say I’ve always worked on multiple projects concurrently. With AI, I’m working on more projects concurrently. I worked fewer hours in 2025 compared to 2024, but the hours I worked were more intense.

How I use LLMs

I primarily use ChatGPT or Codex on the web. I used Claude Code and tried some open models, but I never vibed with any of them. For Claude Code specifically, I just cannot handle the way it communicates without rolling my eyes.2

  • Multiple Solutions: The killer feature of Codex on web is the ability to specify how many versions of a solution you want. When I was building fate, I asked for 4 versions for anything slightly complex. As a result, I barely trust single-response LLMs anymore.
  • Fire and Forget: I don’t always know what a solution should look like, and I’m not a fan of specs. Code wins arguments. I just fire off a prompt, look at the results, and if it’s not what I wanted or doesn’t work, I start a fresh chat with a refined prompt, and usually ask for more versions for comparison. If I need to plan, I will usually do it in a regular ChatGPT chat with a non-Codex Thinking or Pro model.
  • Fun: I love building things. All my talks and blog posts are about building something! I don’t want AI to take that away. I fire off a Codex session for anything that’s boring or slow, like figuring out how to call some API or how to integrate Stripe billing, so I can focus on the bits that are fun. I still write a lot of code by hand.
  • Sandbox: I obviously tried terminal agents, but besides the security implications, I just don’t like anyone touching files on my computer – human or AI. I like to stay in control, and I want my agents to work on their own computer and send me the results. I am wondering whether it’s worth getting a separate Mac mini that will be entirely agent-controlled.
  • Edit: A lot of people let AI write and commit all their code. I’m not there yet, and I am much closer to “merging with the machine” than to letting the machine do work on its own. I heavily edit and rewrite AI code. I set up a strict project structure, and Codex has gotten really good at following it. Of course, there are times when I just take AI-generated code if it’s good, but I still uphold the highest quality standard for anything attached to my name. I don’t know for how much longer I’ll keep doing that.3
  • Parallelize: I usually fire off 3-5 Codex sessions when I’m about to step away from the computer, or before I go to sleep. I then work through each of the solutions one by one. When I’m done with a task, I archive the chat and move on. I rarely go back to old chats and wish I could just delete them in Codex. There is usually nothing in the chat history that is more relevant than what’s in the repo right now.
  • Now, or never: Anything that came to my mind used to go into a todo list. Now I consider whether I can have an agent do the work for me by dropping the thought into a prompt. That way, the task never makes it onto the todo list, and the change ships within an hour instead of never.
  • Context: Turn Memory off. I create a new chat for every question. When I’m handwriting code, starting fresh forces me to think deeply about the problem, gather the necessary context, and ask for the concrete change I want.
  • Focus: An overwhelming number of “wrappers” and new tools get released constantly, and a lot of it feels like noise. I prefer to keep my head down and build with ChatGPT until something clearly better comes along that I can no longer ignore.
  • Familiarity: It takes time to become good at prompting a specific LLM, and the same style of prompt may not work as well with other LLMs. You tend to get a feeling for how good the model will be at a certain task, and if it takes more or less time than anticipated, it’s always worth investigating.
  • Debugging: Whenever I work on a mobile app and get an undecipherable production crash, or when a React Native app doesn’t start because of some Hermes crash, I just dump that directly into ChatGPT. This alone has saved me days.
  • One-Offs: All my throwaway code is now written by LLMs, usually one-shotted. I will probably never again carefully handwrite a JavaScript codemod in my life.4
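To make that concrete, here is a minimal sketch of the kind of one-off codemod I mean, written with jscodeshift. The fetchData/loadData names are hypothetical; the shape of the transform is the point.

    // One-off transform: rename every call to a hypothetical fetchData()
    // helper to loadData() across a codebase.
    // Run with: jscodeshift -t rename.ts src/
    import type { API, FileInfo } from 'jscodeshift';

    export default function transform(file: FileInfo, api: API): string {
      const j = api.jscodeshift;
      return j(file.source)
        .find(j.CallExpression, {
          callee: { type: 'Identifier', name: 'fetchData' },
        })
        .forEach((path) => {
          // The filter above guarantees the callee is an Identifier.
          (path.node.callee as { name: string }).name = 'loadData';
        })
        .toSource();
    }

Mechanical, verifiable changes like this are exactly the kind of thing LLMs one-shot reliably.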

Vibe Engineering fate, a modern React data library

My AI usage increased significantly in September. I was working on an app that used tRPC and a popular React data library that led to bugs, boilerplate, and other problematic patterns.5 I was frustrated and driven to build something new, but didn’t know much about tRPC or whether it was possible to build what I wanted: a data framework like Relay that works with tRPC. I started building fate6 with this exact prompt:

I have recently started working with trpc. I’m a heavy graphql and relay user as I was part of the team that created those technologies. However, many people use trpc nowadays. The problem with trpc is that they gave up on their own query client and all the other options suck imho. The genius of Relay is its normalized cache and query fragments that allow you to specify what data each component needs to render, and then with the Relay compiler it “hoists” everything up to the root query so only one graphql query is necessary at a time.

I would like to figure out what a trpc client inspired by Relay would look like. It would need normalized ids (probably ${objectType}-${object.id}) for a normalized cache, and some way to define fragments for JavaScript components. Can you research if this is possible, what syntax/dsl we could use for fragments, and whether a system like this is possible and lay out how to implement it?
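If you haven’t used Relay, here is a rough TypeScript sketch of the normalized-cache idea the prompt describes. It is purely illustrative and not fate’s actual API:

    // Records are stored once, keyed by `${objectType}-${object.id}`, so
    // every component fragment reads from the same copy of the data.
    type Entity = { id: string | number };

    class NormalizedCache {
      private records = new Map<string, Entity>();

      private key(objectType: string, id: string | number): string {
        return `${objectType}-${id}`;
      }

      write(objectType: string, object: Entity): void {
        // Merge into the existing record so partial results accumulate.
        const key = this.key(objectType, object.id);
        this.records.set(key, { ...this.records.get(key), ...object });
      }

      read(objectType: string, id: string | number): Entity | undefined {
        return this.records.get(this.key(objectType, id));
      }
    }

    const cache = new NormalizedCache();
    cache.write('User', { id: 1, name: 'Ada' });
    cache.write('User', { id: 1, email: 'ada@example.com' });
    // Any fragment reading User 1 now sees both name and email.
    cache.read('User', 1);

Because every fragment reads through the same record, a mutation that updates User-1 automatically updates every component that renders it.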

Prior to GPT-5 I concluded that LLMs weren’t good enough for serious coding, and I actually almost gave up early while building fate! I pushed through, and open sourced the library after 205 Codex sessions, with more than half of them generating 4 solutions for each problem.

Some folks ask if I’m slower or faster with AI. That’s the wrong question to ask. Without LLMs, I wouldn’t even have started a side project like this!

My goal was to use Codex to build a Proof of Concept as quickly as possible to validate the idea. That part was great, and then it went terribly wrong. Here is a brief log of what happened early on and how it felt:

  • I set up the project structure and a client/server demo using an existing data framework. I had a plan and put Codex to work.
  • It was so great to build the PoC. Hell yeah, let’s go! I want to vibe code all night, I want to vibe code at the beach.
  • I pretty quickly hit the limits. I had something that kind of looked like what I wanted, but it couldn’t support any complex use cases. I hit a wall: I got frustrated because I didn’t understand the code well enough to fix the issues, and the LLM ended up doing the wrong things.
  • At this point, it was far less enjoyable compared to building from scratch. I had to build a mental model of the LLM-generated code and compare it to the mental model of how I would have built it.
  • I realized: This is a tough technical challenge. Some of the TypeScript types are quite complex. Maybe some of it is in the training set, but more advanced use cases and ideas aren’t, leading the LLM to take shortcuts.7
  • I ended up rewriting the core completely from scratch. I fixed all the bugs, eliminated the shortcuts, and replaced a bunch of slow and unnecessary algorithms the LLM had written, which put the project on a stable foundation to build upon. When I say I rewrote it from scratch, I still used regular ChatGPT for various questions or to help me with complex TypeScript types.
  • After this step, Codex was able to fly in the fate codebase. I regularly had 3-5 sessions running concurrently, each producing 4 solutions to a problem. From that point until the release, I was moving way faster than I could have otherwise. I hit the usage limits multiple times and kept buying tokens.

About 100 of the Codex sessions produced 4 solutions. Sometimes it was challenging to review and identify the best out of four solutions, but it almost always resulted in a better outcome.

Generating four solutions to a problem usually leads to key insights more quickly. For example, on hard problems one or two sessions would regularly give up, one would quit 10 minutes in, and another would go on for 40 minutes and come up with a great solution. Imagine asking for only one: you get one of the first three answers and give up completely.

Similarly, when two or three of the solutions look almost identical and elegant, that probably means they are close to the ideal solution. And sometimes it’s just great to see different approaches to solving the same problem. This workflow actually made it hard for me to work with LLMs that don’t produce multiple solutions: How do I know we aren’t just stuck at a local maximum?

Clarity, Intent & Craftsmanship

Once Codex and I were done with the code, examples and app template, I built the website and wrote the docs by hand, without LLMs. For building the docs site, I wanted to try VitePress, Paper Shaders and Squircles. It’s fun to build something new!

LLMs are great for adding inline API docs, but they lack clarity and craftsmanship when there is no way to easily verify correctness. Writing the docs for fate also helped me gain clarity, and allowed me to simplify and improve fate.

But there is something deeper. You can immediately tell whether something was built with care by looking at a project’s README, or whether someone vibe coded both the design and the content of their blog. I basically pay no attention to long-form AI content, similar to how I “don’t see” sponsored links on a Google search results page. If my brain pattern-matches that something is written by AI, I will usually ignore it. After all, I can just ask my LLM to give me the same or better content.

Details matter, and artifacts like documentation or meeting notes need to be precise and correct. AI-generated content and code often look like they could be real, but tend to end up as nothing more than the noise they were generated from.

While the space for “fast-code” exists now, LLMs haven’t quite figured out craftsmanship, and there is an equally big space for human editing to build great software.

What could be better

The current setup is not ideal, and there are many gaps and pitfalls. There are obviously many better ways to work with LLMs, but this workflow is the one that works best for me so far.

Here are some of the issues I frequently run into:

  • Forgetful: Even if the docs clearly state “run pnpm build and pnpm test to verify your changes”, Codex often forgets to build, or only runs the linter when making TypeScript changes. It comes back with a checkmark emoji validating an unrelated subsystem while the actual change is broken. I currently always spell out the exact commands to run in the prompt (see the sketch after this list), which usually works but is not a consistent fix either. Why do the models get confused about this?
  • Wrong Tools: Even though fate uses Vitest and the docs clearly explain how to verify changes, Codex still tries to run jest or use Jest CLI flags with Vitest. It’s funny, because it’s kinda my fault for building the industry-leading test framework, but man, Codex, can you just stick to the repo rules?
  • Slow: The current models can work for up to an hour. They could be much faster. Sometimes, they will work for 9 minutes, get some stuff done, then they get scared, undo all changes and tell me “Sorry I couldn’t do that within the time available” while their sibling LLM in another container happily works for half an hour.
  • Comparing Solutions: It’s tedious and confusing to compare solutions when you ask for four versions. It would be great to just ask the LLM to rank the solutions based on some criteria.
  • Merge Conflicts: When you aggressively parallelize, you’ll run into merge conflicts all the time. Of course, this is the same as working with humans on a team, but the scale is different. If you run into one merge conflict a day with humans, you’ll run into ten a day with LLMs. I wish there were an easier way to say “Pull the latest changes and redo your exact changes” on Codex web. Sometimes it works, but sometimes it will just come up with a completely different solution instead and waste tokens.
  • Copying Changes: The web workflow allows creating a PR or downloading a patch file. I still heavily edit the AI-generated code, so I have to manually copy and paste the patches. Somebody should build a Chrome extension to directly patch files from Codex on the web into a local Git repo.
  • Push Back: I wish LLMs pushed back more often. They are so eager to please that they will happily go the wrong direction and jump off a cliff. Carol, we just want to make you happy. No, don’t. Tell me if I’m full of shit.
  • VS Code: I love Copilot for autocomplete, but somehow the chat in VS Code is basically useless, regardless of which model is used. The recent changes to the “Next Edit Suggestions” plugin are so visually distracting that I turned them off. I love VS Code but I’m worried they are losing the plot.
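To illustrate the workaround from the “Forgetful” item above, the footer I append to prompts looks roughly like this, with the commands adjusted per repo:

    Before you report success:
    1. Run pnpm build and fix any compiler errors.
    2. Run pnpm test and make sure every test passes.
    3. If either command fails, keep working. Do not report success.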

An ideal approach might be one agent that orchestrates ~4 agents to solve the same problem independently, then has a set of models evaluate the solutions and present only the best one. I hope someone builds this!
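Here is a rough TypeScript sketch of that idea, assuming a hypothetical runAgent function that wraps whatever agent API you use:

    // Best-of-n orchestration: fan the same prompt out to n agents, then
    // ask a judge session to pick the strongest candidate.
    type RunAgent = (prompt: string) => Promise<string>;

    async function solveBestOfN(
      runAgent: RunAgent,
      prompt: string,
      n = 4,
    ): Promise<string> {
      // Fan out: n independent attempts at the same task.
      const solutions = await Promise.all(
        Array.from({ length: n }, () => runAgent(prompt)),
      );

      // Judge: a separate session ranks the candidates and returns the winner.
      const candidates = solutions
        .map((solution, index) => `--- Solution ${index + 1} ---\n${solution}`)
        .join('\n\n');
      return runAgent(
        `Rank these ${n} solutions for correctness and simplicity, then reply ` +
          `with only the best one:\n\n${candidates}`,
      );
    }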

If you have suggestions on how to improve my workflow, please reach out!

The LLM Spectrum

I assume everyone is observing these two types of people:

  • Never-AI People: LLMs are dumb, I’m better and faster than them, they’ll never be good, and they slow me down.
  • AI-only: 100% of my code is written by AI. I haven’t seen the code. Let me call my agents.

It seems binary, but it’s more like a spectrum. I see a few people moving across this spectrum faster than me, and a few are slower, but I barely know anyone who is not using LLMs at this point. Software engineering is accelerating rapidly, and avoiding LLMs is no longer defensible if you want a future in this field.

As with any disruptive technology, I imagine large organizations will remain bloated for a while, but startups are now enabled to stay leaner for longer and hire more slowly. Going forward, the expectation will be that everyone is a 10x engineer – for the first time literally, by orchestrating 10 agents at a time.

Right now, you still need experts next to the vibe coders. The code that is generated is often slop. You can create five years of tech debt in a few weeks. Maybe you can steer AI to fix that tech debt, but it’s exhausting. A good ratio might be one domain expert for every three vibe coders, and the expert’s job is to keep the vibe coders moving fast.

Where is this going?

Here are a few dimensions that I’m thinking about:

On Frameworks & Defaults

I’m actually not convinced there is still value in building frameworks. Defaults matter more than ever, and I don’t know if it’s “too late” and “so over” for creating anything new.

After all, why adopt a new library written by somebody else instead of one-shotting a custom library perfect for the problem you are trying to solve? Are the existing popular programming languages and frameworks “the beginning of the end of software”?

I don’t think we are there yet, but it might be coming sooner than we think.

On Computing

I’m most excited about true personal computing. Apple is great, but it’s so frustrating how locked up everything is and how you have to stick to a strict set of defaults. Sometimes I just wanna add a button to a closed source app. At the same time, open systems are still kind of a mess and require a lot of hacking and maintenance.

Can LLMs enable high-quality personal software on-demand and reshape the entire software industry?

On Entertainment

Right now, most media comes in a single format: books, movies, video games, etc. I’m convinced that anything digital will be generable by LLMs eventually and in the future all entertainment will become multimodal.

You’ll be able to say: “Take this movie, and turn it into a top-down Game Boy Advance RPG”, “change this TV show into a book trilogy”, or “make a high-quality 90-minute movie about how Matt Damon got stuck on a remote planet, again.”

It’s not even that far off, just check out how much the AI-generated videos on this YouTube channel improved in the past 12 months.

On Knowledge

I used to have about 50 ideas for blog posts based on my frontend, management, or JS infra experience. Now that LLMs exist, 40 of them are no longer worth writing. If I were to write them, it wouldn’t be to teach people what I know; it would be to influence them with my perspective.

I think about this a lot when I read somebody else’s posts: What am I learning here that an LLM couldn’t have also told me? Is it all just about stories and discoverability now?

On Identity

I’m not sure it is solvable, but I wish there were an easier way to trace what is generated by people (and by whom) compared to LLMs. I’m actually not too sure about this one: while I prefer to know if I’m talking to a human or an AI, I might care more about consuming high-quality, thoughtful content, and it’s becoming harder to identify that.


I don’t know what happens next, but I do know that you are absolutely right!

Read More…

Essentials

  • Fastest Frontend Tooling (starter pack)
  • The Perfect Development Environment (starter pack)
  • Set up a new Mac, Fast (starter pack)

Engineering

  • Frontend Engineer Archetypes (article)
  • Building a JavaScript Bundler (guide)
  • Building a JavaScript Testing Framework (guide)
  • Rethinking JavaScript Infrastructure (article)
  • Dependency Managers Don’t Manage Your Dependencies (article)
  • Principles of Developer Experience (principles)

Management & Leadership

  • Mastering Tech Lead Management (article)
  • Inclusion in a Distributed World (article)
  • The Nakazawa Management Starter Pack (starter pack)

About

  • Athena Crisis is now Open Source (article)
  • I'm Building a Company (article)