Navigation Notes – Agentic coding

I don’t doubt that jobs associated with software engineering are undergoing a massive sea change. While I was far less certain of their usefulness a year ago, tools like Codex, Claude Code, and OpenCode are effective and – to me – provide real value. These tools are getting better at an extraordinarily high rate, but I don’t expect that to continue forever. Like any other advancement, the rate of progress follows an S-curve, and we’re in the more-vertical bit of that curve right now.

Like others, I believe this is a sea change for software development – something profound, with enough value and impact to change quite a bit of how software development is done. Within that, the job of doing software development is changing and will keep changing, but unlike the hyperbolic assertions of some, software engineers will still be needed – critically so.

I spend a lot of time writing, teaching, and sometimes explaining various technical topics to people. And from where I stand, good documentation is more important than ever for getting the best results out of agentic systems. I’ve been bouncing these ideas around for several months, and I want to share some of those thoughts: how I’ve been thinking about the topics, and what’s worked for me. If you’re inclined to try it out, I hope this helps provide a path for you.

Before I get into the specifics of what’s worked for me, and what hasn’t, I want to share some foundations. These are my hypotheses about why I’m seeing the results I do; they hold up for me, but I’ll leave it to you to judge for yourself.

LLMs don’t reason. They encode a tremendous amount of data/knowledge, but the fundamental nature of how an LLM works is “given prior words, predict the next word”.

I am confident in this, backed up by some really great research from Apple’s machine learning researchers: The Illusion of Thinking. It’s a formal research paper, well worth reading for the abstract and conclusion even if the specifics aren’t easy to follow.

Model capability jumps

LLM models can be loosely grouped by the number of “parameters” in the model – how “large” it is. There’s a whole body of work on “scaling laws” here: performance gets better as models get bigger and are trained longer, with diminishing returns at the far ends of those curves.

Some surprising characteristics emerge as you step up in model size. At the “smaller” end of the scale – 3 billion parameters or so – you’ve got good basic prediction: enough to effectively fix spelling mistakes or complete a written sentence.

At 7 to 8 billion parameters, there’s enough encoded knowledge in the model that predicting the next word starts to provide new emergent capabilities: instruction following and the ability to consistently structure output— such as writing correct JSON, or the simplest of code.

At 40 billion parameters, there’s another jump – this is the size where, if it’s been in the training data, the model can predict with some semblance of reasoning, and it improves notably when using “chains of thought”. This is also the point where, if you generate content and then feed it back to the same LLM with a prompt effectively asking “does this seem right?”, it will consistently correct toward better outputs – the leading edge of where it can start to “hill climb” by following instructions.

At 100 to 120 billion parameters, the models start to usefully generalize. This is roughly the class where GPT-3 arrived (GPT-3 itself was 175 billion parameters), and where the multi-lingual aspects of models start to get consistently good. World knowledge is solid, and (related to that) there’s enough fine-grained detail in the model that prediction on niche topics is far more solid.

I get a bit fuzzier after this, but I think somewhere around 400 billion parameters is where the models have enough detail, enough trained paths of “how to do things”, that the predictions generalize well enough to give you effects like multiple steps of reasoning, or basic planning. With recursive use of the model, it exhibits both planning and instruction following, which makes it the natural starting point for something like agentic computing.

The latest “frontier” models are north of 1.2 trillion parameters today, maybe bigger – I don’t know exactly where the latest ones sit, and OpenAI, Anthropic, and Google don’t exactly share the details at that leading edge. Even with that breadth of data, a model can *only* predict patterns it has seen before. The generalization jump at around 100 billion parameters means it can start to apply patterns it’s seen in one place to another, but it still has to have seen them somewhere.

Memory and Probability

The other thing to know about LLMs is that they’re the ultimate “goldfish brain” – unless you pass it in, they intrinsically have no memory of earlier conversations. They do have a lot of knowledge embedded from training, some of which is easily exposed; that’s why “The capital of France is…” predicts Paris, and with more good training data it gets better. But it still doesn’t reason – it doesn’t ask “is this logically correct?”, or apply anything akin to deductive or inductive reasoning today. Even the planning – you’ll see it in agents as a “thinking” mode – is recursive calls back to the same LLM: first generating, then iterating on what was generated to improve the results. And that actually seems to work.

But for the love of all, please don’t think that it’s “reasoning through” anything using a logical thought process. That space is a cutting edge of research – “neuro-symbolic” computing. What comes out as reasoning and planning today reflects what others have reasoned about, trained into the model; it probabilistically plays those patterns back out, with some generalization.

That last part – the “probabilistically” – is important. It means the story can (and likely will) change every time you ask. Some short predictions are solid enough that you’ll get consistent answers, but as you step further “down the road of these tokens” – into the finer-grained patterns – the model follows its tracks but forks at different places. The more context you give it – those up-front instructions, examples, and so on – the more you constrain that prediction. This is where the phrase “it’s all context engineering” started really popping up in this space about a year ago.

Agents

I mentioned this whole “chain of thought” thing earlier, and how you can recursively feed generated content back into an LLM to get improved output. So let’s use that: what if you stand outside the model, control what context you feed it each time, and do that in a loop? Congratulations – that’s exactly what an agent is and does.
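As a concrete (if toy) sketch, here’s that loop in a few lines of Python. `call_model` and `run_tool` are invented stand-ins – a real agent would call an LLM API and execute real tools – but the loop itself is the whole trick:

```python
def call_model(context: str) -> str:
    # Stand-in for a real LLM call; invented for illustration. It
    # returns either an action request or a final answer, depending
    # on what has already accumulated in the context.
    if "README.md" in context:
        return "ANSWER: the directory contains README.md"
    return "ACTION: list files"

def run_tool(action: str) -> str:
    # Deterministic stand-in for tool execution (e.g. running `ls`).
    return "tool result: README.md"

def agent(task: str, max_steps: int = 5) -> str:
    # The agent is just a loop: build context, predict, act, repeat.
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        output = call_model(context)        # 1. predict from context
        if output.startswith("ANSWER:"):    # 2. stop when it answers
            return output.removeprefix("ANSWER:").strip()
        context += run_tool(output) + "\n"  # 3. feed results back in
    return "gave up"

print(agent("what files are here?"))  # → the directory contains README.md
```

Everything interesting – tool calling, skills, planning – hangs off variations of that loop and what gets placed into `context` on each pass.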

That ability to reliably act as an agent comes from the instruction-following tendencies of those larger models: if you give them instructions, the predicted output tends to follow the pattern of looking like it’s following the instructions. The context of those instructions constrains the LLM’s prediction, and you get something really interesting.

The next thing agents added was “tool calling” – and that fits tightly into how a model works. If you train the model to emit specific text phrasing as structured output, the agent code can look for that pattern and interpret it as “I should use this tool” – using the structured content the model generated as arguments. The tools typically return natural language output, sometimes structured, which the agent adds as context and continues on. This pattern was standardized and generalized into what’s now MCP. It solidified the mix of exposing deterministic APIs and tools to these LLMs – which _aren’t_ deterministic.
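A toy version of that dispatch might look like the following – the JSON shape and the `add` tool here are assumptions for illustration, and real protocols like MCP are considerably richer:

```python
import json

# Hypothetical tool registry: name → deterministic function.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def maybe_call_tool(model_output: str):
    # If the model emitted our assumed JSON tool-call shape, run the
    # tool and return its result; otherwise treat it as plain text.
    try:
        msg = json.loads(model_output)
    except json.JSONDecodeError:
        return None                      # plain text, not a tool call
    if not isinstance(msg, dict):
        return None
    tool = TOOLS.get(msg.get("tool"))
    if tool is None:
        return None
    # The deterministic result gets fed back to the model as context.
    return tool(msg.get("args", {}))

print(maybe_call_tool('{"tool": "add", "args": {"a": 2, "b": 2}}'))  # → 4
print(maybe_call_tool("just some prose"))                            # → None
```

The point is that the boundary is textual: the model only *emits a request*, and the agent code does the deterministic work.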

The folks writing agents have refined what it means to call tools, tried out a bunch of different patterns, made mistakes, and found success – sometimes all at once. It’s a fair bit of work to create and maintain an MCP “server”, or the tools behind one. But we’ll see more of both, because they give agents a superpower: a way to deterministically get something done. With tool calling, asking for something like “2 + 2” will always equal 4 – whereas if you ran that through probabilistic generation, well… I wouldn’t expect it to work 100% of the time. Combine tool calling with instruction following and a breadth of world knowledge to pull from, and you get genuinely useful results. This is where we are now – February 2026 – as I’m writing this.

The point at which I felt it was genuinely useful was when Claude Code added the notion of “skills” for agents. The idea is a variation on tool calling: include in the context the fact that there’s additional knowledge to access, and allow the agent to decide, based on its predicted planning output, when to load that knowledge. Skills are structured as sets of deeper knowledge, listed with short summaries of what each contains and why an agent would load it. This gives agents a sort of “progressive disclosure” that helps guide the predictive outputs. Skills are primarily instructions or descriptions, although they can also include scripts that an agent can write and run to achieve any number of goals.
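To make the “progressive disclosure” shape concrete, here’s a toy sketch in Python – the skill names, summaries, and bodies are all invented for illustration, and a real system would load these from files on disk:

```python
# Short summaries ride along in every prompt; the full skill text is
# loaded only when the agent's planning output asks for it.
SKILL_SUMMARIES = {
    "swift-testing": "How to write and run tests with Swift Testing.",
    "docc-snippets": "How to add code snippets to DocC documentation.",
}

SKILL_BODIES = {
    "swift-testing": "Write @Test functions; run them with `swift test`.",
    "docc-snippets": "Place snippets under Snippets/ and reference them in docs.",
}

def index_for_context() -> str:
    # The compact listing included in every prompt.
    return "\n".join(f"- {name}: {blurb}"
                     for name, blurb in SKILL_SUMMARIES.items())

def load_skill(name: str) -> str:
    # Invoked only when the model decides the skill is relevant.
    return SKILL_BODIES[name]

print(index_for_context())
```

The summaries cost a handful of tokens per prompt; the deeper knowledge costs nothing until it’s actually needed.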

Okay – that’s a ton of background, along with my views/speculation/hypotheses on how this all works. Let me touch on what’s worked for me.

Suggestions from what’s working for me

So if you’re trying this agentic coding thing out, here are the key pieces of advice that made a huge difference for me.

(1) Ask me questions for anything ambiguous

Make sure that phrase is *always* in your instructions. If you don’t include it and leave something vague, the agent will pick something. Maybe you’ll like it, maybe you won’t – and what you get from each attempt can be wildly different.

When you provide this up front, agents do a much better job of asking for clarification in the places where they would otherwise have guessed and chosen a path at random.

(2) Make a plan, then implement it

If you want it to help you code, build a plan that constrains what it’ll generate. That whole instruction-following / apparent-reasoning thing works to your benefit here. First create a plan for what you’re after, work out the specifics of how it’ll be achieved with that plan up front, and when the plan has evolved to your satisfaction, then let it implement.

I recommend providing instructions that it shouldn’t implement anything until you’ve approved the plan. Some of the earlier agent systems wouldn’t always anticipate that in their predictions, instead predicting that they should “just do it already”. They seem to be better about that today, but it’s worth keeping in mind.

Especially when working with an agent and a larger model (such as Codex or Claude), you can use the world knowledge of the model to help you create the plan. “Ask” the model for options, ask it to explain the pros and cons of choices, and with the “ask me questions for anything ambiguous” instruction already loaded, it’ll help to refine a plan to something pretty good.

Note: making a plan with an agent, and knowing how that agent will respond and run with a plan is a skill of yours. It’s something you’ll need to learn – so plan to try it out, plan on making mistakes, and learn from those mistakes. Making a better plan is the single best way to get better results out of an agentic coding assistant.

(3) Use deterministic feedback loops

Set up some structure so that, while the agent is working on code, it can verify that what’s coming out is functional. The most obvious thing here, to me, is unit tests, but also linters – especially for flexible, interpreted languages such as Python or TypeScript. In your instructions, include a line like “ensure the tests pass”, and maybe even give it the context of how to run the tests, or how to compile the code and then run them. The world knowledge is good enough that with a standard project setup it’ll often try the right thing, but if you tell it explicitly, it’s far more successful on that front.
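The shape of that feedback loop can be sketched in a few lines of Python – `generate_fix` here is an invented stand-in for the agent’s edit step, and the trivially-passing command stands in for a real test suite:

```python
import subprocess
import sys

def check(cmd):
    # Run a deterministic verifier (test suite, linter, compiler) and
    # capture its output for the agent's next round of context.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def fix_until_green(generate_fix, cmd, max_rounds=3):
    # generate_fix stands in for the agent: given failure output, it
    # would edit the code. The loop itself is the deterministic part.
    for _ in range(max_rounds):
        ok, output = check(cmd)
        if ok:
            return True
        generate_fix(output)  # feed the failure output back as context
    return False

# A trivially-passing command stands in for a real test run here:
print(fix_until_green(lambda failure: None, [sys.executable, "-c", "pass"]))  # → True
```

The “clear good/bad score” is the process exit code; the failure text is what gives the agent something concrete to react to.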

The looping structure of agents will see the output from a failed compilation or failed test, recognize there’s a problem, predict that it should understand the problem and fix it, and then proceed to try and do that.

The agentic systems today – even more so than 6 months ago – are surprisingly good (surprising to me) at what AI researchers call “hill climbing”. The more deterministic things you can have it check against, with clear (AND CONSISTENT) “good/bad” scores, the better it’ll be able to use all the other systems it has to iterate and refine into what you’re after.

This is a space where the language you’re using can have a notable impact on how effective the agent is. TypeScript can get better results than JavaScript when it’s doing type checking and using that as a constraint. Likewise, a compiled language like Swift has even more benefits, with its layers of safety and guarantees that it checks – and provides warnings and errors when it can’t.

(4) Constrain what you’re doing

I’ve gotten much better results when I’m really specific and keep what I’m asking for as simple as possible – or, using the plan concept above, when I break down the problem before letting the instruction-following actions of an agent do their thing. I really wish I could tell you “how much” to constrain, but I can’t. For one, it’s changing pretty rapidly: the models can do a lot more today than they could 6 months ago, and WAY more than they could a year ago.

But the core of this breaks down to a simple idea: if it has to “reason” about what to do, and it’s more than a couple of steps or something obvious, there’s a higher likelihood that it’ll go awry. The agents I’ve used can appear to be pretty good at reasoning, but that probabilistic nature catches up with them unless something deterministic outside the LLM’s predictive output is keeping track of what’s happening. Claude Code, for example, has a “to-do” tool: it creates to-dos for itself and checks them off as it completes them, with the simple instructions to check the next to-do, do it, and mark it complete when done.
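That to-do pattern is easy to picture as code. Here’s a minimal Python sketch – not Claude Code’s actual implementation, just the shape of the idea: the list lives outside the model, so nothing gets “forgotten” between turns.

```python
# The to-do list is plain, deterministic state outside the LLM.
class TodoList:
    def __init__(self, tasks):
        self.items = [{"task": t, "done": False} for t in tasks]

    def next_task(self):
        for item in self.items:
            if not item["done"]:
                return item["task"]
        return None

    def complete(self, task):
        for item in self.items:
            if item["task"] == task:
                item["done"] = True

todos = TodoList(["write failing test", "implement fix", "run linter"])
worked_on = []
while (task := todos.next_task()) is not None:
    worked_on.append(task)   # the agent would actually work on `task` here
    todos.complete(task)
print(worked_on)  # → ['write failing test', 'implement fix', 'run linter']
```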

That comment about the goldfish brain here? Yeah – this is where it comes from. It does great with lists when it’s using tools, not so much when it isn’t.

(5) Find, use, and create skills

The libraries and frameworks you want to use, how to use them effectively, the patterns of software architecture you want – that’s all ambiguous to an LLM when you start. These are the perfect places to provide it with additional context. Sometimes it’s as simple as “use SwiftUI”, but the more detail that you can give it about what makes “good” use of your framework, the better.

I don’t honestly know all the boundaries around this particular advice, but it’s been incredibly useful for me. I’ve written a couple of skills for myself, and felt like the results were better. That I need to rely on “feel” to know whether it’s actually better is pricking under my skin – it’s driving me a bit nuts. There are paths to “testing” whether it’s better, but that rabbit hole is a space called “evals” in the agentic/LLM world – and it’s a really, really deep rabbit hole. Maybe more on that in another post. Right now my subjective results are the evaluator, as messy and time-consuming as that is.

The real takeaway here is this: if you find yourself providing the same set of instructions more than once across different agent sessions, capture them. That collection will grow. Some of those instructions will be specific to a technology or task – group those together into a file or set of files, and make a skill from them. The patterns and tools at OpenSkills provide structure you can use with any agent, and there are growing collections of skills that are pretty easy to find on GitHub.

Never use a skill without reading it!

That should go without saying, but a lot of folks just want something to work. I hate to be the bearer of bad news, but there’s a lot of malicious intent out there in skills. Always read a skill before you plop it into your collection and use it. Your attention here is huge: you’re the auditor and the guidance for all of this. A skill should match what you want the agent to help you do, and how – if you don’t agree with it, don’t use it. Or edit it, and try out your own variant of how you think it should work.

(6) You’re responsible for memory (for now)

As I’m writing this (February 2026), the agents are starting to step into the space where they’re retrieving data on their own – in effect, using memories that you provide. But they aren’t (yet?) self-sufficient enough to know _when_ to store things like skills as memories.

You can provide too many skills, or too many tools, to an agent. It can get “confused” (meaning it doesn’t consistently pick how it uses all those tools), and the number of tools it takes to confuse these models is surprisingly low.

Keep the context that it always loads concise, accurate, and confined to the project at hand. Change it when you change what you’re doing, and you’ll get improved results. Tools and skills where it wouldn’t be clear to you which one to choose shouldn’t be offered together to an agent – that’s pretty much a guaranteed recipe for inconsistent use (aka “disaster”).

Discussions, research, a multitude of ideas, and implementations are on the leading edges of the development of agents. Because of that, I expect this space to evolve even in the next couple of iterations of software releases.

(7) Checkpoint and reset

Agents reinforce the code patterns they see – far more than humans writing in the same codebase do. Speaking for myself, I’m used to multiple different patterns and efforts living in the same codebase. It’s not great, and I don’t love it, but it doesn’t screw me up all that much. It will tend to screw up an agent. The more consistent your codebase is, the more consistently the agent can generate code that works with the same patterns.

These agents can generate code a lot faster than most of us can type, so the magical feeling is that you’ve got a bit of a superpower at your fingertips: try something out, see if it works. Do it. Try it. That’s the benefit – but know how to back out of it if it doesn’t work.

You can give it instructions about refactoring or re-adjusting how the code works toward new patterns, but that’s one of the harder tasks for agents to do well. If you’re adding something to your codebase and trying something new, use git (or whatever source control) to be explicit about marking a starting point, then see where the agent goes with your plans and instructions. If you don’t like the result, rather than trying to evolve it “back and to the left”, reset to the earlier commit and try again with updated instructions and the lessons learned in your head.

I keep a notepad (okay, a text editor really) and copy/paste my instructions, tweaking them and trying them out. Sometimes I’ll do this on a branch, run it with the instructions, then reset to the head of the branch and give it another go with modified instructions. A different technology choice, a different pattern, and so on.

When I’m really interested in the problem space, I get excited about exploring, about poking it and seeing how different patterns work. Just make sure you do one at a time, and from a consistent starting point, as you’re exploring the possibilities.

I would not be surprised if this is also a space that changes as agents evolve. This whole area is where groups like Google’s DeepMind team really excel – mixing exploration (search) and planning in with expectations of results and memory. Those ideas are a bit beyond the bleeding edge of what’s available for most agents today, but they are being actively explored.

(8) Use the LLM where it’s good, and scripts or code where it isn’t

This is probably the flimsiest advice – I don’t have it pinned down to anything more concrete. I’ve run into this far more using agents to help with data analysis than with code, but it applies just the same. If what you’re asking the LLM to do is read and write a ton of data consistently and mechanistically – without translating it, summarizing it, or the like – then you’ll be WAY better off having the LLM write a script or code to process the data than running all the content through the LLM and back out as predicted text. In small pieces it can work fine and feel pretty seamless, but it’s fundamentally noisy and unpredictable. A deterministic script isn’t: you’ll get consistent results with a script, and you won’t with the output of an LLM.
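A toy example of the difference: summing a column of CSV data is exactly the kind of mechanical work a generated script nails on every run, where predicted text might not. The data here is made up for illustration:

```python
import csv
import io

# A mechanical transform -- summing a column -- done as code rather
# than by streaming every row through an LLM. Same answer every run.
def total_amount(csv_text: str) -> float:
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount"]) for row in reader)

data = "id,amount\n1,19.99\n2,5.00\n3,12.50\n"
print(round(total_amount(data), 2))  # → 37.49
```

Having the agent *write* this script once is a single LLM call; running it over a million rows costs no tokens at all.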

You also tend to pay for “tokens” (words) that you send in and get out from an LLM. Not funneling MB (or more) of data through it when you don’t need to will save you bucks.

If you do need to do something like this – summarizing content, for example – you get far better results by taking advantage of (4) above: keep it constrained. While you *can* dump way more content into these latest agent models, you get better results when you process it in constrained, consistent chunks.
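A sketch of that chunked approach, with `summarize` as a placeholder for the actual per-chunk LLM call (here it just counts lines):

```python
def chunks(lines, size):
    # Fixed-size pieces keep each (hypothetical) LLM call inside a
    # constrained, consistent context window.
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

def summarize(chunk):
    # Placeholder for a per-chunk LLM call; here it just counts lines.
    return f"{len(chunk)} lines"

doc = [f"line {n}" for n in range(10)]
partials = [summarize(c) for c in chunks(doc, 4)]
print(partials)  # → ['4 lines', '4 lines', '2 lines']
```

The partial results can then be combined – or summarized again – in one final, similarly constrained pass.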

Keep this in mind when you’re making your plans or choosing how to use the agentic system. Choosing up front if something is better with a probabilistic path or a deterministic one ends up being both a skill, and sort of a fundamental one in using the agents effectively. (And yes, by calling this a skill, I mean I’m still barking my shins on this and learning myself.)

(9) Small, sharp tools

The phrase, as it relates to the philosophy behind Unix, goes back to the 1970s at Bell Labs. In a too-short summary:

  • text is the primary interface
  • write programs that do one thing, and do it well
  • composable tools that read, process, and output text

For most coding agents, text is the “primary interface” that you use to communicate with the agent, and not surprisingly, the agents are being trained to (and are getting) tremendously good at using the classic “Unix tools” that focus on taking in text and sending it back out.

I think there’s a growing space here (reinforced by examples such as Playwright) where providing custom CLI tools to agents is another powerful way to expand how effective they can be. Microsoft recently released a CLI to control and run Playwright – a visible or headless browser used for testing or interacting with sites on the internet like any other browser, just controlled by code in addition to any interactions you might make yourself.

What stands out to me is that there are patterns in what, and how, you output things that work better with agents: concise, “token-efficient” output; clear and meaningful errors with instructions for alternative usage; output that puts the most relevant information about progress within the last few lines; and so on.
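As a tiny illustration of those output patterns – the `tool` command and its subcommands here are invented:

```python
# Sketch of agent-friendly CLI output: terse success messages, and
# errors that state the fix rather than dumping a stack trace.
def run(argv):
    if len(argv) < 2 or argv[1] not in {"build", "test"}:
        # A clear, actionable error is something an agent can act on.
        return "error: unknown command. usage: tool [build|test]"
    return f"{argv[1]}: ok (0 warnings)"

print(run(["tool", "build"]))   # → build: ok (0 warnings)
print(run(["tool", "deploy"]))  # → error: unknown command. usage: tool [build|test]
```

Both outputs are a single line, and the failure tells the caller exactly what the valid alternatives are – which is precisely what an agent needs to self-correct on the next attempt.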

For now, text is the universal language – including some crazy emoji emphasis (for better or worse) – for communicating instructions to agents. There are some early paths for capturing and iterating based on screenshots with some agents, but I haven’t trodden down those roads as yet.

Note: if you got started in computing well after graphical user interfaces were the norm, this may feel like an exceedingly awkward shift. I have the benefit of being old enough that command-line interfaces and shells were my first point of contact with a computer. Learning the command-line tools you’ll find in a terminal or shell is its own skill, and takes some time and understanding – I’ve always found them extremely powerful and recommend learning how to use them. They’re yet another thing to learn, which can be overwhelming if you’re already feeling underwater with the rest of this.

Sometimes just being aware that it is a skill in its own right makes approaching it a bit easier.

It’s a set of skills – give yourself grace to learn them

Wrapping up, I’ll just reiterate that using agents to help solve coding problems is a new skill in itself. How to use these tools is something we’re all learning. The rate of change for anything in computing has always been high, with software on the fastest iteration cycle. How to get effective results from agents, and how to arrange all the parts they need, is at the forward, bleeding edge of that rate of change.

I favor taking a view of exploration and even play where I can, and encourage you to do so as well. I recognize that I’m in a supremely privileged position to be able to think of it all this way. Give yourself leeway to try things, to screw it up, to get the feedback, and try again.

Like just about anything in technology engineering, it’s a big darn puzzle. And like most of the puzzles in our industry, there are facets that are easier to solve, and some that take a lot longer.

I think there’s more here, but I haven’t advanced enough in my own learning journey to suggest further steps or advice. The space that I’m looking forward to is what I’d generally term as “systems thinking” – working towards thoughtfully (and intentionally) composing systems, effectively using abstraction and encapsulation, understanding the interfaces and implications of assembling these pieces into new structures and how that’ll work – or not.

As a last note, let me tip my hat to Simon Willison, who provided a link to How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt – this is right in line with what I’m seeing.

The best thing I ever made

I’ve cobbled together a lot of software over my career, led teams, coordinated large groups to build and run products, created (digital) infrastructure, even built a company and sold it. But I think the best thing I ever made – or honestly helped to make, as it never could have been me alone – was a community: the folks who identify as part of the broader Seattle Xcoders community. Not that it’s really about Seattle, or even Xcode – Apple’s IDE. It grew beyond both of those years ago.

I think the power of that group is that it’s not controlled – it’s an aggregate of individual people, all sharing and celebrating, sympathizing, empathizing, teasing and joking. But at its heart it’s not about the software, or the platform, or any of that crap. It’s the people: supporting each other, celebrating through the challenges we’ve all faced while being either in the tech industry or so adjacent to it that it makes no difference.

The reason I’ll make any claim to having made it is that it was something I wanted and led – not someone else’s idea I was supporting. I spent years offering my time, sharing the bits I learned, and cajoling folks into joining me to talk and learn together. When it started, one of my excuses was working through the Big Nerd Ranch books on Cocoa (a development framework for building apps on macOS) and macOS programming. For the first half-dozen years it was maybe 6–8 people regularly chatting, sometimes surging to 20 or 30, but mostly a core of folks. Around 2008 the iPhone happened, and the follow-on surge when the iOS App Store was announced was honestly kind of shocking – an “AOL meets the Internet” moment straight out of the early 90s. (Yeah, I’m dating myself.) The group sloshed around in intent and focus, some folks forked off, replicas appeared, but all in all the core continued, with the same welcoming, sharing intent.

I stepped back from the group only a few years after that surge, letting myself get washed away into a focus on startups around 2012, determined to ride a wave I saw surging around cloud hosting and infrastructure. I dropped in periodically, but I wasn’t pushing, organizing, or leading. The group swelled and excelled with the others who stepped in to lead and push. That it didn’t fade away, even through the downturn and the isolation of COVID, just makes me smile all the more.

I’m more active again with the group, although not always being consistent about making the monthly meetings. (I think I missed a good one on WASM that Geoff Pado did last week.) But when I look back at “what I’ve made” – libraries I’ve worked on, solutions I’ve created, open source projects that I’ve either built or contributed to, the thing that stands out isn’t the thing – it’s the community.

WWDC 2025

I recently found my bag from WWDC 2000. It’s stunning that was 25 years ago.

A photo of a satchel from Apple's World Wide Developer Conference held in 2000.
The satchel I got from WWDC 2000, held in San Jose, CA just prior to my birthday.

This year, I’m looking forward to WWDC from a different perspective – as a full-time Apple employee. I contracted with Apple in 2020 for roughly 18 months. I had an opportunity to switch to full time then, but it didn’t quite align at the time. Fast forward a few years, and I ran across a position that was perfectly aligned for me, so I dove on it – and it worked out. As of March 2025, I’m employed by Apple in the Open Source Program Office. That has got to be the coolest part: not only do I get to work at Apple, but almost everything I work on aligns with open source as well. Like I said, perfect for me.

As I’m writing this, it’s the Saturday before WWDC 2025, and I’m as excited as ever for what’s coming. I’m delighted with the swift.org redesign and its new tutorial for Swift on Server (The “getting started” link from Cloud Services on the swift.org website). And the neat tidbit? The tutorial, and the sample code for it, are open source: https://github.com/swiftlang/swift-server-todos-tutorial.

Looking back 25 years – before I picked up that bag – I was living in Columbia, MO, working as a staff member in the university’s central computing department, both trepidatious and hopeful for what Apple had in store as WWDC came around. WWDC was still quite small then – it hadn’t yet moved to Moscone in San Francisco, or moved back. Steve Jobs had returned to Apple a couple of years earlier, and my hopes were high. This was also about the time of the transition from Mac OS 9 to Mac OS X, and the overwhelming changes that came with it.

Later that year, I chose to take a flyer (a leave of absence, technically) from the university, move to Seattle, live with some friends, and try out new opportunities. I’m glad I did, as Seattle has been great for both my wife and me, and we’ve been here since. That was right before the “dot-com” bust, but we weathered that – and the extractive economic insanity / recession of 2008 as well.

With the passing of Bill Atkinson, I’m also remembering the inspiration and excitement from years before thanks to HyperCard. It was one of the first tools that felt like an honest-to-god superpower – a lever that was “long enough to move the world”. I had similar feelings about the Cocoa frameworks and Objective-C. So many more amazing flowers of possibility have bloomed since then: web browsers and JavaScript, the iPhone and iOS, and more recently Swift and SwiftUI.

I wasn’t all that sure about Swift in its earliest years, but by the transition to Swift 4 it was very interesting. I think there’s a lot more potential there, and so many more things that Swift can enable; ideas it can power. I never imagined that I’d be working so closely with a programming language, as I spent most of my career working on (or with) backend and infrastructure services. I appreciate Swift for what it enables – and maybe more so for the people in the Swift community.

For good measure, I want to be clear that this blog still – as ever – represents my own voice, and my sometimes flawed ideas, expectations, explorations, or whatever. I don’t speak for my employer, or anyone else, here – never did.

Code Spelunking in DocC

Heads up: this post is a technical deep dive into the code of DocC, the Swift language documentation system. Not that my content doesn’t tend to be heavily technical, but this goes even further than usual.

The Setup

While I was working on some documentation for the snippets feature in DocC, I ran into an issue with the mechanism to preview documentation. As soon as I added a snippet to an example project, the documentation would fail to preview about half the time. The command I use is:

swift package --disable-sandbox preview-documentation --target MyTarget

When I first started debugging this, I wasn’t sure what caused the issue. I opened a bug in swift-docc-plugin (spoiler: the bug wasn’t in swift-docc-plugin), thinking at first that it was always failing. As it turns out, it wasn’t always failing – my luck was just poor, and the issue intermittent. I had several commits in one of my side projects that added snippets, which I used to work through my documentation of the feature. In order to write up the issue with reasonable reproduction steps, I created a series of commands to verify the behavior I saw. The flow is pretty simple:

  1. clone the example project that illustrates the problem
  2. go to the commit that shows it working
  3. invoke the preview
  4. switch to the commit that shows it failing
  5. invoke the preview

At this point, I didn’t realize that the issue was intermittent, so I iterated back and forth between commits, cleaning the .build directory to see if that made a difference, and then ultimately noticed a change in behavior. At one point where I expected it to fail, it worked. Ah, glorious: a heisenbug. At least now I knew that I’d have to repeat the process multiple times to get the issue to show. With that in mind, I was able to nail down the change in my project that started to illustrate the issue – it was when I added the first snippet.

There’s another project (the exemplar, really) that hosts snippets in its documentation – swift-markdown – that *never* exhibited this problem. That was a real head scratcher. But I did have a reliable reproduction, so I focused on that.

When I work on an intermittent bug, I try to get a debugger attached to the code that’s behaving badly. Because this was invoked through a SwiftPM plugin, I had my work cut out for me. Command plugins are separate executables that Swift Package Manager invokes internally, and it is obnoxious to get a debugger attached to one. You can’t easily do it directly from within Xcode, because Xcode isn’t launching the executable. There’s a conversation about how to wrangle debugging a SwiftPM plugin on the Swift Forums that covers some of this. The way I resolved it this time was to put a long sleep() call in the code of the plugin, run it through SwiftPM, use the terminal to hunt down the process ID that SwiftPM invoked, and attach the debugger to that ID. This is kind of a nasty manual process, so I used sleep(30) – I’m just not that fast at wrangling all the tools for this. I managed to get attached… and then realized I didn’t need to.

While I was looking at the process list through the terminal to get the process ID, I spotted that the process in question (the plugin) was invoking yet ANOTHER process in turn. I actually knew this previously, and just plain forgot. The swift package preview-documentation command is a light wrapper around docc’s preview command. While I wasted some time with the plugin, this made debugging significantly less painful. I could invoke an example using the docc binary directly. And yeah, it moved the target for what had the bug – it wasn’t in swift-docc-plugin.

Debugging preview in DocC

I closed the issue in the plugin and opened a new issue in swift-docc, summarizing what I’d learned and how to reproduce the issue. It was the end of a day when I got to this point, so I left things alone and came back the next morning. When I opened the issue, I verified the issue using the version of docc released with the Swift toolchain. In my case – that meant the version included in the toolchain that ships with Xcode 16.1.

When I jumped back in, I intended to verify that the same issue exhibited with the latest code – against the main branch. The issue request form asks for any issue to be checked against main, to confirm it hasn’t already been resolved. There were also some comments in the issue – David referenced some other pending work that resolved some flaky tests, which – at a guess – might have an impact. So I buckled down to use the latest development branch of docc and repeat the process to verify the issue.

One of the quirks of verifying this issue is that docc is a separate project from the javascript single-page browser app (swift-docc-render) that displays the content. When you’re running docc from the main branch, it doesn’t know where that content lives – you need to tell it. Fortunately, that’s pretty easy. You set a specific environment variable and docc uses that to know where to load the content.

With that in place, and the example process invocation from my debugging the prior day, I had a way to run this directly. In the terminal, it looks something like:

export DOCC_HTML_DIR=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/share/docc/render/

/Users/heckj/src/swift-project/swift-docc/.build/debug/docc preview \
/Users/heckj/src/Voxels/Sources/Voxels/Documentation.docc \
--emit-lmdb-index \
--fallback-display-name Voxels \
--fallback-bundle-identifier Voxels \
--additional-symbol-graph-dir /Users/heckj/src/Voxels/.build/plugins/Swift-DocC\ Preview/outputs/.build/symbol-graphs/unified-symbol-graphs/Voxels-7 \
--output-path /Users/heckj/src/Voxels/.build/plugins/Swift-DocC\ Preview/outputs/Voxels.doccarchive

I prefer to use Xcode when debugging and fortunately that’s not too hard to arrange. In order to set it up, I opened the Package.swift file of the docc project with Xcode. It sets up the targets for you, and with a package like this, tends to default to the package target. I shifted the platform I was building for to “My Mac”, and the target to the docc executable. With those set, I opened “edit scheme” for that combination.

Xcode lets you set environment variables and pass arguments to the executable when you invoke “run”. That’s perfect for what I was doing – working to easily reproduce the issue where I could debug it.

The scheme editor in Xcode for the run mode, showing the arguments set and an environment variable to make it easier to debug docc.

I set the DOCC_HTML_DIR environment variable and set up the arguments from my example. One thing I had not caught when I first did this was that the path in one of the arguments included a space. Once I realized there was a space, and run wasn’t working, I added a \ character to escape the space in the name (within “Swift-DocC Preview/outputs/”). With that in place, I was able to run the code and see the results, as well as run the debugger. The issue was, indeed, repeating itself with the main branch.

Once I had that set up, checking the pull request that David mentioned in the issue was a piece of cake. I’ve had a poor time with Xcode handling changing a git branch underneath it, so I closed Xcode and updated the branch, using the gh executable as a helper. When you look at a pull request on GitHub, the CODE button provides you with a command line snippet that you can copy and paste to get it on your local machine. In this case:

gh pr checkout 1070

Once it was checked out, I re-opened Xcode, and it was just a matter of running a few more times. Fortunately, the scheme settings that Xcode uses when you tweak run arguments aren’t generally overwritten when you switch to another branch. I had some hopes this might solve the issue, but they were dashed pretty quickly. Within 5 runs, I was able to verify that the code update didn’t make a difference in my example.

Verifying beyond swift-docc-plugin

Since I didn’t get the quick win with the pull request, it was time to dig further. Switching back to the main branch, I took it from the top, starting by looking at how code gets executed within docc.

The entrance point for docc (https://github.com/swiftlang/swift-docc/blob/main/Sources/docc/main.swift) very quickly leads to a Swift Argument Parser setup (https://github.com/swiftlang/swift-docc/blob/main/Sources/SwiftDocCUtilities/Docc.swift). It’s quick to see it’s broken up into subcommands, one of which is preview. Finding the code for that subcommand is less than obvious just scanning the folders and files, but command-clicking in Xcode gets you right there: https://github.com/swiftlang/swift-docc/blob/main/Sources/SwiftDocCUtilities/ArgumentParsing/Subcommands/Preview.swift. The preview subcommand code, in turn, uses a PreviewAction that has a perform() function where the “work gets done”. The gist of which is:

  • run a convert action on the project
  • spin up a local HTML server and host the content that was just converted
  • display the details to view that server
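
The steps above can be sketched roughly as follows. This is pseudo-Swift paraphrase only – the type and method names here are my own shorthand for the flow, not DocC’s actual API:

```swift
// Rough shape of PreviewAction.perform(), paraphrased:
func perform() throws {
    // 1. run a convert action on the project to build the renderable archive
    try convertAction.perform()

    // 2. spin up a local HTML server hosting the content that was just converted
    let server = PreviewServer(contentDirectory: outputURL, port: 8080)
    try server.start()

    // 3. display the details to view that server
    print("Starting web server at http://localhost:8080 ...")
}
```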

When I first ran into this issue, what I saw from invoking a preview was an output that was missing the name of the module when it displayed the preview. I was reasonably familiar with what the contents of that directory should look like, so I ran it multiple times and captured a copy of the directory structure that it built when it worked, and another when it didn’t. Comparing the two, the difference was that the top-level module for my example project just wasn’t appearing. The directories included all the files for the symbols in my module, just not the module itself.

With that knowledge in hand, when I got to this “convert first, then display” setup, I knew the path to search down into was the convert action. I also knew that it had something to do with the top-level module, since all the symbols were there – it was missing the top level module in the output.

Spelunking into convert

If I had to pick a “heart of DocC”, it would be this conversion process.

The high-level workflow takes in two different kinds of data – one or more symbol graphs and a documentation catalog – and assembles a documentation archive from them. The result isn’t plain files with HTML inside them. Instead it’s a collection of the data that represent those symbols (in JSON) that can be rendered – into HTML, or really any other target. The rendering happens with a different bit of code (that’s the swift-docc-render project that I mentioned).

Symbol graphs are generated by the compiler (or other code, really – but generally by the compiler). But symbol graphs alone don’t have all the details in a form that’s easy to collect and render. The relationships between symbols, the type of each symbol, and so on, get cleaned and re-arranged in the convert process. It also mixes in the writing and resources that you provide in the docc catalog. This lets DocC override or add content, as well as provide things that don’t exist in the raw code symbols, such as articles and tutorials.

The code in ConvertAction is fairly complex, as there’s a bit of abstraction that makes it a little harder to parse. It abstracts the producer of data and the consumer, and has additional bits to support tracking documentation coverage, capture diagnostics for issues mixing the files together (so they can be played back to tooling), and other options, such as building an index. All this is encapsulated in the _perform method. That method, in turn, runs this bit of code:

conversionProblems = try ConvertActionConverter.convert(
  bundle: bundle,
  context: context,
  outputConsumer: outputConsumer,
  sourceRepository: sourceRepository,
  emitDigest: emitDigest,
  documentationCoverageOptions: documentationCoverageOptions
)

While ConvertActionConverter is a jump around the project code, it’s encapsulated pretty well, and fairly straightforward to read and understand. There’s an inner function, with a lot of comments, in the flow of that method that made it harder for me to track what was happening and where the function boundaries were. Once I realized what it was, I read around it to again look for “where’s the core work happening”.

The heart of the convert function is:

let entity = try context.entity(with: identifier)

guard let renderNode = converter.renderNode(for: entity) else {
    // No render node was produced for this entity, so just skip it.
    return
}
                    
try outputConsumer.consume(renderNode: renderNode)

This code is wrapped inside a call to context.knownPages.concurrentPerform, which iterates through the known pages. I wasn’t sure where it might be dropping the top-level module, so I started with good old-fashioned printf debugging. That also exposed a bunch of new types to explore and learn about.

I started off with a breakpoint on the bit of code that sets the entity from the identifier (the ResolvedTopicReference). Pretty quickly I realized there were a LOT of these, even in my smaller sample project, and stepping through each iteration was kind of horrible. To work around this, I reverted to a variation on printf debugging. I started adding in code to see what was happening, and more specifically to look for what I was after – the node in the end result that represented the top-level module. My first printf debugging worked on printing the name (a String) of the entity.

if entity.name.plainText == "Voxels" {
    print("FOUND IT!: \(entity.name)")
}

The first run through just printed them all – and generated something just over 500 lines, so I took a bit of time and looked through them all. Sure enough, somewhere in the middle of that list (they’re not processed in alphabetical order) was the top-level module name that I was looking for.

I started tracking how and where that got created, and its properties set. I expanded my exploration code to only track what was in knownPages to see what it was providing.

for page in context.knownPages {
    //print(page.absoluteString)
    if page.url.absoluteString == "doc://Voxels/documentation/Voxels" {
        print("PING")
    }
}

The knownPages is a computed property, filtering what’s stored in the context:

public var knownPages: [ResolvedTopicReference] {
    return topicGraph.nodes.values
        .filter { !$0.isVirtual && $0.kind.isPage }
        .map { $0.reference }
}

What I didn’t track at the time, and found a bit later, was that the filter statement turned out to be important. I didn’t fully understand the details of what made up a “resolved topic node”, and didn’t know what it meant to be “virtual” or not. While kind being page was fairly obvious, virtual can have a lot of meanings and implications.

In either case, when it was working correctly, the known pages included the url, and when it wasn’t, the page didn’t appear to exist. Because of that, I knew the issue was somewhere in the execution flow prior to where I was looking. I read back to see where things get set up and initialized.

The convert action sets up the context with the following code:

let context = try DocumentationContext(bundle: bundle, dataProvider: dataProvider, diagnosticEngine: diagnosticEngine, configuration: configuration)

The DocumentationContext initializer is pretty lean, deferring the more complex setup to an internal register function. I continued to trace that back further, and the register function uses and ultimately references another type: SymbolGraphLoader.

I was repeating my printf debugging by dropping that bit of code that looked for the URL earlier and earlier in the process, making sure it was there (or not) as I went. As I was getting to the registerSymbols function on DocumentationContext, I realized the data types didn’t include the URL. I needed to understand what was beneath. I started off looking for name on the underlying types, and quickly found a surprise – it was always there, even when the URL wasn’t.

That’s when I clued in to that filter added on knownPages. I realized the key difference wasn’t whether the node existed, but how it was set up. The node always existed; when the code was working, the isVirtual property was false, and when it failed, isVirtual was true.

What the hell is isVirtual, what’s it mean, and where does it come from?

I was a bit confused and frustrated at this point. It wasn’t that I couldn’t find a comment spelling out that isVirtual meant a node shouldn’t be rendered – I just didn’t understand the implications: where it all came from, what it meant across all those contexts, and why it was needed. It turns out I didn’t really need to know all that detail, but since it was what was different, I wanted to understand.

I took a bit of time to look at the raw JSON of a symbol graph file, and found that isVirtual comes from the compiler itself and is carried through for most symbol nodes. In that same process, I also realized that the symbol graph from the compiler did not have a symbol for the module itself. So something in the code that was loading the symbol graphs was adding a node for the module and setting its value – sometimes incorrectly. I continued to have this hypothesis that it was some wacky race condition in the code that hadn’t been spotted. And it sort of was, but at the algorithm level, not the code-threading level.

As a side note, isVirtual in a documentation node fundamentally means “don’t render this”. The idea being that it’s only there to link things together – relationships, overlays, etc.

SymbolGraphLoader and SymbolKit

The symbol graph loader takes in one or more symbol graph files, merges them all together, cleans them up a smidge, and creates the nodes needed to represent the higher level connections and relationships. While I still didn’t fully grok the isVirtual property and its implications, I knew that how it was being set for the module level node was what I cared about.

When I was looking for that code, I found the following:

private static func moduleNameFor(_ symbolGraph: SymbolGraph, at url: URL) -> (String, Bool)

I hadn’t yet joined the dots to see where it was set up, but knowing whether something was a “main” symbol graph or not sounded promising. I kept that in mind while I dug further, and found where the loader was collecting the symbol graphs, annotating them, and merging them. The SymbolGraphLoader fed them all into an instance of GraphCollector. And that code is from a different library: SymbolKit.

Scanning through that code, I found the same function: moduleNameFor. Same parameters and outputs – a public symbol in SymbolKit, private in DocC. I’m guessing it started in Swift DocC and was later extracted out into the library. The end result was identical logic in two places, so I made a note to clean that up later.

The GraphCollector turned out to be key. It holds the source for the data that’s used to determine the top-level module node.

The mergeSymbolGraph method in the graph collector pulls everything together. Within the collector, the data about the graphs is stored in dictionaries keyed by the name of the module. In addition to providing a unified graph per module name, it also keeps track of each module it loads, marking it as either a primary module or an extension module. The “is it the primary module graph” setting uses the logic in moduleNameFor.

You provide this function a loaded symbol graph and its URL, and it returns the name of the module described in the graph and whether it is a primary module. The key logic that makes this determination is the following line:

let isMainSymbolGraph = !url.lastPathComponent.contains("@")

The presumption when this was written was that all symbol graphs would come from the compiler, and that the ones extending an existing symbol graph would have an @ symbol in their name. Snippets blow up that assumption. The snippet-extractor code, which creates the symbol graph from snippets, names the symbol graph file YourModule-snippets.symbols.json. Because the name doesn’t include an @ symbol, snippet graph files were being regarded as “primary” graphs.
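A tiny illustration of how that heuristic behaves with the two filename shapes involved (the module names here are made up):

```swift
import Foundation

// Filenames as produced by the compiler (extension graphs contain "@")
// versus by the snippet-extractor (no "@"):
let extensionGraph = URL(fileURLWithPath: "MyModule@OtherModule.symbols.json")
let snippetGraph = URL(fileURLWithPath: "MyModule-snippets.symbols.json")

// The heuristic from the line above:
print(!extensionGraph.lastPathComponent.contains("@")) // false – correctly treated as an extension
print(!snippetGraph.lastPathComponent.contains("@"))   // true – wrongly treated as "main"
```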

Back in the DocumentationContext, there’s an extension on SymbolGraphLoader that provides the URL for the module – mainModuleURL – and this is where the flaw exhibits. The extension’s method uses first on the list of graphLocations from the collector to get the primary module, which assumes there’s only one. When more than one exists, it returns a non-deterministic result: sometimes the one referenced from the snippet symbol graph, and other times the “right” symbol graph.
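To illustrate the failure mode (with invented stand-in types – the collector’s real structures differ), once the snippet graph is also flagged primary, first(where:) has two candidates, and when the underlying storage is an unordered Dictionary there’s no guarantee which one it sees first:

```swift
import Foundation

// Hypothetical stand-in for the collector's bookkeeping:
enum GraphKind { case primary, extensionGraph }
let graphLocations: [(url: URL, kind: GraphKind)] = [
    (URL(fileURLWithPath: "Voxels-snippets.symbols.json"), .primary), // wrongly flagged primary
    (URL(fileURLWithPath: "Voxels.symbols.json"), .primary),
]

// With two "primary" entries coming from unordered storage, the result
// can differ from run to run:
let mainURL = graphLocations.first(where: { $0.kind == .primary })?.url
```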

There’s no direct path to easily find where and how that’s used to set up the URL. The line of code that pulls this detail:

let fileURL = symbolGraphLoader.mainModuleURL(forModule: moduleName)

uses that later to mix together the topic graph:

addSymbolsToTopicGraph(symbolGraph: unifiedSymbolGraph, url: fileURL, symbolReferences: symbolReferences, moduleReference: moduleReference)

This is what ultimately mixes the isVirtual property into the topic graph, and that’s what sets isVirtual on the module – assuming there’s only one primary graph and using the “first” one it grabs.

Fixing the issue

The work above took place over the course of 3 days and resulted in 3 issues reported, one each to swift-docc-plugin, swift-docc, and swift-docc-symbolkit. The first of those I closed as soon as I realized it had nothing to do with the issue.

I stripped back out my printf debugging code, and in the end there were two relatively small changes that I made into pull requests – one for the fix in SymbolKit, and a supplement that just cleaned things up a bit in DocC.

I’ve proposed a solution that changes the logic in SymbolKit to support the fundamental assumption that “there can be only one” main symbol graph. I did think about trying to represent snippets as a different type in the collector (other than primary and extension), but I spotted a number of other places in the code that had that two-type idea heavily built in. They also leveraged first() to get at the primary module, if it existed. Since snippets were added a couple of years ago, and this hadn’t been identified and debugged, I wasn’t sure what was expected for the function returning the name, or for the processing of snippet symbol graphs. I opted for a change that tweaks the logic in SymbolKit, adding an inspection of the isVirtual property in the metadata of the module in the symbol graph, in addition to verifying there wasn’t an @ symbol in the filename.
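In sketch form, the revised check looks something like the following. The property path here is my assumption about where the metadata lives, not the exact patch:

```swift
// Hedged sketch: a graph only counts as "main" if the filename has no "@"
// AND the module it describes isn't marked virtual in its metadata.
let isMainSymbolGraph = !url.lastPathComponent.contains("@")
    && !symbolGraph.module.isVirtual
```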

I also opened a supplemental PR in docc to de-duplicate that logic and keep it all in one place. The main issue (the comments in which are a summarized, short form of this post) looks like it’ll be fully resolved by the change in SymbolKit. But since I’d done the digging, and noticed the duplication, I figured it wouldn’t hurt to help clean things up a bit.

My Favorite Swift 6 feature: static library compilation on Linux

There is a lot of great stuff coming in the Swift programming language. I love the focus and effort on validating data-race safety, and that’s probably the feature set I’ll spend the most time with. But my favorite new tidbit? Swift 6 now supports a Linux SDK and the ability to compile a stand-alone, statically linked binary.

The real detail for all this is in the blog post on Swift.org: Getting Started with the Static Linux SDK. It’s a capability that makes deploying server-side applications to Linux much easier. I’ve been looking forward to this capability for quite a while.

Statically linked binaries are a standard Go language feature. To me, they were a huge enabler of Go’s sweep across the cloud-services open-source space over the past decade (mostly with the wave of Kubernetes). I hope that after Swift 6 is fully released, this capability serves the same purpose.

Beyond using Swift for server-side apps, there’s a whole realm of expansion on where and how you can use Swift. The other place that really calls it out this year is the project’s fantastic push “into the small” with an explicit “Embedded Swift” mode – a strict subset of the features that make it both possible, and effective, to take advantage of the Swift language safety features while deploying to extremely constrained compute – microcontrollers. Watch the video from this year’s WWDC: Go small with Embedded Swift, to get more details if that sounds interesting.

Class 5 Geomagnetic Storm

Images from adjacent to downtown Seattle (meaning a LOT of light pollution), from 11:20 to 11:50pm local time, May 10th.

Most of this was nearly impossible to see at this color with the naked eye. They seemed like wispy clouds, and only the very brightest tints of red or green would start to hint against the sky. Use an iPhone camera though…

Designing a Swift library with data-race safety

I cut an initial release (0.1.0-alpha) of the library automerge-repo-swift. A supplemental library to Automerge swift, it adds background networking for sync and storage capabilities. The library extends code I initially created in the Automerge demo app (MeetingNotes) that was common enough to warrant its own library. While I was extracting those pieces, I leaned into the same general pattern that was used in the Javascript library automerge-repo, which provides largely the same functionality for Automerge in javascript. I borrowed the public API structure, as well as compatibility and implementation details for the Automerge sync protocol. One of my goals while assembling this new library was to build it fully compliant with Swift’s data-race safety – meaning that it compiles without warnings under the Swift compiler’s strict-concurrency mode.

There were some notable challenges in coming up to speed with the concepts of isolation and sendability. Beyond learning the concepts, how to apply them is still an open question. Not many Swift developers have embraced strict concurrency and talked about the trade-offs or implications of their choices, so there’s relatively little shared knowledge about the trade-offs to make when protecting mutable state. This post shares some of the stumbling blocks I hit, choices I made, and lessons I’ve learned. My hope is that it helps other developers facing a similar challenge.

Framing the problem

When I try to learn and apply new knowledge to solve these kinds of “new fangled” problems, I start by working out how to think about the problem. I’ve not come up with a good way to ask other people how to do that. When I frame the problem with good first principles in mind, the trade-offs in solutions become easier to understand. Sometimes the answers are even self-evident.

The foremost principle in strict concurrency is “protect your mutable state”. The compiler warnings give you feedback about potential hazards and data races. In Swift, protecting state uses the concept of an “isolation domain”. My layman’s take on isolation is “how can the compiler verify that only one thread is accessing this bit of data at a time”. There are some places where the compiler infers the state of isolation, and some of those rules are still changing as we progress toward Swift 6. When you’re writing code, the compiler knows what is isolated (and non-isolated) – either by itself or based on what you annotated. When the compiler infers an isolation domain, that detail is not (yet?) easily exposed to developers. It really only shows up when there’s a mismatch between your assumptions and what the compiler thinks, and it issues a strict-concurrency warning.

Sendability is the second key concept. In my layman’s terms again, something that is sendable is safe to cross over thread boundaries. With Swift 5.10, the compiler has enough knowledge of types to be able to make guarantees about what is safe, and what isn’t.
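As a tiny illustration of what the compiler can and can’t guarantee (my own example, unrelated to the library’s actual types):

```swift
import Foundation

// A value type built from Sendable pieces is Sendable "for free":
struct SyncState: Sendable {
    var lastHeartbeat: Date
    var peerID: String
}

// A class with mutable stored state is not Sendable – handing an instance
// across an isolation boundary draws a strict-concurrency warning:
final class ConnectionStats {
    var bytesReceived = 0
}
```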

The first thing I did was lean heavily into making anything and everything Sendable. In hindsight, that was a bit of a mistake. Not disastrous, but I made a lot more work for myself. Not everything needs to be sendable. Taking advantage of isolation, it is fine – sometimes notably more efficient and easier to reason about – to have and use non-sendable types within an isolation domain. More on that in a bit.

My key to framing up the problem was to think in terms of making explicit choices about what data should be in an isolation region along with how I want to pass information from one isolation domain to another. Any types I pass (generally) need to be Sendable, and anything that stays within an isolation domain doesn’t. For this library, I have a lot of mutable state: networking connections, updates from users, and a state machine coordinating it all. All of it is needed so a repository can store and synchronize Automerge documents. Automerge documents themselves are Sendable (I had that in place well before starting this work). I made the Automerge documents sendable by wrapping access and updates to anything mutable within a serial dispatch queue. (This was also needed because the core Automerge library – a Rust library accessed through FFI – was not safe for multi-threaded use).
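That queue-wrapping pattern looks roughly like this – a minimal sketch with invented names, not the actual Automerge code. The @unchecked Sendable annotation is the developer asserting the safety that the compiler can’t prove on its own:

```swift
import Foundation

final class SerializedDocument: @unchecked Sendable {
    private let queue = DispatchQueue(label: "example.document.access")
    private var contents: [String: String] = [:]

    // All reads and writes funnel through the serial queue, so only one
    // thread ever touches `contents` at a time.
    func set(_ value: String, forKey key: String) {
        queue.sync { contents[key] = value }
    }

    func value(forKey key: String) -> String? {
        queue.sync { contents[key] }
    }
}
```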

Choosing Isolation

I knew I wanted to make at least one explicit isolation domain, so the first question was “actor or isolated class?” Honestly, I’m still not sure I understand all the trade-offs. Without knowing what the effect would be to start with, I decided to pick “let’s use actors everywhere” and see how it went. Some of the method calls in the design of the Automerge repository were easily and obviously async, so that seemed like a good first cut. I made the top-level repo an actor, and then I kept making any internal type that had mutable state its own actor as well. That included a storage subsystem and a network subsystem, both of which I built to let someone else provide the network or storage provider external to this project. To support external plugins that work with this library, I created protocols for the storage and network providers, as well as one that the network providers use to talk back to the repository.

The downside of that choice was two-fold – first setting things up, then interacting with it from within a SwiftUI app. Because I made every-darn-thing an actor, I had to await a response, which meant a lot of potential suspension points in my code. That also propagated upward to imply that even setup needed to be done within an async context. Sometimes that’s easy to arrange, but other times it ends up being a complete pain in the butt. More specifically, quite a few of the current Apple-provided frameworks don’t provide a clear path to integrate async setup hooks. The server-side Swift world has a lovely “set up and run” mechanism it is adopting (swift-service-lifecycle), but Apple hasn’t provided a similar concept in the frameworks it ships. The one that bites me most frequently is the SwiftUI app and document-based app lifecycle, which are all synchronous.

Initialization Challenges

Making the individual actors – Repo and the two network providers I created – initializable with synchronous calls wasn’t too bad. The stumbling block I hit (that I still don’t have a great solution to) was when I wanted to add and activate the network providers to a repository. To arrange that, I’m currently using a detached Task that I kick off in the SwiftUI App’s initializer:

public let repo = Repo(sharePolicy: .agreeable)
public let websocket = WebSocketProvider()
public let peerToPeer = PeerToPeerProvider(
    PeerToPeerProviderConfiguration(
        passcode: "AutomergeMeetingNotes",
        reconnectOnError: true,
        autoconnect: false
    )
)

@main
struct MeetingNotesApp: App {
    var body: some Scene {
        DocumentGroup {
            MeetingNotesDocument()
        } editor: { file in
            MeetingNotesDocumentView(document: file.document)
        }
        .commands {
            CommandGroup(replacing: CommandGroupPlacement.toolbar) {
            }
        }
    }

    init() {
        Task {
            await repo.addNetworkAdapter(adapter: websocket)
            await repo.addNetworkAdapter(adapter: peerToPeer)
        }
    }
}

Swift Async Algorithms

One of the lessons I’ve learned is that if you find yourself stashing a number of actors into an array, and you’re used to interacting with them using functional methods (filter, compactMap, etc), you need to deal with the asynchronous access. The standard library built-in functional methods are all synchronous. Because of that, you can only access non-isolated properties on the actors. For me, that meant working with non-mutable state that I set up during actor initialization.

The second path (and I went there) was to take on a dependency to swift-async-algorithms, and use its async variations of the functional methods. They let you “await” results for anything that needs to cross isolation boundaries. And because it took me an embarrassingly long time to figure it out: If you have an array of actors, the way to get to an AsyncSequence of them is to use the async property on the array after you’ve imported swift-async-algorithms. For example, something like the following snippet:

let arrayOfActors: [YourActorType] = []
let filteredResults = arrayOfActors.async.filter(...)
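If you’d rather not take the dependency, the same effect is achievable with a plain loop, awaiting each actor-isolated property as you go. The following is a minimal, stdlib-only sketch of the idea; the `Worker` actor and its `isActive` property are illustrative stand-ins, not types from my library:

```swift
// Illustrative actor: the synchronous `filter` from the standard library
// can't await `isActive`, so we iterate asynchronously instead.
actor Worker {
    let id: Int
    private(set) var isActive: Bool

    init(id: Int, active: Bool) {
        self.id = id
        self.isActive = active
    }
}

// The async equivalent of `filter` over actor-isolated state;
// each `await` is a potential suspension point.
func activeWorkers(in workers: [Worker]) async -> [Worker] {
    var results: [Worker] = []
    for worker in workers {
        if await worker.isActive {
            results.append(worker)
        }
    }
    return results
}
```

The `.async` property from swift-async-algorithms gives you the same capability in a functional style, without writing the loop yourself.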

Rethinking the isolation choice

That was my first version of this library. I got it functional, then turned around and tore it apart again. In making everything an actor, I was making LOTS of little isolation regions that the code had to hop between. With all the suspension points, that meant a lot of possible re-ordering of what was running. I had to be extraordinarily careful not to assume a copy of some state I’d nabbed earlier was still the same after the await. (I still have to be, but it was a more prominent issue with lots of actors.) All of this boils down to being aware of actor re-entrancy, and when it might invalidate something.

I knew that I wanted at least one isolation region (the repository). I also wanted to keep mutable state in separate types to preserve a separation of duties. One particular class highlighted my problems – a wrapper around NWConnection that tracks additional state and handles the Automerge sync protocol. It was getting really darned inconvenient with the large number of await suspension points.

I slowly clued in that it would be a lot easier if that were all synchronous – and there was no reason it couldn’t be. In my ideal world, I’d have the type Repo (my top-level repository) as a non-global actor, and isolate any classes it used to the same isolation zone as that one, non-global, actor. I think that’s a capability that’s coming; at least, I couldn’t see how to arrange it today with Swift 5.10. Instead I opted to make a single global actor for the library and switch what I previously set up as actors to classes isolated to that global actor.

That let me simplify quite a bit, notably when dealing with the state of connections within a network adapter. What surprised me was how few warnings appeared when I switched from actors to isolated classes. The warnings were mostly that calls had dropped back to synchronous and no longer needed await. That was quick to fix up; the change to isolated classes was much faster and easier than I anticipated. After I made the initial changes, I went through the various initializers and associated configuration calls to make more of it explicitly synchronous. The end result was more code that could be set up (initialized) without an async context. And finally, I updated how I handled the networking so that as I needed to track state, I didn’t absolutely have to use the async algorithms library.
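As a minimal sketch of the shape this ended up taking (names simplified; the real types carry much more state): one library-wide global actor, with the former actors becoming classes isolated to it, so calls between them within the same isolation are synchronous again.

```swift
// One global actor for the whole library.
@globalActor
actor AutomergeRepo {
    static let shared = AutomergeRepo()
}

// Previously `actor PeerConnection`; now a class pinned to the library's
// global actor. Its mutable state is still protected, but calls from
// other AutomergeRepo-isolated code need no `await`.
@AutomergeRepo
final class PeerConnection {
    private(set) var state: String = "idle"

    // A nonisolated initializer keeps setup usable from a synchronous context.
    nonisolated init() {}

    func update(to newState: String) {
        state = newState
    }
}
```

From outside the isolation domain you still `await`, but everything inside `@AutomergeRepo` composes synchronously.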

A single global actor?

A bit of a side note: I thought about making Repo a global actor, but I prefer not to demand a singleton-style usage of the library. That choice made it much easier to host multiple repositories when it came time to run functional tests with a mock in-memory network, or integration tests with the actual providers. I’m still a slight bit concerned that I might be adding to a long-term proliferation of global actors from libraries – but it seems like the best solution at the moment. I’d love it if I could do something that indicated “all these things need a single isolation domain, and you – developer – are responsible for providing one that fits your needs”. I’m not sure that kind of concept is even on the table for future work.

Recipes for solving these problems

If you weren’t already aware of it, Matt Massicotte created a GitHub repository called ConcurrencyRecipes. This is a gemstone of knowledge, hints, and possible solutions. I leaned into it again and again while building (and rebuilding) this library. One of the “convert it to async” challenges I encountered was providing an async interface to my own peer-to-peer network protocol. I built the protocol using the Network framework (based partially on Apple’s sample code), which is all synchronous code and callbacks. At a high level, I wanted it to act similarly to URLSessionWebSocketTask – the gist being that a connection has an async send() and an async receive() for sending and receiving messages on the connection. With an async send and receive, you can readily assemble several different patterns of access.

To get there, I used a combination of CheckedContinuation (both the throwing and non-throwing variations) to work with what NWConnection provided. I wish that was better documented; how to properly use those APIs is opaque, but that is a digression for another time. I’m particularly happy with how my code worked out, including adding a method on the PeerConnection class that used structured concurrency to handle a timeout mechanism.
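The continuation pattern itself is straightforward once you’ve seen it. Here’s a hedged, self-contained sketch using a stand-in callback function (`legacyReceive`, invented for illustration) in place of NWConnection’s receive API; the real code wraps the Network framework callbacks the same way:

```swift
import Foundation

// Stand-in for a callback-based API such as NWConnection's receive.
func legacyReceive(completion: @escaping (Data?, Error?) -> Void) {
    DispatchQueue.global().async {
        completion(Data("hello".utf8), nil)
    }
}

// Bridge the callback into async/await with a checked continuation.
// The continuation must be resumed exactly once, on exactly one path.
func receiveMessage() async throws -> Data {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Data, Error>) in
        legacyReceive { data, error in
            if let error {
                continuation.resume(throwing: error)
            } else if let data {
                continuation.resume(returning: data)
            } else {
                // Neither data nor error: treat as a cancelled connection.
                continuation.resume(throwing: CancellationError())
            }
        }
    }
}
```

The “checked” variant traps at runtime if you resume twice or never resume, which is exactly the kind of mistake that’s easy to make when a callback has multiple exit paths.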

Racing tasks with structured concurrency

One of the harder warnings for me to understand was related to racing concurrent tasks in order to create an async method with a “timeout”. I stashed a pattern for how to do this in my notebook with references to Beyond the basics of structured concurrency from WWDC23.

If the async task returns a value, you can set it up something like this (this is from PeerToPeerConnection.swift):

let msg = try await withThrowingTaskGroup(of: SyncV1Msg.self) { group in
    group.addTask {
        // retrieve the next message
        try await self.receiveSingleMessage()
    }

    group.addTask {
        // Race the receive call against a one-shot timeout
        try await Task.sleep(for: explicitTimeout)
        throw SyncV1Msg.Errors.Timeout()
    }

    guard let msg = try await group.next() else {
        throw CancellationError()
    }
    // cancel all ongoing tasks (the websocket receive request, in this case)
    group.cancelAll()
    return msg
}

There’s a niftier version available in Swift 5.9 (which I didn’t use) for when you don’t care about the return value:

func run() async throws {
    try await withThrowingDiscardingTaskGroup { group in
        for cook in staff.keys {
            group.addTask { try await cook.handleShift() }
        }

        group.addTask { // keep the restaurant going until closing time
            try await Task.sleep(for: shiftDuration)
            throw TimeToCloseError()
        }
    }
}

With the Swift 5.10 compiler, my direct use of this displayed a warning:

warning: passing argument of non-sendable type 'inout ThrowingTaskGroup<SyncV1Msg, any Error>' outside of global actor 'AutomergeRepo'-isolated context may introduce data races

guard let msg = try await group.next() else {
                          ^

I didn’t really understand the core of this warning, so I asked on the Swift forums. VNS (on the forums) had run into the same issue and helped explain it:

It’s because withTaskGroup accepts a non-Sendable closure, which means the closure has to be isolated to whatever context it was formed in. If your test() function is nonisolated, it means the closure is nonisolated, so calling group.waitForAll() doesn’t cross an isolation boundary.

The workaround to handle the combination of non-sendable closures and TaskGroup is to make the async method that runs this code nonisolated. In the context where I was using it, the class that contains this method is isolated to a global actor, so the method was inheriting that context. By switching the method to be explicitly nonisolated, the compiler doesn’t complain about group being isolated to that global actor.
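A compressed sketch of the workaround, under the same setup (a class isolated to the library’s global actor; names simplified for illustration):

```swift
@globalActor
actor AutomergeRepo {
    static let shared = AutomergeRepo()
}

@AutomergeRepo
final class MessageReceiver {
    nonisolated init() {}

    // Explicitly nonisolated: the task-group closure is then formed in a
    // nonisolated context, so `group.next()` no longer crosses an isolation
    // boundary and the Swift 5.10 warning goes away.
    nonisolated func firstValue() async throws -> Int {
        try await withThrowingTaskGroup(of: Int.self) { group in
            group.addTask { 42 }
            guard let value = try await group.next() else {
                throw CancellationError()
            }
            group.cancelAll()
            return value
        }
    }
}
```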

Sharing information back to SwiftUI

These components have all sorts of interesting internal state, some of which I wanted to export – for example, information from the network providers to drive a user interface in SwiftUI. I want to be able to choose to connect to endpoints, to share what endpoints might be available (from the NWBrowser embedded in the peer-to-peer network provider), and so forth.

I first tried to lean into AsyncStreams. While they make a great local queue for a single point-to-point connection, I found them far less useful for making a general firehose of data that SwiftUI knows how to read and react to. While I tried to use all the latest techniques, to handle this part I went back to my old friend Combine. Some people assert that Combine is dead and dying – but boy, it works. And most delightfully, you can have any number of endpoints pick up and subscribe to a shared publisher, which was perfect for my use case. Top that off with SwiftUI having great support to receive streams of data from Combine, and it was an easy choice.

I ended up using Combine publishers to make a few feeds of data from the PeerToPeerProvider. They share information about what other peers are available, the current state of the listener (that accepts connections) and the browser (that looks for peers), and lastly a publisher that provides information about active peer-to-peer connections. I feel that worked out extremely well. It worked so well that I made an internal publisher (not exposed via the public API) for tests to get events and state updates from within a repository.
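A minimal sketch of the shape, with hypothetical names (`AvailablePeer`, `availablePeerPublisher`) standing in for what PeerToPeerProvider actually exposes – not its real API:

```swift
import Combine

// Hypothetical peer description, a stand-in for the provider's real types.
struct AvailablePeer: Identifiable {
    let id: String
    let displayName: String
}

final class PeerFeed {
    // A shared subject; any number of subscribers can attach to the
    // type-erased publisher it backs.
    private let peerSubject = PassthroughSubject<[AvailablePeer], Never>()

    var availablePeerPublisher: AnyPublisher<[AvailablePeer], Never> {
        peerSubject.eraseToAnyPublisher()
    }

    // Called internally as the provider's browser discovers peers.
    func report(_ peers: [AvailablePeer]) {
        peerSubject.send(peers)
    }
}
```

A SwiftUI view can then pick this up with `.onReceive(feed.availablePeerPublisher) { peers in … }`, or a view model can hold a sink that updates an `@Published` property.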

Integration Testing

It’s remarkably hard to usefully unit test network providers. Instead of unit testing, I made a separate Swift project for the purposes of running integration tests. It sits in its own directory in the git repository and references automerge-repo-swift as a local dependency. A side effect is that it let me add in all sorts of wacky dependencies that were handy for the integration testing, but that I really didn’t want exposed and transitive for the main package. I wish that Swift packages had a means to identify test-only dependencies that didn’t propagate to other packages for situations like this. Ah well, my solution was a separate sub-project.

Testing using the Combine publisher worked well, although it took a little digging to figure out the correct way to set up and use expectations with async XCTests. It feels a bit exhausting to assemble the expectations and fulfillment calls, but it’s quite possible to get working. If you want to see this in operation, take a look at P2P+explicitConnect.swift. I started to look at potentially using the upcoming swift-testing, but with limited Swift 5.10 support, I decided to hold off for now. If it makes asynchronous testing easier down the road, I may well adopt it quickly after its initial release.

The one quirky place that I ran into with that API setup was that expectation.fulfill() gets cranky with you if you call it more than once. My publisher wasn’t quite so constrained with state updates, so I ended up cobbling together a boolean latch variable in a sink when I didn’t have a sufficiently constrained closure.

The other quirk in integration testing is that while it works beautifully on a local machine, I had trouble getting it to work in CI (using GitHub Actions). Part of the issue is that swift test currently defaults to running all possible tests at once, in parallel. Especially for integration testing of peer-to-peer networking, that meant a lot of network listeners and browsers getting shoved together at once on the local network. I wrote a script to list out the tests and run them one at a time. Even breaking it down like that didn’t consistently get through CI. I also tried higher wait times (120 seconds) on the expectations. When I run them locally, most of those tests take about 5 seconds each.

The test that was a real challenge was the cross-platform one. Automerge-repo has a sample sync server (NodeJS, using Automerge through WASM). I created a docker container for it, and my cross-platform integration test pushes and pulls documents to an instance that I can run in Docker. Well… Docker isn’t available for macOS runners, so that’s out for GitHub Actions. I have a script that spins up a local docker instance, and I added a check into the WebSocket network provider test – if it couldn’t find a local instance to work against, it skips the test.

Final Takeaways

Starting with a plan for isolating state made the choices of how and what I used a bit easier, and reaching for global-actor-constrained classes made synchronous use of those classes much easier. For me, this mostly played out in better (synchronous) initializers and in dealing with collections using functional programming patterns.

I hope there’s some planning/thinking in SwiftUI to update or extend the app structure to accommodate async hooks for things like setup and initialization (FB9221398). That should make it easier for a developer to run an async initializer and verify that it didn’t fail, before continuing into the normal app lifecycle. Likewise, I hope that the document-based APIs gain an async context to work with documents and likewise handle asynchronous tasks (FB12243722). Both of these spots are very awkward places for me.

Once you shift to using asynchronous calls, it can have a ripple effect in your code. If you’re looking at converting existing code, start at the “top” and work down. That helped me to make sure there weren’t secondary complications with that choice (such as a need for an async initializer).

Better yet, step back and take the time to identify where mutable state exists. Group it together as best you can, and review how you’re interacting with it, and in what isolation region. In the case of things that need to be available to SwiftUI, you can likely isolate methods appropriately (*cough* MainActor *cough*). Then make the parts you need to pass between isolation domains Sendable. Recognize that in some cases, it may be fine to do the equivalent of “here was the state at some recent moment, if you want to react to that”. There are several places where I pass back a summary snapshot of mutable state to SwiftUI to use in UI elements.
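One way to sketch that snapshot idea (the types and names here are illustrative, not from my library): a Sendable value type that summarizes actor state at a moment in time, and can safely cross isolation boundaries into SwiftUI.

```swift
// A point-in-time summary that's safe to hand across isolation boundaries.
struct ConnectionSnapshot: Sendable {
    let activePeers: Int
    let isListening: Bool
}

actor NetworkState {
    private var peers: Set<String> = []
    private var listening = false

    func add(peer: String) { peers.insert(peer) }
    func startListening() { listening = true }

    // "Here was the state at some recent moment" - the caller gets an
    // immutable copy, not live access to the actor's mutable state.
    func snapshot() -> ConnectionSnapshot {
        ConnectionSnapshot(activePeers: peers.count, isListening: listening)
    }
}
```

The snapshot can then be published (via Combine or an `@Published` property on a MainActor-isolated model) without any risk of racing the actor’s ongoing mutations.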

And do yourself a favor and keep Matt’s Concurrency Recipes on speed-dial.

Before I finished this post, I listened to episode 43 of the Swift Package Index podcast. It’s a great episode, with Holly Borla, compiler geek and manager of the Swift language team, on as a guest to talk about Swift 6. A tidbit she shared was that they are creating a Swift 6 migration guide, to be published on the swift.org website. Something to look forward to, in addition to Matt’s collection of recipes!

Distributed Tracing with Testing on iOS and macOS

This weekend I was frustrated with my debugging, and just not up to digging in and carefully, meticulously analyzing what was happening. So … I took a left turn (at Albuquerque) and decided to explore an older idea to see if it was interesting and/or useful. My challenging debugging was all about network code, for a collaborative, peer-to-peer sharing thing; more about that effort some other time.

A bit of back story

A number of years ago, when I was working with a solar energy manufacturer, I was living and breathing events, APIs, and running very distributed systems, sometimes over crap network connections. One of the experiments I did (that worked out extremely well) was to enable distributed tracing across all the software components, collecting and analyzing traces to support integration testing. Distributed tracing, and the now-popular CNCF OpenTelemetry project, weren’t a big thing then, but they were around – kind of getting started. The folks (Yuri Shkuro, at least) at Uber had released Jaeger, an open-source trace collector with web-based visualization, which was enough to get started. I wrote about that work back in 2019 (that post still gets some recurring traffic from search engines, although it’s pretty dated now and not entirely useful).

We spun up our services, enabled tracing, and ran integration tests on the whole system. After which, we had the traces available for visual review. It was useful enough that we ended up evolving it so that a single developer could stand up most of their pieces locally (with a sufficiently beefy machine), and capture and view the traces locally. That provided a great feedback loop as they could see performance and flows in the system while they were developing fixes, updates and features. I wanted to see, this time with an iOS/macOS focused library, how far I could get trying to replicate that idea (time boxed to the weekend).

The Experiment!

I’ve been loosely following the server-side Swift distributed tracing efforts since they started, and it looked pretty clear that I could use them directly. Moritz Lang publishes swift-otel, a Swift-native, concurrency-supporting library. With his examples, it was super quick to hack into my test setup. The library is set up to run with service-lifecycle pieces over SwiftNIO, so there’s a pile of dependencies that come in with it. I’d be a little hesitant to add that to my library directly, but for an integration test setup, I’m totally good with it. There were some quirks to using it with XCTest, most of which I hacked around by shoving the tracer setup into a global actor and exposing an idempotent bootstrap call. With that in place, I added explicit traces into my tests, and then started adding more and more, including into my library, and could see the results in a locally running instance of Jaeger (running Jaeger using Docker).
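The idempotent-bootstrap guard is the piece most worth showing. This is a hedged sketch with the actual swift-otel configuration elided (its API is still shifting); the global actor serializes concurrent XCTest cases racing to do the setup:

```swift
// A global actor so parallel test cases can't race the tracer setup.
@globalActor
actor TracingBootstrap {
    static let shared = TracingBootstrap()

    private var bootstrapped = false
    private(set) var setupCount = 0

    // Safe to call from every test; only the first call does the work.
    func bootstrapIfNeeded() {
        guard !bootstrapped else { return }
        bootstrapped = true
        setupCount += 1
        // The swift-otel tracer configuration and the
        // InstrumentationSystem bootstrap call would go here.
    }
}
```

Each test then starts with `await TracingBootstrap.shared.bootstrapIfNeeded()` and the underlying tracing system only gets bootstrapped once, no matter how many tests run or in what order.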

Some Results

The following image is an overview of the traces generated by a single test (testCreate):

The code I’m working with is all pushing events over web sockets, so inside of the individual spans (which are async closures in my test) I’ve dropped in some span events, one of which is shown in detail below:

In a lot of respects, this is akin to dropping in os_signposts that you might view in Instruments, but it’s external to Xcode infrastructure. Don’t get me wrong, I love Instruments and what it does – it’s been amazing and really the gold standard in tooling for me for years – but I was curious how far this approach would get me.

Choices and Challenges

Using something like this in production – with live-running iOS or macOS apps – would be another great end-to-end scenario. More so if the infrastructure your app was working from also used tracing. There’s a separate tracing project at CNCF – OpenTelemetry Swift – that looks oriented towards doing just that. I seriously considered using it, but I didn’t see a way to use that package to instrument my library and not bring in the whole pile of dependencies. With the swift-distributed-tracing library, it’s an easy (and small) dependency add – and you only need to take the hit of the extra dependencies when you want to use the tracing.

And I’ll just “casually” mention that if you pair this with server-side Swift efforts, the Hummingbird project has support for distributed tracing currently built in. I expect Vapor support isn’t too far off, and it’s a continued focus to add more distributed tracing support for a number of prevalent server-side Swift libraries over this coming summer.

See for Yourself (under construction/YMMV/etc)

I’ve tossed up my hack-job of a wrapper for tracing during testing with iOS and macOS – DistributedTracer, if you want to experiment with this kind of thing yourself. Feel free to use it, although if you’re amazed with the results – ALL credit should go to Moritz, the contributors to his package and the contributors to swift-distributed-tracing, since they did the heavy lifting. The swift-otel library itself is undergoing some major API surface changes – so if you go looking, I worked from the current main branch rather than the latest release. Moritz shared with me that while the API was not completely solid yet, this is more of the pattern he wants to expose for an upcoming 1.0 release.

Onward from here

I might push the DistributedTracer package further in the future. I think there’s real potential there, but it is not without pitfalls. Some of the challenges stem from constantly exporting data from an iOS app, so there’s a privacy (and privacy manifest) bit that needs to be seriously considered. There are also challenges with collecting enough data (but not too much), related choices in sampling so that it aligns with traces generated from infrastructure, as well as how to reliably transfer it from device to an endpoint. Nothing that can’t be overcome, but it’s not a small amount of work either.

Weekend hacking complete, I’m calling this a successful experiment. Okay, now back to actually debugging my library…

Embedding a privacy manifest into an XCFramework

During WWDC 2023, Apple presented a number of developer-impacting privacy updates. One of the updates, introducing the concept of a privacy manifest, has a direct impact on the work I’ve been doing making the CRDT library Automerge available on Apple platforms. The two relevant sessions from WWDC 2023:

  • Get Started with Privacy Manifests (video) (notes)
  • Verify app dependencies with digital signatures (video) (notes)

During the sessions, the presenter shared that somewhere in the coming year (2024) Apple would start requiring privacy manifests in signed XCFrameworks. There was little concrete detail available then, and I’ve been waiting since for more information on how to comply. I expected documentation at least, and was hoping for an update in Xcode – specifically the xcodebuild command – to add an option that accepted a path to a manifest and included it appropriately. So far, nothing from Apple on that front.

About a week ago I decided to use a DTS ticket to get assistance on how to (properly) add a privacy manifest to an XCFramework (and filed feedback: FB13626419). I hope that something is planned to make this easier, or at the minimum to document a process, since it now appears to be an active requirement for new apps presented to the App Store. I highly doubt we’ll see anything between now and WWDC at this point. With any luck, we’ll see something this June (WWDC 24).

I have a hypothesis that, with the updates to enable signed binary dependencies, there could be “something coming” about a software bill-of-materials manifest. My overactive imagination sees hints of that correlated with what Swift is recording in Package.resolved, and seemingly starting to take advantage of within the proposed new approach to Swift testing. It would make a lot of sense to support better verification and clear knowledge of what you’re including in your apps, or depending on for your libraries (and extremely useful metadata for testing validation).

In the meantime, if you’re Creating an XCFramework and trying to figure out how to comply with Apple’s requests for embedded privacy manifests, hopefully this article helps you get there. As I mentioned at the top of this post, this is based on my open source work in Automerge-swift. I’m including the library and XCFramework (and showing it off) in a demo application. I just finished working through the process of getting the archives validated and pushed to App Store Connect (with macOS and iOS deliverables). To be very clear, the person I worked with at DTS was both critical and super-helpful. Without this information I would have been wandering blindly for months trying to get this sorted. All credit to them for the assistance.

The gist of what needs to be done lines up with Apple’s general platform conventions for placing resources into bundles (detailed at Placing Content in a Bundle). The resource in this case is the file PrivacyInfo.xcprivacy, and the general pattern plays out as:

  • iOS and iOS simulator: place the resource at the root for that platform
  • macOS and Mac Catalyst: place the resource in a directory structure /Versions/A/Resources/

The additional quirk in this case is that with an XCFramework created from platform-specific static libraries, you also need to put that directory structure underneath the directory that is the platform signifier. (An example is shown below, illustrating this. I know it’s not super clear; I don’t know, or have, the words to correctly describe these layers in the directory structure.)

I do this with a bash script that copies the privacy manifest into the place relevant for each platform target. In the case of automerge-swift, we compile to support iOS, the iOS simulators (on x86 and arm architectures), macOS (on x86 and arm architectures), and Mac Catalyst (on x86 and arm architectures).

Once the files are copied into place, I code sign the bundle:

codesign --timestamp -v --sign "...my developer id..." ${FRAMEWORK_NAME}.xcframework

After which, compress it down using ditto, and compute the SHA256 checksum. That checksum is used to create a validation hash for a URL reference in a Package.swift. (If you want to see the scripts, have at – they’re on GitHub. The scripts are split at the end – one for CI that doesn’t sign, and one for release that does.)
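For reference, the resulting entry in Package.swift looks something like the following sketch; the URL and checksum here are placeholders for the values produced by the release script (the checksum can be computed with `swift package compute-checksum`):

```swift
// Hypothetical binaryTarget entry - the URL and checksum are placeholders,
// not the actual values from automerge-swift's releases.
.binaryTarget(
    name: "automergeFFI",
    url: "https://example.com/path/to/automergeFFI.xcframework.zip",
    checksum: "the-sha256-checksum-computed-from-the-zip"
)
```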

Seeing the layout of the relevant files in an XCFramework was the most helpful piece for me in assembling this together, so let me share the directory structure of my XCFramework. The example below, called automergeFFI.xcframework, hopefully shows you the details without flooding you with extraneous files; it skips the header and code-signature specific files:

automergeFFI.xcframework/
    Info.plist
    _CodeSignature/
    macos-arm64_x86_64/
        Headers/
        libuniffi_automerge.a
        Versions/
            A/
                Resources/
                    PrivacyInfo.xcprivacy
    ios-arm64_x86_64-simulator/
        Headers/
        libuniffi_automerge.a
        PrivacyInfo.xcprivacy
    ios-arm64_x86_64-maccatalyst/
        Headers/
        libuniffi_automerge.a
        Versions/
            A/
                Resources/
                    PrivacyInfo.xcprivacy
    ios-arm64/
        Headers/
        libuniffi_automerge.a
        PrivacyInfo.xcprivacy
With this in place, signed and embedded as a normal dependency through Xcode, both the iOS demo app and the macOS demo app passed the pre-flight validation and moved on through to TestFlight.

A week on with a Vision Pro

There are excellent reviews of the Vision Pro “out there”; this post isn’t meant as another. It’s a record of my first experiences, thoughts, and scribbled notes for future me to look back on after a few iterations of the product.

I had been planning on getting a Vision Pro since it was first rumored. I put away funds from contracts and gigs, and when the time came and it was available for order, I still had sticker shock. When I bought one, I didn’t skimp, but I didn’t blow it out either. My goal is to learn this product – how it works and how to work with it – and to write apps that work beautifully on it. When the available-to-developers-only head-strap extension was announced, I grabbed it too. My prior experience with any headset is using an Oculus (now Meta) Quest 2, which was fun and illustrative – but I couldn’t use it more than a few hours before nausea would start to catch up with me.

Right off, the visual clarity of the Vision Pro blew me away. The displays are mind-bogglingly good, and the 3D effect is instantly crisp and clear. I found myself exploring the nooks and corners of the product that first evening, without a hint of nausea that I’d feared might happen. The two and a half hours of battery life came quickly.

Beyond the stunning visuals, I wanted to really understand and use the interaction model. From the API, I know it supports both indirect and direct interaction using hand-tracking. Most of the examples and interactions I had at the start were “indirect” – meaning that where I looked is where actions would trigger (or not) when I tapped my fingers together. It’s intuitive, easy to get started with very quickly, and (sometimes too) easy to forget it’s a control and accidentally invoke it.

In early window managers on desktop computers, there was a pattern of usage called “focus follows mouse” (which Apple pushed hard to move away from). The idea was that whichever window your mouse cursor was over is where keyboard input would be directed. The indirect interaction mode on Vision Pro is that on steroids, and it takes some getting used to. In several cases, I found myself looking away from the control while wanting to continue using it, with results that were messy – activating other buttons, etc.

Most of the apps (even iOS apps “just” running on Vision Pro) worked flawlessly and easily, and refreshingly didn’t feel as out of place as iOS-designed apps feel on an iPad (looking at you, Instagram). One of the most useful visual affordances is a slight sheen that the OS plays over areas that are clearly buttons or targeted controls, which makes a wonderful feedback loop so that you know you’re looking at the right control. The gaze tracking is astoundingly good – so much better than I thought it would be – but it still needs some space for grace. iOS default distances mostly work, although in a densely packed field of controls I’d want just a touch more space between them myself. After wearing the device for a couple of hours, I’d find the tracking not as crisp, and I’d have a bit more error. Apps that eschew accessible buttons for random visuals and tap targets are deeply annoying on Vision Pro. You get no feedback affordances to let you know if you’re on target or not. (D&D Beyond… I’ve got to say, you’ve got some WORK to do.)

Targeting actions (or not) gets even more complicated when you’re looking at touchable targets in a web browser. Video players in general are a bit of a tar pit in terms of useful controls and feedback. YouTube’s video player was better than some of the others, but web pages in general were a notable challenge – especially the ones flooded with ads, pop-overs, and shit moving around and “catching your eye”. That term becomes far more literal and relevant when you accidentally trigger an errant click after some side movement shifted my gaze, and now I’m looking at some *%&$!!# video ad that I want nothing to do with.

In a win for my potential productivity, you can have windows everywhere. The currently-narrowish field of vision constrains it: you have to move your head – instead of just glancing – to see some side windows. It’s a huge relief from the “do one thing at a time” metaphor that never existed on macOS, pervades iOS, and lives in some level of Dante’s Inferno on iPadOS. I can see a path to being more productive with the visionOS “spatial computer” than I ever would be with an iPad. The real kicker for me (not yet explored) will be text selection – and specifically selecting a subrange of a bit of text. That use case is absolutely dreadful in Safari on iOS. For example, try to select the portion of the URL after the host name in the Safari address bar. That seemingly simple task is a huge linchpin to my ability to work productively.

The weight and battery life of this first product release are definitely suboptimal. Easily survivable for me, but sometimes annoying. Given the outstanding technology that’s packed into this device, it’s not surprising. The headset sometimes feels like it’s slipping down my face, or I need to lift and reset it a bit to make it comfortable. For wearing the device over an hour or so while sitting upright, I definitely prefer to use the over-the-head strap – and I don’t give a shit what my hair looks like.

Speaking of caring what I look like – I despise the “persona” feature and won’t be using it. It’s straight into the gaping canyon of the uncanny valley. I went through the process to set one up and took a look at it. I tried to be dispassionate about it, but ultimately fled in horror and don’t want a damn thing to do with it. I don’t even want to deal with FaceTime if that’s the only option. I’d far prefer to use one of those stylized Memoji, or be able to provide my own 3D animation puppet mapped to my facial expressions. I can make a more meaningful connection to a stylized image or puppet than I can to the necrotic apparition of the current Persona.

And a weird quirk: I have a very mobile and expressive face, and can raise and lower either eyebrow easily. I use that a lot in my facial expressions. The FaceTime facial expression tracking can’t clue in to that – it’s either both or not at all. While I’m impressed it can read anything about my eyebrows while wearing the Vision Pro, that’s a deal killer for representing my facial expressions.

Jumping back to something more positive – in terms of consuming media, the Vision Pro is a killer device right where it is now. The whole space of viewing and watching photos and video is amazing. The panoramas I’ve collected while traveling are everything I hoped for. The immersive 180° videos made me want to learn how to make some of those, and the stereoscopic images and video (smaller field of view, but same gist) are wonderful. It’s a potent upgrade to the clicking wheels of the 3D viewfinder from my childhood. Just watching a movie was amazing – either small and convenient to the side, or huge in the field of view – at my control – with a truly impressive “immersive theater” mode that’s really effective. It’s definitely a solo experience – I can’t share watching a movie cuddled up on the couch – but even with the high price point, the video (and audio) quality of Vision Pro makes a massive theater out of the tightest cubby. In that respect, the current Vision Pro is a very comparable value to a large home theater.

Add on the environments (I’m digging Mt Hood a lot) – with slightly variable weather, environmental acoustics, and day and night transitions – and it’s a tremendous break. I’d love to author a few of those: a sort of crazy, dynamic stage/set design problem with a mix of lighting, sounds, supportive visual effects, and the high-definition photography to backdrop it all. I was familiar with the concept from the Quest, but the production quality in the Vision Pro is miles ahead, and so much more inviting because of it.

I looked at my M1 MacBook Pro, tapped on the connect button, and instantly loved it. The screen on the laptop blanked out, replaced by a much larger, high-resolution floating display above it. I need to rearrange my workspace to really work this angle, as it’s a bit tight for a Vision Pro. Where I work currently, there are overhead pieces nearby that impinge on the upper visual space, prompting warnings and visual intrusions when I look around, to keep me from hitting anything. Using the trackpad on the Mac as a pointer within Vision Pro is effective, and the keyboard is amazing. Without a laptop nearby, I’d need (or want) at least a keyboard connected – the pop-up keyboard can get the job done (using either direct or indirect interaction), but it’s horrible for anything beyond a few words.

I have a PS5 controller that I paired with my iPad for playing games, and later paired with the Mac to navigate in the Vision Pro simulator in Xcode. I haven’t paired it with the Vision Pro, but that’s something I’d really like to try – especially for a game. For the “immerse you in an amazing world” games that I enjoy, I can imagine the result. With the impressive results of the immersive environments, there’s a “something” there that I’d like to see. Something from Rockstar, Ubisoft, Hello Games, or the Sony or Microsoft studios. No idea if that’ll appear as something streamed from a console, or running locally – but the possibilities are huge by leveraging the high visual production values that Vision Pro provides. I’m especially curious what Disney and Epic Games might do together – an expansion or side-track from their virtual sets, creating environments and scenes that physically couldn’t otherwise exist – and then interacting within them. I’m sure they’re thinking about the same. (Hey, No Man’s Sky – I’m ready over here!)

As a wrap-up, my head’s been flooded with ideas for apps that lean into the capabilities of Vision Pro. Most are of the “wouldn’t it be cool!” variety; a few are insanely outlandish and would take a huge team of both artists and developers to assemble. Of the ones that aren’t so completely insane, the common theme is the visualization and presentation of information. A large part of my earlier career was more operationally focused: understanding large, distributed systems, managing services running on them, and debugging things when “shit went wrong” (such as a DC bus bar in a data center exploding when a water leak dripped on it and shorted it out, scattering copper droplets everywhere). I believe there’s a real potential benefit to seeing information with another dimension added to it, especially when you want to look at what would classically be exposed as a chart, but with values that change over time. There’s a whole crazy world of software debugging and performance analysis, distributed tracing, and correlation with logging and metrics – all of which benefit from making it easier to quickly identify failures and resolve them.
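To make that “chart with an extra dimension” idea a bit more concrete, here’s a minimal sketch in TypeScript (a language that would also suit a WebGL front end). The `Point3D` shape and `layoutTimeSeries` helper are hypothetical names of my own for illustration – not visionOS or any platform’s API – showing one way several time series could be laid out as rows of points in a 3D volume, with depth separating the series:

```typescript
// Minimal 3D point shape — a stand-in for whatever vector type the
// rendering layer (WebGL, RealityKit, etc.) actually uses.
interface Point3D {
  x: number; // time along x
  y: number; // normalized value
  z: number; // depth: one series per row
}

// Lay out several time series as rows of points in 3D, using depth (z)
// to separate the series — the "extra dimension" for charts whose
// values change over time. Values are normalized to 0..1 on the y axis.
// Hypothetical helper for illustration only.
function layoutTimeSeries(series: number[][], rowSpacing = 0.2): Point3D[] {
  const all = series.flat();
  if (all.length === 0) return [];
  const min = Math.min(...all);
  const range = Math.max(...all) - min;

  const points: Point3D[] = [];
  series.forEach((samples, row) => {
    samples.forEach((value, t) => {
      points.push({
        x: t,
        y: range > 0 ? (value - min) / range : 0,
        z: row * rowSpacing,
      });
    });
  });
  return points;
}
```

The output of a transform like this could feed whatever the rendering layer turns out to be – a volume in visionOS, or a WebGL scene – while keeping the “data to space” mapping in one testable place.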

I really want to push what’s available now in a volume 3D view. That’s the most heavily constrained 3D representation in visionOS today, primarily to keep anyone from knowing where you’re gazing as a matter of privacy. Rendering and updating 3D visualizations in a volume lets you “place” it anywhere nearby, change your position around it, and ideally interact with it to explore the information. I think that’s my first real target to explore.

I am curious where the overlap will appear with WebGL and how that presents into the visionOS spatial repertoire. I haven’t yet explored that avenue, but it’s intriguing, especially for the data visualization use case.