This repository contains a .NET example of how to incrementally parse JSON generated by a Large Language Model (LLM) and stream the results as Newline Delimited JSON (NDJSON).
```shell
OPENAI_API_KEY=your_openai_api_key \
  dotnet run --project TodoList.Api/TodoList.Api.csproj

curl http://localhost:5179
```
LLMs like GPT-4o are now good at generating JSON, which opens up many possibilities.
Most of the time we can simply wait for the LLM to finish generating, parse the answer, and return it to the UI. However, given how slowly LLMs generate tokens, waiting for the full response can be frustrating for users. A better solution is to display the generated content incrementally, as soon as it is available. This is easy with plain text, but more complicated with JSON, as we need to make sure the content is valid at each step. We need to parse the JSON while it's being generated, understand the structure, and act accordingly.
We'll build a life's to-do list generator: the LLM will generate a list of tasks, and we'll return each one as soon as it is generated via an NDJSON stream. To make it more interesting, this is the shape of the generated JSON:
```json
{
  "listName": "Bucket List",
  "items": [
    {
      "recommendedAge": 30,
      "description": "Skydiving"
    },
    {
      "recommendedAge": 50,
      "description": "Visit all seven continents"
    }
  ]
}
```
and this is the response:
```json
{"$type": "todoListCreated", "listName": "Bucket List"}
{"$type": "todoListItemAdded", "recommendedAge": 30, "description": "Skydiving"}
{"$type": "todoListItemAdded", "recommendedAge": 50, "description": "Visit all seven continents"}
```
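The point of NDJSON is that each line is a complete JSON document, so a client can act on every line as soon as it arrives. Here is a minimal sketch of that (reading from a hard-coded string here, rather than an HTTP response stream):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

var ndjson = "{\"$type\": \"todoListCreated\", \"listName\": \"Bucket List\"}\n" +
             "{\"$type\": \"todoListItemAdded\", \"recommendedAge\": 30, \"description\": \"Skydiving\"}";

var types = new List<string>();
using var lines = new StringReader(ndjson);
while (lines.ReadLine() is { } line)
{
    // Each line is valid JSON on its own, so it can be handled immediately
    using var doc = JsonDocument.Parse(line);
    types.Add(doc.RootElement.GetProperty("$type").GetString()!);
}
Console.WriteLine(string.Join(", ", types));
// todoListCreated, todoListItemAdded
```

With a real endpoint, the same loop works over a `StreamReader` wrapped around the HTTP response body.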
There are two ready-to-use tools we can use:
- Semantic Kernel: an SDK to interact with AI models.
- `Utf8JsonReader`: a high-performance, low-allocation, forward-only reader for JSON.
It should be straightforward to combine these two tools to achieve our goal; there's even a section in the docs: Read from a stream using Utf8JsonReader!
Actually, there are multiple challenges:
- The reader example uses a `MemoryStream`, while Semantic Kernel uses `IAsyncEnumerable<StreamingTextContent>`.
- `Utf8JsonReader` is a `ref struct`, so:
  - It doesn't work with streams anyway; it only accepts a `ReadOnlySpan<byte>` during creation.
  - It can't be passed as a parameter to an `async` method.
  - It can't be used across `await` or `yield` boundaries.
- It's a lexer/tokenizer, not a parser, so we need to handle the JSON structure ourselves.
We need to solve two problems:
- How to use `Utf8JsonReader` with `IAsyncEnumerable<StreamingTextContent>`.
- How to parse the JSON structure incrementally.
Let's start with the parser as it's simpler.
How to use Semantic Kernel and stream the results will be covered at the end.
The main method of `Utf8JsonReader` is `Read()`. A simple JSON document like `{ "name": "test" }` will generate the following tokens:
- `StartObject`
- `PropertyName`
- `String`
- `EndObject`
Each time we call `Read()`, the reader moves forward and we use:
- `TokenType` to know the type of the token.
- `ValueSpan`, and other methods, to get the value of the token.
- The returned `bool` to know if there are more tokens to read.
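This loop can be seen in a few lines; the sketch below tokenizes the document above, given to the reader in one complete block:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

var json = Encoding.UTF8.GetBytes("{ \"name\": \"test\" }");
var reader = new Utf8JsonReader(json);
var tokens = new List<JsonTokenType>();
// Read() returns false once there is nothing left to read
while (reader.Read())
{
    tokens.Add(reader.TokenType);
}
Console.WriteLine(string.Join(", ", tokens));
// StartObject, PropertyName, String, EndObject
```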
The interface for the parser is quite simple:
```csharp
public interface IIncrementalJsonParser<out TOut> where TOut : class
{
    TOut[] ContinueParsing(ref Utf8JsonReader reader, ref bool completed);
}
```
Given a `Utf8JsonReader`, it returns an array of objects of type `TOut` and a `bool` indicating whether the parsing is complete.
The easiest way I found to parse the JSON with this setup is a state machine. With each token, we update the state of the machine and act accordingly, for example by adding an item to the output array. Here is the state machine for the to-do list:
```csharp
public enum TodoParsingState
{
    None,
    ReadingName,
    ReadingItems,
    // ... more states
}

public class TodoListJsonVisitorParser() : IIncrementalJsonParser<TodoListBaseEvent>
{
    // Event builders
    private readonly ListState _listState = new();
    private readonly ItemAddedState _listItemState = new();
    private TodoParsingState _parsingState = TodoParsingState.None;

    public TodoListBaseEvent[] ContinueParsing(ref Utf8JsonReader reader, ref bool completed)
    {
        List<TodoListBaseEvent> results = [];
        while (reader.Read())
        {
            switch (reader.TokenType)
            {
                case JsonTokenType.PropertyName:
                    _parsingState = reader.GetString() switch
                    {
                        "listName" => TodoParsingState.ReadingName,
                        "items" => TodoParsingState.ReadingItems,
                        "recommendedAge" => TodoParsingState.ReadingItemRecommendedAge,
                        "description" => TodoParsingState.ReadingItemDescription,
                        _ => TodoParsingState.None
                    };
                    break;
                case JsonTokenType.String:
                    var stringValue = reader.GetString() ?? string.Empty;
                    if (_parsingState == TodoParsingState.ReadingName)
                    {
                        _listState.Name = stringValue;
                        results.Add(_listState.ToEvent());
                        _parsingState = TodoParsingState.None;
                    }
                    break;
                // ... more cases for other tokens
            }
        }
        return results.ToArray();
    }
}
```
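To make the state-machine idea concrete in a runnable form, here is a stripped-down, single-pass sketch (a hypothetical flat document, not the article's parser) that collects every number under the `items` property:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

var json = Encoding.UTF8.GetBytes("""{"listName":"Bucket List","items":[30,50]}""");
var reader = new Utf8JsonReader(json);
var inItems = false; // the "state machine" is a single flag here
var ages = new List<int>();
while (reader.Read())
{
    switch (reader.TokenType)
    {
        case JsonTokenType.PropertyName:
            // The last property name seen drives the state
            inItems = reader.GetString() == "items";
            break;
        case JsonTokenType.Number when inItems:
            ages.Add(reader.GetInt32());
            break;
        case JsonTokenType.EndArray:
            inItems = false;
            break;
    }
}
Console.WriteLine(string.Join(",", ages)); // 30,50
```

The real parser works the same way, just with more states and with the results materialized as events.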
Let's now see how we can keep feeding the parser.
Below is an extension method to convert the `IAsyncEnumerable<StreamingTextContent>` to an `IAsyncEnumerable<TodoListBaseEvent>`. In the final code, we'll stream directly to NDJSON.
```csharp
public static class StreamingJsonParserExtensions
{
    private static readonly byte[] NewLineBytes = Encoding.UTF8.GetBytes(Environment.NewLine);

    public static async IAsyncEnumerable<TOut> ToListBaseEvents<TOut>(
        this IAsyncEnumerable<StreamingTextContent> input,
        IIncrementalJsonParser<TOut> incrementalParser,
        int chunkBufferSize = 48,
        CancellationToken cancellationToken = default)
        where TOut : class
    {
        // Control the pace of the stream by reading in chunks
        var enumerator = input.GetAsyncEnumerator(cancellationToken);
        try
        {
            // Buffer for the chunks of text
            var buffer = new ArrayBufferWriter<byte>();
            // Keep track of the state of the JSON reader
            JsonReaderState jsonReaderState = new();
            var completed = false;
            while (!completed)
            {
                // Load the buffer with the next chunks of text
                for (var i = 0; i < chunkBufferSize; i++)
                {
                    // Get the next chunk
                    var readSuccess = await enumerator.MoveNextAsync();
                    // Reached the end of the stream
                    if (!readSuccess)
                    {
                        completed = true;
                        break;
                    }
                    if (enumerator.Current?.Text == null)
                    {
                        continue;
                    }
                    var bytes = Encoding.UTF8.GetBytes(enumerator.Current.Text);
                    buffer.Write(bytes);
                }
                // Load the reader with the buffer
                var reader = new Utf8JsonReader(
                    buffer.WrittenSpan,
                    isFinalBlock: false, // The input might be a partial JSON
                    state: jsonReaderState);
                // Parse as much as possible
                var parsedItems = incrementalParser.ContinueParsing(ref reader, ref completed);
                // Save the parsing state
                jsonReaderState = reader.CurrentState;
                // Copy the unconsumed bytes before resetting the buffer:
                // the span points into the buffer itself, so it must be
                // materialized before the buffer is reused
                var remainingBytes = buffer.WrittenSpan[(int)reader.BytesConsumed..].ToArray();
                buffer.ResetWrittenCount();
                buffer.Write(remainingBytes);
                // Return the parsed items
                foreach (var parsedItem in parsedItems)
                {
                    yield return parsedItem;
                }
            }
        }
        finally
        {
            await enumerator.DisposeAsync();
        }
    }
}
```
Explanation:
- We manually load a given number of chunks into a buffer.
- We create a `Utf8JsonReader` with the buffer:
  - `IsFinalBlock` is `false`, as we don't know if we have reached the end of the stream.
  - We pass the `JsonReaderState` to keep track of the parsing state.
- We call `ContinueParsing` on the parser. The parser returns once there are no more tokens to read.
- We save the state of the reader.
- We refill the buffer with the remaining bytes that were not consumed by the reader.
- We start again until we reach the end of the stream.
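The buffer/state dance can be seen in isolation with plain `System.Text.Json`. Below is a sketch with a hypothetical document split into two chunks; the names `chunk1`/`chunk2` are illustrative:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// One JSON document arriving in two partial chunks
var chunk1 = Encoding.UTF8.GetBytes("{ \"listName\": \"Bucke");
var chunk2 = Encoding.UTF8.GetBytes("t List\" }");

var tokens = new List<JsonTokenType>();
var state = new JsonReaderState();
var buffer = new ArrayBufferWriter<byte>();

buffer.Write(chunk1);
// First pass: isFinalBlock is false, the trailing partial string is not returned
var reader = new Utf8JsonReader(buffer.WrittenSpan, isFinalBlock: false, state);
while (reader.Read()) tokens.Add(reader.TokenType);
state = reader.CurrentState;

// Copy the unconsumed bytes (ToArray, because the span points into the buffer we are about to reset)
var remaining = buffer.WrittenSpan[(int)reader.BytesConsumed..].ToArray();
buffer.ResetWrittenCount();
buffer.Write(remaining);
buffer.Write(chunk2);

// Second pass: the document is now complete
reader = new Utf8JsonReader(buffer.WrittenSpan, isFinalBlock: true, state);
while (reader.Read()) tokens.Add(reader.TokenType);

Console.WriteLine(string.Join(", ", tokens));
// StartObject, PropertyName, String, EndObject
```

No token is ever emitted twice, and the partial string only surfaces once the second chunk completes it.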
To return NDJSON in ASP.NET Core, we need to write to the `HttpContext`:
- Set the content type: `httpContext.Response.ContentType = "application/x-ndjson";`
- Write each item followed by a new line:

  ```csharp
  var documentJson = JsonSerializer.SerializeToUtf8Bytes(parsedItem, jsonSerializerOptions);
  await httpContext.Response.Body.WriteAsync(documentJson, cancellationToken);
  await httpContext.Response.Body.WriteAsync(NewLineBytes, cancellationToken);
  ```

- Return `EmptyResult` to avoid the default response handling.
We can automatically add a `$type` property with the `JsonDerivedType` attribute on the base class.
```csharp
[Serializable]
[JsonDerivedType(typeof(TodoListCreatedEvent), typeDiscriminator: "todoListCreated")]
[JsonDerivedType(typeof(TodoListItemAddedEvent), typeDiscriminator: "todoListItemAdded")]
public abstract record TodoListBaseEvent();

[Serializable]
public record TodoListCreatedEvent(string Name)
    : TodoListBaseEvent;

[Serializable]
public record TodoListItemAddedEvent(int RecommendedAge, string Description)
    : TodoListBaseEvent;
```
We can use source generation to optimize the serialization of `TodoListBaseEvent` and its derived types.
```csharp
[JsonSerializable(typeof(TodoListBaseEvent))]
internal partial class SourceGenerationContext : JsonSerializerContext { }
```
Then we reference it in the serialization options:
```csharp
JsonSerializerOptions jsonSerializer = new()
{
    WriteIndented = false, // Needs to be false for NDJSON
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    TypeInfoResolver = SourceGenerationContext.Default // Source-generated context for polymorphic serialization
};
```
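To see the discriminator in action, here is a sketch that serializes through the base type. It uses reflection-based options rather than the source-generated context, and redeclares a trimmed-down version of the records, so the snippet stays self-contained:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

JsonSerializerOptions options = new()
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};

// Serializing through the base type is what triggers the polymorphic discriminator
TodoListBaseEvent evt = new TodoListCreatedEvent("Bucket List");
var json = JsonSerializer.Serialize(evt, options);
Console.WriteLine(json);
// {"$type":"todoListCreated","name":"Bucket List"}

[JsonDerivedType(typeof(TodoListCreatedEvent), typeDiscriminator: "todoListCreated")]
public abstract record TodoListBaseEvent;
public record TodoListCreatedEvent(string Name) : TodoListBaseEvent;
```

The `$type` property is written first, which lets a streaming client dispatch on it before reading the rest of the object.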
Finally, after updating the extension method to write to the `HttpContext`, we can use it with the following code:
```csharp
// Register the services
OpenAIChatCompletionService chatCompletionService = new(
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!, // Replace with your OpenAI API key
    modelId: "gpt-4.1");
builder.Services.AddSingleton<ITextGenerationService>(chatCompletionService);
builder.Services.AddTransient<IIncrementalJsonParser<TodoListBaseEvent>, TodoListJsonVisitorParser>();

// Options
OpenAIPromptExecutionSettings openAiPromptExecutionSettings = new()
{
    ResponseFormat = "json_object" // Needed to generate valid JSON
};

// Register the endpoint
app.MapGet("/", async (
    HttpContext httpContext,
    ITextGenerationService textGenerationService,
    IIncrementalJsonParser<TodoListBaseEvent> parser,
    CancellationToken cancellationToken) =>
{
    return await textGenerationService
        .GetStreamingTextContentsAsync(
            Prompts.GenerateTodoListPrompt, // Prompt with an example
            executionSettings: openAiPromptExecutionSettings,
            cancellationToken: cancellationToken)
        .ToNdJsonAsync(httpContext, parser, chunkBufferSize: 48, jsonSerializer, cancellationToken);
});
```