LLM JSON streaming parser

This repository contains a .NET example of how to incrementally parse JSON generated by a Large Language Model (LLM) and stream the results as newline-delimited JSON (NDJSON).

Run the example

OPENAI_API_KEY=your_openai_api_key \
dotnet run -p TodoList.Api/TodoList.Api.csproj
curl http://localhost:5179 

Demo Video

The problem

LLMs like GPT-4o are now good at generating JSON, which opens up many possibilities.

Most of the time we can simply wait for the LLM to finish generating, parse the answer, and return it to the UI.

However, given how slowly LLMs generate tokens, waiting for the full response can be frustrating for users.

The best solution would be to display the generated content incrementally as soon as possible. This is quite easy with text, but it's a bit more complicated with JSON as we need to make sure that the content is valid at each step. We need to parse the JSON while it's being generated, understand the structure, and act accordingly.

The example

We'll build a Life’s to-do list generator. The LLM will generate a list of tasks, and we'll return them as soon as they are generated via an NDJSON stream.

To make it a bit more challenging, this is the JSON the LLM generates:

{
  "listName": "Bucket List",
  "items": [
    {
      "recommendedAge": 30,
      "description": "Skydiving"
    },
    {
      "recommendedAge": 50,
      "description": "Visit all seven continents"
    }
  ]
}

and this is the NDJSON response streamed to the client:

{"$type": "todoListCreated", "listName": "Bucket List"}
{"$type": "todoListItemAdded", "recommendedAge": 30, "description": "Skydiving"}
{"$type": "todoListItemAdded", "recommendedAge": 50, "description": "Visit all seven continents"}

Tools

There are two ready-to-use tools we can use:

  • Semantic Kernel, which streams the LLM output as an IAsyncEnumerable<StreamingTextContent>.
  • Utf8JsonReader from System.Text.Json, which can read UTF-8 encoded JSON incrementally.

It should be straightforward to combine these two tools to achieve our goal; there's even a section in the docs: Read from a stream using Utf8JsonReader!

The actual problem

Actually, there are multiple challenges:

  • The reader example uses a MemoryStream, while Semantic Kernel exposes an IAsyncEnumerable<StreamingTextContent>.
  • Utf8JsonReader is a ref struct, so:
    • It doesn't work with streams directly; it only accepts a ReadOnlySpan<byte> at creation.
    • It can't be passed as a parameter to an async method.
    • It can't be used across await or yield boundaries (see the sketch after this list).
  • On top of that, it's a lexer/tokenizer, not a parser, so we need to handle the JSON structure ourselves.
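
To make the constraint concrete, here is a minimal sketch (not code from the repository) of what the compiler rejects, and the workaround used throughout this example: keep all reader work inside a synchronous method and pass the reader by ref.

using System.Text.Json;

public static class RefStructLimitation
{
    // This does NOT compile, which is why it is commented out:
    // a ref struct local such as Utf8JsonReader cannot survive an await.
    //
    // public static async Task ParseAsync(Stream stream)
    // {
    //     var reader = new Utf8JsonReader(ReadOnlySpan<byte>.Empty);
    //     await stream.ReadAsync(new byte[16]); // error: the reader crosses an await boundary
    //     reader.Read();
    // }

    // The workaround: do the parsing synchronously and pass the reader by ref,
    // which is what IIncrementalJsonParser<TOut>.ContinueParsing does below.
    public static void ParseChunk(ref Utf8JsonReader reader)
    {
        while (reader.Read())
        {
            // Inspect reader.TokenType / reader.ValueSpan here.
        }
    }
}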

The solution

We need to solve two problems:

  • How to use Utf8JsonReader with IAsyncEnumerable<StreamingTextContent>.
  • How to parse the JSON structure incrementally.

Let's start with the parser as it's simpler.

How to use SemanticKernel and stream the results will be covered at the end.

The parser

The main method of Utf8JsonReader is Read(). A simple JSON like { "name": "test" } will generate the following tokens:

  • StartObject
  • PropertyName
  • String
  • EndObject

Each time we call Read(), the reader moves forward and we use:

  • TokenType to know the type of the token.
  • ValueSpan, and other methods, to get the value of the token.
  • The bool returned by Read() to know if there are more tokens to read (see the read loop sketched below).
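
Here is a minimal, self-contained version of that read loop over a complete JSON document (a simplified illustration, not code from the repository):

using System;
using System.Text;
using System.Text.Json;

var bytes = Encoding.UTF8.GetBytes("{ \"name\": \"test\" }");
var reader = new Utf8JsonReader(bytes);

while (reader.Read()) // false once there are no more tokens
{
    switch (reader.TokenType)
    {
        case JsonTokenType.PropertyName:
            Console.WriteLine($"Property: {reader.GetString()}");
            break;
        case JsonTokenType.String:
            Console.WriteLine($"String: {reader.GetString()}");
            break;
        default:
            Console.WriteLine(reader.TokenType); // StartObject, EndObject, ...
            break;
    }
}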

The interface for the parser is quite simple:

public interface IIncrementalJsonParser<out TOut> where TOut : class
{
    TOut[] ContinueParsing(ref Utf8JsonReader reader, ref bool completed);
}

Given a Utf8JsonReader, it returns an array of TOut objects and uses the ref bool to signal when parsing is complete.

State machine

The easiest way I found to parse the JSON with this setup is a state machine. With each token, we update the state of the machine and act accordingly, for example, by adding an item to the output array.

Here is the state machine for the to-do list:


public enum TodoParsingState
{
    None,
    ReadingName,
    ReadingItems,
    // ... more states
}

public class TodoListJsonVisitorParser() : IIncrementalJsonParser<TodoListBaseEvent>
{
    // Event builders
    private readonly ListState _listState = new();
    private readonly ItemAddedState _listItemState = new();
    private TodoParsingState _parsingState = TodoParsingState.None;

    public TodoListBaseEvent[] ContinueParsing(ref Utf8JsonReader reader, ref bool completed)
    {
        List<TodoListBaseEvent> results = [];

        while (reader.Read())
        {
            switch (reader.TokenType)
            {
                case JsonTokenType.PropertyName:
                    _parsingState = reader.GetString() switch
                    {
                        "listName" => TodoParsingState.ReadingName,
                        "items" => TodoParsingState.ReadingItems,
                        "recommendedAge" => TodoParsingState.ReadingItemRecommendedAge,
                        "description" => TodoParsingState.ReadingItemDescription,
                        _ => TodoParsingState.None
                    };
                    break;
                case JsonTokenType.String:
                    var stringValue = reader.GetString() ?? string.Empty;
                    if (_parsingState == TodoParsingState.ReadingName)
                    {
                        _listState.Name = stringValue;
                        results.Add(_listState.ToEvent());
                        _parsingState = TodoParsingState.None;
                    }
                    break;
                // ... more cases for other tokens
            }
        }

        return results.ToArray();
    }
}
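
As a quick sanity check, the parser can be driven by hand with a complete document (a usage sketch; it assumes the full implementation from the repository, including the cases elided above):

using System.Text;
using System.Text.Json;

var json = Encoding.UTF8.GetBytes("""{"listName": "Bucket List", "items": []}""");
var reader = new Utf8JsonReader(json, isFinalBlock: true, state: default);

var parser = new TodoListJsonVisitorParser();
var completed = false;
var events = parser.ContinueParsing(ref reader, ref completed);
// events contains a single TodoListCreatedEvent for "Bucket List"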

The feeder

Let's now see how we can keep feeding the parser. Below is an extension method to convert the IAsyncEnumerable<StreamingTextContent> to IAsyncEnumerable<TodoListBaseEvent>.

In the final code, we'll stream directly to NDJSON.

public static class StreamingJsonParserExtensions
{
    private static readonly byte[] NewLineBytes = Encoding.UTF8.GetBytes(Environment.NewLine);

    public static async IAsyncEnumerable<TOut> ToListBaseEvents<TOut>(
        this IAsyncEnumerable<StreamingTextContent> input,
        IIncrementalJsonParser<TOut> incrementalParser,
        int chunkBufferSize = 48,
        CancellationToken cancellationToken = default)
        where TOut : class
    {
        // Control the pace of the stream by reading in chunks
        var enumerator = input.GetAsyncEnumerator(cancellationToken);

        try
        { 
            // Buffer for the chunks of text
            var buffer = new ArrayBufferWriter<byte>();
            // Keep track of the state of the JSON reader
            JsonReaderState jsonReaderState = new();
            
            var completed = false;
            while (!completed)
            {
                // Load the buffer with the next chunk of text
                for (var i = 0; i < chunkBufferSize; i++)
                {
                    // Get the next chunk of streamed text
                    var readSuccess = await enumerator.MoveNextAsync();
                    // Reached the end of the stream
                    if (!readSuccess)
                    {
                        completed = true;
                        break;
                    }

                    if (enumerator.Current?.Text == null)
                    {
                        continue;
                    }
                    
                    var bytes = Encoding.UTF8.GetBytes(enumerator.Current.Text);
                    buffer.Write(bytes);
                }

                // Load the reader with the buffer
                var reader = new Utf8JsonReader(
                    buffer.WrittenSpan,
                    isFinalBlock: false, // The input might be a partial JSON
                    state: jsonReaderState);

                // Parse as much as possible
                var parsedItems = incrementalParser.ContinueParsing(ref reader, ref completed);

                // Save the parsing state
                jsonReaderState = reader.CurrentState;

                // Reset the buffer and write the remaining bytes
                var remainingBytes = buffer.WrittenSpan[(int)reader.BytesConsumed..];
                buffer.ResetWrittenCount();
                buffer.Clear();
                buffer.Write(remainingBytes);

                // Return the parsed items
                foreach (var parsedItem in parsedItems)
                {
                    yield return parsedItem;
                }
            }
        }
        finally
        {
            await enumerator.DisposeAsync();
        }
    }
}

Explanation:

  • We manually load a given number of chunks into a buffer.
  • We create a Utf8JsonReader with the buffer.
    • IsFinalBlock is false as we don't know if we have reached the end of the stream.
    • We pass the JsonReaderState to keep track of the parsing state.
  • We call ContinueParsing on the parser; it returns once there are no more complete tokens in the buffer.
  • We save the state of the reader.
  • We reset the buffer and write back the bytes that were not consumed by the reader.
  • We start again until we reach the end of the stream.
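
A usage sketch (assuming the textGenerationService, prompt, and execution settings from the integration section at the end):

IAsyncEnumerable<StreamingTextContent> streamingContents = textGenerationService
    .GetStreamingTextContentsAsync(Prompts.GenerateTodoListPrompt,
        executionSettings: openAiPromptExecutionSettings);

await foreach (var todoEvent in streamingContents.ToListBaseEvents(new TodoListJsonVisitorParser()))
{
    Console.WriteLine(todoEvent); // TodoListCreatedEvent / TodoListItemAddedEvent, as they arrive
}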

NDJSON

To return NDJSON in ASP.NET, we need to write directly to the HttpContext response (a full extension method is sketched after the steps):

  1. Set the content type:
    httpContext.Response.ContentType = "application/x-ndjson";
  2. Write the item and a new line:
    var documentJson = JsonSerializer.SerializeToUtf8Bytes(parsedItem, jsonSerializerOptions);
    await httpContext.Response.Body.WriteAsync(documentJson, cancellationToken);
    await httpContext.Response.Body.WriteAsync(NewLineBytes, cancellationToken);
  3. Return EmptyResult to avoid the default response handling.
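
Putting the three steps together, a ToNdJsonAsync extension along these lines does the job. This is a sketch based on the signature used in the integration section below, not the exact repository code; here the "empty result" is Results.Empty from minimal APIs.

using System.Text;
using System.Text.Json;
using Microsoft.AspNetCore.Http;
using Microsoft.SemanticKernel;

public static class NdJsonStreamingExtensions
{
    private static readonly byte[] NewLineBytes = Encoding.UTF8.GetBytes(Environment.NewLine);

    public static async Task<IResult> ToNdJsonAsync<TOut>(
        this IAsyncEnumerable<StreamingTextContent> input,
        HttpContext httpContext,
        IIncrementalJsonParser<TOut> parser,
        int chunkBufferSize,
        JsonSerializerOptions jsonSerializerOptions,
        CancellationToken cancellationToken)
        where TOut : class
    {
        // 1. Set the content type before writing the body
        httpContext.Response.ContentType = "application/x-ndjson";

        // Reuse the feeder from the previous section to get parsed items as they arrive
        await foreach (var parsedItem in input.ToListBaseEvents(parser, chunkBufferSize, cancellationToken))
        {
            // 2. Write each item followed by a new line, then flush so the client sees it immediately
            var documentJson = JsonSerializer.SerializeToUtf8Bytes(parsedItem, jsonSerializerOptions);
            await httpContext.Response.Body.WriteAsync(documentJson, cancellationToken);
            await httpContext.Response.Body.WriteAsync(NewLineBytes, cancellationToken);
            await httpContext.Response.Body.FlushAsync(cancellationToken);
        }

        // 3. Return an empty result to avoid the default response handling
        return Results.Empty;
    }
}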

Polymorphic serialization

We can automatically add a $type property with the JsonDerivedType attribute on the base class.

[Serializable]
[JsonDerivedType(typeof(TodoListCreatedEvent), typeDiscriminator: "todoListCreated")]
[JsonDerivedType(typeof(TodoListItemAddedEvent), typeDiscriminator: "todoListItemAdded")]
public abstract record TodoListBaseEvent();

[Serializable]
public record TodoListCreatedEvent(string Name)
    : TodoListBaseEvent;

[Serializable]
public record TodoListItemAddedEvent(int RecommendedAge, string Description)
    : TodoListBaseEvent;
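
Note that the discriminator is emitted when the value is serialized as the base type, which is what happens in the NDJSON extension above. A quick check (assuming the serializer options defined in the next section):

TodoListBaseEvent todoEvent = new TodoListItemAddedEvent(30, "Skydiving");
Console.WriteLine(JsonSerializer.Serialize(todoEvent, jsonSerializer));
// {"$type":"todoListItemAdded","recommendedAge":30,"description":"Skydiving"}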

Optimization

We can use source generation to optimize the serialization of the TodoListBaseEvent and its derived types.

[JsonSerializable(typeof(TodoListBaseEvent))]
internal partial class SourceGenerationContext : JsonSerializerContext { }

Then we just reference it in the serializer options:

JsonSerializerOptions jsonSerializer = new()
{
    WriteIndented = false, // Needs to be false for NDJSON
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    TypeInfoResolver = SourceGenerationContext.Default // Source-generated context for polymorphic serialization
};

Semantic Kernel integration

Finally, after updating the extension method to write to the HttpContext, we can use it with the following code:

// Register the services
OpenAIChatCompletionService chatCompletionService = new(
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!, // Read from the OPENAI_API_KEY environment variable
    modelId: "gpt-4.1"
);
builder.Services.AddSingleton<ITextGenerationService>(chatCompletionService);
builder.Services.AddTransient<IIncrementalJsonParser<TodoListBaseEvent>, TodoListJsonVisitorParser>();

// Options
OpenAIPromptExecutionSettings openAiPromptExecutionSettings = new()
{
    ResponseFormat = "json_object" // Needed to generate valid JSON
};

// Register endpoint
app.MapGet("/", async (
    HttpContext httpContext,
    ITextGenerationService textGenerationService,
    IIncrementalJsonParser<TodoListBaseEvent> parser,
    CancellationToken cancellationToken) =>
    {
        return await textGenerationService
            .GetStreamingTextContentsAsync(Prompts.GenerateTodoListPrompt, // Prompt with example
                executionSettings: openAiPromptExecutionSettings,
                cancellationToken: cancellationToken)
            .ToNdJsonAsync(httpContext, parser, chunkBufferSize: 48, jsonSerializer, cancellationToken);
    });
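
For completeness, Prompts.GenerateTodoListPrompt is the prompt that asks for the JSON shape shown earlier. Something along these lines (an illustrative guess; the exact wording lives in the repository):

public static class Prompts
{
    // Illustrative only; the repository's actual prompt may differ.
    public const string GenerateTodoListPrompt =
        """
        Generate a life's to-do (bucket) list as a single JSON object with exactly this structure:

        {
          "listName": "Bucket List",
          "items": [
            { "recommendedAge": 30, "description": "Skydiving" }
          ]
        }

        Return only the JSON object.
        """;
}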
