API.md

L0 API Reference

Complete API reference for L0 Python.

Most applications should simply use import l0. See Imports for details on available exports.

Core Functions

wrap(client_or_stream, *, guardrails, retry, timeout, ...)

Wrap an OpenAI/LiteLLM client or raw stream with L0 reliability.

This is the preferred API. Pass a client for full retry support, or a raw stream for simple cases.

Wrapping a Client (Recommended)

import l0
from openai import AsyncOpenAI

# Wrap the client once
client = l0.wrap(AsyncOpenAI())

# Use normally - L0 reliability is automatic
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Stream with L0 events
async for event in response:
    if event.is_token:
        print(event.text, end="")

# Or read all at once
text = await response.read()

With Full Configuration

client = l0.wrap(
    AsyncOpenAI(),
    guardrails=l0.Guardrails.recommended(),
    retry=l0.Retry(attempts=5),
    timeout=l0.Timeout(initial_token=10000, inter_token=30000),  # Milliseconds
    continue_from_last_good_token=True,  # Resume from checkpoint on failure
    on_event=lambda e: print(f"[{e.type}]"),
    context={"request_id": "req-123", "user_id": "user-456"},
)

Wrapping a Raw Stream (Simple Cases)

# For one-off streams without retry support (client here is a plain AsyncOpenAI(), not a wrapped one)
raw_stream = await client.chat.completions.create(..., stream=True)
result = l0.wrap(raw_stream)
text = await result.read()

Parameters:

Parameter	Type	Default	Description
`client_or_stream`	`Client \| AsyncIterator`	required	OpenAI/LiteLLM client or raw stream
`guardrails`	`list[GuardrailRule]`	`None`	Guardrail rules to apply
`retry`	`Retry`	`Retry.recommended()`	Retry configuration (clients only)
`timeout`	`Timeout`	`None`	Timeout configuration
`continue_from_last_good_token`	`bool \| ContinuationConfig`	`False`	Resume from checkpoint on failure
`adapter`	`str \| Adapter`	`None`	Adapter hint or instance
`on_event`	`Callable`	`None`	Observability callback
`context`	`dict`	`None`	User context attached to all events
`build_continuation_prompt`	`Callable[[str], str]`	`None`	Modify prompt for continuation

Returns:

WrappedClient when passed a client (has .chat.completions.create())
LazyStream when passed a raw stream

run(stream, *, fallbacks, guardrails, retry, timeout, adapter, on_event, context, continue_from_last_good_token)

Run L0 with a stream factory. Use when you need retries or fallbacks (which require re-creating the stream).

Note: l0() is an alias to run() for convenience. Both work identically.

import l0

result = await l0.run(
    # Required: Stream factory (lambda for retries)
    stream=lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ),

    # Optional: Fallback streams
    fallbacks=[
        lambda: client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        ),
    ],

    # Optional: Guardrails
    guardrails=l0.Guardrails.recommended(),

    # Optional: Retry configuration (defaults shown)
    retry=l0.Retry(
        attempts=3,                              # LLM errors only
        max_retries=6,                           # Total (LLM + network)
        base_delay=1.0,                          # Seconds
        max_delay=10.0,                          # Seconds
        strategy=l0.BackoffStrategy.FIXED_JITTER,
    ),

    # Optional: Timeout configuration (defaults shown, in milliseconds)
    timeout=l0.Timeout(
        initial_token=5000,   # Milliseconds to first token
        inter_token=10000,    # Milliseconds between tokens
    ),

    # Optional: Adapter hint
    adapter="openai",  # or "litellm", or Adapter instance

    # Optional: Event callback
    on_event=lambda event: print(f"[{event.type}]"),

    # Optional: User context attached to all events
    context={"request_id": "req-123", "tenant": "acme"},
    
    # Optional: Resume from checkpoint on failure
    continue_from_last_good_token=True,
)

# Iterate with Pythonic event properties
async for event in result:
    if event.is_token:
        print(event.text, end="")
    elif event.is_tool_call:
        print(f"Tool call: {event.data}")
    elif event.is_complete:
        print("\nComplete")
        print(f"Usage: {event.usage}")
    elif event.is_error:
        print(f"Error: {event.error}")

# Or get full text directly
text = await result.read()

# Access state anytime
print(result.state.content)       # Full accumulated content
print(result.state.token_count)   # Total tokens received
print(result.state.checkpoint)    # Last stable checkpoint
print(result.state.duration)      # Duration in seconds

Parameters:

Parameter	Type	Default	Description
`stream`	`Callable[[], AsyncIterator]`	required	Factory returning async LLM stream
`fallbacks`	`list[Callable]`	`None`	Fallback stream factories
`guardrails`	`list[GuardrailRule]`	`None`	Guardrail rules to apply
`retry`	`Retry`	`None`	Retry configuration
`timeout`	`Timeout`	`None`	Timeout configuration
`continue_from_last_good_token`	`bool \| ContinuationConfig`	`False`	Resume from checkpoint on failure
`adapter`	`str \| Adapter`	`None`	Adapter hint or instance
`on_event`	`Callable`	`None`	Observability callback
`context`	`dict`	`None`	User context attached to all events
`build_continuation_prompt`	`Callable[[str], str]`	`None`	Modify prompt for continuation

Returns: Stream - Async iterator with attached state

Property/Method	Type	Description
`__aiter__`	-	Iterate directly over events
`state`	`State`	Runtime state
`abort()`	`Callable[[], None]`	Abort the stream
`read()`	`async -> str`	Consume stream, return full text
`errors`	`list[Exception]`	Errors encountered

wrap() vs run()

Function	When to Use	Returns
`wrap(client)`	Recommended - Wrap OpenAI client once, use everywhere	`WrappedClient`
`wrap(stream)`	Simple one-off, no retry support	`LazyStream`
`run()`	Need fallbacks or LiteLLM	`Stream`

# Recommended - wrap client
client = l0.wrap(AsyncOpenAI())
response = await client.chat.completions.create(...)

# Simple one-off stream
result = l0.wrap(raw_stream)
text = await result.read()

# With fallbacks - use run()
result = await l0.run(
    stream=lambda: create_stream(),
    fallbacks=[lambda: backup_stream()],
)
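run() takes factories rather than streams because a consumed async iterator cannot be replayed: every retry or fallback has to call a function that opens a fresh stream. The loop below is a minimal sketch of that idea, not L0's implementation. run_with_fallbacks, flaky, and backup are hypothetical, and the sketch retries only ConnectionError, with no backoff, guardrails, or checkpoint resumption.

```python
import asyncio
from typing import AsyncIterator, Callable

# Hypothetical sketch: each factory builds a new stream when called.
StreamFactory = Callable[[], AsyncIterator[str]]


async def run_with_fallbacks(factories: list[StreamFactory], attempts: int = 3) -> str:
    """Try the primary factory, then each fallback, retrying each up to `attempts` times."""
    last_error: Exception | None = None
    for factory in factories:  # index 0 = primary, 1+ = fallbacks
        for _ in range(attempts):
            tokens: list[str] = []
            try:
                # Calling the factory creates a fresh stream for this attempt
                async for token in factory():
                    tokens.append(token)
                return "".join(tokens)
            except ConnectionError as exc:  # simplified: only network drops are retried
                last_error = exc
    raise RuntimeError("primary and all fallbacks failed") from last_error


calls = {"flaky": 0}


async def flaky() -> AsyncIterator[str]:
    calls["flaky"] += 1
    yield "partial"
    raise ConnectionError("connection dropped")


async def backup() -> AsyncIterator[str]:
    for token in ["Hello", ", ", "world"]:
        yield token


print(asyncio.run(run_with_fallbacks([flaky, backup])))  # Hello, world
```

The primary is attempted three times before the fallback runs, which is why the partial output from each failed attempt never reaches the result.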

WrappedClient

When you wrap an OpenAI client, you get a WrappedClient that mirrors the original API:

client = l0.wrap(AsyncOpenAI())

# Same API as OpenAI
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    stream=True,
)

# Access the underlying client
raw_client = client.unwrapped

# Create a new client with different options
strict_client = client.with_options(
    guardrails=l0.Guardrails.strict(),
    continue_from_last_good_token=True,
)

Lifecycle Callbacks

L0 provides lifecycle callbacks for monitoring and responding to runtime events. All callbacks are optional and are pure side-effect handlers (they don't affect execution flow).

Callback Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                            L0 LIFECYCLE FLOW                                │
└─────────────────────────────────────────────────────────────────────────────┘

                                ┌──────────┐
                                │  START   │
                                └────┬─────┘
                                     │
                                     ▼
                      ┌──────────────────────────────┐
                      │       on_event(event)        │
                      └──────────────┬───────────────┘
                                     │
                                     ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                              STREAMING PHASE                               │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         on_event(event)                             │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                            │
│  During streaming, events fire as conditions occur:                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  CHECKPOINT  │  │  TOOL_CALL   │  │    DRIFT     │  │   TIMEOUT    │   │
│  │    SAVED     │  │   detected   │  │   detected   │  │   occurred   │   │
│  └──────────────┘  └──────────────┘  └──────┬───────┘  └──────┬───────┘   │
│                                             │                  │           │
│                                             └────────┬─────────┘           │
│                                                      │ triggers retry      │
└──────────────────────────────────────────────────────┼─────────────────────┘
                                                       │
              ┌────────────────────────────────────────┼────────────────┐
              │                    │                   │                │
              ▼                    ▼                   ▼                ▼
        ┌─────────┐          ┌───────────┐      ┌──────────┐      ┌─────────┐
        │ SUCCESS │          │   ERROR   │      │VIOLATION │      │  ABORT  │
        └────┬────┘          └─────┬─────┘      └────┬─────┘      └────┬────┘
             │                     │                 │                 │
             │                     ▼                 ▼                 ▼
             │              ┌────────────────────────────────┐   ┌───────────┐
             │              │      on_event(ERROR)           │   │ ABORTED   │
             │              └──────────────┬─────────────────┘   └───────────┘
             │                             │
             │                 ┌───────────┼───────────┐
             │                 │           │           │
             │                 ▼           ▼           ▼
             │           ┌──────────┐ ┌──────────┐ ┌──────────┐
             │           │  RETRY   │ │ FALLBACK │ │  FATAL   │
             │           └────┬─────┘ └────┬─────┘ └────┬─────┘
             │                │            │            │
             │                │    ┌───────┘            │
             │                │    │                    │
             │                ▼    ▼                    │
             │          ┌─────────────────────┐         │
             │          │  Has checkpoint?    │         │
             │          └──────────┬──────────┘         │
             │                YES  │  NO                │
             │                ┌────┴────┐               │
             │                ▼         ▼               │
             │          ┌──────────┐    │               │
             │          │  RESUME  │    │               │
             │          └────┬─────┘    │               │
             │               │          │               │
             │               ▼          ▼               │
             │          ┌─────────────────────────┐     │
             │          │    Back to STREAMING    │     │
             │          └─────────────────────────┘     │
             │                                          │
             ▼                                          ▼
      ┌─────────────┐                             ┌───────────┐
      │  COMPLETE   │                             │  FAILED   │
      └─────────────┘                             └───────────┘

Callback Reference

Callback	Signature	When Called
`on_event`	`(event: ObservabilityEvent) -> None`	Any runtime event emitted

ObservabilityEventType Reference

Event Type	Description
`SESSION_START`	New execution session begins
`SESSION_END`	Session completed
`STREAM_INIT`	Stream initialized
`STREAM_READY`	Stream ready for tokens
`RETRY_START`	Retry sequence starting
`RETRY_ATTEMPT`	Individual retry attempt
`RETRY_END`	Retry sequence completed
`RETRY_GIVE_UP`	All retries exhausted
`FALLBACK_START`	Switching to fallback model
`FALLBACK_END`	Fallback sequence completed
`GUARDRAIL_PHASE_START`	Guardrail check starting
`GUARDRAIL_RULE_RESULT`	Individual rule result
`GUARDRAIL_PHASE_END`	Guardrail check completed
`DRIFT_CHECK_RESULT`	Drift detection result
`NETWORK_ERROR`	Network error occurred
`NETWORK_RECOVERY`	Recovered from network error
`CHECKPOINT_SAVED`	Checkpoint saved
`COMPLETE`	Stream completed successfully
`ERROR`	Error occurred

Usage Example

import l0

def handle_event(event: l0.ObservabilityEvent):
    match event.type:
        case l0.ObservabilityEventType.SESSION_START:
            print(f"Session started: {event.stream_id}")
        case l0.ObservabilityEventType.RETRY_ATTEMPT:
            print(f"Retrying (attempt {event.meta.get('attempt', '?')})")
        case l0.ObservabilityEventType.FALLBACK_START:
            print(f"Switching to fallback {event.meta.get('index', '?')}")
        case l0.ObservabilityEventType.CHECKPOINT_SAVED:
            print(f"Checkpoint saved ({event.meta.get('token_count', 0)} tokens)")
        case l0.ObservabilityEventType.NETWORK_ERROR:
            print(f"Network error: {event.meta.get('error', 'unknown')}")
        case l0.ObservabilityEventType.COMPLETE:
            print(f"Complete! Duration: {event.meta.get('duration', 0)}s")
        case l0.ObservabilityEventType.ERROR:
            print(f"Error: {event.meta.get('error', 'unknown')}")

result = await l0.run(
    stream=lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ),
    on_event=handle_event,
    context={"request_id": "req-123", "user_id": "user-456"},
)

Streaming Runtime

L0 wraps LLM streams with deterministic behavior and unified event types.

Unified Event Format

All streams are normalized to Event objects:

@dataclass
class Event:
    type: EventType                           # Event type
    text: str | None = None                   # Token content
    data: dict[str, Any] | None = None        # Tool call / misc data
    error: Exception | None = None            # Error (for error events)
    usage: dict[str, int] | None = None       # Token usage
    timestamp: float | None = None            # Event timestamp

    # Pythonic type check properties
    @property
    def is_token(self) -> bool: ...
    @property
    def is_message(self) -> bool: ...
    @property
    def is_data(self) -> bool: ...
    @property
    def is_progress(self) -> bool: ...
    @property
    def is_tool_call(self) -> bool: ...
    @property
    def is_error(self) -> bool: ...
    @property
    def is_complete(self) -> bool: ...

Event Types

class EventType(str, Enum):
    TOKEN = "token"           # Text token
    MESSAGE = "message"       # Full message
    DATA = "data"             # Structured data
    PROGRESS = "progress"     # Progress update
    TOOL_CALL = "tool_call"   # Tool/function call
    ERROR = "error"           # Error occurred
    COMPLETE = "complete"     # Stream complete

Tool Call Handling

result = await l0.run(
    stream=lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What's the weather?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }],
        stream=True,
    ),
)

async for event in result:
    if event.is_tool_call:
        print(f"Tool: {event.data['name']}")
        print(f"Args: {event.data['arguments']}")
        print(f"ID: {event.data['id']}")

State Tracking

# Access state at any point
state = result.state

state.content              # Accumulated content
state.checkpoint           # Last validated checkpoint
state.token_count          # Total tokens received
state.model_retry_count    # Model error retries
state.network_retry_count  # Network error retries
state.fallback_index       # Current model (0=primary)
state.violations           # Guardrail violations
state.drift_detected       # Whether drift was detected
state.completed            # Stream completed successfully
state.aborted              # Stream was aborted
state.first_token_at       # Timestamp of first token
state.last_token_at        # Timestamp of last token
state.duration             # Total duration (seconds)
state.resumed              # Resumed from checkpoint

Retry Configuration

Retry

All retry delays are in seconds (float), matching Python conventions like asyncio.sleep(). Timeout values, by contrast, are in milliseconds.

@dataclass
class Retry:
    attempts: int = 3                 # Model errors only
    max_retries: int = 6              # Absolute cap (all errors)
    base_delay: float = 1.0           # Starting delay (seconds)
    max_delay: float = 10.0           # Maximum delay (seconds)
    strategy: BackoffStrategy = BackoffStrategy.FIXED_JITTER

BackoffStrategy

class BackoffStrategy(str, Enum):
    EXPONENTIAL = "exponential"    # delay * 2^attempt
    LINEAR = "linear"              # delay * (attempt + 1)
    FIXED = "fixed"                # constant delay
    FULL_JITTER = "full-jitter"    # random(0, exponential)
    FIXED_JITTER = "fixed-jitter"  # temp/2 + random(temp/2), temp = min(base * 2^attempt, max) - DEFAULT

Backoff Calculation

Strategy	Formula	Example (base=1.0s, attempt=2)
`EXPONENTIAL`	`min(base * 2^attempt, max)`	4.0s
`LINEAR`	`min(base * (attempt + 1), max)`	3.0s
`FIXED`	`base`	1.0s
`FULL_JITTER`	`random(0, min(base * 2^attempt, max))`	0-4.0s
`FIXED_JITTER`	`temp/2 + random(temp/2)`, where `temp = min(base * 2^attempt, max)`	2.0-4.0s

Retry Behavior by Error Type

Error Type	Retries	Counts Toward `attempts`	Counts Toward `max_retries`
Network disconnect	Yes	No	Yes
Zero output	Yes	No	Yes
Timeout	Yes	No	Yes
429 rate limit	Yes	No	Yes
503 server error	Yes	No	Yes
Guardrail violation	Yes	Yes	Yes
Drift detected	Yes	Yes	Yes
Auth error (401/403)	No	-	-
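The table implies two separate counters: model-level failures (guardrail violations, drift) consume attempts, every retryable failure consumes max_retries, and auth errors stop immediately. A minimal sketch of that bookkeeping follows. RetryBudget and the string error kinds are illustrative assumptions, not L0's ErrorCategory values.

```python
from dataclasses import dataclass

MODEL_ERRORS = {"guardrail_violation", "drift"}
TRANSIENT_ERRORS = {"network", "zero_output", "timeout", "rate_limit", "server_error"}


@dataclass
class RetryBudget:  # hypothetical
    attempts: int = 3      # model errors only
    max_retries: int = 6   # absolute cap across all retryable errors
    model_used: int = 0
    total_used: int = 0

    def allow(self, error_kind: str) -> bool:
        """Return True and consume budget if this error may be retried."""
        if error_kind not in MODEL_ERRORS | TRANSIENT_ERRORS:
            return False  # e.g. auth errors (401/403) are never retried
        if self.total_used >= self.max_retries:
            return False
        if error_kind in MODEL_ERRORS:
            if self.model_used >= self.attempts:
                return False
            self.model_used += 1
        self.total_used += 1
        return True


budget = RetryBudget()
print([budget.allow("drift") for _ in range(4)])    # [True, True, True, False]
print([budget.allow("timeout") for _ in range(4)])  # [True, True, True, False]
```

After three drift retries, only three units of max_retries remain for network-type errors, which is why the fourth timeout is refused.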

Retry Presets

from l0 import (
    MINIMAL_RETRY,      # 2 attempts, 4 max, linear backoff
    RECOMMENDED_RETRY,  # 3 attempts, 6 max, fixed-jitter backoff
    STRICT_RETRY,       # 3 attempts, 6 max, full-jitter backoff
    EXPONENTIAL_RETRY,  # 4 attempts, 8 max, exponential backoff
    Retry,
)

# Use preset directly
result = await l0.run(stream=my_stream, retry=RECOMMENDED_RETRY)

# Or use class method presets
result = await l0.run(stream=my_stream, retry=Retry.recommended())
result = await l0.run(stream=my_stream, retry=Retry.minimal())
result = await l0.run(stream=my_stream, retry=Retry.strict())
result = await l0.run(stream=my_stream, retry=Retry.exponential())

# Environment-specific presets
result = await l0.run(stream=my_stream, retry=Retry.mobile())  # Higher delays for mobile
result = await l0.run(stream=my_stream, retry=Retry.edge())    # Shorter delays for edge runtimes

Preset	attempts	max_retries	backoff	base_delay	max_delay
`MINIMAL_RETRY`	2	4	`linear`	1.0s	10.0s
`RECOMMENDED_RETRY`	3	6	`fixed-jitter`	1.0s	10.0s
`STRICT_RETRY`	3	6	`full-jitter`	1.0s	10.0s
`EXPONENTIAL_RETRY`	4	8	`exponential`	1.0s	10.0s

Centralized Defaults

from l0 import RETRY_DEFAULTS, ERROR_TYPE_DELAY_DEFAULTS

# RETRY_DEFAULTS contains all default values
RETRY_DEFAULTS.attempts      # 3
RETRY_DEFAULTS.max_retries   # 6
RETRY_DEFAULTS.base_delay    # 1.0 (seconds)
RETRY_DEFAULTS.max_delay     # 10.0 (seconds)
RETRY_DEFAULTS.backoff       # BackoffStrategy.FIXED_JITTER

# ERROR_TYPE_DELAY_DEFAULTS for network error types
ERROR_TYPE_DELAY_DEFAULTS.connection_dropped  # 1.0s
ERROR_TYPE_DELAY_DEFAULTS.timeout             # 1.0s
ERROR_TYPE_DELAY_DEFAULTS.dns_error           # 3.0s
ERROR_TYPE_DELAY_DEFAULTS.ssl_error           # 0.0s (don't retry SSL errors)

Custom Retry Logic

Override default retry behavior with custom functions.

should_retry (Async Veto Callback)

The should_retry callback provides async control over retry decisions. It can only veto retries, never force them.

import l0
from l0 import Retry, State, ErrorCategory

async def custom_should_retry(
    error: Exception,
    state: State,
    attempt: int,
    category: ErrorCategory
) -> bool:
    # Veto retry if we already have substantial content
    if state.token_count > 100:
        return False
    
    # Veto retry for context length errors
    if "context_length_exceeded" in str(error):
        return False
    
    # Check an external service before retrying (check_rate_limit_service is your own helper)
    can_retry = await check_rate_limit_service()
    if not can_retry:
        return False
    
    # Return True to allow default retry behavior
    return True

result = await l0.run(
    stream=my_stream,
    retry=Retry(
        attempts=5,
        should_retry=custom_should_retry,
    ),
)

Key Behavior

The final retry decision follows this formula:

final_should_retry = default_decision AND should_retry(...)

Default Decision	should_retry Returns	Final Result	Explanation
`True`	`True`	Retry	Both agree to retry
`True`	`False`	No retry	User vetoed the retry
`False`	`True`	No retry	User cannot force retry
`False`	`False`	No retry	Both agree not to retry

should_retry Parameters

Parameter	Type	Description
`error`	`Exception`	The error that occurred
`state`	`State`	Current state (content, token_count, etc.)
`attempt`	`int`	Current attempt (0-based)
`category`	`ErrorCategory`	Error category (network/transient/model/fatal)

calculate_delay

Custom delay calculation function to override default backoff behavior:

import random

import l0
from l0 import Retry

def custom_calculate_delay(context: dict) -> float:
    """
    context contains:
    - attempt: int - Current retry attempt (0-based)
    - total_attempts: int - Total attempts including network
    - category: str - Error category (network/model/fatal)
    - reason: str - Error reason code
    - error: Exception - The error that occurred
    - default_delay: float - Default delay that would be used
    """
    # Different delays based on error category
    if context["category"] == "network":
        return 0.5
    if context["reason"] == "rate_limit":
        return 5.0

    # Custom exponential backoff with full jitter: a random delay in [0, temp),
    # where temp doubles per attempt and is capped at 30 seconds
    base = 1.0
    cap = 30.0
    temp = min(cap, base * (2 ** context["attempt"]))
    return random.random() * temp

result = await l0.run(
    stream=my_stream,
    retry=Retry(
        attempts=3,
        base_delay=1.0,
        calculate_delay=custom_calculate_delay,
    ),
)

Error Type Delays

Custom delays for specific network error types. Overrides base_delay for fine-grained control.

import l0
from l0 import Retry, ErrorTypeDelays

result = await l0.run(
    stream=my_stream,
    retry=Retry(
        attempts=3,
        error_type_delays=ErrorTypeDelays(
            # Connection errors
            connection_dropped=2.0,   # Connection dropped mid-stream
            econnreset=1.5,           # Connection reset by peer
            econnrefused=3.0,         # Connection refused
            
            # Network errors
            fetch_error=0.5,          # Generic fetch failure
            dns_error=5.0,            # DNS resolution failed
            timeout=1.5,              # Request timeout
            
            # Streaming errors
            sse_aborted=1.0,          # Server-sent events aborted
            no_bytes=0.5,             # No bytes received
            partial_chunks=1.0,       # Incomplete chunks received
            
            # Runtime errors
            runtime_killed=5.0,       # Runtime process killed
            background_throttle=2.0,  # Background tab throttling
            
            # Fallback
            unknown=1.0,              # Unknown error type
        ),
    ),
)

Retryable Error Types

import l0
from l0 import Retry, RetryableErrorType

# Only retry on specific error types
result = await l0.run(
    stream=my_stream,
    retry=Retry(
        attempts=3,
        retry_on=[
            RetryableErrorType.NETWORK_ERROR,
            RetryableErrorType.TIMEOUT,
            RetryableErrorType.RATE_LIMIT,
            # Exclude: ZERO_OUTPUT, GUARDRAIL_VIOLATION, DRIFT, etc.
        ],
    ),
)

Available error types:

Error Type	Description
`ZERO_OUTPUT`	No meaningful output generated
`GUARDRAIL_VIOLATION`	Guardrail rule failed
`DRIFT`	Output drift detected
`INCOMPLETE`	Incomplete output
`NETWORK_ERROR`	Network/connection error
`TIMEOUT`	Request timeout
`RATE_LIMIT`	429 rate limit
`SERVER_ERROR`	5xx server error

Checkpoint Resumption

When a stream fails mid-generation (timeout, network error), L0 can resume from the last checkpoint instead of starting over.

continue_from_last_good_token

Enable with continue_from_last_good_token=True:

client = l0.wrap(
    AsyncOpenAI(),
    continue_from_last_good_token=True,
    timeout=l0.Timeout(inter_token=30000),  # Milliseconds (30 s)
)

# If the stream times out after "Hello wor", L0 will:
# 1. Save checkpoint: "Hello wor"
# 2. Retry the request
# 3. Deduplicate any overlapping content from the retry
# 4. Continue seamlessly

How It Works

Checkpoint Saving: L0 saves checkpoints at configurable intervals (default: every 5 tokens)
Failure Detection: On timeout or transient error, the checkpoint is preserved
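Retry with Continuation: On retry, the checkpoint content is kept so generation can continue from it (build_continuation_prompt can modify the prompt used for the continuation)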
Deduplication: If the LLM repeats content from the checkpoint, L0 removes the overlap
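The checkpoint-saving step can be pictured as a snapshot of the accumulated text taken every checkpoint_interval tokens. A minimal sketch under that assumption follows; the Checkpointer class is hypothetical, and checkpoint validation (validate_checkpoint) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpointer:  # hypothetical
    interval: int = 5  # mirrors the documented default of every 5 tokens
    tokens: list[str] = field(default_factory=list)
    checkpoint: str = ""

    def add(self, token: str) -> None:
        self.tokens.append(token)
        if len(self.tokens) % self.interval == 0:
            self.checkpoint = "".join(self.tokens)  # last stable snapshot


cp = Checkpointer(interval=5)
for tok in ["Hel", "lo", " ", "wor", "ld", " is", " gr"]:
    cp.add(tok)
print(repr(cp.checkpoint))  # 'Hello world'
```

If the stream failed at this point, the two tokens after the last snapshot (" is", " gr") would not be part of the checkpoint, so a resumed request would continue from "Hello world".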

ContinuationConfig

For fine-grained control:

import l0
from l0 import ContinuationConfig, DeduplicationOptions
from openai import AsyncOpenAI

config = ContinuationConfig(
    enabled=True,
    checkpoint_interval=5,        # Save checkpoint every N tokens
    deduplicate=True,             # Remove overlapping content
    deduplication_options=DeduplicationOptions(
        min_overlap=2,            # Minimum chars to consider overlap
        max_overlap=500,          # Maximum chars to check
        case_sensitive=True,
        normalize_whitespace=False,
    ),
    validate_checkpoint=True,     # Run guardrails on checkpoint
)

client = l0.wrap(
    AsyncOpenAI(),
    continue_from_last_good_token=config,
)

State Fields

After completion, check continuation state:

response = await client.chat.completions.create(...)
async for event in response:
    pass

# Check if continuation was used
print(response.state.resumed)              # True if retried
print(response.state.checkpoint)           # Last checkpoint content
print(response.state.continuation_used)    # True if resumed from checkpoint
print(response.state.deduplication_applied)  # True if overlap removed
print(response.state.overlap_removed)      # The overlapping text that was removed

Smart Continuation Deduplication

When using continue_from_last_good_token, LLMs often repeat words from the end of the checkpoint at the beginning of their continuation. L0 automatically detects and removes this overlap.

How It Works

# Checkpoint: "Hello world"
# LLM continues with: "world is great"
# Without deduplication: "Hello worldworld is great"
# With deduplication: "Hello world is great" ✓

Deduplication is enabled by default when continue_from_last_good_token=True. The algorithm:

Buffers incoming continuation tokens until overlap can be detected
Finds the longest suffix of the checkpoint that matches a prefix of the continuation
Removes the overlapping portion from the continuation
Emits only the non-overlapping content
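The core of these steps is a longest suffix/prefix match. The function below sketches that search using the documented min_overlap, max_overlap, and case_sensitive semantics. It omits token buffering and normalize_whitespace, and deduplicate here is a standalone hypothetical helper, not Continuation.deduplicate itself.

```python
def deduplicate(
    checkpoint: str,
    continuation: str,
    min_overlap: int = 2,
    max_overlap: int = 500,
    case_sensitive: bool = True,
) -> str:
    """Strip the longest checkpoint suffix that the continuation repeats as a prefix."""
    if case_sensitive:
        a, b = checkpoint, continuation
    else:
        a, b = checkpoint.lower(), continuation.lower()
    longest = min(len(a), len(b), max_overlap)
    for size in range(longest, min_overlap - 1, -1):  # try the longest overlap first
        if a.endswith(b[:size]):
            return continuation[size:]  # slice the original, preserving its case
    return continuation  # no overlap of at least min_overlap chars


checkpoint = "The quick brown fox"
print(checkpoint + deduplicate(checkpoint, "brown fox jumps over"))
# The quick brown fox jumps over
```

Searching from the longest candidate downward is what makes "brown fox" win over shorter accidental matches, and min_overlap keeps a single shared character from being treated as repetition.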

Configuration

import l0
from l0 import ContinuationConfig, DeduplicationOptions

config = ContinuationConfig(
    enabled=True,
    checkpoint_interval=5,
    
    # Deduplication is enabled by default; set deduplicate=False to turn it off
    deduplicate=True,
    
    # Tune overlap detection:
    deduplication_options=DeduplicationOptions(
        min_overlap=2,       # Minimum chars to consider overlap (default: 2)
        max_overlap=500,     # Maximum chars to check (default: 500)
        case_sensitive=True, # Case-sensitive matching (default: True)
        normalize_whitespace=False,  # Normalize whitespace for matching (default: False)
    ),
)

result = await l0.run(
    stream=lambda: client.chat.completions.create(..., stream=True),
    continue_from_last_good_token=config,
)

Options

Option	Type	Default	Description
`min_overlap`	int	2	Minimum overlap length to detect (avoids false positives)
`max_overlap`	int	500	Maximum overlap length to check (performance limit)
`case_sensitive`	bool	True	Whether matching is case-sensitive
`normalize_whitespace`	bool	False	Normalize whitespace when matching (`"hello   world"` matches `"hello world"`)

Utility Functions

The overlap detection is also available as standalone utilities:

from l0 import Continuation

# Full result with metadata
result = Continuation.detect_overlap("Hello world", "world is great")
# OverlapResult(
#     has_overlap=True,
#     overlap_length=5,
#     overlap_text="world",
#     deduplicated=" is great"
# )

# Convenience wrapper - just the deduplicated string
text = Continuation.deduplicate("Hello world", "world is great")
# " is great"

# With options
from l0 import DeduplicationOptions

options = DeduplicationOptions(case_sensitive=False, min_overlap=3)
result = Continuation.detect_overlap("Hello World", "world test", options)

Examples

Case-insensitive matching:

# Checkpoint: "Hello World"
# Continuation: "world is great"
# With case_sensitive=False → "Hello World is great"

config = ContinuationConfig(
    enabled=True,
    deduplication_options=DeduplicationOptions(case_sensitive=False),
)

Multi-word overlap:

# Checkpoint: "The quick brown fox"
# Continuation: "brown fox jumps over"
# Result: "The quick brown fox jumps over"

Document Windows

Process documents that exceed context limits with automatic chunking.

Window.create(document, *, size, overlap, strategy)

Create a window for processing long documents.

from l0 import Window

window = Window.create(
    long_document,
    size=2000,           # Tokens per chunk
    overlap=200,         # Overlap between chunks
    strategy="paragraph", # "token" | "char" | "paragraph" | "sentence"
)

# Navigation
chunk = window.current()     # Current chunk
window.next()                # Move to next
window.prev()                # Move to previous
window.jump(5)               # Jump to chunk 5

# Search
matches = window.find_chunks("keyword")

# Get context around a chunk
context = window.get_context(chunk.index, before=1, after=1)

# Statistics
stats = window.get_stats()
print(f"Total chunks: {stats.total_chunks}")
print(f"Total tokens: {stats.total_tokens}")
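The size and overlap parameters behave like a sliding window: each chunk begins size - overlap units after the previous one, so neighboring chunks share overlap units. The sketch below shows that arithmetic for the simplest case, counting characters rather than tokens and ignoring paragraph and sentence boundaries. char_windows is a hypothetical helper whose (start_pos, end_pos, content) tuples mirror DocumentChunk's fields.

```python
def char_windows(text: str, size: int, overlap: int) -> list[tuple[int, int, str]]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # distance between consecutive chunk starts
    chunks: list[tuple[int, int, str]] = []
    start = 0
    while True:
        end = min(start + size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break  # the last chunk reaches the end of the document
        start += step
    return chunks


for start, end, content in char_windows("abcdefghij", size=4, overlap=1):
    print(start, end, content)
# 0 4 abcd
# 3 7 defg
# 6 10 ghij
```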

Window Presets

from l0 import Window

# Quick creation with presets
window = Window.small(document)      # 1000 tokens, 100 overlap
window = Window.medium(document)     # 2000 tokens, 200 overlap (default)
window = Window.large(document)      # 4000 tokens, 400 overlap
window = Window.paragraph(document)  # Paragraph-based chunking
window = Window.sentence(document)   # Sentence-based chunking

Processing All Chunks

from l0 import Window, ChunkProcessConfig

window = Window.create(document, size=2000, overlap=200)

# Process all chunks in parallel
results = await window.process_all(
    lambda chunk: ChunkProcessConfig(
        stream=lambda: client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Summarize: {chunk.content}"}],
            stream=True,
        ),
    ),
    concurrency=3,  # Max 3 concurrent
)

# Check results
for result in results:
    if result.status == "success":
        print(f"Chunk {result.chunk.index}: {result.content[:50]}...")
    else:
        print(f"Chunk {result.chunk.index} failed: {result.error}")

# Get processing statistics
stats = Window.get_stats(results)
print(f"Success rate: {stats.success_rate}%")
print(f"Average duration: {stats.avg_duration}ms")
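concurrency=3 caps how many chunks are processed at once. In asyncio that cap is commonly implemented with a semaphore around each task; the sketch below assumes that approach (L0's internals may differ). process_all here is a hypothetical standalone function, and summarize is a stand-in for an LLM call that also records the peak number of simultaneous calls.

```python
import asyncio


async def process_all(items: list[str], concurrency: int = 3) -> tuple[list[str], int]:
    sem = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def summarize(item: str) -> str:  # hypothetical stand-in for an LLM call
        nonlocal active, peak
        async with sem:  # at most `concurrency` bodies run at the same time
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)
            active -= 1
            return item.upper()

    # gather returns results in input order, regardless of completion order
    results = await asyncio.gather(*(summarize(i) for i in items))
    return list(results), peak


results, peak = asyncio.run(process_all(["a", "b", "c", "d", "e"], concurrency=3))
print(results, peak)  # ['A', 'B', 'C', 'D', 'E'] 3
```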

Merging Results

# Merge all successful results
merged_text = Window.merge_results(results, separator="\n\n")

# Merge chunks back into document (handles overlap)
merged_doc = Window.merge_chunks(window.get_all_chunks())
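One way to merge overlapping chunks is to use each chunk's start_pos and end_pos so that every chunk contributes only the characters beyond what is already merged. The sketch below assumes that position-based approach; merge_by_position is hypothetical and takes (start_pos, end_pos, content) tuples shaped like DocumentChunk fields.

```python
def merge_by_position(chunks: list[tuple[int, int, str]]) -> str:
    """Rebuild a document from overlapping chunks using their positions."""
    merged = ""
    covered = 0  # end position already present in `merged`
    for start, end, content in sorted(chunks):
        if end > covered:
            # Skip the part of this chunk that overlaps text already merged
            merged += content[max(covered - start, 0):]
            covered = end
    return merged


chunks = [(0, 4, "abcd"), (3, 7, "defg"), (6, 10, "ghij")]
print(merge_by_position(chunks))  # abcdefghij
```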

DocumentChunk

@dataclass
class DocumentChunk:
    index: int          # Position (0-based)
    content: str        # Chunk text
    start_pos: int      # Start position in original document
    end_pos: int        # End position in original document
    token_count: int    # Estimated tokens
    char_count: int     # Character count
    is_first: bool      # Is this the first chunk?
    is_last: bool       # Is this the last chunk?
    total_chunks: int   # Total number of chunks
    metadata: dict      # Custom metadata

Pipeline

Multi-phase streaming workflows where each step receives the output of the previous step.

pipe(steps, input, options)

Execute a pipeline of streaming steps.

from openai import AsyncOpenAI
import l0

client = AsyncOpenAI()

async def summarize_step(text: str, ctx: l0.StepContext):
    return lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
        stream=True,
    )

async def refine_step(summary: str, ctx: l0.StepContext):
    return lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Refine this summary: {summary}"}],
        stream=True,
    )

result = await l0.pipe(
    [
        l0.PipelineStep(name="summarize", fn=summarize_step),
        l0.PipelineStep(name="refine", fn=refine_step),
    ],
    long_document,
    l0.PipelineOptions(name="summarize-refine"),
)

print(result.output)  # Final refined summary
print(f"Duration: {result.duration}ms")
print(f"Steps completed: {len(result.steps)}")
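
At its core, `pipe()` runs a loop that feeds each step's output into the next step. The sketch below keeps only that loop and uses plain async functions in place of LLM streams. `run_steps` and the step functions are hypothetical. L0's pipeline additionally handles streaming, per-step callbacks, and error policy.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

# Hypothetical step shape: takes the previous output, returns this step's output
Step = Callable[[str], Awaitable[str]]


async def run_steps(steps: list[tuple[str, Step]], value: str) -> tuple[str, list[str], float]:
    """Run named steps in order, feeding each step's output into the next."""
    started = time.perf_counter()
    completed: list[str] = []
    for name, step in steps:
        value = await step(value)  # Output becomes the next step's input
        completed.append(name)
    duration_ms = (time.perf_counter() - started) * 1000
    return value, completed, duration_ms


async def summarize(text: str) -> str:
    return text.split(".")[0] + "."  # Stand-in for an LLM call: keep the first sentence


async def refine(summary: str) -> str:
    return summary.upper()  # Stand-in for a second LLM pass


if __name__ == "__main__":
    output, done, ms = asyncio.run(
        run_steps([("summarize", summarize), ("refine", refine)],
                  "L0 wraps streams. It adds retries.")
    )
    print(output, done, f"{ms:.1f}ms")
```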

PipelineStep

@dataclass
class PipelineStep:
    name: str           # Step name (for logging/debugging)
    fn: Callable        # Step function: (input, context) -> stream factory
    transform: Callable | None = None  # Transform output before next step
    condition: Callable | None = None  # Condition to run this step
    on_error: Callable | None = None   # Error handler for this step
    on_complete: Callable | None = None  # Callback when step completes
    metadata: dict = field(default_factory=dict)  # Step-specific metadata

PipelineOptions

@dataclass
class PipelineOptions:
    name: str | None = None      # Pipeline name
    stop_on_error: bool = True   # Stop on first error
    timeout: float | None = None # Max execution time (seconds)
    on_start: Callable | None = None     # Called when pipeline starts
    on_complete: Callable | None = None  # Called when pipeline completes
    on_error: Callable | None = None     # Called on error (error, step_index)
    on_progress: Callable | None = None  # Called for progress (step_index, total)
    metadata: dict = field(default_factory=dict)  # Pipeline-wide metadata

Reusable Pipelines

from l0 import create_pipeline, PipelineStep, PipelineOptions

# Create a reusable pipeline
summarize_pipeline = create_pipeline(
    [
        PipelineStep(name="extract", fn=extract_step),
        PipelineStep(name="summarize", fn=summarize_step),
        PipelineStep(name="format", fn=format_step),
    ],
    PipelineOptions(name="document-summarizer"),
)

# Run multiple times
result1 = await summarize_pipeline.run(document1)
result2 = await summarize_pipeline.run(document2)

# Clone and modify (stop_on_error already defaults to True, so relax it)
tolerant_pipeline = summarize_pipeline.clone()
tolerant_pipeline.options.stop_on_error = False  # Keep going after a failed step

Conditional Steps

from l0 import PipelineStep

# Step with condition
conditional_step = PipelineStep(
    name="translate",
    fn=translate_step,
    condition=lambda input, ctx: ctx.metadata.get("language") != "en",
)

# Branch step
from l0 import create_branch_step

branch = create_branch_step(
    "route",
    condition=lambda input, ctx: len(input) > 1000,
    if_true=summarize_step,   # Long text → summarize
    if_false=passthrough_step, # Short text → pass through
)
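
Conceptually, `condition` and `create_branch_step` decide at run time which step function, if any, receives the input. The sketch below models this with two hypothetical helpers, `conditional` and `branch`. It assumes a skipped step passes its input through unchanged, which the reference above does not state explicitly.

```python
import asyncio
from collections.abc import Awaitable, Callable

StepFn = Callable[[str], Awaitable[str]]  # Hypothetical: input -> output


def conditional(step: StepFn, condition: Callable[[str], bool]) -> StepFn:
    """Run `step` only when condition(input) is true; otherwise pass input through."""
    async def run(value: str) -> str:
        return await step(value) if condition(value) else value
    return run


def branch(condition: Callable[[str], bool], if_true: StepFn, if_false: StepFn) -> StepFn:
    """Route the input to one of two steps depending on condition(input)."""
    async def run(value: str) -> str:
        chosen = if_true if condition(value) else if_false
        return await chosen(value)
    return run


async def summarize(text: str) -> str:
    return text[:20] + "..."  # Stand-in for an LLM summarization step


async def passthrough(text: str) -> str:
    return text


# Long text -> summarize, short text -> pass through (mirrors the example above)
route = branch(lambda text: len(text) > 1000, summarize, passthrough)

if __name__ == "__main__":
    print(asyncio.run(route("short text")))
```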

Chaining and Parallel Pipelines

from l0 import chain_pipelines, parallel_pipelines

# Chain pipelines sequentially
full_pipeline = chain_pipelines(
    extract_pipeline,
    analyze_pipeline,
    format_pipeline,
)

# Run pipelines in parallel and combine
results = await parallel_pipelines(
    [sentiment_pipeline, entity_pipeline, summary_pipeline],
    document,
    lambda results: {
        "sentiment": results[0].output,
        "entities": results[1].output,
        "summary": results[2].output,
    },
)
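
`parallel_pipelines` runs several pipelines on the same input and passes the collected results to a combiner. In plain asyncio, that shape is `asyncio.gather` followed by a function call, as in the sketch below. The pipelines are hypothetical async callables, and their results are reduced to strings rather than L0 result objects.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Pipeline = Callable[[str], Awaitable[str]]  # Hypothetical: document -> output


async def run_parallel(pipelines: list[Pipeline], value: str,
                       combine: Callable[[list[str]], Any]) -> Any:
    """Run every pipeline concurrently on the same input, then combine."""
    outputs = await asyncio.gather(*(pipeline(value) for pipeline in pipelines))
    return combine(list(outputs))  # gather keeps the order of `pipelines`


async def sentiment(doc: str) -> str:
    return "positive" if "great" in doc else "neutral"  # Toy classifier


async def word_count(doc: str) -> str:
    return str(len(doc.split()))


if __name__ == "__main__":
    print(asyncio.run(run_parallel(
        [sentiment, word_count], "a great day",
        lambda results: {"sentiment": results[0], "words": results[1]},
    )))
```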

Pipeline Presets

from l0.pipeline import FAST_PIPELINE, RELIABLE_PIPELINE, PRODUCTION_PIPELINE

# FAST_PIPELINE: stop_on_error=True (fail fast)
# RELIABLE_PIPELINE: stop_on_error=False (graceful failures)
# PRODUCTION_PIPELINE: stop_on_error=False, timeout=300s

Formatting Helpers

Utilities for formatting prompts, context, memory, and tool definitions.

Context Formatting

from l0 import Format

# Wrap content with delimiters
context = Format.context(
    "User manual content here",
    label="documentation",
    delimiter="xml",  # "xml" | "markdown" | "brackets" | "none"
)
# Output: <documentation>\nUser manual content here\n</documentation>

# Format multiple contexts
contexts = Format.contexts([
    {"content": "Doc 1", "label": "doc1"},
    {"content": "Doc 2", "label": "doc2"},
])

# Format a document with metadata
doc = Format.document(content, {"title": "Report", "author": "User"})

# Format instructions
instructions = Format.instructions("You are a helpful assistant")

Memory Formatting

from l0 import Format

# Format conversation history
memory = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
]

formatted = Format.memory(memory, {"style": "conversational", "max_entries": 10})
# Output:
# User: Hello
# Assistant: Hi there!
# User: How are you?

# Create timestamped memory entries
entry = Format.memory_entry("user", "New message")

# Memory utilities
filtered = Format.filter_memory(memory, "user")  # Only user messages
last_5 = Format.last_n_entries(memory, 5)
size = Format.memory_size(memory)  # Character count
truncated = Format.truncate_memory(memory, max_size=1000)

Output Formatting

from l0 import Format

# Request JSON output
instruction = Format.json_output({"strict": True, "schema": "..."})

# Request structured output
instruction = Format.structured_output("yaml", {"strict": True})

# Define output constraints
constraints = Format.output_constraints({
    "max_length": 500,
    "format": "bullet_points",
})

# Clean model output
cleaned = Format.clean_output("Sure! Here's the JSON: {...}")  # "{...}"

# Extract JSON from output
json_str = Format.extract_json(model_output)

# Validate JSON
is_valid, error = Format.validate_json(output)

Tool Formatting

from l0 import Format

# Create a tool definition
tool = Format.create_tool(
    "search",
    "Search the web for information",
    [
        Format.parameter("query", "string", "Search query", required=True),
        Format.parameter("limit", "integer", "Max results", default=10),
    ],
)

# Format for model
formatted = Format.tool(tool, {"style": "json-schema"})

# Format multiple tools
formatted_tools = Format.tools([tool1, tool2])

# Parse function call from output
fn_call = Format.parse_function_call(model_output)
if fn_call:
    print(f"Function: {fn_call.name}, Args: {fn_call.arguments}")

String Utilities

from l0 import Format

# Basic operations
Format.trim("  hello  ")           # "hello"
Format.truncate("Hello World", 8)  # "Hello..."
Format.truncate_words("Hello World", 8)  # "Hello..."
Format.wrap("Long text...", 80)    # Word-wrapped text
Format.pad("hello", 10, align="center")  # "  hello   "

# Escaping
Format.escape("Hello\nWorld")      # "Hello\\nWorld"
Format.unescape("Hello\\nWorld")   # "Hello\nWorld"
Format.escape_html("<div>")        # "&lt;div&gt;"
Format.unescape_html("&lt;div&gt;") # "<div>"
Format.escape_regex("foo.*bar")    # "foo\\.\\*bar"
Format.sanitize("text\x00here")    # "texthere" (removes control chars)
Format.remove_ansi("\x1b[31mred\x1b[0m")  # "red"
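
The truncation comments imply that the `...` suffix counts toward the length limit. Cutting "Hello World" to 8 characters leaves 5 characters of text plus 3 dots. The sketch below reimplements that rule and a word-boundary variant under hypothetical names. It illustrates the documented outputs and is not L0's code.

```python
def truncate(text: str, max_len: int, suffix: str = "...") -> str:
    """Cut text so the result, suffix included, fits in max_len characters."""
    if len(text) <= max_len:
        return text
    keep = max(0, max_len - len(suffix))
    return text[:keep] + suffix


def truncate_words(text: str, max_len: int, suffix: str = "...") -> str:
    """Like truncate, but back off to the last whole word that fits."""
    if len(text) <= max_len:
        return text
    keep = max(0, max_len - len(suffix))
    cut = text[:keep]
    if " " in cut and not text[keep].isspace():
        cut = cut.rsplit(" ", 1)[0]  # Drop the partial trailing word
    return cut.rstrip() + suffix


if __name__ == "__main__":
    print(truncate("Hello World", 8), truncate_words("Hello World", 10))
```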

Network Protection

Error Categorization

from l0.errors import categorize_error
from l0.types import ErrorCategory

category = categorize_error(error)

match category:
    case ErrorCategory.NETWORK:
        print("Network error - retry forever")
    case ErrorCategory.TRANSIENT:
        print("Transient (429/503) - retry forever")
    case ErrorCategory.MODEL:
        print("Model error - counts toward limit")
    case ErrorCategory.CONTENT:
        print("Content error - counts toward limit")
    case ErrorCategory.PROVIDER:
        print("Provider error - may retry")
    case ErrorCategory.FATAL:
        print("Fatal - no retry (401/403)")
    case ErrorCategory.INTERNAL:
        print("Internal - no retry (bug)")

Network Error Patterns

L0 automatically detects these patterns in error messages:

Pattern	Description
`connection.*reset`	Connection reset by peer
`connection.*refused`	Connection refused
`connection.*timeout`	Connection timeout
`timed?\s*out`	Request timed out
`dns.*failed`	DNS resolution failed
`name.*resolution`	Name resolution error
`socket.*error`	Socket error
`ssl.*error`	SSL/TLS error
`eof.*occurred`	Unexpected EOF
`broken.*pipe`	Broken pipe
`network.*unreachable`	Network unreachable
`host.*unreachable`	Host unreachable

HTTP Status Code Handling

Status	Category	Behavior
429	`TRANSIENT`	Retry forever
500-599	`TRANSIENT`	Retry forever
401	`FATAL`	No retry
403	`FATAL`	No retry

Structured Output

structured(schema, stream, *, fallbacks, auto_correct, retry, on_validation_error, on_auto_correct, on_event, adapter)

Guaranteed valid JSON matching a Pydantic schema.

from pydantic import BaseModel
import l0

class UserProfile(BaseModel):
    name: str
    age: int
    email: str
    tags: list[str] = []

result = await l0.structured(
    schema=UserProfile,
    stream=lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Generate user data as JSON"}],
        stream=True,
    ),
    auto_correct=True,  # Fix common JSON errors
)

# Type-safe access
print(result.data.name)    # str
print(result.data.age)     # int
print(result.data.email)   # str
print(result.data.tags)    # list[str]

Parameters:

Parameter	Type	Default	Description
`schema`	`type[BaseModel]`	required	Pydantic model class
`stream`	`AsyncIterator \| Callable[[], AsyncIterator]`	required	Async LLM stream or factory returning one
`fallbacks`	`list[AsyncIterator \| Callable]`	`None`	Fallback streams to try if primary fails
`auto_correct`	`bool`	`True`	Auto-fix common JSON errors
`retry`	`Retry`	`None`	Retry configuration for validation failures
`on_validation_error`	`Callable[[ValidationError, int], None]`	`None`	Callback when validation fails (error, attempt)
`on_auto_correct`	`Callable[[AutoCorrectInfo], None]`	`None`	Callback when auto-correction is applied
`on_event`	`Callable[[ObservabilityEvent], None]`	`None`	Callback for observability events
`adapter`	`Any \| str`	`None`	Adapter hint ("openai", "litellm", or instance)

JSON Auto-Correction

from l0._utils import auto_correct_json, extract_json_from_markdown

# Remove trailing commas
auto_correct_json('{"a": 1,}')  # '{"a": 1}'

# Balance braces
auto_correct_json('{"a": {"b": 1}')  # '{"a": {"b": 1}}'

# Balance brackets
auto_correct_json('[1, 2, 3')  # '[1, 2, 3]'

# Strip whitespace
auto_correct_json('  {"a": 1}  ')  # '{"a": 1}'

# Extract from markdown fences
extract_json_from_markdown('''
Here's the data:
```json
{"key": "value"}
```
''')  # '{"key": "value"}'
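
The four corrections above can be approximated in a few lines. This sketch is not `l0._utils`; `repair_json` and `extract_from_fence` are hypothetical names. The trailing-comma regex ignores string literals, so a comma followed by `}` inside a quoted value would also be altered. A production version would need to tokenize around that case.

```python
import re

_TRAILING_COMMA = re.compile(r",\s*([}\]])")
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_from_fence(text: str) -> str:
    """Return the body of the first ``` or ```json fence, else the stripped text."""
    match = _FENCE.search(text)
    return match.group(1) if match else text.strip()


def repair_json(text: str) -> str:
    """Strip whitespace, drop trailing commas, and close unbalanced brackets."""
    text = _TRAILING_COMMA.sub(r"\1", text.strip())
    closers: list[str] = []  # Closing brackets still owed, innermost last
    in_string = escaped = False
    for ch in text:
        if in_string:  # Brackets inside string literals do not count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers and closers[-1] == ch:
            closers.pop()
    return text + "".join(reversed(closers))


if __name__ == "__main__":
    print(repair_json('{"a": {"b": 1,}'))
```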

Fallback Models

Sequential fallback when the primary model fails:

result = await l0.run(
    stream=lambda: openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ),
    fallbacks=[
        # Fallback 1: Cheaper OpenAI model
        lambda: openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        ),
        # Fallback 2: Different provider via LiteLLM
        lambda: litellm.acompletion(
            model="anthropic/claude-3-haiku-20240307",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        ),
    ],
)

# Check which model succeeded
if result.state.fallback_index == 0:
    print("Primary model (gpt-4o) succeeded")
elif result.state.fallback_index == 1:
    print("Fallback 1 (gpt-4o-mini) succeeded")
else:
    print(f"Fallback {result.state.fallback_index} succeeded")

Fallback Behavior

Primary stream fails (error, timeout, guardrail violation)
L0 exhausts retries for primary stream
Moves to first fallback, resets retry counter
Repeats until success or all fallbacks exhausted
Raises last error if all fail
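
The steps above can be sketched as two nested loops. The outer loop walks the primary stream and its fallbacks. The inner retry loop has a budget that resets for each stream. This is a simplified model, not L0's runtime: it treats every exception as retryable and has no backoff. `run_with_fallbacks` is a hypothetical name, and its returned index mirrors `result.state.fallback_index`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def run_with_fallbacks(
    factories: list[Callable[[], Awaitable[T]]], attempts: int = 3
) -> tuple[T, int]:
    """Return (result, index of the stream that succeeded); 0 is the primary."""
    last_error: Exception | None = None
    for index, factory in enumerate(factories):
        for _ in range(attempts):  # Retry budget starts fresh for every stream
            try:
                return await factory(), index
            except Exception as error:  # Sketch: every error is treated as retryable
                last_error = error
    if last_error is None:
        raise ValueError("no streams to try")
    raise last_error  # All streams failed: surface the most recent error


async def _demo() -> None:
    async def primary() -> str:
        raise ConnectionError("primary down")

    async def fallback() -> str:
        return "answer from fallback"

    print(await run_with_fallbacks([primary, fallback], attempts=2))


if __name__ == "__main__":
    asyncio.run(_demo())
```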

Guardrails

Built-in Rules

import l0

# Individual rules (via Guardrails scoped API)
l0.Guardrails.json()           # Validates JSON structure (balanced braces)
l0.Guardrails.strict_json()    # Validates complete JSON (on completion only)
l0.Guardrails.pattern()        # Detects "As an AI..." patterns
l0.Guardrails.zero_output()    # Detects empty output
l0.Guardrails.stall()          # Detects token stalls
l0.Guardrails.repetition()     # Detects model looping

Presets (Recommended)

import l0

# Recommended: json + pattern + zero_output
guardrails = l0.Guardrails.recommended()

# Strict: All rules including drift detection
guardrails = l0.Guardrails.strict()

# JSON only
guardrails = l0.Guardrails.json_only()

# None (empty list)
guardrails = l0.Guardrails.none()

Rule Details

Rule	Streaming	Default Severity	Description
`Guardrails.json()`	Yes	error	Checks balanced `{}[]` brackets
`Guardrails.strict_json()`	No	error	Validates JSON via `json.loads()` on complete
`Guardrails.pattern(patterns)`	Yes	warning	Regex patterns (default: AI slop)
`Guardrails.zero_output()`	No	error	Empty output on complete
`Guardrails.stall(max_gap)`	Yes	warning	No tokens for `max_gap` seconds
`Guardrails.repetition(window, threshold)`	Yes	error	Repeated content detection

Custom Guardrails

import l0
from l0 import GuardrailRule, GuardrailViolation
from l0.types import State

def max_length_rule(limit: int = 1000) -> GuardrailRule:
    """Detect output exceeding length limit."""
    
    def check(state: State) -> list[GuardrailViolation]:
        if len(state.content) > limit:
            return [GuardrailViolation(
                rule="max_length",
                message=f"Output exceeds {limit} chars",
                severity="error",
                recoverable=True,
            )]
        return []
    
    return GuardrailRule(
        name="max_length",
        check=check,
        description="Detects output exceeding length limit",
        streaming=True,
        severity="error",
        recoverable=True,
    )

# Usage
result = await l0.run(
    stream=my_stream,
    guardrails=[max_length_rule(500)],
)

GuardrailRule

@dataclass
class GuardrailRule:
    name: str                                    # Unique name
    check: Callable[[State], list[GuardrailViolation]]
    description: str | None = None               # Human description
    streaming: bool = True                       # Check during streaming
    severity: Severity = "error"                 # Default severity
    recoverable: bool = True                     # Can retry on violation

GuardrailViolation

@dataclass
class GuardrailViolation:
    rule: str                         # Rule name that triggered
    message: str                      # Human-readable message
    severity: Severity                # "warning" | "error" | "fatal"
    recoverable: bool = True          # Can retry/fallback
    position: int | None = None       # Position in content
    timestamp: float | None = None    # When detected
    context: dict[str, Any] | None = None   # Extra context
    suggestion: str | None = None     # Suggested fix

Violation Handling

# Access violations from result
for violation in result.state.violations:
    print(f"[{violation.severity}] {violation.rule}: {violation.message}")
    
    if not violation.recoverable:
        print("  Not recoverable - retry/fallback will not help")

Consensus

consensus(tasks, strategy)

Multi-generation consensus for high-confidence results.

import l0

result = await l0.consensus(
    tasks=[
        lambda: generate_answer_model_a(),
        lambda: generate_answer_model_b(),
        lambda: generate_answer_model_c(),
    ],
    strategy="majority",  # "unanimous" | "majority" | "best"
)

Parameters:

Parameter	Type	Default	Description
`tasks`	`list[Callable[[], Awaitable[T]]]`	required	Async callables
`strategy`	`Strategy`	`"majority"`	Consensus strategy

Strategies

Strategy	Description	Raises
`unanimous`	All results must be identical	`ValueError` if any differ
`majority`	Most common result wins (>50%)	`ValueError` if no majority
`best`	Return first result	Never (unless all fail)
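
Once the task results are collected, the strategy table reduces to a few comparisons. `pick_consensus` below is a hypothetical helper that applies those rules to a list of results. It assumes results are compared by exact equality and reads "first result" as the first element of the list.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def pick_consensus(results: Sequence[Hashable], strategy: str = "majority") -> Hashable:
    """Apply the strategy table to already-collected results."""
    if not results:
        raise ValueError("no results")
    if strategy == "unanimous":
        if any(result != results[0] for result in results):
            raise ValueError("results differ")
        return results[0]
    if strategy == "majority":
        value, count = Counter(results).most_common(1)[0]
        if count * 2 <= len(results):  # Needs strictly more than 50%
            raise ValueError("no majority")
        return value
    if strategy == "best":
        return results[0]  # Per the table: return the first result
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(pick_consensus(["Paris", "Paris", "Lyon"]))
```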

Example: Multi-Model Validation

async def get_answer(model: str) -> str:
    result = await l0.run(
        stream=lambda: client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            stream=True,
        ),
    )
    return await result.read()

# Require agreement from multiple models
try:
    answer = await l0.consensus(
        tasks=[
            lambda: get_answer("gpt-4o"),
            lambda: get_answer("gpt-4o-mini"),
            lambda: get_answer("gpt-4-turbo"),
        ],
        strategy="majority",
    )
    print(f"Consensus answer: {answer}")
except ValueError as e:
    print(f"No consensus: {e}")

Parallel Operations

parallel(tasks, concurrency)

Run tasks with concurrency limit.

import l0

async def process_document(doc: str) -> str:
    result = await l0.run(stream=lambda: summarize(doc))
    return await result.read()

# Process 10 documents, max 3 concurrent
results = await l0.parallel(
    tasks=[lambda d=doc: process_document(d) for doc in documents],
    concurrency=3,
)

Parameters:

Parameter	Type	Default	Description
`tasks`	`list[Callable[[], Awaitable[T]]]`	required	Async callables
`concurrency`	`int`	`5`	Max concurrent tasks

Returns: list[T] - Results in same order as tasks
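
A concurrency limit like this is commonly built from `asyncio.Semaphore` plus `asyncio.gather`, which also returns results in task order. The sketch below shows that standard pattern under the hypothetical name `run_limited`. Whether `l0.parallel` is implemented this way is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def run_limited(tasks: list[Callable[[], Awaitable[T]]], concurrency: int = 5) -> list[T]:
    """Run callables with at most `concurrency` in flight; keep input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(task: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # Waits while `concurrency` tasks are already running
            return await task()

    return await asyncio.gather(*(guarded(task) for task in tasks))


async def _demo() -> None:
    async def square(n: int) -> int:
        await asyncio.sleep(0.01)
        return n * n

    print(await run_limited([lambda n=n: square(n) for n in range(5)], concurrency=2))


if __name__ == "__main__":
    asyncio.run(_demo())
```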

race(tasks)

Return first successful result, cancel remaining.

import l0

# First model to respond wins
result = await l0.race([
    lambda: fast_but_expensive_model(),
    lambda: slow_but_cheap_model(),
    lambda: backup_model(),
])

Behavior:

All tasks start immediately
First to complete successfully is returned
All other tasks are cancelled
If first fails, does NOT wait for others

batched(items, handler, batch_size)

Process items in batches.

import l0

async def embed(text: str) -> list[float]:
    # Call your embedding provider for a single text and return its vector
    ...

# Process 1000 texts in batches of 50
embeddings = await l0.batched(
    items=texts,  # 1000 texts
    handler=embed,
    batch_size=50,
)
# Result: 1000 embeddings in order

Parameters:

Parameter	Type	Default	Description
`items`	`list[T]`	required	Items to process
`handler`	`Callable[[T], Awaitable[R]]`	required	Async handler
`batch_size`	`int`	`10`	Batch size
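
A batch loop like `batched()` can be sketched by slicing the items and awaiting one batch at a time. The sketch assumes that items within a batch run concurrently via `asyncio.gather`, which the reference does not state. `process_in_batches` is a hypothetical name.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_in_batches(items: list[T], handler: Callable[[T], Awaitable[R]],
                             batch_size: int = 10) -> list[R]:
    """Await one batch at a time; items inside a batch run concurrently."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: list[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # gather preserves order, so results line up with `items`
        results.extend(await asyncio.gather(*(handler(item) for item in batch)))
    return results


async def _demo() -> None:
    async def shout(text: str) -> str:
        return text.upper()

    print(await process_in_batches(["a", "b", "c"], shout, batch_size=2))


if __name__ == "__main__":
    asyncio.run(_demo())
```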

Pattern Comparison

Pattern	Execution	Cost	Best For
`run()` with fallbacks	Sequential on failure	Low	High availability
`race()`	Parallel, first wins	High	Low latency
`parallel()`	Parallel with limit	Medium	Batch processing
`batched()`	Sequential batches	Low	Large datasets
`consensus()`	Parallel, vote	High	High reliability

Custom Adapters

Adapter Protocol

from collections.abc import AsyncIterator
from typing import Any, Protocol

from l0 import Event

class Adapter(Protocol):
    name: str
    
    def detect(self, stream: Any) -> bool:
        """Return True if this adapter can handle the stream."""
        ...
    
    def wrap(self, stream: Any) -> AsyncIterator[Event]:
        """Wrap raw stream into Event stream."""
        ...

Built-in Adapters

Adapter	Auto-Detected	Description
`OpenAIAdapter`	Yes	OpenAI SDK streams
`LiteLLMAdapter`	Yes	LiteLLM streams (alias for OpenAI)

Creating Custom Adapters

from collections.abc import AsyncIterator
from typing import Any
import l0
from l0 import Event, EventType, Adapters

class AnthropicAdapter:
    """Adapter for direct Anthropic SDK (if not using LiteLLM)."""
    name = "anthropic"
    
    def detect(self, stream: Any) -> bool:
        return "anthropic" in type(stream).__module__
    
    async def wrap(self, stream: Any) -> AsyncIterator[Event]:
        usage = None
        
        async for event in stream:
            event_type = getattr(event, "type", None)
            
            if event_type == "content_block_delta":
                delta = getattr(event, "delta", None)
                if delta and hasattr(delta, "text"):
                    yield Event(type=EventType.TOKEN, text=delta.text)
            
            elif event_type == "content_block_start":
                block = getattr(event, "content_block", None)
                if block and getattr(block, "type", None) == "tool_use":
                    yield Event(
                        type=EventType.TOOL_CALL,
                        data={
                            "id": getattr(block, "id", None),
                            "name": getattr(block, "name", None),
                        }
                    )
            
            elif event_type == "message_delta":
                msg_usage = getattr(event, "usage", None)
                if msg_usage:
                    usage = {
                        "input_tokens": getattr(msg_usage, "input_tokens", 0),
                        "output_tokens": getattr(msg_usage, "output_tokens", 0),
                    }
            
            elif event_type == "message_stop":
                yield Event(type=EventType.COMPLETE, usage=usage)

# Register for auto-detection
Adapters.register(AnthropicAdapter())

Adapter Functions

from l0 import Adapters

# Register custom adapter (takes priority over built-ins)
Adapters.register(MyAdapter())

# Explicitly detect adapter
adapter = Adapters.detect(stream)
print(adapter.name)

# Use specific adapter by name
result = await l0.run(
    stream=my_stream,
    adapter="openai",  # Force OpenAI adapter
)

# Use adapter instance directly
result = await l0.run(
    stream=my_stream,
    adapter=MyCustomAdapter(),
)

Adapter Invariants

Adapters MUST:

Preserve text exactly (no trimming or other modification)
Convert errors to error events (never throw from wrap)
Emit COMPLETE event exactly once at end
Handle empty/null content gracefully

Observability

EventBus

Central event bus for all L0 observability.

from l0 import EventBus, ObservabilityEvent, ObservabilityEventType

def my_handler(event: ObservabilityEvent):
    print(f"[{event.type}] stream={event.stream_id}")
    print(f"  ts={event.ts}ms")
    print(f"  context={event.context}")  # User-provided context
    print(f"  meta={event.meta}")        # Event-specific data

# Create event bus
bus = EventBus(
    handler=my_handler,
    context={"service": "my-app", "request_id": "req-123"},
)

# Access stream ID (UUIDv7)
print(bus.stream_id)

# Emit custom events
bus.emit(ObservabilityEventType.CHECKPOINT_SAVED, checkpoint="...", token_count=100)

Using with run()

result = await l0.run(
    stream=my_stream,
    on_event=lambda e: print(f"[{e.type}] context={e.context} meta={e.meta}"),
    context={"request_id": "req-123", "user_id": "user-456"},
)

ObservabilityEvent

@dataclass
class ObservabilityEvent:
    type: ObservabilityEventType     # Event type
    ts: float                        # Unix epoch MILLISECONDS
    stream_id: str                   # UUIDv7 stream identifier
    context: dict[str, Any]          # User-provided context (request_id, tenant, etc.)
    meta: dict[str, Any]             # Event-specific metadata (attempt, reason, etc.)
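
`ObservabilityEvent` stores `ts` in Unix epoch milliseconds and `stream_id` as a UUIDv7. Python 3.14 adds `uuid.uuid7()` to the standard library. For older versions, the sketch below builds one by hand following the RFC 9562 layout: a 48-bit millisecond timestamp, version and variant bits, and a random remainder. It only illustrates the format and is not how L0 generates its IDs.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Version-7 UUID per RFC 9562: 48-bit Unix ms timestamp, then random bits."""
    unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80        # Timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")   # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)     # Version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)     # Variant field = 0b10 (RFC)
    return uuid.UUID(int=value)


def now_ms() -> float:
    """Unix epoch time in milliseconds, the unit ObservabilityEvent.ts uses."""
    return time.time() * 1000


if __name__ == "__main__":
    print(uuid7(), now_ms())
```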

Event Types

class ObservabilityEventType(str, Enum):
    # Session
    SESSION_START = "SESSION_START"
    SESSION_END = "SESSION_END"
    
    # Stream
    STREAM_INIT = "STREAM_INIT"
    STREAM_READY = "STREAM_READY"
    
    # Retry
    RETRY_START = "RETRY_START"
    RETRY_ATTEMPT = "RETRY_ATTEMPT"
    RETRY_END = "RETRY_END"
    RETRY_GIVE_UP = "RETRY_GIVE_UP"
    
    # Fallback
    FALLBACK_START = "FALLBACK_START"
    FALLBACK_END = "FALLBACK_END"
    
    # Guardrail
    GUARDRAIL_PHASE_START = "GUARDRAIL_PHASE_START"
    GUARDRAIL_RULE_RESULT = "GUARDRAIL_RULE_RESULT"
    GUARDRAIL_PHASE_END = "GUARDRAIL_PHASE_END"
    
    # Drift
    DRIFT_CHECK_RESULT = "DRIFT_CHECK_RESULT"
    
    # Network
    NETWORK_ERROR = "NETWORK_ERROR"
    NETWORK_RECOVERY = "NETWORK_RECOVERY"
    
    # Checkpoint
    CHECKPOINT_SAVED = "CHECKPOINT_SAVED"
    
    # Completion
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"

Error Handling

Error Categories

class ErrorCategory(str, Enum):
    NETWORK = "network"      # Connection drops, DNS, SSL
    TRANSIENT = "transient"  # 429, 503 - temporary
    MODEL = "model"          # Model refused, malformed
    CONTENT = "content"      # Guardrail, drift
    PROVIDER = "provider"    # API errors
    FATAL = "fatal"          # Auth errors (401/403)
    INTERNAL = "internal"    # Bugs, internal errors

categorize_error(error)

from l0.errors import categorize_error
from l0.types import ErrorCategory

try:
    result = await l0.run(stream=my_stream)
except Exception as error:
    category = categorize_error(error)
    
    if category in (ErrorCategory.NETWORK, ErrorCategory.TRANSIENT):
        print("Transient error - would have retried")
    elif category == ErrorCategory.FATAL:
        print("Fatal error - check credentials")
    elif category == ErrorCategory.INTERNAL:
        print("Bug - please report")

Error Category Behavior

Category	Retries	Counts Toward Limit	Example
`NETWORK`	Forever	No	Connection reset
`TRANSIENT`	Forever	No	429 rate limit
`MODEL`	Limited	Yes	Model refused
`CONTENT`	Limited	Yes	Guardrail violation
`PROVIDER`	Depends	Depends	API error
`FATAL`	Never	-	401 unauthorized
`INTERNAL`	Never	-	Bug
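
In the table above, network and transient errors retry without counting toward the limit. That rule can be sketched as a loop with a single counter. This is a synchronous simplification with no backoff, and `retry_by_category` is hypothetical. Treating PROVIDER errors as counted is an assumption, because the table says "Depends".

```python
from collections.abc import Callable

RETRY_FOREVER = {"network", "transient"}  # Retried without spending the budget
NEVER_RETRY = {"fatal", "internal"}       # Raised immediately


def retry_by_category(attempt: Callable[[], str],
                      categorize: Callable[[Exception], str],
                      limit: int = 3) -> str:
    """Call attempt() until it succeeds, following the category table."""
    counted = 0
    while True:
        try:
            return attempt()
        except Exception as error:
            category = categorize(error)
            if category in NEVER_RETRY:
                raise
            if category not in RETRY_FOREVER:
                # model, content, and (by assumption) provider errors spend budget
                counted += 1
                if counted > limit:
                    raise
            # Loop again; a real client would back off before retrying
```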

TimeoutError

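L0 raises its own TimeoutError, which reports which timeout fired and the limit that was exceeded:

import l0
from l0 import TimeoutError  # Shadows the built-in TimeoutError in this module

try:
    result = await l0.run(
        stream=my_stream,
        timeout=l0.Timeout(initial_token=1000),  # Milliseconds (1 second)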
    )
    async for event in result:
        pass
except TimeoutError as e:
    print(e.timeout_type)     # "initial_token" or "inter_token"
    print(e.timeout_seconds)  # The timeout value that was exceeded

State Machine

L0 includes a lightweight state machine for tracking runtime state, which is useful for debugging and monitoring.

RuntimeState

from enum import Enum

# Reference definition; import it with `from l0 import RuntimeState, RuntimeStates`
# and use these constants instead of string literals.
class RuntimeState(str, Enum):
    INIT = "init"                           # Initial setup
    WAITING_FOR_TOKEN = "waiting_for_token" # Waiting for first chunk
    STREAMING = "streaming"                 # Receiving tokens
    TOOL_CALL_DETECTED = "tool_call_detected"  # Tool call in progress
    CONTINUATION_MATCHING = "continuation_matching"  # Buffering for overlap detection
    CHECKPOINT_VERIFYING = "checkpoint_verifying"    # Validating checkpoint
    RETRYING = "retrying"                   # About to retry same stream
    FALLBACK = "fallback"                   # Switching to fallback stream
    FINALIZING = "finalizing"               # Finalizing (final guardrails, etc.)
    COMPLETE = "complete"                   # Success
    ERROR = "error"                         # Failed

# RuntimeStates is an alias for RuntimeState
RuntimeStates = RuntimeState

StateMachine

from l0 import StateMachine, RuntimeState, create_state_machine

# Create a state machine
sm = StateMachine()
# Or use the factory function
sm = create_state_machine()

# Transition to a new state (use constants)
sm.transition(RuntimeState.STREAMING)

# Get current state
sm.get()  # RuntimeState.STREAMING

# Check if in one of multiple states
sm.is_(RuntimeState.STREAMING, RuntimeState.CONTINUATION_MATCHING)  # True

# Alternative method name
sm.is_state(RuntimeState.STREAMING)  # True

# Check if terminal (complete or error)
sm.is_terminal()  # False (True for COMPLETE or ERROR)

# Subscribe to state changes
def on_state_change(state: RuntimeState):
    print(f"State changed to: {state}")

unsubscribe = sm.subscribe(on_state_change)

# Get history for debugging
history = sm.get_history()
# [StateTransition(from_state=INIT, to_state=STREAMING, timestamp=1234567890.123), ...]

# Reset to initial state and clear history
sm.reset()

# Unsubscribe when done
unsubscribe()

StateTransition

@dataclass
class StateTransition:
    from_state: RuntimeState   # Previous state
    to_state: RuntimeState     # New state
    timestamp: float           # Unix timestamp of transition

Scoped API

from l0 import StateMachine

# Create via class method
sm = StateMachine.create()
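
The small state machine below shows how transition history and subscriptions fit together. `TinyStateMachine` and its four-state enum are hypothetical and not L0's `StateMachine`. Ignoring a transition to the current state is a choice made for this sketch.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class State(str, Enum):  # Hypothetical subset of RuntimeState values
    INIT = "init"
    STREAMING = "streaming"
    COMPLETE = "complete"
    ERROR = "error"


@dataclass
class Transition:
    from_state: State
    to_state: State
    timestamp: float  # Unix timestamp in seconds


class TinyStateMachine:
    """Hypothetical minimal machine: current state, history, subscribers."""

    def __init__(self) -> None:
        self.state = State.INIT
        self.history: list[Transition] = []
        self._listeners: list[Callable[[State], None]] = []

    def transition(self, new_state: State) -> None:
        if new_state == self.state:
            return  # Sketch choice: no-op transitions are not recorded
        self.history.append(Transition(self.state, new_state, time.time()))
        self.state = new_state
        for listener in list(self._listeners):  # Copy: listeners may unsubscribe
            listener(new_state)

    def is_terminal(self) -> bool:
        return self.state in (State.COMPLETE, State.ERROR)

    def subscribe(self, listener: Callable[[State], None]) -> Callable[[], None]:
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)  # Call once to unsubscribe
```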

Metrics

Simple counters for runtime metrics. OpenTelemetry is opt-in via a separate adapter.

Metrics Class

from l0 import Metrics, MetricsSnapshot, create_metrics

# Create a new metrics instance
metrics = Metrics()
# Or use the factory function
metrics = create_metrics()

# Available counters (all integers)
metrics.requests          # Total stream requests
metrics.tokens            # Total tokens processed
metrics.retries           # Total retry attempts
metrics.network_retry_count  # Network retries (subset of retries)
metrics.errors            # Total errors encountered
metrics.violations        # Guardrail violations
metrics.drift_detections  # Drift detections
metrics.fallbacks         # Fallback activations
metrics.completions       # Successful completions
metrics.timeouts          # Timeouts (initial + inter-token)

# Increment counters directly
metrics.requests += 1
metrics.tokens += 150
metrics.completions += 1

# Get snapshot (immutable copy)
snapshot: MetricsSnapshot = metrics.snapshot()
print(f"Total tokens: {snapshot.tokens}")
print(f"Success rate: {snapshot.completions / snapshot.requests * 100}%")

# Reset all counters to zero
metrics.reset()

# Serialize to dictionary
data = metrics.to_dict()
# {"requests": 10, "tokens": 1500, "retries": 2, ...}

MetricsSnapshot

@dataclass
class MetricsSnapshot:
    requests: int            # Total stream requests
    tokens: int              # Total tokens processed
    retries: int             # Total retry attempts
    network_retry_count: int # Network retries (subset)
    errors: int              # Total errors
    violations: int          # Guardrail violations
    drift_detections: int    # Drift detections
    fallbacks: int           # Fallback activations
    completions: int         # Successful completions
    timeouts: int            # Timeouts

Global Metrics

from l0 import Metrics, get_global_metrics, reset_global_metrics

# Get the global metrics singleton
global_metrics = Metrics.get_global()
# Or use the legacy function
global_metrics = get_global_metrics()

# Use global metrics
global_metrics.requests += 1

# Reset global metrics
Metrics.reset_global()
# Or use the legacy function
reset_global_metrics()

Scoped API

from l0 import Metrics

# Create via class method
metrics = Metrics.create()

# Access global instance via class method
global_metrics = Metrics.get_global()

# Reset global via class method
Metrics.reset_global()

Async Checks

Non-blocking wrappers for guardrails and drift detection. They use a fast/slow path pattern so that checks do not block the event loop during streaming.

How It Works

Fast path: Delta-only checks or small content run synchronously and return immediately
Slow path: Large content (>10KB) is deferred with loop.call_soon(), so the check runs on a later event-loop iteration instead of inline

This prevents guardrails/drift from causing token delays that could trigger false timeouts.

Usage Pattern

The async check pattern is used internally by L0's runtime but can be leveraged for custom implementations:

import asyncio
from l0 import GuardrailViolation, Guardrails, State

LARGE_CONTENT_THRESHOLD = 10_000  # ~10KB; larger content takes the slow path

async def process_with_async_guardrails(
    content: str, state: State
) -> list[GuardrailViolation]:
    """Example of the fast/slow path guardrail pattern."""
    rules = Guardrails.recommended()

    def run_rules() -> list[GuardrailViolation]:
        # Collect violations from every rule (same result on both paths)
        violations: list[GuardrailViolation] = []
        for rule in rules:
            violations.extend(rule.check(state))
        return violations

    # Fast path: small content is checked inline
    if len(content) < LARGE_CONTENT_THRESHOLD:
        return run_rules()

    # Slow path: defer to a later loop iteration so queued token work runs first
    loop = asyncio.get_running_loop()
    future: asyncio.Future[list[GuardrailViolation]] = loop.create_future()

    def run_check() -> None:
        # Runs on the event loop thread, so the future can be resolved directly
        if future.cancelled():
            return
        try:
            future.set_result(run_rules())
        except Exception as e:
            future.set_exception(e)

    loop.call_soon(run_check)
    return await future

Benefits

No false timeouts: Large guardrail checks don't block token processing
Responsive streaming: token handling already queued on the event loop runs before a deferred check
Automatic optimization: Small checks run inline, large checks are deferred

Formatting Helpers

Utilities for formatting prompts, context, memory, and tool definitions.

Context Formatting

from l0 import Format

# Wrap content with delimiters
context = Format.context(
    "User manual content here",
    label="documentation",
    delimiter="xml",  # "xml" | "markdown" | "brackets" | "none"
)
# Output: <documentation>\nUser manual content here\n</documentation>

# Format multiple contexts
contexts = Format.contexts([
    {"content": "Doc 1", "label": "doc1"},
    {"content": "Doc 2", "label": "doc2"},
])

# Format a document with metadata
content = "Quarterly report text here"
doc = Format.document(content, {"title": "Report", "author": "User"})

# Format instructions
instructions = Format.instructions("You are a helpful assistant")
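Here is one plausible way `Format.context` could wrap content for each `delimiter` value. The output comment above confirms only the `xml` layout. The `markdown` and `brackets` layouts, the default label, and the name `format_context` are hypothetical.

```python
def format_context(content: str, label: str = "context", delimiter: str = "xml") -> str:
    """Hypothetical stand-in for Format.context; only the xml layout is documented."""
    if delimiter == "xml":
        return f"<{label}>\n{content}\n</{label}>"
    if delimiter == "markdown":
        return f"## {label}\n\n{content}"  # assumed layout
    if delimiter == "brackets":
        return f"[{label}]\n{content}\n[/{label}]"  # assumed layout
    if delimiter == "none":
        return content
    raise ValueError(f"unknown delimiter: {delimiter!r}")
```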

Memory Formatting

from l0 import Format

# Format conversation history
memory = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
]

formatted = Format.memory(memory, {"style": "conversational", "max_entries": 10})
# Output:
# User: Hello
# Assistant: Hi there!
# User: How are you?

# Create timestamped memory entries
entry = Format.memory_entry("user", "New message")

# Memory utilities
filtered = Format.filter_memory(memory, "user")  # Only user messages
last_5 = Format.last_n_entries(memory, 5)
size = Format.memory_size(memory)  # Character count
truncated = Format.truncate_memory(memory, max_size=1000)
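The conversational layout and the size helpers above can be approximated with plain functions. This sketch rests on several assumptions:

- `format_memory`, `memory_size`, and `truncate_memory` are hypothetical names.
- Size is counted over message contents only.
- Truncation keeps the newest entries.

```python
from __future__ import annotations

from typing import TypedDict


class Entry(TypedDict):
    role: str
    content: str


def format_memory(entries: list[Entry], max_entries: int | None = None) -> str:
    """Render entries as 'Role: content' lines, keeping only the newest max_entries."""
    if max_entries is not None:
        entries = entries[-max_entries:] if max_entries > 0 else []
    return "\n".join(f"{e['role'].capitalize()}: {e['content']}" for e in entries)


def memory_size(entries: list[Entry]) -> int:
    """Total character count of all message contents."""
    return sum(len(e["content"]) for e in entries)


def truncate_memory(entries: list[Entry], max_size: int) -> list[Entry]:
    """Keep the newest entries whose combined content fits within max_size characters."""
    kept: list[Entry] = []
    total = 0
    for entry in reversed(entries):
        size = len(entry["content"])
        if total + size > max_size:
            break
        kept.append(entry)
        total += size
    kept.reverse()  # restore chronological order
    return kept
```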

Output Formatting

from l0 import Format

# Request JSON output
instruction = Format.json_output({"strict": True, "schema": "..."})

# Request structured output
instruction = Format.structured_output("yaml", {"strict": True})

# Define output constraints
constraints = Format.output_constraints({
    "max_length": 500,
    "format": "bullet_points",
})

# Clean model output
cleaned = Format.clean_output("Sure! Here's the JSON: {...}")  # "{...}"

# Extract JSON from raw model output
model_output = 'Sure! Here is the result: {"status": "ok"}'
json_str = Format.extract_json(model_output)

# Validate the extracted JSON
is_valid, error = Format.validate_json(json_str)
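Here is a rough model of `extract_json` and `validate_json`. It scans for the first balanced `{...}` or `[...]` span, skipping brackets inside string literals, and then tries `json.loads`. The helper names and the `(is_valid, error)` tuple shape mirror the calls above, but this is an illustrative sketch, not L0's implementation.

```python
from __future__ import annotations

import json

OPENERS = {"{": "}", "[": "]"}


def extract_json(text: str) -> str | None:
    """Return the first balanced {...} or [...] span in text, or None if there is none."""
    start: int | None = None
    closers: list[str] = []  # expected closing brackets, innermost last
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if start is None:
            if ch in OPENERS:  # first opening bracket starts the candidate
                start = i
                closers.append(OPENERS[ch])
            continue
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in OPENERS:
            closers.append(OPENERS[ch])
        elif ch in "}]":
            if ch != closers[-1]:
                return None  # mismatched bracket
            closers.pop()
            if not closers:
                return text[start : i + 1]
    return None  # unbalanced or no JSON found


def validate_json(text: str) -> tuple[bool, str | None]:
    """Return (True, None) if text parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return False, str(exc)
    return True, None
```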

Tool Formatting

from l0 import Format

# Create a tool definition
tool = Format.create_tool(
    "search",
    "Search the web for information",
    [
        Format.parameter("query", "string", "Search query", required=True),
        Format.parameter("limit", "integer", "Max results", default=10),
    ],
)

# Format for model
formatted = Format.tool(tool, {"style": "json-schema"})

# Format multiple tools (tool1 and tool2 built with Format.create_tool)
formatted_tools = Format.tools([tool1, tool2])

# Parse a function call from the model's raw text output
fn_call = Format.parse_function_call(model_output)
if fn_call:
    print(f"Function: {fn_call.name}, Args: {fn_call.arguments}")
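To show what a JSON-schema style tool definition can look like, the sketch below builds one in the OpenAI tools shape: `type: function` with a JSON Schema `parameters` object. `ToolParam` and `build_tool_schema` are hypothetical. It is an assumption that `Format.tool(..., {"style": "json-schema"})` emits exactly this structure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToolParam:
    """Hypothetical mirror of a Format.parameter(...) definition."""

    name: str
    type: str
    description: str
    required: bool = False
    default: Any = None


def build_tool_schema(name: str, description: str, params: list[ToolParam]) -> dict[str, Any]:
    """Build an OpenAI-style function tool with a JSON Schema parameters object."""
    properties: dict[str, Any] = {}
    for p in params:
        prop: dict[str, Any] = {"type": p.type, "description": p.description}
        if p.default is not None:
            prop["default"] = p.default  # JSON Schema 'default' annotation
        properties[p.name] = prop
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": [p.name for p in params if p.required],
            },
        },
    }
```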

String Utilities

from l0 import Format

# Basic operations
Format.trim("  hello  ")           # "hello"
Format.truncate("Hello World", 8)  # "Hello..."
Format.truncate_words("Hello World", 8)  # "Hello..."
Format.wrap("Long text...", 80)    # Word-wrapped text
Format.pad("hello", 10, align="center")  # "  hello   "

# Escaping
Format.escape("Hello\nWorld")      # "Hello\\nWorld"
Format.unescape("Hello\\nWorld")   # "Hello\nWorld"
Format.escape_html("<div>")        # "&lt;div&gt;"
Format.unescape_html("&lt;div&gt;") # "<div>"
Format.escape_regex("foo.*bar")    # "foo\\.\\*bar"
Format.sanitize("text\x00here")    # "texthere" (removes control chars)
Format.remove_ansi("\x1b[31mred\x1b[0m")  # "red"
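Several of these helpers have straightforward plain-Python equivalents. The sketch below reproduces the documented outputs for `truncate`, `truncate_words`, `remove_ansi`, and `sanitize`. The function bodies, the `...` suffix default, and the exact control-character set (newline and tab are kept) are assumptions.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI sequences such as \x1b[31m
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # keeps \t, \n, \r


def truncate(text: str, max_len: int, suffix: str = "...") -> str:
    """Cut text to at most max_len characters, including the suffix."""
    if len(text) <= max_len:
        return text
    keep = max(max_len - len(suffix), 0)
    return text[:keep] + suffix


def truncate_words(text: str, max_len: int, suffix: str = "...") -> str:
    """Like truncate, but backs up to a word boundary when the cut lands mid-word."""
    if len(text) <= max_len:
        return text
    cut = text[: max(max_len - len(suffix), 0)]
    if text[len(cut)] != " " and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + suffix


def remove_ansi(text: str) -> str:
    return ANSI_ESCAPE.sub("", text)


def sanitize(text: str) -> str:
    return CONTROL_CHARS.sub("", text)
```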

Stream Utilities

consume_stream(stream)

Consume stream and return full text.

import l0

result = await l0.run(stream=my_stream)
text = await l0.consume_stream(result)
print(text)

get_text(result)

Helper to get text from Stream result.

import l0

result = await l0.run(stream=my_stream)
text = await l0.get_text(result)
print(text)
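Conceptually, `consume_stream` and `get_text` drain the event stream and join the token text. The sketch below runs without a model by using a hypothetical `FakeEvent`, where `is_token` is a plain field rather than L0's property. Skipping non-token events is an assumption.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for l0.Event (is_token is a plain field here)."""

    is_token: bool
    text: str | None = None


async def consume(events: AsyncIterator[FakeEvent]) -> str:
    """Drain the stream and concatenate the text of token events."""
    parts: list[str] = []
    async for event in events:
        if event.is_token and event.text:
            parts.append(event.text)
    return "".join(parts)


async def fake_stream() -> AsyncIterator[FakeEvent]:
    for piece in ["Hel", "lo", "!"]:
        yield FakeEvent(is_token=True, text=piece)
    yield FakeEvent(is_token=False)  # e.g. a completion event with no text


if __name__ == "__main__":
    print(asyncio.run(consume(fake_stream())))  # Hello!
```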

Aborting Streams

result = await l0.run(stream=my_stream)

# Start consuming
async for event in result:
    if should_stop(event):
        result.abort()
        break
    process(event)

# Check if aborted
print(result.state.aborted)  # True when abort() was called above
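The abort flow above relies on an iterator that stops yielding once `abort()` is called. Below is a minimal, hypothetical `AbortableStream` that shows this behavior. Closing the underlying async generator on abort is an assumption about how cleanup might work, not L0's code.

```python
import asyncio
from typing import AsyncIterator


class AbortableStream:
    """Hypothetical wrapper: stops iteration once abort() has been called."""

    def __init__(self, source: AsyncIterator[str]) -> None:
        self._source = source
        self.aborted = False

    def abort(self) -> None:
        self.aborted = True

    def __aiter__(self) -> "AbortableStream":
        return self

    async def __anext__(self) -> str:
        if self.aborted:
            # Release the underlying generator, then end iteration
            aclose = getattr(self._source, "aclose", None)
            if aclose is not None:
                await aclose()
            raise StopAsyncIteration
        return await self._source.__anext__()


async def tokens() -> AsyncIterator[str]:
    for tok in ["a", "b", "STOP", "c"]:
        yield tok


async def demo() -> tuple:
    stream = AbortableStream(tokens())
    seen: list = []
    async for tok in stream:
        if tok == "STOP":  # stands in for should_stop(event)
            stream.abort()
            break
        seen.append(tok)
    leftover = [tok async for tok in stream]  # nothing is yielded after abort
    return seen, leftover, stream.aborted


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (['a', 'b'], [], True)
```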

Utility Functions

JSON Utilities

from l0._utils import auto_correct_json, extract_json_from_markdown

# Fix common JSON errors
fixed = auto_correct_json('{"a": 1,}')  # '{"a": 1}'
fixed = auto_correct_json('{"a": {"b": 1}')  # '{"a": {"b": 1}}'
fixed = auto_correct_json('[1, 2')  # '[1, 2]'

# Extract JSON from a markdown code block
markdown = '```json\n{"key": "value"}\n```'
json_str = extract_json_from_markdown(markdown)
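A small repair pass reproduces the three `auto_correct_json` examples above: a trailing comma, a missing brace, and a missing bracket. This `repair_json` sketch is hypothetical and deliberately narrow. Its trailing-comma regex would also alter a `,}` that appears inside a string literal, and a real implementation would need to avoid that.

```python
import re

TRAILING_COMMA = re.compile(r",\s*([}\]])")


def repair_json(text: str) -> str:
    """Hypothetical repair pass: drop trailing commas, then close open brackets."""
    fixed = TRAILING_COMMA.sub(r"\1", text.strip())

    closers: list = []  # expected closing characters, innermost last
    in_string = False
    escaped = False
    for ch in fixed:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()

    if in_string:
        fixed += '"'  # close an unterminated string first
    return fixed + "".join(reversed(closers))
```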


Debug Logging

import l0

# Enable debug logging
l0.enable_debug()
# Outputs: [l0] DEBUG: Starting L0 stream: ...

Types

WrappedClient

class WrappedClient:
    """Wrapped OpenAI/LiteLLM client with L0 reliability.
    
    Returned by l0.wrap(client). Mirrors the original client API
    but adds automatic reliability features.
    """
    
    chat: WrappedChat                         # chat.completions.create()
    
    @property
    def unwrapped(self) -> Any:
        """Access the underlying unwrapped client."""
        ...
    
    def with_options(
        self,
        *,
        guardrails: list[GuardrailRule] | None = None,
        retry: Retry | None = None,
        timeout: Timeout | None = None,
        continue_from_last_good_token: ContinuationConfig | bool | None = None,
        ...
    ) -> WrappedClient:
        """Create a new wrapped client with updated options."""
        ...

Stream

class Stream:
    """Async iterator result with state and abort attached."""
    
    state: State                              # Runtime state
    abort: Callable[[], None]                 # Abort the stream
    errors: list[Exception]                   # Errors encountered
    
    def __aiter__(self) -> Stream: ...
    async def __anext__(self) -> Event: ...
    async def __aenter__(self) -> Stream: ...
    async def __aexit__(...) -> bool: ...
    async def read(self) -> str:
        """Consume the stream and return the full text content."""
        ...

LazyStream

class LazyStream:
    """Lazy stream wrapper - no await needed on creation.
    
    Like httpx.AsyncClient() or aiohttp.ClientSession(), this returns
    immediately and only does async work when you iterate or read.
    """
    
    state: State                              # Runtime state (after started)
    errors: list[Exception]                   # Errors encountered
    
    def abort(self) -> None: ...
    def __aiter__(self) -> LazyStream: ...
    async def __anext__(self) -> Event: ...
    async def __aenter__(self) -> LazyStream: ...
    async def __aexit__(...) -> bool: ...
    async def read(self) -> str:
        """Consume the stream and return the full text content."""
        ...

State

@dataclass
class State:
    content: str = ""
    checkpoint: str = ""                      # Last known good slice for continuation
    token_count: int = 0
    model_retry_count: int = 0
    network_retry_count: int = 0
    fallback_index: int = 0
    violations: list[GuardrailViolation] = field(default_factory=list)
    drift_detected: bool = False
    completed: bool = False
    aborted: bool = False
    first_token_at: float | None = None
    last_token_at: float | None = None
    duration: float | None = None
    resumed: bool = False                     # Whether stream was resumed from checkpoint
    network_errors: list[Any] = field(default_factory=list)
    
    # Continuation state (for observability)
    resume_point: str | None = None           # The checkpoint content used for resume
    resume_from: int | None = None            # Character offset where resume occurred
    continuation_used: bool = False           # Whether continuation was actually used
    deduplication_applied: bool = False       # Whether deduplication removed overlap
    overlap_removed: str | None = None        # The overlapping text that was removed

Event

@dataclass
class Event:
    type: EventType
    text: str | None = None                   # Token content
    data: dict[str, Any] | None = None        # Tool call / misc data
    error: Exception | None = None            # Error (for error events)
    usage: dict[str, int] | None = None       # Token usage
    timestamp: float | None = None            # Event timestamp

    # Type check properties
    @property
    def is_token(self) -> bool: ...
    @property
    def is_message(self) -> bool: ...
    @property
    def is_data(self) -> bool: ...
    @property
    def is_progress(self) -> bool: ...
    @property
    def is_tool_call(self) -> bool: ...
    @property
    def is_error(self) -> bool: ...
    @property
    def is_complete(self) -> bool: ...

EventType

class EventType(str, Enum):
    TOKEN = "token"
    MESSAGE = "message"
    DATA = "data"
    PROGRESS = "progress"
    TOOL_CALL = "tool_call"
    ERROR = "error"
    COMPLETE = "complete"

Retry

@dataclass
class Retry:
    attempts: int = 3                 # Model errors only
    max_retries: int = 6              # Absolute cap
    base_delay: float = 1.0           # Seconds
    max_delay: float = 10.0           # Seconds
    strategy: BackoffStrategy = BackoffStrategy.FIXED_JITTER
    error_type_delays: ErrorTypeDelays | None = None  # Per-error-type delays
    retry_on: list[RetryableErrorType] | None = None  # Which error types to retry
    should_retry: Callable[..., bool | Coroutine] | None = None  # Veto callback
    calculate_delay: Callable[..., float] | None = None  # Custom delay calculation
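The comments on `attempts` ("Model errors only") and `max_retries` ("Absolute cap") suggest a two-level retry budget. Below is a minimal sketch of that check using a hypothetical helper, `may_retry`. It assumes that network retries do not spend the model-error budget and that both kinds count toward the cap.

```python
def may_retry(
    model_retry_count: int,
    network_retry_count: int,
    is_network_error: bool,
    attempts: int = 3,
    max_retries: int = 6,
) -> bool:
    """Decide whether one more retry is allowed under the two budgets."""
    if model_retry_count + network_retry_count >= max_retries:
        return False  # absolute cap across all error kinds
    if is_network_error:
        return True  # assumed: network retries don't spend the model-error budget
    return model_retry_count < attempts
```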

Timeout

@dataclass
class Timeout:
    initial_token: int = 5000         # Milliseconds to first token
    inter_token: int = 10000          # Milliseconds between tokens
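One way to enforce the two timeouts is to give each `__anext__` call its own deadline: `initial_token` for the first item and `inter_token` for each later item. This sketch uses `asyncio.wait_for` and raises asyncio's `TimeoutError` rather than L0's exported `TimeoutError`. The function names are hypothetical.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def _next_item(source: AsyncIterator[T]) -> T:
    return await source.__anext__()


async def with_token_timeouts(
    source: AsyncIterator[T],
    initial_token_ms: int = 5000,
    inter_token_ms: int = 10000,
) -> AsyncIterator[T]:
    """Yield items from source, enforcing a first-item and a between-items deadline."""
    first = True
    while True:
        limit_ms = initial_token_ms if first else inter_token_ms
        try:
            item = await asyncio.wait_for(_next_item(source), timeout=limit_ms / 1000)
        except StopAsyncIteration:
            return  # source finished normally
        first = False
        yield item
```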

GuardrailRule

@dataclass
class GuardrailRule:
    name: str
    check: Callable[[State], list[GuardrailViolation]]
    description: str | None = None
    streaming: bool = True
    severity: Severity = "error"
    recoverable: bool = True
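A custom rule is a named `check` function that takes the current state and returns a list of violations. So that the example runs without L0 installed, it uses stub dataclasses that mirror a subset of the documented `State`, `GuardrailViolation`, and `GuardrailRule` fields. The `no-placeholder` rule itself is hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Literal

Severity = Literal["warning", "error", "fatal"]


@dataclass
class StateStub:  # subset of l0.State
    content: str = ""


@dataclass
class ViolationStub:  # subset of l0.GuardrailViolation
    rule: str
    message: str
    severity: Severity
    recoverable: bool = True
    position: int | None = None


@dataclass
class RuleStub:  # subset of l0.GuardrailRule
    name: str
    check: Callable[[StateStub], list[ViolationStub]]
    severity: Severity = "error"
    recoverable: bool = True


PLACEHOLDER = re.compile(r"\b(TODO|lorem ipsum)\b", re.IGNORECASE)


def no_placeholder_text(state: StateStub) -> list[ViolationStub]:
    """Flag the first placeholder phrase found in the accumulated content."""
    match = PLACEHOLDER.search(state.content)
    if match is None:
        return []
    return [
        ViolationStub(
            rule="no-placeholder",
            message=f"placeholder text {match.group(0)!r}",
            severity="error",
            position=match.start(),
        )
    ]


rule = RuleStub(name="no-placeholder", check=no_placeholder_text)
```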

GuardrailViolation

@dataclass
class GuardrailViolation:
    rule: str
    message: str
    severity: Severity
    recoverable: bool = True
    position: int | None = None
    timestamp: float | None = None
    context: dict[str, Any] | None = None
    suggestion: str | None = None

Severity

Severity = Literal["warning", "error", "fatal"]

BackoffStrategy

class BackoffStrategy(str, Enum):
    EXPONENTIAL = "exponential"    # delay * 2^attempt
    LINEAR = "linear"              # delay * (attempt + 1)
    FIXED = "fixed"                # constant delay
    FULL_JITTER = "full-jitter"    # random(0, exponential)
    FIXED_JITTER = "fixed-jitter"  # base/2 + random(base/2)
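The strategy comments translate into a small delay function. The sketch makes these assumptions:

- `attempt` is zero-based.
- The defaults `base_delay=1.0` and `max_delay=10.0` are taken from `Retry`.
- `max_delay` caps every strategy.

The strategy strings match the enum values, so a `BackoffStrategy` member (a `str` subclass) compares equal to them. `rng` is injectable so that tests are deterministic.

```python
import random
from typing import Callable


def backoff_delay(
    strategy: str,
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay in seconds before retry number `attempt` (zero-based), capped at max_delay."""
    exponential = base_delay * 2**attempt
    if strategy == "exponential":
        delay = exponential
    elif strategy == "linear":
        delay = base_delay * (attempt + 1)
    elif strategy == "fixed":
        delay = base_delay
    elif strategy == "full-jitter":
        delay = rng() * exponential  # uniform in [0, exponential)
    elif strategy == "fixed-jitter":
        delay = base_delay / 2 + rng() * (base_delay / 2)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return min(delay, max_delay)
```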

ErrorCategory

class ErrorCategory(str, Enum):
    NETWORK = "network"
    TRANSIENT = "transient"
    MODEL = "model"
    CONTENT = "content"
    PROVIDER = "provider"
    FATAL = "fatal"
    INTERNAL = "internal"

RuntimeState

class RuntimeState(str, Enum):
    INIT = "init"
    WAITING_FOR_TOKEN = "waiting_for_token"
    STREAMING = "streaming"
    TOOL_CALL_DETECTED = "tool_call_detected"
    CONTINUATION_MATCHING = "continuation_matching"
    CHECKPOINT_VERIFYING = "checkpoint_verifying"
    RETRYING = "retrying"
    FALLBACK = "fallback"
    FINALIZING = "finalizing"
    COMPLETE = "complete"
    ERROR = "error"

StateTransition

@dataclass
class StateTransition:
    from_state: RuntimeState
    to_state: RuntimeState
    timestamp: float

StateMachine

class StateMachine:
    def transition(self, next_state: RuntimeState) -> None: ...
    def get(self) -> RuntimeState: ...
    def is_(self, *states: RuntimeState) -> bool: ...
    def is_state(self, *states: RuntimeState) -> bool: ...
    def is_terminal(self) -> bool: ...
    def reset(self) -> None: ...
    def get_history(self) -> list[StateTransition]: ...
    def subscribe(self, listener: Callable[[RuntimeState], None]) -> Callable[[], None]: ...
    
    @classmethod
    def create(cls) -> StateMachine: ...
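Below is a compact model of the `StateMachine` interface. Transitions are recorded as `(from, to, timestamp)`, listeners receive the new state, and `subscribe` returns an unsubscribe function. `MiniStateMachine` is hypothetical, and it makes three assumptions:

- The `Phase` enum covers only a subset of `RuntimeState`.
- `COMPLETE` and `ERROR` are treated as terminal.
- `reset()` does not notify listeners.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Phase(str, Enum):  # subset of RuntimeState, redefined for self-containment
    INIT = "init"
    STREAMING = "streaming"
    RETRYING = "retrying"
    COMPLETE = "complete"
    ERROR = "error"


TERMINAL = {Phase.COMPLETE, Phase.ERROR}  # assumed terminal states


@dataclass
class Transition:
    from_state: Phase
    to_state: Phase
    timestamp: float


class MiniStateMachine:
    def __init__(self) -> None:
        self._state = Phase.INIT
        self._history: list[Transition] = []
        self._listeners: list[Callable[[Phase], None]] = []

    def transition(self, next_state: Phase) -> None:
        self._history.append(Transition(self._state, next_state, time.time()))
        self._state = next_state
        for listener in list(self._listeners):  # copy: listeners may unsubscribe
            listener(next_state)

    def get(self) -> Phase:
        return self._state

    def is_state(self, *states: Phase) -> bool:
        return self._state in states

    def is_terminal(self) -> bool:
        return self._state in TERMINAL

    def reset(self) -> None:
        self._state = Phase.INIT
        self._history.clear()

    def get_history(self) -> list[Transition]:
        return list(self._history)

    def subscribe(self, listener: Callable[[Phase], None]) -> Callable[[], None]:
        self._listeners.append(listener)

        def unsubscribe() -> None:
            if listener in self._listeners:
                self._listeners.remove(listener)

        return unsubscribe
```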

MetricsSnapshot

@dataclass
class MetricsSnapshot:
    requests: int
    tokens: int
    retries: int
    network_retry_count: int
    errors: int
    violations: int
    drift_detections: int
    fallbacks: int
    completions: int
    timeouts: int

Metrics

class Metrics:
    requests: int
    tokens: int
    retries: int
    network_retry_count: int
    errors: int
    violations: int
    drift_detections: int
    fallbacks: int
    completions: int
    timeouts: int
    
    def reset(self) -> None: ...
    def snapshot(self) -> MetricsSnapshot: ...
    def to_dict(self) -> dict[str, int]: ...
    
    @classmethod
    def create(cls) -> Metrics: ...
    @classmethod
    def get_global(cls) -> Metrics: ...
    @classmethod
    def reset_global(cls) -> None: ...

RetryableErrorType

class RetryableErrorType(str, Enum):
    ZERO_OUTPUT = "zero_output"
    GUARDRAIL_VIOLATION = "guardrail_violation"
    DRIFT = "drift"
    INCOMPLETE = "incomplete"
    NETWORK_ERROR = "network_error"
    TIMEOUT = "timeout"
    RATE_LIMIT = "rate_limit"
    SERVER_ERROR = "server_error"

ErrorTypeDelays

@dataclass
class ErrorTypeDelays:
    connection_dropped: float | None = None
    fetch_error: float | None = None
    econnreset: float | None = None
    econnrefused: float | None = None
    sse_aborted: float | None = None
    no_bytes: float | None = None
    partial_chunks: float | None = None
    runtime_killed: float | None = None
    background_throttle: float | None = None
    dns_error: float | None = None
    ssl_error: float | None = None
    timeout: float | None = None
    unknown: float | None = None
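Below is one way a per-error-type delay might be resolved: use the specific field when it is set, then fall back to `unknown`, then to the base delay. The fallback order, the trimmed `Delays` stub, and `delay_for` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Delays:
    """Trimmed stand-in for ErrorTypeDelays (three of its fields)."""

    connection_dropped: float | None = None
    timeout: float | None = None
    unknown: float | None = None


def delay_for(error_type: str, delays: Delays, base_delay: float = 1.0) -> float:
    """Specific override, else the `unknown` override, else base_delay (assumed order)."""
    names = {f.name for f in fields(delays)}
    value = getattr(delays, error_type) if error_type in names else None
    if value is None:
        value = delays.unknown
    return base_delay if value is None else value
```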

Imports

Main Import (Recommended)

import l0

# Simple wrapping (no await needed!)
result = l0.wrap(stream, guardrails=l0.Guardrails.recommended())
text = await result.read()

# Or with retries/fallbacks
result = await l0.run(
    stream=lambda: create_stream(),
    fallbacks=[lambda: backup_stream()],
    guardrails=l0.Guardrails.recommended(),
)

async for event in result:
    if event.is_token:
        print(event.text, end="")

Direct Imports

from l0 import (
    # Core
    wrap,
    run,
    l0,  # Alias to run()
    Stream,
    LazyStream,
    WrappedClient,
    State,
    Event,
    EventType,
    
    # Retry & Timeout
    Retry,
    Timeout,
    TimeoutError,
    BackoffStrategy,
    RetryableErrorType,
    ErrorTypeDelays,
    RETRY_DEFAULTS,
    ERROR_TYPE_DELAY_DEFAULTS,
    MINIMAL_RETRY,
    RECOMMENDED_RETRY,
    STRICT_RETRY,
    EXPONENTIAL_RETRY,
    
    # Guardrails (scoped API - use Guardrails.json(), Guardrails.pattern(), etc.)
    Guardrails,  # Class with .recommended(), .strict(), .json(), .pattern(), etc.
    GuardrailRule,
    GuardrailViolation,
    Violation,  # Alias for GuardrailViolation
    JsonAnalysis,
    MarkdownAnalysis,
    LatexAnalysis,
    
    # Structured output
    structured,
    structured_stream,
    structured_object,
    structured_array,
    StructuredResult,
    StructuredStreamResult,
    StructuredConfig,
    StructuredState,
    StructuredTelemetry,
    AutoCorrectInfo,
    MINIMAL_STRUCTURED,
    RECOMMENDED_STRUCTURED,
    STRICT_STRUCTURED,
    
    # Parallel operations
    parallel,
    race,
    sequential,
    batched,
    Parallel,
    ParallelResult,
    ParallelOptions,
    RaceResult,
    AggregatedTelemetry,
    
    # Consensus
    Consensus,
    consensus,
    ConsensusResult,
    ConsensusOutput,
    ConsensusAnalysis,
    ConsensusPreset,
    Agreement,
    Disagreement,
    DisagreementValue,
    FieldAgreement,
    FieldConsensus,
    FieldConsensusInfo,
    
    # Pipeline
    pipe,
    Pipeline,
    PipelineStep,
    PipelineOptions,
    PipelineResult,
    StepContext,
    StepResult,
    create_pipeline,
    create_step,
    chain_pipelines,
    parallel_pipelines,
    create_branch_step,
    FAST_PIPELINE,
    RELIABLE_PIPELINE,
    PRODUCTION_PIPELINE,
    
    # Pool
    OperationPool,
    PoolOptions,
    PoolStats,
    create_pool,
    
    # Adapters
    Adapters,
    Adapter,
    AdaptedEvent,
    OpenAIAdapter,
    OpenAIAdapterOptions,
    LiteLLMAdapter,
    
    # Observability
    EventBus,
    ObservabilityEvent,
    ObservabilityEventType,
    
    # Errors
    Error,
    ErrorCode,
    ErrorContext,
    ErrorCategory,
    FailureType,
    RecoveryStrategy,
    RecoveryPolicy,
    NetworkError,
    NetworkErrorType,
    NetworkErrorAnalysis,
    
    # Window
    Window,
    DocumentWindow,
    DocumentChunk,
    WindowConfig,
    WindowStats,
    ChunkProcessConfig,
    ChunkResult,
    ChunkingStrategy,
    ProcessingStats,
    ContextRestorationOptions,
    ContextRestorationStrategy,
    
    # Continuation
    Continuation,
    ContinuationConfig,
    DeduplicationOptions,
    OverlapResult,
    
    # Drift
    Drift,
    DriftDetector,
    DriftConfig,
    DriftResult,
    
    # State Machine
    StateMachine,
    RuntimeState,
    RuntimeStates,
    StateTransition,
    create_state_machine,
    
    # Metrics
    Metrics,
    MetricsSnapshot,
    create_metrics,
    get_global_metrics,
    reset_global_metrics,
    
    # Event Sourcing
    EventSourcing,
    EventStore,
    EventStoreWithSnapshots,
    InMemoryEventStore,
    EventRecorder,
    EventReplayer,
    EventEnvelope,
    RecordedEvent,
    RecordedEventType,
    Snapshot,
    SerializedError,
    ReplayResult,
    ReplayCallbacks,
    ReplayedState,
    ReplayComparison,
    StreamMetadata,
    
    # Monitoring
    Monitoring,
    OpenTelemetry,
    OpenTelemetryConfig,
    Sentry,
    SentryConfig,
    SemanticAttributes,
    
    # Formatting
    Format,
    
    # Text
    Text,
    NormalizeOptions,
    WhitespaceOptions,
    
    # Comparison
    Compare,
    Difference,
    DifferenceSeverity,
    DifferenceType,
    StringComparisonOptions,
    ObjectComparisonOptions,
    
    # JSON
    JSON,
    AutoCorrectResult,
    CorrectionType,
    
    # JSON Schema
    JSONSchema,
    JSONSchemaAdapter,
    JSONSchemaDefinition,
    JSONSchemaValidationError,
    UnifiedSchema,
    
    # Multimodal
    Multimodal,
    ContentType,
    DataPayload,
    Progress,
    
    # Utilities
    consume_stream,
    get_text,
    enable_debug,
    
    # Version
    __version__,
)

Public Exports

Category	Exports
Core	`wrap`, `run`, `l0` (alias), `Stream`, `LazyStream`, `WrappedClient`, `State`, `Event`, `EventType`
Retry & Timeout	`Retry`, `Timeout`, `TimeoutError`, `BackoffStrategy`, `RetryableErrorType`, `ErrorTypeDelays`, `RETRY_DEFAULTS`, `ERROR_TYPE_DELAY_DEFAULTS`, `MINIMAL_RETRY`, `RECOMMENDED_RETRY`, `STRICT_RETRY`, `EXPONENTIAL_RETRY`
Continuation	`Continuation`, `ContinuationConfig`, `DeduplicationOptions`, `OverlapResult`
Errors	`Error`, `ErrorCode`, `ErrorContext`, `ErrorCategory`, `FailureType`, `RecoveryStrategy`, `RecoveryPolicy`, `NetworkError`, `NetworkErrorType`, `NetworkErrorAnalysis`
Guardrails	`Guardrails`, `GuardrailRule`, `GuardrailViolation`, `Violation`, `JsonAnalysis`, `MarkdownAnalysis`, `LatexAnalysis`
Structured	`structured`, `structured_stream`, `structured_object`, `structured_array`, `StructuredResult`, `StructuredStreamResult`, `StructuredConfig`, `StructuredState`, `StructuredTelemetry`, `AutoCorrectInfo`, `MINIMAL_STRUCTURED`, `RECOMMENDED_STRUCTURED`, `STRICT_STRUCTURED`
Parallel	`parallel`, `race`, `sequential`, `batched`, `Parallel`, `ParallelResult`, `ParallelOptions`, `RaceResult`, `AggregatedTelemetry`
Pipeline	`pipe`, `Pipeline`, `PipelineStep`, `PipelineOptions`, `PipelineResult`, `StepContext`, `StepResult`, `create_pipeline`, `create_step`, `chain_pipelines`, `parallel_pipelines`, `create_branch_step`, `FAST_PIPELINE`, `RELIABLE_PIPELINE`, `PRODUCTION_PIPELINE`
Pool	`OperationPool`, `PoolOptions`, `PoolStats`, `create_pool`
Consensus	`Consensus`, `consensus`, `ConsensusResult`, `ConsensusOutput`, `ConsensusAnalysis`, `ConsensusPreset`, `Agreement`, `Disagreement`, `DisagreementValue`, `FieldAgreement`, `FieldConsensus`, `FieldConsensusInfo`
Adapters	`Adapters`, `Adapter`, `AdaptedEvent`, `OpenAIAdapter`, `OpenAIAdapterOptions`, `LiteLLMAdapter`
Observability	`EventBus`, `ObservabilityEvent`, `ObservabilityEventType`
Window	`Window`, `DocumentWindow`, `DocumentChunk`, `WindowConfig`, `WindowStats`, `ChunkProcessConfig`, `ChunkResult`, `ChunkingStrategy`, `ProcessingStats`, `ContextRestorationOptions`, `ContextRestorationStrategy`
Drift	`Drift`, `DriftDetector`, `DriftConfig`, `DriftResult`
State Machine	`StateMachine`, `RuntimeState`, `RuntimeStates`, `StateTransition`, `create_state_machine`
Metrics	`Metrics`, `MetricsSnapshot`, `create_metrics`, `get_global_metrics`, `reset_global_metrics`
Event Sourcing	`EventSourcing`, `EventStore`, `EventStoreWithSnapshots`, `InMemoryEventStore`, `EventRecorder`, `EventReplayer`, `EventEnvelope`, `RecordedEvent`, `RecordedEventType`, `Snapshot`, `SerializedError`, `ReplayResult`, `ReplayCallbacks`, `ReplayedState`, `ReplayComparison`, `StreamMetadata`
Monitoring	`Monitoring`, `OpenTelemetry`, `OpenTelemetryConfig`, `Sentry`, `SentryConfig`, `SemanticAttributes`
Formatting	`Format`
Text	`Text`, `NormalizeOptions`, `WhitespaceOptions`
Comparison	`Compare`, `Difference`, `DifferenceSeverity`, `DifferenceType`, `StringComparisonOptions`, `ObjectComparisonOptions`
JSON	`JSON`, `AutoCorrectResult`, `CorrectionType`
JSON Schema	`JSONSchema`, `JSONSchemaAdapter`, `JSONSchemaDefinition`, `JSONSchemaValidationError`, `UnifiedSchema`
Multimodal	`Multimodal`, `ContentType`, `DataPayload`, `Progress`
Utilities	`consume_stream`, `get_text`, `enable_debug`
Version	`__version__`

