ValueError in KafkaEvent.decoded_headers when processing non-ASCII characters in headers #6862

Closed
@ipt-jbu

Description

Expected Behaviour

record.decoded_headers should be able to correctly process header values that are represented as an array of signed bytes, without raising a ValueError. The signed bytes should be correctly converted to their unsigned representation before being decoded.

For a header {"test-header": "hello-world-ë"}, which AWS might deliver as {'test-header': [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]}, the decoded_headers property should return a dictionary like:

{'test-header': 'hello-world-ë'}
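
For context, that signed list is just the UTF-8 encoding of the header string with every byte above 127 shifted into the negative range. The snippet below is illustrative plain Python, not Powertools code:

text = "hello-world-ë"

# UTF-8 encoding of the header value (unsigned bytes, 0-255)
raw = text.encode("utf-8")

# Reinterpret each byte as a signed 8-bit integer, which is how the value
# shows up in the Lambda event payload
signed = [b - 256 if b > 127 else b for b in raw]

print(signed)  # [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]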

Current Behaviour

When the Lambda function is invoked with a Kafka record containing a non-ASCII character in a header, the code fails with a ValueError. This happens because the header value is passed by AWS as a list of signed bytes (e.g., [-61, -85] for ë), and the bytes() function in Powertools cannot process the negative numbers.

See the full traceback in the "Debugging logs" section.
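
The failure is easy to reproduce outside Lambda, because bytes() rejects any value outside range(0, 256). Illustrative snippet, not Powertools code:

# bytes() only accepts integers in range(0, 256); the signed values coming
# from the event payload therefore make it raise immediately
try:
    bytes([-61, -85])
except ValueError as exc:
    print(exc)  # bytes must be in range(0, 256)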

Code snippet

from aws_lambda_powertools.utilities.data_classes import KafkaEvent, event_source
from aws_lambda_powertools.utilities.typing import LambdaContext

@event_source(data_class=KafkaEvent)
def lambda_handler(event: KafkaEvent, context: LambdaContext):
    # This is the event structure we receive from AWS for a header like
    # "test-header": "hello-world-ë"
    # The header value is serialized as a list of signed bytes:
    # [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]

    for record in event.records:
        try:
            # The following line will raise the ValueError
            decoded_headers = record.decoded_headers
            print(f"Decoded headers: {decoded_headers}")
        except ValueError as e:
            print(f"Error decoding headers: {e}")
            # Log the raw header value that causes the issue
            for header in record.headers:
                if 'test-header' in header:
                     print(f"Raw header value: {header['test-header']}")

Possible Solution

A potential fix could be to convert the signed bytes to unsigned bytes before calling bytes().

For example:

# Example of converting signed to unsigned bytes for a value like [-61, -85]
signed_bytes = [-61, -85]
unsigned_bytes = [b & 255 for b in signed_bytes] # Results in [195, 171]
bytes(unsigned_bytes).decode('utf-8') # Correctly decodes to 'ë'
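
Applied to Powertools, this could translate into masking each value before the existing bytes() call. The helper below is only a sketch with made-up names, not the actual kafka_event.py code:

def decode_header_values(headers: list[dict[str, list[int]]]) -> dict[str, bytes]:
    # Illustrative helper: build the headers dict while tolerating signed
    # byte values such as -61 or -85 in the raw event payload
    return {
        key: bytes(v & 0xFF for v in values)  # mask to 0..255 before bytes()
        for chunk in headers
        for key, values in chunk.items()
    }


# Example with the failing header from this issue
raw_headers = [{"test-header": [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]}]
print(decode_header_values(raw_headers)["test-header"].decode("utf-8"))  # hello-world-ë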

Steps to Reproduce

  1. Create a Lambda function with a Kafka trigger.
  2. Use the Powertools @event_source(data_class=KafkaEvent) decorator on the handler.
  3. Type-hint the event parameter as KafkaEvent: def lambda_handler(event: KafkaEvent, context: LambdaContext):.
  4. Send a Kafka message with a header containing a non-ASCII character (e.g., test-header: hello-world-ë).
  5. Inside the handler, try to access the decoded headers of a record: headers = event.records[0].decoded_headers (a trimmed local test event is sketched below).
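
For local testing without a real Kafka trigger, a trimmed event along the following lines reproduces the failure. The structure follows the documented MSK event shape; the ARN, broker, topic, key, and value are placeholders:

# Minimal test event for invoking the handler locally. ARN, broker, topic,
# key, and value are placeholders; the header carries the failing signed bytes.
test_event = {
    "eventSource": "aws:kafka",
    "eventSourceArn": "arn:aws:kafka:eu-central-1:123456789012:cluster/example/placeholder",
    "bootstrapServers": "b-1.example.kafka.eu-central-1.amazonaws.com:9092",
    "records": {
        "mytopic-0": [
            {
                "topic": "mytopic",
                "partition": 0,
                "offset": 15,
                "timestamp": 1545084650987,
                "timestampType": "CREATE_TIME",
                "key": "cmVjb3JkLWtleQ==",    # base64 "record-key"
                "value": "cmVjb3JkLXZhbHVl",  # base64 "record-value"
                "headers": [
                    {"test-header": [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]}
                ],
            }
        ]
    },
}

# Without the fix, this prints the ValueError caught inside the handler
lambda_handler(test_event, None)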

Powertools for AWS Lambda (Python) version

3.13.0

AWS Lambda function runtime

3.12

Packaging format used

PyPI

Debugging logs

ValueError: bytes must be in range(0, 256)
    at kafka_event.py:83

Metadata

Labels

bug (Something isn't working), pending-release (Fix or implementation already in dev, waiting to be released)
