Releases: dapr/dapr
Dapr Runtime v1.16.5
Dapr 1.16.5
This update includes bug fixes:
- Trace information not populated in pubsub component using gRPC as transport
- Allow for OIDC clientSecret to be rotated when token is refreshed
Trace information not populated in pubsub component using gRPC as transport
Problem
The pubsub component did not correctly propagate tracing information when delivering messages over gRPC.
Impact
Distributed traces were incomplete or missing links between publishers and subscribers. This prevented users from reliably correlating pubsub messages with their originating requests and spans.
Root Cause
The gRPC metadata used for pubsub calls did not include the tracing headers expected by downstream services and OpenTelemetry tooling. In particular, the trace context was not consistently attached to outgoing gRPC calls.
Solution
The trace context is now explicitly added to the outgoing gRPC metadata for pubsub calls. This ensures that downstream services receive the necessary tracing information and that spans can be correctly correlated across pubsub message flows.
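As an illustration of the approach, here is a minimal sketch of injecting the current trace context into outgoing gRPC metadata using the OpenTelemetry Go SDK. The carrier type and helper name are illustrative, not the actual Dapr implementation.

```go
package pubsubtrace

import (
	"context"

	"go.opentelemetry.io/otel"
	"google.golang.org/grpc/metadata"
)

// grpcMetadataCarrier adapts gRPC metadata.MD to the OpenTelemetry
// TextMapCarrier interface so a propagator can write trace headers into it.
type grpcMetadataCarrier metadata.MD

func (c grpcMetadataCarrier) Get(key string) string {
	vals := metadata.MD(c).Get(key)
	if len(vals) == 0 {
		return ""
	}
	return vals[0]
}

func (c grpcMetadataCarrier) Set(key, value string) {
	metadata.MD(c).Set(key, value)
}

func (c grpcMetadataCarrier) Keys() []string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	return keys
}

// withTraceMetadata attaches the current span context (traceparent/tracestate)
// to the outgoing gRPC metadata, so the subscriber side can link its spans
// back to the publisher.
func withTraceMetadata(ctx context.Context) context.Context {
	md, ok := metadata.FromOutgoingContext(ctx)
	if ok {
		md = md.Copy()
	} else {
		md = metadata.MD{}
	}
	otel.GetTextMapPropagator().Inject(ctx, grpcMetadataCarrier(md))
	return metadata.NewOutgoingContext(ctx, md)
}
```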
Allow for OIDC clientSecret to be rotated when token is refreshed in the Pulsar PubSub component
Problem
The Pulsar OAuth2 client in the Go SDK only loads the client secret once at startup, and the Dapr Pulsar component only supports providing the clientSecret as a static value.
This combination prevents rotating the OAuth2 client secret via a file path and breaks authentication when the clientSecret is changed.
Impact
Environments with strict security policies that require periodic rotation of the Pulsar OAuth2 client secret cannot safely rotate secrets.
Once the clientSecret file is updated, token refresh operations may fail because the running client continues using the old secret, leading to authentication errors and potential message flow interruption.
Root Cause
The Dapr Pulsar component exposes clientSecret only as a literal value in metadata, not as a file path, so it cannot take advantage of secret rotation mechanisms based on files.
Solution
The Dapr Pulsar component adds support for specifying clientSecret (privateKey) via a file path in its metadata.
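A minimal sketch of the idea, assuming a file-backed secret: the client secret is re-read from disk whenever a token refresh needs it, so rotating the mounted file takes effect without restarting the sidecar. Type and method names are hypothetical.

```go
package pulsarauth

import (
	"os"
	"strings"
	"sync"
)

// fileClientSecret re-reads the OAuth2 client secret from disk each time it
// is requested, so rotating the file (e.g. a mounted Kubernetes Secret) takes
// effect on the next token refresh without restarting the sidecar.
type fileClientSecret struct {
	mu   sync.Mutex
	path string
}

func newFileClientSecret(path string) *fileClientSecret {
	return &fileClientSecret{path: path}
}

// Secret returns the current contents of the secret file.
func (f *fileClientSecret) Secret() (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	b, err := os.ReadFile(f.path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}
```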
Dapr Runtime v1.16.4
Dapr 1.16.4
This update includes bug fixes:
- Workflow logging loop when no pending task completed
- Deleted Jobs in all prefix matching deleted Namespaces
Workflow logging loop when no pending task completed
Problem
When daprd becomes unhealthy during a workflow execution, activity tasks that complete while no pending tasks remain cause daprd to log in a loop.
Impact
Daprd continually prints log messages indicating that an activity task result was completed even though there are no pending tasks.
Root Cause
Daprd holds a streaming connection to Schedulers which handles job execution for the Jobs API, Actor Reminders, and workflow execution.
Each stream established has a single set of types which the client supports.
When the app reports as unhealthy, the stream to Schedulers needs to be re-established because daprd no longer supports the Jobs API and Actor Reminders while the app is unhealthy.
This restarts the workflow runtime, which clears all pending activity tasks.
Task completion results from the previous execution are then received with no pending tasks, causing an internal error.
This error is intentionally retried indefinitely, resulting in a logging loop.
Solution
The error occurring from no pending tasks is now typed as a non-retryable error, preventing the logging loop.
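A minimal sketch of the pattern, with illustrative names: the "no pending task" condition is wrapped in a non-retryable error type, and the retry loop gives up when it sees one instead of retrying (and logging) forever.

```go
package workflowerr

import (
	"errors"
	"fmt"
	"time"
)

// errNoPendingTask marks a completion result that arrives when no matching
// pending task exists; retrying cannot succeed, so it is typed non-retryable.
var errNoPendingTask = errors.New("no pending task for completed activity result")

type nonRetryableError struct{ err error }

func (e *nonRetryableError) Error() string { return e.err.Error() }
func (e *nonRetryableError) Unwrap() error { return e.err }

func asNonRetryable(err error) error { return &nonRetryableError{err: err} }

func isNonRetryable(err error) bool {
	var nr *nonRetryableError
	return errors.As(err, &nr)
}

// retry keeps calling fn until it succeeds or returns a non-retryable error,
// so a permanent condition does not produce an endless logging loop.
func retry(fn func() error) error {
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if isNonRetryable(err) {
			return fmt.Errorf("giving up: %w", err)
		}
		time.Sleep(time.Second)
	}
}
```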
Deleted Jobs in all prefix matching deleted Namespaces
Problem
Deleting a namespace in Kubernetes will delete all the associated jobs in that namespace.
If there are any other namespaces with a name which has a prefix matching the deleted namespace, the jobs in those namespaces will also be deleted (i.e. deleting namespace "test" will also delete jobs in namespace "test-1" or "test-abc").
Impact
Deleting a namespace will delete jobs in other namespaces with prefix matching the deleted namespace.
Root Cause
The prefix-matching logic did not terminate the namespace prefix with a delimiter to force an exact namespace match, so deleting a namespace also matched, and deleted, jobs in any namespace whose name started with the deleted namespace's name.
Solution
The prefix logic has been updated to ensure that only jobs in the exact deleted namespace are deleted.
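For illustration, a sketch of the fix under an assumed key layout (namespace, app ID, and job name joined by a separator): terminating the prefix with the separator makes the namespace segment an exact match, so deleting "test" no longer matches "test-1" or "test-abc".

```go
package jobstore

import "strings"

// keySeparator joins the segments of a job key, e.g. "<ns>||<app>||<job>".
// The layout is illustrative, not the actual Scheduler key format.
const keySeparator = "||"

// jobKeyPrefix terminates the namespace with the separator so that only keys
// in exactly this namespace match.
func jobKeyPrefix(namespace string) string {
	return namespace + keySeparator
}

func jobsToDelete(allKeys []string, namespace string) []string {
	prefix := jobKeyPrefix(namespace)
	var out []string
	for _, k := range allKeys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}
```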
Dapr Runtime v1.16.3
Dapr 1.16.3
This update includes bug fixes:
SFTP binding not handling reconnections
Problem
The SFTP binding, introduced in v1.15.0, did not correctly handle reconnections.
If the SFTP connection was closed externally (outside the Dapr sidecar), the sidecar would not attempt to reconnect.
Impact
In scenarios where the SFTP server or network closed the connection, the Dapr sidecar lost connectivity permanently and required a restart to restore SFTP communication.
Root Cause
The SFTP binding maintained a single long-lived connection and did not attempt to recreate it when operations failed due to network or server-side disconnects.
Once the underlying SFTP/SSH session was closed, subsequent binding operations continued to use the stale connection instead of establishing a new one, leaving the binding in a permanently broken state until the sidecar was restarted.
Solution
A new reconnection mechanism was added to the SFTP binding (PR).
When an SFTP action fails due to a connection issue, the binding now attempts to reconnect to the server and restore connectivity automatically, avoiding the need to restart the sidecar.
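A minimal sketch of such a reconnect-and-retry wrapper, assuming the github.com/pkg/sftp and golang.org/x/crypto/ssh packages; the struct and method names are illustrative rather than the binding's actual code.

```go
package sftpbinding

import (
	"fmt"
	"sync"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// client wraps an SFTP session and re-establishes it when an operation fails,
// so an externally closed connection does not leave the binding permanently
// broken.
type client struct {
	mu   sync.Mutex
	addr string
	cfg  *ssh.ClientConfig
	ssh  *ssh.Client
	sftp *sftp.Client
}

func (c *client) connect() error {
	sshConn, err := ssh.Dial("tcp", c.addr, c.cfg)
	if err != nil {
		return fmt.Errorf("ssh dial: %w", err)
	}
	sftpConn, err := sftp.NewClient(sshConn)
	if err != nil {
		sshConn.Close()
		return fmt.Errorf("sftp session: %w", err)
	}
	c.ssh, c.sftp = sshConn, sftpConn
	return nil
}

// do runs an SFTP operation and, if it fails, reconnects once and retries,
// instead of keeping the stale session around.
func (c *client) do(op func(*sftp.Client) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.sftp == nil {
		if err := c.connect(); err != nil {
			return err
		}
	}
	if err := op(c.sftp); err != nil {
		// Drop the (possibly dead) session and retry once on a fresh one.
		c.sftp.Close()
		c.ssh.Close()
		if rerr := c.connect(); rerr != nil {
			return rerr
		}
		return op(c.sftp)
	}
	return nil
}
```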
Dapr Runtime v1.16.2
Dapr 1.16.2
This update includes bug fixes:
- HTTP API default CORS behavior
- Scheduler External etcd with multiple client endpoints
- Placement not cleaning internal state after host that had actors disconnects
- Blocked Placement dissemination during high churn
- Blocked Placement dissemination with high Scheduler dataset
- Fix panic during actor deactivation
- OpenTelemetry environment variables support
- Fixing goavro bug due to codec state mutation
- APP_API_TOKEN not passed in gRPC metadata for app callbacks
- Fixed Pulsar OAuth token renewal
- Fix Scheduler connection during non-graceful network interruptions
- Prevent infinite loop when workflow state is corrupted or destroyed
HTTP API default CORS behavior
Problem
The 1.16.0 release introduced a change to the default CORS behavior of the Dapr HTTP API: CORS headers were added to all HTTP responses by default, and this new behavior could not be disabled.
Impact
This caused problems in scenarios where CORS is handled outside of the Dapr sidecar, because the Dapr Sidecar always added CORS headers.
Solution
Part of the behavior introduced in that PR was reverted: the default value of the allowed-origins flag is now an empty string, which disables the CORS filter by default.
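A simplified sketch of the resulting behavior, not the Dapr implementation itself: the CORS filter is only applied when allowed-origins is non-empty, so with the default empty value requests pass through untouched and CORS can be handled outside the sidecar.

```go
package httpserver

import "net/http"

// withCORS wraps the handler with a CORS filter only when allowedOrigins is
// non-empty. With the default empty value the request passes through
// untouched, leaving CORS handling to whatever sits in front of the sidecar.
func withCORS(next http.Handler, allowedOrigins string) http.Handler {
	if allowedOrigins == "" {
		return next // CORS filter disabled by default
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", allowedOrigins)
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```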
Scheduler External etcd with multiple client endpoints
Problem
Using Scheduler in non-embed mode with multiple etcd client endpoints was not working.
Impact
It was not possible to use multiple etcd endpoints for high availability with an external etcd database for scheduler.
Root Cause
The Scheduler etcd client endpoints CLI flag was typed as a string array, rather than a string slice, causing the given value to be parsed as a single string rather than a slice of strings.
Solution
Changed the type of the etcd client endpoints CLI flag to be a string slice.
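The difference can be seen with spf13/pflag, which Dapr-style CLI flags are built on: a string array keeps each flag occurrence as one literal element, while a string slice splits comma-separated values. The flag names below are illustrative.

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	fs := pflag.NewFlagSet("scheduler", pflag.ContinueOnError)
	asArray := fs.StringArray("endpoints-array", nil, "typed as a string array")
	asSlice := fs.StringSlice("endpoints-slice", nil, "typed as a string slice")

	args := []string{
		"--endpoints-array=http://etcd-0:2379,http://etcd-1:2379",
		"--endpoints-slice=http://etcd-0:2379,http://etcd-1:2379",
	}
	if err := fs.Parse(args); err != nil {
		panic(err)
	}

	fmt.Println(len(*asArray)) // 1: the whole value remains a single endpoint string
	fmt.Println(len(*asSlice)) // 2: parsed as two separate endpoints
}
```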
Placement not cleaning internal state after host that had actors disconnects
Problem
An actor host that hosted actors was not properly cleaned up from placement after the sidecar was scaled down and the placement stream was closed.
Impact
This results in the placement server iterating over namespaces that no longer exist for every tick of the disseminate ticker.
Root Cause
The function requiresUpdateInPlacementTables should not set isActorHost back to false once it has been set to true, because once a host has actors the placement server keeps internal state for it and cleanup logic must run when the host disconnects.
Solution
Update the logic in requiresUpdateInPlacementTables.
Blocked Placement dissemination during high churn
Problem
Placement would disseminate the actor table very slowly, or fail to disseminate it at all, in high daprd churn scenarios.
Impact
Actors or workflows would fail to be activated, and existing actors or workflows would fail.
Root Cause
Placement used a "small" (100) queue size which, when exhausted, would cause a deadlock. Placement would also wait for a fully consumed channel queue before disseminating, slowing down the dissemination process.
Solution
Increase the queue size to 10000 and change the dissemination logic to not wait for a fully consumed queue before disseminating.
Blocked Placement dissemination with high Scheduler dataset
Problem
Disseminations would hang for long periods of time when the Scheduler dataset was large.
Impact
Dissemination could take up to hours to complete, causing reminders to not be delivered for a long period of time.
Root Cause
The migration of reminders from the state store to the Scheduler does a full decoded scan of the Scheduler database, which takes a long time when there are many entries. During this time dissemination is blocked.
Solution
Limit the maximum time spent doing the migration to 3 seconds.
Expose a new global.reminders.skipMigration="true" helm chart value which will skip the migration entirely.
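A sketch of the time-bounding part, with illustrative names: the migration runs under a context with a 3 second budget, and the skip flag (the value the global.reminders.skipMigration Helm setting would map to) bypasses it entirely.

```go
package reminders

import (
	"context"
	"errors"
	"log"
	"time"
)

// migrateWithBudget bounds the reminder migration so that dissemination is
// not blocked for longer than maxMigration.
func migrateWithBudget(ctx context.Context, skipMigration bool, migrate func(context.Context) error) {
	if skipMigration {
		return // operator opted out of the state store -> Scheduler migration
	}
	const maxMigration = 3 * time.Second
	mctx, cancel := context.WithTimeout(ctx, maxMigration)
	defer cancel()
	if err := migrate(mctx); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("reminder migration budget exceeded; continuing with dissemination")
			return
		}
		log.Printf("reminder migration failed: %v", err)
	}
}
```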
Fix panic during actor deactivation
Problem
Daprd could panic during actor deactivation.
Impact
Daprd sidecar would crash, resulting in downtime for the application.
Root Cause
A race between the actor lock's cached-memory release and claiming logic meant a stale lock could be used during deactivation, closing it twice and causing a panic.
Solution
Tie the lock's lifecycle to the actor's lifecycle, ensuring the lock is only released when the actor is fully deactivated, and claimed with the actor itself.
OpenTelemetry environment variables support
Problem
OpenTelemetry OTEL_* environment variables were not fully respected, and dapr.io/env annotation parsing broke when values contained =.
Impact
OpenTelemetry resource attributes could not be reliably applied to the Dapr sidecar, degrading trace correlation with application containers, especially on Kubernetes. Configuring OTEL_RESOURCE_ATTRIBUTES via annotations did not work.
Root Cause
- Resource creation used manual logic instead of the OpenTelemetry SDK’s environment-based resource detection.
- The injector’s environment variable parsing treated = as a hard delimiter, breaking values that include =.
Solution
- Adopt the OpenTelemetry SDK’s env-based resource detection so OTEL_* variables (including OTEL_RESOURCE_ATTRIBUTES) are honored.
- Fix dapr.io/env parsing to allow values containing =.
- Keep the Dapr app ID as the default service name when not overridden.
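A minimal sketch of the env-based detection with the OpenTelemetry Go SDK (the function name is illustrative): the Dapr app ID is set as a default service name attribute, and resource.WithFromEnv picks up OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME from the environment.

```go
package tracing

import (
	"context"

	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

// newResource builds the OpenTelemetry resource using the SDK's env-based
// detection instead of manual attribute construction.
func newResource(ctx context.Context, appID string) (*resource.Resource, error) {
	return resource.New(ctx,
		// Dapr app ID as the default service name.
		resource.WithAttributes(semconv.ServiceNameKey.String(appID)),
		// Honor OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME from the environment.
		resource.WithFromEnv(),
	)
}
```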
Fixing goavro bug due to codec state mutation
Problem
The goavro library had a bug where the codec state was mutated during decoding, causing the decoder to panic.
Impact
The goavro library would panic, causing the application to crash.
Root Cause
The goavro library did not correctly handle the codec state, causing it to panic when the codec state was mutated during decoding.
Solution
Updated the goavro library to v2.14.1 to fix the bug, and took a more defensive approach by restoring the previous behavior of always creating a new codec.
APP_API_TOKEN not passed in gRPC metadata for app callbacks
Problem
When APP_API_TOKEN was configured, the token was not being passed in gRPC metadata for app callbacks including:
- PubSub subscriptions
- Bindings
- Jobs
This meant that applications using gRPC protocol could not authenticate incoming requests from Dapr when using the app API token security feature.
Impact
Applications that configured APP_API_TOKEN to secure their endpoints could not validate that incoming gRPC requests were from their Dapr sidecar. This broke the app API token authentication feature for gRPC applications.
Root Cause
The gRPC subscription delivery, binding, and job callback code paths were directly calling the app's gRPC client without going through the channel layer abstraction. The channel layer is responsible for injecting the APP_API_TOKEN in the dapr-api-token metadata header, but these direct calls bypassed this mechanism.
Solution
Centralized the APP_API_TOKEN injection logic in a helper function (AddAppTokenToContext) in the gRPC channel layer. Updated all gRPC app callback code paths (pubsub subscriptions, bindings, and job callbacks) to use this helper, ensuring the token is consistently added to the outgoing gRPC context metadata. Added comprehensive integration tests to verify token passing for all callback scenarios in both HTTP and gRPC protocols.
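A sketch of the helper's shape (lower-cased name here is illustrative): a single function appends the app API token to the outgoing gRPC metadata under the dapr-api-token header, and every app callback path goes through it instead of calling the app client directly.

```go
package grpcchannel

import (
	"context"

	"google.golang.org/grpc/metadata"
)

// addAppTokenToContext appends the app API token to the outgoing gRPC
// metadata, so pubsub delivery, binding, and job callbacks all carry the
// dapr-api-token header the application expects.
func addAppTokenToContext(ctx context.Context, appAPIToken string) context.Context {
	if appAPIToken == "" {
		return ctx
	}
	return metadata.AppendToOutgoingContext(ctx, "dapr-api-token", appAPIToken)
}
```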
Fixed Pulsar OAuth token renewal
Problem
The pulsar pubsub component was not renewing the OAuth token when it expired.
Impact
Applications using the pulsar pubsub component could not receive/publish messages when the OAuth token expired.
Root Cause
There was a bug in the component code that was preventing the OAuth token from being renewed when it expired.
Solution
Fixed the bug in the component code ensuring the OAuth token is renewed when it expires. Also added a test to verify the token renewal functionality. Fixed in dapr/components-contrib#4079
Fix Scheduler connection during non-graceful network interruptions
Problem
A catastrophic failure of the Scheduler connection during a non-graceful network interruption would not cause the Dapr runtime to attempt to reconnect to Scheduler.
Impact
A true host network interruption (e.g. unplugging the network cable) would cause the dapr runtime to only recover connections to Scheduler after roughly 2 hours.
Root Cause
The gRPC KeepAlive parameters were not set correctly, causing the gRPC client to not detect broken connections in a timely manner.
Solution
The server and client KeepAlive parameters are now set to 3 second intervals with a 5 second timeout.
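The client-side shape of this looks roughly like the following sketch using gRPC keepalive options; with a 3 second ping interval and 5 second timeout, a dead connection is detected within seconds rather than waiting for the OS-level TCP timeout.

```go
package schedulerclient

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// dialOptions returns keepalive settings matching the fix described above.
func dialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                3 * time.Second, // ping the server after 3s of inactivity
			Timeout:             5 * time.Second, // consider the connection dead 5s after an unanswered ping
			PermitWithoutStream: true,            // keep pinging even with no active RPCs
		}),
	}
}
```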
Prevent infinite loop when workflow state is corrupted or destroyed
Problem
Dapr workflows could enter an infinite reminder loop when the workflow state in the actor state store is corrupted or destroyed.
Impact
Dapr workflows would enter an infinite loop of reminder calls.
Root Cause
...
Dapr Runtime v1.15.13
Dapr 1.15.13
This update includes bug fixes:
APP_API_TOKEN not passed in gRPC metadata for app callbacks
Problem
When APP_API_TOKEN was configured, the token was not being passed in gRPC metadata for app callbacks including:
- PubSub subscriptions
- Bindings
- Jobs
This meant that applications using gRPC protocol could not authenticate incoming requests from Dapr when using the app API token security feature.
Impact
Applications that configured APP_API_TOKEN to secure their endpoints could not validate that incoming gRPC requests were from their Dapr sidecar. This broke the app API token authentication feature for gRPC applications.
Root Cause
The gRPC subscription delivery, binding, and job callback code paths were directly calling the app's gRPC client without going through the channel layer abstraction. The channel layer is responsible for injecting the APP_API_TOKEN in the dapr-api-token metadata header, but these direct calls bypassed this mechanism.
Solution
Centralized the APP_API_TOKEN injection logic in a helper function (AddAppTokenToContext) in the gRPC channel layer. Updated all gRPC app callback code paths (pubsub subscriptions, bindings, and job callbacks) to use this helper, ensuring the token is consistently added to the outgoing gRPC context metadata. Added comprehensive integration tests to verify token passing for all callback scenarios in both HTTP and gRPC protocols.
Fixed Pulsar OAuth token renewal
Problem
The pulsar pubsub component was not renewing the OAuth token when it expired.
Impact
Applications using the pulsar pubsub component could not receive/publish messages when the OAuth token expired.
Root Cause
There was a bug in the component code that was preventing the OAuth token from being renewed when it expired.
Solution
Fixed the bug in the component code ensuring the OAuth token is renewed when it expires. Also added a test to verify the token renewal functionality. Fixed in dapr/components-contrib#4079
Dapr Runtime v1.16.2-rc.2
This is the release candidate 1.16.2-rc.2
Dapr Runtime v1.16.2-rc.1
This is the release candidate 1.16.2-rc.1
Dapr Runtime v1.16.1
Dapr 1.16.1
This update includes bug fixes:
- Actor Initialization Timing Fix
- Sidecar Injector Crash with Disabled Scheduler
- Workflow actors reminders stopped after Application Health check transition
- Fix Scheduler Etcd client port networking in standalone mode
- Component initialization timeout check before using reporter
- Fix Regression in pubsub.kafka Avro Message Publication
- Ensure Files are Closed Before Reading in SFTP Component
- Fix AWS Secrets Manager YAML Metadata Parsing
- Reuse Kafka Clients in AWS v2 Migration
- Fix Kafka AWS Authentication Configuration Bug
- Enhanced debug logs for placement server
- Workflow actors never registered again after failed actors registration on GetWorkItems connection callback
- Fix DynamoDB not working as a workflow state store
Actor Initialization Timing Fix
Problem
When running Dapr with an --app-port specified but no application listening on that port (either due to no server or delayed server startup), the actor runtime would initialize immediately before the app channel was ready. This created a race condition where actors were trying to communicate with an application that wasn't available yet, resulting in repeated error logs:
WARN[0064] Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…
DEBU[0064] Error for operation DaprBuiltInActorNotFoundRetries was: failed to lookup actor: api error: code = FailedPrecondition desc = did not find address for actor
Impact
This created a poor user experience with confusing error messages when users specified an --app-port but had no application listening on that port.
Root cause
The actor runtime initialization was occurring before the application channel was ready, creating a race condition where actors attempted to communicate with an unavailable application.
Solution
Defer actor runtime initialization until the application channel is ready. The runtime now:
- Defers actor runtime initialization until the application is listening on the specified port
- Provides informative "waiting for application to listen on port XXXX" messages instead of confusing error logs
- Prevents actor lookup errors during startup
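A minimal sketch of the wait-before-initializing idea, with illustrative names and log wording: the runtime polls the configured app port and only proceeds with actor initialization once something is actually listening.

```go
package approbe

import (
	"context"
	"fmt"
	"log"
	"net"
	"time"
)

// waitForAppPort blocks until something is listening on the configured app
// port, logging a friendly message instead of letting actor initialization
// race ahead and emit lookup errors.
func waitForAppPort(ctx context.Context, port int) error {
	addr := fmt.Sprintf("127.0.0.1:%d", port)
	for {
		conn, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		log.Printf("waiting for application to listen on port %d", port)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```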
Sidecar Injector Crash with Disabled Scheduler
Problem
The sidecar injector crashes with error (dapr-scheduler-server StatefulSet not found) when the scheduler is disabled via Helm chart (global.scheduler.enabled: false).
Impact
The crash prevents the sidecar injector from functioning correctly when the scheduler is disabled, disrupting deployments.
Root cause
A previous change caused the dapr-scheduler-server StatefulSet to be removed when the scheduler was disabled, instead of scaling it to 0 as originally intended. The injector, hardcoded to check for the StatefulSet in the injector.go file, fails when it is not found.
Solution
Revert the behavior to scale the dapr-scheduler-server StatefulSet to 0 when the scheduler is disabled, instead of removing it, as implemented in the Helm chart.
Workflow actors reminders stopped after Application Health check transition
Problem
Application Health checks transitioning from unhealthy to healthy were incorrectly configuring the scheduler clients to stop watching for actor reminder jobs.
Impact
The misconfiguration in the scheduler clients caused workflows to stop executing because reminders no longer executed.
Root cause
On an Application Health change, daprd could trigger an actor types update with an empty slice, which caused a scheduler client reconfiguration. Because there were no actual changes to the actor types, daprd never received a new version of the placement table, leaving the scheduler clients misconfigured: when daprd sends an actor types update to the placement server it wipes out the known actor types in the scheduler client, and since placement never acknowledged with a new table version, the scheduler client was never updated back with the actor types.
Solution
Prevent any changes to hosted actor types if the input slice is empty.
Fix Scheduler Etcd client port networking in standalone mode
Problem
The Scheduler Etcd client port is not available when running in Dapr CLI standalone mode.
Impact
Cannot perform Scheduler Etcd admin operations in Dapr CLI standalone mode.
Root cause
The Scheduler Etcd client port was only listening on localhost.
Solution
The Scheduler Etcd client listen address is now configurable via the --scheduler-etcd-client-listen-address CLI flag, meaning the port can be exposed when running in standalone mode.
Fix Helm chart not honoring --etcd-embed argument
Problem
The Scheduler would always treat --etcd-embed as true, even when set to false in the context of the Helm chart.
Impact
Cannot use external etcd addresses since Scheduler would always assume embedded etcd is used.
Root cause
The Helm template format treated the boolean argument as a separate argument rather than inline.
Solution
The template format string was fixed to allow for .etcdEmbed to be set to false.
Component initialization timeout check before using reporter
Problem
The component init timeout was checked after using the component reporter.
Impact
This misalignment could lead to false positives: Dapr could report success even though it later returned an error due to the timeout check.
Solution
Move the timeout check to right after the actual component initialization and before the component reporter.
Fix Regression in pubsub.kafka Avro Message Publication
Problem
The pubsub.kafka component failed to publish Avro messages in Dapr 1.16, breaking existing workflows.
Impact
Avro messages could not be published correctly, causing failures in Kafka message pipelines and potential data loss or dead-lettering issues.
Root cause
The Kafka pubsub component did not correctly create codecs in the SchemaRegistryClient. Additionally, the goavro library had a bug converting default null values that broke legitimate schemas.
Solution
Enabled codec creation in the Kafka SchemaRegistryClient and upgraded github.com/linkedin/goavro/v2 from v2.13.1 to v2.14.0 to fix null value handling. Metadata options useAvroJson and excludeHeaderMetaRegex were validated to ensure correct message encoding and dead-letter handling. Manual tests confirmed Avro and JSON message publication works as expected.
Ensure Files are Closed Before Reading in SFTP Component
Problem
Some SFTP servers require files to be closed before they become available for reading. Without closing, read operations could fail or return incomplete data.
Impact
SFTP file reads could fail or return incomplete data on certain servers, causing downstream processing issues.
Root cause
The SFTP component did not explicitly close files after writing, which some servers require to make files readable.
Solution
Updated the SFTP component to close files after writing, ensuring they are available for reading on all supported servers.
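A sketch of the write-then-close pattern with github.com/pkg/sftp (function names are illustrative): the remote file is closed immediately after writing, and any subsequent read opens it fresh, which satisfies servers that only expose file contents after the writer closes.

```go
package sftpfiles

import (
	"io"

	"github.com/pkg/sftp"
)

// writeFile writes the payload and closes the remote file before anything
// reads it back, since some SFTP servers only make a file's contents visible
// to readers after the writer has closed it.
func writeFile(client *sftp.Client, path string, data []byte) error {
	f, err := client.Create(path)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	// Close explicitly so the server flushes and releases the file before any
	// subsequent read operation.
	return f.Close()
}

// readFile opens the file fresh for reading after the writer closed it.
func readFile(client *sftp.Client, path string) ([]byte, error) {
	f, err := client.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}
```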
Fix AWS Secrets Manager YAML Metadata Parsing
Problem
The AWS Secrets Manager component failed to correctly parse YAML metadata, causing boolean fields like multipleKeyValuesPerSecret to be misinterpreted.
Impact
Incorrect metadata parsing could lead to misconfiguration, preventing secrets from being retrieved or handled properly.
Root cause
The component used a JSON marshal/unmarshal approach in getSecretManagerMetadata, which did not handle string-to-boolean conversion correctly for YAML metadata.
Solution
Replaced JSON marshal/unmarshal with kitmd.DecodeMetadata to correctly parse YAML metadata and convert string fields to their proper types, ensuring multipleKeyValuesPerSecret works as expected.
Reuse Kafka Clients in AWS v2 Migration
Problem
After migrating to the AWS v2 Kafka client, a new client was created for every message published, causing inefficiency and unnecessary resource usage.
Impact
Frequent client creation led to performance degradation, increased connection overhead, and potential resource exhaustion during high-throughput message publishing.
Root cause
The AWS v2 client integration did not implement client reuse, resulting in a new client being instantiated for each publish operation.
Solution
Updated the Kafka component to reuse clients instead of creating a new one for each message, improving performance and resource efficiency.
Fix Kafka AWS Authentication Configuration Bug
Problem
The Kafka AWS authentication configuration was not initialized correctly, causing authentication failures.
Impact
Kafka components using AWS authentication could fail to connect, preventing message publishing and consumption.
Root cause
A bug in the Kafka AWS auth config initialization prevented proper setup of authentication parameters.
Solution
Fixed the initialization...
Dapr Runtime v1.16.1-rc.3
This is the release candidate 1.16.1-rc.3
Dapr Runtime v1.16.1-rc.2
This is the release candidate 1.16.1-rc.2