Tags: Zipstack/unstract

v0.138.14


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
UN-2901 [FIX] Prevent invalid status updates (EXECUTING/ERROR) from duplicate file processing runs (#1606)

* UN-2901 [FIX] Prevent invalid status updates (EXECUTING/ERROR) from duplicate file processing runs

Fixes race condition where late-arriving workers overwrite COMPLETED status
with invalid EXECUTING or ERROR states, causing files to appear failed/stuck
even though processing succeeded.

Changes:
- FileAPIClient: Fixed URL construction and method call bugs
- Fresh DB validation: Check current status before updating to EXECUTING
- Grace period optimization: Early exit when duplicate detected during tool polling
- File count accuracy: Include skipped files in total_files calculation

Impact: Files now correctly maintain COMPLETED status; no duplicate processing
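The fresh-DB-validation idea above can be sketched as a status-transition guard. This is a minimal illustration, not the actual implementation; the enum values and the helper name `try_update_status` are assumptions based on the states named in the commit message:

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    PENDING = "PENDING"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"


# Terminal state that a late-arriving duplicate worker must never overwrite.
TERMINAL_STATES = {ExecutionStatus.COMPLETED}


def try_update_status(current: ExecutionStatus, new: ExecutionStatus) -> bool:
    """Return True if the transition may be applied, False if it must be rejected.

    In the real fix, `current` would be re-read from the database immediately
    before this check, so a stale in-memory value cannot win the race.
    """
    if current in TERMINAL_STATES and new in (
        ExecutionStatus.EXECUTING,
        ExecutionStatus.ERROR,
    ):
        return False  # duplicate run arrived late; keep COMPLETED
    return True
```

The key property is that the check happens against the freshest database state, so the worker that finishes first permanently wins.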

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* CodeRabbit comments addressed

---------

Co-authored-by: Claude <[email protected]>

v0.138.13

UN-2901 [FIX] Container startup race condition with polling grace period (#1602)

* UN-2901 [FIX] Container startup race condition with polling grace period

* UN-2901 [FIX] Add Redis retry resilience and fix container failure detection

- Add configurable Redis retry decorator with exponential backoff
- Fix critical bug where containers that never start are marked as SUCCESS
- Add robust env var validation for retry configuration
- Apply retry logic to FileExecutionStatusTracker and ToolExecutionTracker
- Document REDIS_RETRY_MAX_ATTEMPTS and REDIS_RETRY_BACKOFF_FACTOR env vars
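A configurable retry decorator with exponential backoff, as described above, might look like the following. This is a sketch, not the project's code; the decorator name and the exception types retried are assumptions, and the defaults mirror the documented env vars:

```python
import functools
import time


def redis_retry(max_retries: int = 4, backoff_factor: float = 0.5,
                retry_on: tuple = (ConnectionError, TimeoutError)):
    """Retry transient failures with exponential backoff.

    max_retries counts retries *after* the initial attempt, so the wrapped
    function is called at most max_retries + 1 times (default: 5 total).
    Delays grow as backoff_factor * 2**attempt.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # retries exhausted; surface the failure
                    time.sleep(backoff_factor * (2 ** attempt))
        return wrapper
    return decorator
```

In the actual change, `max_retries` and `backoff_factor` would be read from `REDIS_RETRY_MAX_ATTEMPTS` and `REDIS_RETRY_BACKOFF_FACTOR` rather than hard-coded.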

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* UN-2901 [FIX] Address CodeRabbitAI review feedback for race condition fix

This commit addresses all valid CodeRabbitAI review comments on PR #1602:

1. **Fix retry loop semantics**: Changed retry loop to use range(max_retries + 1)
   where max_retries means "retries after initial attempt", not total attempts.
   Updated default from 5 to 4 (total 5 attempts) for clarity.

2. **Fix TypeError in file_execution_tracker.py**: Fixed json.loads() receiving
   dict instead of string by using string fallback values.

3. **Fix unsafe env var parsing**: Added _safe_get_env_int/_safe_get_env_float
   helpers with validation and fallback to defaults with warning logs.

4. **Fix status None check**: Added defensive None check before calling .get()
   on status dict in grace period reset logic.

5. **Update sample.env defaults**: Changed REDIS_RETRY_MAX_ATTEMPTS from 5 to 4
   and updated comments to clarify retry semantics.

6. **Improve transient failure handling**: Changed logger.error to logger.warning
   for transient status fetch failures, added sleep before continue to respect
   polling interval and avoid API hammering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

v0.138.12

UN-2897 [FIX] Google Drive connector SIGSEGV crashes in Celery ForkPoolWorker processes (#1597)

UN-2897 [FIX] Google Drive connector SIGSEGV crashes in Celery ForkPoolWorker

Implements lazy initialization for Google Drive API client to prevent
segmentation faults when Celery forks worker processes.
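The lazy-initialization pattern referenced here can be sketched as below. The class and method names are assumptions; the real connector presumably builds a `googleapiclient` service object in `_build_client`:

```python
class GoogleDriveConnector:
    """Defer creation of the underlying API client until first use.

    Clients backed by native code (SSL, gRPC) that are created in the
    parent process before Celery forks can leave corrupt state in the
    child and segfault. Building the client lazily means it is first
    constructed inside the forked worker process.
    """

    def __init__(self, credentials):
        self._credentials = credentials
        self._client = None  # nothing heavy created at import/parent time

    @property
    def client(self):
        if self._client is None:
            # First touched inside the worker process, after fork().
            self._client = self._build_client()
        return self._client

    def _build_client(self):
        # Placeholder for the real API client construction.
        return {"credentials": self._credentials}
```

Each forked worker therefore gets its own freshly built client instead of inheriting one across `fork()`.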

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

v0.138.11

UN-2893 [FIX] Fix duplicate process handling status updates and UI error logs (#1594)

* UN-2893 [FIX] Fix duplicate process handling status updates and UI error logs

Prevent duplicate worker processes from updating file execution status
and showing UI error logs during GKE race conditions.

- Added is_duplicate_skip flag to FileProcessingResult dataclass
- Fixed destination_processed default value for correct duplicate detection
- Skip status updates and UI logs when duplicate is detected
- Only first worker updates status, second worker silently exits
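The flag-based skip described above can be sketched as follows; the dataclass fields come from the commit message, while the `report` helper is a hypothetical stand-in for the status-update path:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileProcessingResult:
    file_name: str
    success: bool = False
    # Default False so an unprocessed duplicate is never mistaken for a
    # completed destination write.
    destination_processed: bool = False
    # Set on the losing worker when a duplicate run is detected.
    is_duplicate_skip: bool = False


def report(result: FileProcessingResult) -> Optional[str]:
    """Only the first (winning) worker updates status and emits UI logs."""
    if result.is_duplicate_skip:
        return None  # second worker exits silently: no status write, no UI error
    return f"{result.file_name}: {'COMPLETED' if result.success else 'ERROR'}"
```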

* Converted logger.error to logger.exception

* Changed error to exception in logs

v0.138.10

UN-2889 [FIX] Handle Celery logger with empty request_id to prevent SIGSEGV crashes (#1591)

* UN-2889 [FIX] Handle Celery logger with empty request_id to prevent SIGSEGV crashes

- Simplified logging filters into RequestIDFilter and OTelFieldFilter
- Removed custom DjangoStyleFormatter and StructuredFormatter classes
- Removed Celery's worker_log_format config that created formatters without filters
- Removed LOG_FORMAT environment variable and all format options
- All workers now use single standardized format with filters always applied
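A filter like the `RequestIDFilter` named above could look like this; the default placeholder `"-"` is an assumption:

```python
import logging


class RequestIDFilter(logging.Filter):
    """Guarantee every record carries a request_id attribute.

    Celery tasks can emit records with no request context; formatting a
    pattern containing %(request_id)s against such a record fails. By
    defaulting the field here, one standardized format stays safe for
    all workers, with the filter always applied.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        if not getattr(record, "request_id", None):
            record.request_id = "-"  # empty or missing -> placeholder
        return True
```

Attached to every handler, the filter makes a format such as `"%(asctime)s %(request_id)s %(message)s"` usable without per-formatter special cases.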

* Addressed CodeRabbit comments

v0.138.9

UN-2882 [FIX] Fix BigQuery float precision issue in metadata serialization (#1589)

* Fix BigQuery float precision issue by normalizing floats before JSON serialization

- Added _sanitize_floats_for_database() helper function to recursively normalize
  float values to 6 decimal precision using string formatting
- Modified _add_processing_columns() to sanitize metadata before json.dumps()
- Fixes BigQuery insertion failures caused by floats that can't round-trip
  through string representation (e.g., 22.770092)
- Solution normalizes internal binary representation via float(f"{x:.6f}")
- Handles edge cases: NaN and Infinity converted to None
- Works recursively on nested dicts/lists
- Backward compatible, preserves meaningful precision
- Protects all database types (BigQuery, PostgreSQL, MySQL, Snowflake)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Addressed PR comments: apply the sanitize method to the data field too

* Moved math import to the top of the file

---------

Co-authored-by: Claude <[email protected]>

v0.138.8

Fix organization context pollution in shared HTTP sessions

- Remove X-Organization-ID from session headers in _setup_session()
- Remove X-Organization-ID from set_organization_context() method
- Update clear_organization_context() to only clear instance variables
- Use per-request headers in _make_request() to prevent pollution

This prevents callback workers from inheriting wrong organization context
when using shared HTTP sessions with singleton pattern.
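The per-request-header pattern can be sketched as below. Class and method names follow the commit message; the body is a hypothetical reconstruction, not the actual client:

```python
class InternalAPIClient:
    """Keep tenant context out of shared-session state.

    A singleton HTTP session shared across callback workers must not hold
    X-Organization-ID in its session-wide headers, or one tenant's requests
    can inherit another's context. The org id lives on the instance and is
    merged into each request's headers individually.
    """

    def __init__(self):
        self.session_headers = {"User-Agent": "worker"}  # shared, org-free
        self.organization_id = None  # instance-level only

    def set_organization_context(self, org_id: str) -> None:
        self.organization_id = org_id  # never written into session_headers

    def clear_organization_context(self) -> None:
        self.organization_id = None  # only clears instance variables

    def _request_headers(self) -> dict:
        headers = dict(self.session_headers)  # fresh per-request copy
        if self.organization_id:
            headers["X-Organization-ID"] = self.organization_id
        return headers
```

Because `session_headers` never contains the org id, a worker that reuses the shared session cannot leak a previous caller's organization context.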

Fixes: UN-2877

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

v0.138.7

UN-2866 [FIX] Fix duplicate detection parameter name mismatch causing false positives on worker retry

Fixed parameter name from 'exclude_execution_id' to 'current_execution_id'
in worker API client to match backend endpoint expectations. This allows
worker retries after pod crashes to properly exclude current execution
from duplicate detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

v0.138.6

UN-2871 [FEATURE] Log sharing across shared workflows/deployments (#1580)

* UN-2871 [FEATURE] Add shared workflow executions filter to enable multi-user access

Update WorkflowExecutionManager.for_user() to include executions from workflows shared with users, ensuring consistent access control across workflow and execution models.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update backend/workflow_manager/workflow_v2/models/execution.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Rahul Johny <[email protected]>

* UN-2871 [FIX] Move Q import to top-level for PEP8 compliance

Move django.db.models.Q import from function-level to module-level to comply with linting standards and improve code organization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* UN-2871 [SECURITY] Fix execution filtering to respect independent workflow and deployment sharing

Update WorkflowExecutionManager.for_user() to properly handle independent sharing between workflows and API deployments/pipelines. Previous implementation only checked workflow sharing, allowing users to see executions for unshared deployments.

Key changes:
- Add separate filters for API deployments and pipelines access
- Implement proper logic for independent sharing scenarios:
  * Workflow shared + no pipeline -> User sees workflow-level executions
  * API/Pipeline shared (regardless of workflow) -> User sees those executions
  * Both shared -> User sees all related executions
  * Neither shared -> User cannot see executions

This ensures users can only view executions for resources they have explicit access to, preventing unauthorized data exposure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* UN-2871 [PERF] Optimize ExecutionFilter to use EXISTS instead of values_list for large datasets

Replace inefficient values_list() queries with EXISTS subqueries in filter_execution_entity().
This significantly improves performance when filtering by entity type on large datasets.

Performance improvements:
- API filter: Uses EXISTS check instead of fetching all API deployment IDs
- ETL filter: Uses EXISTS check instead of fetching all ETL pipeline IDs
- TASK filter: Uses EXISTS check instead of fetching all TASK pipeline IDs
- Workflow filter: Simplified to use isnull check (removed redundant workflow_id filter)

EXISTS is more efficient because:
1. Stops at first match (short-circuits)
2. Doesn't transfer data from database to application
3. Better query optimizer hints for the database
4. Reduced memory usage

The queryset is already filtered by user permissions via get_queryset(),
so this change only optimizes the entity type filtering step.
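At the SQL level, the Django `Exists(OuterRef(...))` pattern compiles down to a correlated `EXISTS` subquery. A minimal sqlite sketch of the two query shapes (table and column names are illustrative, not the project's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workflow_execution (id INTEGER PRIMARY KEY, pipeline_id INTEGER);
    CREATE TABLE api_deployment (id INTEGER PRIMARY KEY);
    INSERT INTO api_deployment (id) VALUES (1), (2);
    INSERT INTO workflow_execution (id, pipeline_id) VALUES (10, 1), (11, 99), (12, 2);
""")

# values_list shape: fetch every deployment id into the application,
# then filter with IN -- transfers all ids and holds them in memory.
ids = [row[0] for row in conn.execute("SELECT id FROM api_deployment")]
slow = conn.execute(
    "SELECT id FROM workflow_execution WHERE pipeline_id IN "
    f"({','.join('?' * len(ids))}) ORDER BY id",
    ids,
).fetchall()

# EXISTS shape: the correlated subquery short-circuits at the first match
# and transfers no deployment ids to the application.
fast = conn.execute("""
    SELECT e.id FROM workflow_execution e
    WHERE EXISTS (SELECT 1 FROM api_deployment d WHERE d.id = e.pipeline_id)
    ORDER BY e.id
""").fetchall()
```

Both return the same rows; the difference is where the work happens and how early the database can stop scanning.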

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* UN-2871 [FIX] Move Exists and OuterRef imports to module level for PEP8 compliance

Move django.db.models.Exists and OuterRef imports from function-level to module-level to comply with linting standards and improve code organization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Signed-off-by: Rahul Johny <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

v0.138.5

UN-2869 [FIX] Add broker heartbeat configuration to prevent RabbitMQ connection timeouts causing false duplicate detection (#1578)

UN-2869 [FIX] Add broker heartbeat configuration to prevent RabbitMQ connection timeouts

This fix addresses false duplicate file detection caused by stale IN_PROGRESS
records when RabbitMQ disconnects idle workers after 60 seconds.

Changes:
- Added broker_heartbeat=30s to WorkerCeleryConfig in workers/shared/models/worker_models.py
- Configurable via CELERY_BROKER_HEARTBEAT env var (default: 30s)
- Prevents RabbitMQ connection drops during long-running tasks
- Eliminates stale cache/DB entries that cause false duplicate detection

Technical Details:
- RabbitMQ default timeout: 60 seconds
- Recommended heartbeat: 30 seconds (half of timeout)
- Uses get_celery_setting() for hierarchical config: worker-specific -> global -> default
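The hierarchical resolution mentioned above might look like this. The function name comes from the commit message, but its signature and the env-var naming scheme are assumptions:

```python
import os


def get_celery_setting(worker_name: str, key: str, default):
    """Resolve a setting as worker-specific -> global -> default.

    Checks e.g. FILE_WORKER_CELERY_BROKER_HEARTBEAT first, then
    CELERY_BROKER_HEARTBEAT, casting to the default's type; an
    unparseable value falls through to the next level.
    """
    cast = type(default)
    for env_name in (f"{worker_name.upper()}_{key}", key):
        raw = os.environ.get(env_name)
        if raw is not None:
            try:
                return cast(raw)
            except ValueError:
                pass  # malformed value; try the next level
    return default


# RabbitMQ drops idle connections after its 60s default timeout; a 30s
# heartbeat (half the timeout) keeps long-running task connections alive.
broker_heartbeat = get_celery_setting("file_worker", "CELERY_BROKER_HEARTBEAT", 30)
```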

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>