Member

@adarsh0728 adarsh0728 commented Apr 5, 2025

Refers to numaproj/numaflow#2479.

Adds a new errors package with PersistCriticalError functionality (persist_critical_error in this SDK), which lets the user persist a critical error so that it can later be viewed in the UI.

Go SDK PR numaproj/numaflow-go#189

Java SDK PR numaproj/numaflow-java#178


codecov bot commented Apr 5, 2025

Codecov Report

Attention: Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 94.19%. Comparing base (157a90d) to head (d2fc341).
Report is 1 commits behind head on main.

Files with missing lines       Patch %   Lines
pynumaflow/errors/errors.py    90.90%    3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
- Coverage   94.20%   94.19%   -0.01%     
==========================================
  Files          55       58       +3     
  Lines        2311     2359      +48     
  Branches      119      119              
==========================================
+ Hits         2177     2222      +45     
- Misses         97      100       +3     
  Partials       37       37              

Signed-off-by: adarsh0728 <[email protected]>
Signed-off-by: adarsh0728 <[email protected]>
@adarsh0728 adarsh0728 marked this pull request as ready for review April 7, 2025 00:54
@adarsh0728 adarsh0728 self-assigned this Apr 7, 2025
@adarsh0728
Member Author

adarsh0728 commented Apr 7, 2025

Testing

Only one file was created when multiple threads called persist_critical_error in sourcetransform/example.py:

1743998604-udf.json
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/transformer# cat 1743998604-udf.json 
{"container": "transformer", "timestamp": 1743998604, "code": "500", "message": "Critical error in my_handler", "details": "Simulated error for testing with multiple threads"}root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/transformer# 

The other calls received an error:

ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
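
The once-only behavior in the output above can be sketched roughly as follows. This is a minimal illustration and not the SDK's actual implementation: the function name persist_once, the module-level flag, the lock-based guard, and the choice to return the error as a string are all assumptions. Only the file name pattern, the JSON fields, and the error message are taken from the output above.

```python
import json
import os
import threading
import time

_lock = threading.Lock()
_persisted = False  # hypothetical module-level "already ran" flag


def persist_once(directory, container, code, message, details):
    """Write the error file on the first call only.

    Returns None on success, or an error string on every later call.
    """
    global _persisted
    with _lock:
        if _persisted:
            return "Persist critical error function has already been executed."
        _persisted = True  # claim the single slot before doing any I/O

    timestamp = int(time.time())
    entry = {
        "container": container,
        "timestamp": timestamp,
        "code": code,
        "message": message,
        "details": details,
    }
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{timestamp}-udf.json")
    with open(path, "w") as f:
        json.dump(entry, f)
    return None
```

Because the flag is checked and set under the lock, exactly one caller reaches the file write no matter how many threads race, which matches the single file and the repeated "already been executed" errors seen in the test.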

@kohlisid
Contributor

kohlisid commented Apr 8, 2025

@adarsh0728 This is good to review?

@adarsh0728
Member Author

@adarsh0728 This is good to review?

yes, please.

@kohlisid
Contributor

kohlisid commented Apr 8, 2025

@adarsh0728 Could you please verify the functionality with an async example as well? That would be helpful

@adarsh0728
Member Author

adarsh0728 commented Apr 8, 2025

@adarsh0728 Could you please verify the functionality with an async example as well? That would be helpful

Testing Async Sink example

Error persisted only once.

2025-04-08 06:05:16 INFO     Async GRPC Server listening on: unix:///var/run/numaflow/sink.sock with max threads: 4
INFO:pynumaflow._constants:Async GRPC Server listening on: unix:///var/run/numaflow/sink.sock with max threads: 4
INFO:root:Error persisted successfully by thread.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.
ERROR:root:Thread failed with error: Persist critical error function has already been executed.

File Structure

root@simple-mono-vertex-mv-0:/var/numaflow# ls
runtime
root@simple-mono-vertex-mv-0:/var/numaflow# cd runtime/
root@simple-mono-vertex-mv-0:/var/numaflow/runtime# ls
application-errors
root@simple-mono-vertex-mv-0:/var/numaflow/runtime# cd application-errors/
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors# ls
udsink
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors# cd udsink/
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/udsink# ls
1744092318-udf.json
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/udsink#
root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/udsink# cat 1744092318-udf.json 
{"container": "udsink", "timestamp": 1744092318, "code": "500", "message": "Critical error in my_handler", "details": "Simulated error for testing with multiple threads"}root@simple-mono-vertex-mv-0:/var/numaflow/runtime/application-errors/udsink# 
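
For checking such a directory during testing, the layout above (one sub-directory per container, one JSON file per persisted error) can be read back with a few lines of Python. This is only a sketch: the helper name load_persisted_errors is hypothetical, and how Numaflow itself consumes these files is not shown in this conversation.

```python
import json
from pathlib import Path


def load_persisted_errors(root):
    """Return every parsed *.json error entry found under root/<container>/."""
    entries = []
    for path in sorted(Path(root).glob("*/*.json")):
        with path.open() as f:
            entries.append(json.load(f))
    return entries
```

Calling load_persisted_errors("/var/numaflow/runtime/application-errors") inside the pod would then return a list of dicts with the container, timestamp, code, message, and details fields shown above.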

Code Change in udsink_handler for running multiple threads

import logging
import threading
from collections.abc import AsyncIterable

from pynumaflow.sinker import Datum, Responses

# Import path is an assumption based on the file pynumaflow/errors/errors.py named in this PR.
from pynumaflow.errors.errors import persist_critical_error


async def udsink_handler(datums: AsyncIterable[Datum]) -> Responses:
    responses = Responses()

    # Call persist_critical_error from multiple threads at the same time.

    # Define the function to call persist_critical_error
    def call_persist_critical_error():
        error_code = "500"
        error_message = "Critical error in my_handler"
        error_details = "Simulated error for testing with multiple threads"
        result = persist_critical_error(error_code, error_message, error_details)
        # A non-empty result means this call was rejected (already persisted).
        if result:
            logging.error("Thread failed with error: %s", result)
        else:
            logging.info("Error persisted successfully by thread.")

    # Create multiple threads to call persist_critical_error
    num_threads = 10
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=call_persist_critical_error)
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()
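
The test change above joins the threads inside an async handler, which blocks the event loop while it waits. If that matters for a test, one possible alternative is to hand the calls to worker threads with asyncio.to_thread and gather the results. The sketch below is an illustration only: fake_persist is a hypothetical stand-in for persist_critical_error that succeeds once and then returns the error string seen in the logs, and call_concurrently is a made-up helper name.

```python
import asyncio
import threading

_lock = threading.Lock()
_done = False  # hypothetical "already ran" flag for the stand-in


def fake_persist(code, message, details):
    """Stand-in for persist_critical_error: None on first call, error string afterwards."""
    global _done
    with _lock:
        if _done:
            return "Persist critical error function has already been executed."
        _done = True
        return None


async def call_concurrently(n):
    """Run n calls in worker threads without blocking the event loop."""
    calls = [
        asyncio.to_thread(
            fake_persist,
            "500",
            "Critical error in my_handler",
            "Simulated error for testing with multiple threads",
        )
        for _ in range(n)
    ]
    return await asyncio.gather(*calls)
```

Running asyncio.run(call_concurrently(10)) yields ten results, exactly one of which is None. Note that asyncio.to_thread requires Python 3.9 or later.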

error_code = error_code or INTERNAL_ERROR
current_timestamp = int(time.time())

runtime_error_entry = _RuntimeErrorEntry(
Contributor

I see we use this only for runtime_error_entry.to_dict().
Do we need a separate class declaration for this?
Is it expected to be used anywhere else as well?

We can directly create an inline dict object:

runtime_error = {
    "container": CONTAINER_TYPE,
    "timestamp": current_timestamp,
    "code": error_code,
    "message": error_message,
    "details": error_details,
}

Member Author

In the future, it can be used to persist other errors; the format would remain the same. Also, to stay consistent across SDKs, we want to keep a separate class for RuntimeErrorEntry.

Contributor

In that case, if you wish to keep this around, can we convert it to a data class, given that this is the current use case?

Member Author

done
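
For readers following the thread, a data class along the lines discussed might look like the sketch below. It is not the merged code: the class name without the leading underscore, the field types, and the asdict-based to_dict are assumptions. The field names come from the inline dict suggested earlier in this review.

```python
from dataclasses import asdict, dataclass


@dataclass
class RuntimeErrorEntry:
    """One persisted error record; field names follow the dict in the review."""

    container: str
    timestamp: int
    code: str
    message: str
    details: str

    def to_dict(self):
        # asdict() converts the dataclass fields into a plain dict for JSON output.
        return asdict(self)
```

Compared with the inline dict, the data class keeps the record shape in one place so other error kinds could reuse it, while to_dict still produces the same JSON-ready mapping.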

Signed-off-by: adarsh0728 <[email protected]>
Signed-off-by: adarsh0728 <[email protected]>
Signed-off-by: adarsh0728 <[email protected]>
@adarsh0728 adarsh0728 merged commit 6ccd49e into main Apr 10, 2025
11 checks passed
@adarsh0728 adarsh0728 deleted the error-util branch April 10, 2025 05:06