[WebConsole] Implement Samply profile collection UI #5423
9a32617 to
06565a8
Compare
Pull request overview
This PR implements a user interface for collecting and downloading Samply CPU profiles in the Feldera web console. The "Profiler" tab has been renamed to "Dataflow Visualizer" to differentiate it from the new Samply profiling functionality.
Changes:
- Added new Samply profile collection UI with configurable duration and visual progress indicator
- Implemented backend support for tracking in-progress profile collection with expected completion times
- Updated API to return status 202 (Accepted) when starting collection and 307 (Temporary Redirect) when profile is in progress
- Enhanced error message formatting by preserving newlines in error details
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| openapi.json | Updated API responses: 202 for profile start, 307 for in-progress, changed error details schema to Value |
| js-packages/web-console/src/lib/services/pipelineManager.ts | Added collectSamplyProfile and getLatestSamplyProfile wrapper functions |
| js-packages/web-console/src/lib/services/manager/types.gen.ts | Auto-generated type updates from OpenAPI changes |
| js-packages/web-console/src/lib/compositions/useToastNotification.ts | Added whitespace-pre-wrap to preserve newlines in error messages |
| js-packages/web-console/src/lib/compositions/usePipelineManager.svelte.ts | Integrated new Samply profile API functions |
| js-packages/web-console/src/lib/components/pipelines/editor/TabSamplyProfile.svelte | New component implementing profile collection UI with countdown timer and progress visualization |
| js-packages/web-console/src/lib/components/pipelines/editor/TabProfileVisualizer.svelte | Renamed label to "Dataflow Visualizer" |
| js-packages/web-console/src/lib/components/pipelines/editor/MonitoringPanel.svelte | Added Samply tab to monitoring panel |
| js-packages/web-console/src/lib/components/pipelines/editor/InteractionPanel.svelte | Added Samply tab to interaction panel |
| js-packages/web-console/src/lib/components/pipelines/editor/DownloadSupportBundle.svelte | Removed Oxford comma from tooltip |
| js-packages/web-console/package.json | Updated build-openapi command to use -p flag |
| js-packages/web-console/.prettierignore | Added .svelte-kit/ and build/ directories |
| crates/pipeline-manager/src/api/endpoints/pipeline_interaction.rs | Updated OpenAPI documentation for status codes |
| crates/feldera-types/src/error.rs | Changed error details schema from Object to Value |
| crates/adapters/src/server/error.rs | Changed error details schema from Object to Value |
| crates/adapters/src/server.rs | Implemented InProgress state handling and immediate error detection |
| crates/adapters/src/samply.rs | Added InProgress variant to SamplyProfile enum |
| crates/adapters/src/controller.rs | Removed newline escaping in error messages |

    let isCollecting = $state(false)
    let startTime = $state<number | null>(null)
    let expectedCompletion = $state<number | null>(null)
    let profileReady = $state(false)

Copilot
AI
Jan 14, 2026
The profileReady state variable is set to true when the countdown completes (line 109) but is never used anywhere in the component. This state variable should either be removed if it's not needed, or used to provide visual feedback to the user that the profile is ready for download.

    - let profileReady = $state(false)


    if (profile.expectedInSeconds) {
      // Profile is still being collected, update countdown with the server's expected time
      const now = Date.now()
      expectedCompletion = now + profile.expectedInSeconds * 1000

      // Ensure collecting state is active
      if (!isCollecting) {
        isCollecting = true
        profileReady = false
        startTime = now
        startCountdown()
      }

Copilot
AI
Jan 14, 2026
When synchronizing the countdown with the server's expected time in the download handler, the duration variable should also be updated to match the remaining time. Without this, if a user comes back later and clicks download while a profile is being collected, the UI progress bar and time display will be inconsistent because they're calculated based on the duration variable (see lines 44-61), not the actual remaining time from the server.

    Ok(HttpResponse::Ok().finish())
    // Wait to check if it errored out immediately
    sleep(Duration::from_millis(600)).await;

Copilot
AI
Jan 14, 2026
The hard-coded 600ms wait time lacks documentation explaining why this specific duration was chosen. Consider adding a comment explaining that this delay allows the system to detect if profiling fails immediately, or extracting this as a named constant with a descriptive name.
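
One way to act on this suggestion is to lift the literal into a documented constant. The snippet below is only a sketch: the name `SAMPLY_STARTUP_GRACE_PERIOD` is hypothetical and does not appear in the PR, and the doc comment paraphrases the rationale given elsewhere in this conversation (catching a profiler that refuses to start).

```rust
use std::time::Duration;

/// Hypothetical name, not taken from the PR.
///
/// How long the "start profile" handler waits before replying, so that a
/// profiler that fails to start immediately (for example because of missing
/// profiling permissions) is reported to the caller instead of being hidden
/// behind a 202 Accepted response.
const SAMPLY_STARTUP_GRACE_PERIOD: Duration = Duration::from_millis(600);
```

The handler would then call `sleep(SAMPLY_STARTUP_GRACE_PERIOD).await` in place of the inline `Duration::from_millis(600)`.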

    </script>

    {#snippet label()}
      <span class=""> CPU Profile </span>

Copilot
AI
Jan 14, 2026
The tab label "CPU Profile" is inconsistent with the tab name "Samply" used in the tab configuration. Consider using a more consistent label such as "Samply Profile" or aligning the tab name with "CPU Profile" to avoid confusion.

    - <span class=""> CPU Profile </span>
    + <span class=""> Samply Profile </span>


    // Set the state to InProgress with expected completion time
    let expected_after = chrono::Utc::now() + chrono::Duration::seconds(duration as i64);
    *state_samply_profile.lock().unwrap() = SamplyProfile::InProgress { expected_after };

    spawn(async move {
        let result = controller.async_samply_profile(duration).await;
        state_samply_profile.lock().unwrap().update(result);
    });

Copilot
AI
Jan 14, 2026
There is no protection against concurrent profile collection requests. If multiple requests to start profiling are made while a profile is already being collected, the state will be overwritten and multiple profiling tasks will run concurrently, competing to update the shared state. Consider adding a check to return an error (e.g., 409 Conflict) if a profile collection is already in progress (when state is InProgress).
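
The guard described in this comment could look roughly like the sketch below. It is not the PR's code: it uses only the standard library (`std::time::Instant` instead of `chrono`), and `SamplyStatus`, `StartError`, and `try_start_profile` are illustrative names. The point is that the check and the transition to `InProgress` happen under the same lock, so two concurrent requests cannot both start a collection.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Illustrative state type; the PR's actual type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SamplyStatus {
    Idle,
    InProgress { expected_after: Instant },
}

/// Hypothetical error a handler could map to HTTP 409 Conflict.
#[derive(Debug, PartialEq)]
enum StartError {
    AlreadyInProgress { expected_after: Instant },
}

/// Atomically checks the state and marks a collection as started.
/// Returns the expected completion time, or an error if one is running.
fn try_start_profile(
    state: &Mutex<SamplyStatus>,
    duration: Duration,
) -> Result<Instant, StartError> {
    // Holding the lock across the check and the update closes the race.
    let mut status = state.lock().unwrap();
    match *status {
        SamplyStatus::InProgress { expected_after } => {
            Err(StartError::AlreadyInProgress { expected_after })
        }
        SamplyStatus::Idle => {
            let expected_after = Instant::now() + duration;
            *status = SamplyStatus::InProgress { expected_after };
            Ok(expected_after)
        }
    }
}
```

A handler would spawn the profiling task only on `Ok`, and on `Err` could return 409 Conflict together with the existing `expected_after` so the client knows how long to wait.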
83e0e21 to
27c3c4c
Compare
"dataflow graph" is a completely static notion, it does not have any relationship with performance.
I thought "profiler" was perfect? What are we trying to disambiguate?

I agree with @ryzhyk. "Profiler" was perfect.
27c3c4c to
4527d3c
Compare
45d39a2 to
7cff437
Compare
Please work with abhinav so that this is an advertised feature in /v0/config and the tab is not displayed if the feature is not enabled in the backend.
7cff437 to
594182a
Compare
crates/adapters/src/samply.rs
Outdated

    pub(crate) type SamplyProfile = Option<Result<Vec<u8>, String>>;

    // Minimal perf_event_attr structure for capability checking
    // We only need the fields that are used in the check
    #[repr(C)]
    struct PerfEventAttr {
        type_: u32,
        size: u32,
        config: u64,
        // The rest of the struct is zeroed, which is fine for our minimal check
        _padding: [u8; 112], // perf_event_attr is 136 bytes total, we've used 16
    }

    // perf_event_open constants
    const PERF_TYPE_SOFTWARE: u32 = 1;
    const PERF_COUNT_SW_CPU_CLOCK: u64 = 0;

    /// Check if profiling is available by attempting a minimal perf_event_open syscall.
    ///
    /// Returns `Ok(())` if profiling is available, or `Err(reason)` if not.
    /// This is a low-overhead check that doesn't actually start any profiling.
    pub(crate) fn check_profiling_available() -> Result<(), String> {
        // Use perf_event_open to check if we have the necessary permissions.
        // We use a minimal configuration that should succeed if we have PERFMON capability.
        unsafe {
            // Initialize perf_event_attr structure with zeros
            let mut attr: PerfEventAttr = std::mem::zeroed();
            attr.type_ = PERF_TYPE_SOFTWARE;
            attr.size = std::mem::size_of::<PerfEventAttr>() as u32;
            attr.config = PERF_COUNT_SW_CPU_CLOCK;

            // Try to open a perf event for the current process on any CPU
            // pid=0 means current process, cpu=-1 means any CPU
            let fd = libc::syscall(
                libc::SYS_perf_event_open,
                &attr as *const PerfEventAttr,
                0i32,  // pid: current process
                -1i32, // cpu: any CPU
                -1i32, // group_fd: no group
                0u64,  // flags: none
            );

            if fd < 0 {
                let errno = *libc::__errno_location();
                let error = std::io::Error::from_raw_os_error(errno);

                // Check for permission-related errors
                if error.kind() == ErrorKind::PermissionDenied
                    || errno == libc::EACCES
                    || errno == libc::EPERM
                {
                    return Err(format!(
                        "CPU profiling is not enabled. In Kubernetes environments, ensure the Helm flag `pipeline.allowProfiling` is set to true. ({})",
                        error
                    ));
                }
                // Other errors might be transient or unrelated to permissions
                return Err(format!("perf_event_open failed: {}", error));
            }

            // Successfully opened - close the file descriptor
            libc::close(fd as i32);
        }

        Ok(())
    }

At this point, it might be simpler to invoke samply profiling once for a second and see if it gives an error.
That's the plan, it looks like all other methods need tuning. I plan to do this check in a different PR to get this one out. Without it the user will just see an error popup when profiling is not enabled and they try to collect one.
e96217a to
4b46129
Compare

    // Check if profiling is still running or failed immediately
    let samply_state = state.samply_state.lock().unwrap();
    Ok(match samply_state.samply_status {
        // Profile is still running - return success
        SamplyStatus::InProgress { .. } => HttpResponse::Accepted().finish(),
        // Profile completed during wait - check if it failed
        SamplyStatus::Idle => match &samply_state.last_profile {
            Some(Err(error)) => samply_profile_error_response(error),
            _ => HttpResponse::InternalServerError().json(ErrorResponse {
                message: "samply profiling completed unexpectedly".to_string(),
                error_code: "SamplyProfilingUnexpectedCompletion".into(),
                details: serde_json::Value::Null,
            }),
        },
    })

I am not sure that waiting and checking the status this way is ideal. This API is meant to be async.
This API should:
- trigger a profiling session
- or report that one is already in progress
Then, if an error occurs while profiling, the onus is on the client to call the GET API and check for success or failure.
The problem I encountered was when initiating a profile on a misconfigured system I got an HTTP 202 Accepted response, but Samply refused to run immediately due to system profiling permissions. I believe this needs to be seen immediately, even at a cost of <1 s delay - otherwise you need to introduce polling for the status of Samply which is an unnecessary complexity. Now you just make sure that the profile collection has started, and you know how long to wait for the result.
The API is still conceptually async, but it handles a startup failure.
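
The decision taken after the short wait can be separated from the HTTP layer, which makes the three cases easy to test. The following is a sketch under assumptions, not the PR's implementation: `SamplyStatus` is reduced to two field-less variants, and `StartOutcome` and `outcome_after_grace` are hypothetical names. It mirrors the `match` in the reviewed snippet.

```rust
/// Simplified stand-in for the status enum in the reviewed snippet.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SamplyStatus {
    Idle,
    InProgress,
}

/// Hypothetical result of the check performed after the grace period.
#[derive(Debug, PartialEq)]
enum StartOutcome {
    /// Collection is running; the handler would answer 202 Accepted.
    Accepted,
    /// The profiler stopped right away with an error worth surfacing.
    FailedImmediately(String),
    /// Idle without an error: collection ended sooner than expected.
    UnexpectedCompletion,
}

/// Classifies the state observed once the grace period has elapsed.
fn outcome_after_grace(
    status: SamplyStatus,
    last_profile: Option<&Result<Vec<u8>, String>>,
) -> StartOutcome {
    match (status, last_profile) {
        (SamplyStatus::InProgress, _) => StartOutcome::Accepted,
        (SamplyStatus::Idle, Some(Err(error))) => {
            StartOutcome::FailedImmediately(error.clone())
        }
        (SamplyStatus::Idle, _) => StartOutcome::UnexpectedCompletion,
    }
}
```

With this split, the handler stays asynchronous in spirit: it replies as soon as the grace period ends, and only the startup failure is reported synchronously.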
4b46129 to
771d9ec
Compare
f994856 to
708e5a0
Compare
abhizer
left a comment
Went through only the Rust parts
Signed-off-by: Karakatiza666 <[email protected]>
Hide 'Download profile' button until one is available Signed-off-by: Karakatiza666 <[email protected]>
…because it does not work as expected Signed-off-by: Karakatiza666 <[email protected]>
Signed-off-by: Karakatiza666 <[email protected]>
0f561cb to
782b758
Compare
Signed-off-by: feldera-bot <[email protected]>
I renamed the "Profiler" tab to "Dataflow visualizer" to disambiguate the new functionality.
I also updated the API response status codes of some Samply responses, added a way to retrieve the completion estimate for a profile that is already being collected, and added a way to detect when profile collection fails immediately by waiting for a set time after the profile is requested.