cubelet: add diagnostic context to newExt4RawByReflinkCopy errors #237

Merged
kinwin-ustc merged 1 commit into TencentCloud:master from WaylandYang:feat/reflink-copy-error-diagnostics
May 18, 2026

Conversation

@WaylandYang

Contributor

Implements proposal (2) from #235.

What it does

When newExt4RawByReflinkCopy fails, the returned error now appends a single-line diagnostic suffix:

newExt4RawByReflinkCopy failed:<stderr> [step=3/4 cmd="e2fsck -fy /a/target.raw" elapsed=812ms target=size=3221225472 base=size=536870912 free=5368709120B]

Fields: command index (1-of-N), the failing argv, elapsed time, target/base file stats (size or "missing"), and free bytes on the target's volume.

Format choices

  • Suffix on the existing message, not a replacement — any string-matching callers (none found in this repo, but defensive) still see the same "newExt4RawByReflinkCopy failed:<stderr>" prefix.
  • Single line so audit / log pipelines that split on \n don't separate the diagnostic from the underlying error.
  • Best-effort stats: stat / statfs errors are reported inline ("stat err=<msg>") instead of failing the diagnostic itself; the caller already has a real error to return.

Out of scope (deliberately)

  • newExt4BaseRaw and newExt4RawByCopy have the same opaque-error shape and would benefit from the same describeStorageFailure helper, but the issue named newExt4RawByReflinkCopy specifically. Happy to extend to the other two in a follow-up if you'd like.
  • The helper is diagnostic-only; no retry / structured error refactor.

Tested

  • go vet ./storage/... — clean
  • go build ./storage/... — clean
  • gofmt -l — clean

WaylandYang requested a review from fslongjin as a code owner May 14, 2026 03:53
@fslongjin

Member

/cubebot review

@github-actions

github-actions Bot commented May 14, 2026


Claude finished @fslongjin's task in 3m 23s


Code Review

  • Read CLAUDE.md for repo guidelines
  • Read the changed file and diff
  • Analyze code and provide review feedback

Overall: This is a clean, well-structured change that adds valuable diagnostic context to an otherwise opaque error path. The design choices (suffix approach, single-line format, best-effort stats) are all sound. Below are my findings.

Positive

  • The describeStorageFailure / describeFile / describeFreeBytes helpers are well-factored with clear doc comments and a defined format contract.
  • Best-effort stat handling (missing, stat err=..., statfs err=...) is the right call — failing the diagnostic would mask the real error.
  • Preserving the "newExt4RawByReflinkCopy failed:<stderr>" prefix is good for backward compatibility.
  • The started timer before the loop captures total elapsed time across all steps, which is more useful than per-command timing for diagnosing slow failures.
  • DCO sign-off is present.

Issues

1. No unit tests for the new helpers (should fix)

The three new unexported helpers (describeStorageFailure, describeFile, describeFreeBytes) are pure functions that are straightforward to unit-test without needing a reflink-capable filesystem. Per CONTRIBUTING.md: "Add tests — add focused test coverage when behavior changes." The existing shell_test.go only tests the happy path via integration-style tests; the diagnostic formatting logic has zero coverage.

2. Deferred error path missing diagnostics (minor, consider)

At shell.go:103, the defer block that handles FileExistAndValid failure still uses the old format:

err = fmt.Errorf("newExt4RawByReflinkCopy failed:%s", err)

If all commands succeed but the post-validation check fails, the error is still opaque. This is a different failure mode (not a command failure), so it may warrant different diagnostics, but as-is it's inconsistent with the new command-failure path. Consider appending at least target file stats here too, or adding a comment explaining why it's omitted.
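
One way the suggestion could be applied is sketched below: the deferred check keeps the existing prefix and appends target stats. Everything here is an assumption made for illustration: validate stands in for the repo's FileExistAndValid, buildImage stands in for newExt4RawByReflinkCopy, the describeFile body is a guess, and the "[target=...]" suffix is not a format defined by the PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validate is a hypothetical stand-in for FileExistAndValid.
func validate(p string) error {
	info, err := os.Stat(p)
	if err != nil {
		return err
	}
	if info.Size() == 0 {
		return errors.New("file is empty")
	}
	return nil
}

// describeFile renders file state as a token; body is an assumption.
func describeFile(p string) string {
	info, err := os.Stat(p)
	switch {
	case errors.Is(err, os.ErrNotExist):
		return "missing"
	case err != nil:
		return fmt.Sprintf("stat err=%v", err)
	}
	return fmt.Sprintf("size=%d", info.Size())
}

// buildImage uses a named return so the deferred check can replace err.
func buildImage(target string) (err error) {
	defer func() {
		if err != nil {
			return // a command already failed; keep that error
		}
		if verr := validate(target); verr != nil {
			err = fmt.Errorf("newExt4RawByReflinkCopy failed:%s [target=%s]",
				verr, describeFile(target))
		}
	}()
	// The real command steps would run here.
	return nil
}

func main() {
	fmt.Println(buildImage("/no/such/target.raw"))
}
```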

3. stderr may contain newlines (observation, not a regression)

The PR description notes the format is "single line" so log pipelines don't split it. However, stderr from commands like e2fsck can contain newlines, so the error message was already potentially multi-line before this change. The diagnostic suffix will appear after the last newline of stderr. This is a pre-existing issue, not introduced by this PR, but worth noting since the stated goal is single-line output. A future enhancement could strings.ReplaceAll(stderr, "\n", " ") or truncate stderr.
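
A possible shape for that follow-up is sketched here. The helper name oneLine and the rune limit are assumptions; the review only suggests strings.ReplaceAll or truncation in general terms.

```go
package main

import (
	"fmt"
	"strings"
)

// oneLine flattens multi-line stderr into one line and caps its length.
// Hypothetical helper, not part of the PR.
func oneLine(stderr string, maxRunes int) string {
	s := strings.TrimSpace(stderr)
	s = strings.ReplaceAll(s, "\r\n", " ")
	s = strings.ReplaceAll(s, "\n", " ")
	// Truncate by runes so a multi-byte character is never split.
	if r := []rune(s); len(r) > maxRunes {
		s = string(r[:maxRunes]) + "..."
	}
	return s
}

func main() {
	raw := "Pass 1: Checking inodes\nPass 2: Checking directory structure\n"
	fmt.Println(oneLine(raw, 200))
}
```

Replacing newlines with spaces loses the original line boundaries; escaping them as a literal \n sequence instead would keep them recoverable at the cost of a less readable log line.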

4. path.Dir edge case (very minor)

In describeFreeBytes(path.Dir(target)) at shell.go:149: if target is a bare filename without a directory component, path.Dir returns ".", which would statfs the process CWD rather than the target's volume. In practice, callers always pass absolute paths, so this is unlikely to trigger. filepath.Dir also returns "." for a bare filename, so it does not remove this edge case, but it follows the OS path separator and is the more correct choice for OS paths.

Summary

The change is well-designed and ready to ship with one recommended addition: unit tests for the new helpers (issue #1). The other items are minor observations for potential follow-up.

Implements proposal (2) from TencentCloud#235.

On failure the error now appends step index, the failing argv,
elapsed time, target/base file stats, and free space:

    newExt4RawByReflinkCopy failed:<stderr> [step=3/4
    cmd="e2fsck -fy ..." elapsed=812ms target=size=...
    base=size=... free=...B]

Diagnostic is a suffix on the existing prefix so any string-matching
caller is unaffected. Stat/statfs errors are reported inline rather
than failing the diagnostic.

Adds unit tests for describeStorageFailure / describeFile /
describeFreeBytes covering the missing-file, empty-path, valid-stat,
statfs-error, and full-format-output cases.

Signed-off-by: Wayland Yang <[email protected]>
WaylandYang force-pushed the feat/reflink-copy-error-diagnostics branch from f5c4c69 to 0fd59e9 May 14, 2026 08:05
@kinwin-ustc

Collaborator

/cubebot review

@github-actions

github-actions Bot commented May 16, 2026


Claude finished @kinwin-ustc's task in 1m 6s


Code Review


Overview

This PR adds diagnostic context to error messages from newExt4RawByReflinkCopy by appending a structured single-line suffix with step index, command, elapsed time, file stats, and free space. Three new helpers (describeStorageFailure, describeFile, describeFreeBytes) and comprehensive unit tests for them are included.


Strengths

  • Good test coverage: All three new helpers have dedicated unit tests covering edge cases (empty path, missing file, valid file, invalid directory, full format output). This addresses the main gap from the earlier review.
  • Best-effort diagnostics: Stat/statfs failures are reported inline ("missing", "stat err=...") rather than aborting — the caller already has a real error to propagate.
  • Backward-compatible format: The diagnostic is appended as a [...] suffix, preserving the existing "newExt4RawByReflinkCopy failed:<stderr>" prefix for any callers doing string matching.
  • Total elapsed time: The started timer captures wall time across all preceding commands, which is more useful for diagnosing slow failures than per-command timing alone.
  • Clean helper factoring: Each helper has a single responsibility with clear formatting contracts.

Issues

1. path.Dir vs filepath.Dir: semantically imprecise for OS paths (shell.go:149)

fmt.Fprintf(&b, " free=%s", describeFreeBytes(path.Dir(target)))

path.Dir uses / as the separator and is designed for slash-delimited paths such as URLs. For a path that lacks a directory component (a bare filename), path.Dir returns ".", which would statfs the process CWD instead of the target's volume. filepath.Dir returns "." in that case too, so switching does not change this edge case; its advantage is that it follows the OS path separator. The rest of this file already uses path.Clean and path.Join for the same purpose, so this is consistent with existing code, but filepath.Dir would be semantically more precise for OS filesystem paths.

In practice, callers always pass absolute paths, so this is unlikely to trigger.

2. Deferred post-validation error still opaque (shell.go:103)

err = fmt.Errorf("newExt4RawByReflinkCopy failed:%s", err)

The defer block's FileExistAndValid failure path still uses the old opaque format. This is explicitly called out in the PR description as a different failure mode, but it means the diagnostic coverage is incomplete. Consider appending at least describeFile(target) stats here too.

3. Stderr newlines can break single-line goal (pre-existing)

The PR's stated single-line format is a good goal, but stderr from commands like e2fsck or resize2fs can contain newlines. The diagnostic suffix will appear after the last newline of stderr. This isn't a regression (the error was already multi-line before), but a future follow-up could sanitize stderr with strings.ReplaceAll(stderr, "\n", "\\n") to ensure true single-line output.


Test Assessment

The six new test functions provide solid coverage:

| Test | Scenario covered |
| --- | --- |
| TestDescribeFile_Empty | Empty path → "<empty>" |
| TestDescribeFile_Missing | Non-existent file → "missing" |
| TestDescribeFile_Size | Existing file → "size=N" |
| TestDescribeFreeBytes_Invalid | Non-existent directory → "statfs err=..." |
| TestDescribeFreeBytes_Valid | Valid directory → "<N>B" |
| TestDescribeStorageFailure_Format | Full diagnostic suffix format |

One minor note on the existing test: TestNewExt4RawByReflinkCopy at shell_test.go:82 asserts assert.Errorf for the size=128000 case, meaning it expects a failure. This test is skipped in CI (SkipCI), but if run on a reflink-capable filesystem the e2fsck+resize2fs chain may actually succeed, causing a test failure. This is pre-existing and not introduced by this PR, but worth noting.


Summary

The diagnostic helpers are well-designed, well-tested (the unit tests cleanly address the main feedback from the initial review), and the backward-compatible format is the right choice. Issue #1 (path.Dir) is a minor correctness nit, and issue #2 is acknowledged as out-of-scope. Ready to ship after the path.Dir fix.

kinwin-ustc merged commit 966996c into TencentCloud:master May 18, 2026
caoqianyun added a commit to caoqianyun/CubeSandbox that referenced this pull request May 21, 2026
Add the v0.2.2 release blog post in both English and Simplified Chinese.

Highlights covered:
- Protocol-level E2B compatibility: default sandbox port moved to 49983
  to align with the E2B sandbox protocol; default-port source unified
  in CubeMaster.
- Seven recurring stability fixes from the v0.1.x era: nil-deref panic
  in cubecli exec on stdin EOF (TencentCloud#188), .env port placeholders in deploy
  scripts (TencentCloud#210), idempotent template-image jobs (TencentCloud#227), configurable
  storage cmd timeout (TencentCloud#236), structured storage failure diagnostics
  (TencentCloud#237), PVM ext4 artifact runtime-file consolidation (TencentCloud#282), and a
  smaller (4G to ~100M) quick-start template image.
- First batch of CVE remediations for the 0.2 series: vmm-sys-util
  0.11.x to 0.12.1 closing CVE-2023-50711, with bytes / env_logger
  upgraded in the same PR (TencentCloud#267); time crate upgrade deliberately
  deferred (CVE-2026-25727 vector unreachable in Cube).
- Phase-1 community contribution program now live: Troubleshooting,
  Use Cases, and Integrations doc tracks.

File names follow the YYYY-MM-DD-slug.md convention required by
docs/zh/guide/maintainer/blog.md. Cross-language files share the same
slug. Frontmatter uses only documented fields (title/date/author/
description/featured/weight).

Signed-off-by: caoqianyun <[email protected]>
fslongjin pushed a commit that referenced this pull request May 21, 2026
* docs(blog): bump welcome post weight to 2

Demote the welcome post to weight: 2 so the v0.2.2 release post (added in the next commit) can take the top featured slot. Welcome post stays featured but renders below release news, which is the more time-sensitive content for landing visitors.

Signed-off-by: caoqianyun <[email protected]>

* docs(blog): add v0.2.2 release post (en/zh)

Add the v0.2.2 release blog post in both English and Simplified Chinese.

Highlights covered:
- Protocol-level E2B compatibility: default sandbox port moved to 49983
  to align with the E2B sandbox protocol; default-port source unified
  in CubeMaster.
- Seven recurring stability fixes from the v0.1.x era: nil-deref panic
  in cubecli exec on stdin EOF (#188), .env port placeholders in deploy
  scripts (#210), idempotent template-image jobs (#227), configurable
  storage cmd timeout (#236), structured storage failure diagnostics
  (#237), PVM ext4 artifact runtime-file consolidation (#282), and a
  smaller (4G to ~100M) quick-start template image.
- First batch of CVE remediations for the 0.2 series: vmm-sys-util
  0.11.x to 0.12.1 closing CVE-2023-50711, with bytes / env_logger
  upgraded in the same PR (#267); time crate upgrade deliberately
  deferred (CVE-2026-25727 vector unreachable in Cube).
- Phase-1 community contribution program now live: Troubleshooting,
  Use Cases, and Integrations doc tracks.

File names follow the YYYY-MM-DD-slug.md convention required by
docs/zh/guide/maintainer/blog.md. Cross-language files share the same
slug. Frontmatter uses only documented fields (title/date/author/
description/featured/weight).

Signed-off-by: caoqianyun <[email protected]>

---------

Signed-off-by: caoqianyun <[email protected]>