dolt/dolthub#1430: Add --filter option for dolt diff #10030

Merged
elianddb merged 10 commits into dolthub:main from codeaucafe:codeaucafe/1430/add-filter-opt-for-diff
Dec 9, 2025

@codeaucafe codeaucafe commented Nov 5, 2025

Summary

TL;DR

The original PR #3499 for issue #1430 saw no further activity and was closed about three years ago. This PR addresses its open review comments and reworks how filtered rows are written.

Description

Adds a --filter option to dolt diff for filtering by change type (added, modified, renamed, or dropped, with removed accepted as an alias for dropped). Please note that I added renamed and the removed alias to the filter list. This completes PR #3499 by fixing the issues that blocked it from merging, and it also fixes a bug where empty table headers were printed when all rows were filtered out.

Users reviewing large diffs often need to focus on specific change types: deletes may need extra scrutiny while inserts are routine. With diffs spanning thousands of rows across multiple tables, grep isn't enough, since updates show both additions and deletions.

Implementation

Core changes:

  • Added diffTypeFilter struct with validation methods in diff.go
  • Added --filter argument in arg_parser_helpers.go
  • Table filtering uses delta.SchemaChange and delta.IsRename() for modification detection
  • Row filtering checks diff.ChangeType in writeDiffResults()
  • Fixed control flow - uses continue instead of return nil so all tables get processed

Empty header bug fix:

  • Implemented lazyRowWriter that delays BeginTable() until first row write
  • Added shouldUseLazyHeader() helper to determine when lazy initialization is needed
  • Only applies to filtered data-only diffs (no schema changes or renames)
  • Prevents empty table headers from printing when all rows are filtered out

Tests:

  • go/cmd/dolt/commands/diff_filter_test.go - comprehensive unit tests:
    • TestShouldUseLazyHeader - validates lazy header logic (8 test cases)
    • TestLazyRowWriter_* - tests lazy initialization behavior (4 test functions)
    • Mock implementations of diffWriter and SqlRowDiffWriter interfaces
  • BATS integration tests for all filter types with tables, rows, and schema changes
  • Edge cases: invalid values, typos, case sensitivity

What Was Broken in PR #3499

The original PR had three blocking issues from code review comments that I fixed here:

  1. No row-level filtering - only filtered tables, not individual rows
    • Fixed by adding filtering in writeDiffResults()
  2. Incomplete modification detection - didn't check for schema changes properly
    • Fixed by using delta.SchemaChange || delta.IsRename() as requested
  3. Loop exits early - return nil prevented processing remaining tables
    • Fixed by using continue instead
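
To illustrate the third fix, here is a minimal sketch of why continue matters in a per-table loop. The tableDelta type and the matchingTables function are hypothetical stand-ins invented for this example; they are not dolt's actual types or the code in diff.go.

```go
package main

import "fmt"

// tableDelta is a hypothetical stand-in for a per-table diff entry;
// it is not dolt's actual type.
type tableDelta struct {
	Name     string
	DiffType string // "added", "modified", "renamed", or "dropped"
}

// matchingTables returns the names of tables whose DiffType equals filter.
// An empty filter matches every table.
func matchingTables(deltas []tableDelta, filter string) []string {
	var out []string
	for _, d := range deltas {
		if filter != "" && d.DiffType != filter {
			// Skip only this table. A `return nil` here would abandon
			// every table that follows the first non-matching one.
			continue
		}
		out = append(out, d.Name)
	}
	return out
}

func main() {
	deltas := []tableDelta{
		{Name: "a", DiffType: "added"},
		{Name: "b", DiffType: "dropped"},
		{Name: "c", DiffType: "added"},
	}
	fmt.Println(matchingTables(deltas, "added")) // prints [a c]
}
```

With an early return in place of continue, table c would never be reached once table b failed the filter.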

Additional Bug Fixed (introduced by my initial filter changes; not present before)

Empty table headers when filtering:
When using --filter with data-only changes, if all rows were filtered out, dolt would still print the table diff header with no content:

# Before fix:
$ dolt diff HEAD~1 --filter=modified  # (table has only added rows)
diff --dolt a/t b/t
--- a/t
+++ b/t
(empty - no rows printed)

# After fix:
$ dolt diff HEAD~1 --filter=modified
(completely empty - no output at all)

This occurred because BeginTable() was called before row filtering. The lazyRowWriter pattern delays header output until we confirm at least one row will be written.
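
A rough sketch of the idea, under simplifying assumptions: the lazyTableWriter, rowChange, and diffTable names below are hypothetical, rows are plain strings, and the header is reduced to a single line. The real lazyRowWriter wraps dolt's writer interfaces, which have a different shape.

```go
package main

import "fmt"

// output collects printed lines; it stands in for the CLI's stdout.
type output struct{ lines []string }

// lazyTableWriter is a hypothetical, simplified analogue of the PR's
// lazyRowWriter: it holds back the table header until a row is written.
type lazyTableWriter struct {
	out     *output
	table   string
	started bool // true once the header has been emitted
}

// WriteRow emits the header on the first call, then the row itself.
func (w *lazyTableWriter) WriteRow(row string) {
	if !w.started {
		w.out.lines = append(w.out.lines, "diff --dolt a/"+w.table+" b/"+w.table)
		w.started = true
	}
	w.out.lines = append(w.out.lines, row)
}

// rowChange pairs a rendered row with its change type (hypothetical).
type rowChange struct {
	Kind string
	Text string
}

// diffTable writes only the rows whose Kind equals filter and returns
// every line that was produced.
func diffTable(table string, rows []rowChange, filter string) []string {
	out := &output{}
	w := &lazyTableWriter{out: out, table: table}
	for _, r := range rows {
		if r.Kind != filter {
			continue // filtered out: the header stays unwritten
		}
		w.WriteRow(r.Text)
	}
	return out.lines
}

func main() {
	rows := []rowChange{{Kind: "added", Text: "+ | 1 | x |"}}
	fmt.Println(len(diffTable("t", rows, "modified"))) // prints 0
	fmt.Println(len(diffTable("t", rows, "added")))    // prints 2
}
```

Because the header is tied to the first write rather than to table setup, a table whose rows are all filtered out produces no output at all.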

Usage/Examples

dolt diff --filter=added      # new tables/rows
dolt diff --filter=modified   # schema changes, row updates
dolt diff --filter=renamed    # renamed tables
dolt diff --filter=dropped    # dropped tables, deleted rows
dolt diff --filter=removed    # alias for dropped
dolt diff HEAD~1 --filter=dropped -r sql

Fixes #1430

@codeaucafe codeaucafe changed the title test(diff): add comprehensive unit tests for filter functionality feat: add --filter option for dolt diff Nov 5, 2025
@codeaucafe codeaucafe changed the title feat: add --filter option for dolt diff dolt/dolthub#1430: Add --filter option for dolt diff Nov 5, 2025
@codeaucafe codeaucafe force-pushed the codeaucafe/1430/add-filter-opt-for-diff branch 4 times, most recently from 096642a to 3bfa4de Compare November 11, 2025 01:06
Add Go unit tests for the diff filter feature to provide fast
feedback and granular validation of filter logic.

Test coverage includes:
- diffTypeFilter struct validation (isValid method)
- Filter inclusion methods (adds, drops, modifications)
- Edge cases (empty strings, typos, case sensitivity)
- Consistency checks across all filter types
- Constant validation (values, uniqueness, lowercase)
- Invalid filter behavior verification

Tests added:
- 12 test functions
- 48+ individual test cases
- 100% coverage of diffTypeFilter struct methods

These tests complement the existing BATS integration tests
and provide unit-level regression protection.

Refs: dolthub#1430
Implement lazy table header initialization to fix a bug where
empty table headers were printed when all rows were filtered out
during data-only diffs. This occurred because BeginTable() was
called before row filtering, causing headers to print even when
no matching rows existed.

The solution introduces a lazyRowWriter that delays the BeginTable()
call until the first row is actually written. This wrapper is only
used when:
- A filter is active (added/modified/removed)
- The diff is data-only (no schema changes or table renames)

Implementation changes:
- Add shouldUseLazyHeader() helper to determine when to use lazy
initialization based on filter presence and diff type
- Add lazyRowWriter type that wraps SqlRowDiffWriter and delays
BeginTable() until first WriteRow() or WriteCombinedRow() call
- Modify diffUserTable() to skip BeginTable when using lazy writer
- Modify diffRows() to conditionally create lazyRowWriter vs normal
rowWriter based on shouldUseLazyHeader() check
- Add comprehensive unit tests for shouldUseLazyHeader logic and
lazyRowWriter behavior (5 test functions, 8+ test cases)
- Add mock implementations of diffWriter and SqlRowDiffWriter
interfaces to enable testing without database dependencies
- Fix BATS test assertions to match actual SQL output format
(lowercase type names, MODIFY COLUMN vs DROP/ADD pattern)

Test coverage:
- TestShouldUseLazyHeader: validates lazy header logic conditions
- TestLazyRowWriter_NoRowsWritten: verifies BeginTable not called
when no rows written (core lazy behavior)
- TestLazyRowWriter_RowsWritten: verifies BeginTable called on
first write
- TestLazyRowWriter_CombinedRowsWritten: tests combined row writes
- TestLazyRowWriter_InitializedOnlyOnce: ensures BeginTable called
exactly once

Refs: dolthub#1430
Extract duplicated row filter checking logic into a reusable
shouldSkipDiffType helper function.

Refs: dolthub#1430
@codeaucafe codeaucafe force-pushed the codeaucafe/1430/add-filter-opt-for-diff branch from d6e1f73 to 6cd5e0b Compare November 11, 2025 01:08
@codeaucafe codeaucafe marked this pull request as ready for review November 11, 2025 01:38
codeaucafe commented Nov 11, 2025

I believe the two failing BATS tests are unrelated to my changes. Could it be a permissions issue on my end?

Also, the one failing test in Test Go/Go tests (ubuntu-22.04) is TestMergeConcurrency, which fails on Ubuntu with a race-detected error but passes on macOS and Windows. Is this a known flaky test? If it's not, I can look into it when I get back from a short vacation this week.

@elianddb elianddb left a comment

@codeaucafe A couple of comments mainly on integrating your solution with our existing infrastructure under table_deltas.go. Let me know if you have any questions.

@codeaucafe

Thanks @elianddb. Sorry for the delay. I have not been able to look at your comments yet because I've had to work very late the last couple of days and help with some family stuff. I'll be sure to review your comments and ask any questions I have when I get time to address them this weekend.

Thank you!

Cheers,
David

Constants and Naming:
- Add FilterParam constant to flags.go following existing conventions
- Move filter type constants (DiffTypeAdded, DiffTypeModified,
  DiffTypeRemoved, DiffTypeAll) to table_deltas.go for centralization
- Update all references throughout codebase to use diff.DiffType*
  constants for consistency

Validation Consolidation:
- Consolidate duplicate validation logic into single isValid() method
- Introduce newDiffTypeFilter() constructor for proper initialization
- Remove redundant validation checks in parseDiffArgs function

Map-Based Filter Architecture:
- Refactor diffTypeFilter struct to use map[string]bool instead of
  string field for more extensible filtering
- Replace three separate includeXOrAll() methods with single
  shouldInclude() method that performs map lookup
- Update both table-level and row-level filtering to use unified
  shouldInclude() approach

Row-Level Filtering with DiffType:
- Add ChangeTypeToDiffType() helper function in table_deltas.go to
  convert row-level ChangeType enum to table-level DiffType strings
- Refactor shouldSkipDiffType() to shouldSkipRow() using the new
  conversion helper and map-based filtering
- Ensure consistent terminology between table and row filtering by
  using same DiffType string values throughout

Refs: dolthub#1430
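
As a hedged illustration of the map-based filter described in the commit message above, the sketch below reuses the names it mentions (diffTypeFilter, newDiffTypeFilter, isValid, shouldInclude, shouldSkipRow, and a lowercase variant of ChangeTypeToDiffType). The constant values, the ChangeType members, the handling of "all", and the treatment of a nil filter are all assumptions made for this example, not code copied from the PR.

```go
package main

import "fmt"

// Assumed string values; the real constants live in table_deltas.go.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRemoved  = "removed"
	DiffTypeAll      = "all"
)

// ChangeType is a hypothetical row-level enum used only in this sketch.
type ChangeType int

const (
	None ChangeType = iota
	Added
	Removed
	ModifiedOld
	ModifiedNew
)

// diffTypeFilter stores the accepted diff types as map keys.
type diffTypeFilter struct {
	filters map[string]bool
}

// newDiffTypeFilter builds a filter; "all" expands to every known type.
func newDiffTypeFilter(filterType string) *diffTypeFilter {
	if filterType == DiffTypeAll {
		return &diffTypeFilter{filters: map[string]bool{
			DiffTypeAdded: true, DiffTypeModified: true, DiffTypeRemoved: true,
		}}
	}
	return &diffTypeFilter{filters: map[string]bool{filterType: true}}
}

// isValid reports whether every key is a known diff type.
func (f *diffTypeFilter) isValid() bool {
	for k := range f.filters {
		switch k {
		case DiffTypeAdded, DiffTypeModified, DiffTypeRemoved:
		default:
			return false
		}
	}
	return len(f.filters) > 0
}

// shouldInclude is a single map lookup; a nil filter includes everything
// (an assumption of this sketch).
func (f *diffTypeFilter) shouldInclude(diffType string) bool {
	if f == nil {
		return true
	}
	return f.filters[diffType]
}

// changeTypeToDiffType converts a row-level change to a table-level string.
func changeTypeToDiffType(ct ChangeType) string {
	switch ct {
	case Added:
		return DiffTypeAdded
	case Removed:
		return DiffTypeRemoved
	case ModifiedOld, ModifiedNew:
		return DiffTypeModified
	default:
		return ""
	}
}

// shouldSkipRow never skips rows without a change type (assumed behavior).
func shouldSkipRow(f *diffTypeFilter, ct ChangeType) bool {
	if ct == None {
		return false
	}
	return !f.shouldInclude(changeTypeToDiffType(ct))
}

func main() {
	f := newDiffTypeFilter(DiffTypeAdded)
	fmt.Println(shouldSkipRow(f, Added), shouldSkipRow(f, Removed)) // prints false true
}
```

Using the same string values at both levels means table filtering and row filtering go through one lookup instead of three separate include methods.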
Refactor lazyRowWriter to use a cleaner callback-based
architecture that reduces struct complexity from 8 fields to 2
fields. This addresses PR review feedback requesting
simplification of the lazy header implementation.

Changes:
- Replace parameter storage with closure-based callback pattern
that captures BeginTable parameters from outer scope
- Eliminate separate ensureInitialized() method in favor of
inline initialization check in WriteRow() and
WriteCombinedRow()
- Remove initialized bool field by using callback presence (nil
check) to track initialization state
- Always create RowWriter upfront and only delay BeginTable
call, eliminating the need for writer factory function
- Simplify Close() method to always delegate to wrapped writer

Refs: dolthub#1430
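
The two-field shape described in the commit message above might look roughly like the following. This is a sketch under assumptions: rowSink, sliceSink, and the field names are hypothetical, and the real writer also implements WriteCombinedRow() and Close(), which are omitted here.

```go
package main

import "fmt"

// rowSink is a hypothetical one-method stand-in for the wrapped row writer.
type rowSink interface {
	WriteRow(row string) error
}

// sliceSink records rows in memory so the example needs no database.
type sliceSink struct{ rows []string }

func (s *sliceSink) WriteRow(row string) error {
	s.rows = append(s.rows, row)
	return nil
}

// lazyRowWriter keeps two fields: the wrapped writer and a pending callback.
// A non-nil onFirstWrite means the header has not been emitted yet.
type lazyRowWriter struct {
	wrapped      rowSink
	onFirstWrite func() error
}

// WriteRow runs the callback once, clears it, then delegates the row.
func (l *lazyRowWriter) WriteRow(row string) error {
	if l.onFirstWrite != nil {
		if err := l.onFirstWrite(); err != nil {
			return err
		}
		l.onFirstWrite = nil // the nil callback doubles as the "initialized" flag
	}
	return l.wrapped.WriteRow(row)
}

func main() {
	tableName := "t" // captured by the closure instead of stored in the struct
	headerCalls := 0
	sink := &sliceSink{}
	w := &lazyRowWriter{
		wrapped: sink,
		onFirstWrite: func() error {
			headerCalls++
			fmt.Println("header for", tableName)
			return nil
		},
	}
	_ = w.WriteRow("r1")
	_ = w.WriteRow("r2")
	fmt.Println(headerCalls, len(sink.rows)) // prints 1 2
}
```

The closure captures whatever the header call needs from the enclosing scope, so the struct no longer has to store those parameters or a separate initialized flag.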
Address @elianddb feedback on PR dolthub#10030 by standardizing DiffType
usage, simplifying filter logic, and fixing row-level filtering.

Changes:
- Fix row filtering bug: add early return for ChangeType.None to
prevent incorrectly skipping added/removed rows
- Standardize DiffType: replace string literals with constants
(DiffTypeAdded/Modified/Removed) in table_deltas.go and merge.go
- Simplify table filtering: use delta.DiffType directly
- Update tests: rewrite to use map-based architecture and
newDiffTypeFilter() constructor

Refs: dolthub#1430
Add DiffTypeRenamed constant and "removed" as user-friendly alias
for "dropped" filter option. This provides more granular filtering
and improves user experience with familiar terminology.

List of changes:
- Add DiffTypeRenamed constant for renamed tables
- Revert GetSummary to use DiffTypeRenamed instead of treating
renames as modified (table_deltas.go:742)
- Add "removed" as alias that maps to "dropped" internally for
user convenience
- Update filter validation to include renamed option
- Update CLI help text to document all filter options including
renamed and removed alias
- Handle renamed tables in merge stats alongside modified
- Update getDiffSummariesBetweenRefs and
getSchemaDiffSummariesBetweenRefs to handle renamed diff type
- Update all tests to use new constants and test renamed
filtering

Filter options now available: added, modified, renamed, dropped,
removed (alias for dropped).

Refs: dolthub#1430
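
One way the alias could be resolved is sketched below. The normalizeFilter function is hypothetical, and exact, case-sensitive matching is an assumption; the PR's real validation may be structured differently.

```go
package main

import "fmt"

// normalizeFilter maps a user-supplied --filter value to its canonical
// form. "removed" is accepted as an alias and resolved to "dropped".
// Matching is exact and case-sensitive in this sketch (an assumption).
func normalizeFilter(value string) (string, error) {
	switch value {
	case "added", "modified", "renamed", "dropped":
		return value, nil
	case "removed":
		return "dropped", nil
	default:
		return "", fmt.Errorf("invalid filter %q: expected added, modified, renamed, dropped, or removed", value)
	}
}

func main() {
	v, err := normalizeFilter("removed")
	fmt.Println(v, err) // prints dropped <nil>
}
```

Resolving the alias once, at argument parsing time, lets the rest of the filtering logic deal with a single canonical value.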
Add tests for the --filter=renamed option and the --filter=removed alias
that maps to dropped.

Go tests:
- Tests for filter=renamed checking all diff types
- Tests for "removed" alias mapping to dropped internally
- Verify renamed filter only includes renamed tables

BATS tests:
- Test --filter=renamed with table rename scenario
- Test --filter=dropped with table drop scenario
- Verify --filter=removed alias works same as dropped
- Verify other filters correctly exclude renamed/dropped tables

Refs: dolthub#1430
Update the short help text for the --filter parameter to document
all valid filter options (added, modified, renamed, dropped) and
mention that 'removed' is accepted as an alias for 'dropped'.

Refs: dolthub#1430
@codeaucafe codeaucafe left a comment

@elianddb I updated the code based on your comments. Let me know if I misunderstood something and implemented it wrong.

I believe the failing BATS tests are unrelated and are failing for the same reasons as the other unrelated BATS tests mentioned earlier.

Thank you!

@elianddb elianddb left a comment

@codeaucafe lgtm! I've left a singular comment on a test, but more for future reference.

mockDW := &mockDiffWriter{}
realWriter := &mockRowWriter{}

beginTableCalled := false
Contributor

This already exists in your mockDW variable, similar to how it was done in the next test function.

Contributor Author

Doh, sorry. Can't believe I missed that before submitting to review again


# filter=renamed should not show the dropped table
run dolt diff HEAD~1 --filter=renamed
[ $status -eq 0 ]
Contributor

Awesome test coverage for Bats and Go!

@elianddb elianddb left a comment

Gotta run tests first, I'll come back once they're done.

@elianddb elianddb self-requested a review December 9, 2025 23:04
elianddb commented Dec 9, 2025

I'm submitting this through my PR #10097 to clean up the release notes and have the CI run correctly. Looks like everything passed though! Thank you for the contribution @codeaucafe!

elianddb added a commit that referenced this pull request Dec 9, 2025
#10030: `--filter` contribution for `dolt diff`
@elianddb elianddb merged commit bd40e89 into dolthub:main Dec 9, 2025
29 of 47 checks passed
@codeaucafe

Thank you @elianddb !!!!!

Development

Successfully merging this pull request may close these issues.

dolt diff should support --filter option

3 participants