dolthub/dolt#10030: --filter contribution for dolt diff #10097

Merged
elianddb merged 11 commits into main from codeaucafe/1430
Dec 9, 2025

@elianddb elianddb commented Nov 18, 2025

Author @codeaucafe
Add --filter option to dolt diff, enabling filtering by specific change types and fixing issues from the earlier stalled PR (#3499).

Users reviewing large diffs often need to focus on specific change types: deletes may need extra scrutiny while inserts are routine. With diffs spanning thousands of rows across multiple tables, grep isn't enough, since updates show both additions and deletions.

dolt diff --filter=added      # new tables/rows
dolt diff --filter=modified   # schema changes, row updates
dolt diff --filter=renamed    # renamed tables
dolt diff --filter=dropped    # dropped tables, deleted rows
dolt diff --filter=removed    # alias for dropped
dolt diff HEAD~1 --filter=dropped -r sql

Close #10030
Fix #1430

Add Go unit tests for the diff filter feature to provide fast
feedback and granular validation of filter logic.

Test coverage includes:
- diffTypeFilter struct validation (isValid method)
- Filter inclusion methods (adds, drops, modifications)
- Edge cases (empty strings, typos, case sensitivity)
- Consistency checks across all filter types
- Constant validation (values, uniqueness, lowercase)
- Invalid filter behavior verification

Tests added:
- 12 test functions
- 48+ individual test cases
- 100% coverage of diffTypeFilter struct methods

These tests complement the existing BATS integration tests
and provide unit-level regression protection.

Refs: #1430

Implement lazy table header initialization to fix a bug where
empty table headers were printed when all rows were filtered out
during data-only diffs. This occurred because BeginTable() was
called before row filtering, causing headers to print even when
no matching rows existed.

The solution introduces a lazyRowWriter that delays the BeginTable()
call until the first row is actually written. This wrapper is only
used when:
- A filter is active (added/modified/removed)
- The diff is data-only (no schema changes or table renames)

Implementation changes:
- Add shouldUseLazyHeader() helper to determine when to use lazy
initialization based on filter presence and diff type
- Add lazyRowWriter type that wraps SqlRowDiffWriter and delays
BeginTable() until first WriteRow() or WriteCombinedRow() call
- Modify diffUserTable() to skip BeginTable when using lazy writer
- Modify diffRows() to conditionally create lazyRowWriter vs normal
rowWriter based on shouldUseLazyHeader() check
- Add comprehensive unit tests for shouldUseLazyHeader logic and
lazyRowWriter behavior (5 test functions, 8+ test cases)
- Add mock implementations of diffWriter and SqlRowDiffWriter
interfaces to enable testing without database dependencies
- Fix BATS test assertions to match actual SQL output format
(lowercase type names, MODIFY COLUMN vs DROP/ADD pattern)

Test coverage:
- TestShouldUseLazyHeader: validates lazy header logic conditions
- TestLazyRowWriter_NoRowsWritten: verifies BeginTable not called
when no rows written (core lazy behavior)
- TestLazyRowWriter_RowsWritten: verifies BeginTable called on
first write
- TestLazyRowWriter_CombinedRowsWritten: tests combined row writes
- TestLazyRowWriter_InitializedOnlyOnce: ensures BeginTable called
exactly once

Refs: #1430
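
The lazy-header condition from the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `diffSummary` struct, its fields, and the parameter list are assumptions made for the example. Only the name `shouldUseLazyHeader` and the two conditions (a filter is active, the diff is data-only) come from the commit message.

```go
package main

import "fmt"

// diffSummary is a hypothetical stand-in for whatever the real code
// inspects to decide whether a table diff is data-only.
type diffSummary struct {
	schemaChanged bool
	tableRenamed  bool
}

// shouldUseLazyHeader reports whether the table header should be deferred
// until the first row is written. The signature is an assumption.
func shouldUseLazyHeader(filterActive bool, s diffSummary) bool {
	// Data-only means no schema changes and no table rename.
	dataOnly := !s.schemaChanged && !s.tableRenamed
	return filterActive && dataOnly
}

func main() {
	fmt.Println(shouldUseLazyHeader(true, diffSummary{}))                    // true
	fmt.Println(shouldUseLazyHeader(true, diffSummary{schemaChanged: true})) // false
	fmt.Println(shouldUseLazyHeader(false, diffSummary{}))                   // false
}
```
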
Extract duplicated row filter checking logic into a reusable
shouldSkipDiffType helper function.

Refs: #1430

@coffeegoddd

@elianddb DOLT

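| comparing_percentages    |
|--------------------------|
| 100.000000 to 100.000000 |

| version | result | total   |
|---------|--------|---------|
| 6cd5e0b | ok     | 5937471 |

| version | total_tests |
|---------|-------------|
| 6cd5e0b | 5937471     |

| correctness_percentage |
|------------------------|
| 100.0                  |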

codeaucafe and others added 8 commits November 26, 2025 18:42
Constants and Naming:
- Add FilterParam constant to flags.go following existing conventions
- Move filter type constants (DiffTypeAdded, DiffTypeModified,
  DiffTypeRemoved, DiffTypeAll) to table_deltas.go for centralization
- Update all references throughout codebase to use diff.DiffType*
  constants for consistency

Validation Consolidation:
- Consolidate duplicate validation logic into single isValid() method
- Introduce newDiffTypeFilter() constructor for proper initialization
- Remove redundant validation checks in parseDiffArgs function

Map-Based Filter Architecture:
- Refactor diffTypeFilter struct to use map[string]bool instead of
  string field for more extensible filtering
- Replace three separate includeXOrAll() methods with single
  shouldInclude() method that performs map lookup
- Update both table-level and row-level filtering to use unified
  shouldInclude() approach

Row-Level Filtering with DiffType:
- Add ChangeTypeToDiffType() helper function in table_deltas.go to
  convert row-level ChangeType enum to table-level DiffType strings
- Refactor shouldSkipDiffType() to shouldSkipRow() using the new
  conversion helper and map-based filtering
- Ensure consistent terminology between table and row filtering by
  using same DiffType string values throughout

Refs: #1430
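
A map-based filter along the lines described in this commit might look like the sketch below. The type and method names (`diffTypeFilter`, `newDiffTypeFilter`, `isValid`, `shouldInclude`) and the constant names are taken from the commit message; the string values of the constants, the field name `filters`, and the method bodies are assumptions for illustration and may differ from the merged code.

```go
package main

import "fmt"

// Constant names follow the commit message; the values are assumed.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRemoved  = "removed"
	DiffTypeAll      = "all"
)

// diffTypeFilter holds the set of diff types that should be shown.
type diffTypeFilter struct {
	filters map[string]bool
}

// newDiffTypeFilter builds a filter for a single requested type.
func newDiffTypeFilter(filterType string) *diffTypeFilter {
	return &diffTypeFilter{filters: map[string]bool{filterType: true}}
}

// isValid reports whether every requested type is a known diff type.
func (f *diffTypeFilter) isValid() bool {
	for t := range f.filters {
		switch t {
		case DiffTypeAdded, DiffTypeModified, DiffTypeRemoved, DiffTypeAll:
		default:
			return false
		}
	}
	return true
}

// shouldInclude is the single lookup that replaces the per-type methods.
// A nil filter or the "all" type includes everything.
func (f *diffTypeFilter) shouldInclude(diffType string) bool {
	if f == nil || f.filters[DiffTypeAll] {
		return true
	}
	return f.filters[diffType]
}

func main() {
	f := newDiffTypeFilter(DiffTypeAdded)
	fmt.Println(f.isValid(), f.shouldInclude(DiffTypeAdded), f.shouldInclude(DiffTypeRemoved))
	fmt.Println(newDiffTypeFilter("typo").isValid())
}
```

Because both table-level and row-level checks go through the same lookup, supporting another diff type in this sketch only requires a new constant and one more case in `isValid`.
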
Refactor lazyRowWriter to use a cleaner callback-based
architecture that reduces struct complexity from 8 fields to 2
fields. This addresses PR review feedback requesting
simplification of the lazy header implementation.

Changes:
- Replace parameter storage with closure-based callback pattern
that captures BeginTable parameters from outer scope
- Eliminate separate ensureInitialized() method in favor of
inline initialization check in WriteRow() and
WriteCombinedRow()
- Remove initialized bool field by using callback presence (nil
check) to track initialization state
- Always create RowWriter upfront and only delay BeginTable
call, eliminating the need for writer factory function
- Simplify Close() method to always delegate to wrapped writer

Refs: #1430
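
The callback-based shape described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `rowWriter` interface is a reduced, hypothetical stand-in for `SqlRowDiffWriter` (rows are plain strings here), and `recorder` and `newLazyRowWriter` exist only for the example. The two-field struct, the nil check on the callback, and the inline initialization in `WriteRow()` and `WriteCombinedRow()` follow the commit message.

```go
package main

import "fmt"

// rowWriter is a simplified, hypothetical stand-in for SqlRowDiffWriter.
type rowWriter interface {
	WriteRow(row string) error
	WriteCombinedRow(oldRow, newRow string) error
	Close() error
}

// lazyRowWriter delays the BeginTable call until the first row is written.
// beginTable is set to nil after it runs, so its presence tracks state.
type lazyRowWriter struct {
	wrapped    rowWriter
	beginTable func() error
}

func newLazyRowWriter(w rowWriter, beginTable func() error) *lazyRowWriter {
	return &lazyRowWriter{wrapped: w, beginTable: beginTable}
}

func (l *lazyRowWriter) WriteRow(row string) error {
	if l.beginTable != nil { // inline initialization check
		if err := l.beginTable(); err != nil {
			return err
		}
		l.beginTable = nil
	}
	return l.wrapped.WriteRow(row)
}

func (l *lazyRowWriter) WriteCombinedRow(oldRow, newRow string) error {
	if l.beginTable != nil {
		if err := l.beginTable(); err != nil {
			return err
		}
		l.beginTable = nil
	}
	return l.wrapped.WriteCombinedRow(oldRow, newRow)
}

// Close always delegates to the wrapped writer.
func (l *lazyRowWriter) Close() error { return l.wrapped.Close() }

// recorder logs calls so the ordering can be inspected.
type recorder struct{ events []string }

func (r *recorder) WriteRow(row string) error {
	r.events = append(r.events, "row:"+row)
	return nil
}

func (r *recorder) WriteCombinedRow(oldRow, newRow string) error {
	r.events = append(r.events, "combined:"+oldRow+">"+newRow)
	return nil
}

func (r *recorder) Close() error {
	r.events = append(r.events, "close")
	return nil
}

func main() {
	rec := &recorder{}
	// The closure captures what BeginTable needs from the outer scope.
	w := newLazyRowWriter(rec, func() error {
		rec.events = append(rec.events, "begin")
		return nil
	})
	w.WriteRow("a")
	w.WriteRow("b")
	w.Close()
	fmt.Println(rec.events) // [begin row:a row:b close]

	empty := &recorder{}
	w2 := newLazyRowWriter(empty, func() error {
		empty.events = append(empty.events, "begin")
		return nil
	})
	w2.Close()
	fmt.Println(empty.events) // [close]
}
```
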
Address @elianddb feedback on PR #10030 by standardizing DiffType
usage, simplifying filter logic, and fixing row-level filtering.

Changes:
- Fix row filtering bug: add early return for ChangeType.None to
prevent incorrectly skipping added/removed rows
- Standardize DiffType: replace string literals with constants
(DiffTypeAdded/Modified/Removed) in table_deltas.go and merge.go
- Simplify table filtering: use delta.DiffType directly
- Update tests: rewrite to use map-based architecture and
newDiffTypeFilter() constructor

Refs: #1430
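
The row-level check with the early return might be sketched like this. The local `ChangeType` enum and the plain `map[string]bool` filter are stand-ins defined for the example, and the constant values are assumed. In particular, treating `None` as "do not skip" is an assumption about what the early return does; the commit message only says that an early return was added for that case.

```go
package main

import "fmt"

// ChangeType is a local stand-in for the row-level change enum.
type ChangeType int

const (
	None ChangeType = iota
	Added
	Removed
	ModifiedOld
	ModifiedNew
)

// Table-level diff type strings; values are assumed for illustration.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRemoved  = "removed"
)

// ChangeTypeToDiffType maps a row-level change to the table-level string.
func ChangeTypeToDiffType(ct ChangeType) string {
	switch ct {
	case Added:
		return DiffTypeAdded
	case Removed:
		return DiffTypeRemoved
	case ModifiedOld, ModifiedNew:
		return DiffTypeModified
	default:
		return ""
	}
}

// shouldSkipRow reports whether a row is filtered out. A nil filter
// means no filtering. None rows are passed through (assumption).
func shouldSkipRow(filter map[string]bool, ct ChangeType) bool {
	if filter == nil || ct == None {
		return false
	}
	return !filter[ChangeTypeToDiffType(ct)]
}

func main() {
	onlyAdded := map[string]bool{DiffTypeAdded: true}
	fmt.Println(shouldSkipRow(onlyAdded, Added))       // false
	fmt.Println(shouldSkipRow(onlyAdded, ModifiedNew)) // true
	fmt.Println(shouldSkipRow(onlyAdded, None))        // false
}
```
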
Add DiffTypeRenamed constant and "removed" as user-friendly alias
for "dropped" filter option. This provides more granular filtering
and improves user experience with familiar terminology.

List of changes:
- Add DiffTypeRenamed constant for renamed tables
- Revert GetSummary to use DiffTypeRenamed instead of treating
renames as modified (table_deltas.go:742)
- Add "removed" as alias that maps to "dropped" internally for
user convenience
- Update filter validation to include renamed option
- Update CLI help text to document all filter options including
renamed and removed alias
- Handle renamed tables in merge stats alongside modified
- Update getDiffSummariesBetweenRefs and
getSchemaDiffSummariesBetweenRefs to handle renamed diff type
- Update all tests to use new constants and test renamed
filtering

Filter options now available: added, modified, renamed, dropped,
removed (alias for dropped).

Refs: #1430
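
One way to resolve the `removed` alias before validation is sketched below. The helper name `normalizeFilter`, the error text, and the case-sensitive matching are assumptions for illustration; the PR only states that `removed` maps to `dropped` internally and that the valid options are added, modified, renamed, and dropped.

```go
package main

import "fmt"

// normalizeFilter is a hypothetical helper: it maps the user-facing
// "removed" alias to "dropped" and rejects unknown values.
func normalizeFilter(value string) (string, error) {
	switch value {
	case "removed":
		return "dropped", nil
	case "added", "modified", "renamed", "dropped":
		return value, nil
	default:
		return "", fmt.Errorf("invalid filter %q: expected added, modified, renamed, or dropped", value)
	}
}

func main() {
	for _, v := range []string{"added", "removed", "bogus"} {
		got, err := normalizeFilter(v)
		if err != nil {
			fmt.Println(v, "->", "error:", err)
			continue
		}
		fmt.Println(v, "->", got)
	}
}
```
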
Add tests for the --filter=renamed option and the --filter=removed alias
that maps to dropped.

Go tests:
- Tests for filter=renamed checking all diff types
- Tests for "removed" alias mapping to dropped internally
- Verify renamed filter only includes renamed tables

BATS tests:
- Test --filter=renamed with table rename scenario
- Test --filter=dropped with table drop scenario
- Verify --filter=removed alias works same as dropped
- Verify other filters correctly exclude renamed/dropped tables

Refs: #1430

Update the short help text for the --filter parameter to document
all valid filter options (added, modified, renamed, dropped) and
mention that 'removed' is accepted as an alias for 'dropped'.

Refs: #1430

@coffeegoddd

@elianddb DOLT

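| comparing_percentages    |
|--------------------------|
| 100.000000 to 100.000000 |

| version | result | total   |
|---------|--------|---------|
| bd40e89 | ok     | 5937471 |

| version | total_tests |
|---------|-------------|
| bd40e89 | 5937471     |

| correctness_percentage |
|------------------------|
| 100.0                  |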

@coffeegoddd

@coffeegoddd DOLT

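| comparing_percentages    |
|--------------------------|
| 100.000000 to 100.000000 |

| version | result | total   |
|---------|--------|---------|
| 904f8e2 | ok     | 5937471 |

| version | total_tests |
|---------|-------------|
| 904f8e2 | 5937471     |

| correctness_percentage |
|------------------------|
| 100.0                  |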

@elianddb elianddb marked this pull request as ready for review December 9, 2025 23:00
@elianddb elianddb changed the title from "dolthub/dolt#10030: Run integration tests" to "dolthub/dolt#10030: Merge --filter contribution for dolt diff" Dec 9, 2025
@elianddb elianddb changed the title from "dolthub/dolt#10030: Merge --filter contribution for dolt diff" to "dolthub/dolt#10030: --filter contribution for dolt diff" Dec 9, 2025
@elianddb elianddb merged commit 7cff947 into main Dec 9, 2025
23 checks passed
@github-actions

@coffeegoddd DOLT

| test_name       | detail       | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|-----------------|--------------|---------|--------|------------|----------|----------|
| batching        | LOAD DATA    | 10000   | 1      | 0.06       | 1.67     |          |
| batching        | batch sql    | 10000   | 1      | 0.09       | 1.67     |          |
| batching        | by line sql  | 10000   | 1      | 0.1        | 1.4      |          |
| blob            | 1 blob       | 200000  | 1      | 0.92       | 4.05     | 4.21     |
| blob            | 2 blobs      | 200000  | 1      | 0.9        | 4.52     | 4.38     |
| blob            | no blob      | 200000  | 1      | 0.88       | 2.81     | 2.7      |
| col type        | datetime     | 200000  | 1      | 0.79       | 2.75     | 2.78     |
| col type        | varchar      | 200000  | 1      | 0.65       | 3.82     | 3.75     |
| config width    | 2 cols       | 200000  | 1      | 0.73       | 2.89     | 2.78     |
| config width    | 32 cols      | 200000  | 1      | 1.87       | 2.84     | 2.66     |
| config width    | 8 cols       | 200000  | 1      | 0.95       | 2.87     | 2.68     |
| pk type         | float        | 200000  | 1      | 1          | 2.33     | 2.11     |
| pk type         | int          | 200000  | 1      | 0.77       | 2.74     | 2.65     |
| pk type         | varchar      | 200000  | 1      | 1.46       | 1.98     | 1.68     |
| row count       | 1.6mm        | 1600000 | 1      | 5.67       | 3.04     | 3.02     |
| row count       | 400k         | 400000  | 1      | 1.41       | 2.99     | 2.9      |
| row count       | 800k         | 800000  | 1      | 2.94       | 2.9      | 2.86     |
| secondary index | four index   | 200000  | 1      | 3.55       | 1.37     | 1.16     |
| secondary index | no secondary | 200000  | 1      | 0.88       | 2.82     | 2.72     |
| secondary index | one index    | 200000  | 1      | 1.13       | 2.58     | 2.34     |
| secondary index | two index    | 200000  | 1      | 2.02       | 1.77     | 1.72     |
| sorting         | shuffled 1mm | 1000000 | 0      | 5.08       | 2.89     | 2.62     |
| sorting         | sorted 1mm   | 1000000 | 1      | 5.16       | 2.91     | 2.57     |

@github-actions

@coffeegoddd DOLT

| name                                | detail       | mean_mult |
|-------------------------------------|--------------|-----------|
| dolt_blame_basic                    | system table | 1.23      |
| dolt_blame_commit_filter            | system table | 2.62      |
| dolt_commit_ancestors_commit_filter | system table | 0.63      |
| dolt_commits_commit_filter          | system table | 1.05      |
| dolt_diff_log_join_from_commit      | system table | 2.83      |
| dolt_diff_log_join_to_commit        | system table | 2.91      |
| dolt_diff_table_from_commit_filter  | system table | 1.22      |
| dolt_diff_table_to_commit_filter    | system table | 1.22      |
| dolt_diffs_commit_filter            | system table | 1.1       |
| dolt_history_commit_filter          | system table | 1.4       |
| dolt_log_commit_filter              | system table | 1         |

@github-actions

@coffeegoddd DOLT

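| name                 | add_cnt | delete_cnt | update_cnt | latency |
|----------------------|---------|------------|------------|---------|
| adds_only            | 60000   | 0          | 0          | 0.64    |
| adds_updates_deletes | 60000   | 60000      | 60000      | 3.26    |
| deletes_only         | 0       | 60000      | 0          | 1.56    |
| updates_only         | 0       | 0          | 60000      | 2.01    |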

@Hydrocharged Hydrocharged deleted the codeaucafe/1430 branch December 15, 2025 06:48
Development

Successfully merging this pull request may close these issues.

dolt diff should support --filter option