dolt/dolthub#1430: Add --filter option for dolt diff #10030
elianddb merged 10 commits into dolthub:main
Add Go unit tests for the diff filter feature to provide fast feedback and granular validation of filter logic.

Test coverage includes:
- diffTypeFilter struct validation (isValid method)
- Filter inclusion methods (adds, drops, modifications)
- Edge cases (empty strings, typos, case sensitivity)
- Consistency checks across all filter types
- Constant validation (values, uniqueness, lowercase)
- Invalid filter behavior verification

Tests added:
- 12 test functions
- 48+ individual test cases
- 100% coverage of diffTypeFilter struct methods

These tests complement the existing BATS integration tests and provide unit-level regression protection.

Refs: dolthub#1430
Implement lazy table header initialization to fix a bug where empty table headers were printed when all rows were filtered out during data-only diffs. This occurred because BeginTable() was called before row filtering, causing headers to print even when no matching rows existed.

The solution introduces a lazyRowWriter that delays the BeginTable() call until the first row is actually written. This wrapper is only used when:
- A filter is active (added/modified/removed)
- The diff is data-only (no schema changes or table renames)

Implementation changes:
- Add shouldUseLazyHeader() helper to determine when to use lazy initialization based on filter presence and diff type
- Add lazyRowWriter type that wraps SqlRowDiffWriter and delays BeginTable() until first WriteRow() or WriteCombinedRow() call
- Modify diffUserTable() to skip BeginTable when using lazy writer
- Modify diffRows() to conditionally create lazyRowWriter vs normal rowWriter based on shouldUseLazyHeader() check
- Add comprehensive unit tests for shouldUseLazyHeader logic and lazyRowWriter behavior (5 test functions, 8+ test cases)
- Add mock implementations of diffWriter and SqlRowDiffWriter interfaces to enable testing without database dependencies
- Fix BATS test assertions to match actual SQL output format (lowercase type names, MODIFY COLUMN vs DROP/ADD pattern)

Test coverage:
- TestShouldUseLazyHeader: validates lazy header logic conditions
- TestLazyRowWriter_NoRowsWritten: verifies BeginTable not called when no rows written (core lazy behavior)
- TestLazyRowWriter_RowsWritten: verifies BeginTable called on first write
- TestLazyRowWriter_CombinedRowsWritten: tests combined row writes
- TestLazyRowWriter_InitializedOnlyOnce: ensures BeginTable called exactly once

Refs: dolthub#1430
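The two conditions listed in this commit message can be read as a single boolean check. The following is a minimal sketch under stated assumptions: the `tableChange` type, its fields, and the function signature are hypothetical stand-ins for illustration and are not taken from the PR's code.

```go
package main

import "fmt"

// tableChange is a hypothetical stand-in for the information dolt holds
// about one table in a diff; the real type carries much more.
type tableChange struct {
	SchemaChange bool // the table's schema differs between the two revisions
	Renamed      bool // the table was renamed
}

// shouldUseLazyHeader reports whether the table header should be deferred
// until the first row is written. Assumed signature, not the one in the PR.
func shouldUseLazyHeader(filterActive bool, tc tableChange) bool {
	// A diff is "data-only" when neither the schema nor the name changed.
	dataOnly := !tc.SchemaChange && !tc.Renamed
	return filterActive && dataOnly
}

func main() {
	fmt.Println(shouldUseLazyHeader(true, tableChange{}))                   // true
	fmt.Println(shouldUseLazyHeader(false, tableChange{}))                  // false
	fmt.Println(shouldUseLazyHeader(true, tableChange{SchemaChange: true})) // false
	fmt.Println(shouldUseLazyHeader(true, tableChange{Renamed: true}))      // false
}
```

When a schema change or rename is present the header is still printed eagerly in this sketch, since there is something to show for the table even if every row is filtered out.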
Extract duplicated row filter checking logic into a reusable shouldSkipDiffType helper function. Refs: dolthub#1430
I believe the two failing bats tests are unrelated to my changes. I believe it's a perms issue on my part? Also the one test failing in
@codeaucafe A couple of comments mainly on integrating your solution with our existing infrastructure under table_deltas.go. Let me know if you have any questions.
Thanks @elianddb. Sorry for the delay. I have not been able to take a look at your comments yet due to having to work very late the last couple of days and helping with some family stuff. I'll be sure to review your comments and ask you any questions I have when I get time to work on addressing them this weekend. Thank you! Cheers,
Constants and Naming:
- Add FilterParam constant to flags.go following existing conventions
- Move filter type constants (DiffTypeAdded, DiffTypeModified, DiffTypeRemoved, DiffTypeAll) to table_deltas.go for centralization
- Update all references throughout codebase to use diff.DiffType* constants for consistency

Validation Consolidation:
- Consolidate duplicate validation logic into single isValid() method
- Introduce newDiffTypeFilter() constructor for proper initialization
- Remove redundant validation checks in parseDiffArgs function

Map-Based Filter Architecture:
- Refactor diffTypeFilter struct to use map[string]bool instead of string field for more extensible filtering
- Replace three separate includeXOrAll() methods with single shouldInclude() method that performs map lookup
- Update both table-level and row-level filtering to use unified shouldInclude() approach

Row-Level Filtering with DiffType:
- Add ChangeTypeToDiffType() helper function in table_deltas.go to convert row-level ChangeType enum to table-level DiffType strings
- Refactor shouldSkipDiffType() to shouldSkipRow() using the new conversion helper and map-based filtering
- Ensure consistent terminology between table and row filtering by using same DiffType string values throughout

Refs: dolthub#1430
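To make the map-based design concrete, here is a rough sketch of a filter built around `map[string]bool` with a single `shouldInclude()` lookup, applied to a table loop. The constant and method names come from the commit message; the string values, the treatment of an empty value, the `tableDelta` type, and `filterTables` are assumptions made for illustration only.

```go
package main

import "fmt"

// Constant names are from the commit message; the string values are assumed.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRemoved  = "dropped"
	DiffTypeAll      = "all"
)

// diffTypeFilter holds the set of diff types the user asked to see.
type diffTypeFilter struct {
	filters map[string]bool
}

// newDiffTypeFilter builds a filter from the raw --filter value.
// Treating "" and "all" as "include everything" is an assumption.
func newDiffTypeFilter(value string) *diffTypeFilter {
	if value == "" || value == DiffTypeAll {
		return &diffTypeFilter{filters: map[string]bool{
			DiffTypeAdded: true, DiffTypeModified: true, DiffTypeRemoved: true,
		}}
	}
	return &diffTypeFilter{filters: map[string]bool{value: true}}
}

// isValid reports whether every requested type is a known diff type.
func (f *diffTypeFilter) isValid() bool {
	for t := range f.filters {
		switch t {
		case DiffTypeAdded, DiffTypeModified, DiffTypeRemoved:
		default:
			return false
		}
	}
	return true
}

// shouldInclude is one map lookup, replacing per-type helper methods.
func (f *diffTypeFilter) shouldInclude(diffType string) bool {
	return f.filters[diffType]
}

// tableDelta is a hypothetical, trimmed-down table delta.
type tableDelta struct {
	Name     string
	DiffType string
}

// filterTables returns the names of tables whose diff type passes the filter.
// Non-matching tables are skipped with continue so later tables still run.
func filterTables(deltas []tableDelta, f *diffTypeFilter) []string {
	var kept []string
	for _, d := range deltas {
		if !f.shouldInclude(d.DiffType) {
			continue
		}
		kept = append(kept, d.Name)
	}
	return kept
}

func main() {
	deltas := []tableDelta{
		{"users", DiffTypeAdded},
		{"orders", DiffTypeModified},
		{"logs", DiffTypeAdded},
	}
	f := newDiffTypeFilter(DiffTypeAdded)
	fmt.Println(f.isValid(), filterTables(deltas, f)) // true [users logs]
}
```

Adding a new filter value in this shape means adding one constant and one case in `isValid()`, with no new inclusion method.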
Refactor lazyRowWriter to use a cleaner callback-based architecture that reduces struct complexity from 8 fields to 2 fields. This addresses PR review feedback requesting simplification of the lazy header implementation.

Changes:
- Replace parameter storage with closure-based callback pattern that captures BeginTable parameters from outer scope
- Eliminate separate ensureInitialized() method in favor of inline initialization check in WriteRow() and WriteCombinedRow()
- Remove initialized bool field by using callback presence (nil check) to track initialization state
- Always create RowWriter upfront and only delay BeginTable call, eliminating the need for writer factory function
- Simplify Close() method to always delegate to wrapped writer

Refs: dolthub#1430
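The two-field, callback-based shape described in this commit could look roughly like the sketch below. The `rowWriter` interface and `printWriter` type are simplified, hypothetical stand-ins (the real SqlRowDiffWriter methods take different parameters); only the pattern of a nil-checked closure is meant to carry over.

```go
package main

import "fmt"

// rowWriter is a hypothetical, reduced version of a row diff writer.
type rowWriter interface {
	WriteRow(row string) error
	WriteCombinedRow(oldRow, newRow string) error
	Close() error
}

// lazyRowWriter wraps a rowWriter and defers the header callback until the
// first row is written. A nil callback means the header was already emitted.
type lazyRowWriter struct {
	writer     rowWriter
	beginTable func() error
}

func (l *lazyRowWriter) WriteRow(row string) error {
	if l.beginTable != nil { // inline initialization check
		if err := l.beginTable(); err != nil {
			return err
		}
		l.beginTable = nil
	}
	return l.writer.WriteRow(row)
}

func (l *lazyRowWriter) WriteCombinedRow(oldRow, newRow string) error {
	if l.beginTable != nil { // same check, so either method can be first
		if err := l.beginTable(); err != nil {
			return err
		}
		l.beginTable = nil
	}
	return l.writer.WriteCombinedRow(oldRow, newRow)
}

// Close always delegates to the wrapped writer.
func (l *lazyRowWriter) Close() error { return l.writer.Close() }

// printWriter is a toy rowWriter that prints rows to stdout.
type printWriter struct{}

func (printWriter) WriteRow(row string) error {
	fmt.Println("row:", row)
	return nil
}

func (printWriter) WriteCombinedRow(oldRow, newRow string) error {
	fmt.Println("row:", oldRow, "->", newRow)
	return nil
}

func (printWriter) Close() error { return nil }

func main() {
	tableName := "users" // captured by the closure instead of stored in a field
	headers := 0
	w := &lazyRowWriter{
		writer: printWriter{},
		beginTable: func() error {
			headers++
			fmt.Println("diff header for", tableName)
			return nil
		},
	}
	_ = w.WriteRow("1,alice")
	_ = w.WriteCombinedRow("2,bob", "2,robert")
	_ = w.Close()
	fmt.Println("headers printed:", headers) // headers printed: 1
}
```

If no row is ever written, the callback never runs and `Close()` still releases the wrapped writer, which is the behavior the empty-header fix relies on.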
Address @elianddb feedback on PR dolthub#10030 by standardizing DiffType usage, simplifying filter logic, and fixing row-level filtering.

Changes:
- Fix row filtering bug: add early return for ChangeType.None to prevent incorrectly skipping added/removed rows
- Standardize DiffType: replace string literals with constants (DiffTypeAdded/Modified/Removed) in table_deltas.go and merge.go
- Simplify table filtering: use delta.DiffType directly
- Update tests: rewrite to use map-based architecture and newDiffTypeFilter() constructor

Refs: dolthub#1430
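One way to picture the early return for the "no change" case is the sketch below. It is an illustration only: the `ChangeType` enum, the string values, and both function signatures are assumptions, and the exact mechanism of the bug in the PR is not shown in this thread. In this sketch, without the early return a `None` row would map to an empty string and be dropped by any active filter.

```go
package main

import "fmt"

// ChangeType is an illustrative row-level change enum, declared locally
// rather than imported from dolt's diff package.
type ChangeType int

const (
	None ChangeType = iota
	Added
	Removed
	ModifiedOld
	ModifiedNew
)

// Constant names follow the commit message; the string values are assumed.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRemoved  = "dropped"
)

// changeTypeToDiffType maps a row-level change onto the filter vocabulary.
func changeTypeToDiffType(ct ChangeType) string {
	switch ct {
	case Added:
		return DiffTypeAdded
	case Removed:
		return DiffTypeRemoved
	case ModifiedOld, ModifiedNew:
		return DiffTypeModified
	default:
		return ""
	}
}

// shouldSkipRow reports whether a row should be left out of the output.
// include is the set of selected diff types; nil means no filter is active.
func shouldSkipRow(include map[string]bool, ct ChangeType) bool {
	if include == nil {
		return false
	}
	// Early return: a row without a change type is never filtered here.
	if ct == None {
		return false
	}
	return !include[changeTypeToDiffType(ct)]
}

func main() {
	onlyAdded := map[string]bool{DiffTypeAdded: true}
	fmt.Println(shouldSkipRow(onlyAdded, Added))       // false (kept)
	fmt.Println(shouldSkipRow(onlyAdded, ModifiedNew)) // true (filtered out)
	fmt.Println(shouldSkipRow(onlyAdded, None))        // false (early return)
	fmt.Println(shouldSkipRow(nil, Removed))           // false (no filter)
}
```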
Add DiffTypeRenamed constant and "removed" as user-friendly alias for "dropped" filter option. This provides more granular filtering and improves user experience with familiar terminology.

List of changes:
- Add DiffTypeRenamed constant for renamed tables
- Revert GetSummary to use DiffTypeRenamed instead of treating renames as modified (table_deltas.go:742)
- Add "removed" as alias that maps to "dropped" internally for user convenience
- Update filter validation to include renamed option
- Update CLI help text to document all filter options including renamed and removed alias
- Handle renamed tables in merge stats alongside modified
- Update getDiffSummariesBetweenRefs and getSchemaDiffSummariesBetweenRefs to handle renamed diff type
- Update all tests to use new constants and test renamed filtering

Filter options now available: added, modified, renamed, dropped, removed (alias for dropped).

Refs: dolthub#1430
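The alias handling described here amounts to normalizing the user's value before validating it. A minimal sketch, assuming a hypothetical `normalizeFilter` helper and a `filterAliases` table that are not part of the PR, and assuming the string value behind `DiffTypeRemoved` is "dropped":

```go
package main

import (
	"fmt"
	"strings"
)

// Constant names follow the commit messages; the string values are assumed.
const (
	DiffTypeAdded    = "added"
	DiffTypeModified = "modified"
	DiffTypeRenamed  = "renamed"
	DiffTypeRemoved  = "dropped"
)

// filterAliases maps user-facing aliases onto canonical filter values.
// (Hypothetical table for this sketch.)
var filterAliases = map[string]string{
	"removed": DiffTypeRemoved,
}

// validFilters lists the canonical values in the order used for error text.
var validFilters = []string{DiffTypeAdded, DiffTypeModified, DiffTypeRenamed, DiffTypeRemoved}

// normalizeFilter resolves aliases and rejects unknown values.
// Matching is case sensitive in this sketch.
func normalizeFilter(value string) (string, error) {
	if canonical, ok := filterAliases[value]; ok {
		value = canonical
	}
	for _, v := range validFilters {
		if value == v {
			return value, nil
		}
	}
	return "", fmt.Errorf("invalid filter %q: expected one of %s",
		value, strings.Join(validFilters, ", "))
}

func main() {
	for _, in := range []string{"removed", "dropped", "renamed", "delted"} {
		out, err := normalizeFilter(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, out)
	}
}
```

Resolving the alias once at parse time keeps the rest of the filtering code working with a single canonical value.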
Add tests for the --filter=renamed option and the --filter=removed alias that maps to dropped.

Go tests:
- Tests for filter=renamed checking all diff types
- Tests for "removed" alias mapping to dropped internally
- Verify renamed filter only includes renamed tables

BATS tests:
- Test --filter=renamed with table rename scenario
- Test --filter=dropped with table drop scenario
- Verify --filter=removed alias works same as dropped
- Verify other filters correctly exclude renamed/dropped tables

Refs: dolthub#1430
…e/1430/add-filter-opt-for-diff
Update the short help text for the --filter parameter to document all valid filter options (added, modified, renamed, dropped) and mention that 'removed' is accepted as an alias for 'dropped'. Refs: dolthub#1430
@elianddb I updated the code based on your comments. Let me know if I misunderstood something and implemented it wrong.
I believe the failing bats tests are unrelated and are failing for the same reasons those other unrelated bats tests failed.
Thank you!
@codeaucafe lgtm! I've left a singular comment on a test, but more for future reference.
mockDW := &mockDiffWriter{}
realWriter := &mockRowWriter{}

beginTableCalled := false
This already exists in your mockDW variable, similar to how it was done in the next test function.
Doh, sorry. Can't believe I missed that before submitting to review again
# filter=renamed should not show the dropped table
run dolt diff HEAD~1 --filter=renamed
[ $status -eq 0 ]
Awesome test coverage for Bats and Go!
I'm submitting on my PR #10097 to clean up the release notes and have the CI run correctly. Looks like everything passed though! Thank you for the contribution @codeaucafe!
#10030: `--filter` contribution for `dolt diff`
Thank you @elianddb !!!!!
Summary
TL;DR
There was no action on the original #3499 for issue #1430; the PR was closed ~3 years ago. This PR fixes the open PR comments and updates the implementation details a bit for the RowWriting of filtered rows
Description
Adds a `--filter` option to `dolt diff` for filtering by change type (`added`, `modified`, `renamed`, `dropped`, or `removed` as an alias for `dropped`); please note I added `renamed` and the `removed` alias to this filter list. Completes PR #3499 by fixing the issues that blocked it from merging, plus fixes an additional bug where empty table headers were printed when all rows were filtered out.
Users reviewing large diffs often need to focus on specific change types - deletes may need extra scrutiny while inserts are routine. With diffs spanning thousands of rows across multiple tables, grep isn't enough since updates show both additions and deletions.
Implementation
Core changes:
- `diffTypeFilter` struct with validation methods in `diff.go`
- `--filter` argument in `arg_parser_helpers.go`
- `delta.SchemaChange` and `delta.IsRename()` for modification detection
- `diff.ChangeType` in `writeDiffResults()`
- `continue` instead of `return nil` so all tables get processed

Empty header bug fix:
- `lazyRowWriter` that delays `BeginTable()` until first row write
- `shouldUseLazyHeader()` helper to determine when lazy initialization is needed

Tests:
- `go/cmd/dolt/commands/diff_filter_test.go` - comprehensive unit tests:
  - `TestShouldUseLazyHeader` - validates lazy header logic (8 test cases)
  - `TestLazyRowWriter_*` - tests lazy initialization behavior (4 test functions)
  - Mock implementations of `diffWriter` and `SqlRowDiffWriter` interfaces

What Was Broken in PR #3499
The original PR had three blocking issues from code review comments that I fixed here:
- No row-level filtering - only filtered tables, not individual rows
  - `writeDiffResults()`
- Incomplete modification detection - didn't check for schema changes properly
  - `delta.SchemaChange || delta.IsRename()` as requested
- Loop exits early - `return nil` prevented processing remaining tables
  - `continue` instead

Additional Bug Fixed (introduced by my initial filter changes; not pre-existing)
Empty table headers when filtering:
When using `--filter` with data-only changes, if all rows were filtered out, dolt would still print the table diff header with no content. This occurred because `BeginTable()` was called before row filtering. The `lazyRowWriter` pattern delays header output until we confirm at least one row will be written.

Usage/Examples
Fixes #1430