dropDuplicateRows() returns unexpected result – potential bug? #1296
-
Hi @Shubham-at-LF, I can confirm this is a bug. It is not caused by missing equals() and hashCode() in Row, but by the use of the row iterator, which returns the same Row reference on every step and only increments its rowIndex (rows are views). @benmccann, are you still maintaining the project? Should I make a PR to revert to the version that does not use a set?
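To illustrate the mechanism for anyone reading later, here is a minimal sketch in plain Java. It does not use Tablesaw: RowView, rows(), retainReferences() and copyContent() are hypothetical names made up for this example, and the row (1, N10) is illustrative data. It only shows what happens when an iterator hands out one reused view object and the caller keeps the references instead of copying the values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SharedRowViewDemo {

    /** Hypothetical row view: reads shared column arrays at a movable index. */
    static final class RowView {
        private final int[] partId;
        private final String[] nsequence;
        private int index = -1;

        RowView(int[] partId, String[] nsequence) {
            this.partId = partId;
            this.nsequence = nsequence;
        }

        /** Content of the row the view currently points at. */
        String content() {
            return partId[index] + "," + nsequence[index];
        }
    }

    /** Iterator that reuses ONE RowView and only advances its index. */
    static Iterator<RowView> rows(int[] partId, String[] nsequence) {
        RowView shared = new RowView(partId, nsequence);
        return new Iterator<RowView>() {
            @Override
            public boolean hasNext() {
                return shared.index + 1 < partId.length;
            }

            @Override
            public RowView next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.index++;
                return shared; // same reference every time
            }
        };
    }

    /** Keeps the returned references and reads them only after iteration ends. */
    public static List<String> retainReferences(int[] partId, String[] nsequence) {
        List<RowView> kept = new ArrayList<>();
        rows(partId, nsequence).forEachRemaining(kept::add);
        List<String> out = new ArrayList<>();
        for (RowView r : kept) {
            out.add(r.content());
        }
        return out;
    }

    /** Copies the content out of the view at each step. */
    public static List<String> copyContent(int[] partId, String[] nsequence) {
        List<String> out = new ArrayList<>();
        rows(partId, nsequence).forEachRemaining(r -> out.add(r.content()));
        return out;
    }

    public static void main(String[] args) {
        int[] partId = {1, 2, 2};
        String[] nsequence = {"N10", "N40", "N50"};
        System.out.println(retainReferences(partId, nsequence)); // [2,N50, 2,N50, 2,N50]
        System.out.println(copyContent(partId, nsequence));      // [1,N10, 2,N40, 2,N50]
    }
}
```

Every retained reference ends up describing the last row, because there is only one object. Any collection that stores such views, a set included, has the same problem unless the values are copied out first.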
-
Hi @benmccann. Thank you for cleaning up the code and merging the PR.
-
Hi everyone,
I'm currently working on a Spring Boot project using Tablesaw for data processing of large DataFrames. While using the dropDuplicateRows() method, I noticed some unexpected behavior.
Here’s a minimal reproducible example:
java
Table testUnique = testTable.selectColumns("part_id", "nsequence").dropDuplicateRows();
logger.info("Test unique pairs: {}", testUnique.print());
Expected Output:
Actual Output:
As you can see, rows like (2, N40) and (2, N50) are missing after calling dropDuplicateRows(). This was surprising because the combination of part_id and nsequence is unique across all rows.
Investigation
I looked at the source for dropDuplicateRows():
java
It looks like the uniqueness is determined based on the Row object, but I suspect the Row class doesn't override equals() and hashCode() in a way that compares actual content — possibly relying on object identity or internal state like row index.
Environment
Tablesaw version: Using latest via Maven:
xml
Java: OpenJDK 21
Framework: Spring Boot
Request
Could someone please confirm:
Is this expected behavior?
Should Row properly override equals() and hashCode()?
Is there a workaround for getting proper unique row behavior based on content?
I'm new to Java, so if this is expected and there's a better way to achieve row de-duplication based on column values, I’d appreciate any guidance.
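For reference, the kind of content-based de-duplication I have in mind can be sketched in plain Java as below. This is only an illustration under my own assumptions: it works on column-major lists rather than on a Tablesaw Table, and ContentDedup and firstOccurrences() are hypothetical names, not part of any library. It builds an immutable-by-convention key from the values of each row and returns the indices of the rows to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContentDedup {

    /**
     * Returns the indices of the first occurrence of each distinct row, in order.
     * The input is column-major: columns.get(c).get(r) is the value at row r, column c.
     */
    public static List<Integer> firstOccurrences(List<List<?>> columns) {
        int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
        Set<List<Object>> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            // Copy the values into a fresh key so nothing is shared between rows.
            List<Object> key = new ArrayList<>(columns.size());
            for (List<?> column : columns) {
                key.add(column.get(r));
            }
            if (seen.add(key)) {
                keep.add(r);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<List<?>> columns = new ArrayList<>();
        columns.add(Arrays.asList(1, 2, 2, 1));                 // part_id (illustrative values)
        columns.add(Arrays.asList("N10", "N40", "N50", "N10")); // nsequence (illustrative values)
        System.out.println(firstOccurrences(columns)); // [0, 1, 2]
    }
}
```

The returned indices could then be used to select the matching rows from the original table. List keys compare by content, so two rows are treated as duplicates only when every column value is equal.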
Thank you for your help!