Fix flaky#1

Draft
nelsonKat0522 wants to merge 5 commits into master from fix-flaky

nelsonKat0522 commented Oct 28, 2025

*What is the purpose of this PR:
This pull request fixes the flaky test testJaccard in MinHashModelDataConverterTest.java.
*Why the test fails:
The test fails because the comparator used by the priorityQueue in findNeighbor() of NearestNeighborModelData.java does not define an order for items with equal metric scores, so the result is non-deterministic. For example, with priorityQueue = [(0.6666666666666666,dict3), (0.5,dict6), (0.5,dict2), (0.75,dict1)], dict6 and dict2 share the metric score 0.5. When the program selects the top 3 items by metric, the third item can be either dict6 or dict2, so the output varies between runs.
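A minimal sketch of the tie, assuming a comparator that looks only at the metric score. The class Scored and the comparator SCORE_ONLY are hypothetical stand-ins for illustration, not the actual types in NearestNeighborModelData.java. The Javadoc for java.util.PriorityQueue states that ties are broken arbitrarily, so a comparator that returns 0 for two distinct items leaves the choice between them unspecified.

```java
import java.util.Comparator;

public class ScoreOnlyTie {
    // Hypothetical (metric, id) pair; not Alink's actual type.
    static final class Scored {
        final double score;
        final String id;

        Scored(double score, String id) {
            this.score = score;
            this.id = id;
        }
    }

    // Assumed shape of the problematic comparator: it compares scores only.
    static final Comparator<Scored> SCORE_ONLY =
        Comparator.comparingDouble((Scored s) -> s.score);

    public static void main(String[] args) {
        Scored dict6 = new Scored(0.5, "dict6");
        Scored dict2 = new Scored(0.5, "dict2");
        // Both items have score 0.5, so the comparator treats them as equal.
        System.out.println(SCORE_ONLY.compare(dict6, dict2)); // prints 0
    }
}
```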
*How to reproduce the test failure:
Run the following commands in bash. After cloning, all Maven commands run from the top-level directory of the project.
Clone the repository:
cd ~
git clone https://github.com/alibaba/Alink
cd Alink
Clean and build the core module without running tests:
mvn clean install -DskipTests -pl core -am
Run the test normally:
mvn -pl core test -Dtest=com.alibaba.alink.operator.common.similarity.dataConverter.MinHashModelDataConverterTest#testJaccard
Run the test under NonDex to detect the flakiness:
mvn -pl core edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=com.alibaba.alink.operator.common.similarity.dataConverter.MinHashModelDataConverterTest#testJaccard
*Expected results:
The test should pass without any failures when run under NonDex.
*Actual results:
testJaccard(com.alibaba.alink.operator.common.similarity.dataConverter.MinHashModelDataConverterTest) finished, time taken 1.148136296 s
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.189 s <<< FAILURE! -- in com.alibaba.alink.operator.common.similarity.dataConverter.MinHashModelDataConverterTest
[ERROR] com.alibaba.alink.operator.common.similarity.dataConverter.MinHashModelDataConverterTest.testJaccard -- Time elapsed: 1.163 s <<< FAILURE!
arrays first differed at element [2]; expected:<dict[6]> but was:<dict[2]>
*Description of fix:
The fix adds a tie-breaker to the comparator: when two items in the priorityQueue have the same metric score, their IDs are compared in descending order, so the heap prefers the item with the higher ID. For example, with priorityQueue = [(0.6666666666666666,dict3), (0.5,dict6), (0.5,dict2), (0.75,dict1)], dict6 and dict2 share the metric score 0.5. When the program selects the top 3 items by metric, it now always selects dict6. This makes the result deterministic and fixes the flaky test.
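The tie-breaking idea can be sketched as follows. This is an illustration under stated assumptions, not the code changed in this PR: the class Scored, the comparator WORST_FIRST, and the method topK are hypothetical names, and IDs are assumed to be strings compared lexicographically. The heap keeps the k best items by evicting the current worst one, where "worst" means lowest score and, among equal scores, lowest ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TieBreakTopK {
    // Hypothetical (metric, id) pair; not Alink's actual type.
    static final class Scored {
        final double score;
        final String id;

        Scored(double score, String id) {
            this.score = score;
            this.id = id;
        }
    }

    // Total order: lower score first; for equal scores, lower ID first.
    // The head of a heap built on this order is the item to evict.
    static final Comparator<Scored> WORST_FIRST =
        Comparator.comparingDouble((Scored s) -> s.score)
                  .thenComparing((Scored s) -> s.id);

    // Returns the IDs of the k best items, best first.
    static List<String> topK(List<Scored> items, int k) {
        PriorityQueue<Scored> heap = new PriorityQueue<>(WORST_FIRST);
        for (Scored s : items) {
            heap.add(s);
            if (heap.size() > k) {
                heap.poll(); // drop the current worst item
            }
        }
        List<Scored> kept = new ArrayList<>(heap);
        kept.sort(WORST_FIRST.reversed());
        List<String> ids = new ArrayList<>();
        for (Scored s : kept) {
            ids.add(s.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Scored> items = Arrays.asList(
            new Scored(0.6666666666666666, "dict3"),
            new Scored(0.5, "dict6"),
            new Scored(0.5, "dict2"),
            new Scored(0.75, "dict1"));
        System.out.println(topK(items, 3)); // prints [dict1, dict3, dict6]

        // The result does not depend on insertion order.
        List<Scored> reversed = new ArrayList<>(items);
        Collections.reverse(reversed);
        System.out.println(topK(reversed, 3)); // prints [dict1, dict3, dict6]
    }
}
```

Because the comparator never returns 0 for two items with different IDs, the evicted item is the same for every insertion order. Note that with string IDs the comparison is lexicographic ("dict10" sorts before "dict2"); a numeric ID field would need a numeric comparison instead.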

@nelsonKat0522 nelsonKat0522 marked this pull request as draft October 28, 2025 06:32