Conversation

@lbh930
Contributor

@lbh930 lbh930 commented Dec 1, 2025

Description

Modified ReproUtil.recoverTrainer to fall back to a type-based lookup when a trainer component cannot be found by name. If the name lookup fails because the component names do not match, it now looks the component up by its class type. The candidate components are sorted by their numeric suffix so that the mapping between the provenance and the configuration list is deterministic.
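
For readers who want the idea without opening the diff, here is a minimal, self-contained sketch of a name-first lookup with a type-based fallback. It is not the code in this PR and does not use the OLCUT or Tribuo APIs: the class `ComponentFallbackSketch`, the `resolve` and `numericSuffix` helpers, the plain `Map` of component name to class name, and the `ordinal` parameter (which candidate of that type is wanted) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from Tribuo or OLCUT.
class ComponentFallbackSketch {

    /** Returns the integer after the last '-' in a name, or -1 if there is none. */
    static int numericSuffix(String name) {
        int dash = name.lastIndexOf('-');
        if (dash < 0 || dash == name.length() - 1) {
            return -1;
        }
        try {
            return Integer.parseInt(name.substring(dash + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Looks a component up by name first. If that fails, collects every
     * component of the requested type, sorts the names by numeric suffix,
     * and returns the candidate at position {@code ordinal}.
     */
    static Optional<String> resolve(Map<String, String> components,
                                    String requestedName,
                                    String typeName,
                                    int ordinal) {
        if (components.containsKey(requestedName)) {
            return Optional.of(requestedName);
        }
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, String> e : components.entrySet()) {
            if (e.getValue().equals(typeName)) {
                candidates.add(e.getKey());
            }
        }
        // Numeric ordering, so "-10" sorts after "-2"; ties broken by name.
        Comparator<String> bySuffix =
                Comparator.comparingInt(ComponentFallbackSketch::numericSuffix);
        candidates.sort(bySuffix.thenComparing(Comparator.naturalOrder()));
        if (ordinal < 0 || ordinal >= candidates.size()) {
            return Optional.empty();
        }
        return Optional.of(candidates.get(ordinal));
    }

    public static void main(String[] args) {
        Map<String, String> components = new LinkedHashMap<>();
        components.put("logisticregressiontrainer-2", "LogisticRegressionTrainer");
        components.put("logisticregressiontrainer-10", "LogisticRegressionTrainer");
        components.put("logisticregressiontrainer-1", "LogisticRegressionTrainer");
        components.put("baggingtrainer-0", "BaggingTrainer");

        // Name lookup misses, so the type-based fallback is used.
        System.out.println(resolve(components, "logisticregressiontrainer-3",
                "LogisticRegressionTrainer", 0).orElse("<none>"));
    }
}
```

Sorting on the parsed integer rather than on the raw string matters once there are ten or more components of one type, since a plain string sort would place `-10` before `-2`.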

Motivation

NonDex detected test flakiness in:

  • org.tribuo.reproducibility.ReproUtilTest.reproduceTransformTrainer
  • org.tribuo.reproducibility.ReproUtilTest.testBaggingTrainer
  • org.tribuo.reproducibility.ReproUtilTest.testBaggingTrainerAllInvocationsChange

One possible cause is non-deterministic component naming in OLCUT. ProvenanceUtil generates component names such as logisticregressiontrainer-1 and logisticregressiontrainer-2 based on iteration order, which relies on HashMap/HashSet and therefore has no determinism guarantee. This PR allows ReproUtil to recover the trainer by its type when the name lookup fails, and enforces a strict sort order so that the correct component is selected. This should make those tests more robust. An example of the NonDex output is attached.

NonDex_ReproUtilTests.txt
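
To make the suspected failure mode concrete, the toy model below shows how counter-based names change when the iteration order changes. It is only an assumption about the shape of the problem, not OLCUT's ProvenanceUtil code: the class `IterationOrderNamingSketch`, the `assignNames` helper, and the object labels `innerA`/`innerB` are hypothetical. The two lists stand in for two iteration orders that a HashMap or HashSet is permitted to produce, which is the kind of variation NonDex exercises.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; this is not how OLCUT is implemented.
class IterationOrderNamingSketch {

    /** Names each object "prefix-N", where N is its position in iteration order. */
    static Map<String, String> assignNames(List<String> objectsInIterationOrder,
                                           String prefix) {
        Map<String, String> names = new LinkedHashMap<>();
        int counter = 1;
        for (String obj : objectsInIterationOrder) {
            names.put(obj, prefix + "-" + counter);
            counter++;
        }
        return names;
    }

    public static void main(String[] args) {
        // Two orders in which the same two trainers might be visited.
        List<String> orderOne = Arrays.asList("innerA", "innerB");
        List<String> orderTwo = Arrays.asList("innerB", "innerA");

        Map<String, String> first = assignNames(orderOne, "logisticregressiontrainer");
        Map<String, String> second = assignNames(orderTwo, "logisticregressiontrainer");

        // The same object ends up with a different generated name in each run.
        System.out.println("innerA: " + first.get("innerA") + " vs " + second.get("innerA"));
    }
}
```

If names are assigned this way, a name recorded in one run may refer to a different object in another run, which is why a lookup by name alone can miss.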

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Dec 1, 2025
@Craigacp
Member

I think I'd prefer to try and fix this in OLCUT (which is also maintained by our group), but that's lower down on the list of priorities. I'll look into it in the new year.
