New reference benchmark development checklist #810

@ShriyaRishab

Description


New benchmark task force leads should complete this checklist before the benchmark is finalized.

Initial reference code: roadmap

  • Review the guidelines on how to create a good MLPerf Training reference
  • Finalize dataset
  • Finalize model architecture
  • Finalize reference framework
  • Finalize the platform/hardware the reference will be implemented on, and add it to the approved list
  • Finalize the reference precision and generate initial loss curves to understand training behavior. At this stage, decide the rough benchmarking region and whether the benchmark should start training from randomly initialized weights or from a previously trained checkpoint. If training from random weights shows significant loss instability, it may be better to train for a few hundred steps and generate a checkpoint so that the benchmarking region is more stable and smooth.
  • Finalize batch sizes for RCPs and find good hyperparameters at these batch sizes (at least 3 batch sizes are needed). We typically choose one small batch size, one very large batch size, and one in the middle to cover a decent range. Ask task force members and Training WG members for batch size range suggestions so the chosen options cover what submitters are targeting.
  • Finalize the evaluation metric and dataset. Understand how much the initial dataset can be reduced; ideally we want the smallest possible dataset (both training and evaluation) that still allows for a reasonable benchmark.
  • Finalize which hyperparameters are unconstrained for submitters to modify
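
The loss-stability decision above (warm-start checkpoint vs. random initialization) can be sketched as a simple heuristic. This is purely illustrative: the function name, the window size, and the coefficient-of-variation threshold are hypothetical placeholders, not values defined by MLPerf rules.

```python
import statistics

def needs_warmup_checkpoint(losses, window=100, max_cv=0.05):
    """Heuristic sketch: if the loss over the last `window` steps still varies
    a lot (coefficient of variation above `max_cv`), prefer generating a
    warm-start checkpoint so the benchmarked region is smoother.
    Both thresholds are illustrative, not MLPerf-defined."""
    tail = losses[-window:]
    mean = statistics.fmean(tail)
    cv = statistics.stdev(tail) / mean
    return cv > max_cv

# Synthetic example: a noisy early-training curve vs. a settled one.
noisy = [10.0 - 0.01 * i + (0.5 if i % 2 else -0.5) for i in range(200)]
settled = [2.0 - 0.0001 * i for i in range(200)]
print(needs_warmup_checkpoint(noisy))    # True: still unstable, checkpoint first
print(needs_warmup_checkpoint(settled))  # False: stable enough to benchmark from scratch
```

In practice this kind of check would be run on real reference loss curves at the candidate batch sizes before fixing the benchmarking region.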

Finalize code

Generate RCPs (Reference Convergence Points)
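
An RCP records how many epochs the reference takes to reach target quality at a given batch size, aggregated over multiple runs. The sketch below shows the kind of aggregation involved; the actual RCP file format, field names, and minimum run count are defined by the RCP checker in the mlcommons/logging repo, and the names used here are placeholders.

```python
import statistics

def summarize_rcp(epochs_to_converge, batch_size):
    """Aggregate per-run convergence epochs into the summary statistics an
    RCP entry is built from, for one batch size. Illustrative only: field
    names and the required number of runs follow the mlcommons/logging
    RCP checker, not this sketch."""
    return {
        "bs": batch_size,
        "num_runs": len(epochs_to_converge),
        "mean_epochs": statistics.fmean(epochs_to_converge),
        "stdev_epochs": statistics.stdev(epochs_to_converge),
    }

# Epochs to reach the target quality metric, one entry per reference run.
runs = [3.0, 3.2, 2.9, 3.1, 3.0, 3.3]
print(summarize_rcp(runs, batch_size=256))
```

Submission convergence is later compared against these reference statistics, which is why good hyperparameters at each RCP batch size matter.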

Add new benchmark details

  • Update logging repo to include compliance checks for the new benchmark (example PR359)
  • Update the training rules by adding the new benchmark to all relevant tables (benchmarks, division, hyperparameters, quality, results, and RCP rules). Add any benchmark-specific rules to the appendix
  • Update the compatibility table with the new benchmark
  • Create a new benchmark presentation deck and present to the Training WG so everyone is aware of this benchmark and its technical details
  • Create an initial blog write-up with technical details
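
To give a feel for what the compliance checks in the logging repo verify: MLPerf result logs contain lines prefixed with `:::MLLOG` followed by a JSON payload, and compliance rules assert properties of those events (e.g. that required keys were logged). The toy checker below is a self-contained sketch of that idea; the real checker is `mlperf_logging.compliance_checker` in mlcommons/logging, and the specific required-key set here is hypothetical.

```python
import json

MLLOG_PREFIX = ":::MLLOG"

def parse_mllog(text):
    """Extract the JSON payload from each ':::MLLOG {...}' line."""
    events = []
    for line in text.splitlines():
        if MLLOG_PREFIX in line:
            payload = line.split(MLLOG_PREFIX, 1)[1].strip()
            events.append(json.loads(payload))
    return events

def check_required_keys(events, required):
    """Toy compliance rule: every key in `required` must be logged at
    least once. Returns the sorted list of missing keys (empty = pass)."""
    logged = {e["key"] for e in events}
    return sorted(required - logged)

# Minimal fabricated log excerpt (keys illustrative only).
log = """\
:::MLLOG {"key": "submission_benchmark", "value": "my_benchmark"}
:::MLLOG {"key": "run_start", "value": null}
:::MLLOG {"key": "run_stop", "value": null}
"""
missing = check_required_keys(parse_mllog(log), {"run_start", "run_stop", "eval_accuracy"})
print(missing)  # ["eval_accuracy"]
```

A new benchmark's PR to the logging repo adds rules of this kind (plus value and ordering constraints) so submissions can be checked automatically.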
