Local Gradient Aggregation, Grouped Allreduce
Detailed Changes
Added
- Added support for backward_passes_per_step > 1 for TF Keras graph mode. (#2346)
- Added support for backward_passes_per_step > 1 for TF Keras eager execution. (#2371)
- Added support for backward_passes_per_step > 1 for TF LegacyOptimizer in graph mode. (#2401)
- Added grouped allreduce to enable more efficient tensor fusion and deterministic training. (#2453)
- Added support for specifying `op` and `compression` in `horovod.tensorflow.keras.allreduce()`. (#2423)
- Added support for a batched D2D memcopy kernel on GPU. (#2435)
- Added schema inference in Spark Estimator without sampling. (#2373)
- Added `Store.create("dbfs:/")` mapping to `DBFSLocalStore("/dbfs/...")`. (#2376)
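Local gradient aggregation (the `backward_passes_per_step > 1` support listed above) lets each worker run several backward passes and communicate only once, which cuts allreduce traffic when the per-step batch is small. The sketch below is a minimal pure-Python illustration of that idea under stated assumptions, not Horovod's implementation: gradients are plain lists of floats, and `LocalGradientAggregator` and `allreduce_average` are hypothetical names, the latter standing in for the cross-worker average. It divides the local sum by N before communicating; keeping the plain sum is an equally valid design choice.

```python
from typing import Callable, List, Optional

Grads = List[float]


class LocalGradientAggregator:
    """Hypothetical helper: accumulate gradients locally and communicate
    once every `backward_passes_per_step` backward passes."""

    def __init__(self, backward_passes_per_step: int,
                 allreduce_average: Callable[[Grads], Grads]) -> None:
        if backward_passes_per_step < 1:
            raise ValueError("backward_passes_per_step must be >= 1")
        self.n = backward_passes_per_step
        # Assumed stand-in for a cross-worker average (e.g. an allreduce).
        self.allreduce_average = allreduce_average
        self.buffer: Grads = []
        self.count = 0

    def add(self, grads: Grads) -> Optional[Grads]:
        """Add one backward pass; return gradients to apply, or None."""
        if not self.buffer:
            self.buffer = [0.0] * len(grads)
        if len(grads) != len(self.buffer):
            raise ValueError("gradient length changed between passes")
        # Local accumulation: no communication happens here.
        self.buffer = [b + g for b, g in zip(self.buffer, grads)]
        self.count += 1
        if self.count < self.n:
            return None
        # Every N passes: average locally, then communicate once.
        local_mean = [b / self.n for b in self.buffer]
        self.buffer, self.count = [], 0
        return self.allreduce_average(local_mean)


# Single-worker usage: the identity function plays the role of the allreduce.
agg = LocalGradientAggregator(2, allreduce_average=lambda g: g)
print(agg.add([1.0, 2.0]))  # None (still aggregating)
print(agg.add([3.0, 4.0]))  # [2.0, 3.0]
```

With N = 2, the optimizer would apply an update (and the allreduce would run) on every second backward pass rather than on every pass.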
Changed
- Changed Keras callbacks to require the `initial_lr` parameter of `LearningRateScheduleCallback` and `LearningRateWarmupCallback`. (#2459)
- Changed default cycle time from 5ms to 1ms and fusion threshold from 64MB to 128MB. (#2468)
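A larger fusion threshold lets more small tensors share a single allreduce. The sketch below is a simplified, hypothetical model of that effect, not Horovod's fusion logic: it assumes all tensors are ready in the same cycle, packs them greedily in order by byte size only, and ignores the other constraints a real implementation must respect. `plan_fusion` is an illustrative name.

```python
from typing import List

MB = 1024 * 1024  # sizes below are in bytes


def plan_fusion(tensor_sizes: List[int], threshold_bytes: int) -> List[List[int]]:
    """Greedily pack tensor indices, in order, into fusion buffers whose total
    size does not exceed threshold_bytes. A tensor larger than the threshold
    gets a buffer of its own."""
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, size in enumerate(tensor_sizes):
        # Close the current buffer if this tensor would overflow it.
        if current and current_bytes + size > threshold_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


# Five 40 MB gradient tensors ready in the same cycle.
sizes = [40 * MB] * 5
print(len(plan_fusion(sizes, 64 * MB)))   # 5 fused allreduces
print(len(plan_fusion(sizes, 128 * MB)))  # 2 fused allreduces
```

Under these assumptions, raising the threshold from 64MB to 128MB reduces five separate allreduce calls to two.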
Fixed
- Fixed support for TensorFlow v2.4.0. (#2381)
- Fixed averaging using the CUDA half2 implementation for one-element half buffers. (#2375)
- Fixed `HOROVOD_THREAD_AFFINITY` when using oneCCL. (#2350)
- Added a timeout to the SSH check in horovodrun to prevent hanging. (#2448)
- Added the `HOROVOD_GLOO_TIMEOUT_SECONDS` value to error messages. (#2436)
- Fixed race condition in dynamic timeline API. (#2341)
- Fixed `--log-hide-timestamp` to apply to driver logs with Gloo. (#2388)