FTLO is an optimizer for training neural networks.
Here, the coefficients are (see the sketch after this list):

- Momentum coefficient $\alpha$
- Learning rate $\gamma$ for the $v$ update
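Since the full FTLO update rule is not reproduced in this section, the snippet below is only a minimal sketch of how a momentum coefficient $\alpha$ and a learning rate $\gamma$ for the velocity $v$ typically interact, assuming a standard heavy-ball-style form; it is not the exact FTLO update, and the function name and default values are placeholders.

```python
import torch

def momentum_step(param, grad, v, alpha=0.9, gamma=0.01):
    """Generic momentum-style update (NOT the exact FTLO rule).

    v     -- velocity buffer, same shape as param
    alpha -- momentum coefficient: how much of the previous velocity is kept
    gamma -- learning rate applied when updating v from the gradient
    """
    # Blend the previous velocity with the (scaled) current gradient.
    v.mul_(alpha).add_(grad, alpha=-gamma)
    # Apply the velocity as the parameter update.
    param.add_(v)
    return param, v
```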
Through comparative experiments with fixed random seeds on the MNIST task, we identified a parameter combination that outperforms Adam:
| Parameter | Symbol | Description | Value |
|---|---|---|---|
| Initial Learning Rate | - | | |
| Momentum Coefficient 1 | | Baseline affecting | |
| | | History retention rate for | |
| Momentum Decay Exponent | | Decay speed for | |
| | | Decay speed for | |
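For reproducibility of the fixed-seed comparison against Adam mentioned above, a seeding setup along the following lines is typical. This is a generic sketch assuming PyTorch, not the exact script used for the MNIST runs; the seed value 42 is only a placeholder.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix all relevant RNGs so optimizer comparisons see identical batches and inits."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```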
FTLO performs well in tasks that start far from the optimal solution or that are high-dimensional and complex (such as MNIST). However, when the initial point is very close to the optimum (for example, starting near the minimum of the Rosenbrock function), its momentum can get in the way of stable convergence.
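For reference, the Rosenbrock function mentioned above is a standard optimizer benchmark with a long, curved, nearly flat valley; the definition below is just the textbook form, not code from the FTLO experiments.

```python
import torch

def rosenbrock(xy: torch.Tensor, a: float = 1.0, b: float = 100.0) -> torch.Tensor:
    """Classic Rosenbrock function f(x, y) = (a - x)^2 + b * (y - x^2)^2.

    Its global minimum f = 0 sits at (a, a^2), i.e. (1, 1) for the defaults,
    at the bottom of a narrow curved valley that is easy to reach but hard to traverse.
    """
    x, y = xy[0], xy[1]
    return (a - x) ** 2 + b * (y - x ** 2) ** 2
```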
Adjusting the decay parameters according to the initial conditions is key to using FTLO:
| Scenario | Initial Conditions | Tuning Suggestion | Purpose |
|---|---|---|---|
| High-Dim / Exploration | Starting point far from target | Keep | Traverse flat regions quickly |
| Fine-Tuning / Stabilization | Starting point close to target | Lower | Weaken momentum influence, accelerate into stable convergence |
Because of FTLO's momentum term, gradient clipping should be applied before optimizer.step().
In our MNIST experiments, we used CLIP_NORM = 1.0.
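Below is a minimal sketch of where the clipping call sits in a training step, assuming a PyTorch-style loop. The `train_step` helper, `model`, `criterion`, and the way the optimizer is constructed are placeholders, not the actual experiment code; only the clip-before-step ordering and CLIP_NORM = 1.0 come from the text above.

```python
import torch
from torch.nn.utils import clip_grad_norm_

CLIP_NORM = 1.0  # value used in the MNIST experiments

def train_step(model, optimizer, criterion, inputs, targets):
    """One training step with gradient clipping applied before optimizer.step()."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # Clip the global gradient norm so the momentum term never ingests an oversized update.
    clip_grad_norm_(model.parameters(), max_norm=CLIP_NORM)
    optimizer.step()
    return loss.item()
```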