Could I ask about the global batch size when using gpt2.yaml? When I run the code on a v3-32, I believe the global batch size is 16 (per-host batch size) * 4 (number of hosts) * 1 (gradient accumulation steps). If that is right, the global batch size is 64, which may be too small. Thanks in advance for your help!
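For clarity, here is a minimal sketch of how I'm computing that number. The per-host batch size (16), the host count (4 for a v3-32), and the gradient accumulation steps (1) are my reading of gpt2.yaml and the TPU topology, so please correct me if any of these are wrong:

```python
# My assumed values -- please correct if gpt2.yaml configures these differently.
per_host_batch_size = 16   # batch size per host, as I read it from gpt2.yaml
num_hosts = 4              # a v3-32 slice has 4 hosts (8 TPU cores each)
grad_accum_steps = 1       # gradient accumulation steps

global_batch_size = per_host_batch_size * num_hosts * grad_accum_steps
print(global_batch_size)   # 64
```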
Best,
Lucas