Why are several of the results in Table 3 of the paper (https://arxiv.org/pdf/2402.12659) reported as zero, for example on FinQA?

<img width="586" alt="image" src="https://github.com/user-attachments/assets/79c6c6b4-c6a0-41c8-b4db-54b1652e31a9">