POC: enable to train at the double precision #207

raimis · 2023-07-24T14:07:18Z

No description provided.

RaulPPelaez · 2023-07-24T14:28:06Z

torchmdnet/scripts/train.py

@@ -37,7 +37,7 @@ def get_args():
    parser.add_argument('--ema-alpha-neg-dy', type=float, default=1.0, help='The amount of influence of new losses on the exponential moving average of dy')
    parser.add_argument('--ngpus', type=int, default=-1, help='Number of GPUs, -1 use all available. Use CUDA_VISIBLE_DEVICES=1, to decide gpus')
    parser.add_argument('--num-nodes', type=int, default=1, help='Number of nodes')
-    parser.add_argument('--precision', type=int, default=32, choices=[16, 32], help='Floating point precision')
+    parser.add_argument('--precision', type=int, default=32, choices=[16, 32, 64], help='Floating point precision')


Oh wow, I totally missed this argument when I implemented #182.
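For reference, here is a minimal standalone sketch of how the widened `--precision` choice could be parsed and mapped to a dtype name. The helper `parse_precision` and the `DTYPE_NAMES` table are hypothetical names used only for illustration; they are not part of `train.py`.

```python
import argparse

# Hypothetical lookup table: precision flag value -> dtype name.
DTYPE_NAMES = {16: "float16", 32: "float32", 64: "float64"}


def parse_precision(argv):
    """Parse --precision from argv and return (precision, dtype name)."""
    parser = argparse.ArgumentParser()
    # Same argument definition as in the diff above, with 64 allowed.
    parser.add_argument(
        "--precision",
        type=int,
        default=32,
        choices=[16, 32, 64],
        help="Floating point precision",
    )
    args = parser.parse_args(argv)
    return args.precision, DTYPE_NAMES[args.precision]
```

Calling `parse_precision(["--precision", "64"])` selects double precision, while an empty argument list falls back to the default of 32.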

RaulPPelaez · 2023-07-24T14:30:25Z

torchmdnet/module.py

-            loss_y = loss_fn(y, batch.y)
+            # y
+            y_dtype = {16: torch.float16, 32: torch.float32, 64: torch.float64}[self.hparams.precision]
+            loss_y = loss_fn(y, batch.y.to(y_dtype))


How come you need this here but not a few lines above for neg_dy?
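To make the question concrete, the sketch below casts both targets to the dtype selected by the precision setting, so the energy and force terms are treated symmetrically. It is only an illustration, assuming an MSE loss; `compute_losses` and its parameter names are hypothetical and do not reproduce the code in `module.py`.

```python
import torch
import torch.nn.functional as F

# Same mapping as in the diff above.
PRECISION_TO_DTYPE = {16: torch.float16, 32: torch.float32, 64: torch.float64}


def compute_losses(y, neg_dy, target_y, target_neg_dy, precision):
    """Cast both targets to the training dtype before computing the losses."""
    dtype = PRECISION_TO_DTYPE[precision]
    # Energy term: the target is cast so it matches the prediction dtype.
    loss_y = F.mse_loss(y, target_y.to(dtype))
    # Force term: the same cast is applied for consistency.
    loss_neg_dy = F.mse_loss(neg_dy, target_neg_dy.to(dtype))
    return loss_y, loss_neg_dy
```

If the dataset already returned every tensor in the training dtype, neither cast would be needed.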

RaulPPelaez · 2023-07-24T14:31:25Z

torchmdnet/datasets/ace.py

+                # Keep molecules with specific elements
+                if self.atomic_numbers:
+                    if not set(z.numpy()).issubset(self.atomic_numbers):
+                        continue


This got mixed in from #206, right?
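For readers unfamiliar with the filter in the hunk above, its logic can be sketched with plain Python sets. The function `keep_molecule` is a hypothetical helper for illustration; the dataset code itself operates on a tensor `z` and uses `continue` inside its loading loop.

```python
def keep_molecule(molecule_atomic_numbers, allowed_atomic_numbers):
    """Return True if the molecule contains only allowed elements.

    An empty or None allow-list disables filtering, mirroring the
    `if self.atomic_numbers:` guard in the diff.
    """
    if not allowed_atomic_numbers:
        return True
    return set(molecule_atomic_numbers).issubset(allowed_atomic_numbers)
```

With an allow-list of `{1, 6, 7, 8}`, a molecule with atomic numbers `[1, 6, 8]` is kept, while one containing sulfur (`16`) is skipped.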

RaulPPelaez · 2023-07-24T14:33:11Z

torchmdnet/datasets/ace.py

-        y = pt.tensor(self.y_mm[idx], dtype=pt.float32).view(
-            1, 1
-        )  # It would be better to use float64, but the trainer complaints
+        y = pt.tensor(self.y_mm[idx], dtype=pt.float64).view(1, 1)


I would pass dtype as an argument to Ace here and store everything in the correct type. I do not see why pos should be stored in float32 and y in float64.
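A rough sketch of that suggestion: a dataset that takes a single `dtype` argument and applies it to every floating-point field. `TinyDataset` is a hypothetical stand-in, not the actual `Ace` class, and the in-memory lists replace the memory-mapped arrays used by the real loader.

```python
import torch


class TinyDataset:
    """Hypothetical dataset that stores positions and energies in one dtype."""

    def __init__(self, positions, energies, dtype=torch.float32):
        self.dtype = dtype
        self.positions = positions  # one list of [x, y, z] rows per sample
        self.energies = energies  # one scalar energy per sample

    def __len__(self):
        return len(self.energies)

    def __getitem__(self, idx):
        # Both fields use the same dtype, so no casts are needed downstream.
        pos = torch.tensor(self.positions[idx], dtype=self.dtype)
        y = torch.tensor(self.energies[idx], dtype=self.dtype).view(1, 1)
        return pos, y
```

Constructing it with `dtype=torch.float64` yields double-precision `pos` and `y` together, which avoids mixing float32 positions with float64 energies.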

RaulPPelaez · 2023-08-07T13:26:34Z

@raimis I believe #208 solves what you are trying to do here. I can train with float64 with the code in that PR.

raimis · 2023-09-05T12:12:15Z

Obsolete

Raimondas Galvelis added 3 commits July 19, 2023 18:49

Implement element filtering in the Ace datasets

223037f

Make the Ace loader to load energy in float64

32ce2dc

Convert energy to right precision for training

8ce19c6

raimis requested a review from RaulPPelaez July 24, 2023 14:07

raimis self-assigned this Jul 24, 2023

RaulPPelaez reviewed Jul 24, 2023

raimis closed this Sep 5, 2023
