Description
Hi,
I'm trying to run Herro on an A100 card, and it's not clear why it is running out of memory. I was running with -b 128, so I've dropped that down.
I'm running it with Singularity. Any suggestions?
Thanks
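For reference, the shape of the command line is roughly the following (a minimal sketch: only -b is the actual option mentioned above; the image name and the remaining Herro arguments are placeholders):

# Sketch only: herro.sif and [herro arguments...] are placeholders;
# --nv exposes the host NVIDIA driver/GPU inside the container.
singularity run --nv herro.sif [herro arguments...] -b 64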
[00:01:26] Processing 1/? batch _
[>---------------------------------------] 93/90774
[W manager.cpp:340] Warning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
(function runCudaFusionGroup)
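In case it matters, passing the variable that the warning suggests into the container would look roughly like this (a sketch; SINGULARITYENV_-prefixed variables are forwarded by Singularity, and --env needs Singularity 3.6+):

# Disable the nvFuser codegen fallback for debugging, as the warning suggests.
export SINGULARITYENV_PYTORCH_NVFUSER_DISABLE=fallback
singularity run --nv herro.sif [herro arguments...]
# or equivalently (Singularity 3.6+):
singularity run --nv --env PYTORCH_NVFUSER_DISABLE=fallback herro.sif [herro arguments...]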
thread '' panicked at src/inference.rs:172:64:
called Result::unwrap() on an Err value: Torch("The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript (most recent call last):\nRuntimeError: The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript, serialized code (most recent call last):\n File "code/torch/model.py", line 36, in fallback_cuda_fuser\n x0 = torch.permute(x, [0, 3, 1, 2])\n qn = self.qn\n sliced_sequences_concatenated = (qn).forward(x0, target_positions, lengths, )\n ~~~~~~~~~~~ <--- HERE\n fc2 = self.fc2\n _1 = (fc2).forward(sliced_sequences_concatenated, )\n File "code/torch/transformer.py", line 16, in forward\n _0 = torch.torch.nn.utils.rnn.pad_sequence\n context_read = self.context_read\n x0 = (context_read).forward(x, )\n ~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n context_pos = self.context_pos\n x1 = (context_pos).forward(x0, )\n File "code/torch/torch/nn/modules/container.py", line 15, in forward\n _2 = getattr(self, "2")\n input0 = (_0).forward(input, )\n input1 = (_1).forward(input0, )\n ~~~~~~~~~~~ <--- HERE\n return (_2).forward(input1, )\n def len(self: torch.torch.nn.modules.container.Sequential) -> int:\n File "code/torch/torch/nn/modules/batchnorm.py", line 35, in forward\n weight = self.weight\n bias = self.bias\n _3 = _0(input, running_mean, running_var, weight, bias, bn_training, 0.10000000000000001, 1.0000000000000001e-05, )\n ~~ <--- HERE\n return _3\n def _check_input_dim(self: torch.torch.nn.modules.batchnorm.BatchNorm2d,\n File "code/torch/torch/nn/functional.py", line 52, in batch_norm\n else:\n pass\n _6 = torch.batch_norm(input, weight, bias, running_mean, running_var, training, momentum, eps, True)\n ~~~~~~~~~~~~~~~~ <--- HERE\n return _6\ndef relu(input: Tensor,\n\nTraceback of TorchScript, original code (most recent call last):\n File "/raid/scratch/stanojevicd/projects/haec-BigBird/model.py", line 157, in fallback_cuda_fuser\n sliced_sequences_concatenated = torch.cat(encoded)'''\n x = x.permute((0, 3, 1, 2))\n sliced_sequences_concatenated = self.qn(x, target_positions, lengths)\n ~~~~~~~ <--- HERE\n \n # list of tensors of shape (selected_token_number, 1) -> (selected_token_number)\n File "/raid/scratch/stanojevicd/projects/haec-BigBird/transformer.py", line 36, in forward\n def forward(self, x: Tensor, target_positions: List[Tensor],\n lengths: Tensor) -> Tensor:\n x = self.context_read(x) # [B, I, L, R] -> [B, 128, L, R]\n ~~~~~~~~~~~~~~~~~ <--- HERE\n x = self.context_pos(x) # [B, 128, L, R] -> [B, 256, L, 1]\n x = x.squeeze(-1).transpose(1, 2) # [B, L, 256]\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/modules/container.py", line 215, in forward\n def forward(self, input):\n for module in self:\n input = module(input)\n ~~~~~~ <--- HERE\n return input\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward\n used for normalization (i.e. in eval mode when buffers are not None).\n """\n return F.batch_norm(\n ~~~~~~~~~~~~ <--- HERE\n input,\n # If buffers are not to be tracked, ensure that they won't be updated\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/functional.py", line 2478, in batch_norm\n _verify_batch_size(input.size())\n\n return torch.batch_norm(\n ~~~~~~~~~~~~~~~~ <--- HERE\n input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled\n )\nRuntimeError: CUDA out of memory. 
Tried to allocate 4.64 GiB (GPU 0; 9.50 GiB total capacity; 4.74 GiB already allocated; 1.93 GiB free; 7.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF\n\n")
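The error message itself suggests max_split_size_mb; setting it from outside the container would look roughly like this (a sketch; 128 is just an example value in MiB):

# Limit the allocator's split size to reduce fragmentation, per the OOM message.
export SINGULARITYENV_PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
singularity run --nv herro.sif [herro arguments...]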
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
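For completeness, forwarding RUST_BACKTRACE=1 into the container, as the panic note suggests, would look like this:

# Get the Rust backtrace mentioned in the panic note.
export SINGULARITYENV_RUST_BACKTRACE=1
singularity run --nv herro.sif [herro arguments...]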