Description
Hi,
I'm trying to run Herro on an A100 card, and it's not clear why it is running out of memory. I was running with -b 128, so I've dropped that down.
I'm running it with Singularity. Any suggestions?
Thanks
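For reference, the shape of the command line is roughly the following (a minimal sketch: only -b is the actual option mentioned above; the image name and the remaining Herro arguments are placeholders):

# Sketch only: herro.sif and [herro arguments...] are placeholders;
# --nv exposes the host NVIDIA driver/GPU inside the container.
singularity run --nv herro.sif [herro arguments...] -b 64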
[00:01:26] Processing 1/? batch _
[>---------------------------------------] 93/90774
[W manager.cpp:340] Warning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
(function runCudaFusionGroup)
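In case it matters, passing the variable that the warning suggests into the container would look roughly like this (a sketch; SINGULARITYENV_-prefixed variables are forwarded by Singularity, and --env needs Singularity 3.6+):

# Disable the nvFuser codegen fallback for debugging, as the warning suggests.
export SINGULARITYENV_PYTORCH_NVFUSER_DISABLE=fallback
singularity run --nv herro.sif [herro arguments...]
# or equivalently (Singularity 3.6+):
singularity run --nv --env PYTORCH_NVFUSER_DISABLE=fallback herro.sif [herro arguments...]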
thread '' panicked at src/inference.rs:172:64:
called Result::unwrap() on an Err value: Torch("The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript (most recent call last):\nRuntimeError: The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript, serialized code (most recent call last):\n File "code/torch/model.py", line 36, in fallback_cuda_fuser\n x0 = torch.permute(x, [0, 3, 1, 2])\n qn = self.qn\n sliced_sequences_concatenated = (qn).forward(x0, target_positions, lengths, )\n ~~~~~~~~~~~ <--- HERE\n fc2 = self.fc2\n _1 = (fc2).forward(sliced_sequences_concatenated, )\n File "code/torch/transformer.py", line 16, in forward\n _0 = torch.torch.nn.utils.rnn.pad_sequence\n context_read = self.context_read\n x0 = (context_read).forward(x, )\n ~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n context_pos = self.context_pos\n x1 = (context_pos).forward(x0, )\n File "code/torch/torch/nn/modules/container.py", line 15, in forward\n _2 = getattr(self, "2")\n input0 = (_0).forward(input, )\n input1 = (_1).forward(input0, )\n ~~~~~~~~~~~ <--- HERE\n return (_2).forward(input1, )\n def len(self: torch.torch.nn.modules.container.Sequential) -> int:\n File "code/torch/torch/nn/modules/batchnorm.py", line 35, in forward\n weight = self.weight\n bias = self.bias\n _3 = _0(input, running_mean, running_var, weight, bias, bn_training, 0.10000000000000001, 1.0000000000000001e-05, )\n ~~ <--- HERE\n return _3\n def _check_input_dim(self: torch.torch.nn.modules.batchnorm.BatchNorm2d,\n File "code/torch/torch/nn/functional.py", line 52, in batch_norm\n else:\n pass\n _6 = torch.batch_norm(input, weight, bias, running_mean, running_var, training, momentum, eps, True)\n ~~~~~~~~~~~~~~~~ <--- HERE\n return _6\ndef relu(input: Tensor,\n\nTraceback of TorchScript, original code (most recent call last):\n File "/raid/scratch/stanojevicd/projects/haec-BigBird/model.py", line 157, in fallback_cuda_fuser\n sliced_sequences_concatenated = torch.cat(encoded)'''\n x = x.permute((0, 3, 1, 2))\n sliced_sequences_concatenated = self.qn(x, target_positions, lengths)\n ~~~~~~~ <--- HERE\n \n # list of tensors of shape (selected_token_number, 1) -> (selected_token_number)\n File "/raid/scratch/stanojevicd/projects/haec-BigBird/transformer.py", line 36, in forward\n def forward(self, x: Tensor, target_positions: List[Tensor],\n lengths: Tensor) -> Tensor:\n x = self.context_read(x) # [B, I, L, R] -> [B, 128, L, R]\n ~~~~~~~~~~~~~~~~~ <--- HERE\n x = self.context_pos(x) # [B, 128, L, R] -> [B, 256, L, 1]\n x = x.squeeze(-1).transpose(1, 2) # [B, L, 256]\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/modules/container.py", line 215, in forward\n def forward(self, input):\n for module in self:\n input = module(input)\n ~~~~~~ <--- HERE\n return input\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward\n used for normalization (i.e. in eval mode when buffers are not None).\n """\n return F.batch_norm(\n ~~~~~~~~~~~~ <--- HERE\n input,\n # If buffers are not to be tracked, ensure that they won't be updated\n File "/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/functional.py", line 2478, in batch_norm\n _verify_batch_size(input.size())\n\n return torch.batch_norm(\n ~~~~~~~~~~~~~~~~ <--- HERE\n input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled\n )\nRuntimeError: CUDA out of memory. 
Tried to allocate 4.64 GiB (GPU 0; 9.50 GiB total capacity; 4.74 GiB already allocated; 1.93 GiB free; 7.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF\n\n")
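The error message itself suggests max_split_size_mb; setting it from outside the container would look roughly like this (a sketch; 128 is just an example value in MiB):

# Limit the allocator's split size to reduce fragmentation, per the OOM message.
export SINGULARITYENV_PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
singularity run --nv herro.sif [herro arguments...]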
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
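For completeness, forwarding RUST_BACKTRACE=1 into the container, as the panic note suggests, would look like this:

# Get the Rust backtrace mentioned in the panic note.
export SINGULARITYENV_RUST_BACKTRACE=1
singularity run --nv herro.sif [herro arguments...]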