Fix bmm memory leak #5744

Merged: ezyang merged 1 commit into pytorch:master from zou3519:fix-bmm-leak on Mar 15, 2018

Conversation

zou3519 (Contributor) commented Mar 13, 2018

Fixes #5611.

THCTensor_(baddbmm) assumes that newContiguous always returns a new tensor, which is a bad assumption. At the end of the function, tensors are freed only if tensor_new != tensor_old. As a result, tensors that were already contiguous when newContiguous was called on them are never freed.
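
To make the imbalance concrete, here is a minimal Python sketch of the reference counting involved. `ToyTensor`, `new_contiguous`, `free`, and `buggy_baddbmm` are hypothetical stand-ins, not the real TH/THC API. The sketch assumes that newContiguous hands back the same tensor with its reference count bumped when the input is already contiguous.

```python
class ToyTensor:
    """Hypothetical stand-in for a reference-counted THCTensor."""

    def __init__(self, contiguous):
        self.contiguous = contiguous
        self.refcount = 1  # the caller's own reference


def new_contiguous(t):
    """Toy newContiguous: the result always carries one extra reference."""
    if t.contiguous:
        t.refcount += 1  # same object, reference count bumped
        return t
    return ToyTensor(contiguous=True)  # fresh copy with its own reference


def free(t):
    t.refcount -= 1


def buggy_baddbmm(batch1):
    """Cleanup pattern described above: free only when the pointer changed."""
    batch1_ = new_contiguous(batch1)
    # ... the batched matrix multiply would run here ...
    if batch1_ is not batch1:
        free(batch1_)


contiguous_input = ToyTensor(contiguous=True)
buggy_baddbmm(contiguous_input)
print(contiguous_input.refcount)  # 2: the extra reference is never released

strided_input = ToyTensor(contiguous=False)
buggy_baddbmm(strided_input)
print(strided_input.refcount)  # 1: the copy was freed, so this path is balanced
```

In this model only the already-contiguous input ends with a dangling reference, which matches the leak described above.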

Test Plan

import subprocess
import torch
from torch.autograd import Variable

# This is from https://discuss.pytorch.org/t/access-gpu-memory-usage-in-pytorch/3192/4
def get_gpu_memory_map():
    """Get the current gpu usage.

    Returns
    -------
    usage: dict
        Keys are device ids as integers.
        Values are memory usage as integers in MB.
    """
    result = subprocess.check_output(
        [
            'nvidia-smi', '--query-gpu=memory.used',
            '--format=csv,nounits,noheader'
        ], encoding='utf-8')
    # Convert lines into a dictionary
    gpu_memory = [int(x) for x in result.strip().split('\n')]
    gpu_memory_map = dict(zip(range(len(gpu_memory)), gpu_memory))
    return gpu_memory_map

l, m, n = 1, 9, 1
w = torch.nn.Parameter(torch.Tensor(1024, 2, l, m).cuda())
for i in range(10000):
    a = Variable(torch.Tensor(1024, 2, m, n).cuda())
    torch.matmul(w, a).permute(0, 3, 1, 2).mean().backward()
    if i % 100 == 0:
        gpu_mem = get_gpu_memory_map()
        # nvidia-smi reports memory.used in MB (see the docstring above), not KB
        print("GPU: {:.2f} MB".format(gpu_mem[0]))
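
The script above only prints readings, so deciding whether memory still leaks is left to the reader. A small sketch of how the readings could be turned into a check follows. `parse_memory_used` and `memory_growth` are hypothetical helpers, the `warmup` parameter is an assumption (early samples may include one-time allocations), and the sample values are invented for illustration, not measurements.

```python
def parse_memory_used(nvidia_smi_output):
    """Parse the text printed by
    `nvidia-smi --query-gpu=memory.used --format=csv,nounits,noheader`.

    Returns a dict mapping GPU index to used memory in MB.
    """
    values = [int(line) for line in nvidia_smi_output.strip().split('\n')]
    return dict(enumerate(values))


def memory_growth(samples_mb, warmup=1):
    """Growth in MB between the first post-warm-up sample and the last one."""
    steady = samples_mb[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two samples after warm-up")
    return steady[-1] - steady[0]


# Invented readings for GPU 0, one per 100 iterations.
flat = [410, 412, 412, 412]
growing = [410, 412, 436, 460]

print(parse_memory_used("412\n436\n"))  # {0: 412, 1: 436}
print(memory_growth(flat))              # 0
print(memory_growth(growing))           # 48
```

With a fix in place, the growth over the run would be expected to stay near zero; what threshold counts as "near" is a judgment call for the person running the test.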

fmassa (Member) commented Mar 13, 2018

I thought that if the tensor is contiguous, newContiguous returns the same tensor as before with the reference count bumped.
So maybe a simpler fix would be to always call THCTensor_free on the tensors, irrespective of whether they are contiguous? Or maybe I'm missing something?

EDIT: bad idea, as there are places where the tensor's reference count is not incremented. Disregard what I said.

apaszke (Contributor) commented Mar 13, 2018

@fmassa I thought newContiguous always returns a new reference, and IIRC we heavily depend on that in THNN/THCUNN code, so your fix seems valid

fmassa (Member) commented Mar 13, 2018

There are places in the code where they do batch1_ = batch1, so my fix wouldn't work. Those places could be changed, though.
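
A toy Python model of why unconditional freeing breaks on the aliasing branch may help here. All names are hypothetical stand-ins rather than the real THC API, and `layout_ok` is an assumed flag representing whatever condition makes the function take the `batch1_ = batch1` path.

```python
class ToyTensor:
    """Hypothetical stand-in for a reference-counted THCTensor."""

    def __init__(self, contiguous):
        self.contiguous = contiguous
        self.refcount = 1  # the caller's own reference


def new_contiguous(t):
    """Toy newContiguous: the result always carries one extra reference."""
    if t.contiguous:
        t.refcount += 1
        return t
    return ToyTensor(contiguous=True)


def free(t):
    t.refcount -= 1


def always_free_baddbmm(batch1, layout_ok):
    if layout_ok:
        batch1_ = batch1  # plain alias: no reference is taken here
    else:
        batch1_ = new_contiguous(batch1)
    # ... the batched matrix multiply would run here ...
    free(batch1_)  # unconditional cleanup


aliased = ToyTensor(contiguous=True)
always_free_baddbmm(aliased, layout_ok=True)
print(aliased.refcount)  # 0: the caller's reference was dropped by mistake

copied = ToyTensor(contiguous=True)
always_free_baddbmm(copied, layout_ok=False)
print(copied.refcount)  # 1: balanced, because new_contiguous took a reference
```

In the model, the aliasing path releases a reference it never acquired, so the caller's tensor would be destroyed while still in use.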

zou3519 (Contributor, Author) commented Mar 13, 2018

Either approach works. If we want to always call THCTensor_free on the tensors, then we need to call THCTensor_(retain) in the branches of the conditional where newContiguous isn't called. I took the approach I did (checking whether the tensor is contiguous before calling newContiguous) to minimize the number of lines modified.
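
The two options can be compared in a toy Python model. Everything here is a hypothetical stand-in for the THC code, and `layout_ok` is an assumed flag for the branches that skip newContiguous. For the second option the sketch assumes the contiguous case simply reuses the input without taking a reference; the actual patch may handle that branch differently.

```python
class ToyTensor:
    """Hypothetical stand-in for a reference-counted THCTensor."""

    def __init__(self, contiguous):
        self.contiguous = contiguous
        self.refcount = 1  # the caller's own reference


def retain(t):
    t.refcount += 1


def free(t):
    t.refcount -= 1


def new_contiguous(t):
    """Toy newContiguous: the result always carries one extra reference."""
    if t.contiguous:
        retain(t)
        return t
    return ToyTensor(contiguous=True)


def baddbmm_always_free(batch1, layout_ok):
    """Option 1: every branch takes a reference, so cleanup is unconditional."""
    if layout_ok:
        batch1_ = batch1
        retain(batch1_)
    else:
        batch1_ = new_contiguous(batch1)
    free(batch1_)


def baddbmm_check_first(batch1, layout_ok):
    """Option 2: call new_contiguous only when it has to make a copy."""
    if layout_ok or batch1.contiguous:
        batch1_ = batch1  # assumed: no reference taken on this path
    else:
        batch1_ = new_contiguous(batch1)
    if batch1_ is not batch1:
        free(batch1_)


results = {}
for fix in (baddbmm_always_free, baddbmm_check_first):
    for contiguous in (True, False):
        for layout_ok in (True, False):
            t = ToyTensor(contiguous)
            fix(t, layout_ok)
            results[(fix.__name__, contiguous, layout_ok)] = t.refcount

print(all(count == 1 for count in results.values()))  # True
```

Both variants leave the caller's reference count unchanged in every combination; the second touches fewer branches, which is consistent with the stated goal of minimizing modified lines.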

goldsborough (Contributor) commented

@pytorchbot retest this please

ezyang (Contributor) commented Mar 15, 2018

@pytorchbot retest this please

yf225 (Contributor) commented Mar 15, 2018

@pytorchbot retest this please

@ezyang ezyang merged commit 8277781 into pytorch:master Mar 15, 2018
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Fixes pytorch#5611.

THCTensor_(baddbmm) assumes that newContiguous will always return a new tensor (this is a bad assumption). At the end of the function, tensors are freed if tensor_new != tensor_old. As a result, some tensors aren't freed if they were initially contiguous and newContiguous is called on them.

Test Plan
code reading
run the script from the pull request description above (from the pytorch#5611 bug report) and assert that the memory doesn't leak anymore