
Conversation

@jmdelahanty
Contributor

Description

Imports subprocess to invoke an nvidia-smi query that asks for GPU indices, free/available memory, and total memory on each card. Returns a dictionary keyed by GPU index, with the fraction of GPU memory available as the value.
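For reference, a minimal sketch of what such a helper could look like. The nvidia-smi query flags are real, but the function and helper names here (`get_gpu_memory`, `parse_gpu_memory`) and the exact parsing are an illustration, not necessarily the merged implementation:

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> dict:
    """Parse `nvidia-smi --query-gpu` CSV output into
    {gpu_index: fraction of memory available}."""
    memory_dict = {}
    for line in csv_text.strip().splitlines():
        gpu_id, free_mib, total_mib = [field.strip() for field in line.split(",")]
        # Fraction of the card's memory currently available.
        memory_dict[int(gpu_id)] = round(int(free_mib) / int(total_mib), 4)
    return memory_dict


def get_gpu_memory() -> dict:
    """Poll nvidia-smi for per-GPU free and total memory."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.free,memory.total",
            "--format=csv,noheader,nounits",
        ],
        encoding="utf-8",
    )
    return parse_gpu_memory(output)
```

Splitting out the parsing step also makes the function testable on machines without a GPU, since the CSV text can be faked.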

Types of changes

  • Bugfix
  • New feature
  • Refactor / Code style update (no logical changes)
  • Build / CI changes
  • Documentation Update
  • Other (explain)

Does this address any currently open issues?

Nope!

Outside contributors checklist

  • Review the guidelines for contributing to this repository
  • Read and sign the CLA and add yourself to the authors list
  • Make sure you are making a pull request against the develop branch (not main), and that you started your branch off of develop
  • Add tests that prove your fix is effective or that your feature works
    Are tests required for this function and, if so, what should they look like?
  • Add necessary documentation (if appropriate)
    Added documentation in the function itself; down to write docs elsewhere if you'd like!

Thank you for contributing to SLEAP!

❤️

Add get_gpu_memory function for polling GPUs on a machine and their available vRAM.
Add newline to end of file
Add my name to Authors markdown per SLEAP outside contributor guidelines.
@codecov

codecov bot commented Aug 13, 2022

Codecov Report

Merging #911 (0c7c955) into develop (4de5213) will decrease coverage by 0.04%.
The diff coverage is 10.00%.

@@             Coverage Diff             @@
##           develop     #911      +/-   ##
===========================================
- Coverage    67.63%   67.58%   -0.05%     
===========================================
  Files          130      130              
  Lines        22209    22226      +17     
===========================================
+ Hits         15020    15022       +2     
- Misses        7189     7204      +15     
Impacted Files Coverage Δ
sleap/nn/inference.py 79.35% <0.00%> (-0.19%) ⬇️
sleap/nn/training.py 59.95% <0.00%> (-0.23%) ⬇️
sleap/nn/system.py 43.05% <18.18%> (-4.49%) ⬇️


@roomrys
Contributor

roomrys commented Aug 18, 2022

TODO to make this usable:

Add a --gpu auto option to sleap-train that uses this function to select the GPU with the lowest memory usage. It would then be called here:

sleap/sleap/nn/training.py

Lines 1932 to 1941 in 2688d56

    else:
        if args.first_gpu:
            sleap.nn.system.use_first_gpu()
            logger.info("Using the first GPU for acceleration.")
        elif args.last_gpu:
            sleap.nn.system.use_last_gpu()
            logger.info("Using the last GPU for acceleration.")
        else:
            sleap.nn.system.use_gpu(args.gpu)
            logger.info(f"Using GPU {args.gpu} for acceleration.")

Something like:

        if args.first_gpu:
            sleap.nn.system.use_first_gpu()
            logger.info("Using the first GPU for acceleration.")
        elif args.last_gpu:
            sleap.nn.system.use_last_gpu()
            logger.info("Using the last GPU for acceleration.")
        else:
            if args.gpu == "auto":
                # Values are fractions of memory *available*, so the
                # least-used GPU has the largest value (argmax, not argmin).
                free = sleap.nn.system.get_gpu_memory()
                gpu_ind = max(free, key=free.get)
            else:
                gpu_ind = int(args.gpu)
            sleap.nn.system.use_gpu(gpu_ind)
            logger.info(f"Using GPU {gpu_ind} for acceleration.")
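The selection step on its own, as a hedged sketch: given a get_gpu_memory-style dict of {index: fraction available}, the least-used GPU is the one with the *largest* available fraction. Taking the max over the dict keys (rather than argmin over a list of values) also handles non-contiguous GPU indices. The helper name here is hypothetical:

```python
def pick_least_used_gpu(memory_dict: dict) -> int:
    """Return the GPU index with the largest fraction of memory
    available, i.e. the lowest memory usage."""
    # max() over the keys, ranked by each key's available fraction,
    # returns the actual GPU index even if indices are not 0..N.
    return max(memory_dict, key=memory_dict.get)


# Example: GPU 1 is fully free, GPU 0 is three-quarters used.
print(pick_least_used_gpu({0: 0.25, 1: 1.0}))  # → 1
```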

And then a similar setup for inference here:

sleap/sleap/nn/inference.py

Lines 4270 to 4279 in 2688d56

    # Setup devices.
    if args.cpu or not sleap.nn.system.is_gpu_system():
        sleap.nn.system.use_cpu_only()
    else:
        if args.first_gpu:
            sleap.nn.system.use_first_gpu()
        elif args.last_gpu:
            sleap.nn.system.use_last_gpu()
        else:
            sleap.nn.system.use_gpu(args.gpu)

Comment on lines 226 to 227
# Append percent of GPU available to GPU ID
memory_dict[gpu_id] = round(int(available_memory) / int(total_memory), 4)
Contributor

@roomrys roomrys Aug 24, 2022


Do we want this to be a percentage?

EDIT: Do we want this to be a fraction instead of just the available_memory?

Contributor Author


I don't know! I wasn't sure if having it as a percentage would be helpful/more readable/understandable so I just left it that way.

Contributor Author


I also realize now that the comment says percentage but it's not a percent value that's given, so we can change the comment or the value. I have no preference and also don't know what best practice is lol

Contributor Author


I think as it is currently written it is a fraction, no? The ratio:

Available Memory / Total Memory
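For concreteness, the two representations under discussion, with made-up numbers (2000 MiB free of 8000 MiB total):

```python
available_memory, total_memory = 2000, 8000  # MiB, illustrative values

# What the code currently stores: a fraction in [0, 1].
fraction = round(available_memory / total_memory, 4)

# What the comment says: a percentage in [0, 100].
percentage = round(100 * available_memory / total_memory, 2)

print(fraction, percentage)  # → 0.25 25.0
```

Either works for picking the least-used GPU; the comment and the value just need to agree.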

@roomrys roomrys requested a review from talmo August 24, 2022 20:50
@roomrys
Contributor

roomrys commented Aug 24, 2022

Although we cannot test GPU features through GitHub Actions yet, I tested the new additions locally and they passed.

@jmdelahanty
Contributor Author

Neato!

@jmdelahanty
Contributor Author

This is so much cleaner! I don't know why I didn't think of using the list indices as the index of the card. That makes a lot more sense. Nice one Liezl!
