
RuntimeError during embedding #3

@jacgonisa


Hello,

I am having an issue while running unicore:

unicore createdb -g 012OMARK/AllProteomes/ 013UNICORE/proteome_db prostt5/weights/

Using device: cuda:0

Loading T5 from: prostt5/weights/

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

/home/jg2070/miniforge3/envs/unicore/etc/predict_3Di_encoderOnly.py:175: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

  state = torch.load(checkpoint_p, map_location=device)

Using models in full-precision.

########################################

Example sequence: unicore_2880843

MQNNCFKIATLCMEPPKFDFEMVLERKRLKDKQKLLKQYRLLEGFVGPTVGTTVTGTNTDIGEADADGGPQEGTTAESDASTQETTEKFTVEEFKDLRRAEGVEDYDDYDFSGELTDDDYIEN

########################################

Total number of sequences: 4952781

Average sequence length: 454.02570131810796

Number of sequences >1000: 397046

  0%| | 0/4952781 [00:00<?, ?it/s]

RuntimeError during embedding for unicore_2793106 (L=315894)

  0%| | 1/4952781 [00:02<2867:45:30, 2.08s/it]

RuntimeError during embedding for unicore_2022362 (L=117904)

  0%| | 2/4952781 [00:02<1664:13:29, 1.21s/it]

RuntimeError during embedding for unicore_263579 (L=113479)

  0%| | 3/4952781 [00:02<1047:25:31, 1.31it/s]
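For context, the "RuntimeError during embedding" lines suggest the embedding loop catches a per-sequence failure (on GPU this is typically a CUDA out-of-memory on very long inputs) and moves on. A minimal sketch of that pattern, assuming nothing about unicore's internals — `embed_fn` and `fake_embed` below are placeholders for the real ProstT5 forward pass, not part of the actual script:

```python
def embed_all(seqs, embed_fn):
    """Embed each sequence, skipping any that raise a RuntimeError
    (on GPU this is typically CUDA out-of-memory for long inputs).

    embed_fn stands in for the real model forward pass.
    """
    embeddings, failed = {}, {}
    for seq_id, seq in seqs.items():
        try:
            embeddings[seq_id] = embed_fn(seq)
        except RuntimeError:
            # a real GPU loop would also call torch.cuda.empty_cache() here
            print(f"RuntimeError during embedding for {seq_id} (L={len(seq)})")
            failed[seq_id] = len(seq)
    return embeddings, failed

def fake_embed(seq, limit=100_000):
    # stand-in model: fail on sequences longer than `limit`, like the log
    if len(seq) > limit:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [0.0] * 3  # dummy embedding

emb, failed = embed_all({"short": "M" * 50, "huge": "A" * 315894}, fake_embed)
```

If the failures really are OOM, pre-filtering or truncating sequences above a length cutoff before embedding would let the remaining ~4.9M sequences proceed.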

Some details of the GPU I am using:

>>> torch.cuda.get_device_name(0)
'NVIDIA A100-SXM4-80GB'
>>> device = torch.device("cuda:0")
>>> total_memory = torch.cuda.get_device_properties(device).total_memory
>>> total_memory
84974239744
>>> print(f"Total GPU Memory: {total_memory / (1024 ** 3):.2f} GB")
Total GPU Memory: 79.14 GB
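A back-of-the-envelope check (assuming a single full fp32 L×L self-attention matrix; the real footprint depends on precision, number of heads, and implementation) suggests why even 80 GB cannot hold the longest failing sequence:

```python
# Rough figures only: full self-attention materializes roughly an
# L x L score matrix (per head, per layer), so memory grows with L^2.
L = 315894                      # longest failing sequence from the log
attn_bytes = L * L * 4          # one fp32 L x L attention matrix
gib = attn_bytes / (1024 ** 3)
print(f"{gib:.0f} GiB")         # ~372 GiB, far above the 79.14 GiB available
```

So for sequences in the 100k+ range, the OOM is expected regardless of batch size, and skipping or chunking those sequences is probably unavoidable.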

Should I perhaps try compiling foldseek with CUDA support and running with --use-foldseek instead?

All the best
