
RuntimeError during embedding #3

@jacgonisa


Hello,

I am having an issue while running unicore:

unicore createdb -g 012OMARK/AllProteomes/ 013UNICORE/proteome_db prostt5/weights/

Using device: cuda:0

Loading T5 from: prostt5/weights/

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

/home/jg2070/miniforge3/envs/unicore/etc/predict_3Di_encoderOnly.py:175: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

  state = torch.load(checkpoint_p, map_location=device)

Using models in full-precision.

########################################

Example sequence: unicore_2880843

MQNNCFKIATLCMEPPKFDFEMVLERKRLKDKQKLLKQYRLLEGFVGPTVGTTVTGTNTDIGEADADGGPQEGTTAESDASTQETTEKFTVEEFKDLRRAEGVEDYDDYDFSGELTDDDYIEN

########################################

Total number of sequences: 4952781

Average sequence length: 454.02570131810796

Number of sequences >1000: 397046

  0%| | 0/4952781 [00:00<?, ?it/s]

RuntimeError during embedding for unicore_2793106 (L=315894)

  0%| | 1/4952781 [00:02<2867:45:30, 2.08s/it]

RuntimeError during embedding for unicore_2022362 (L=117904)

  0%| | 2/4952781 [00:02<1664:13:29, 1.21s/it]

RuntimeError during embedding for unicore_263579 (L=113479)

  0%| | 3/4952781 [00:02<1047:25:31, 1.31it/s]
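For context, the "RuntimeError during embedding" lines suggest the embedding loop catches a per-sequence failure (on GPU this is typically a CUDA out-of-memory on very long inputs) and moves on. A minimal sketch of that pattern, assuming nothing about unicore's internals — `embed_fn` and `fake_embed` below are placeholders for the real ProstT5 forward pass, not part of the actual script:

```python
def embed_all(seqs, embed_fn):
    """Embed each sequence, skipping any that raise a RuntimeError
    (on GPU this is typically CUDA out-of-memory for long inputs).

    embed_fn stands in for the real model forward pass.
    """
    embeddings, failed = {}, {}
    for seq_id, seq in seqs.items():
        try:
            embeddings[seq_id] = embed_fn(seq)
        except RuntimeError:
            # a real GPU loop would also call torch.cuda.empty_cache() here
            print(f"RuntimeError during embedding for {seq_id} (L={len(seq)})")
            failed[seq_id] = len(seq)
    return embeddings, failed

def fake_embed(seq, limit=100_000):
    # stand-in model: fail on sequences longer than `limit`, like the log
    if len(seq) > limit:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [0.0] * 3  # dummy embedding

emb, failed = embed_all({"short": "M" * 50, "huge": "A" * 315894}, fake_embed)
```

If the failures really are OOM, pre-filtering or truncating sequences above a length cutoff before embedding would let the remaining ~4.9M sequences proceed.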

Some details of the GPU I am using:

>>> torch.cuda.get_device_name(0)
'NVIDIA A100-SXM4-80GB'
>>> device = torch.device("cuda:0")
>>> total_memory = torch.cuda.get_device_properties(device).total_memory
>>> total_memory
84974239744
>>> print(f"Total GPU Memory: {total_memory / (1024 ** 3):.2f} GB")
Total GPU Memory: 79.14 GB
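A back-of-the-envelope check (assuming a single full fp32 L×L self-attention matrix; the real footprint depends on precision, number of heads, and implementation) suggests why even 80 GB cannot hold the longest failing sequence:

```python
# Rough figures only: full self-attention materializes roughly an
# L x L score matrix (per head, per layer), so memory grows with L^2.
L = 315894                      # longest failing sequence from the log
attn_bytes = L * L * 4          # one fp32 L x L attention matrix
gib = attn_bytes / (1024 ** 3)
print(f"{gib:.0f} GiB")         # ~372 GiB, far above the 79.14 GiB available
```

So for sequences in the 100k+ range, the OOM is expected regardless of batch size, and skipping or chunking those sequences is probably unavoidable.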

Should I perhaps try compiling foldseek with CUDA support and running with --use-foldseek instead?

All the best
