README.md (1 addition & 0 deletions)
@@ -34,6 +34,7 @@ The project is production-oriented and comes with [backward compatibility guaran
* **Lightweight on disk**<br/>Quantization can make the models 4 times smaller on disk with minimal accuracy loss.
* **Simple integration**<br/>The project has few dependencies and exposes simple APIs in [Python](https://opennmt.net/CTranslate2/python/overview.html) and C++ to cover most integration needs.
* **Configurable and interactive decoding**<br/>[Advanced decoding features](https://opennmt.net/CTranslate2/decoding.html) allow autocompleting a partial sequence and returning alternatives at a specific location in the sequence.
+* **Support tensor parallelism for distributed inference**
Some of these features are difficult to achieve with standard deep learning frameworks and are the motivation for this project.
docs/parallel.md (43 additions & 1 deletion)
@@ -42,8 +42,50 @@ Parallelization with multiple Python threads is possible because all computation
## Model and tensor parallelism
+Models used with [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split across multiple GPUs.
+This is useful when the model is too large to fit on a single GPU.
-These types of parallelism are not yet implemented in CTranslate2.
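The added documentation describes splitting a model across several GPUs. As a rough illustration of the general idea behind tensor parallelism, the sketch below splits the weight matrix of one linear layer by columns, lets each simulated device compute its own slice of the output, and then concatenates the slices. It is a conceptual sketch only: plain Python lists stand in for GPU tensors, and every name in it (`matmul`, `split_columns`, `column_parallel_linear`) is hypothetical and not part of the CTranslate2 API. It does not claim to reflect how this change implements the feature.

```python
# Conceptual sketch: plain Python lists stand in for GPU tensors.
# All function names here are hypothetical, not CTranslate2 APIs.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split the columns of w into num_shards contiguous, near-equal blocks."""
    num_cols = len(w[0])
    base, extra = divmod(num_cols, num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        width = base + (1 if i < extra else 0)
        shards.append([row[start:start + width] for row in w])
        start += width
    return shards


def column_parallel_linear(x, w, num_shards):
    """Compute x @ w with each column block of w held by a simulated device."""
    # Each simulated device stores and multiplies only its own slice of w.
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Gather step: concatenate the per-device outputs along the column axis.
    return [sum((part[r] for part in partial_outputs), []) for r in range(len(x))]


x = [[1.0, 2.0, 3.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0],
     [2.0, 1.0, 0.0, 1.0]]

full = matmul(x, w)                                    # single-device result
sharded = column_parallel_linear(x, w, num_shards=2)   # two simulated devices
assert sharded == full
```

Each simulated device holds only a fraction of the weights, which is why this kind of split can help when a model does not fit on a single GPU. The price is the gather step, which on real hardware is communication between devices.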