R wrapper around the EasyNMT Python library, providing "Easy to use, state-of-the-art Neural Machine Translation for 100+ languages" locally.

thieled/easieRnmt

easieRnmt

The goal of easieRnmt is to provide a user-friendly R wrapper around the EasyNMT Python library, which provides “Easy to use, state-of-the-art Neural Machine Translation for 100+ languages”, running entirely on a local machine.

Installation

You can install the development version of easieRnmt from GitHub with:

# install.packages("pak")
pak::pak("thieled/easieRnmt")

The package runs EasyNMT from a conda environment named ‘r-easynmt’. The following function installs and sets up everything for you. It also automatically installs the correct PyTorch version, with CUDA (Nvidia GPU) support if this is available on your machine:

easieRnmt::install_easynmt()
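
After installation, it can be useful to confirm that the ‘r-easynmt’ environment is visible from R. The following is a minimal sketch, assuming the reticulate package is installed and can locate your conda installation. The helper name `easynmt_env_exists()` is hypothetical and not part of easieRnmt:

```r
# Hypothetical helper (not part of easieRnmt): checks whether a conda
# environment with the given name is known to reticulate.
easynmt_env_exists <- function(env = "r-easynmt") {
  # Without reticulate we cannot query conda from R
  if (!requireNamespace("reticulate", quietly = TRUE)) return(FALSE)
  # conda_list() raises an error if no conda binary is found, so guard it
  envs <- tryCatch(reticulate::conda_list(), error = function(e) NULL)
  !is.null(envs) && env %in% envs$name
}

easynmt_env_exists()
```

A result of `FALSE` only means that reticulate could not find the environment; it does not by itself indicate why.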

Note that the package requires a C++ compiler (e.g. g++). If you are a Windows user, please make sure to install an Rtools version that matches your R version.
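
A quick way to see whether a C++ compiler is reachable from your R session is to look it up on the PATH. This is a rough sketch using base R only. The helper name and the list of compiler names are assumptions, and a compiler that is installed but not on the PATH (as can happen with Rtools) will not be detected this way:

```r
# Hypothetical helper: TRUE if any of the candidate compiler executables
# can be found on the PATH of the current R session.
has_cpp_compiler <- function(candidates = c("g++", "clang++", "c++")) {
  # Sys.which() returns "" for executables it cannot find
  any(nzchar(Sys.which(candidates)))
}

has_cpp_compiler()
```

If you have the pkgbuild package installed, `pkgbuild::has_build_tools(debug = TRUE)` performs a more thorough check of the build toolchain.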

Example

The package easieRnmt handles all preprocessing of your text data: sentence tokenization, careful cleaning, emoji replacement, language detection, and handling of ambiguous cases.

To avoid compatibility conflicts with the fasttext Python library on Windows, it uses the fastText R package for language detection.

It supports efficient batch processing and ensures that only language-homogeneous batches are processed, as the models assume that the language is consistent within each batch.
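
To illustrate the idea of language-homogeneous batches, here is a minimal base R sketch. It is not the package's internal code: the function name, the `batch_size` argument, and the example language labels are assumptions, and the labels stand in for the output of a language detector:

```r
# Illustrative only: group sentence indices by detected language, then cut
# each language group into batches of at most `batch_size` sentences.
make_language_batches <- function(langs, batch_size = 2L) {
  by_lang <- split(seq_along(langs), langs)
  batches <- list()
  for (lang in names(by_lang)) {
    idx <- by_lang[[lang]]
    # ceiling(position / batch_size) assigns a chunk number to each sentence
    chunks <- split(idx, ceiling(seq_along(idx) / batch_size))
    for (chunk in chunks) {
      batches[[length(batches) + 1L]] <- list(lang = lang, idx = chunk)
    }
  }
  batches
}

# Assumed detector output for five sentences
langs <- c("de", "es", "de", "ar", "de")
batches <- make_language_batches(langs, batch_size = 2L)
for (b in batches) cat(b$lang, ":", b$idx, "\n")
```

Each batch carries the original sentence indices, so no batch mixes languages and the results can later be put back in input order.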

Finally, it glues the translated sentences back together to match the input format, restores the order of the input, and returns either a data.table (including the cleaned text and additional information) or only the translated strings.
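
The split-translate-reassemble pattern can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: the regex-based sentence splitter is a crude stand-in for a proper tokenizer, and `toupper()` is a placeholder for the translation step:

```r
# Crude sentence splitter: break after ., ! or ? followed by whitespace
split_sentences <- function(texts) {
  strsplit(texts, "(?<=[.!?])\\s+", perl = TRUE)
}

# Split each text into sentences, apply `translate_fun` to all sentences,
# then paste the results back together per input text, in input order.
translate_by_sentence <- function(texts, translate_fun) {
  pieces <- split_sentences(texts)
  # Remember which input text each sentence came from
  doc_id <- rep(seq_along(texts), lengths(pieces))
  translated <- translate_fun(unlist(pieces))
  # Fixed factor levels keep empty inputs and preserve the input order
  groups <- split(translated, factor(doc_id, levels = seq_along(texts)))
  unname(vapply(groups, paste, character(1), collapse = " "))
}

out <- translate_by_sentence(c("First one. Second one!", "Only one."), toupper)
print(out)
```

The result has one element per input text, regardless of how many sentences each text contained.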

# Minimal example
sentences <- c('Dies ist ein Satz in Deutsch. Und noch ein Satz.',   # This is a German sentence
               'Esta es una oración en español.', # This is a Spanish sentence
               "هذه جملة باللغة العربية!!!")       # This is an Arabic sentence

library(easieRnmt)

# Initialize easieRnmt
easieRnmt::initialize_easynmt()

# Translate
res <- easieRnmt::translate(sentences,
                            model = 'opus-mt',
                            targ_lang = "en",
                            return_string = TRUE)
# Print results
print(res)

### Output: 

# Running fastText language detection...
#   |                                                  | 0 % ~calculating  Processing language: ar
# Translating batches: 100%|██████████| 1/1 [00:00<00:00,  5.09batch/s]
#   |+++++++++++++++++                                 | 33% ~00s          Processing language: de
# Translating batches: 100%|██████████| 1/1 [00:00<00:00, 13.80batch/s]
#   |++++++++++++++++++++++++++++++++++                | 67% ~00s          Processing language: es
# Translating batches: 100%|██████████| 1/1 [00:00<00:00, 19.86batch/s]
#   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
# > print(res)
# [1] "This is a sentence in German. And another sentence." "This is a sentence in Spanish."                     
# [3] "That's a sentence in Arabic!" 
