Step 4: Exploring Additional Language Pairs

MarianMT supports various language pairs. You can experiment with models such as:

Helsinki-NLP/opus-mt-en-de for English to German.

Helsinki-NLP/opus-mt-fr-en for French to English.

Simply replace the model name in the model_name variable to load a different language pair. Here’s an example for English to German:

from transformers import MarianMTModel, MarianTokenizer

# Load the English-to-German checkpoint
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate a sentence
text_to_translate = ["Welcome to the world of transformers!"]
inputs = tokenizer(text_to_translate, return_tensors="pt", padding=True)
translated_outputs = model.generate(**inputs)
translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_outputs]

print(f"Translated Text (EN to DE): {translated_texts[0]}")

Let's break down this code example:

1. Model Setup:

Sets up the English to German translation model using "Helsinki-NLP/opus-mt-en-de"

Initializes both the tokenizer and model from the pre-trained weights

2. Translation Process:

Creates a list containing one sample sentence: "Welcome to the world of transformers!"

Uses the tokenizer to convert the text into padded PyTorch tensors of token IDs that the model accepts

Generates the translation using the model's generate method

Decodes the output back into readable text, skipping special tokens

3. Output:

Prints the first translated sentence, which is the German rendering of the English input
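
Since switching language pairs only means changing the model name, the steps above can be wrapped in a small reusable function. The following is a minimal sketch, not part of the original tutorial: the helper names marian_model_name and translate are hypothetical, and it assumes the pair you request follows the Helsinki-NLP/opus-mt-{src}-{tgt} naming seen in the two models listed earlier. Not every combination of language codes has a published checkpoint, so loading may fail for some pairs.

```python
def marian_model_name(src: str, tgt: str) -> str:
    """Build a checkpoint name such as 'Helsinki-NLP/opus-mt-en-de'.

    Hypothetical helper: it only formats the string and does not check
    that a model with this name actually exists.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(texts, src, tgt):
    """Translate a list of sentences from language `src` to language `tgt`."""
    # Imported here so the name helper can be used without loading transformers.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = marian_model_name(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Same steps as the example above: tokenize, generate, decode.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]
```

With this in place, a call such as translate(["Bonjour le monde !"], "fr", "en") would load Helsinki-NLP/opus-mt-fr-en and return a list with one English sentence. Each call reloads the model, which keeps the sketch short; for repeated use you would likely want to load the tokenizer and model once and reuse them.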