NLP with Transformers: Advanced Techniques and Multimodal Applications
Chapter 25

Step 4: Exploring Additional Language Pairs

MarianMT supports various language pairs. You can experiment with models such as:

  • Helsinki-NLP/opus-mt-en-de for English to German.
  • Helsinki-NLP/opus-mt-fr-en for French to English.

Simply replace the model name in the model_name variable to load a different language pair. Here’s an example for English to German:

from transformers import MarianMTModel, MarianTokenizer

# Load the English-to-German model and its tokenizer
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate a sentence
text_to_translate = ["Welcome to the world of transformers!"]
inputs = tokenizer(text_to_translate, return_tensors="pt", padding=True)
translated_outputs = model.generate(**inputs)
translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_outputs]

print(f"Translated Text (EN to DE): {translated_texts[0]}")

Let's break down this code example:

1. Model Setup:

  • Sets up the English to German translation model using "Helsinki-NLP/opus-mt-en-de"
  • Initializes both the tokenizer and model from the pre-trained weights

2. Translation Process:

  • Creates a sample text array with one sentence: "Welcome to the world of transformers!"
  • Uses the tokenizer to convert the text into padded PyTorch tensors of token IDs that the model can process
  • Generates the translation using the model's generate method
  • Decodes the output back into readable text, skipping special tokens

3. Output:

  • Prints the first translated string, which is the German rendering of the English input
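
Because switching language pairs only means changing the model name, the steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the original example: the function names opus_mt_model_name and translate are hypothetical, and the naming pattern is assumed from the two model names listed earlier. Not every source/target combination has a published model, so loading may fail for some pairs.

```python
from typing import List


def opus_mt_model_name(src: str, tgt: str) -> str:
    """Build a model name such as 'Helsinki-NLP/opus-mt-en-de'.

    Hypothetical helper: the pattern is assumed from the model names
    shown in this section and is not guaranteed to exist for every pair.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(texts: List[str], src: str, tgt: str) -> List[str]:
    """Translate a batch of sentences from `src` to `tgt` with MarianMT."""
    # Imported here so the name helper can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = opus_mt_model_name(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Pad the batch to a common length and truncate overly long inputs.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)

    # Decode every generated sequence back into plain text.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling translate(["Bienvenue dans le monde des transformers !"], "fr", "en") would then download and use the French to English model on first use. Note that this sketch reloads the model on every call; for repeated translations it would be better to load the tokenizer and model once and reuse them.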