NLP with Transformers: Fundamentals and Core Applications, Chapter 108

8. Step 5: Testing with New Data


You can test your model on custom news articles to see how well it categorizes them.

# Define a custom news article
custom_text = "The stock market saw significant gains today as tech stocks rallied."
# Tokenize and predict
inputs = tokenizer(custom_text, return_tensors="pt", truncation=True, padding=True)
outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
# Map predicted label to category
categories = ['World', 'Sports', 'Business', 'Sci/Tech']
print(f"Predicted Category: {categories[predicted_label]}")

Let's break down this code that tests the BERT model with new data:

1. Input Definition:

custom_text = "The stock market saw significant gains today as tech stocks rallied."

This line creates a sample news article text that we want to categorize.

2. Processing the Input:

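  • The tokenizer() call converts the text into the input tensors BERT expects (token IDs and an attention mask), using these parameters:
  • return_tensors="pt": returns PyTorch tensors
  • truncation=True: truncates the text if it exceeds the model's maximum input length
  • padding=True: pads each sequence to the length of the longest one in the batch; with a single text, as here, it has no effect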
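To make truncation and padding concrete, here is a toy sketch in plain Python. It is not the tokenizer's actual implementation: the function name pad_and_truncate, the token IDs, and the max_length of 6 are made up for illustration, and special tokens are ignored.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Toy illustration: truncate each sequence, then pad to the longest one."""
    # Truncation: cut every sequence down to at most max_length tokens
    truncated = [ids[:max_length] for ids in batch]
    # Padding: extend shorter sequences to the longest remaining length
    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token IDs for a batch of two texts of different lengths
batch = [[11, 12, 13, 14, 15, 16, 17, 18], [21, 22, 23]]
input_ids, attention_mask = pad_and_truncate(batch, max_length=6)
print(input_ids)
print(attention_mask)
```

The first sequence is cut to six tokens and the second is padded to match it. Padding only matters once several texts are passed to the tokenizer together.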

3. Making Predictions:

  • model(**inputs) runs the tokenized text through the BERT model
  • outputs.logits.argmax(-1).item() takes the index of the largest logit (the raw, unnormalized score for each category), which is also the most probable category, and converts it to a Python integer
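The model returns logits, not probabilities. The sketch below uses plain Python and made-up logit values (not real model output) to show that applying softmax yields probabilities without changing which index wins. With PyTorch tensors, the equivalent call would be torch.softmax(outputs.logits, dim=-1).

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for the four categories (illustrative values only)
logits = [-1.2, -0.5, 3.1, 0.4]
probs = softmax(logits)

# argmax over logits and argmax over probabilities pick the same index
predicted_label = max(range(len(logits)), key=lambda i: logits[i])
most_probable = max(range(len(probs)), key=lambda i: probs[i])

print(predicted_label, most_probable)
print([round(p, 3) for p in probs])
```

Because softmax preserves the ordering of the scores, argmax can be applied to the logits directly when only the predicted class is needed. The probabilities are useful when you also want a confidence estimate.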

4. Category Mapping:

  • The code maps the numerical prediction to one of four categories: World, Sports, Business, or Sci/Tech. The order of the list must match the label indices the model was trained with
  • Finally, it prints the predicted category for the input text
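One way to make the index-to-name mapping explicit is a dictionary with a guard against unexpected indices. This is a minimal sketch: the names ID2LABEL and label_to_category are hypothetical, and the index order is assumed to be the same as in the categories list above.

```python
# Assumed mapping, taken from the order of the categories list in the example
ID2LABEL = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}


def label_to_category(predicted_label, id2label=ID2LABEL):
    """Translate a predicted label index into its category name."""
    if predicted_label not in id2label:
        # Fail loudly instead of silently returning the wrong category
        raise ValueError(f"Unexpected label index: {predicted_label}")
    return id2label[predicted_label]


print(f"Predicted Category: {label_to_category(2)}")
```

A dictionary keeps the index next to its name, so a mismatch with the training labels is easier to spot than in a positional list.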

This step puts the BERT model to practical use: it assigns any new news article to one of the four predefined categories.