NLP with Transformers: Fundamentals and Core Applications, Chapter 107
7. Step 4: Evaluating the Model
Once the model is trained, evaluate its performance on the held-out evaluation set using metrics such as accuracy and F1-score.
```python
from sklearn.metrics import classification_report

# Predict on the evaluation set
predictions = trainer.predict(eval_dataset)

# Convert the raw logits to label indices (highest-scoring class per example)
predicted_labels = predictions.predictions.argmax(-1)

# Print the per-class classification report
print(classification_report(eval_dataset['label'], predicted_labels))
```

Code breakdown:
- First, we import classification_report from scikit-learn's metrics module, which produces a detailed per-class performance summary.
- The code then performs these key steps:
  - Uses trainer.predict() to generate predictions for the evaluation dataset.
  - Converts the raw predictions (logits) into label indices with argmax(-1), which selects the category with the highest score along the last axis. Because softmax preserves the ordering of scores, this is also the category with the highest probability.
  - Generates a classification report by comparing the predicted labels against the true labels in eval_dataset.
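To make the argmax(-1) step concrete, here is a small sketch on made-up logits. The numbers and the class ordering (0=World, 1=Sports, 2=Business, 3=Sci/Tech) are assumptions for illustration, not outputs of the trained model. It also checks that applying softmax first does not change which label is chosen.

```python
import numpy as np

# Made-up logits for 3 examples and 4 classes (illustration only).
# Assumed class order: 0=World, 1=Sports, 2=Business, 3=Sci/Tech.
logits = np.array([
    [2.1, -0.3, 0.4, 0.0],
    [-1.0, 3.2, 0.1, 0.5],
    [0.2, 0.1, 0.3, 1.7],
])


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


probs = softmax(logits)

# argmax over the last axis picks one class index per example.
labels_from_logits = logits.argmax(-1)
labels_from_probs = probs.argmax(-1)  # same result: softmax is monotonic

print(labels_from_logits)  # [0 1 3]
```

Skipping the softmax is therefore safe when you only need the predicted label; you need it only if you want calibrated-looking probabilities.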
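The classification report lists the following metrics for each category, followed by overall accuracy and macro- and weighted-average rows:
- Precision: Of the samples predicted as a given category, the fraction that actually belong to it
- Recall: Of the samples that actually belong to a given category, the fraction the model correctly identified
- F1-score: The harmonic mean of precision and recall
- Support: The number of true samples of each category in the evaluation set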
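The per-class definitions above can be checked by hand. The sketch below counts true positives, false positives, and false negatives for a made-up set of 4-class labels and compares the result with scikit-learn's precision_recall_fscore_support, which computes the same per-class numbers that classification_report prints. The labels and the helper per_class_metrics are hypothetical, chosen only for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up gold and predicted labels for a 4-class problem (illustration only).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
labels = [0, 1, 2, 3]


def per_class_metrics(y_true, y_pred, label):
    """Hypothetical helper: precision, recall, F1, and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as `label`
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as `label`
    fn = sum(1 for t, p in pairs if t == label and p != label)  # `label` samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


manual = [per_class_metrics(y_true, y_pred, c) for c in labels]

# scikit-learn's per-class numbers (average=None is the default).
p, r, f, s = precision_recall_fscore_support(y_true, y_pred, labels=labels, zero_division=0)

for c, (prec, rec, f1, sup) in zip(labels, manual):
    print(f"class {c}: precision={prec:.2f} recall={rec:.2f} f1={f1:.2f} support={sup}")
```

For class 1, for example, two of the three predictions of class 1 are correct (precision 2/3) and both true class-1 samples are found (recall 1.0), giving an F1-score of 0.8.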
This evaluation step is crucial for understanding how well your model performs on each of the news categories, which in this case are World, Sports, Business, and Sci/Tech.
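The classification report is printed once, after prediction. If you also want the accuracy and F1-score mentioned at the start of this step reported whenever the Trainer evaluates, one option is a metrics function passed to the Trainer's compute_metrics argument. The sketch below assumes the function receives something that unpacks into (logits, label_ids), as the Trainer's evaluation prediction object does; the choice of macro-averaged F1 and the toy inputs are assumptions for illustration. It does not import transformers, so it can be tested on its own.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def compute_metrics(eval_pred):
    """Sketch of a metrics hook for Trainer(compute_metrics=...).

    Assumes eval_pred unpacks into (logits, label_ids).
    Macro-averaged F1 weights every category equally (an assumed choice).
    """
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }


# Toy check with made-up logits for 5 examples and 4 classes.
toy_logits = np.array([
    [2.0, 0.1, 0.0, 0.0],
    [0.0, 1.5, 0.2, 0.0],
    [0.1, 0.0, 1.2, 0.3],
    [0.0, 0.2, 0.1, 0.9],
    [0.0, 0.1, 0.4, 0.8],
])
toy_labels = np.array([0, 1, 2, 3, 2])
print(compute_metrics((toy_logits, toy_labels)))
```

In a real run you would pass it as Trainer(..., compute_metrics=compute_metrics); trainer.evaluate() then returns these values under keys prefixed with eval_, such as eval_accuracy.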