7. Step 4: Evaluating the Model

Once the model is trained, evaluate its performance on held-out data (here, eval_dataset) using metrics such as accuracy, precision, recall, and F1-score.

    from sklearn.metrics import classification_report

    # Predict on the evaluation set
    predictions = trainer.predict(eval_dataset)

    # Convert predictions to labels
    predicted_labels = predictions.predictions.argmax(-1)

    # Print classification report
    print(classification_report(eval_dataset['label'], predicted_labels))

Code breakdown:

First, we import classification_report from scikit-learn's metrics module, which produces a per-class summary of the model's performance.

The code performs these key steps:

Uses trainer.predict() to generate predictions for the evaluation dataset

Converts the raw predictions (one logit per category for each sample) into label indices using argmax(-1), which selects the category with the highest score along the last axis

Generates a comprehensive classification report by comparing the predicted labels against the actual labels in eval_dataset
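To make the argmax step concrete, here is a minimal sketch using NumPy. The logit values are made up for illustration and do not come from the trained model; the array simply stands in for predictions.predictions. It also shows that applying softmax first would not change the selected label, so working directly on logits is sufficient.

```python
import numpy as np

# Hypothetical logits for 3 samples over 4 categories (illustrative values only)
logits = np.array([
    [2.1, -0.3, 0.5, 0.0],
    [-1.0, 3.2, 0.1, 0.4],
    [0.2, 0.1, -0.5, 1.7],
])

# argmax over the last axis picks the highest-scoring category per sample
predicted_labels = logits.argmax(-1)
print(predicted_labels)  # [0 1 3]

# Softmax is monotonic, so converting logits to probabilities
# leaves the argmax unchanged
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)
same_labels = bool((probs.argmax(-1) == predicted_labels).all())
print(same_labels)  # True
```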

The classification_report will provide important metrics including:

Precision: The proportion of samples predicted as a category that actually belong to it

Recall: The proportion of actual positives correctly identified

F1-score: The harmonic mean of precision and recall

Support: The number of samples for each category
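The four metrics above can be checked by hand for a single category. The sketch below uses small made-up label lists (not the tutorial's data) and compares the manual calculation with scikit-learn's precision_recall_fscore_support.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels for illustration only; these are not model outputs
y_true = [0, 0, 1, 1, 1, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 2, 3]

cls = 1  # category to inspect

# Count true positives, false positives, and false negatives for this category
tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)                          # correct among predicted
recall = tp / (tp + fn)                             # found among actual
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
support = tp + fn                                   # number of actual samples

# scikit-learn's values for the same category, returned as length-1 arrays
sk_p, sk_r, sk_f, sk_s = precision_recall_fscore_support(
    y_true, y_pred, labels=[cls]
)

print(precision, recall, f1, support)
print(sk_p[0], sk_r[0], sk_f[0], sk_s[0])
```

Both lines should print the same values, which confirms that each row of the report is simply these per-category counts.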

This evaluation step is crucial for understanding how well your model performs across the different news categories, which in this case are World, Sports, Business, and Sci/Tech.
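To make the report easier to read, category names can be passed to classification_report. The sketch below assumes the label indices 0 to 3 map to World, Sports, Business, and Sci/Tech in that order; verify this against your own dataset's label definitions before relying on it. The label lists are toy values, not model outputs.

```python
from sklearn.metrics import classification_report

# Assumed index-to-name mapping; confirm it matches your dataset
category_names = ["World", "Sports", "Business", "Sci/Tech"]

# Toy labels for illustration only
y_true = [0, 0, 1, 1, 1, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 2, 3]

# output_dict=True returns the report as a nested dict instead of a string,
# which makes individual metrics easy to read programmatically
report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1, 2, 3],
    target_names=category_names,
    output_dict=True,
)

print("Accuracy:", report["accuracy"])
print("Macro F1:", report["macro avg"]["f1-score"])
print("Sports recall:", report["Sports"]["recall"])
```

With the tutorial's own objects, y_true and y_pred would be eval_dataset['label'] and predicted_labels respectively.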