NLP with Transformers: Advanced Techniques and Multimodal Applications, Chapter 96

Step 5: Build the NER Pipeline


Create a pipeline that will handle three essential tasks in sequence:

  1. Process text input by breaking it down into tokens that the model can understand
  2. Use the fine-tuned model to predict and identify entities within the text, including their types and confidence scores
  3. Map these predictions back to the original text, ensuring that entity boundaries and classifications are properly aligned with the input text's structure

This pipeline will serve as the core component for transforming raw text into structured entity information that can be used in downstream applications.
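To make the three stages concrete, here is a toy sketch of the data flow. It assumes nothing about the real implementation: the regex tokenizer and the `TOY_LABELS` lookup are hypothetical stand-ins for the model's subword tokenizer and the fine-tuned network, and are there only to show how character offsets carry predictions back to the input text.

```python
import re

# Toy stand-ins, for illustration only: a real pipeline uses the model's
# subword tokenizer and the fine-tuned network instead of these helpers.
TOY_LABELS = {"Barack": "B-PER", "Obama": "I-PER", "Hawaii": "B-LOC"}  # hypothetical

def tokenize_with_offsets(text):
    """Stage 1: split text into tokens, remembering character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def predict_labels(tokens):
    """Stage 2: label each token (a dictionary lookup stands in for the model)."""
    return [TOY_LABELS.get(tok, "O") for tok, _, _ in tokens]

def map_to_text(text, tokens, labels):
    """Stage 3: use the offsets to point each labeled token back at the input."""
    return [
        {"label": label, "start": start, "end": end, "text": text[start:end]}
        for (_, start, end), label in zip(tokens, labels)
        if label != "O"  # "O" marks tokens outside any entity
    ]

text = "Barack Obama was born in Hawaii."
tokens = tokenize_with_offsets(text)
labels = predict_labels(tokens)
spans = map_to_text(text, tokens, labels)
for span in spans:
    print(span)
```

Because every prediction keeps its `start` and `end` offsets, slicing the original string always recovers the exact characters the label refers to, regardless of how the tokenizer split the text.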

from transformers import pipeline

# Load the fine-tuned model
# (model_name is assumed to be the base checkpoint name defined in an earlier step)
ner_pipeline = pipeline(
    "ner",
    model="./results",
    tokenizer=model_name,
    aggregation_strategy="simple",
)

# Process text input
text = "Barack Obama was born in Hawaii."
entities = ner_pipeline(text)

# Print recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Confidence: {entity['score']:.2f}")

Here's a breakdown of what the code does:

1. Pipeline Setup

  • Imports the pipeline function from the transformers library
  • Creates a NER pipeline using the previously fine-tuned model stored in "./results"
  • Passes model_name as the tokenizer and sets aggregation_strategy="simple", which merges adjacent tokens that share an entity type into a single entity
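The effect of the "simple" aggregation strategy can be sketched in plain Python. This is a simplified approximation of the grouping idea, not the library's own code, and details may differ from the real implementation. The helper names `split_tag` and `group_simple`, the subword split of "Obama", and the scores are all made up for illustration.

```python
from statistics import mean

def split_tag(label):
    """Split 'B-PER' into ('B', 'PER'); labels without a prefix count as 'I'."""
    if label.startswith(("B-", "I-")):
        return label[0], label[2:]
    return "I", label

def group_simple(text, token_preds):
    """Merge consecutive tokens of the same entity type into one entity."""
    groups, current = [], []
    for pred in token_preds:
        prefix, ent_type = split_tag(pred["entity"])
        same_type = bool(current) and split_tag(current[-1]["entity"])[1] == ent_type
        if same_type and prefix != "B":
            current.append(pred)      # continue the open group
        else:
            if current:
                groups.append(current)
            current = [pred]          # a "B-" tag or a new type starts a group
    if current:
        groups.append(current)

    entities = []
    for group in groups:
        ent_type = split_tag(group[0]["entity"])[1]
        if ent_type == "O":
            continue                  # drop tokens outside any entity
        start, end = group[0]["start"], group[-1]["end"]
        entities.append({
            "entity_group": ent_type,
            "score": mean(p["score"] for p in group),  # average token score
            "word": text[start:end],  # rebuilt here by slicing the input text
            "start": start,
            "end": end,
        })
    return entities

text = "Barack Obama was born in Hawaii."
# Hypothetical per-token predictions; "Obama" is shown split into two subwords.
token_preds = [
    {"entity": "B-PER", "score": 0.99, "start": 0, "end": 6},
    {"entity": "I-PER", "score": 0.98, "start": 7, "end": 9},
    {"entity": "I-PER", "score": 0.97, "start": 9, "end": 12},
    {"entity": "O", "score": 0.99, "start": 13, "end": 16},
    {"entity": "B-LOC", "score": 0.96, "start": 25, "end": 31},
]
entities = group_simple(text, token_preds)
for entity in entities:
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
```

Without this grouping step, the caller would receive one prediction per subword and would have to stitch "Barack", "Ob", and "##ama" back into a single person name by hand.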

2. Text Processing

  • Takes a sample text input ("Barack Obama was born in Hawaii.")
  • Processes the text through the NER pipeline to identify entities

3. Output Format

  • Loops through the detected entities
  • For each entity, prints three pieces of information:
      • The word/phrase identified as an entity
      • The entity type (e.g., PER for person, LOC for location)
      • A confidence score indicating how certain the model is about its prediction
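Downstream code usually needs more than printed lines. The sketch below shows one possible way to consume the result: it filters out low-confidence predictions and groups the remaining entities by type. The sample list only mimics the shape of the aggregated output (the keys entity_group, score, word, start, end); its values are invented, and `MIN_SCORE` and `group_by_type` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Illustrative values shaped like the aggregated pipeline output;
# these are not real model scores.
entities = [
    {"entity_group": "PER", "score": 0.998, "word": "Barack Obama", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.995, "word": "Hawaii", "start": 25, "end": 31},
    {"entity_group": "ORG", "score": 0.41, "word": "born", "start": 17, "end": 21},
]

MIN_SCORE = 0.80  # hypothetical threshold; tune it for your own data

def group_by_type(entities, min_score=MIN_SCORE):
    """Keep confident predictions and collect their text by entity type."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

by_type = group_by_type(entities)
print(by_type)  # {'PER': ['Barack Obama'], 'LOC': ['Hawaii']}
```

A threshold like this trades recall for precision: raising it removes more spurious entities but may also discard correct ones the model was less sure about.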

This pipeline serves as a core component for converting raw text into structured entity information that can be used in various applications.