Issue #18: How to Perform Named Entity Recognition Using Transformers
Discover Who and What Matters: Simplifying Named Entity Recognition with Transformers for Smarter Data Analysis
Named Entity Recognition (NER) is a fascinating part of natural language processing (NLP). It’s like teaching a program to identify and classify the names of things in a text.
These things can be names of people, companies, locations, dates, and more.
Today, I’ll guide you through performing NER using Hugging Face Transformers. This tool is like a magic wand for NLP tasks, and you’re about to learn how to wield it.
What is Named Entity Recognition?
Imagine you’re reading a book, and you highlight every name, place, and date. That’s essentially what NER does.
It’s a way for computers to understand and categorize key parts of a text.
This helps in summarizing information, answering questions, and even organizing data. This technology is widely used in various applications.
For example, it helps search engines identify and prioritize content relevant to users’ queries.
In customer service, NER can automatically sort feedback by topics, making it easier to address customer concerns.
In media and research, it streamlines the process of sifting through large volumes of text to find specific information.
Additionally, voice-activated assistants use NER to better understand and respond to user commands.
Overall, Named Entity Recognition enhances the efficiency of information processing and retrieval across many fields.
Why Hugging Face Transformers?
Hugging Face is a company that’s all about making NLP easy and accessible.
Their Transformers library is packed with pre-trained models. These models are like well-taught students ready to apply their knowledge.
With Hugging Face, you can perform NER with just a few lines of code. It’s efficient, powerful, and user-friendly.
Hugging Face also has a pipeline feature, which automates the complex process of input processing, model prediction, and output generation behind a simple, user-friendly interface. It’s like having an AI assistant at your fingertips!
So, let’s write some code. You can find the finished code here.
Getting Started
First things first, you need to set up your environment. I recommend using a Google Colab notebook.
Let’s install the Transformers library. Drop the “!” if you are installing it on your local machine.
!pip install transformers
To perform Named Entity Recognition (NER) using Hugging Face’s Transformers library, we’ll use the pipeline function. Let’s import it.
from transformers import pipeline
Next, we are going to set up the NER pipeline.
ner_pipeline = pipeline("ner", grouped_entities=True)
This line of code will automatically download and load a default pre-trained NER model and tokenizer. The parameter grouped_entities=True
tells the pipeline to merge tokens that belong to the same entity back into a single span when the tokenizer has split that entity across multiple tokens. (In newer versions of the Transformers library, the equivalent option is aggregation_strategy="simple".)
In NER tasks, especially with models like BERT that use WordPiece tokenization, a single entity like a person's name or a location might be split into several tokens.
By default, the NER pipeline returns each of these tokens as separate entities, including their type and confidence score. This can make it challenging to understand which tokens belong to the same entity without additional processing.
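To see the difference, here is a minimal sketch comparing the two modes on a short sentence. The outputs in the comments are illustrative only; the exact tokens and scores depend on the default model and its tokenizer.
raw_ner = pipeline("ner")  # grouped_entities defaults to False
grouped_ner = pipeline("ner", grouped_entities=True)
sentence = "Sarah Johnson works at GenTech Innovations."
print(raw_ner(sentence))
# e.g. [{'entity': 'B-PER', 'word': 'Sarah', ...}, {'entity': 'I-PER', 'word': 'Johnson', ...}, ...]
print(grouped_ner(sentence))
# e.g. [{'entity_group': 'PER', 'word': 'Sarah Johnson', ...}, ...]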
Now we can start analyzing text for named entities. Let’s pass a string of text to the NER pipeline we created. I’ll use the following paragraph as input and see what the model discovers.
text = "In March 2021, Dr. Sarah Johnson and her team at GenTech Innovations embarked on a groundbreaking project focused on renewable energy solutions. The project, headquartered in San Francisco, aims to collaborate with various international partners, including EcoPower Europe in Paris and SolarTech Asia in Singapore. The announcement came during a global conference on climate change held in Tokyo, Japan, which attracted experts and activists from around the world, including notable speaker Elon Musk. The initiative has received significant funding from the Global Green Grant Fund, with a commitment to reduce carbon emissions by 40% over the next decade. This ambitious project highlights the growing importance of sustainable development and international cooperation in tackling environmental challenges."
entities = ner_pipeline(text)
print(entities)
The pipeline returns a list of dictionaries, one for each entity found in the text. Each dictionary includes the entity text (word), its type (entity_group), a confidence score, and the character offsets where it appears. Let’s look at the result.
[{'entity_group': 'PER',
'score': 0.9996027,
'word': 'Sarah Johnson',
'start': 19,
'end': 32},
{'entity_group': 'ORG',
'score': 0.9969076,
'word': 'GenTech Innovations',
'start': 49,
'end': 68},
{'entity_group': 'LOC',
'score': 0.9984195,
'word': 'San Francisco',
'start': 175,
'end': 188},
{'entity_group': 'ORG',
'score': 0.9864332,
'word': 'EcoPower Europe',
'start': 257,
'end': 272},
{'entity_group': 'LOC',
'score': 0.9993825,
'word': 'Paris',
'start': 276,
'end': 281},
{'entity_group': 'ORG',
'score': 0.99762076,
'word': 'SolarTech Asia',
'start': 286,
'end': 300},
{'entity_group': 'LOC',
'score': 0.99973804,
'word': 'Singapore',
'start': 304,
'end': 313},
{'entity_group': 'LOC',
'score': 0.99940014,
'word': 'Tokyo',
'start': 390,
'end': 395},
{'entity_group': 'LOC',
'score': 0.99975413,
'word': 'Japan',
'start': 397,
'end': 402},
{'entity_group': 'PER',
'score': 0.99678767,
'word': 'Elon Musk',
'start': 491,
'end': 500},
{'entity_group': 'ORG',
'score': 0.9429054,
'word': 'Global Green Grant Fund',
'start': 559,
'end': 582}]
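For a more readable summary than the raw list of dictionaries, you can loop over the results and print the key fields. This is just a small formatting sketch on top of the entities list from above.
for entity in entities:
    print(f"{entity['word']:<25} {entity['entity_group']:<5} {entity['score']:.3f}")
This prints each entity, its type, and its confidence score on a single line, for example “Sarah Johnson            PER   1.000”.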
Look at the result! The model picked out the people, organizations, and locations in the input text. (Notice that dates such as “March 2021” are not tagged; the default model was fine-tuned on the CoNLL-2003 dataset, which only covers persons, organizations, locations, and miscellaneous entities.)
We have built an NER application using a state-of-the-art model from the Hugging Face Transformers library. That’s how easy it is to work with Hugging Face Transformers.
Conclusion
What we covered is just the start. Hugging Face Transformers offers advanced features like fine-tuning models on your own data, using multilingual models, and much more.
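For instance, you can point the pipeline at any NER checkpoint from the Hugging Face Hub instead of the default English model. As one example, Davlan/bert-base-multilingual-cased-ner-hrl is a publicly available multilingual NER model at the time of writing; swap in whichever checkpoint fits your language and domain.
from transformers import pipeline

multilingual_ner = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",
    grouped_entities=True,
)
print(multilingual_ner("Angela Merkel besuchte im Juli Paris."))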
The best way to learn is by doing. Try running the code snippets, experiment with different texts, and explore other models. The more you practice, the better you’ll understand how NER works and how to use Hugging Face Transformers to your advantage.
Hope you enjoyed this article. If you have any questions, let me know in the comments. See you soon with a new topic.