What is Named Entity Recognition in NLP?

This recipe explains what named entity recognition is in NLP.
Last Updated: 18 Jul 2022

Recipe Objective

What is named entity recognition?

Named entity recognition (NER) is an information extraction technique that uses natural language processing to automatically identify named entities in a piece of text and classify them into predetermined categories such as people, organizations, locations, email addresses and values. Let's understand this with an example:

Jon is from Canada; he works at Apple.

In the sentence above, the named entities fall into the following categories:

Name - Jon

Location - Canada

Organization - Apple
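
To make the idea of "identify and classify" concrete, here is a deliberately tiny sketch that labels the example sentence with a hand-written lookup table. The table and the function name `label_entities` are hypothetical; real NER systems, including the NLTK chunker used in the steps below, rely on trained models rather than a fixed list.

```python
# Toy illustration only: a hand-written lookup table, NOT how NLTK works.
# GAZETTEER and label_entities are hypothetical names made up for this sketch.
GAZETTEER = {
    "Jon": "PERSON",
    "Canada": "LOCATION",
    "Apple": "ORGANIZATION",
}


def label_entities(sentence):
    """Return (word, category) pairs for words found in the lookup table."""
    found = []
    for raw in sentence.split():
        word = raw.strip(".,;:!?")  # drop surrounding punctuation, e.g. "Apple."
        if word in GAZETTEER:
            found.append((word, GAZETTEER[word]))
    return found


print(label_entities("Jon is from Canada; he works at Apple."))
```

A lookup table like this cannot handle unseen names or ambiguous words (Apple the company versus the fruit), which is why statistical models are used in practice.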

Some of the practical applications of NER are:

Scanning news articles for the people, organizations and locations reported.

Quickly retrieving geographical locations talked about in Twitter posts.

In Human Resources, it can speed up hiring by summarizing applicants' CVs and improve internal workflows by categorizing employee complaints and questions.

In Customer Support, it can improve response times by categorizing user requests, complaints and questions and by filtering on priority keywords. There are many more applications besides these.

Step 1 - Import the necessary libraries

import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
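
The functions used in the following steps need NLTK data packages that are not installed with the library itself. The sketch below shows one way to fetch them. The resource names are an assumption that depends on your NLTK version: newer releases may instead ask for names such as "punkt_tab", "averaged_perceptron_tagger_eng" or "maxent_ne_chunker_tab", and the LookupError raised by NLTK names the exact resource that is missing. `download_resources` is a hypothetical helper name.

```python
# Resource names are assumptions; adjust them to what your NLTK version requests.
REQUIRED_RESOURCES = [
    "punkt",                       # tokenizer models used by word_tokenize
    "averaged_perceptron_tagger",  # model used by pos_tag
    "maxent_ne_chunker",           # model used by ne_chunk
    "words",                       # word list used by the chunker
]


def download_resources(resources=REQUIRED_RESOURCES):
    """Download each resource and report whether the download succeeded."""
    import nltk  # imported here so the list above can be inspected without NLTK

    # nltk.download returns True on success and False otherwise.
    return {name: nltk.download(name, quiet=True) for name in resources}
```

Calling `download_resources()` once, with network access, is normally enough; the data is cached locally afterwards.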

Step 2 - Take a sample text

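My_text = '''Thomas Alva Edison (February 11, 1847 – October 18, 1931) was an American inventor and businessman who has been described as America's greatest inventor.[1][2][3] He developed many devices in fields such as electric power generation, mass communication, sound recording, and motion pictures.[4] These inventions, which include the phonograph, the motion picture camera, and early versions of the electric light bulb, have had a widespread impact on the modern industrialized world.[5] He was one of the first inventors to a pply the principles of organized science and teamwork to the process of invention, working with many researchers and employees. He established the first industrial research laboratory.[6]'''

We have taken a sample paragraph about Thomas Alva Edison from Wikipedia for our reference.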

Step 3 - Tokenize the text into words using word_tokenize

tokenized_text = nltk.word_tokenize(My_text)
print(tokenized_text)

['Thomas', 'Alva', 'Edison', '(', 'February', '11', ',', '1847', '–', 'October', '18', ',', '1931', ')', 'was', 'an', 'American', 'inventor', 'and', 'businessman', 'who', 'has', 'been', 'described', 'as', 'America', "'s", 'greatest', 'inventor', '.', '[', '1', ']', '[', '2', ']', '[', '3', ']', 'He', 'developed', 'many', 'devices', 'in', 'fields', 'such', 'as', 'electric', 'power', 'generation', ',', 'mass', 'communication', ',', 'sound', 'recording', ',', 'and', 'motion', 'pictures', '.', '[', '4', ']', 'These', 'inventions', ',', 'which', 'include', 'the', 'phonograph', ',', 'the', 'motion', 'picture', 'camera', ',', 'and', 'early', 'versions', 'of', 'the', 'electric', 'light', 'bulb', ',', 'have', 'had', 'a', 'widespread', 'impact', 'on', 'the', 'modern', 'industrialized', 'world', '.', '[', '5', ']', 'He', 'was', 'one', 'of', 'the', 'first', 'inventors', 'to', 'a', 'pply', 'the', 'principles', 'of', 'organized', 'science', 'and', 'teamwork', 'to', 'the', 'process', 'of', 'invention', ',', 'working', 'with', 'many', 'researchers', 'and', 'employees', '.', 'He', 'established', 'the', 'first', 'industrial', 'research', 'laboratory', '.', '[', '6', ']']

From the output above, we can see that the text has been split into word and punctuation tokens.
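
To get a feel for what tokenization does, here is a simplified regex-based tokenizer. It is only an approximation written for illustration, not the algorithm behind `word_tokenize`, and `simple_tokenize` is a hypothetical name. For example, it would split "America's" into three tokens, whereas NLTK keeps "'s" together, as the output above shows.

```python
import re


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a whole word or number; [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


print(simple_tokenize("He developed many devices.[4]"))
```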

Step 4 - Apply part-of-speech (POS) tagging to the tokenized text

tagged_text = nltk.pos_tag(tokenized_text)
print(tagged_text)

[('Thomas', 'NNP'), ('Alva', 'NNP'), ('Edison', 'NNP'), ('(', '('), ('February', 'NNP'), ('11', 'CD'), (',', ','), ('1847', 'CD'), ('–', 'NNP'), ('October', 'NNP'), ('18', 'CD'), (',', ','), ('1931', 'CD'), (')', ')'), ('was', 'VBD'), ('an', 'DT'), ('American', 'JJ'), ('inventor', 'NN'), ('and', 'CC'), ('businessman', 'NN'), ('who', 'WP'), ('has', 'VBZ'), ('been', 'VBN'), ('described', 'VBN'), ('as', 'IN'), ('America', 'NNP'), ("'s", 'POS'), ('greatest', 'JJS'), ('inventor', 'NN'), ('.', '.'), ('[', 'CC'), ('1', 'CD'), (']', 'JJ'), ('[', '$'), ('2', 'CD'), (']', 'NNP'), ('[', 'VBD'), ('3', 'CD'), (']', 'NN'), ('He', 'PRP'), ('developed', 'VBD'), ('many', 'JJ'), ('devices', 'NNS'), ('in', 'IN'), ('fields', 'NNS'), ('such', 'JJ'), ('as', 'IN'), ('electric', 'JJ'), ('power', 'NN'), ('generation', 'NN'), (',', ','), ('mass', 'NN'), ('communication', 'NN'), (',', ','), ('sound', 'NN'), ('recording', 'NN'), (',', ','), ('and', 'CC'), ('motion', 'NN'), ('pictures', 'NNS'), ('.', '.'), ('[', '$'), ('4', 'CD'), (']', 'NNP'), ('These', 'DT'), ('inventions', 'NNS'), (',', ','), ('which', 'WDT'), ('include', 'VBP'), ('the', 'DT'), ('phonograph', 'NN'), (',', ','), ('the', 'DT'), ('motion', 'NN'), ('picture', 'NN'), ('camera', 'NN'), (',', ','), ('and', 'CC'), ('early', 'JJ'), ('versions', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('electric', 'JJ'), ('light', 'NN'), ('bulb', 'NN'), (',', ','), ('have', 'VBP'), ('had', 'VBN'), ('a', 'DT'), ('widespread', 'JJ'), ('impact', 'NN'), ('on', 'IN'), ('the', 'DT'), ('modern', 'JJ'), ('industrialized', 'VBN'), ('world', 'NN'), ('.', '.'), ('[', 'CC'), ('5', 'CD'), (']', 'NN'), ('He', 'PRP'), ('was', 'VBD'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('first', 'JJ'), ('inventors', 'NNS'), ('to', 'TO'), ('a', 'DT'), ('pply', 'NN'), ('the', 'DT'), ('principles', 'NNS'), ('of', 'IN'), ('organized', 'VBN'), ('science', 'NN'), ('and', 'CC'), ('teamwork', 'NN'), ('to', 'TO'), ('the', 'DT'), ('process', 'NN'), ('of', 'IN'), ('invention', 'NN'), (',', ','), ('working', 'VBG'), ('with', 'IN'), ('many', 'JJ'), ('researchers', 'NNS'), ('and', 'CC'), ('employees', 'NNS'), ('.', '.'), ('He', 'PRP'), ('established', 'VBD'), ('the', 'DT'), ('first', 'JJ'), ('industrial', 'JJ'), ('research', 'NN'), ('laboratory', 'NN'), ('.', '.'), ('[', 'CC'), ('6', 'CD'), (']', 'NN')]
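
The tagger returns a list of (word, tag) pairs that use Penn Treebank tags, where NNP marks a singular proper noun and NN a common noun. As a small sketch of how that list can be used, the snippet below filters words by tag prefix. The sample pairs are copied from the output above, and `words_with_tag` is a hypothetical helper name.

```python
# A few (word, tag) pairs copied from the tagger output above.
tagged_sample = [
    ("Thomas", "NNP"), ("Alva", "NNP"), ("Edison", "NNP"),
    ("was", "VBD"), ("an", "DT"), ("American", "JJ"), ("inventor", "NN"),
]


def words_with_tag(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


proper_nouns = words_with_tag(tagged_sample, "NNP")  # proper nouns only
all_nouns = words_with_tag(tagged_sample, "NN")      # NN, NNS, NNP, NNPS
print(proper_nouns)
print(all_nouns)
```

Proper nouns are only candidates for named entities. The chunker in the next step decides which of them form an entity and what type it is.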

Step 5 - Pass the tagged text to an entity chunk function

print(nltk.ne_chunk(tagged_text))

(S
  (PERSON Thomas/NNP)
  (ORGANIZATION Alva/NNP Edison/NNP)
  (/(
  February/NNP
  11/CD
  ,/,
  1847/CD
  –/NNP
  October/NNP
  18/CD
  ,/,
  1931/CD
  )/)
  was/VBD
  an/DT
  (GPE American/JJ)
  inventor/NN
  and/CC
  businessman/NN
  who/WP
  has/VBZ
  been/VBN
  described/VBN
  as/IN
  (GPE America/NNP)
  's/POS
  greatest/JJS
  inventor/NN
  ./.
  [/CC
  1/CD
  ]/JJ
  [/$
  2/CD
  ]/NNP
  [/VBD
  3/CD
  ]/NN
  He/PRP
  developed/VBD
  many/JJ
  devices/NNS
  in/IN
  fields/NNS
  such/JJ
  as/IN
  electric/JJ
  power/NN
  generation/NN
  ,/,
  mass/NN
  communication/NN
  ,/,
  sound/NN
  recording/NN
  ,/,
  and/CC
  motion/NN
  pictures/NNS
  ./.
  [/$
  4/CD
  ]/NNP
  These/DT
  inventions/NNS
  ,/,
  which/WDT
  include/VBP
  the/DT
  phonograph/NN
  ,/,
  the/DT
  motion/NN
  picture/NN
  camera/NN
  ,/,
  and/CC
  early/JJ
  versions/NNS
  of/IN
  the/DT
  electric/JJ
  light/NN
  bulb/NN
  ,/,
  have/VBP
  had/VBN
  a/DT
  widespread/JJ
  impact/NN
  on/IN
  the/DT
  modern/JJ
  industrialized/VBN
  world/NN
  ./.
  [/CC
  5/CD
  ]/NN
  He/PRP
  was/VBD
  one/CD
  of/IN
  the/DT
  first/JJ
  inventors/NNS
  to/TO
  a/DT
  pply/NN
  the/DT
  principles/NNS
  of/IN
  organized/VBN
  science/NN
  and/CC
  teamwork/NN
  to/TO
  the/DT
  process/NN
  of/IN
  invention/NN
  ,/,
  working/VBG
  with/IN
  many/JJ
  researchers/NNS
  and/CC
  employees/NNS
  ./.
  He/PRP
  established/VBD
  the/DT
  first/JJ
  industrial/JJ
  research/NN
  laboratory/NN
  ./.
  [/CC
  6/CD
  ]/NN)

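Here we have passed the tagged text to NLTK's named entity chunker, ne_chunk, which returns the text as a tree in which recognized entities appear as labelled subtrees such as PERSON, ORGANIZATION and GPE (geo-political entity). The result is not always accurate: in this output, "Alva Edison" is labelled ORGANIZATION rather than PERSON.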
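
The tree is easier to work with once the labelled subtrees are pulled out into a flat list. The sketch below shows one way to do that. It assumes the flat, one-level tree shape seen in the output above, where entity subtrees expose a label() method and ordinary tokens are (word, tag) tuples. `FakeTree` is a hypothetical stand-in so the example runs without NLTK; with real data you would pass the result of nltk.ne_chunk(tagged_text) instead.

```python
def extract_entities(tree):
    """Collect (entity text, label) pairs from the labelled subtrees of a chunk tree."""
    entities = []
    for node in tree:
        # Subtrees expose label(); plain leaves are (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities


class FakeTree(list):
    """Hypothetical stand-in for nltk.Tree so the sketch runs without NLTK."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# The first few nodes of the tree printed above, rebuilt by hand.
sample = FakeTree("S", [
    FakeTree("PERSON", [("Thomas", "NNP")]),
    FakeTree("ORGANIZATION", [("Alva", "NNP"), ("Edison", "NNP")]),
    ("was", "VBD"),
    ("an", "DT"),
    FakeTree("GPE", [("American", "JJ")]),
    ("inventor", "NN"),
])
print(extract_entities(sample))
```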
