How to perform fuzzy logic string matching in nlp

This recipe helps you perform fuzzy logic string matching in nlp

Recipe Objective

How to perform fuzzy logic string matching ?

fuzzy logic is the simplest method in case of string matching or we can say comparing the string. The library used in this is called fuzzywuzzy library where we can have a score out of 100 which will denote the two strings are equal by giving similarity index.It is process of finding strings that matches given pattern.Levenshtein distance is used in it for calculation of the difference between the sequences.

Lets understand with practical implementation

Step 1 - Import the necessary libraries

from fuzzywuzzy import fuzz from fuzzywuzzy import process

Step 2 - Lets try Simple ratio usage

print("Lets see the ratio for string matching:",fuzz.ratio('Fuzzy for String Matching', 'Fuzzy String Matching'), '\n') print("Lets see the ratio for string matching:",fuzz.ratio('This is an NLP Session', 'This is an NLP Session'),'\n') print("Lets see the ratio for string matching:",fuzz.ratio('your learning fuzzywuzzy', 'Your Learning FuzzyWuzzy'))

Lets see the ration for string matching: 91 
Lets see the ration for string matching: 100 
Lets see the ration for string matching: 83

From the above we can say that,

the first string match score 91/100 because one of the word is missing in the second string i.e for,

the second string match score is 100/100 because both the strings are same or matching exactly with each other.

the third string match score is 83/100 because in first string all the characters are in lower case but in the second string some of the characters are in upper case and some are in lower case.

Step 3 - Now we will try with partial ratio

print("Lets see the partial ratio for string matching:",fuzz.partial_ratio('Jon is eating', 'Jon is eating !'), '\n') print("Lets see the partial ratio for string matching:",fuzz.partial_ratio('Mark is walking on streets', 'Mark walking streets'),'\n')

Lets see the partial ratio for string matching: 100 
Lets see the partial ratio for string matching: 80 

From the above we understand about partial_ration using FuzzyWuzzy library,

The first Sentence partial_ration score is 100/100 because as there is a Exclamation mark in the second string, but still partially words are same so score comes 100.

The second sentence score is 80/100, score is less because there is a extra token present in the first string.

Step 4 - Token set ratio and token sort ratio

print("Lets see the token sort ratio for string matching:",fuzz.token_sort_ratio("for every one", "every one for"), '\n') print("Lets see the token set ratio for string matching:",fuzz.token_set_ratio("This is done", "This is done done"))

Lets see the token sort ratio for string matching: 100 
Lets see the token set ratio for string matching: 100

Ratio comes 100/100 in both cases because,

token sort ratio This gives 100 as every word is same, irrespective of the position. Position not matters when words are same.

token set ratio it considers duplicate words as a single word.

Step 5 - WRatio with Example

print("Lets see the Wratio for string matching:",fuzz.WRatio("This is good", "This is good"), '\n') print("Lets see the Wratio for string matching:",fuzz.WRatio("Sometimes good is bad!!!", "Sometimes good is bad"))

Lets see the Wratio for string matching: 100 
Lets see the Wratio for string matching: 100

sometimes its better to use WRatio instead of simple ratio as WRatio handles lower and upper cases and some other parameters too.

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

Autogen Project to Build an Intelligent AI Personal Assistant
Build a multi-agent AI personal assistant using Autogen that can handle tasks like managing calendars, emails, reminders, messaging, research, and weather updates, automating everyday workflows with LLMs and tool integrations. This is an upcoming project that is expected to be launched in June.

AWS Project to Build and Deploy LSTM Model with Sagemaker
In this AWS Sagemaker Project, you will learn to build a LSTM model on Sagemaker for sales forecasting while analyzing the impact of weather conditions on Sales.

Build Real Estate Price Prediction Model with NLP and FastAPI
In this Real Estate Price Prediction Project, you will learn to build a real estate price prediction machine learning model and deploy it on Heroku using FastAPI Framework.

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.

Learn to Build an End-to-End Machine Learning Pipeline - Part 3
This machine learning project integrates model monitoring, CI/CD practices and Amazon Sagemaker pipelines into the logistics-oriented machine learning pipeline to streamline workflow orchestration for scalable and reliable deployment of ML models in logistics.

Loan Eligibility Prediction in Python using H2O.ai
In this loan prediction project you will build predictive models in Python using H2O.ai to predict if an applicant is able to repay the loan or not.

Deploy Transformer BART Model for Text summarization on GCP
Learn to Deploy a Machine Learning Model for the Abstractive Text Summarization on Google Cloud Platform (GCP)

PyTorch Project to Build a GAN Model on MNIST Dataset
In this deep learning project, you will learn how to build a GAN Model on MNIST Dataset for generating new images of handwritten digits.

GCP MLOps Project to Deploy ARIMA Model using uWSGI Flask
Build an end-to-end MLOps Pipeline to deploy a Time Series ARIMA Model on GCP using uWSGI and Flask

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

OSZAR »