How to handle dummy variables in R?

This recipe helps you handle dummy variables in R

Recipe Objective

In data science, most machine learning algorithms expect every input variable to be numeric. If the data contains non-numeric variables, we need to process them before building a model.

In this recipe, we will learn how to handle string categorical variables by converting them into dummy variables.

A categorical variable is a variable whose values are distinct strings or categories to which observations are assigned. These values carry no numeric meaning for a model, so we convert them into dummy variables, similar to the OneHotEncoding technique in Python. For a categorical variable with n unique categories, we keep (n-1) columns of 0s and 1s, where "1" indicates that the observation belongs to that category. Note that dummy_cols() creates one column per category by default, so one of them is dropped afterwards.
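To make the (n-1) idea concrete, here is a minimal sketch in base R using model.matrix(). The toy data frame and its `colour` column are hypothetical and are not part of the recipe's dataset; the recipe itself uses fastDummies rather than this approach.

```r
# Toy data: hypothetical values, not taken from the recipe's dataset
toy <- data.frame(
  id = 1:4,
  colour = factor(c("red", "green", "blue", "green"))
)

# With the default treatment contrasts, model.matrix() expands the factor
# into an intercept column plus (n - 1) indicator columns. The first level
# ("blue", alphabetically) acts as the reference category.
mm <- model.matrix(~ colour, data = toy)

# Drop the intercept so only the dummy columns remain
dummies <- mm[, -1, drop = FALSE]
print(dummies)
```

A row whose dummy columns are all 0 belongs to the reference category, which is why n categories need only (n-1) columns.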

Step 1: Loading the required library and dataset

We require the fastDummies and knitr packages. The tidyverse package is also loaded because it provides glimpse().

# installing required packages
install.packages(c("fastDummies", "knitr"))
library(fastDummies)
library(knitr)

# data manipulation package
library(tidyverse)

# reading the dataset
customer_seg = read.csv('R_223_Mall_Customers.csv')
glimpse(customer_seg)

Observations: 200
Variables: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Step 2: Creating dummy variable

We create dummy variables for the "Gender" variable using the dummy_cols() function from the fastDummies package.

Syntax: fastDummies::dummy_cols(x, select_columns = )

where:

  1. x = dataframe
  2. select_columns = Column (categorical variable) that you want to create dummy variables for.
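As a hedged illustration of the syntax above, the sketch below runs dummy_cols() on a small, hypothetical data frame. It assumes a fastDummies version that provides the `remove_first_dummy` and `remove_selected_columns` arguments, which let the function drop the first dummy and the original column in one call; older versions may not have both.

```r
library(fastDummies)

# Toy data: hypothetical values, not the Mall Customers dataset.
# Explicit factor levels make "Female" the first (dropped) category.
toy <- data.frame(
  CustomerID = 1:4,
  Gender = factor(c("Male", "Female", "Female", "Male"),
                  levels = c("Female", "Male"))
)

encoded <- dummy_cols(
  toy,
  select_columns = "Gender",
  remove_first_dummy = TRUE,      # keep (n - 1) dummy columns
  remove_selected_columns = TRUE  # drop the original Gender column
)
print(encoded)
```

If these arguments are unavailable in your installed version, the columns can instead be dropped manually after the call.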

# creating dummy variables
df_dummies = fastDummies::dummy_cols(customer_seg, select_columns = "Gender")

# dropping the original column (2) along with the Gender_Female column (6) to get (n-1) columns, similar to OneHotEncoding
new_customer_seg = df_dummies[c(-2, -6)]
glimpse(new_customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
$ Gender_Male             1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,...

Note: In the dummy variable (Gender_Male) created: 1 = Male and 0 = Female
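Dropping columns by position, as in c(-2, -6), depends on the exact column order. A minimal sketch of dropping them by name instead, using base R only, is shown here. The small data frame is a hypothetical stand-in that mimics the shape of the dummy_cols() result.

```r
# Hypothetical stand-in for the output of dummy_cols(); values are made up
df_dummies <- data.frame(
  CustomerID = 1:3,
  Gender = c("Male", "Female", "Female"),
  Age = c(19, 20, 23),
  Gender_Female = c(0L, 1L, 1L),
  Gender_Male = c(1L, 0L, 0L)
)

# Name the columns to drop instead of relying on their positions
drop_cols <- c("Gender", "Gender_Female")
new_df <- df_dummies[, !(names(df_dummies) %in% drop_cols), drop = FALSE]
print(names(new_df))
```

Selecting by name keeps working if columns are added to or reordered in the source file.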

