How to use SVM Classifier in R?

This recipe helps you use SVM Classifier in R

Recipe Objective

Support Vector Machines is a supervised learning algorithm which can work for both classification and regression problems. The main objective of the SVM is to find the optimum hyperplane (i.e. 2D line and 3D plane) that maximises the margin (i.e. twice the distance between the closest data point and hyperplane) between two classes.

It applies a penalty for misclassification and transforms the data to higher dimension if it is non-linearly separable. ​

Major advantages of using SVM are that: ​

  1. it works well with large number of predictors.
  2. Works well in-case of non-linear separable data.
  3. Works well in-case of image classification and does not suffer multicollinearity problem.

This recipe demonstrates the modelling of a SVM Classifier for Binary classification, we use a famous dataset by National institute of Diabetes and Digestive and Kidney Diseases. ​

List of Classification Algorithms in Machine Learning

STEP 1: Importing Necessary Libraries

library(caret) library(tidyverse) # for data manipulation

STEP 2: Read a csv file and explore the data

Data Description: This datasets consist of several medical predictor variables (also known as the independent variables) and one target variable (Outcome).

Independent Variables: ​

  1. Pregnancies
  2. Glucose
  3. BloodPressure
  4. SkinThickness
  5. Insulin
  6. BMI
  7. DiabetesPedigreeFunction
  8. Age

Dependent Variable: ​

Outcome ( 0 = 'does not have diabetes', 1 = 'Has diabetes') ​

data <- read.csv("R_354_diabetes.csv") glimpse(data)

Rows: 768
Columns: 9
$ Pregnancies               6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10, 10, ...
$ Glucose                   148, 85, 183, 89, 137, 116, 78, 115, 197, ...
$ BloodPressure             72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92,...
$ SkinThickness             35, 29, 0, 23, 35, 0, 32, 0, 45, 0, 0, 0, ...
$ Insulin                   0, 0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, ...
$ BMI                       33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, ...
$ DiabetesPedigreeFunction  0.627, 0.351, 0.672, 0.167, 2.288, 0.201, ...
$ Age                       50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 30...
$ Outcome                   1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, ...

summary(data) # returns the statistical summary of the data columns

Pregnancies        Glucose      BloodPressure    SkinThickness  
 Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00  
 1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00  
 Median : 3.000   Median :117.0   Median : 72.00   Median :23.00  
 Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54  
 3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00  
 Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00  
    Insulin           BMI        DiabetesPedigreeFunction      Age       
 Min.   :  0.0   Min.   : 0.00   Min.   :0.0780           Min.   :21.00  
 1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437           1st Qu.:24.00  
 Median : 30.5   Median :32.00   Median :0.3725           Median :29.00  
 Mean   : 79.8   Mean   :31.99   Mean   :0.4719           Mean   :33.24  
 3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262           3rd Qu.:41.00  
 Max.   :846.0   Max.   :67.10   Max.   :2.4200           Max.   :81.00  
    Outcome     
 Min.   :0.000  
 1st Qu.:0.000  
 Median :0.000  
 Mean   :0.349  
 3rd Qu.:1.000  
 Max.   :1.000  

dim(data)

768 9

# Converting the dependent variable into factor levels data$Outcome = as.factor(data$Outcome)

STEP 3: Train Test Split

# createDataPartition() function from the caret package to split the original dataset into a training and testing set and split data into training (80%) and testing set (20%) parts = createDataPartition(data$Cost, p = .8, list = F) train = data[parts, ] test = data[-parts, ]

STEP 4: Building SVM classifier model

We will use caret package to perform SVM classification. First, we will use the trainControl() function to define the method of cross validation to be carried out and search type i.e. "grid". Then train the model using train() function.

Syntax: train(formula, data = , method = , trControl = , tuneGrid = )

where:

  1. formula = y~x1+x2+x3+..., where y is the independent variable and x1,x2,x3 are the dependent variables
  2. data = dataframe
  3. method = Type of the model to be built ("svmLinear" for SVM)
  4. trControl = Takes the control parameters. We will use trainControl function out here where we will specify the Cross validation technique.
  5. tuneGrid = takes the tuning parameters and applies grid search CV on them

# specifying the CV technique which will be passed into the train() function later and number parameter is the "k" in K-fold cross validation train_control = trainControl(method = "cv", number = 5) set.seed(50) # training a Regression model while tuning parameters (Method = "rpart") model = train(Outcome~., data = train, method = "svmLinear", trControl = train_control) # summarising the results print(model)

Support Vector Machines with Linear Kernel 

615 samples
  8 predictor
  2 classes: '0', '1' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 492, 492, 492, 492, 492 
Resampling results:

  Accuracy   Kappa    
  0.7642276  0.4535223

Tuning parameter 'C' was held constant at a value of 1

STEP 5: Make predictions on the final SVM classifier model

We use our final SVM classifier model to make predictions on the testing data (unseen data) and predict the 'Outcome' value and generate performance measures.

#use model to make predictions on test data pred_y = predict(model, test) # confusion Matrix confusionMatrix(data = pred_y, test$Outcome)

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 92 25
         1  8 28
                                          
               Accuracy : 0.7843          
                 95% CI : (0.7106, 0.8466)
    No Information Rate : 0.6536          
    P-Value [Acc > NIR] : 0.0003018       
                                          
                  Kappa : 0.4848          
                                          
 Mcnemar's Test P-Value : 0.0053488       
                                          
            Sensitivity : 0.9200          
            Specificity : 0.5283          
         Pos Pred Value : 0.7863          
         Neg Pred Value : 0.7778          
             Prevalence : 0.6536          
         Detection Rate : 0.6013          
   Detection Prevalence : 0.7647          
      Balanced Accuracy : 0.7242          
                                          
       'Positive' Class : 0           

What Users are saying..

profile image

Ameeruddin Mohammed

ETL (Abintio) developer at IBM
linkedin profile url

I come from a background in Marketing and Analytics and when I developed an interest in Machine Learning algorithms, I did multiple in-class courses from reputed institutions though I got good... Read More

Relevant Projects

Learn How to Build a Logistic Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple logistic regression model in PyTorch for customer churn prediction.

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.

NLP and Deep Learning For Fake News Classification in Python
In this project you will use Python to implement various machine learning methods( RNN, LSTM, GRU) for fake news classification.

Learn Hyperparameter Tuning for Neural Networks with PyTorch
In this Deep Learning Project, you will learn how to optimally tune the hyperparameters (learning rate, epochs, dropout, early stopping) of a neural network model in PyTorch to improve model performance.

Stock Price Prediction Project using LSTM and RNN
Learn how to predict stock prices using RNN and LSTM models. Understand deep learning concepts and apply them to real-world financial data for accurate forecasting.

Build an End-to-End AWS SageMaker Classification Model
MLOps on AWS SageMaker -Learn to Build an End-to-End Classification Model on SageMaker to predict a patient’s cause of death.

Build an Image Segmentation Model using Amazon SageMaker
In this Machine Learning Project, you will learn to implement the UNet Architecture and build an Image Segmentation Model using Amazon SageMaker

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Word2Vec and FastText Word Embedding with Gensim in Python
In this NLP Project, you will learn how to use the popular topic modelling library Gensim for implementing two state-of-the-art word embedding methods Word2Vec and FastText models.

OSZAR »