Machine Learning - SVM kernel functions and kernlab package

We have seen how an SVM model is built in R using the e1071 package and its function svm(). In this lab, we will see how to define linear and RBF (or Gaussian) kernel functions.

The first part of this lab (defining the kernel functions) is optional; it is meant to give you some understanding of how these kernel functions work. Just like the machine learning algorithms we built in previous sections, we could also build a fully functional SVM from scratch by defining an optimizer, but that requires a lot of coding and is beyond the scope of a basic course like this.

In this lab, we will also go through another very useful package, kernlab, and its function ksvm(), to build a support vector machine model on the dataset promotergene, which ships with the package. We will also generate simulated data, build an SVM model on it, and see what its plot looks like.

Linear Kernel

This kernel is mainly used for data sets whose classes are linearly separable.

linear.Kernel <- function(x1, x2){
  
  # It is a similarity function
  # Calculates the similarity between x1 and x2
  # Returns the value
  
  # x1 and x2 must be column vectors
  x1 = as.matrix(x1)
  x2 = as.matrix(x2)
  
  # Computing the similarity using dot product
  lin.sim = t(x1)%*%x2  # dot product
  lin.sim
}

Vectorized version of Linear Kernel

linear.Kernel.vec <- function(X){
  
  # Computes the kernel on every pair of examples
  
  # X must be a matrix with one observation per row
  
  # Computing the similarity on every pair in X
  K.lin = (X)%*%t(X)
  
  K.lin
}

Radial Basis Function (RBF) or Gaussian Kernel

This kernel is used for data sets whose classes are separable only by non-linear boundaries.

RBF.Kernel <- function(x1, x2, sigma){
  
  # It is a similarity function
  # Calculates the similarity between x1 and x2
  # Returns the value
  
  # sigma is a tuning (bandwidth) parameter that controls the non-linearity of the model
  # Lower values of sigma make the decision boundary more non-linear
  # It also determines how fast the similarity drops to 0 as data points move further apart
  
  # Ensure that x1 and x2 are column vectors
  
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  
  rbf.sim <- exp(-(t(x1-x2)%*%(x1-x2))/(2*sigma^2)) # squared L2 norm in the exponent
  
  rbf.sim
  
}
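To see what the sigma comments above mean in practice, here is a small, hypothetical illustration. The helper rbf below simply restates the formula used in RBF.Kernel as a function of the distance d, so we can watch how fast the similarity decays for a small versus a larger sigma:

```r
# Gaussian similarity as a function of distance d (same formula as in RBF.Kernel)
rbf <- function(d, sigma) exp(-d^2 / (2 * sigma^2))

d <- c(0, 1, 2, 4)            # increasing distances between two points
round(rbf(d, sigma = 1), 4)   # small sigma: similarity drops off quickly
## [1] 1.0000 0.6065 0.1353 0.0003
round(rbf(d, sigma = 3), 4)   # larger sigma: similarity decays much more slowly
## [1] 1.0000 0.9460 0.8007 0.4111
```

With sigma = 1, two points at distance 4 are already almost completely dissimilar, while with sigma = 3 they still retain a similarity above 0.4.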

Vectorized Radial Basis Function (RBF) or Gaussian Kernel

RBF.Kernel.vec <- function(X, sigma){
  
    # Vectorized RBF kernel
    # Equivalent to computing the kernel on every pair of rows of X
  
    # Pairwise squared Euclidean distances:
    # ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2*(xi . xj)
    X2 <- matrix(rowSums(X^2))
    D2 <- sweep(sweep(-2*X%*%t(X), 2, t(X2), FUN="+"), 1, X2, FUN="+")
  
    # Apply the Gaussian to every squared distance
    K.rbf <- exp(-D2/(2*sigma^2))
  
    K.rbf
}

Implementing Linear Kernel

Let’s see how our linear kernel works on some data points

x1 = c(5, 3, 2, 9); x2 = c(1, 9, -6, 2);
lin.sim = linear.Kernel(x1, x2)

cat('Linear Kernel between two 4-dimensional data points x1 = c(5, 3, 2, 9), x2 = c(1, 9, -6, 2):','\n', lin.sim) 
## Linear Kernel between two 4-dimensional data points x1 = c(5, 3, 2, 9), x2 = c(1, 9, -6, 2): 
##  38
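As a sanity check, this dot product can be worked out by hand:

```r
# 5*1 + 3*9 + 2*(-6) + 9*2
5 + 27 - 12 + 18
## [1] 38
```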

Implementing RBF Kernel

Let’s see how our RBF kernel works on some data points

x1 = c(5, 3, 2, 9); x2 = c(1, 9, -6, 2); sigma = 2;
rbf.sim = RBF.Kernel(x1, x2, sigma)

cat('RBF Kernel between two 4-dimensional data points x1 = c(5, 3, 2, 9), x2 = c(1, 9, -6, 2), with sigma = 2 :','\n', rbf.sim) 
## RBF Kernel between two 4-dimensional data points x1 = c(5, 3, 2, 9), x2 = c(1, 9, -6, 2), with sigma = 2 : 
##  1.103256e-09
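Again we can verify this by hand: the squared Euclidean distance between x1 and x2 is (5-1)^2 + (3-9)^2 + (2-(-6))^2 + (9-2)^2 = 16 + 36 + 64 + 49 = 165, so the kernel value is exp(-165/(2*2^2)):

```r
exp(-165 / (2 * 2^2))
## [1] 1.103256e-09
```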

Implementing Vectorised Linear Kernel

Let’s see how our vectorised linear kernel works on some data points

x1 = c(5, 3, 2, 9); x2 = c(1, 9, -6, 2);

X = rbind(x1, x2)
k.lin = linear.Kernel.vec(X)

cat('Vectorised linear kernel on matrix X (rows x1 and x2):','\n'); k.lin
## Vectorised linear kernel on matrix X (rows x1 and x2):
##     x1  x2
## x1 119  38
## x2  38 122

In the vectorized version of the linear kernel, we get a kernel matrix of dimension nrow(X)-by-nrow(X), due to the pairwise calculation. We get the same result when we wrap the non-vectorised version in two for loops, since the similarity has to be calculated for each pair. Each off-diagonal pair appears twice because of the two loops, and there is also an entry for each record's similarity with itself. For example, with the two records x1 and x2, both the vectorised and the two-for-loops version calculate similarities for the pairs x1-x1, x1-x2, x2-x1 and x2-x2. The diagonal values are similarities of a record with itself (e.g. x1-x1, x2-x2), which is why they are large.
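The two-for-loops construction described above can be sketched as follows (a minimal, self-contained illustration for the linear kernel; the exercise further down asks you to write this out properly for both kernels):

```r
# Two-for-loops equivalent of linear.Kernel.vec
x1 = c(5, 3, 2, 9); x2 = c(1, 9, -6, 2)
X = rbind(x1, x2)

n = nrow(X)
K = matrix(0, n, n)
for(i in 1:n){
  for(j in 1:n){
    K[i, j] = sum(X[i, ] * X[j, ])  # dot product of rows i and j
  }
}
K
##      [,1] [,2]
## [1,]  119   38
## [2,]   38  122
```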

Implementing Vectorised RBF Kernel

Let’s see how our vectorised RBF kernel works on some data points

x1 = c(5, 3, 2, 9); x2 = c(1, 9, -6, 2); sigma = 2;

X = rbind(x1, x2)
k.rbf = RBF.Kernel.vec(X, sigma)

cat('Vectorized RBF kernel on matrix X (rows x1 and x2), with sigma = 2 :','\n'); k.rbf 
## Vectorized RBF kernel on matrix X (rows x1 and x2), with sigma = 2 : 
##              x1           x2
## x1 1.000000e+00 1.103256e-09
## x2 1.103256e-09 1.000000e+00

As with the linear kernel, the vectorized RBF kernel gives an nrow(X)-by-nrow(X) matrix because of the pairwise calculation, and the same reasoning about the two-for-loops equivalent and the diagonal entries applies. One difference: in the RBF kernel, the similarity can never exceed 1, and the diagonal values are exactly 1 because each record is identical to itself.

Exercise: Try writing functions with two for loops that use the non-vectorized rbf and linear kernels, and verify the results against the vectorized kernels. You can call the non-vectorized function directly inside the second for loop. Use the same matrix X (with the two rows x1 and x2) that we used above.

Verifying our functions with the kernlab package

library(kernlab)

k.lin = vanilladot() #Creating linear kernel function
k.lin(x1, x2)
##      [,1]
## [1,]   38
k.rbf = rbfdot(sigma=1/8) #Creating rbf kernel function
                          #kernlab's equation is exp(-σ ||x - x'||^2), while ours is
                          #exp(-||x - x'||^2/(2σ^2)), hence kernlab's σ = 1/(2σ^2) = 1/(2*2^2) = 1/8
k.rbf(x1, x2)
##              [,1]
## [1,] 1.103256e-09

As we can see, the results of our functions and those of the kernlab functions are the same.

Exercise: Try different sets of values/vectors/matrices, calculate the rbf and linear kernels using the functions we have defined, and verify against the kernlab functions.

SVM using kernlab package

We will use the package kernlab to build an SVM model on the dataset promotergene (part of the package). You need to install the package.

The dataset promotergene is a data frame with 106 observations and 58 variables. The first variable, Class, is a factor with levels + (promoter gene) and - (non-promoter gene). The remaining 57 variables, V2 to V58, are factors describing the sequence. The DNA bases are coded as a = adenine, c = cytosine, g = guanine, t = thymine.

data(promotergene)
dim(promotergene)
## [1] 106  58
promotergene[1:4,1:3]
##   Class V2 V3
## 1     +  g  c
## 2     +  a  t
## 3     +  c  c
## 4     +  t  c

106 records with 58 variables: the class label plus 57 features.

All the features are factor variables.

Train and Test dataset

m = nrow(promotergene)
set.seed(1); train.idx = sample(1:m, 0.8*m, replace=F)

train = promotergene[train.idx,]
test = promotergene[-train.idx,]

dim(train)
## [1] 84 58
dim(test)
## [1] 22 58

SVM model

We will use the function ksvm from the package to build the SVM model.

svm.model = ksvm(Class~.,data=train,kernel="rbfdot",kpar="automatic",C=60,cross=3,prob.model=TRUE)
svm.model
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 60 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0160353535353535 
## 
## Number of Support Vectors : 77 
## 
## Objective Function Value : -43.9461 
## Training error : 0 
## Cross validation error : 0.154762 
## Probability model included.

So, the model uses 77 support vectors with sigma = 0.016, and its training error is 0.

Performance on test data

pred.test = predict(svm.model, newdata=test)

table(test$Class, pred.test)
##    pred.test
##      +  -
##   + 11  1
##   -  0 10
cat('\n Accuracy on test data set:\n',mean((pred.test == test$Class))*100, '\n')
## 
##  Accuracy on test data set:
##  95.45455

An accuracy of 95.45% is quite good; the model has performed well.
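The accuracy figure follows directly from the confusion matrix: it is the sum of the diagonal divided by the total number of test records. A quick check, re-entering the table above by hand:

```r
tab <- matrix(c(11, 0, 1, 10), nrow = 2)  # the confusion matrix shown above
sum(diag(tab)) / sum(tab) * 100
## [1] 95.45455
```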

Since it is a multi-dimensional data set with all variables as factors, plotting the model may not be possible.

However, if you wish to see how this package could be used to plot a model, we can generate some simulated data.

Plotting SVM model with kernlab package

Let’s generate some simulated data and build an SVM model so that we can plot it.

set.seed(1)
# 60 two-dimensional points around 0 and 60 around 3
x <- rbind(matrix(rnorm(120),,2),matrix(rnorm(120,mean=3),,2))
y <- matrix(c(rep(1,60),rep(-1,60)))
svm.model2 <- ksvm(x,y,type="C-svc")


svm.model2
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 1 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  1.38298912592912 
## 
## Number of Support Vectors : 31 
## 
## Objective Function Value : -9.4017 
## Training error : 0
plot(svm.model2,data=x)

See how beautifully the model is plotted, with different shades of the two colors representing the two classes (-1 and +1). All the filled data points (triangles as well as circles) are support vectors.
