
Machine Learning - Building Logistic Regression Algorithm from scratch

 


We have seen how a logistic regression model is built in R using predefined functions such as glm(). However, it is also important to understand how the machine learning works at the back end: which functions run inside the main function. We will walk through the various stages of the machine learning process for logistic regression, creating our own functions step by step:

  1. Define Sigmoid Function
  2. Define Cost Function using regularization
  3. Define Gradient function using regularization
  4. Define one vs all optimizer function to train the model
  5. Define predict function to predict classes for each record
  6. Train the model using all the functions on training data
  7. Make predictions on train and test data

The algorithm is explained in the video lecture.

Sigmoid Function

Let’s start with the sigmoid function, which maps a linear score to a probability of success.

sigmoid <- function(x){
  x <- as.matrix(x) #Converting input into a matrix for element-wise operations
  
  sig <- 1/(1 + exp(-x))
  
  sig
}
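A quick sanity check helps confirm the function behaves as expected. The one-line definition below simply repeats the sigmoid above so the snippet runs on its own:

```r
# Same sigmoid as above, repeated so this snippet is self-contained
sigmoid <- function(x) 1/(1 + exp(-as.matrix(x)))

sigmoid(0)    # 0.5: the decision boundary
sigmoid(-10)  # close to 0
sigmoid(10)   # close to 1

# Symmetry property: sigmoid(-x) equals 1 - sigmoid(x)
all.equal(as.numeric(sigmoid(-2)), as.numeric(1 - sigmoid(2)))  # TRUE
```

Large positive scores map to probabilities near 1 and large negative scores to probabilities near 0, which is exactly what the classifier needs.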

Cost Function

Let’s define the cost function. The main purpose is to minimize the cost to obtain the optimum parameter values. This function takes theta, X, y, and the regularization term lambda, and returns the cost.

costFunction <- function(theta, X, y, lambda){
  
  m = length(y) #number of training records
  
  temp <- theta
  temp[1] <- 0 #We don't regularize the intercept or bias term
  
  J = (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
    (lambda/(2*m))*(t(temp)%*%temp)
  
  J
}
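One handy spot check: with every parameter at zero, each prediction is 0.5, so the unregularized cost must come out to exactly log(2) ≈ 0.693 whatever the data. The snippet below repeats the two functions from above in condensed form so it runs standalone; the toy X and y are arbitrary.

```r
sigmoid <- function(x) 1/(1 + exp(-as.matrix(x)))

costFunction <- function(theta, X, y, lambda){
  m = length(y)
  temp <- theta
  temp[1] <- 0 #don't regularize the intercept
  (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
    (lambda/(2*m))*(t(temp)%*%temp)
}

# Toy data: 4 records, an intercept column plus one predictor
X <- cbind(1, c(1, 2, 3, 4))
y <- c(0, 0, 1, 1)

# With theta = 0, every prediction is 0.5, so the cost is exactly log(2)
costFunction(c(0, 0), X, y, lambda = 0)  # 0.6931472
```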

Gradient Function

Let’s define the gradient function. This function takes theta, X, y, and the regularization term lambda, and returns the gradient value for each corresponding theta.

gradFunction <- function(theta, X, y, lambda){
  
  m = length(y) #number of training records
  
  temp <- theta
  temp[1] <- 0 #We don't regularize the intercept or bias term
  
  X = as.matrix(X) #Converting X into a matrix for matrix operations
  
  grad = (1/m)*(t(X)%*%(sigmoid(X%*%theta)-y)) + (lambda/m)*temp
  
  grad
}
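A standard way to validate an analytic gradient is to compare it with central finite differences of the cost. The snippet below repeats condensed copies of the two functions so it runs standalone; the problem size (4 records, 3 predictors) and lambda = 0.5 are arbitrary choices for the check.

```r
sigmoid <- function(x) 1/(1 + exp(-as.matrix(x)))

costFunction <- function(theta, X, y, lambda){
  m = length(y)
  temp <- theta; temp[1] <- 0
  (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
    (lambda/(2*m))*(t(temp)%*%temp)
}

gradFunction <- function(theta, X, y, lambda){
  m = length(y)
  temp <- theta; temp[1] <- 0
  (1/m)*(t(X)%*%(sigmoid(X%*%theta)-y)) + (lambda/m)*temp
}

# Small random problem
set.seed(1)
X <- cbind(1, matrix(rnorm(12), 4, 3))
y <- c(0, 1, 1, 0)
theta <- rnorm(4)
lambda <- 0.5

# Central finite differences, one parameter at a time
eps <- 1e-6
numGrad <- sapply(seq_along(theta), function(j){
  e <- rep(0, length(theta)); e[j] <- eps
  (costFunction(theta + e, X, y, lambda) -
     costFunction(theta - e, X, y, lambda)) / (2*eps)
})

# The two should agree to many decimal places
max(abs(numGrad - as.numeric(gradFunction(theta, X, y, lambda))))
```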

One-vs-All Optimizer for Multinomial Classification

We use the one-vs-all method for a multi-class (multinomial) logistic regression model. The idea is to train one binary logistic regression classifier per class, each treating its own class as positive and all the others as negative, and then combine them into a multi-class classifier.

This function uses optim() in R to iterate over the cost function and gradient function, starting from initial theta values, and calculates the theta vector for each classifier.

The function takes X, y, classes (the categories/levels of the response variable), and the regularization term lambda. It returns a matrix of theta values with one row per class.

oneVAll_Optim <- function(X, y, classes, lambda){
  
  m <- nrow(X) #number of training records
  n <- ncol(X) #number of predictors
  
  n.Class = length(classes)
  
  matTheta <- matrix(rep(0, n.Class*(n + 1)), n.Class) #To store thetas for each class in each row
  
  X <- cbind(rep(1,m), X) #Adding the intercept term
  X <- as.matrix(X) #Converting X into a matrix for matrix operations
  
  #Calculating thetas for each class using optim() in R and storing them in matTheta
  for(i in 1:n.Class) {
    
    initTheta <- rep(0, n+1)
    
    class = classes[i]
    
    costh <- optim(par=initTheta, fn=costFunction, gr=gradFunction,
                   method="BFGS", X=X, y=as.integer(y==class), lambda=lambda)
    
    matTheta[i,] = costh$par
  }
  
  matTheta
}

Prediction Function for Multi-Class One-vs-All

This function takes the matrix of thetas for each class, matTheta (returned by oneVAll_Optim), and X. It returns the predicted class for each record.

predict <- function(matTheta, X){
  
  m <- nrow(X)
  
  X <- cbind(rep(1,m), X) #Adding the intercept or bias term
  
  X <- as.matrix(X) #Converting X into a matrix for matrix operations
  
  p <- apply(sigmoid(X%*%t(matTheta)), 1, which.max) #Pick the class with the highest probability
  
  p
}
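Before turning to the Khan data, the whole pipeline can be exercised end to end on a small synthetic problem. The sketch below condenses the functions above into one standalone script; the prediction function is renamed predictOneVAll here so it does not mask R's built-in generic predict(), and the three Gaussian clusters and lambda = 0.1 are arbitrary illustration choices.

```r
sigmoid <- function(x) 1/(1 + exp(-as.matrix(x)))

costFunction <- function(theta, X, y, lambda){
  m = length(y)
  temp <- theta; temp[1] <- 0           # don't regularize the intercept
  (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
    (lambda/(2*m))*(t(temp)%*%temp)
}

gradFunction <- function(theta, X, y, lambda){
  m = length(y)
  temp <- theta; temp[1] <- 0
  (1/m)*(t(X)%*%(sigmoid(X%*%theta)-y)) + (lambda/m)*temp
}

oneVAll_Optim <- function(X, y, classes, lambda){
  m <- nrow(X); n <- ncol(X)
  X <- cbind(1, as.matrix(X))           # add the intercept column
  matTheta <- matrix(0, length(classes), n + 1)
  for(i in seq_along(classes)){
    fit <- optim(par = rep(0, n + 1), fn = costFunction, gr = gradFunction,
                 method = "BFGS", X = X, y = as.integer(y == classes[i]),
                 lambda = lambda)
    matTheta[i, ] <- fit$par
  }
  matTheta
}

predictOneVAll <- function(matTheta, X){
  X <- cbind(1, as.matrix(X))
  apply(sigmoid(X %*% t(matTheta)), 1, which.max)  # most probable class
}

# Three Gaussian clusters, one per class
set.seed(42)
n <- 30
X <- rbind(cbind(rnorm(n, 0), rnorm(n, 0)),
           cbind(rnorm(n, 4), rnorm(n, 0)),
           cbind(rnorm(n, 0), rnorm(n, 4)))
y <- factor(rep(1:3, each = n))

matTheta <- oneVAll_Optim(X, y, levels(y), lambda = 0.1)
pred <- predictOneVAll(matTheta, X)
mean(pred == as.integer(y))  # high accuracy on well-separated clusters
```

The same call pattern is used below on the Khan data, where the predictors come from the data set instead of simulated clusters.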

Building Multinomial Logistic Regression Model

We will use the data set “Khan” from the package “ISLR”.

The data consists of a number of tissue samples corresponding to four distinct types of small round blue cell tumors. For each tissue sample, 2308 gene expression measurements are available.

The format is a list containing four components: xtrain, xtest, ytrain, and ytest. xtrain contains the 2308 gene expression values for 63 subjects, and ytrain records the corresponding tumor type. xtest and ytest contain the corresponding information for a further 20 test subjects.

So the data has 2308 features, far more than the number of records (63 in train and 20 in test). Such wide data sets are common in fields such as biostatistics and genomics.

All the features are numerical variables, and the response is a categorical variable with four levels coded as 1, 2, 3, 4.

library(ISLR) #Provides the Khan data set

Load Train and Test Data

data <- Khan # a list containing xtrain, xtest, ytrain, ytest

names(data)
## [1] "xtrain" "xtest"  "ytrain" "ytest"
train.X <- as.matrix(data[[1]]) #Converting array into matrix for our algorithm 
test.X <- as.matrix(data[[2]]) #Converting array into matrix for our algorithm

train.y <- data[[3]]
test.y <- data[[4]]

dim(train.X); length(train.y)
## [1]   63 2308
## [1] 63
dim(test.X); length(test.y)
## [1]   20 2308
## [1] 20

Classes in the response variable

train.y = as.factor(train.y)
test.y = as.factor(test.y)

table(train.y); table(test.y)
## train.y
##  1  2  3  4 
##  8 23 12 20
## test.y
## 1 2 3 4 
## 3 6 6 5
classes = levels(train.y)
classes
## [1] "1" "2" "3" "4"

Training One-vs-All Logistic Regression to calculate thetas for each class

We will set lambda = 0.1 and train the multinomial logistic regression model on the train.X and train.y data.

lambda = 0.1

matTheta = oneVAll_Optim(train.X, train.y, classes, lambda)

Prediction by One-Vs-All multinomial logistic regression model

Train set accuracy

Let’s calculate the accuracy of prediction on training data

pred.train = predict(matTheta, train.X)

table(train.y, pred.train)
##        pred.train
## train.y  1  2  3  4
##       1  8  0  0  0
##       2  0 23  0  0
##       3  0  0 12  0
##       4  0  0  0 20
cat('\nTrain Set Accuracy: ', mean((pred.train == train.y)) * 100, "%")
## 
## Train Set Accuracy:  100 %

Test set accuracy

Let’s calculate the accuracy of prediction on test data

pred.test = predict(matTheta, test.X)

table(test.y, pred.test)
##       pred.test
## test.y 1 2 3 4
##      1 3 0 0 0
##      2 0 6 0 0
##      3 0 0 6 0
##      4 0 0 0 5
cat('\nTest Set Accuracy: ', mean((pred.test == test.y)) * 100, "%")
## 
## Test Set Accuracy:  100 %

As we can see, the model has performed extremely well on both the train and the test data, with 100% accuracy.

Exercise: Try different data sets with two classes as well as with multiple classes.

