Lab_LRC_2_RM
Asmi Ariv
2022-09-30
Machine Learning - Logistic Regression Algorithm
We have seen how a logistic regression model is built in R using predefined functions such as "glm()". However, it is also important to understand how the machine learning works at the back end: what functions are running inside the main function. We will walk through the stages of the machine learning process for logistic regression, creating our own machine learning functions step by step:
- Define Sigmoid Function
- Define Cost Function using regularization
- Define Gradient function using regularization
- Define one-vs-all optimizer function to train the model
- Define predict function to predict classes for each record
- Train the model using all the functions on training data
- Make predictions on train and test data
The algorithm is explained in the video lecture.
Sigmoid Function
Let's start with the sigmoid function, which computes the probability of success.
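For reference, the function implemented below maps any real number into the interval (0, 1):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}
```

So sigmoid(0) = 0.5, large positive inputs approach 1, and large negative inputs approach 0.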
sigmoid <- function(x){
  x <- as.matrix(x)
  sig <- 1/(1 + exp(-x)) #Element-wise sigmoid
  sig
}
Cost Function
Let's define the cost function. The main purpose is to minimize the cost to obtain the optimal parameter values. This function takes X, y, theta and the regularization term lambda, and returns the cost.
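The quantity computed below is the regularized logistic regression cost, where the intercept term is excluded from the penalty (that is what `temp[1] <- 0` does in the code):

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log h_\theta(x^{(i)}) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
```

Here $h_\theta(x) = \sigma(\theta^{T}x)$ is the sigmoid of the linear predictor, $m$ is the number of training records and $n$ the number of predictors.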
costFunction <- function(theta, X, y, lambda){
  m = length(y) #Number of training records
  temp <- theta
  temp[1] <- 0 #We don't regularize the intercept (bias) term
  J = (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
      (lambda/(2*m))*t(temp)%*%temp
  J
}
Gradient Function
Let's define the gradient function. It takes X, y, theta and the regularization term lambda, and returns the gradient value for each corresponding theta value.
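The gradient computed below is the partial derivative of the regularized cost with respect to each theta, again with no penalty on the intercept:

```latex
\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_0^{(i)}, \qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \geq 1)
```

In matrix form this is exactly `(1/m) * t(X) %*% (sigmoid(X %*% theta) - y) + (lambda/m) * temp` as in the code.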
gradFunction <- function(theta, X, y, lambda){
  m = length(y) #Number of training records
  temp <- theta
  temp[1] <- 0 #We don't regularize the intercept (bias) term
  X = as.matrix(X) #Converting X into a matrix for matrix operations
  grad = (1/m)*(t(X)%*%(sigmoid(X%*%theta)-y)) + (lambda/m)*temp
  grad
}
One-vs-All Optimizer for Multinomial
We use the one-vs-all method for multi-class (multinomial) logistic regression. The idea is to train multiple binary logistic regression classifiers, each treating one class as the positive class and all the others as negative, and combine them to build a multi-class logistic regression classifier.
This function uses R's optim() optimizer to iterate over the cost function and gradient function, starting from initial theta values, and computes theta for each classifier.
The function takes X, y, classes (the categories/classes in the response variable) and the regularization term lambda. It returns a matrix of theta values with one row per class.
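Before the function itself, here is a small illustration (toy labels, not the Khan data) of how each binary sub-problem is built inside the loop: classifier i gets label 1 for records of class i and 0 for everything else, via `as.integer(y == class)`.

```r
# Toy response with three classes
y <- factor(c(1, 2, 3, 2, 1))
classes <- levels(y)            # "1" "2" "3"

# The 0/1 target vector used when training classifier 2:
as.integer(y == classes[2])     # 0 1 0 1 0
```

Each classifier therefore solves an ordinary binary logistic regression, and the multi-class decision is deferred to the predict step.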
oneVAll_Optim <- function(X, y, classes, lambda){
  m <- nrow(X) #Number of training records
  n <- ncol(X) #Number of predictors
  n.Class = length(classes)
  matTheta <- matrix(rep(0, n.Class*(n + 1)), n.Class) #To store thetas for each class in each row
  X <- cbind(rep(1,m), X) #Adding intercept term
  X <- as.matrix(X) #Converting X into a matrix for matrix operations
  #Calculating thetas for each class using optim() in R and storing them in matTheta
  for(i in 1:n.Class) {
    initTheta <- rep(0, n+1)
    class = classes[i]
    costh <- optim(par=initTheta, fn=costFunction, gr=gradFunction,
                   method="BFGS", X=X, y=as.integer(y==class), lambda=lambda)
    matTheta[i,] <- costh$par
  }
  matTheta
}
Prediction function for multi-class one-vs-all
This function takes the matrix of thetas for each class, matTheta (returned by oneVAll_Optim), and X. It returns the predicted class for each record.
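The key step is the row-wise `which.max`: each row of the sigmoid matrix holds one record's probability for every class, and the prediction is the column index with the largest value. A toy example with made-up probabilities:

```r
# 2 records x 3 classes: each column is one classifier's sigmoid output
probs <- matrix(c(0.2, 0.7, 0.1,
                  0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)

# For each row, pick the class (column index) with the highest probability
apply(probs, 1, which.max)      # 2 1
```

Record 1 is assigned class 2 (probability 0.7) and record 2 is assigned class 1 (probability 0.6).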
predict <- function(matTheta, X){
  m <- nrow(X)
  X <- cbind(rep(1,m), X) #Adding intercept or bias term
  X <- as.matrix(X) #Converting X into a matrix for matrix operations
  p <- apply(sigmoid(X%*%t(matTheta)), 1, which.max) #Class with the highest probability
  p
}
Building Multinomial Logistic Regression Model
We will use the data set “Khan” from the package “ISLR”.
The data consists of a number of tissue samples corresponding to four distinct types of small round blue cell tumors. For each tissue sample, 2308 gene expression measurements are available.
The format is a list containing four components: xtrain, xtest, ytrain, and ytest. xtrain contains the 2308 gene expression values for 63 subjects and ytrain records the corresponding tumor type. xtest and ytest contain the corresponding testing sample information for a further 20 subjects.
So, the data has 2308 features, which is much larger than the number of records (63 in train and 20 in test). This situation is common in fields such as biostatistics and genomics.
All the features are numerical variables and response is a categorical variable with four levels coded as 1,2,3,4.
library(ISLR) #Contains the Khan data set
Load Train and Test Data
data <- Khan # A list containing the arrays xtrain, xtest, ytrain, ytest
names(data)
## [1] "xtrain" "xtest"  "ytrain" "ytest"
train.X <- as.matrix(data[[1]]) #Converting array into matrix for our algorithm
test.X <- as.matrix(data[[2]]) #Converting array into matrix for our algorithm
train.y <- data[[3]]
test.y <- data[[4]]
dim(train.X); length(train.y)
## [1]   63 2308
## [1] 63
dim(test.X); length(test.y)
## [1]   20 2308
## [1] 20
Classes in the response variable
train.y = as.factor(train.y)
test.y = as.factor(test.y)
table(train.y); table(test.y)
## train.y
##  1  2  3  4
##  8 23 12 20
## test.y
## 1 2 3 4
## 3 6 6 5
classes = levels(train.y)
classes
## [1] "1" "2" "3" "4"
Training One-vs-All Logistic Regression to calculate thetas for each class
We will set lambda = 0.1 and train the multinomial logistic regression model on the train.X and train.y data.
lambda = 0.1
matTheta = oneVAll_Optim(train.X, train.y, classes, lambda)
Prediction by One-vs-All multinomial logistic regression model
Train set accuracy
Let's calculate the accuracy of prediction on the training data.
pred.train = predict(matTheta, train.X)
table(train.y, pred.train)
##        pred.train
## train.y  1  2  3  4
##       1  8  0  0  0
##       2  0 23  0  0
##       3  0  0 12  0
##       4  0  0  0 20
cat('\nTrain Set Accuracy: ', mean((pred.train == train.y)) * 100, "%")
##
## Train Set Accuracy:  100 %
Test set accuracy
Let's calculate the accuracy of prediction on the test data.
pred.test = predict(matTheta, test.X)
table(test.y, pred.test)
##       pred.test
## test.y 1 2 3 4
##      1 3 0 0 0
##      2 0 6 0 0
##      3 0 0 6 0
##      4 0 0 0 5
cat('\nTest Set Accuracy: ', mean((pred.test == test.y)) * 100, "%")
##
## Test Set Accuracy:  100 %
As we can see, the model has performed extremely well on both train and test data, with 100% accuracy. Keep in mind, though, that with 2308 features and only 83 samples in total, such perfect scores on small data sets should be interpreted with caution.
Exercise: Try different data sets, with two classes as well as multiple classes.
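As a possible starting point for the exercise, here is a sketch applying the same one-vs-all pipeline to the built-in iris data set (three classes, four numeric predictors). The functions from the lab are repeated in condensed form so the block runs on its own; the prediction helper is renamed predict_ova to avoid masking base R's predict().

```r
sigmoid <- function(x) 1/(1 + exp(-as.matrix(x)))

costFunction <- function(theta, X, y, lambda){
  m <- length(y); temp <- theta; temp[1] <- 0  #No penalty on the intercept
  (1/m)*(-t(y)%*%log(sigmoid(X%*%theta)) - t(1-y)%*%log(1-sigmoid(X%*%theta))) +
    (lambda/(2*m))*t(temp)%*%temp
}

gradFunction <- function(theta, X, y, lambda){
  m <- length(y); temp <- theta; temp[1] <- 0
  (1/m)*(t(X)%*%(sigmoid(X%*%theta)-y)) + (lambda/m)*temp
}

oneVAll_Optim <- function(X, y, classes, lambda){
  X <- cbind(1, as.matrix(X))                  #Adding the intercept column
  matTheta <- matrix(0, length(classes), ncol(X))
  for(i in seq_along(classes)){
    fit <- optim(par=rep(0, ncol(X)), fn=costFunction, gr=gradFunction,
                 method="BFGS", X=X, y=as.integer(y==classes[i]), lambda=lambda)
    matTheta[i,] <- fit$par
  }
  matTheta
}

predict_ova <- function(matTheta, X){
  X <- cbind(1, as.matrix(X))
  apply(sigmoid(X%*%t(matTheta)), 1, which.max)
}

X <- iris[, 1:4]                 #Four numeric predictors
y <- iris$Species                #Three classes
matTheta <- oneVAll_Optim(X, y, levels(y), lambda = 0.1)
pred <- predict_ova(matTheta, X)
mean(pred == as.integer(y))      #Training accuracy, typically well above 0.9
```

Unlike the Khan data, iris has far more records than features, so the accuracy here reflects the model rather than the separability that comes with p >> n.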