Lab_NN_1_RM
Asmi Ariv
2022-10-03
Neural Network in R
In this lab we will go through an example of how to use a neural network in R to build a predictive model on a data set.
We will use the "BreastCancer" data set from the "mlbench" package. You will need to install this package.
The objective is to classify each case as benign or malignant. Samples arrive periodically as Dr. Wolberg reports his clinical cases. Each variable except for the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10. There are 16 missing attribute values.
The data is a data frame with 699 observations on 11 variables: one character variable, 9 ordered or nominal factors, and 1 target class.
- Id: Sample code number
- Cl.thickness: Clump Thickness
- Cell.size: Uniformity of Cell Size
- Cell.shape: Uniformity of Cell Shape
- Marg.adhesion: Marginal Adhesion
- Epith.c.size: Single Epithelial Cell Size
- Bare.nuclei: Bare Nuclei
- Bl.cromatin: Bland Chromatin
- Normal.nucleoli: Normal Nucleoli
- Mitoses: Mitoses
- Class: Benign or Malignant
You can get all the details from the package "mlbench" by typing help(package="mlbench") in R and then selecting "BreastCancer" from the list of objects appearing in the window.
install.packages("mlbench")
Loading and Analyzing the Data
Let's load the data:
library(mlbench)
data(BreastCancer)
Let's look at the data:
summary(BreastCancer)
## Id Cl.thickness Cell.size Cell.shape Marg.adhesion
## Length:699 1 :145 1 :384 1 :353 1 :407
## Class :character 5 :130 10 : 67 2 : 59 2 : 58
## Mode :character 3 :108 3 : 52 10 : 58 3 : 58
## 4 : 80 2 : 45 3 : 56 10 : 55
## 10 : 69 4 : 40 4 : 44 4 : 33
## 2 : 50 5 : 30 5 : 34 8 : 25
## (Other):117 (Other): 81 (Other): 95 (Other): 63
## Epith.c.size Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses
## 2 :386 1 :402 2 :166 1 :443 1 :579
## 3 : 72 10 :132 3 :165 10 : 61 2 : 35
## 4 : 48 2 : 30 1 :152 3 : 44 3 : 33
## 1 : 47 5 : 30 7 : 73 2 : 36 10 : 14
## 6 : 41 3 : 28 4 : 40 8 : 24 4 : 12
## 5 : 39 (Other): 61 5 : 34 6 : 22 7 : 9
## (Other): 66 NA's : 16 (Other): 69 (Other): 69 (Other): 17
## Class
## benign :458
## malignant:241
dim(BreastCancer)
## [1] 699 11
There are 699 records and 11 variables, and most of them appear to be qualitative, since the summary shows level counts rather than numerical statistics such as mean and median.
Under Bare.nuclei, we can also see some missing values.
The first variable is Id and will not be useful in model building.
The last variable, Class, is our response variable; the rest are our independent variables.
Let's get rid of records with missing values:
sum(!complete.cases(BreastCancer))
## [1] 16
BreastCancerComplete = BreastCancer[complete.cases(BreastCancer),]
sum(!complete.cases(BreastCancerComplete))
## [1] 0
sum(is.na(BreastCancerComplete))
## [1] 0
dim(BreastCancerComplete)
## [1] 683 11
The new data has no missing values and has 683 records.
Let's check the data type of each variable:
apply(BreastCancerComplete, 2, FUN = class)
## Id Cl.thickness Cell.size Cell.shape Marg.adhesion
## "character" "character" "character" "character" "character"
## Epith.c.size Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses
## "character" "character" "character" "character" "character"
## Class
## "character"
Note that apply() first coerces the data frame to a character matrix, which is why every column is reported as "character"; the factor levels printed below confirm the columns are actually factors.
for(i in 1:dim(BreastCancerComplete)[2]){
  cat("levels of", names(BreastCancerComplete)[i], ":", levels(BreastCancerComplete[,i]), "\n")
}
## levels of Id :
## levels of Cl.thickness : 1 2 3 4 5 6 7 8 9 10
## levels of Cell.size : 1 2 3 4 5 6 7 8 9 10
## levels of Cell.shape : 1 2 3 4 5 6 7 8 9 10
## levels of Marg.adhesion : 1 2 3 4 5 6 7 8 9 10
## levels of Epith.c.size : 1 2 3 4 5 6 7 8 9 10
## levels of Bare.nuclei : 1 2 3 4 5 6 7 8 9 10
## levels of Bl.cromatin : 1 2 3 4 5 6 7 8 9 10
## levels of Normal.nucleoli : 1 2 3 4 5 6 7 8 9 10
## levels of Mitoses : 1 2 3 4 5 6 7 8 10
## levels of Class : benign malignant
As we can see, all variables except Id are factor variables with their respective levels. In fact, 9 of the 10 factor variables share the same levels, ranging from 1 to 10, as they are ordinal factor variables.
This is a good example to see how a neural network performs on a data set like this, in which all the variables are categorical (ordinal).
Since they are ordinal data types, we cannot treat them as continuous variables; hence, we will use them as factor variables. We do not need Id, as it doesn't add any value to the predictive model, so it can be dropped.
BreastCancerNew = BreastCancerComplete[,-1] # Dropping "Id"
names(BreastCancerNew)## [1] "Cl.thickness" "Cell.size" "Cell.shape" "Marg.adhesion"
## [5] "Epith.c.size" "Bare.nuclei" "Bl.cromatin" "Normal.nucleoli"
## [9] "Mitoses" "Class"
barplot(table(BreastCancerNew$Class), col=1:nlevels(BreastCancerNew$Class)+1)
table(BreastCancerNew$Class)
##
## benign malignant
## 444 239
As we can see, the number of benign tumors (444) is almost twice the number of malignant tumors (239).
We have to build a neural network predictive model that predicts whether a tumor is benign or malignant.
Train and Test Data
Let's split the data set into train (80%) and test (20%) sets:
m = dim(BreastCancerNew)[1]
set.seed(1)
train.idx = sample(1:m, 0.8*m, replace=F)
train.n = length(train.idx)
test.n = m-train.n
train = BreastCancerNew[train.idx,]
test = BreastCancerNew[-train.idx,]
dim(train)
## [1] 546 10
dim(test)
## [1] 137 10
Training a Neural Network Classifier on the Train Data
To build the classifier, we will use the nnet() function, which is part of the nnet package. Install the package in your R session.
install.packages("nnet")
nnet() is a very simple neural network function that fits a single-hidden-layer neural network, possibly with skip-layer connections. It therefore builds a network with one input layer, one hidden layer, and one output layer. We can choose the number of units in the hidden layer, but we cannot add more hidden layers with this function.
library(nnet)
nn.model = nnet(Class ~ ., data=train, size=10, decay=0.01, maxit=400)
## # weights: 821
## initial value 461.257211
## iter 10 value 20.710316
## iter 20 value 6.840680
## iter 30 value 4.351963
## iter 40 value 3.746710
## iter 50 value 3.549689
## iter 60 value 3.473456
## iter 70 value 3.417225
## iter 80 value 3.376630
## iter 90 value 3.366162
## iter 100 value 3.353700
## iter 110 value 3.346228
## iter 120 value 3.341420
## iter 130 value 3.338975
## iter 140 value 3.337624
## iter 150 value 3.334670
## iter 160 value 3.332833
## iter 170 value 3.332056
## iter 180 value 3.329981
## iter 190 value 3.328471
## iter 200 value 3.327352
## iter 210 value 3.325841
## iter 220 value 3.324051
## iter 230 value 3.322989
## iter 240 value 3.322285
## iter 250 value 3.321863
## iter 260 value 3.321658
## iter 270 value 3.321562
## iter 280 value 3.321496
## iter 290 value 3.321476
## iter 300 value 3.321472
## iter 310 value 3.321469
## iter 320 value 3.321464
## iter 330 value 3.321447
## iter 340 value 3.321433
## iter 350 value 3.321422
## iter 360 value 3.321418
## final value 3.321417
## converged
Here, size is the number of units (nodes) in the hidden layer, which we have set to 10; decay is the weight-decay regularization term, which penalizes large weights to help the optimization and avoid over-fitting; and maxit is the maximum number of iterations.
Brief summary of the model:
print(nn.model)
## a 80-10-1 network with 821 weights
## inputs: Cl.thickness.L Cl.thickness.Q Cl.thickness.C Cl.thickness^4 Cl.thickness^5 Cl.thickness^6 Cl.thickness^7 Cl.thickness^8 Cl.thickness^9 Cell.size.L Cell.size.Q Cell.size.C Cell.size^4 Cell.size^5 Cell.size^6 Cell.size^7 Cell.size^8 Cell.size^9 Cell.shape.L Cell.shape.Q Cell.shape.C Cell.shape^4 Cell.shape^5 Cell.shape^6 Cell.shape^7 Cell.shape^8 Cell.shape^9 Marg.adhesion.L Marg.adhesion.Q Marg.adhesion.C Marg.adhesion^4 Marg.adhesion^5 Marg.adhesion^6 Marg.adhesion^7 Marg.adhesion^8 Marg.adhesion^9 Epith.c.size.L Epith.c.size.Q Epith.c.size.C Epith.c.size^4 Epith.c.size^5 Epith.c.size^6 Epith.c.size^7 Epith.c.size^8 Epith.c.size^9 Bare.nuclei2 Bare.nuclei3 Bare.nuclei4 Bare.nuclei5 Bare.nuclei6 Bare.nuclei7 Bare.nuclei8 Bare.nuclei9 Bare.nuclei10 Bl.cromatin2 Bl.cromatin3 Bl.cromatin4 Bl.cromatin5 Bl.cromatin6 Bl.cromatin7 Bl.cromatin8 Bl.cromatin9 Bl.cromatin10 Normal.nucleoli2 Normal.nucleoli3 Normal.nucleoli4 Normal.nucleoli5 Normal.nucleoli6 Normal.nucleoli7 Normal.nucleoli8 Normal.nucleoli9 Normal.nucleoli10 Mitoses2 Mitoses3 Mitoses4 Mitoses5 Mitoses6 Mitoses7 Mitoses8 Mitoses10
## output(s): Class
## options were - entropy fitting decay=0.01
So, this model is 80-10-1: 80 units (nodes) in the input layer, 10 units in the hidden layer (as we set it), and a single unit in the output layer (which suffices because there are only two classes). There are a total of 821 weights (parameters).
We have so many units in the input layer because all the input variables are ordered or nominal factors (categorical in nature), so the model matrix encodes each of them with multiple contrast or dummy columns.
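The count of 80 can be traced back to R's factor encoding, as the printed input names show: each 10-level ordered factor contributes 9 polynomial-contrast columns (.L, .Q, .C, ^4, ...), giving 5 × 9 = 45, while the nominal factors contribute one treatment dummy per non-reference level (9 + 9 + 9 + 8 = 35), for 80 inputs in total. A minimal self-contained sketch of this expansion, using a hypothetical ordered factor x:

```r
# Sketch: a 10-level ordered factor expands to 9 polynomial-contrast
# columns in the model matrix (x is a made-up variable for illustration).
set.seed(1)
x  = factor(sample(1:10, 20, replace = TRUE), levels = 1:10, ordered = TRUE)
mm = model.matrix(~ x, data = data.frame(x = x))
colnames(mm)  # "(Intercept)" "x.L" "x.Q" "x.C" "x^4" ... "x^9"
ncol(mm) - 1  # 9 input columns per 10-level ordered factor
```

A nominal (unordered) factor would instead expand via contr.treatment into one 0/1 dummy per non-reference level, which is what we see for Bare.nuclei through Mitoses.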
If you want to see the full summary of the model, run the code below in your R session. It will print the entire network; we have not printed it here, as it would take up a lot of space.
summary(nn.model)
However, if you wish to plot the model, you can do so using the plotnet() function from the package NeuralNetTools.
install.packages("NeuralNetTools")
library(NeuralNetTools)
plotnet(nn.model)
Since the network of this model has 80 input nodes, we cannot clearly see all the connections here. For a smaller number of nodes, it would be possible to visualize the network clearly.
The package also offers a function, garson(), which plots the relative importance of the input variables in the neural net.
garson(nn.model)
Again, due to the large number of input units and space constraints, we cannot clearly see the x-axis.
If you wish to see all the components of the model:
names(nn.model)## [1] "n" "nunits" "nconn" "conn"
## [5] "nsunits" "decay" "entropy" "softmax"
## [9] "censored" "value" "wts" "convergence"
## [13] "fitted.values" "residuals" "lev" "call"
## [17] "terms" "coefnames" "contrasts" "xlevels"
wts holds the fitted weights (parameters), and coefnames holds the names of the input variables.
Predictions
Let’s see how well our model has performed on both training and test data
Predictions on test and train data
pred.train = predict(nn.model, newdata = train, type = "class")
pred.test = predict(nn.model, newdata = test, type = "class")
Let's calculate the train and test errors:
confMtx.train=table(train$Class,pred.train)
error.train=(confMtx.train[1,2]+confMtx.train[2,1])/train.n
confMtx.test=table(test$Class,pred.test)
error.test=(confMtx.test[1,2]+confMtx.test[2,1])/test.n
confMtx.train
## pred.train
## benign malignant
## benign 347 0
## malignant 0 199
confMtx.test
## pred.test
## benign malignant
## benign 94 3
## malignant 5 35
error.train
## [1] 0
error.test
## [1] 0.05839416
cat('\nTrain Set Accuracy: ', mean((pred.train == train$Class)) * 100, "%")
##
## Train Set Accuracy: 100 %
cat('\nTest Set Accuracy: ', mean((pred.test == test$Class)) * 100, "%")
##
## Test Set Accuracy: 94.16058 %
As we can see, the model has performed really well: with just one hidden layer it has reached 100% accuracy on the training set and 94.16% accuracy on the test set. (Perfect training accuracy alongside a lower test accuracy also hints at some over-fitting, which the decay parameter helps to control.)
Neural networks are known to solve complex problems, especially on non-linear data sets.
Exercise: Try different values of size, decay, and maxit to see how the model performs on the test data and whether you can improve the performance.
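As a starting point for the exercise, here is one possible sketch of a small grid search over size and decay, scoring each fit on the test set. The grid values are illustrative assumptions, not recommendations, and the block rebuilds the same data preparation and split used in this lab so it runs on its own:

```r
library(mlbench)
library(nnet)

# Rebuild the cleaned data and the 80/20 split used earlier in this lab.
data(BreastCancer)
bc = BreastCancer[complete.cases(BreastCancer), -1]  # drop NAs and Id
set.seed(1)
idx   = sample(1:nrow(bc), 0.8 * nrow(bc), replace = FALSE)
train = bc[idx, ]
test  = bc[-idx, ]

# Hypothetical grid: three hidden-layer sizes x three decay values.
grid = expand.grid(size = c(2, 5, 10), decay = c(0.001, 0.01, 0.1))
grid$test.acc = NA
for (j in 1:nrow(grid)) {
  fit  = nnet(Class ~ ., data = train, size = grid$size[j],
              decay = grid$decay[j], maxit = 400, trace = FALSE)
  pred = predict(fit, newdata = test, type = "class")
  grid$test.acc[j] = mean(pred == test$Class)
}
grid[order(-grid$test.acc), ]  # best settings first
```

Note that selecting hyperparameters directly on the test set is optimistic; for a fairer comparison you would cross-validate on the training data and keep the test set for a final check.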