
Neural Network in R

In this lab, we will walk through an example of building a predictive model with a neural network in R.

We will use the data set “BreastCancer” from the package “mlbench”, which you will need to install.

The objective is to classify each sample as benign or malignant. Samples arrived periodically as Dr. Wolberg reported his clinical cases. Each variable except the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10. There are 16 missing attribute values.

The data is a data frame with 699 observations on 11 variables: one character variable (the sample Id), nine ordered or nominal predictors, and one target class.

  • Id: Sample code number
  • Cl.thickness: Clump Thickness
  • Cell.size: Uniformity of Cell Size
  • Cell.shape: Uniformity of Cell Shape
  • Marg.adhesion: Marginal Adhesion
  • Epith.c.size: Single Epithelial Cell Size
  • Bare.nuclei: Bare Nuclei
  • Bl.cromatin: Bland Chromatin
  • Normal.nucleoli: Normal Nucleoli
  • Mitoses: Mitoses
  • Class: Benign or Malignant

You can get all the details from package “mlbench” by typing help(package="mlbench") in R and then selecting “BreastCancer” from the list of objects appearing in the window.

install.packages("mlbench")

Loading and analyzing the data

Let’s load the data:

library(mlbench)

data(BreastCancer)

Let’s look at the data

summary(BreastCancer)
##       Id             Cl.thickness   Cell.size     Cell.shape  Marg.adhesion
##  Length:699         1      :145   1      :384   1      :353   1      :407  
##  Class :character   5      :130   10     : 67   2      : 59   2      : 58  
##  Mode  :character   3      :108   3      : 52   10     : 58   3      : 58  
##                     4      : 80   2      : 45   3      : 56   10     : 55  
##                     10     : 69   4      : 40   4      : 44   4      : 33  
##                     2      : 50   5      : 30   5      : 34   8      : 25  
##                     (Other):117   (Other): 81   (Other): 95   (Other): 63  
##   Epith.c.size  Bare.nuclei   Bl.cromatin  Normal.nucleoli    Mitoses   
##  2      :386   1      :402   2      :166   1      :443     1      :579  
##  3      : 72   10     :132   3      :165   10     : 61     2      : 35  
##  4      : 48   2      : 30   1      :152   3      : 44     3      : 33  
##  1      : 47   5      : 30   7      : 73   2      : 36     10     : 14  
##  6      : 41   3      : 28   4      : 40   8      : 24     4      : 12  
##  5      : 39   (Other): 61   5      : 34   6      : 22     7      :  9  
##  (Other): 66   NA's   : 16   (Other): 69   (Other): 69     (Other): 17  
##        Class    
##  benign   :458  
##  malignant:241  
##                 
##                 
##                 
##                 
## 
dim(BreastCancer)
## [1] 699  11

There are 699 records and 11 variables, and most of the variables look qualitative, as we do not see numerical summaries such as the mean, median, etc., in the output.

Under Bare.nuclei, we can see some missing values as well.

The first variable is Id, which will not be useful in model building.

The last variable, Class, is our response variable; the rest are our independent variables.

Let’s get rid of the records with missing values:

sum(!complete.cases(BreastCancer))
## [1] 16
BreastCancerComplete = BreastCancer[complete.cases(BreastCancer),]

sum(!complete.cases(BreastCancerComplete))
## [1] 0
sum(is.na(BreastCancerComplete))
## [1] 0
dim(BreastCancerComplete)
## [1] 683  11

The new data has no missing values and has 683 records.
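As a side note, base R’s na.omit() does the same row-wise filtering in one step. A tiny standalone illustration on a toy data frame (not the BreastCancer data):

```r
# Toy data frame with an NA in two of the three rows
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# na.omit() keeps only rows with no NA in any column,
# equivalent to df[complete.cases(df), ]
nrow(na.omit(df))   # 1
```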

Let’s check the data type of each variable

apply(BreastCancerComplete, 2, FUN = class)
##              Id    Cl.thickness       Cell.size      Cell.shape   Marg.adhesion 
##     "character"     "character"     "character"     "character"     "character" 
##    Epith.c.size     Bare.nuclei     Bl.cromatin Normal.nucleoli         Mitoses 
##     "character"     "character"     "character"     "character"     "character" 
##           Class 
##     "character"
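A caveat about the output above: apply() first coerces the data frame to a matrix, and a mixed data frame coerces to a character matrix, which is why every column reports "character". sapply() inspects each column directly and reports the true classes. A standalone illustration on a toy data frame:

```r
# apply() coerces the whole data frame to a (character) matrix first,
# so class() reports "character" for every column:
df <- data.frame(x = factor(c("1", "2"), ordered = TRUE), y = c("a", "b"))
apply(df, 2, class)    # x: "character", y: "character"

# sapply() applies class() to each column as-is:
sapply(df, class)      # x: c("ordered", "factor"), y: "character"
```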
for(i in 1:dim(BreastCancerComplete)[2]){
  cat("levels of", names(BreastCancerComplete)[i], ":", levels(BreastCancerComplete[,i]),"\n")
}
## levels of Id : 
## levels of Cl.thickness : 1 2 3 4 5 6 7 8 9 10 
## levels of Cell.size : 1 2 3 4 5 6 7 8 9 10 
## levels of Cell.shape : 1 2 3 4 5 6 7 8 9 10 
## levels of Marg.adhesion : 1 2 3 4 5 6 7 8 9 10 
## levels of Epith.c.size : 1 2 3 4 5 6 7 8 9 10 
## levels of Bare.nuclei : 1 2 3 4 5 6 7 8 9 10 
## levels of Bl.cromatin : 1 2 3 4 5 6 7 8 9 10 
## levels of Normal.nucleoli : 1 2 3 4 5 6 7 8 9 10 
## levels of Mitoses : 1 2 3 4 5 6 7 8 10 
## levels of Class : benign malignant

As we can see, all variables except Id are factor variables with their respective levels. The nine predictor variables all take levels from 1 to 10 (Mitoses simply has no observations at level 9), as they are ordinal.

This is a good example for seeing how a neural network performs on a data set in which all variables are categorical (ordinal).

Since they are ordinal, we cannot treat them as continuous variables, so we will use them as factor variables. We do not need “Id”, as it adds no value to the predictive model, so it can be dropped.

BreastCancerNew = BreastCancerComplete[,-1] # Dropping "Id"

names(BreastCancerNew)
##  [1] "Cl.thickness"    "Cell.size"       "Cell.shape"      "Marg.adhesion"  
##  [5] "Epith.c.size"    "Bare.nuclei"     "Bl.cromatin"     "Normal.nucleoli"
##  [9] "Mitoses"         "Class"
barplot(table(BreastCancerNew$Class), col=1:nlevels(BreastCancerNew$Class)+1)

table(BreastCancerNew$Class)
## 
##    benign malignant 
##       444       239
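To see the imbalance as proportions rather than raw counts, prop.table() is handy (the counts below are copied from the table above, so the snippet stands on its own):

```r
# Class counts from the table above, as a named vector
counts <- c(benign = 444, malignant = 239)

# prop.table() divides each count by the total (683)
round(prop.table(counts), 3)   # benign 0.650, malignant 0.350
```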

As we can see, the number of benign tumors (444) is almost twice the number of malignant tumors (239).

We have to build a neural network predictive model that predicts whether the tumor is benign or malignant.

Train and Test Data

Let’s split the data set into train (80%) and test (20%) sets:

m = dim(BreastCancerNew)[1]

set.seed(1);train.idx = sample(1:m, 0.8*m, replace=F)

train.n = length(train.idx)

test.n = m-train.n

train = BreastCancerNew[train.idx,]
test = BreastCancerNew[-train.idx,]

dim(train)
## [1] 546  10
dim(test)
## [1] 137  10
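A purely random split like the one above usually preserves the class balance only approximately. If you want both classes split at exactly 80/20, one option is to sample within each class. A sketch (the helper name stratified_idx is ours, and the toy vector y just mimics the 444/239 class counts):

```r
# Hypothetical helper: sample a proportion p of the indices
# within each class separately, so the split is stratified.
stratified_idx <- function(y, p = 0.8) {
  unlist(lapply(split(seq_along(y), y), function(i) sample(i, round(p * length(i)))))
}

set.seed(1)
y   <- factor(rep(c("benign", "malignant"), c(444, 239)))
idx <- stratified_idx(y)
table(y[idx])    # benign 355, malignant 191 (80% of each class)
table(y[-idx])   # benign  89, malignant  48
```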

Training Neural Network Classifier on Train data

To build the classifier, we will use the nnet() function from the nnet package. Install the package in R:

install.packages("nnet")

nnet() is a simple neural network function that fits a single-hidden-layer network, possibly with skip-layer connections. It therefore builds a network with one input layer, one hidden layer, and one output layer. We can choose the number of units in the hidden layer, but we cannot add more hidden layers with this function.

library(nnet)
nn.model = nnet(Class ~ ., data=train, size=10, decay = 0.01, maxit=400)
## # weights:  821
## initial  value 461.257211 
## iter  10 value 20.710316
## iter  20 value 6.840680
## iter  30 value 4.351963
## iter  40 value 3.746710
## iter  50 value 3.549689
## iter  60 value 3.473456
## iter  70 value 3.417225
## iter  80 value 3.376630
## iter  90 value 3.366162
## iter 100 value 3.353700
## iter 110 value 3.346228
## iter 120 value 3.341420
## iter 130 value 3.338975
## iter 140 value 3.337624
## iter 150 value 3.334670
## iter 160 value 3.332833
## iter 170 value 3.332056
## iter 180 value 3.329981
## iter 190 value 3.328471
## iter 200 value 3.327352
## iter 210 value 3.325841
## iter 220 value 3.324051
## iter 230 value 3.322989
## iter 240 value 3.322285
## iter 250 value 3.321863
## iter 260 value 3.321658
## iter 270 value 3.321562
## iter 280 value 3.321496
## iter 290 value 3.321476
## iter 300 value 3.321472
## iter 310 value 3.321469
## iter 320 value 3.321464
## iter 330 value 3.321447
## iter 340 value 3.321433
## iter 350 value 3.321422
## iter 360 value 3.321418
## final  value 3.321417 
## converged

Here, size is the number of units (nodes) in the hidden layer, which we have set to 10; decay is the weight-decay (regularization) parameter, which penalizes large weights to help the optimization and avoid over-fitting; and maxit is the maximum number of iterations.

Brief summary of the model:

print(nn.model)
## a 80-10-1 network with 821 weights
## inputs: Cl.thickness.L Cl.thickness.Q Cl.thickness.C Cl.thickness^4 Cl.thickness^5 Cl.thickness^6 Cl.thickness^7 Cl.thickness^8 Cl.thickness^9 Cell.size.L Cell.size.Q Cell.size.C Cell.size^4 Cell.size^5 Cell.size^6 Cell.size^7 Cell.size^8 Cell.size^9 Cell.shape.L Cell.shape.Q Cell.shape.C Cell.shape^4 Cell.shape^5 Cell.shape^6 Cell.shape^7 Cell.shape^8 Cell.shape^9 Marg.adhesion.L Marg.adhesion.Q Marg.adhesion.C Marg.adhesion^4 Marg.adhesion^5 Marg.adhesion^6 Marg.adhesion^7 Marg.adhesion^8 Marg.adhesion^9 Epith.c.size.L Epith.c.size.Q Epith.c.size.C Epith.c.size^4 Epith.c.size^5 Epith.c.size^6 Epith.c.size^7 Epith.c.size^8 Epith.c.size^9 Bare.nuclei2 Bare.nuclei3 Bare.nuclei4 Bare.nuclei5 Bare.nuclei6 Bare.nuclei7 Bare.nuclei8 Bare.nuclei9 Bare.nuclei10 Bl.cromatin2 Bl.cromatin3 Bl.cromatin4 Bl.cromatin5 Bl.cromatin6 Bl.cromatin7 Bl.cromatin8 Bl.cromatin9 Bl.cromatin10 Normal.nucleoli2 Normal.nucleoli3 Normal.nucleoli4 Normal.nucleoli5 Normal.nucleoli6 Normal.nucleoli7 Normal.nucleoli8 Normal.nucleoli9 Normal.nucleoli10 Mitoses2 Mitoses3 Mitoses4 Mitoses5 Mitoses6 Mitoses7 Mitoses8 Mitoses10 
## output(s): Class 
## options were - entropy fitting  decay=0.01

So, this model is 80-10-1: 80 units (nodes) in the input layer, 10 units in the hidden layer (as we set), and a single unit in the output layer (natural, since there are only two classes). There are a total of 821 weights (parameters): (80 + 1) × 10 for the hidden layer plus (10 + 1) × 1 for the output layer, where the extra unit in each count is the bias.

We have so many units in the input layer because every predictor is a factor, so the model matrix expands each into multiple input columns: the five ordered predictors become polynomial contrasts (the .L, .Q, .C, ^4, … terms above), 9 columns each, and the four nominal predictors become dummy variables (Bare.nuclei2 through Mitoses10), with 9 + 9 + 9 + 8 columns, for 45 + 35 = 80 inputs in total.
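The expansion rules can be seen directly with contrasts(): ordered factors get polynomial contrast columns, unordered factors get treatment dummies (one column per non-reference level). A standalone illustration:

```r
# An ordered factor with 4 levels expands to 3 polynomial columns
o <- factor(1:4, ordered = TRUE)
colnames(contrasts(o))   # ".L" ".Q" ".C"

# An unordered factor with 3 levels expands to 2 dummy columns
u <- factor(c("a", "b", "c"))
colnames(contrasts(u))   # "b" "c"
```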

If you want to see the full summary of the model, run the code below in your R session. It prints the entire network; we have not included the output here, as it would take up a lot of space.

summary(nn.model) 

However, if you wish to plot the model, you can do so with the plotnet() function from the “NeuralNetTools” package:

install.packages("NeuralNetTools")
library(NeuralNetTools)
plotnet(nn.model)

Since this network has 80 input nodes, we cannot clearly see all the connections here. With a smaller number of nodes, the network could be visualized clearly.

The package also offers a function, garson(), which plots the relative importance of the input variables in the neural net:

garson(nn.model)

Again, due to the large number of input units and space constraints, we cannot clearly see the x-axis.

If you wish to see all the components of the model:

names(nn.model)
##  [1] "n"             "nunits"        "nconn"         "conn"         
##  [5] "nsunits"       "decay"         "entropy"       "softmax"      
##  [9] "censored"      "value"         "wts"           "convergence"  
## [13] "fitted.values" "residuals"     "lev"           "call"         
## [17] "terms"         "coefnames"     "contrasts"     "xlevels"

wts holds the fitted weights (parameters), and coefnames the names of the input variables.

Predictions

Let’s see how well our model has performed on both training and test data

Predictions on test and train data

pred.train = predict(nn.model, newdata = train, type = "class")
pred.test = predict(nn.model, newdata = test, type = "class")

Let’s calculate the train and test errors:

confMtx.train=table(train$Class,pred.train) 
error.train=(confMtx.train[1,2]+confMtx.train[2,1])/train.n

confMtx.test=table(test$Class,pred.test) 
error.test=(confMtx.test[1,2]+confMtx.test[2,1])/test.n

confMtx.train
##            pred.train
##             benign malignant
##   benign       347         0
##   malignant      0       199
confMtx.test
##            pred.test
##             benign malignant
##   benign        94         3
##   malignant      5        35
error.train
## [1] 0
error.test
## [1] 0.05839416
cat('\nTrain Set Accuracy: ', mean((pred.train == train$Class)) * 100, "%")
## 
## Train Set Accuracy:  100 %
cat('\nTest Set Accuracy: ', mean((pred.test == test$Class)) * 100, "%")
## 
## Test Set Accuracy:  94.16058 %
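The error computation above indexes specific off-diagonal cells of the confusion matrix, which silently depends on the row and column ordering. An order-independent equivalent simply compares the two vectors position by position. A tiny standalone illustration (toy vectors, not our predictions):

```r
# Toy truth/prediction vectors with one mismatch out of four
truth <- factor(c("benign", "benign", "malignant", "malignant"))
pred  <- factor(c("benign", "malignant", "malignant", "malignant"))

# Misclassification rate: fraction of positions where they disagree
mean(pred != truth)        # 0.25
1 - mean(pred == truth)    # same value, via accuracy
```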

As we can see, the model has performed very well: with just one hidden layer, it reached 100% accuracy on the training set and 94.16% on the test set. (Perfect training accuracy can also be a sign of over-fitting, which is why we judge the model on the held-out test data.)

Neural networks are known for solving complex problems, especially on non-linear data sets.

Exercise: Try different values of size, decay, and maxit to see how the model performs on the test data, and whether you can improve its performance.
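One way to approach the exercise is a small grid search. A sketch, assuming the train and test objects built earlier in this lab (trace = FALSE just suppresses the per-iteration log; the grid values are only a starting point):

```r
library(nnet)

# Loop over a small grid of hidden-layer sizes and decay values,
# refit on the training data, and report the test error of each fit
for (s in c(2, 5, 10)) {
  for (d in c(0.1, 0.01, 0.001)) {
    set.seed(1)   # make each refit reproducible
    fit  <- nnet(Class ~ ., data = train, size = s, decay = d,
                 maxit = 400, trace = FALSE)
    pred <- predict(fit, newdata = test, type = "class")
    cat("size =", s, "decay =", d,
        "test error =", round(mean(pred != test$Class), 4), "\n")
  }
}
```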

