Artificial Neural Networks
Slide2 : Seeing the billions of interconnections in the human
brain, and the way the human brain recognizes different
patterns, researchers felt that there was a need to simulate the
human brain.
How did Artificial Neural Networks develop?
Model of a Biological Neuron
Three major components of a biological neuron are: axon, cell body, dendrites
Slide5 : At one end of the neuron there is a multitude of tiny filaments called dendrites.
Slide6 : Dendrites join together to form larger branches
and trunks, where they attach to
the cell body.
Slide7 : At the other end of the neuron is a
single filament
leading out of the
cell body, called the
axon.
Slide8 : At its far end, the axon branches extensively into end links
called axon
terminals.
Slide9 : Dendrites: represented as the inputs to the neuron. Axon: the neuron's output. Each neuron has many inputs
through its multiple
dendrites, but only one output
through its
single axon.
Slide10 : Synapse: each branch of the axon meets exactly one
dendrite of another cell. Synaptic gap: the gap between the axon
terminals and the dendrites of
another cell, between 50 and 200 angstroms wide.
Connections between neurons are formed at synapses: the axon of one neuron, the synaptic gap, and the dendrite of another neuron. Neurons are information processors.
Slide12 : Communication between neurons: how does it take place? Communication takes place with the help of electrical
signals.
Slide13 : Signals are sent through the axon of one neuron to the dendrites of other neurons
Slide14 : The processing tasks in the brain are distributed among
about 10^11 to 10^12 elementary nerve cells called
neurons. Although each neuron is far slower than an electronic switching element, the brain has very little difficulty in correctly
and immediately recognizing patterns or objects. The crucial difference therefore lies not in the raw
speed of processing but in the organization of processing.
The key is the notion of massive parallelism, or
connectionism.
Biological and Artificial Neural Systems
Slide16 : Artificial Neural Networks mimic the brain in several
ways. The storage of information and the control of the
system are done in a manner quite similar to that in the
brain. The learning phase of Artificial Neural Networks is
analogous to the development phase of mental faculties
in humans. A biological neural system is made up of basic elements
known as neurons; an Artificial Neural Network is likewise made
up of (artificial) neurons.
Slide17 : The brain is capable of handling complex tasks such as
sensory processing, recognition, classification,
discrimination, etc. Likewise, Artificial Neural Networks are also capable of
performing these tasks.
Slide18 : In addition, biological systems are able to learn adaptively
from experience and from representations of knowledge. This is due to their massively parallel processing
architecture. This feature also provides them
with a high degree of fault tolerance, even in cases of
considerable damage. Similarly, Artificial Neural Networks can learn adaptively.
Slide19 : Despite their essentially biological basis, many
developments in Artificial Neural Networks have
stemmed from ideas in fields such as Statistics, Computer
Science, Cognitive Learning, Mathematics and Engineering.
Contributions from other fields include learning rules such as least mean squares (LMS) and its
extension, the generalized delta rule, which is used to train multilayer
feedforward networks, and models such as Probabilistic Neural
Networks, which are based on Bayesian theory.
NEURAL MODELLING
McCulloch and Pitts proposed the first synthetic neuron.
In the McCulloch-Pitts model, the artificial neuron
produces a binary output whose value depends on the
weighted sum of its inputs. However, this model has only a single output.
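A minimal Python sketch of a McCulloch-Pitts style neuron, assuming the unit outputs 1 when the weighted sum of its inputs reaches a threshold and 0 otherwise. The function name `mcculloch_pitts` and the AND-gate weights and threshold are illustrative choices, not taken from the slides.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold unit (assumed >= rule): 1 if the weighted sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Illustrative AND gate: both inputs must be on for the sum to reach the threshold of 2.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcculloch_pitts(x, weights=(1, 1), threshold=2))
```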
Classification Models
Single Layer Perceptrons
A single layer perceptron consists of an input layer and an output
layer.
The activation function employed is a hard limiting function.
An output unit will assume the value 1 if the sum of the
weighted inputs is greater than its threshold.
Slide23 : In terms of classification, an object will be classified as
class A if
$\sum_i W_{ij} X_i > \theta_j$
where $W_{ij}$ is the weight from unit i to unit j and $\theta_j$ is the
threshold on unit j. Otherwise, the object will be classified
as class B. The equation
$\sum_i W_{ij} X_i = \theta_j$
forms a hyperplane in the n-dimensional space, dividing the
space into two halves.
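A minimal sketch of this decision rule in plain Python. The names `classify` and `side_of_hyperplane` and the example weights and threshold are hypothetical; with n = 2, $W = (1, 1)$ and $\theta = 1$, the hyperplane is the line $x_1 + x_2 = 1$.

```python
def classify(x, w, theta):
    """Return 'A' if sum_i w[i]*x[i] > theta, else 'B' (hard-limiting decision)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return "A" if s > theta else "B"


def side_of_hyperplane(x, w, theta):
    """Signed value of sum_i w[i]*x[i] - theta: > 0 on the class-A side, 0 on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) - theta


w, theta = (1.0, 1.0), 1.0                       # hypothetical weights and threshold
print(classify((1.0, 0.5), w, theta))            # 1.5 > 1  -> A
print(classify((0.2, 0.3), w, theta))            # 0.5 <= 1 -> B
print(side_of_hyperplane((0.5, 0.5), w, theta))  # 0.0: the point lies on the line
```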
Slide24 : When n is 2, the hyperplane becomes a line. Linear separability refers
to the case where a linear hyperplane exists that places the
instances of one class on one side and those of the other
class on the other side of the plane. Unfortunately, many classification problems are not
linearly separable.
The Exclusive OR problem is a good example.
Slide25 : A single layer perceptron cannot simulate an Exclusive
OR function.
The function accepts two inputs (0 or 1) and produces
an output of 1 only if exactly one of the inputs is 1,
but not both.
Inputs Output
(1,1) 0
(1,0) 1
(0,1) 1
(0,0) 0
Slide26 : If we plot the above four points in the two dimensional
space, it is impossible to draw a line so that (1,1) and (0,0)
are on one side and (1,0) and (0,1) are on the other side.
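The impossibility can be checked algebraically. Assuming the single-layer rule of Slide23 with output 1 when $w_1 x_1 + w_2 x_2 > \theta$, the four XOR rows would require:

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad 0 \le \theta \\
(1,0) \mapsto 1 &: \quad w_1 > \theta \\
(0,1) \mapsto 1 &: \quad w_2 > \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 \le \theta
\end{aligned}
```

Adding the second and third conditions gives $w_1 + w_2 > 2\theta \ge \theta$ (using $\theta \ge 0$ from the first), which contradicts the fourth, so no choice of $w_1, w_2, \theta$ works.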
To cope with a problem that is not linearly separable,
a multilayer perceptron is required.
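As a sketch of why one hidden layer suffices here, a two-layer network of hard-limiting units can compute XOR by combining an OR unit and an AND unit. The weights and thresholds below are hand-picked illustrative values, not trained and not taken from the slides.

```python
def step(s, theta):
    """Hard-limiting activation: 1 if s exceeds the threshold theta, else 0."""
    return 1 if s > theta else 0


def xor_mlp(x1, x2):
    # Hidden layer (hand-chosen weights and thresholds):
    h_or = step(1 * x1 + 1 * x2, 0.5)   # fires if at least one input is 1
    h_and = step(1 * x1 + 1 * x2, 1.5)  # fires only if both inputs are 1
    # Output layer: "OR but not AND"
    return step(1 * h_or - 1 * h_and, 0.5)


for x in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x, xor_mlp(*x))
```

The hidden layer maps the four points to a new space in which they are linearly separable, which the single output unit can then split.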
Weight Training
1. Adjust the weights by
$W_{ji}(t+1) = W_{ji}(t) + \Delta W_{ji}$
where $W_{ji}(t)$ is the weight from unit i to unit j at time t
(or the t-th iteration) and $\Delta W_{ji}$ is the weight adjustment.
2. The weight change may be computed by the delta rule:
$\Delta W_{ji} = \eta \, \delta_j X_i$   (1)
where $\eta$ is the trial-independent learning rate
($0 < \eta < 1$, e.g., 0.3) and $\delta_j$ is the error at unit j:
Slide28 : $\delta_j = T_j - O_j$
where $T_j$ is the desired (target) output activation and $O_j$
is the actual output activation at output unit j.
3. Repeat iterations until convergence.
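Steps 1 to 3 can be sketched in plain Python for a single hard-limiting output unit. Assumptions: the threshold is learned as an extra weight `w[0]` on a constant input of -1 (so the unit fires when the weighted sum of the real inputs exceeds `w[0]`), weights start at zero, and the AND data set is an illustrative linearly separable example; η = 0.3 follows the value suggested above.

```python
def step(s):
    """Hard-limiting activation."""
    return 1 if s > 0 else 0


def predict(w, x):
    """Output of the unit; w[0] acts as the threshold via a constant -1 input."""
    return step(sum(wi * xi for wi, xi in zip(w, [-1.0] + list(x))))


def train_delta_rule(samples, eta=0.3, max_epochs=100):
    """Repeat delta-rule updates until an epoch makes no change (step 3)."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for epoch in range(max_epochs):
        changed = False
        for x, target in samples:
            xs = [-1.0] + list(x)
            delta = target - predict(w, x)        # delta_j = T_j - O_j
            if delta != 0:
                changed = True
                for i in range(len(w)):
                    w[i] += eta * delta * xs[i]   # Delta W_ji = eta * delta_j * X_i
        if not changed:
            return w, epoch
    return w, max_epochs


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, epochs = train_delta_rule(AND, eta=0.3)
print(w, epochs)
```

Because AND is linearly separable, the loop stops after a finite number of epochs, as the convergence theorem on the next slide guarantees.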
Slide29 : According to the perceptron convergence theorem, if the
data points are linearly separable, the perceptron learning
rule will converge to some solution in a finite number of
steps for any initial choice of weights. The delta rule is a simple generalization of the perceptron
learning rule. The learning rate $\eta$ sets the step size.
If $\eta$ is too small, convergence is unnecessarily slow,
whereas if $\eta$ is too large, the learning process may diverge.
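A toy illustration of the step-size remark, assuming the one-weight error surface $E(w) = w^2$. This surface is chosen only because its gradient $2w$ makes each gradient-descent update a multiplication by $1 - 2\eta$; it is not an error function from the slides.

```python
def descend(eta, w0=1.0, steps=20):
    """Gradient descent on the toy error E(w) = w**2 (gradient 2w); returns the final w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w  # w <- w - eta * dE/dw
    return w


for eta in (0.01, 0.3, 1.1):
    print(eta, descend(eta))
```

With these settings, η = 0.01 still leaves w near 0.67 after 20 steps, η = 0.3 drives it close to 0, and η = 1.1 makes |w| grow at every step.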
Drawbacks of the Perceptron Model
The perceptron cannot handle problems that are not linearly
separable, such as the Exclusive OR problem. Thus there is a need for multilayer feedforward networks.
Multilayered Feedforward Networks
m: number of neurons in the input layer
n: number of neurons in the hidden layer; $w_{ij}$: input-to-hidden weights
k: number of neurons in the output layer; $v_{jk}$: hidden-to-output weights
Backpropagation network
The backpropagation network is probably the best known and most widely used of the current types of neural network systems. In contrast to earlier work on perceptrons, the backpropagation network is a multilayer feedforward network with a different transfer function in the artificial neuron (a differentiable sigmoid in place of the hard-limiting function) and a more powerful learning rule.
The learning rule is known as backpropagation, which is a
kind of gradient descent technique with backward error
(gradient) propagation, as depicted in the figure.
Backpropagation algorithm
A supervised training algorithm for multilayered feedforward networks. The sum of the squares of the errors between the computed values and the target values is minimized:
$E = \frac{1}{2} \sum_r (T_r - OUT_r)^2$
$T_r$ -----> target value
$OUT_r$ ----> computed value
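As a worked illustration of $E$ (the target and output values are hypothetical, chosen only for the arithmetic), take two output units with $T = (1, 0)$ and $OUT = (0.8, 0.3)$:

```latex
E = \tfrac{1}{2}\left[(1 - 0.8)^2 + (0 - 0.3)^2\right]
  = \tfrac{1}{2}\,(0.04 + 0.09)
  = 0.065
```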
Slide37 : Backpropagation Algorithm
1. Apply the input vector X = (x1, x2, …, xm) to the input units.
2. Find the net input values to the hidden layer units: $N_j = \sum_{i=1}^{m} W_{ij} X_i$
3. Apply the activation function:
$O_j = F(N_j) = \frac{1}{1 + e^{-N_j}}$
Slide38 : Backpropagation Algorithm(contd.)
4. Move to the output layer. Determine the net input value to each unit:
$NET_r = \sum_{j=1}^{n} V_{jr} O_j$
5. Again apply the activation function:
$OUT_r = F(NET_r)$
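Steps 1 to 5 amount to a forward pass, sketched here in plain Python with nested lists as matrices. The layout is an assumption chosen to match the indices above: `W[i][j]` is the weight from input i to hidden unit j, and `V[j][r]` is the weight from hidden unit j to output unit r; there are no bias terms, as in the equations.

```python
import math


def sigmoid(s):
    """F(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, W, V):
    """Forward pass through one hidden layer; returns (O, OUT)."""
    m, n, k = len(W), len(V), len(V[0])
    # Steps 2-3: N_j = sum_i W_ij x_i, O_j = F(N_j)
    O = [sigmoid(sum(W[i][j] * x[i] for i in range(m))) for j in range(n)]
    # Steps 4-5: NET_r = sum_j V_jr O_j, OUT_r = F(NET_r)
    OUT = [sigmoid(sum(V[j][r] * O[j] for j in range(n))) for r in range(k)]
    return O, OUT


# Hypothetical 2-input, 2-hidden, 1-output network.
O, OUT = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [[1.0], [1.0]])
print(O, OUT)
```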
Gradient Descent Rule is employed to change the weights.
Changes in weights from the hidden to the output layer:
$\Delta V_{jr} = -\eta \frac{\partial E}{\partial V_{jr}}$
$\frac{\partial E}{\partial V_{jr}} = \frac{\partial E}{\partial OUT_r} \cdot \frac{\partial OUT_r}{\partial V_{jr}}$
$\frac{\partial E}{\partial OUT_r} = -(T_r - OUT_r)$ (the factor 2 from the square cancels the ½ in E)
$OUT_r = F(NET_r) = \frac{1}{1 + e^{-NET_r}} = \frac{1}{1 + e^{-\sum_j O_j V_{jr}}}$
$\frac{\partial OUT_r}{\partial V_{jr}} = F'(NET_r) \, O_j$
$\Delta V_{jr}$ => change in the weight from the j-th hidden unit to the r-th output unit:
$\Delta V_{jr} = \eta (T_r - OUT_r) F'(NET_r) O_j = \eta (T_r - OUT_r) \, OUT_r (1 - OUT_r) \, O_j$
where $\delta_r = (T_r - OUT_r) F'(NET_r)$, so that
$\Delta V_{jr} = \eta \, \delta_r O_j$
$O_j$ => output of the j-th hidden unit
$\eta$ => learning rate, $0 < \eta < 1$
Slide41 : Adjustments of weights:
$V_{jr}(n+1) = V_{jr}(n) + \Delta V_{jr}$
$V_{jr}(n)$ => weight from the j-th unit in the hidden layer to the r-th unit in the output layer at step n.
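The output-layer rule can be checked numerically. This plain-Python sketch (function names and the sample values of O, V and T are hypothetical) applies $\Delta V_{jr} = \eta \delta_r O_j$ for fixed hidden outputs $O_j$ and compares the step with a central finite-difference estimate of $-\partial E / \partial V_{jr}$; with η = 1 the two should agree.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def output_layer_update(O, V, T, eta):
    """Return updated V with Delta V_jr = eta * delta_r * O_j,
    where delta_r = (T_r - OUT_r) * OUT_r * (1 - OUT_r)."""
    n, k = len(V), len(V[0])
    OUT = [sigmoid(sum(V[j][r] * O[j] for j in range(n))) for r in range(k)]
    delta = [(T[r] - OUT[r]) * OUT[r] * (1 - OUT[r]) for r in range(k)]
    return [[V[j][r] + eta * delta[r] * O[j] for r in range(k)] for j in range(n)]


def error(O, V, T):
    """E = 1/2 * sum_r (T_r - OUT_r)^2 for fixed hidden outputs O."""
    n = len(V)
    return 0.5 * sum(
        (T[r] - sigmoid(sum(V[j][r] * O[j] for j in range(n)))) ** 2
        for r in range(len(T))
    )


O, V, T = [0.2, 0.9], [[0.1], [-0.4]], [1.0]  # hypothetical values
V_new = output_layer_update(O, V, T, eta=1.0)
h = 1e-6
for j in range(len(V)):
    Vp = [row[:] for row in V]
    Vm = [row[:] for row in V]
    Vp[j][0] += h
    Vm[j][0] -= h
    numeric = -(error(O, Vp, T) - error(O, Vm, T)) / (2 * h)  # -dE/dV_j0
    print(j, V_new[j][0] - V[j][0], numeric)  # the two columns should agree
```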
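Updating weights from Input to Hidden layer
In this case the target output of the hidden units is not known, so the error is expressed through the output layer:
$E = \frac{1}{2} \sum_r (T_r - OUT_r)^2 = \frac{1}{2} \sum_r (T_r - F(NET_r))^2 = \frac{1}{2} \sum_r \Big(T_r - F\big(\sum_j V_{jr} O_j\big)\Big)^2$
$\frac{\partial E}{\partial w_{ij}} = -\sum_r (T_r - OUT_r) \frac{\partial OUT_r}{\partial w_{ij}}$
$= -\sum_r (T_r - OUT_r) \frac{\partial OUT_r}{\partial NET_r} \cdot \frac{\partial NET_r}{\partial O_j} \cdot \frac{\partial O_j}{\partial N_j} \cdot \frac{\partial N_j}{\partial w_{ij}}$
$\frac{\partial OUT_r}{\partial NET_r} = F'(NET_r) = OUT_r (1 - OUT_r)$
Slide43 : $\frac{\partial NET_r}{\partial O_j} = \frac{\partial (\sum_j O_j V_{jr})}{\partial O_j} = V_{jr}$
$\frac{\partial O_j}{\partial N_j} = F'(N_j)$
$\frac{\partial N_j}{\partial w_{ij}} = \frac{\partial (\sum_i w_{ij} x_i)}{\partial w_{ij}} = x_i$
$\frac{\partial E}{\partial w_{ij}} = -\sum_{r=1}^{k} (T_r - OUT_r) \, OUT_r (1 - OUT_r) \, V_{jr} \, x_i \, F'(N_j)$
$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$
$\Delta w_{ij} = \eta \, F'(N_j) \, x_i \sum_r \delta_r V_{jr}$
$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij} = w_{ij}(n) + \eta \, \delta_j x_i$, where $\delta_j = F'(N_j) \sum_r \delta_r V_{jr}$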
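Putting both layers together, here is a compact sketch of per-pattern (online) backpropagation in plain Python. Assumptions to flag: no bias terms (matching the equations above), weights initialised uniformly in [-0.5, 0.5] with a fixed seed, and the AND data set, hidden-layer size, η = 0.5 and epoch count are illustrative choices, not values from the slides. Both deltas are computed before any weight changes, so the hidden-layer update uses the current $V_{jr}$, as the derivation assumes.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train_step(x, t, W, V, eta):
    """One backpropagation step on one pattern; updates W (m x n) and V (n x k) in place.
    Returns E = 1/2 * sum_r (T_r - OUT_r)^2 measured before the update."""
    m, n, k = len(W), len(V), len(V[0])
    O = [sigmoid(sum(W[i][j] * x[i] for i in range(m))) for j in range(n)]
    OUT = [sigmoid(sum(V[j][r] * O[j] for j in range(n))) for r in range(k)]
    # Output deltas: delta_r = (T_r - OUT_r) OUT_r (1 - OUT_r)
    d_out = [(t[r] - OUT[r]) * OUT[r] * (1 - OUT[r]) for r in range(k)]
    # Hidden deltas: delta_j = F'(N_j) sum_r delta_r V_jr, with F'(N_j) = O_j (1 - O_j)
    d_hid = [O[j] * (1 - O[j]) * sum(d_out[r] * V[j][r] for r in range(k)) for j in range(n)]
    for j in range(n):
        for r in range(k):
            V[j][r] += eta * d_out[r] * O[j]  # Delta V_jr = eta delta_r O_j
    for i in range(m):
        for j in range(n):
            W[i][j] += eta * d_hid[j] * x[i]  # Delta w_ij = eta delta_j x_i
    return 0.5 * sum((t[r] - OUT[r]) ** 2 for r in range(k))


def train(data, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    """Train on (x, t) pairs; returns W, V and the summed error per epoch."""
    rng = random.Random(seed)
    m, k = len(data[0][0]), len(data[0][1])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(m)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_hidden)]
    history = []
    for _ in range(epochs):
        history.append(sum(train_step(x, t, W, V, eta) for x, t in data))
    return W, V, history


AND = [([0, 0], [0]), ([0, 1], [0]), ([1, 0], [0]), ([1, 1], [1])]
W, V, hist = train(AND)
print(hist[0], hist[-1])  # summed error in the first and last epochs
```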
Slide45 : Backpropagation Algorithm carried out on Transformed Data SQRT(X) for Classification of IRIS Data Backprop(10,100,50)
Epoch Mean Square Error
0 2.01221
500 0.0627589
1000 0.047611
1500 0.0436433
2000 0.0402018
2500 0.0369779
3000 0.033533
3500 0.0296887
4000 0.026169
4500 0.0231882
5000 0.0206999
Slide46 : Result
1st 2nd 3rd 4th 5th 6th 7th
0 0 0 1 0 1 0
1 0 0 0 0 0 1
0 1 1 0 1 0 0
Number correctly classified out of 50 = 47
Slide47 : Columns 1 through 12
2 3 3 1 3 1 2 1 2 1 1 1
2 3 3 1 3 1 2 1 1 1 1 1
Columns 13 through 24
2 2 3 3 1 1 2 3 2 3 3 2
2 2 3 3 1 1 3 3 2 3 3 2
Columns 25 through 36
2 2 3 1 2 1 2 2 3 2 1 2
2 2 3 1 2 1 2 2 3 2 1 2
Slide48 : Columns 37 through 48
3 2 3 1 3 1 2 1 1 3 2 2
3 2 3 1 3 1 3 1 1 3 2 3
Columns 49 through 50
3 3
3 3
Classification Efficiency = 94%
Slide50 :
Class   No. of Patterns   No. Correctly Classified   No. Misclassified
I       15                15                         0
II      19                16                         3 (2 → 3, three cases)
III     16                16                         0
Total   50                47                         3
Slide51 : In Artificial Neural Networks, different activation functions are used. NNs with the identity function only support linear models.
The sigmoid function lets you model nonlinear (higher order)
functions.
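To see why identity activations only support linear models: two layers with the identity function collapse into a single linear layer whose weight matrix is the product of the two. A small check in plain Python, with hypothetical weights and matrices stored as lists of rows:

```python
def matvec(M, x):
    """Matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def matmul(A, B):
    """Matrix-matrix product."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1.0, -2.0], [0.5, 3.0]]  # hypothetical input-to-hidden weights (2 hidden x 2 inputs)
B = [[2.0, 1.0]]               # hypothetical hidden-to-output weights (1 output x 2 hidden)
x = [1.0, 4.0]

two_layers = matvec(B, matvec(A, x))  # identity activation in both layers
one_layer = matvec(matmul(B, A), x)   # a single linear layer with weights B·A
print(two_layers, one_layer)          # [-1.5] [-1.5]
```

Inserting a sigmoid between the two products breaks this equality, which is what lets the network represent nonlinear functions.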
Breast Cancer Data
Attribute Information
Slide53 : Attribute Information (the class attribute is in the last column):
#     Attribute                      Domain
----  -----------------------------  -----------------------------
1.    Sample code number             id number
2.    Clump Thickness                1 - 10
3.    Uniformity of Cell Size        1 - 10
4.    Uniformity of Cell Shape       1 - 10
5.    Marginal Adhesion              1 - 10
6.    Single Epithelial Cell Size    1 - 10
7.    Bare Nuclei                    1 - 10
8.    Bland Chromatin                1 - 10
9.    Normal Nucleoli                1 - 10
10.   Mitoses                        1 - 10
11.   Class                          0 for benign, 1 for malignant
Slide54 : 1365328, 1, 1, 2, 1, 2, 1, 2, 1, 1, 0, benign
242970, 5, 7, 7, 1, 5, 8, 3, 4, 1, 0, benign
1133041, 5, 3, 1, 2, 2, 1, 2, 1, 1, 0, benign
183936, 3, 1, 1, 1, 2, 1, 2, 1, 1, 0, benign
1168278, 3, 1, 1, 1, 2, 1, 2, 1, 1, 0, benign
1059552, 1, 1, 1, 1, 2, 1, 3, 1, 1, 0, benign
1185610, 1, 1, 1, 1, 3, 2, 2, 1, 1, 0, benign
1158247, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, benign
1206841, 10, 5, 6, 10, 6, 10, 7, 7, 10, 1, malignant
1166654, 10, 3, 5, 1, 10, 5, 3, 10, 2, 1, malignant
1100524, 6, 10, 10, 2, 8, 10, 7, 3, 3, 1, malignant
1253955, 8, 7, 4, 4, 5, 3, 5, 10, 1, 1, malignant
1344121, 8, 10, 4, 4, 8, 10, 8, 2, 1, 1, malignant
760239, 10, 4, 6, 4, 5, 10, 7, 1, 1, 1, malignant
1257470, 10, 6, 5, 8, 5, 10, 8, 6, 1, 1, malignant
1241559, 10, 8, 8, 2, 8, 10, 4, 8, 10, 1, malignant
1173216, 10, 10, 10, 3, 10, 8, 8, 1, 1, 1, malignant
859350, 8, 10, 10, 7, 10, 10, 7, 3, 8, 1, malignant
Slide55 : breast_ann_all
siz =
699 9
TRAINGD, Epoch 0/150, MSE 0.576234/0.01, Gradient 2.57122/0
TRAINGD, Epoch 30/150, MSE 0.0321848/0.01, Gradient 0.0521112/0
TRAINGD, Epoch 60/150, MSE 0.0275306/0.01, Gradient 0.0303056/0
TRAINGD, Epoch 90/150, MSE 0.0254584/0.01, Gradient 0.0228935/0
TRAINGD, Epoch 120/150, MSE 0.0241553/0.01, Gradient 0.0190191/0
TRAINGD, Epoch 150/150, MSE 0.0232051/0.01, Gradient 0.0166925/0
.
errors =
17
Slide56 : INPUT TO HIDDEN LAYER WEIGHT MATRIX
No. of input neurons: 9; no. of hidden neurons: 15
» net.IW{1,1}
ans =
Columns 1 through 7
0.4169 -0.0381 -0.4370 -0.0062 -0.0005 0.1794 -0.3079
-0.2043 0.3573 0.2168 0.1998 0.3585 0.0655 -0.3382
0.0354 0.3249 -0.1094 -0.1222 0.2336 -0.0926 0.3026
-0.0204 -0.0829 0.4098 -0.2141 0.1220 0.1731 -0.3123
0.5259 0.4910 0.1407 -0.0679 0.4966 0.3078 0.0568
0.3003 -0.3291 0.0342 -0.1282 0.2238 0.3262 0.2048
-0.1439 -0.2100 0.1936 0.0538 -0.2308 0.3183 -0.2627
-0.4914 0.1737 0.0401 -0.2056 -0.1890 -0.0102 0.0309
0.2800 -0.4105 -0.2244 0.0524 -0.1155 0.3221 -0.3484
-0.1093 -0.2717 0.0618 -0.3655 -0.0672 -0.3446 0.2642
0.1527 -0.1752 0.3118 0.1934 0.2152 0.4706 0.0520
0.3059 -0.2650 -0.4344 -0.0916 -0.1567 -0.1991 -0.0530
0.3573 0.0431 0.1460 0.2996 0.2877 -0.2312 0.0220
0.2374 -0.1740 -0.1277 0.3391 0.0669 0.3750 -0.1687
-0.2967 -0.2844 0.3282 0.1017 -0.1146 0.2248 -0.0591
Slide57 : Columns 8 through 9
-0.2425 -0.0959
0.0793 -0.1498
0.1961 0.3405
0.0165 -0.4758
0.2383 0.3602
-0.1741 0.4633
-0.1372 0.4131
0.2768 0.3363
0.1582 -0.0368
-0.0414 -0.0669
0.0669 -0.3128
0.2926 0.1437
-0.3925 -0.1476
0.0986 0.4142
-0.4222 0.2207
Slide58 : HIDDEN TO OUTPUT LAYER WEIGHT MATRIX
No. of hidden neurons: 15; no. of output neurons: 1
» net.LW{2,1}
ans =
Columns 1 through 7
-0.2053 0.3585 -0.4380 -0.2438 0.9043 0.4490 -0.5363
Columns 8 through 14
0.5594 0.1198 -0.7061 -0.4253 0.0276 0.2084 -0.0951
Column 15
0.1637
Slide59 : Various types of Neural Network Models
Supervised Training:
    Backpropagation Network
    General Regression Neural Network
    Radial Basis Function Neural Network
    Probabilistic Neural Network
    Functional Link Neural Network
Unsupervised Training:
    Adaptive Resonance Theory model
    Kohonen’s Self Organizing Map
    Neocognitron Model
Slide60 : Neural Networks Toolbox
Making a new feedforward network object
Examples
Let P denote the inputs and T denote the targets:
P = [0 1 2 3 4 5 6 7 8 9 10];
T = [0 1 2 3 4 3 2 1 2 3 4];
Slide61 : Example: if there are 3 patterns with 4 features, then P will
be of the form 4 × 3 (one column per pattern), and if the output is simply 1 or 0,
then the T matrix will be of the form 1 × 3.
» P=[1,2,3;4,5,6;7,8,9;10,11,12]
P =
1 2 3
4 5 6
7 8 9
10 11 12
» T=[ 1 0 1]
T = 1 0 1
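The column-per-pattern layout can be mirrored in plain Python. This only illustrates the shapes and is not Toolbox code: each hypothetical pattern is a list of 4 features, and transposing the list of patterns gives the 4 × 3 P matrix.

```python
# Three hypothetical patterns, each with 4 features (one list per pattern).
patterns = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
targets = [1, 0, 1]

# One COLUMN per pattern, so P has 4 rows (features) and 3 columns (patterns).
P = [list(col) for col in zip(*patterns)]
T = [targets]  # 1 x 3

print(P)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(T)  # [[1, 0, 1]]
```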
Slide62 : Here a two-layer feed-forward network is created.
The network's input ranges from [0 to 12]. The first layer
has five TANSIG neurons, the second layer has one
PURELIN neuron.
Slide63 :
Other transfer functions are LOGSIG (with its derivative function DLOGSIG). The
TRAINLM network training function is to be used. Other
training functions that can be used include traingd, traingda, traingdx and
traingdm.
Slide64 : Errors in Heart Data
Slide65 : Error with 6 features of Heart Data
Slide66 : Breast Data with all features
Slide67 : Breast Data with 3 features
Slide68 : Breast data with four features
Slide69 : IRIS Data ANN Training with all features