Artificial Neural Networks
Slide 2: ANNs: How They Operate. ANNs are highly connected networks of neurons, the basic processing units.
They operate in a highly parallel manner.
Each neuron does some amount of information processing.
Each neuron receives inputs from other neurons and in turn passes its output to other neurons for further processing.
This layer-by-layer processing of the information results in great computational capability.
As a result of this parallel processing, ANNs are able to achieve great results when applied to real-life problems.
Artificial Neural Networks: Artificial Neural Networks (ANNs) are agents that can employ machine learning to map inputs to the desired outputs.
These networks can hence automatically adapt themselves to determine the correct outputs for the presented inputs.
ANNs are inspired by the human brain. The human brain comprises as many as 10^11 biological neurons.
Each biological neuron is connected to other neurons, making a massive 10^22 connections among the various neurons.
Dataset: A dataset is divided into three parts:
Training dataset
Testing dataset
Validation dataset
Slide 5: The Biological Neuron. The human brain consists of small interconnected processing units called neurons, which are connected to each other by nerve fibers.
This interneuron communication makes multilevel hierarchical information processing possible, which gives the brain all its problem-solving power.
Each neuron is provided with many inputs, each of which comes from other neurons. Each neuron takes a weighted average of the various inputs presented to it.
The weighted average is then made to pass over a nonlinear inhibiting function that limits the neuron’s output. The nonlinearity in biological neurons is provided by the presence of potassium ions within each neuron cell and sodium ions outside the cell membrane.
The difference in concentrations of these two elements causes an electrical potential difference, which in turn causes a current to flow from outside the cell to inside the cell. This is how the neuron takes its inputs.
Slide 6: Structural Components of a Neuron A neuron has four main structural components - the dendrites, the cell body, the axon, and the synapse.
Slide 7: Structural Components of a Neuron Dendrites are responsible for receiving the signals from other neurons.
These signals are passed through a small, thick fiber called a dendron.
The received signals collected at different dendrites are processed within the cell body, and the resulting signal is transferred through a long fiber called the axon.
At the other end of the axon exists an inhibiting unit called a synapse, which controls the flow of neuronal current from the originating neuron to the receiving dendrites of neighboring neurons.
Slide 8: The Artificial Neuron. By simulating a biological neural network, the neural network offers a novel computer architecture and a novel algorithmic architecture relative to conventional computers.
It allows very simple computational operations (addition, multiplication, and fundamental logic elements) to solve complex, mathematically ill-defined, nonlinear, or stochastic problems.
The artificial neuron is the most basic computational unit of information processing in ANNs.
Each neuron takes information from a set of neurons, processes it, and gives the result to another neuron for further processing.
These neurons are a direct result of the study of the human brain and of attempts to imitate the biological neuron.
Slide 9: Structure of an Artificial Neuron. The neuron has a number of inputs. The information is given as inputs via input connections, each of which has some weight associated with it.
An additional input, known as bias, is given to the artificial neuron.
Slide 10: Structure of an Artificial Neuron (continued). Figure 2.4 shows the processing in a single artificial neuron. The inputs are marked I1, I2, I3, …, In; the weights associated with each connection are given by W1, W2, W3, …, Wn; b denotes the bias; and the output is denoted by O. Because there is one weight for every input, the number of inputs is equal to the number of weights in a neuron.
Slide 11: The Processing of the Neuron. The functionality of the neuron can be broken down into two steps: the first is the weighted summation, and the second is the activation function. The two steps are applied one after the other, as shown in the previous slide.
The function of the summation block is given by S = W1·I1 + W2·I2 + … + Wn·In + b.
The summation forms the input to the next block. This is the block of the activation function, where the input is made to pass through a function called the activation function.
Slide 12: The Activation Function. The activation function performs several important tasks, one of the more important of which is to introduce nonlinearity to the network.
Another important feature of the activation function is its ability to limit the neuron's output.
The complete mathematical function of the artificial neuron can be given as O = f(W1·I1 + W2·I2 + … + Wn·In + b), where f is any activation function.
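As a minimal sketch of these two steps (the sigmoid here is one common choice of f, not mandated by the slides):

```python
import math

def sigmoid(s):
    # Sigmoid activation: introduces nonlinearity and limits
    # the neuron's output to the range (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, bias):
    # Step 1: weighted summation, S = W1*I1 + ... + Wn*In + b
    s = sum(w * i for w, i in zip(weights, inputs)) + bias
    # Step 2: pass the sum through the activation function, O = f(S)
    return sigmoid(s)

# Illustrative inputs, weights, and bias (arbitrary values).
out = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1)
```

Note that, as the slide states, the number of weights always equals the number of inputs.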
Models of ANN: Multi-layer perceptron with back propagation algorithm
Radial basis function networks
Learning vector quantization
Self organizing maps
Recurrent neural networks and many more.
Multi Layer Perceptron: An artificial neuron takes the weighted sum of the inputs and passes it through an activation function.
If the problem is non-linear in nature, a single-layer perceptron is unable to solve it.
In an MLP, all perceptrons are arranged in an orderly manner in layers.
The first layer is the input layer, also called the passive layer, where the inputs are applied; these are passed on to the next layer for processing.
The last layer is the output layer, also called the active layer.
The inputs are given to these neurons in the form of neural connections in-between the neurons. Each connection has a weight associated with it. Each neuron first computes the weighted average of the input and then applies the activation function.
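The layer-by-layer processing described above can be sketched as follows (the weights, biases, and 2-2-1 layout below are arbitrary illustrative values):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, biases):
    # Each row of `weights` holds one neuron's connection weights:
    # every neuron computes its weighted sum plus bias, then activates.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    # `layers` is a list of (weights, biases) pairs applied in order:
    # the input (passive) layer feeds the hidden layer, which feeds
    # the output (active) layer.
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical 2-2-1 network.
net = [([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),  # hidden layer
       ([[1.0, -1.0]], [0.0])]                    # output layer
out = mlp_forward([1.0, 0.0], net)
```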
Architecture of MLP
Equations
Back Propagation Algorithm: The algorithm sets the various weights and biases of the ANN to their most optimal values.
The aim of training is to ensure that the network gives the closest possible outputs, as compared to the target, for the data that it was trained with.
The basic methodology for this algorithm is the application of inputs, computation of outputs and comparison of the outputs to the target that gives us the errors.
The errors are propagated backward, starting from the output layer and moving toward the input layer.
The propagated errors are used by the neurons of the various layers to adjust their weights and biases so as to minimize the error.
Back Propagation Algorithm: BPA depends highly upon the learning rate and momentum.
The change of the weight from the jth to the kth neuron is given by ΔWjk(t) = −η·∂E/∂Wjk + α·ΔWjk(t−1), where η is the learning rate, α is the momentum, and E is the error.
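A sketch of this weight-update step, assuming the standard gradient-descent-with-momentum form (the gradient value here is an arbitrary illustration):

```python
def weight_change(grad, prev_delta, learning_rate, momentum):
    # The new change combines the error gradient (scaled by the
    # learning rate) with the previous change (scaled by the
    # momentum), so the network can roll past local minima.
    return -learning_rate * grad + momentum * prev_delta

d1 = weight_change(grad=0.5, prev_delta=0.0, learning_rate=0.1, momentum=0.9)
d2 = weight_change(grad=0.5, prev_delta=d1, learning_rate=0.1, momentum=0.9)
```

The momentum term keeps each step moving in the direction of the previous step, as described in the "Role of Momentum" slide.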
Role of Learning Rate & Epochs: Learning rate: It determines the speed with which the network gets trained. A very large value of the learning rate leads to rapid changes in the ANN state toward a local minimum.
Epochs: Epochs are the total number of iterations in the training algorithm. Low epoch values are undesirable, as the network state will not have marched to the minima. Large values of epochs may cause performance problems on the testing data.
Role of Momentum: Momentum does the task of pulling the network out of local minima, in order to enable it to achieve the global minimum.
The ANN keeps moving in the same direction in which it was already moving, with a certain force, irrespective of the direction and magnitude of the computed gradient.
Large values of the momentum would result in the ANN constantly moving in a direction without trying to converge to minima. This would make the algorithm give high error values.
Very low values of momentum will make the algorithm converge towards the local minima.
Radial Basis Function: This model has a simple three-layer ANN architecture: input layer, hidden layer, and output layer.
RBF Conceptualization: BPA tries to form a single global function of specific complexity to map the inputs to the outputs, but
RBF networks partition the input space, with each hidden-layer neuron as the centre of a partition. This partitioning is not discrete but fuzzy, i.e. any point in the input space actually belongs to all the partitions with varying degrees, in a Gaussian or exponential manner.
At the application of any input they get activated to varying degrees with the activation usually being largest for the neuron that lies closest to the input.
Each hidden neuron has some idea of the output that is prevalent in the region it represents. If the problem is simple, we may assume that the output does not change appreciably with small changes in the inputs. These regional outputs are stored as the weights of the connections between the hidden and the output layer, and get multiplied by the activation of the corresponding hidden neuron.
This may be taken as a competition between the various hidden nodes to influence the final output of the system.
Working of RBF: I = <i1, i2, i3, …, in> is the input.
The output pk of the kth hidden neuron, located at Hk = <hk1, hk2, hk3, …, hkn>, is given by pk = exp(−‖I − Hk‖² / σ²), where σ is the spread.
The outputs of the various neurons of the hidden layer serve as the inputs of the output layer. This layer is a linear layer and simply computes the weighted sum of the various hidden layer neurons.
Each connection of this layer corresponds to the weight that is multiplied by the associated hidden neuron
Here λij are the weights from the ith output node to the jth hidden node, so the ith output is Oi = Σj λij·pj.
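A minimal sketch of the RBF forward pass, assuming a Gaussian activation for the hidden neurons (the centres, spread, and weights below are illustrative):

```python
import math

def rbf_forward(inputs, centers, spread, output_weights):
    # Hidden layer: each neuron's activation falls off in a Gaussian
    # manner with the distance of the input from the neuron's centre.
    activations = [math.exp(-sum((i - h) ** 2 for i, h in zip(inputs, c))
                            / spread ** 2)
                   for c in centers]
    # Output layer: a plain linear layer, i.e. the weighted sum of the
    # hidden-neuron activations (one weight row per output node).
    return [sum(w * p for w, p in zip(row, activations))
            for row in output_weights]

centers = [[0.0, 0.0], [1.0, 1.0]]
out = rbf_forward([0.0, 0.0], centers, 1.0, [[1.0, 0.0]])
```

The neuron whose centre lies closest to the input gets the largest activation, as the slides describe.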
No. of Neurons & Spread: No. of neurons: This is chosen at the time of construction. A larger number of neurons means finer partitioning of the input space, which leads to problems of loss of generality, reduced resistance to noise, etc. However, the training time reduces.
Hence the number of neurons should be as small as possible, just as was the case with the MLP with BPA.
Spread: The larger the spread, the greater the reach of each hidden neuron, which results in large intermixing in the input space. Any point in the input space is then affected by many neurons rather than just a few neighbouring ones. This adds greatly to generalization.
Reducing the spread or the number of neurons makes the network very localized in nature: a neuron then affects only its immediate neighbouring points.
Learning Vector Quantization: These are also three-layer networks, i.e. input, hidden, and output layers.
Once an input is presented to the system, only one of the output neurons is active; the class corresponding to this neuron is returned as the final output of the network.
The major difference from the other ANNs discussed above is that it has limited connections between the hidden and output layers. All these connections have the same, unity weight.
LVQ Architecture
Working of LVQ: I = <i1, i2, i3, …, in> is the input.
The output pk of the kth hidden neuron, located at Hk = <hk1, hk2, hk3, …, hkn>, is given by its distance from the input, pk = ‖I − Hk‖.
LVQ follows a winner-takes-all criterion. The different neurons in the hidden layer compete with each other to influence the output in their favor. The final winner is the neuron with the least distance.
The winning neuron produces an output that goes to the next layer, where a connection exists between this neuron and the neuron corresponding to the class to which it belongs. The output of the winning neuron thus activates its stated class in the output layer, and this class is returned as the output of the system.
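The winner-takes-all classification can be sketched as follows (the neuron locations and class labels are illustrative):

```python
def lvq_classify(inputs, neurons):
    # `neurons` is a list of (location, class_label) pairs, one per
    # hidden neuron. Winner-takes-all: the hidden neuron closest to
    # the input wins, and its class is returned as the output.
    def sq_dist(center):
        return sum((i - h) ** 2 for i, h in zip(inputs, center))
    winner = min(neurons, key=lambda n: sq_dist(n[0]))
    return winner[1]

neurons = [((0.0, 0.0), "A"), ((1.0, 1.0), "B")]
label = lvq_classify((0.2, 0.1), neurons)
```

Comparing squared distances gives the same winner as comparing distances, while avoiding the square root.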
Self Organizing Maps: These are classifiers that undergo unsupervised learning.
Here only the inputs are given to the system at the time of training; the outputs are not given.
SOMs are able to cluster the given input data and represent them in a simple and less memory intensive manner. These clusters are then used to find out the correct output class to any of the applied input. These are natural clustering agents.
The working is the same as that of LVQ. The basic difference is the unsupervised mode of training, in which every neuron tries to associate itself with the applied input.
The hidden neurons are arranged in Self Organizing Feature Map (SOFM).
Self Organizing Feature Map: A rectangular topology is commonly used, in which the neurons are found at the corners of a rectangular grid.
The major challenge is the effective placement strategy of the neurons that is done by the training algorithm.
In an ideal scenario the inter-class distances are high, and neurons are placed in the regions where most of the data for a class is found.
In a non-ideal scenario as well, most neurons are placed where the bulk of the data is found, and sparsely populated regions are left out.
The basic methodology is the mapping of the input space to the feature space; the unequal placement strategy makes the algorithm more efficient.
The winning neuron is moved toward the applied input: Hk(t+1) = Hk(t) + η(t)·(I − Hk(t)). Here, Hk = <hk1, hk2, hk3, …, hkn> are the hidden neurons, I = <i1, i2, i3, …, in> is the input, and η(t) is the learning rate, which decreases with time.
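A minimal sketch of this unsupervised update step for the winning neuron (the neighbourhood updates that a full SOFM also performs are omitted here):

```python
def som_update(winner, inputs, learning_rate):
    # Move the winning neuron's location toward the applied input by a
    # fraction eta(t) of the difference; eta(t) decays over training,
    # so early updates are large and later ones fine-tune placement.
    return [h + learning_rate * (i - h) for h, i in zip(winner, inputs)]

# With eta = 0.5, the neuron moves halfway toward the input.
new_pos = som_update([0.0, 1.0], [1.0, 1.0], 0.5)
```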
SOFM Architecture
Adaptive Resonance Theory: The major problem of most ANN models is plasticity: the network adapts to the recent data presented to it and forgets the older data given earlier, relying only on the recent features. This problem has a large impact on the generalization of the system.
ART solves the problem of plasticity. The old data is retained, but there is a possibility that the new data may not be learnt perfectly.
This is done by adding an orientation system, also called plasticity control, to the architecture of the ANN.
The modified ANN model hence consists of the input layer, the hidden layer or the comparison layer and the output layer or the competitive layer.
ART Architecture
Working of ART: The orientation system monitors the training process, which changes the locations of the hidden neurons in the input space, and traces any learning action that could cause a neuron to forget.
In ART, a backward phase is executed in which the input vector and the location of the winning neuron are matched.
If the difference between their locations is small, the learning continues otherwise it is inhibited.
The winning neuron is then deactivated and the entire process is repeated for the rest of the neurons.
Again the winning class and neuron are identified, the change is made, and the change is later verified for closeness. If the neuron again lies too far away, the training step is cancelled.
The process is continued until a close neuron is matched.
Recurrent Neural Networks: These networks are cyclic in nature.
These networks allow backward connections, where every neuron gets feedback from the forward layers as well as from itself. This allows data to be processed, and outputs to be transmitted, in both the forward and backward directions.
The algorithm operates in time steps, i.e. every neuron performs its computation once in each iteration.
Training of the recurrent neural network involves setting the various weights, which exist on the forward as well as the feedback connections. The BPA algorithm, with the incorporation of time steps, is used for training the weights.
Recurrent networks are computationally expensive due to the bulk of processing, and are used nowadays in time-series forecasting, speech processing, etc.
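A minimal sketch of a single recurrent neuron unrolled over time steps, assuming a tanh activation and self-feedback only (the weights and input sequence are illustrative):

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    # One time step of a single recurrent neuron: the weighted input
    # is combined with feedback from the neuron's own previous output.
    s = w_in * x + w_rec * h_prev + bias
    return math.tanh(s)

# Unroll the same neuron over a short input sequence; the feedback
# connection carries state from one time step to the next.
h = 0.0
for x in [1.0, 0.5, -0.5]:
    h = rnn_step(x, h, w_in=0.8, w_rec=0.3, bias=0.0)
```

The feedback weight `w_rec` is what makes the network cyclic; with it set to zero the neuron degenerates to an ordinary feed-forward neuron.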
Recurrent Neural Network Architecture
Problems with ANN: Training: Training is iterative in nature, and while trying to minimize the error it may lead the system to a local minimum instead of the global minimum.
The architecture needs to be fixed by the designer, i.e. various parameters need to be set.
Online learning
Plasticity control
Large time of training
Training v/s memory tradeoff