1.2 Training
Training overview
Training a feed-forward neural network generally involves the following elements:
- Training dataset: a set of tuples that consist of expected output and associated input(s)
- A loss function: a function which quantifies the difference between the expected output and the predicted output
- Computational power to repeatedly run the forward propagation and backward propagation processes
An example of a loss function would be the sum-of-squares loss, which is as follows:

$$L = \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$

where $\hat{y}_i$ represents the predicted output, $y_i$ the expected output, and $N$ the total quantity of training dataset samples to run through the model.
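As a concrete illustration, here is a minimal sketch of this loss in Python; the function name sum_of_squares_loss and the use of plain lists are assumptions made for this example, not details from the text above.

```python
def sum_of_squares_loss(predicted, expected):
    """Sum of squared differences between predicted and expected outputs."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected))

# Example: three training samples, each with a single expected output value.
predicted = [0.9, 0.2, 0.4]
expected = [1.0, 0.0, 0.5]
print(sum_of_squares_loss(predicted, expected))  # approximately 0.06
```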
Diagram 1.2.1: Overview of the training process for a feed-forward neural network with 1 hidden layer.
How is a feed-forward neural network trained?
- A large quantity of data, pairing network inputs with expected network outputs, is collated; this is known as training data
- A loss function is selected for the network, suitable both for the number of inputs and outputs and for the type of data involved; it quantifies the difference between the expected output of the network (taken from the associated training data) and the actual output of the network
- The weights within the network are set to small random values (for example, drawn from a narrow range around zero)
- One instance within the training data is run through the network, and the difference between the expected and actual output is calculated via the loss function
- The weights within the network are adjusted via backpropagation
The last two steps are then repeated for the whole training dataset, often over many passes. A method known as gradient descent is generally the most common way of iteratively adjusting the weights (details omitted); a minimal sketch of the full loop is given below.
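To make the procedure concrete, the following is a minimal sketch of this training loop for a network with one hidden layer, using the sum-of-squares loss from above and plain gradient descent. The layer sizes, learning rate, toy dataset, and helper name forward are illustrative assumptions, not details from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Network shape: 2 inputs -> 3 hidden neurons -> 1 output (illustrative sizes).
W1 = rng.uniform(-0.5, 0.5, size=(2, 3))  # weights set to small random values
b1 = np.zeros((1, 3))
W2 = rng.uniform(-0.5, 0.5, size=(3, 1))
b2 = np.zeros((1, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Forward propagation: returns hidden activations and predicted output."""
    h = sigmoid(x @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    return h, y_hat

# Toy training dataset: inputs paired with expected outputs (XOR).
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
expected = np.array([[0], [1], [1], [0]], dtype=float)

learning_rate = 0.5
for epoch in range(5000):              # repeat over the whole training dataset
    for x, y in zip(inputs, expected):
        x, y = x.reshape(1, -1), y.reshape(1, -1)
        h, y_hat = forward(x)

        # Backpropagation: gradient of the squared error (y_hat - y)^2.
        d_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # output-layer delta
        d_hid = (d_out @ W2.T) * h * (1 - h)           # hidden-layer delta

        # Gradient descent: adjust weights against the gradient.
        W2 -= learning_rate * h.T @ d_out
        b2 -= learning_rate * d_out
        W1 -= learning_rate * x.T @ d_hid
        b1 -= learning_rate * d_hid

_, predictions = forward(inputs)
print(predictions.round(2))  # should approach the expected outputs [0, 1, 1, 0]
```

Libraries such as PyTorch or TensorFlow automate the backpropagation step, but the loop above mirrors the sequence described in this section: forward pass, loss, backpropagation, weight update.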
Practice questions
1. What issues may occur if there are inaccuracies within the training dataset?
Answer
2. What might a loss function look like for a neural network with multiple neurons in the output layer?
Answer
One plausible method is a weighted combination of common loss functions, each of which would be applicable to a network with a single output neuron, so that every output neuron contributes a weighted term to the overall loss; a sketch of this is given below.
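For instance, here is a minimal sketch of such a weighted combination in Python; the function names and the choice of per-neuron weights are assumptions for this example only.

```python
def per_neuron_squared_error(predicted, expected):
    """Squared error for a single output neuron."""
    return (predicted - expected) ** 2

def weighted_combined_loss(predicted_outputs, expected_outputs, weights):
    """Weighted sum of per-neuron losses across all output neurons."""
    return sum(
        w * per_neuron_squared_error(p, e)
        for p, e, w in zip(predicted_outputs, expected_outputs, weights)
    )

# Example: three output neurons, with the second neuron's error weighted double.
print(weighted_combined_loss([0.8, 0.1, 0.6], [1.0, 0.0, 0.5], [1.0, 2.0, 1.0]))
# approximately 0.07
```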
The most essential property of any chosen loss function is that it quantifies the difference between the predicted and expected output of the model.