1.0 Models
Simple models
A model is a mathematical function, or a set of functions whose outputs are chained to one another's inputs. The purpose of a model is to make predictions based on its inputs.
An example of the simplest possible model would be:

y = mx + c

In the context of Machine Learning, this is instead represented as:

y = wx + b

In the above example, w is known as a weight, and b as a bias parameter. Parameters of the model are optimised during the training stage.
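As a concrete illustration, a model of this form can be written in a few lines of Python. The parameter values below are arbitrary placeholders, and the names w, b, and predict are illustrative choices rather than standard notation:

```python
# A minimal sketch of the model y = wx + b.
# The values of w and b here are arbitrary example parameters.
w = 2.0  # weight
b = 0.5  # bias

def predict(x):
    # Return the model's prediction for the input x.
    return w * x + b

print(predict(3.0))  # prints 6.5
```

During training, it is the values of w and b that would be adjusted; the form of the function stays the same.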
Neural networks
A common form of model in Machine Learning is known as a feed-forward neural network. The following is a minimalist example of a feed-forward neural network:
Diagram 1.0.1: Example of a basic feed-forward neural network, with only 3 inputs, 1 hidden layer, and 1 output.
Diagram 1.0.2: Continued example; inspection of inputs, contents, and outputs of one neuron (h1) within the hidden layer of Diagram 1.0.1. Specifically, this is a linear regression function wrapped in an activation function σ, which could alternatively be written σ(f(x)).
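To make Diagram 1.0.2 concrete, here is a minimal Python sketch of the neuron h1. The choice of the logistic sigmoid for σ, and all of the weight and bias values, are assumptions made for illustration; the diagram itself does not fix them:

```python
import math

def sigma(z):
    # One common choice of activation function: the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def h1(x1, x2, x3, w1, w2, w3, b):
    # The linear regression function f(x), wrapped in the
    # activation function: sigma(f(x)).
    return sigma(w1 * x1 + w2 * x2 + w3 * x3 + b)

# Three inputs, as in Diagram 1.0.1; weights and bias chosen arbitrarily.
print(h1(1.0, 2.0, 3.0, w1=0.4, w2=-0.2, w3=0.1, b=0.05))
```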
What is a feed-forward neural network?
- A feed-forward neural network is made up of layers
- Each layer is made up of neurons
- Each neuron within the same layer is assigned the same function, but different weights
- The weights of the functions are adjusted during the training stage of a feed-forward neural network
- The types of functions involved are typically linear regression functions, such as in Diagram 1.0.2
- A linear regression function within a neuron may contain at most as many weights as there are neurons in the previous layer
- Inputs are taken from the outputs of a previous layer of functions, processed, and then output to the next layer of functions
- Functions within neurons are typically wrapped in activation functions, which may cause certain neurons within a layer to output zero unless a specific condition is met
The choice of the quantities of neurons and layers, and of the functions that make them up, is known as the model architecture.
As a result of the above, a feed-forward neural network can be represented as one giant mathematical function. However, as neural networks grow, that function quickly becomes extremely complicated algebraically, so the graphical format is common in teaching.
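As a sketch of this idea, the small network of Diagram 1.0.1 can be written out in Python, with the layers composed into a single network function at the end. The size of the hidden layer (two neurons), the choice of σ, and every weight and bias below are illustrative assumptions, not values taken from the diagram:

```python
import math

def sigma(z):
    # Logistic sigmoid, as in the earlier sketch; repeated here so this
    # example runs on its own.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # sigma(f(x)): a weighted sum plus a bias, wrapped in the activation.
    return sigma(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hidden layer: each neuron applies the same function with its own weights,
# one weight per neuron (here, per input) in the previous layer.
hidden_params = [
    ([0.4, -0.2, 0.1], 0.05),   # h1
    ([0.3, 0.8, -0.5], -0.10),  # h2 (assumed; the diagram's hidden-layer
                                #     size is not specified here)
]
output_params = ([0.7, -0.3], 0.02)  # the single output neuron

def network(x1, x2, x3):
    # The whole network is one composed function: output(hidden(inputs)).
    hidden_outputs = [neuron([x1, x2, x3], w, b) for w, b in hidden_params]
    w, b = output_params
    return neuron(hidden_outputs, w, b)

print(network(1.0, 2.0, 3.0))
```

Writing network out by substituting every call would produce exactly the "one giant function" described above, which is why the graphical and programmatic forms are preferred.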
The notion of the activation function is inspired by neurons in the human brain, which pass information on to other neurons only when they are sufficiently stimulated. It is in this sense that neural networks are based on the mechanisms of the human brain.
Practice questions
1. Considering that a brain neuron passes information on only when it is excited, what properties make for a typical activation function?
Answer
Theoretically, any function which passes information on only once its input exceeds a certain limit. So any function you are familiar with which is increasing for all values of x beyond a certain value of x would serve. Alternatively, even a piecewise function could do, for example:

σ(x) = 0, for x < 0
σ(x) = x, for x ≥ 0
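A minimal Python sketch of that piecewise function follows. As it happens, this particular shape is a widely used activation function in practice (the rectifier, or ReLU), though the reasons for its popularity are left for later chapters:

```python
def activation(x):
    # Passes the input on only once it exceeds the limit (here, zero);
    # otherwise the "neuron" outputs zero.
    return x if x >= 0 else 0.0

print(activation(-1.5))  # 0.0: not excited enough, nothing is passed on
print(activation(2.0))   # 2.0: the input is passed through
```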
However, as the reader continues through this textbook, it will become clear why some functions are preferable to others.
2. Are the weights within a model considered part of the model architecture?
Answer
No. The architecture is the choice of the quantities of neurons and layers and of the functions within them; the weights are parameters of the model, and are adjusted during the training stage rather than being fixed as part of the architecture.
3. How many hidden layers and neurons can a neural network have?
Answer
In principle, any quantity: the quantities of hidden layers and of neurons per layer are design decisions, and form part of the model architecture rather than being fixed by any mathematical rule.