Gaussian Process
"Let the data speak, if you bring subjective modelling then it’s dangerous." - Someone in YouTube
To find a nonlinear relationship between variables, we can use nonlinear functions such as a polynomial or an exponential function. But if we don’t have information about the nature of the data, finding an appropriate function can be challenging. In that case we can use non-parametric methods like the Gaussian process (GP).
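As a minimal sketch of the parametric approach and its limitation (using NumPy; the data and the choice of a cubic are made up for illustration), we fit a fixed polynomial form to noisy samples:

```python
import numpy as np

# Made-up noisy data whose true relationship (a sine) is unknown to us.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)

# The parametric approach: guess a functional form (here a cubic
# polynomial) and fit its coefficients. If the guessed form is wrong,
# the fit stays systematically off no matter how much data we collect.
coeffs = np.polyfit(x, y, deg=3)
y_hat = np.polyval(coeffs, x)
```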
“The idea of Gaussian process modelling is, without parameterizing y(x) (the prediction function), to place a prior P(y(x)) directly on the space of functions.” - David J. C. MacKay
A GP is a supervised machine learning method used for regression and probabilistic classification. It provides a distribution over prediction functions, and it also quantifies the uncertainty present in its predictions.
To call it nonparametric is somewhat of a misnomer. Nonparametric does not mean the absence of parameters; rather, predictions are obtained without giving the unknown function y(x) an explicit parameterization. The effective number of parameters typically grows without bound as the model sees more data.
We can view a GP as a generalisation of the multivariate Gaussian distribution to infinitely many dimensions. Infinite dimensions here means that the prediction function can map infinitely many input values to outputs. Just as a Gaussian distribution is characterised by a mean vector and a covariance matrix, a GP is characterised by a mean function and a covariance function, which is given by a kernel.
We usually keep the mean function at zero, since the mean does not carry the important information; it is the covariance function that captures what is needed to model the GP. Consequently, the covariance function determines which functions, out of the space of all possible functions, are more probable.
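To make this concrete, here is a minimal sketch (using NumPy; the grid, length scale, and number of draws are illustrative choices). Evaluating a zero-mean GP with an RBF kernel on a finite grid reduces it to a multivariate Gaussian, and each draw from that Gaussian is one plausible prediction function:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D points."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

# On a finite grid, the zero-mean GP prior is just a multivariate
# Gaussian N(0, K), with K built from the kernel.
x = np.linspace(-5, 5, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability

# Each sample is one plausible prediction function evaluated on the grid.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```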
Kernel Function
The kernel is responsible for measuring the similarity between two inputs. We can use functions like the RBF, periodic, or linear kernels, or a combination of kernels. Which prediction functions are likely to be sampled is controlled by the kernel. The kernel function receives two points as input and returns a similarity score between them in the form of a scalar.
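For instance, a minimal RBF kernel (the length-scale and variance parameters here are illustrative defaults) maps a pair of points to a single scalar:

```python
import numpy as np

def rbf(x, y, length_scale=1.0, variance=1.0):
    """Return a single scalar similarity between points x and y."""
    return variance * np.exp(-0.5 * np.sum((x - y) ** 2) / length_scale**2)

rbf(np.array([0.0]), np.array([0.0]))  # 1.0: identical points, maximal similarity
rbf(np.array([0.0]), np.array([3.0]))  # ~0.011: distant points, near-zero similarity
```

Nearby points get a score close to one and distant points a score close to zero, which is exactly what makes sampled functions smooth under this kernel.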
An important property of the Gaussian distribution, and the one that makes GPs possible, is that it is closed under marginalisation and conditioning. This means that the distributions resulting from these operations are also Gaussian, which ensures that the results we obtain remain mathematically tractable.
Marginalisation
Marginalisation sums (or integrates) out a random variable: given the joint probability distribution of X with other variables, it yields the distribution of X alone.
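For a Gaussian this is especially simple: we keep only the entries of the mean and covariance that belong to the variable of interest. A small sketch with made-up numbers:

```python
import numpy as np

# Made-up joint Gaussian over (X, Y).
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

# Marginalising out Y leaves X ~ N(1.0, 1.0): we simply keep the
# entries of the mean and covariance that belong to X.
mu_x = mu[0]
var_x = Sigma[0, 0]
```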
Conditioning
Conditioning determines the distribution of one variable given the observed value of another. This is what allows us to perform Bayesian inference: through conditioning, we update our prior beliefs to obtain a new distribution as we observe new data points.
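As a sketch of how conditioning gives GP predictions (zero prior mean, RBF kernel, noise-free made-up observations; a practical implementation would use a Cholesky factorization rather than a matrix inverse), we condition the joint Gaussian over training and test points on the observed training values:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

# Made-up noise-free observations and test locations.
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 50)

# Blocks of the joint covariance over (train, test) points.
K = rbf_kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))  # jitter
K_s = rbf_kernel(x_train, x_test)
K_ss = rbf_kernel(x_test, x_test)

# Conditioning the joint Gaussian on y_train (zero prior mean):
#   posterior mean       = K_s^T K^{-1} y
#   posterior covariance = K_ss - K_s^T K^{-1} K_s
K_inv = np.linalg.inv(K)
mean_post = K_s.T @ K_inv @ y_train
cov_post = K_ss - K_s.T @ K_inv @ K_s
```

The posterior mean is the GP’s prediction at the test points, and the diagonal of the posterior covariance gives the predictive uncertainty mentioned earlier.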