Comparing Bayesian and ML Approaches to Linear Regression in Machine Learning
In this article we will discuss two different treatments of linear regression and how they perform in real-life scenarios.
ML estimation, short for maximum likelihood estimation, is an approach that finds the parameter w maximizing the likelihood function:

w_ML = argmax_w P(D|w),

where P(D|w) is the probability mass or probability density of the data D (depending on whether D is discrete or continuous). The problem with the ML approach is that it tends to overfit when there is insufficient training data.
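A toy example (our own, not from the article) shows how ML commits fully to scarce data. For a Bernoulli model, the ML estimate of the success probability is just the sample mean, so three observed "heads" in a row make ML declare heads certain:

```python
import numpy as np

# Bernoulli likelihood: argmax_p P(D|p) is the sample mean.
# With only three observations, ML fits the data perfectly -- and badly.
data = np.array([1, 1, 1])   # three "heads" in a row
p_ml = data.mean()           # ML estimate of P(heads)
print(p_ml)                  # 1.0 -- ML claims tails can never occur
```

This is the same overfitting mechanism, in miniature, that we will see in the regression experiments below.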
On the other hand, the Bayesian paradigm takes a sequential view of inference problems: the parameter estimates are updated with every new observation, essentially taking time into account. The posterior at one instant serves as the prior at the next.
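A minimal NumPy sketch of this sequential view, for a one-parameter linear model t = w·x + noise (the noise precision beta and prior precision alpha below are illustrative assumptions, not values from the article):

```python
import numpy as np

beta, alpha = 25.0, 1.0      # assumed noise precision and prior precision
m, s_inv = 0.0, alpha        # prior over w: mean 0, precision alpha

rng = np.random.default_rng(0)
w_true = 0.5
for _ in range(100):         # observations arrive one at a time
    x = rng.uniform(-1, 1)
    t = w_true * x + rng.normal(scale=1 / np.sqrt(beta))
    # the posterior of the previous step serves as the prior of this step
    s_inv_new = s_inv + beta * x * x
    m = (s_inv * m + beta * x * t) / s_inv_new
    s_inv = s_inv_new

print(m)  # posterior mean, close to w_true = 0.5
```

Each pass through the loop is one Bayesian update; nothing is refit from scratch, which is what "taking time into account" means here.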
Linear regression is a simple model from any basic machine learning course: it uses a series of observations x and their target values to predict the value of t for any given x. Today we will see how the ML model fares against the Bayesian model in terms of speed, performance, and MMSE (minimum mean squared error).
Suppose we have ground-truth weather data for the US: a total of 500 data points, each recording the exact location where the measurement was acquired (denoted by x1 and x2) and whether it is land-based or sea-based.
To train the parameters of the linear regression function, we must set up a feature vector derived from the training data. Feature vectors can be selected to your liking; the common choices are polynomial, Gaussian, and logistic sigmoid basis functions. In this case we will use Gaussian basis functions.
With Gaussian basis functions, we must use a parameter that defines the domain size over which each basis function is effective. The domain size should be selected carefully, as it can severely affect the prediction performance in some scenarios.
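To make this concrete, here is a rough NumPy sketch of Gaussian basis features for 2-D inputs (the article's implementation is in MATLAB; the 8x6 grid of centers and the width s below are illustrative assumptions, not the article's values):

```python
import numpy as np

def gaussian_features(X, centers, s):
    """X: (N, 2) inputs, centers: (M, 2) basis centers, s: basis width."""
    # squared distance from every input to every center -> (N, M)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2 * s ** 2))
    # prepend a constant bias column
    return np.hstack([np.ones((X.shape[0], 1)), phi])

# place centers on an 8x6 grid over the unit square (assumed domain)
g1, g2 = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 6))
centers = np.column_stack([g1.ravel(), g2.ravel()])

X = np.random.default_rng(1).uniform(0, 1, size=(500, 2))
Phi = gaussian_features(X, centers, s=0.2)
print(Phi.shape)  # (500, 49): 48 Gaussian features plus a bias term
```

The grid shape and the width s together play the role of the "domain size" knob discussed above: a wider s or denser grid changes how far each basis function reaches.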
The implementation of this comparison is done in MATLAB. The W of the ML approach is the least-squares solution

W_ML = (Φ^T Φ)^(-1) Φ^T t,

whereas the W of the Bayesian model is the posterior mean

m_N = β S_N Φ^T t,  with  S_N^(-1) = S_0^(-1) + β Φ^T Φ,

for m_0 = 0 and S_0^(-1) equal to the identity matrix, where Φ is the design matrix, t the target vector, and β the noise precision.
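Plugging these formulas into code, a rough NumPy equivalent of the MATLAB computation (with a synthetic design matrix and an assumed noise precision β, since the article's data is not available) looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 500, 49                       # assumed sizes for illustration
Phi = rng.normal(size=(N, M))        # stand-in design matrix
w_true = rng.normal(size=M)
beta = 10.0                          # assumed noise precision
t = Phi @ w_true + rng.normal(scale=1 / np.sqrt(beta), size=N)

# ML / least squares: w_ML = (Phi^T Phi)^-1 Phi^T t
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Bayesian posterior mean with m0 = 0, S0^-1 = I:
#   S_N^-1 = I + beta * Phi^T Phi,   m_N = beta * S_N * Phi^T t
S_N_inv = np.eye(M) + beta * (Phi.T @ Phi)
m_N = beta * np.linalg.solve(S_N_inv, Phi.T @ t)

# the identity prior acts as a regularizer, shrinking m_N toward zero
print(np.abs(w_ml - m_N).max())
```

With plenty of well-spread data the two solutions nearly coincide; the prior only matters when the data underdetermines some directions of W, which is exactly the overfitting regime examined next.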
The following figure shows the ground truth of the weather across the US:
When the size of the Gaussian basis grid is chosen as (64,36), the following shows the prediction with the ML approach:
However, as much as you might believe that making the Gaussian feature vector as large as possible benefits the prediction, the figure below shows the most devastating reason why the ML approach never became popular: overfitting.
In the picture above, the prediction was overfitted to the training data, making the result useless.
On the other hand, when using the Bayesian iterative approach, we see less overfitting in the outcome:
Bayesian approach at (16,9):
Bayesian approach at (27,48):
Bayesian approach at (36,64):
The Bayesian linear model suffers less from overfitting than the ML approach; the problem is mitigated rather than eliminated. Although the squared loss is much the same for the two models when compared side by side, the ML approach suffers from inconsistency when the domain isn't chosen right. In other words, the ML approach is somewhat hit-or-miss, while the Bayesian model promises more consistency across different settings.
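This contrast can be reproduced in a small 1-D experiment (our own NumPy sketch with an assumed sine ground truth and illustrative parameter values, not the article's weather data): the ML fit interpolates the noise, while the Bayesian posterior mean with the same m0 = 0, S0^-1 = I prior stays smooth.

```python
import numpy as np

def phi(x, centers, s=0.05):
    """Gaussian basis features for 1-D inputs; s is an assumed width."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))

rng = np.random.default_rng(3)
x_tr = np.linspace(0, 1, 20)
t_tr = np.sin(2 * np.pi * x_tr) + rng.normal(scale=0.3, size=20)  # noisy targets
x_te = np.linspace(0, 1, 200)
t_te = np.sin(2 * np.pi * x_te)                                   # clean truth

# one basis function per training point -> ML can fit the noise exactly
Phi, Phi_te = phi(x_tr, x_tr), phi(x_te, x_tr)
beta = 10.0                                    # assumed noise precision

w_ml = np.linalg.solve(Phi, t_tr)              # ML: interpolates the noise
m_N = beta * np.linalg.solve(np.eye(20) + beta * Phi.T @ Phi, Phi.T @ t_tr)

rmse = lambda w: np.sqrt(np.mean((Phi_te @ w - t_te) ** 2))
print(f"ML test RMSE:       {rmse(w_ml):.3f}")
print(f"Bayesian test RMSE: {rmse(m_N):.3f}")
```

On held-out points the Bayesian fit achieves the lower error, mirroring the consistency advantage described above.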
Linear regression can be viewed as machine learning through simple math: the main goal is to find the predictive distribution from prior training data. In this article, however, you have seen the difference the choice of approach makes. The ML approach is always the easy one to start with, yet it is the most inconsistent; the Bayesian model behaves better, yet its iterative nature takes much more time to process than the ML approach. It is vital to find the model that suits your problem best.