Learn Linear Regression using Excel – Machine Learning Algorithm
Learn Linear Regression using Excel - Machine Learning Algorithm
Beginner guide to learn the most well known and well-understood algorithm in statistics and machine learning. In this post, you will discover the linear regression algorithm, how it works using Excel, application and pros and cons.
Quick facts about Linear Regression
It’s a basic and commonly used type of predictive analysis
Three major uses for regression analysis are
(1) Determining the strength of predictors
(2) Forecasting an effect
(3) Trend Forecasting
There are several types of linear regression analyses available to researchers.
- Simple linear regression
- Multiple linear regression
- Logistic regression
- Ordinal regression
- Multinominal regression
- Discriminant analysis
Watch a video on Linear Regression
What is a linear regression?
As the name implies, linear regression is an approach to modeling the relationship between a dependent variable ‘y’ and one or more independent variables denoted as ‘x’ in a linear form. Linear means that the dependent variable is directly proportional to the independent variables. Keeping other things constant if x is increased/decreased then Y also changes linearly. Mathematically the relationship is expressed in the simplest form as:
y = Ax + B
Here A and B are constant factors. The goal in supervised learning using linear regression is finding the value of constants ‘ A’ and ‘ B’ using the data sets. Then we can use it to predict the values of ‘y’ in the future for any values of ‘x’. Now, the cases where we have a single independent variable is called simple linear regression, while if there is more than one independent variable, then the process is called multiple linear regression.
Application of Linear Regression
Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine. Linear regression can be applied to many situations. Most of the applications fall into one of the following two broad categories:
- Prediction:
In prediction or forecasting, linear regression can be first used to fit a predictive model to an observed data set of ‘y‘ and ‘x‘ values. After developing such a model, the fitted model can be used to make a prediction of the value of ‘y‘ for an additional value of ‘x‘.
For example: If we have a dataset of rainfall amounts and corresponding temperatures, then we can fit a linear model and use it to predict the amount of rainfall for a temperature value whose rainfall amount is not known beforehand. - Finding strength of relationship:
Given a variable y and a number of independent variables x1, …, xp that may be related to y, linear regression analysis can be used to quantify the strength of the relationship between y and the xj, to assess which xj may have no relationship with y at all, and to identify which subsets of the xj contain redundant information about y.
For example: If we have a dataset of rainfall amounts and corresponding humidity and temperatures, then we can use regression analysis to find out how strongly does the amount of rainfall depends upon each of these factors.
To estimate the parameters of the linear regression model various techniques can be used. The most common ones are Least Squares (LS) method and maximum-likelihood estimation methods. Let’s discuss here an example of simple linear regression using ordinary least squares method.
Least squares estimation:
How do we go about picking or finding the parameters of the model? One way is to make the predicted value of ‘y’ as close to the actual value of the training set. For example: Suppose we have a training data (x_data, y_data). Then the reasonable thing to do would be to make the predicted value ‘y’ as close to y_data as possible. Therefore we try to minimize the sum of the square of the error i.e
S = Sj (y_dataj – yj)2
For the simple case of a single independent variable after solving we obtain the following formulas for A0 and A1.
Let’s look at an example: The data in the table below shows the temperature during the race and the corresponding average finish time in minutes of a marathon.
From the scatter plot we can see that the relationship between the x & y is somewhat linear.
Using the formula we get the values of the parameters,
A0 = 0.688 , A1= 191.83
How do we make Prediction?
Prediction for a new test value of x is done simply by putting the value in the equation for the linear regression model. We now have the parameters of the simple linear regression model.
y = 0.688x + 191.83
We can use it to predict the average completion time for different temperatures. For example, when the temperature is 71 F our model predicts the average completion time to be,
y = 0.688*71 + 191.83 = 240.68 minutes.
Where is linear regression applied?
Linear regression has found extensive application in biology, behavioral and social sciences, and economics. In Epidemiology regression analysis helps in identifying the causations and correlations between various practices and diseases. It is the helps to identify the risk factors for disease and target population for preventive health care. Regression analysis was used in early studies to find evidence relating tobacco smoking to mortality and morbidity. It has also been used to identify the risk factors associated with various types of cancer. It has also been used for predicting the risk of various types of cancer in individuals.
In Econometric regression analysis is very extensively used to do prediction in economics. For instance, it is used to predict consumption spending, fixed investment spending, inventory investment, and purchases of a country’s exports, spending on imports, the demand to hold liquid assets, labor demand, and labor supply. Linear regression is a common technique used to forecast future stock values using the least squares fit method. It can be used to determine whether the market is trending up or down and what should the price be, given the recent trend of prices.
The smaller the error more accurate the prediction made by the linear regression model.
Pros and Cons of Linear Regression:
?
Linear regression is an extremely simple method. It is very easy and intuitive to use and understand. A person with only the knowledge of high school mathematics can understand and use it. In addition, it works in most of the cases. Even when it doesn’t fit the data exactly, we can use it to find the nature of the relationship between the two variables.
?
By its definition, linear regression only models relationships between dependent and independent variables that are linear. It assumes there is a straight-line relationship between them which is incorrect sometimes. Linear regression is very sensitive to the anomalies in the data (or outliers).
Take for example most of your data lies in the range 0-10. If due to any reason only one of the data item comes out of the range, say for example 15, this significantly influences the regression coefficients.
Another disadvantage is that if we have a number of parameters than the number of samples available then the model starts to model the noise rather than the relationship between the variables.
You Learn Best By Implementing Algorithms From Scratch in Excel