Learn Logistic Regression using Excel – Machine Learning Algorithm
Logistic Regression using Excel: a beginner's guide to one of the best-known and best-understood algorithms in statistics and machine learning. In this post, you will learn what logistic regression is, how it works, how to implement it in Excel, where it is applied, and its pros and cons.
Quick facts about Logistic Regression
– Logistic regression is a statistical classification technique that can be used in fields such as market research.
– The logistic regression algorithm is similar to regular linear regression.
– Unlike linear regression, logistic regression produces an estimate of the probability of a certain event occurring.
– It works on a multidimensional feature space (features can be categorical or continuous).
– The outcome is discrete, not continuous.
– It is a good choice when it seems plausible that a linear decision boundary (hyperplane) will give good predictive accuracy.
What is Logistic Regression?
Logistic regression is a supervised machine learning algorithm used for classification. Although the word 'Regression' in its name can be misleading, it should not be mistaken for a regression algorithm.
The name logistic regression comes from a special function called the logistic function, which plays a central role in this method.
What does Logistic regression do?
Logistic regression is a probabilistic model: it finds the probability that a new instance belongs to a certain class. Since the output is a probability, it lies between 0 and 1. When logistic regression is used as a binary classifier (classification into two classes), we can consider the classes to be a positive class and a negative class.
The higher the probability (greater than 0.5), the likelier it is that the instance falls into the positive class. Similarly, if the probability is low (less than 0.5), we classify the instance into the negative class.
Let us take the example of classifying email into spam and ham (not spam). We assume spam to be the positive class and ham to be the negative class. We begin by taking several labeled examples of email and using them to train the model. After training, we can use the model to predict the class of a new email. When we feed the new example to our model, it returns a value y such that 0≤y≤1. Suppose the value we get is 0.8. From this value, we can say that there is an 80% probability that the tested example is spam, so we classify it as spam.
Behind the Scenes: Understanding the Mathematics of Logistic Regression
Logistic regression uses a function called the logistic function to do its job. The logistic function (also called the sigmoid function) is an S-shaped curve that maps any real-valued number to a value between 0 and 1:
y = 1 / (1 + e^(-z))
The e in the equation is Euler's number and z is a boundary function that we will discuss below. We can observe the curve of the logistic function in the given figure.
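As a minimal sketch of the logistic function y = 1 / (1 + e^(-z)), the Python snippet below evaluates it at a few values of z. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original Excel workbook.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an algebraically equivalent form so exp() cannot overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative inputs: large negative z gives values near 0, large positive z near 1.
for z in (-6, -2, 0, 2, 6):
    print(f"z = {z:>2}  ->  sigmoid(z) = {sigmoid(z):.4f}")
```

The output is exactly 0.5 at z = 0 and moves toward 0 or 1 as z moves away from zero in either direction, which is what gives the curve its S shape.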
Before applying this function, we need to find a decision boundary. A decision boundary for logistic regression is a linear boundary that separates the input space into two regions. It is a line (a hyperplane in higher dimensions), which can be represented in the same way as in linear regression:
z=a.x+b , where x is the input variable, a is the coefficient and b is the bias.
Then we can use our sigmoid function to make a prediction:
y = 1 / (1 + e^(-(a.x+b)))
Here y is the probability that the given example falls in a certain class. Its value ranges from 0 to 1 because the value of the sigmoid function ranges from 0 to 1. If the result is near 0, we can say that the example falls into the negative class. If it is closer to 1, we can say it falls into the positive class. For example, the threshold can be:
prediction= + IF y≥0.5
prediction= – IF y<0.5
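The prediction step can be sketched in Python as follows, assuming the usual convention that a probability of 0.5 or more maps to the positive class and anything lower maps to the negative class. The coefficient `a`, the bias `b`, and the input values are hypothetical numbers chosen only for illustration.

```python
import math

def predict_probability(x: float, a: float, b: float) -> float:
    """Compute z = a*x + b, then pass it through the sigmoid to get a probability."""
    z = a * x + b
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(y: float, threshold: float = 0.5) -> str:
    """Return '+' when the probability reaches the threshold, otherwise '-'."""
    return "+" if y >= threshold else "-"

# Hypothetical coefficient and bias (assumptions, not fitted values).
a, b = 1.5, -3.0
for x in (0.5, 2.0, 4.0):
    y = predict_probability(x, a, b)
    print(f"x = {x}: y = {y:.3f}, prediction = {predict_class(y)}")
```

With these made-up parameters the decision boundary sits at x = 2.0, where z = 0 and y = 0.5.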
Training Logistic Regression Model
Training a logistic regression model involves finding the best values of the coefficient and bias of the decision boundary z. We find them using maximum likelihood estimation, which estimates the parameters by finding the values that maximize the likelihood of the observed data.
To elaborate, suppose we have data on tumors with their labels (malignant or benign). We use this data to train our logistic regression model. The maximum likelihood method finds the coefficients that make the model predict a value very close to 1 for the positive class (malignant in our case) and a value very close to 0 for the negative class (benign in our case). When we have the best values of the parameters for our model, we can say that the model is properly trained. The model can then be used to predict the class of new, unseen examples.
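To make the idea of likelihood concrete, here is a small Python sketch that scores two candidate parameter settings by their log-likelihood on a toy data set. The data, the parameter values, and the function names are all made up for illustration; the point is only that parameters which push predictions toward 1 for positive examples and toward 0 for negative examples score higher.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, labels, a, b):
    """Sum of log-probabilities the model assigns to the observed labels."""
    total = 0.0
    for x, label in zip(xs, labels):
        y = sigmoid(a * x + b)  # predicted probability of the positive class
        total += label * math.log(y) + (1 - label) * math.log(1 - y)
    return total

# Hypothetical toy data: one feature, label 1 = positive class, 0 = negative class.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]

# Two hypothetical parameter settings to compare.
poor = log_likelihood(xs, labels, a=0.1, b=0.0)
better = log_likelihood(xs, labels, a=2.0, b=-4.5)
print(f"log-likelihood with a=0.1, b=0.0 : {poor:.3f}")
print(f"log-likelihood with a=2.0, b=-4.5: {better:.3f}")
```

Maximum likelihood estimation searches for the parameter values that make this score as large as possible. The log is used instead of the raw likelihood because sums are easier to work with than products of many small probabilities.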
Application of Logistic Regression
Logistic regression finds application in various fields. One instance is the Trauma and Injury Severity Score (TRISS) developed by Boyd et al., which uses logistic regression to predict mortality in injured patients.
Besides this, it can be applied to several other problems such as optical character recognition, spam detection, cancer detection and many more.
Implementation of Logistic Regression in Excel
Fig: Some samples of two classes Technical (1) and Non-technical(0)
We implement logistic regression in Excel for classification. We create a hypothetical example (assuming a technical article requires more time to read; real data may differ) with two classes labeled 0 and 1, representing non-technical and technical articles. Class 0 is the negative class, which means that if we get a probability less than 0.5 from the sigmoid function, the sample is classified as 0. Similarly, class 1 is the positive class, and if we get a probability greater than 0.5, the sample is classified as 1.
Each sample has two features: Time, which represents the average time in hours required to read an article, and Sentences, which represents the number of sentences (here 2.2 means 2.2k, or 2,200 sentences).
Now we need to train our logistic regression model. Training involves finding optimal values of the coefficients B0, B1, and B2. While training, we find some values of the coefficients in the first step and use those coefficients in the next step to optimize their values. We continue until we get consistent accuracy from the model. In our example, we have iterated 20 times, but we can iterate more to get higher accuracy.
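The iterative training described above can be sketched in Python. This is not the article's spreadsheet: it assumes plain batch gradient ascent on the log-likelihood with a hypothetical learning rate of 0.1, and the (Time, Sentences) samples are invented stand-ins for the data in the figure, so the resulting coefficients will not match the ones reported in the article.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_likelihood(samples, labels, b0, b1, b2):
    """Average log-probability the model assigns to the true labels."""
    total = 0.0
    for (x1, x2), label in zip(samples, labels):
        y = sigmoid(b0 + b1 * x1 + b2 * x2)
        total += label * math.log(y) + (1 - label) * math.log(1 - y)
    return total / len(samples)

def train(samples, labels, learning_rate=0.1, iterations=20):
    """Batch gradient ascent: each pass refines the coefficients from the previous pass."""
    b0 = b1 = b2 = 0.0
    n = len(samples)
    for _ in range(iterations):
        g0 = g1 = g2 = 0.0
        for (x1, x2), label in zip(samples, labels):
            error = label - sigmoid(b0 + b1 * x1 + b2 * x2)
            g0 += error        # gradient with respect to B0
            g1 += error * x1   # gradient with respect to B1 (Time)
            g2 += error * x2   # gradient with respect to B2 (Sentences)
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
        b2 += learning_rate * g2 / n
    return b0, b1, b2

# Hypothetical (Time in hours, Sentences in thousands) samples; 1 = technical, 0 = non-technical.
samples = [(3.0, 2.0), (3.5, 2.5), (4.0, 1.8), (3.2, 2.2),
           (1.0, 2.1), (1.5, 2.4), (0.8, 1.9), (1.2, 2.6)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

before = mean_log_likelihood(samples, labels, 0.0, 0.0, 0.0)
b0, b1, b2 = train(samples, labels)
after = mean_log_likelihood(samples, labels, b0, b1, b2)
print(f"B0 = {b0:.4f}, B1 = {b1:.4f}, B2 = {b2:.4f}")
print(f"mean log-likelihood before: {before:.4f}, after 20 iterations: {after:.4f}")
```

Running more iterations or changing the learning rate changes the fitted coefficients. The mean log-likelihood printed at the end is one simple way to check that each round of updates is actually improving the fit.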
From our Excel implementation, after 20 iterations, we get:
B0 = -0.1068913
B1 = 0.41444855
B2 = -0.2486209
Thus, the decision boundary is given as:
Z = B0+B1*X1+B2*X2
Z = -0.1068913 +0.41444855*Time-0.2486209*Sentences
For X1 = 1.9 and X2 = 3.1, we get:
Z = -0.1068913+0.41444855*1.9 -0.2486209*3.1
Z = -0.090163845
Now we use the sigmoid function to find the probability and thus predict the class of the given sample:
y = 1 / (1 + e^(-Z)) ≈ 0.48
As y is less than 0.5 (y<0.5), we classify the given sample as Non-technical.
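The worked example can be checked with a few lines of Python. The sketch below plugs the coefficients reported after 20 iterations into the decision boundary and the sigmoid function; the variable names are illustrative.

```python
import math

# Coefficients reported above after 20 iterations.
B0, B1, B2 = -0.1068913, 0.41444855, -0.2486209

# Sample to classify: Time = 1.9 hours, Sentences = 3.1 (thousands).
time_hours, sentences_k = 1.9, 3.1

z = B0 + B1 * time_hours + B2 * sentences_k
y = 1.0 / (1.0 + math.exp(-z))
label = "Technical (1)" if y >= 0.5 else "Non-technical (0)"

print(f"Z = {z:.9f}")
print(f"y = {y:.4f}")
print(f"class: {label}")
```

The probability comes out just below 0.5, so the sample lands in the Non-technical class, although only narrowly.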
Pros and Cons of Logistic Regression:
Like other methods, logistic regression has some benefits and some disadvantages. The main attraction that encourages us to choose it as our classification algorithm is that it is very simple yet reliable. You also won't need to scratch your head tuning many different parameters, as is the case with some other methods. In addition, it can be trained quickly and can easily be extended to multiple classes.
Despite these advantages, logistic regression falls short when it comes to handling non-linearity in our data.