Logistic Regression


Logistic regression is widely used for binary classification problems, where the dependent variable can take only two values, such as 0 or 1, win or lose, pass or fail, healthy or sick.

         Logistic Regression: Output is in Binary

It can also be extended to multi-class classification problems, where the output falls into one of more than two categories.

Logistic regression predicts a dependent variable (output) from continuous and/or categorical independent variables (inputs).

In the binary case the dependent variable (output) is categorical: y ∈ {0, 1}

·       It is scalable and can be very fast to train. It is used for:

    Spam filtering

    Website classification

    Product classification

    Weather prediction, etc.

·       The main caveat is that it can overfit very sparse data, so it is often used with regularization.
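To make the regularization remark concrete, here is a minimal sketch of an L2-penalized loss for logistic regression. The function name `penalized_loss`, the parameter `lam`, and the tiny sparse dataset are all assumptions made for illustration; they do not come from any particular library.

```python
import math


def sigmoid(t):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))


def penalized_loss(weights, bias, rows, labels, lam):
    """Average negative log-likelihood plus an L2 penalty lam * sum(w^2)."""
    nll = 0.0
    for x, y in zip(rows, labels):
        t = bias + sum(w * xi for w, xi in zip(weights, x))
        p = sigmoid(t)
        nll -= y * math.log(p) + (1 - y) * math.log(1 - p)
    penalty = lam * sum(w * w for w in weights)
    return nll / len(rows) + penalty


# Hypothetical sparse data: each row has a single non-zero feature.
rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
labels = [1, 0, 1, 1]

# Large weights fit this data almost perfectly, which is the overfitting risk.
large_weights = [8.0, -8.0, 8.0]

unpenalized = penalized_loss(large_weights, 0.0, rows, labels, lam=0.0)
penalized = penalized_loss(large_weights, 0.0, rows, labels, lam=0.1)
print(unpenalized, penalized)
```

With `lam=0.0` the large weights look almost free; with `lam=0.1` the same weights carry a visible cost, which pushes training toward smaller weights.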

There are two types of logistic regression:

Binary logistic regression: used when the dependent variable (output) has only two values, such as 0 and 1 or Yes and No.

e.g. Whether a student passes or fails, based on the number of hours the student has studied.

e.g. Whether a mail in the mailbox is spam or not spam.

Multinomial logistic regression: used when the dependent variable (output) has three or more classes, e.g. weather prediction classified into four classes: cloudy, rainy, hot, or cold.
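One common way to get class probabilities in the multinomial case is the softmax function, which generalizes the sigmoid to several classes. The sketch below uses the four weather classes from the example; the scores are made-up numbers standing in for the output of a fitted model.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cloudy", "rainy", "hot", "cold"]
scores = [1.2, 0.3, 2.5, -0.7]  # hypothetical per-class scores

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]

for name, p in zip(classes, probs):
    print(name, round(p, 3))
print("predicted:", predicted)
```

The predicted class is simply the one with the highest probability.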

Why and when should we use Logistic Regression

Logistic regression does not assume a linear relationship between the dependent and independent variables; it only assumes that the log of the odds is linear in the independent variables.

If we fit a linear regression line to binary data, it will not classify the data properly; a logistic regression curve is needed instead. A straight line is a poor fit for binary data because a binary outcome is not normally distributed, and the line's predictions are not confined to the range 0 to 1.
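A small numeric sketch shows the problem with a straight line. The study-hours data below is hypothetical, and `fit_line` is an illustrative helper that applies the ordinary least squares formulas for a single input.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied and pass (1) or fail (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(hours, passed)

# The line gives values below 0 and above 1, which cannot be probabilities.
for x in (1, 8, 12):
    print(x, round(b0 + b1 * x, 3))
```

The fitted line predicts a negative value for 1 hour and a value above 1 for 8 hours or more, so its output cannot be read as a probability of passing.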

The graph shows the number of study hours against whether the student passed or failed. If we fit a straight line, it does not classify the data properly.


If we fit a logistic curve, it classifies the data properly, as shown below.



Mathematical model

Steps to find a Logistic Regression curve:

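1. Find the probability p of the event happening.

2. Find the odds. The odds are the probability of something occurring divided by the probability of it not occurring:

    Odds = p / (1 - p)

3. Take the natural log of the odds. This is the logit function:

    logit(p) = ln(p / (1 - p)) = t

4. Solve for p to get the sigmoid function:

    S(t) = 1 / (1 + e^(-t))

In this equation, t is a linear combination of the data values (here, a coefficient times the number of hours studied, plus an intercept) and S(t) represents the probability of passing the exam.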
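The link between probability, odds, log-odds and the sigmoid can be checked numerically. In this sketch, `odds(p) = p / (1 - p)`, `logit` is its natural log, and `sigmoid(t) = 1 / (1 + e^(-t))` maps the log-odds back to a probability; the function names are chosen here for illustration.

```python
import math


def odds(p):
    """Probability of the event divided by the probability of no event."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))


def sigmoid(t):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


for p in (0.2, 0.5, 0.8):
    t = logit(p)
    print(p, round(odds(p), 3), round(t, 3), round(sigmoid(t), 3))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, and applying the sigmoid to the log-odds recovers the original probability.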

Each point on the fitted sigmoid curve is a predicted probability, and it is classified as either the positive or the negative class. A threshold on that probability (commonly 0.5) decides which class a case belongs to.
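Applying a threshold takes only a few lines. The probabilities below are made-up values standing in for model output, and the default of 0.5 is a common convention rather than a requirement.

```python
def classify(probability, threshold=0.5):
    """Return 1 (positive class) if the probability reaches the threshold."""
    return 1 if probability >= threshold else 0


probabilities = [0.05, 0.40, 0.50, 0.73, 0.98]  # hypothetical model outputs

default_labels = [classify(p) for p in probabilities]
strict_labels = [classify(p, threshold=0.8) for p in probabilities]

print(default_labels)
print(strict_labels)
```

Raising the threshold makes the classifier more cautious about predicting the positive class, which matters when false positives are costly.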

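The estimated regression equation:

The antilog (exponential) of the logit function allows us to find the estimated regression equation.

We know that

    ln(p / (1 - p)) = b0 + b1*x

Taking the antilog of both sides and solving for p gives

    p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) = 1 / (1 + e^(-(b0 + b1*x)))

This is the estimated regression equation.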
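Once the coefficients are known, the estimated regression equation can be evaluated directly. The values of `B0` and `B1` below are hypothetical, picked only to show the shape of the curve for the study-hours example; they are not fitted from any data in this post.

```python
import math

B0 = -4.0  # hypothetical intercept
B1 = 1.0   # hypothetical coefficient for hours studied


def pass_probability(hours, b0=B0, b1=B1):
    """p = 1 / (1 + e^(-(b0 + b1 * hours)))"""
    t = b0 + b1 * hours
    return 1.0 / (1.0 + math.exp(-t))


# The probability is exactly 0.5 where b0 + b1 * hours = 0.
boundary = -B0 / B1

for h in (2, 4, 6):
    print(h, round(pass_probability(h), 3))
print("decision boundary at", boundary, "hours")
```

With these assumed coefficients, a student who studies fewer than 4 hours is predicted to fail and one who studies more is predicted to pass.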

Difference between logistic and linear regression

Linear regression

      Purpose: predict a continuous dependent variable from the values of the independent variables

      Dependent variable: always continuous

      Estimation: least squares method

      Output: a straight line

      Equation: Y = b0 + b1*x + e

      Typical uses: business prediction, cost prediction

Logistic regression

      Purpose: predict a categorical dependent variable from continuous or categorical independent variables

      Dependent variable: categorical

      Estimation: maximum likelihood method

      Output: a sigmoid curve

      Prediction: a binary value

      Typical uses: classification problems, e.g. image classification
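The maximum likelihood fitting mentioned for logistic regression can be sketched with plain gradient ascent on the log-likelihood. This is only one of several ways to maximize the likelihood, and the data, learning rate, and step count below are assumptions chosen for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            grad_b0 += error
            grad_b1 += error * x
        # Step in the direction that increases the average log-likelihood.
        b0 += learning_rate * grad_b0 / n
        b1 += learning_rate * grad_b1 / n
    return b0, b1


# Hypothetical data: hours studied and pass (1) or fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("P(pass | 5 hours) =", round(sigmoid(b0 + b1 * 5.0), 3))
```

Unlike least squares for a straight line, there is no closed-form solution here, which is why an iterative method is used.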

 






 
