Logistic Regression
Logistic Regression
▪
Logistic regression is widely used for
binary classification problems. It gives binary output. A binary dependent
variable can have only two values, like 0 or 1, win or lose, pass or fail,
healthy or sick, etc.
▪
Logistic
Regression: Output is in Binary
▪
It can also be extended to multi-class classification problems whose output can be
classified into more than one category.
▪
Logistic regression can be used to predict
a dependent variable(output) on the basis of continuous and/or categorical independents(input)
▪
The dependent variable(output) is
categorical: y ϵ {0, 1}
· It
is scalable and can be very fast to train. It is used for
–
Spam filtering
–
Web site classification
–
Product classification
–
Whether prediction etc.
· The only caveat is that it can overfit very sparse data, so it is often used with Regularization.
There are two
types of logistic regression
•
Binary logistic regression-It
is used when the dependent variable(output) has only two values, such as
0 and 1 or Yes and No.
e.g Students can pass or
fail is based on the number of hours the student has studied.
e.g Mail in the mail box
is spam or non-spam
•
Multinomial
logistic regression It is
used when the dependent variable(output) has three or more classes e.g weather prediction is classified into four classes as cloudy, rainy, hot, or cold.
Why
and when should we use Logistic Regression
Logistic
regression does not assume a linear relationship between the dependents and the
independents
If we try to plot a Linear regression line then it will not
classify the data properly. Instead of linear regression, it requires a logistic
regression curve. Linear regression curve can not be used to classify binary
data, because it does not have a normal Distribution.
The graph shows the number of study hours vs. students pass or fail. If we plot Linear the curve then it will not classify data properly
If we plot logistic curve then it will classify data
properly as shown below
Mathematical
model
Steps
to find a Logistic Regression curve:
1.Find the probability of
events happening
2 Find the ODDS- Odds is
the probability of something occurring divided by the probability of not
occurring and is given as
In this equation, t
represents data values * number of hours studied and S(t) represents the probability of passing
the exam.
The points lying on the
sigmoid function fits are either classified as positive or negative class. A threshold is decided for classifying the
cases.
The Estimated Regression
equation :
The antilog of the logit function allows us to find the estimated regression equation.
We know that
This is the estimated regression equation.
Difference
between logistic and Linear regression
Linear •
To predict a
continuous dependent variable based on the value of the independent variable
•
Dependent
variable is always continuous
•
Least square
method is used •
Output is
linear curve •
Y=bo+ b1*x+e •
Business
prediction, Cost prediction
|
Logistic •
To predict
categorically dependent variable based on continuous or Categorical independent
variable •
Dependent
variable is categorical
•
Maximum
likelihood probability is used •
Output is Sigmoid
curve •
Predicted
binary value •
Classification
Problem e.g Image prediction
|
Comments
Post a Comment