Difference Between Linear Regression and Logistic Regression

# Difference Between Linear Regression and Logistic Regression

clickHere
Vikram Singh
Assistant Manager - Content
Updated on Nov 22, 2022 19:26 IST

Linear Regression and Logistic Regression are Supervised Machine Learning models that use labelled datasets to make predictions. However, there’s a fundamental difference in their usage – Linear Regression is used for Regression problems, whereas Logistic Regression is mainly used for solving classification problems.

Linear regression and logistic regression are among the most commonly used models in supervised machine learning. Each model has their application, characteristics, advantages, and limitations. This article will provide a comprehensive understanding of both (linear and logistic regression) models, highlighting their differences, limitations and real-life applications.

## Difference Between Linear Regression and Logistic Regression

Parameter Linear Regression Logistic Regression
Outcome Variable Type Continuous variable (e.g., price, temperature) Categorical variable, typically binary (e.g., yes/no, 0/1)
Model Purpose Regression (predicting numerical values) Classification (categorizing into discrete classes)
Equation/Function Linear equation: Y = β0 + β1X + ε Logistic (Sigmoid) function: p(X) = 1 / (1 + e^-(β0 + β1X))
Output Interpretation Predicted value of the dependent variable Probability of a particular class or event
Relationship Between Variables Assumes a linear relationship between variables Does not assume a linear relationship; models probability
Error Distribution Assumes normally distributed errors Does not assume a normal distribution of errors
Estimation Method Ordinary Least Squares (OLS) Maximum Likelihood Estimation (MLE)
Sensitivity to Outliers More sensitive to outliers Less sensitive to outliers
Homoscedasticity Assumption Assumes homoscedasticity (constant variance of errors) No assumption of homoscedasticity
Application Scope Suitable for forecasting, effect analysis of independent variables Ideal for binary classification in various fields

Check out how these two differ from each other on other parameters

## What is Linear Regression?

A statistical method used in machine learning to predict a continuous output variable based on one or more input variables. It assumes a linear relationship between the variables.

In simple terms, linear regression finds the best-fit line/plane between a dependent and independent continuous variable(s).

The Mathematical Formula of Linear Regression

Y = C0 + C1X1 + C2X2 + ... + CnXn + k

where

Y: Dependent Variable (or target variable)

Xi: Independent Variable

C0: Intercept of the regression line

Ci: Slope of the regression line

k: Error term

### Types of Linear Regression

Linear Regression is broadly classified into two categories based on the number of independent variables.

### Simple Linear Regression

It involves only one independent variable to predict the value of the dependent variable.

The mathematical equation of simple linear regression is:

Y = C0 + C1X1 + k

Example of Simple Linear Regression

Predicting house prices based on their size in square feet.

### Multiple Linear Regression

It involves two or more independent variables used to predict the dependent variable, i.e., it examines the linear relationship between multiple independent variables and a single dependent variable.

The mathematical equation of multiple linear regression is:

Y = C0 + C1X1 + C2X2 + ... + CnXn + k

Example of Multiple Linear Regression

Predicting a car's fuel efficiency based on parameters like engine size, weight, and horsepower.

### Limitation of Linear Regression

• It assumes there is a linear relationship between the dependent and independent variables. This assumption is only sometimes correct.

Example: The relationship between consumer spending and income is not always linear. After a point, an increase in the income might not be directly proportional to the spending.

• Linear regression is always sensitive to outliers, i.e., a few extreme values can distort the regression line and affect the accuracy of the predictions.

Example: In real estate, if any apartment has some extraordinary feature that increases its price and can skew the overall prediction of the house in a neighbourhood.

• In multiple linear regression, if the independent variables are highly correlated, then it will be difficult to determine the individual effect of each variable on the dependent variable.

For example, parameters like market capitalization and trading volume might be correlated in stock market analysis. The correlation of these parameters makes it challenging to assess their individual impact on stock prices.

• Linear regression assumes that the residuals are independent, which is not true in real-world scenarios.

Example: In time series data like temperature forecasting, error can be correlated across time, which violates the above assumption.

• Linear regression assumes that the variance of the error term should be constant across all the independent variable levels.

Example: When you predict the value of a car, the variability in the prices might increase with the car's age.

### Real-Life Application of Linear Regression

• To estimate the market value of the properties. It considers parameters like property size, location, number of bedrooms, etc. This helps the buyers and sellers to make informed decisions about the property in any locality.
• Predict the stock prices based on historical data, which includes parameters like trading volume, market capitalization, and economic indicators. This helps investors and financial analysts make investment decisions and manage portfolios.

## What is Logistic Regression?

A statistical technique used for classification problems. Unlike linear regression, it predicts categorical outcomes (usually binary outcomes like 0/1, Yes/No, True/False).

In simple terms, Logistic Regression learns a model for the binary classification of discrete variables. Here, the dependent variable is categorical in nature.

### The Mathematical Formula for Logistic Regression

The function used in logistic regression is known as Sigmoid Function. The sigmoid function is given by:

p(X) = 1/1+e-(c0 + c1X)

where,

p(X): predicts the probability that the outcome is 1 for a given value of X.

e: the base of natural logarithms

c0 & c1: coefficient for intercept and the slope, respectively.

X: Independent Variable

### Limitations of Logistic Regression

• It assumes the linear relationship between the dependent and independent variables' log odds(logit). This can be restrictive if the actual relationship is non-linear.

Example: The effectiveness of a treatment might not linearly correlate with the patient's age or the dosage.

• Logistic Regression can overfit when irrelevant independent variables are added and underfit if the important variables are omitted.

Example: In credit score, if you include too many irrelevant financial attributes, it might overfit the model to the training data.

• Similar to linear regression, logistic regression is highly sensitive to multicollinearity.

Example: While analyzing the attrition in any company, parameters like job satisfaction and employee engagement might be correlated. Both these parameters may not give their individual effects on attrition.

### Real-Life Application of Logistic Regression

• To predict the likelihood of having any disease (Heart Disease or Diabetes) using parameters like age, weight, cholesterol level, blood pressure, etc.
• To classify emails as spam or not spam using parameters such as specific words, sender's address, email content, etc.
• Companies like Walmart and Amazon used logistic regression to predict the likelihood of customer discounting services based on usage patterns, customer service interaction, billing information, etc.

## Similarities Between Linear and Logistic Regression

• Both linear and logistic regression are used in Supervised Learning, where the model is trained on a labeled dataset with input feature and the target variable.
• Both are forms of regression analysis. Linear regression is used to predict continuous outcomes, while logistic regression is used for binary classification, but both essentially model the relationship between dependent and independent variables.
• Both equations use coefficients (like slope and intercept in Linear Regression and weights in Logistic Regression).
• Both models can suffer from overfitting if not properly regularized or if trained on datasets with too many irrelevant features.

## Key Difference Between Linear Regression and Logistic Regression

• Linear regression is used for regression problems, while logistic regression is used for classification problems.
• Linear regression predicts a continuous outcome variable, whereas logistic regression predicts a categorical outcome variable.
• Linear regression uses ordinary least squares for parameter estimation, while logistic regression uses maximum likelihood estimation.
• Linear regression assumes constant variance (homoscedasticity) of errors, whereas logistic regression assumes no such homoscedasticity assumption.
• Linear regression is ideal for forecasting and determining the linear effect on independent variables. In contrast, logistic regression is best suited for binary classification problems in medicine, marketing, and finance.

## Conclusion

Linear Regression and Logistic Regression are two of the most popularly used algorithms in Supervised Machine Learning. I hope this blog on Linear Regression vs Logistic Regression was helpful in understanding the difference between the two and it aroused your interest in Machine Learning.

Keep Learning!!

Keep Sharing!!