10.3.1 Linear Regression, Logistic Regression
These two algorithms are fundamental to supervised learning and are often among the first that newcomers to machine learning encounter. Both are used for prediction tasks.
Linear Regression
Imagine you have a scatter plot of data points showing how much ice cream is sold at different temperatures. As the temperature goes up, ice cream sales generally go up too. Linear Regression is like drawing the "best fit" straight line through these points to predict sales for a new temperature.
- What it does: Predicts a continuous output value (a number) based on one or more input features. It assumes a linear relationship between the input(s) and the output.
- Think of it like: Finding a straight line that best describes the relationship between two things.
- How it works: The algorithm tries to find the optimal coefficients (slope and intercept for a simple line) that minimize the difference between the predicted values and the actual values in the training data. This is often done using a method called "least squares."
- Use Cases:
- Predicting house prices based on square footage.
- Predicting sales figures based on advertising spend.
- Estimating a student's exam score based on study hours.
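The ice cream example above can be sketched in a few lines of NumPy. The temperature and sales figures below are made-up illustrative data; the "best fit" line is found with ordinary least squares via `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical data: temperature (degrees C) vs. ice cream sales (units sold)
temps = np.array([15, 18, 21, 24, 27, 30], dtype=float)
sales = np.array([120, 150, 205, 240, 300, 330], dtype=float)

# Build the design matrix [temperature, 1] so least squares
# solves for both the slope and the intercept of the line
A = np.column_stack([temps, np.ones_like(temps)])
(slope, intercept), *_ = np.linalg.lstsq(A, sales, rcond=None)

# Predict sales for a new temperature using the fitted line
predicted = slope * 25 + intercept
```

The same fit can be obtained with `scikit-learn`'s `LinearRegression`; the point here is only that the model is nothing more than a slope and an intercept chosen to minimize the squared prediction error.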
Bibliography:
- IBM - What is linear regression?: https://www.ibm.com/topics/linear-regression
- GeeksforGeeks - Linear Regression in Machine Learning: https://www.geeksforgeeks.org/linear-regression-in-machine-learning/
- Wikipedia - Linear regression: https://en.wikipedia.org/wiki/Linear_regression
Logistic Regression
Now, imagine you want to predict if a student will pass or fail an exam based on their study hours. The answer isn't a continuous number; it's a "yes" or "no" (or "pass" or "fail"). Logistic Regression is like drawing a special S-shaped curve that helps you decide which side of the curve a student falls on, indicating their likelihood of passing.
- What it does: Predicts the probability of a binary outcome (e.g., Yes/No, True/False, 0/1) based on input features. Despite "regression" in its name, it's used for classification problems.
- Think of it like: Drawing a soft, S-shaped boundary to separate two groups.
- How it works: Instead of directly predicting a value, it predicts the probability that an input belongs to a certain class. It then uses a threshold (e.g., if probability > 0.5, classify as "Yes") to make a final classification. It uses a "sigmoid" function to map any real-valued prediction into a probability between 0 and 1.
- Use Cases:
- Email spam detection (spam or not spam).
- Disease prediction (patient has disease or not).
- Customer churn prediction (customer will leave or stay).
- Predicting if a loan applicant will default or not default.
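The pass/fail example can be sketched with plain Python: a sigmoid squashes a linear score into a probability, and simple gradient descent on the log-loss fits the weight and bias. The study-hours data below is made up for illustration:

```python
import math

def sigmoid(z):
    # Maps any real-valued score into a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: study hours vs. exam outcome (1 = pass, 0 = fail)
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Fit weight w and bias b by batch gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in zip(hours, passed):
        err = sigmoid(w * x + b) - y   # prediction error for this student
        grad_w += err * x
        grad_b += err
    w -= lr * grad_w / len(hours)
    b -= lr * grad_b / len(hours)

# Predict: probability of passing for 6 study hours, then threshold at 0.5
prob = sigmoid(w * 6 + b)
label = "pass" if prob > 0.5 else "fail"
```

In practice one would use `scikit-learn`'s `LogisticRegression` rather than hand-rolled gradient descent, but the loop above shows the core idea: the model outputs a probability, and the threshold turns that probability into a class label.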
Bibliography:
- IBM - What is logistic regression?: https://www.ibm.com/topics/logistic-regression
- GeeksforGeeks - Logistic Regression in Machine Learning: https://www.geeksforgeeks.org/machine-learning/understanding-logistic-regression/
- Wikipedia - Logistic regression: https://en.wikipedia.org/wiki/Logistic_regression