10.2.2 Unsupervised Learning (Clustering, Dimensionality Reduction)

Imagine you have a big box full of mixed LEGO bricks of different colors and shapes, but no one tells you what to do with them. You start playing around and notice that certain colors tend to go together, or certain shapes fit nicely with others. You start sorting them into groups based on what you observe, without any instructions.

Unsupervised learning is a type of machine learning in which the computer is given unlabeled data: examples with no "correct answers" attached. There is no "teacher" telling it what the right output is. Instead, the computer's job is to find hidden patterns, structures, or relationships in the data on its own. It's like letting the computer explore the data and discover things by itself.

This is especially useful when we have a lot of data but don't know what patterns to look for, or when it's too difficult to label all the data manually.

Two of the most common types of problems that unsupervised learning solves are:

Clustering:
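- What it does: Groups similar data points together into "clusters" without being told what those categories are beforehand. The algorithm only finds the groups; giving them meaningful names (like the ones in the examples below) is usually done by people afterward.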
- Think of it like: Sorting those mixed LEGO bricks by color and shape without a guide.
- Examples:
  - Grouping customers into different segments based on their shopping habits (e.g., "bargain hunters," "luxury buyers").
  - Organizing news articles into topics (e.g., "sports," "politics," "entertainment") automatically.
  - Finding groups of similar genes in biological data.
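
The grouping idea above can be sketched with k-means, one common clustering algorithm (the section itself does not name a specific method). This is a minimal, illustrative sketch in plain Python, assuming numeric features and a number of clusters k chosen in advance. The customer data (visits per year, average spend) is hypothetical, and the deterministic "farthest-first" choice of starting centroids is a simplification made so the result is repeatable; libraries typically use randomized seeding such as k-means++ instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Simplified, deterministic seeding (an assumption, not k-means++):
    start with the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iters=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = [tuple(c) for c in init_centroids(points, k)]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (visits per year, average spend per visit).
customers = [(20, 15), (22, 12), (25, 18), (18, 14), (5, 300), (4, 280), (6, 320)]
labels, centroids = kmeans(customers, k=2)
print(labels)     # cluster number for each customer
print(centroids)  # average customer in each cluster
```

With this made-up data, the first four customers (frequent visits, low spend) end up in one cluster and the last three (rare visits, high spend) in the other. The algorithm only returns cluster numbers; calling them "bargain hunters" and "luxury buyers" is a human interpretation. Because plain Euclidean distance is used, features measured on larger scales (here, spend) dominate, so in practice features are often standardized first.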

Dimensionality Reduction:
- What it does: Simplifies complex data by reducing the number of "features" or "dimensions" while trying to keep the most important information.
- Think of it like: Taking a very detailed map and creating a simpler, zoomed-out version that still shows the main roads and cities, but removes tiny details like every single tree. This makes the map easier to understand.
- Examples:
  - Compressing an image by representing it with fewer numbers, so its file size is smaller while it still looks similar to the original.
  - Simplifying complex scientific data with many measurements into a few main factors that explain most of the variation.
  - Making data that has too many characteristics to plot on a simple graph easier to visualize, by projecting it down to two or three dimensions.
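
Principal component analysis (PCA) is one widely used dimensionality-reduction technique. The sketch below, in plain Python, finds only the single direction of greatest variance (the first principal component) using power iteration, then projects each data point onto it, turning two numbers per point into one. The measurements are made up, and computing just one component by power iteration is a simplification; real libraries compute several components with more robust linear algebra.

```python
def mean_center(data):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data], means


def covariance_matrix(centered):
    """Sample covariance matrix (divides by n - 1) of already-centered data."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_principal_component(data, iters=200):
    """Return (unit direction, fraction of total variance explained, 1-D scores)."""
    centered, _ = mean_center(data)
    cov = covariance_matrix(centered)
    # Power iteration: repeatedly multiplying by the covariance matrix pulls the
    # vector toward the eigenvector with the largest eigenvalue. The all-ones
    # start is an assumption: it must not be orthogonal to that eigenvector.
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    # Project each centered point onto the component: one number per point.
    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, eigenvalue / total_variance, scores


# Hypothetical measurements: two correlated features per sample.
measurements = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8]]
component, explained, scores = first_principal_component(measurements)
print(component)  # direction of greatest variance
print(explained)  # share of total variance kept by one dimension
print(scores)     # each sample reduced to a single number
```

For these made-up points, which lie close to the line y ≈ 2x, the first component points roughly along that line and captures more than 99% of the total variance, so the single score per point keeps almost all of the information in the two original columns.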

Unsupervised learning helps us make sense of large amounts of raw data, discover hidden insights, and prepare data for other types of analysis.