From the course: Introduction to Artificial Intelligence

Cluster data

- Classifying your data doesn't fit every challenge. For starters, the system won't always have access to massive amounts of labeled data, so sometimes you want the system to create its own data clusters. Clustering is when the machine uses unsupervised learning to create its own groups of data. If you've ever bought something online, you might have noticed that the store includes a section called "frequently bought together." Maybe you're buying a computer mouse and it recommends a keyboard. This is a very powerful feature: it helps customers find what they need, and it increases sales for the company. It's an example of a system using unsupervised learning to create clusters based on what it sees in your purchasing history. The big difference between clustering and classifying is whether you're working with human-created categories or machine-created groups. In general, if you're using supervised learning, you're classifying, and if you're using unsupervised learning, you're clustering.

Think of it this way. Every Halloween, my son goes trick-or-treating. This is when kids wear costumes and go around the neighborhood to collect candy. At the end of the evening, my son comes home with hundreds of little pieces of candy. The first thing he wants to do is classify the candy by what he likes best. In the past, my son has benefited from my experience. I've been able to supervise his learning. I can help him create output categories for the candy, such as chocolate, peanut butter, mints, and gummies. Then he does his best to classify any unknown candies into these categories. This is the same as supervised machine learning.

Now, he also has grandparents who live in a different country. They feel bad that they can't participate in trick-or-treating, so each year they send a bag of Serbian candy. With this bag, we can't use supervised learning because the candy is unlabeled. Neither he nor I have ever seen these candies before, and the wrappers are in Cyrillic.
So in this case, he does a form of unsupervised learning. He looks at the bag and creates his own clusters based on his own study of the data. He might create a cluster based on candy size or color. In fact, one year he created a cluster that I'd never considered: a small cluster he called "perfume candy," made from roses and orange blossoms. This is a key part of unsupervised learning. Like my son, the machine studies the data and then comes up with its own clusters.

One of the biggest advantages of clustering is that there's far more unlabeled data than labeled data. There's a lot of candy in the world that I've never seen, so I can't create output categories for all of it. There are also many ways you might use machine learning to create clusters. For example, you might have your machine learning algorithms cluster your customers, and then a human can go through the clusters to see if there are any meaningful patterns. At first glance, these clusters might not seem important. They may even seem trivial, but keep in mind that some of the largest companies are built around creating data clusters. Companies like Amazon, Netflix, and Twitter all use machine learning to cluster your friends, your search history, and your buying habits. These systems find patterns that would be impossible for a human to spot.
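
To make the candy example concrete, here is a minimal sketch of one common clustering algorithm, k-means, which the course does not name specifically. It assumes each candy has already been measured as two hypothetical numbers (a size and a color score), and that a human picks how many clusters to look for (k). The function name `kmeans` and the candy values are illustrative only. Notice that the algorithm is never told what the groups mean; like the son with the Serbian candy, it only groups items that look alike.

```python
import math


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with plain k-means (no labels needed)."""
    # Deterministic start: take the first point, then repeatedly add the point
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the average of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return clusters, centroids


# Hypothetical unlabeled candy measurements: (size in cm, color score).
candies = [
    (1.0, 1.2), (1.2, 0.9), (0.9, 1.1),  # small, darker candies
    (4.0, 4.2), (4.3, 3.9), (3.8, 4.1),  # large, brighter candies
]
clusters, centroids = kmeans(candies, k=2)
for i, members in enumerate(clusters):
    # The three small candies and the three large candies land in separate clusters.
    print(f"cluster {i}: {members}")
```

A human still has to look at the resulting groups and decide whether they mean anything, which is the same step the transcript describes for customer clusters.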
