From the course: Introduction to Artificial Intelligence

Classify data

- As humans, we classify things all the time. We put our Microsoft Word docs into folders. We separate our business contacts from our personal contacts. We list things alphabetically. Without these classifications, we'd have a hard time organizing our data. Businesses need to organize their data the same way. Airlines want to classify which of their customers are frequent flyers. Retailers want to identify their highest spenders. Search engines want to classify how likely you are to buy something online.

Binary classification is one of the most popular supervised machine learning challenges, because it's simple and it's powerful. With binary classification, there are only two possible outcomes. Is the hotel room going to be booked next week? Will the stock market go up this afternoon? Is this email message spam? All binary classification uses supervised machine learning. Remember that supervised learning depends on labeled data. That means the machine learning system is trained on examples that are already labeled with one of the two answers. So to use these systems, you first need to create a training data set.

Credit card fraud detection is one of the most popular uses of binary classification. Every time you use your credit card, a machine learning algorithm classifies your transaction as fraud or not fraud. Since this is supervised machine learning, the credit card companies had to start out with tens of thousands of examples of fraudulent transactions. The data science team then trained the system to recognize those patterns in future transactions. Email providers use supervised machine learning in the same way to classify spam. They start with a training set of messages labeled as spam or not spam. Once the system has processed enough labeled messages, it can classify your incoming spam email. These techniques take in massive amounts of data and then use machine learning algorithms to classify it into human-created categories, such as hotel bookings, fraudulent transactions, and unwanted email.

A data scientist creates these categories, and then your AI system classifies new data into the categories it has been trained to recognize. Classification is one of the most popular forms of machine learning, but it also takes a lot of upfront effort to train the system. It can be a challenge to collect tens of thousands of fraudulent credit card transactions or tens of thousands of spam email messages. Plus, there's no guarantee that'll be enough for the system to make accurate predictions. That means your data science team might find itself going back for another 10,000 transactions. Your team will have to keep feeding labeled data to the machine learning algorithm until it classifies your data very accurately. That's why even now, after several years of development, your credit card company might send you a fraud warning for a transaction that isn't fraudulent. Data scientists are constantly retraining these systems to make their classifications more accurate.

Credit card fraud, spam detection, and online purchasing might all seem like very different challenges, but to your machine learning system, they're all just different versions of the same task: using labeled examples to classify data into predefined categories.
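To make the idea of a labeled training set more concrete, here's a minimal Python sketch of binary spam classification. It swaps in a naive Bayes word-count classifier, one simple supervised technique chosen purely for illustration; the course doesn't say which algorithm email providers actually use, and real systems train on far more data and richer features. The messages, the labels, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter

LABELS = ("spam", "not spam")

# Hypothetical labeled training set: every message carries a human-assigned label.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("claim your free gift card", "spam"),
    ("cheap pills limited offer", "spam"),
    ("meeting moved to friday", "not spam"),
    ("lunch with the team tomorrow", "not spam"),
    ("please review the attached report", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label, and how many examples each label has."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return whichever of the two labels gets the higher naive Bayes log score."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        # Log prior: how common this label was in the training set.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(classify("free prize offer", model))        # spam
print(classify("team meeting on friday", model))  # not spam
```

With only six labeled messages, this toy model is easy to fool. That's the same data-hungry limitation described in this video, just at a much smaller scale: the more labeled examples it sees, the better its word counts reflect what spam really looks like.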

Contents