The Use of Classification in Data Mining

by Mike Chapple
An IT professional with more than 10 years of experience in the fields of databases and cybersecurity.
Updated June 13, 2019

Classification is a data mining technique that assigns categories to a collection of data in order to aid in more accurate predictions and analysis. It is sometimes loosely called a decision tree, although a decision tree is really one popular way of building a classifier. Classification is one of several methods intended to make the analysis of very large datasets effective.

Why Classification?

Very large databases are becoming the norm in today's world of big data. Imagine a database with multiple terabytes of data; a terabyte is one trillion bytes. Facebook alone crunches 600 terabytes of new data every single day (as of 2014, the last time it reported these figures).

The primary challenge of big data is how to make sense of it. Sheer volume is not the only problem: big data also tends to be diverse, unstructured, and fast-changing. Consider audio and video data, social media posts, 3D data, or geospatial data. This kind of data is not easily categorized or organized.

To meet this challenge, a range of automatic methods for extracting useful information has been developed, classification among them.

How Classification Works

At the risk of moving too far into tech-speak, let's discuss how classification works. The goal is to create a set of classification rules that will answer a question, make a decision, or predict behavior. To start, a set of training data is developed that contains a certain set of attributes as well as the known outcome for each record. The job of the classification algorithm is to discover how that set of attributes leads to the outcome.
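The search for rules described above can be illustrated with one of the simplest rule learners, 1R ("one rule"). The Python sketch below is an illustrative stand-in, not the method any particular product uses, and its tiny weather dataset and attribute names (`sky`, `humidity`) are hypothetical. For each attribute, it maps every value to the outcome that value most often appears with, then keeps the attribute whose resulting IF/THEN rule makes the fewest mistakes on the training data.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each record has attributes plus a known outcome.
TRAINING = [
    {"sky": "cloudy", "humidity": "high", "outcome": "rain"},
    {"sky": "cloudy", "humidity": "normal", "outcome": "rain"},
    {"sky": "clear", "humidity": "high", "outcome": "no rain"},
    {"sky": "clear", "humidity": "normal", "outcome": "no rain"},
    {"sky": "cloudy", "humidity": "high", "outcome": "rain"},
    {"sky": "clear", "humidity": "high", "outcome": "rain"},
]


def learn_one_rule(records, attributes, target="outcome"):
    """1R: pick the single attribute whose value->outcome rule has the fewest errors."""
    best = None
    for attr in attributes:
        # Count how often each outcome occurs for each value of this attribute.
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[attr]][rec[target]] += 1
        # For each value, predict its most common outcome.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors = training records whose outcome differs from the rule's prediction.
        errors = sum(rec[target] != rule[rec[attr]] for rec in records)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attr, rule, errors = learn_one_rule(TRAINING, ["sky", "humidity"])
for value, prediction in sorted(rule.items()):
    print(f"IF {attr} = {value} THEN outcome = {prediction}")
print(f"training errors: {errors} of {len(TRAINING)}")
```

On this toy data the sketch settles on the `sky` attribute (IF sky = cloudy THEN rain, IF sky = clear THEN no rain) and misclassifies one record, a clear day that still rained. Real classifiers, such as decision trees, extend the same idea by nesting further tests under each branch.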
Scenario: Perhaps a credit card company is trying to determine which prospects should receive a credit card offer. This might be its set of training data:

Name         Age  Gender  Annual Income  Credit Card Offer
John Doe     25   M       $39,500        No
Jane Doe     56   F       $125,000       Yes

Training Data

The "predictor" columns Age, Gender, and Annual Income determine the value of the "predicted attribute," Credit Card Offer. In a training set, the predicted attribute is known. The classification algorithm then tries to determine how the value of the predicted attribute was reached: what relationships exist between the predictors and the decision? It will develop a set of prediction rules, usually in the form of IF/THEN statements.

Obviously, this is a simple example, and the algorithm would need a far larger data sample than the two records shown here. Further, the prediction rules are likely to be far more complex, including sub-rules to capture attribute details.

Next, the algorithm is given a "prediction set" of data to analyze, but this set lacks the predicted attribute (or decision):

Name         Age  Gender  Annual Income  Credit Card Offer
Jack Frost   42   M       $88,000
Mary Murray  16   F       $0

Predictor Data

This predictor data helps estimate the accuracy of the prediction rules: the algorithm's predictions can be compared with the actual outcomes, which the developer knows but which are withheld from the algorithm. The rules are then tweaked until the developer considers the predictions effective and useful.

Day to Day Examples of Classification

Classification, along with other data mining techniques, is behind much of our day-to-day experience as consumers. Weather predictions might use classification to report whether the day will be rainy, sunny, or cloudy. The medical profession might analyze health conditions to predict medical outcomes. One classification method, naive Bayes, uses conditional probability to categorize emails as spam or not spam. From fraud detection to product offers, classification works behind the scenes every day, analyzing data and producing predictions.
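The naive Bayes approach to spam mentioned above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than a production spam filter: the four training emails are invented, words are split on whitespace only, and add-one (Laplace) smoothing is assumed so that a word never seen with a label does not force that label's probability to zero. The classifier picks the label that maximizes P(label) times the product of P(word | label), adding logarithms instead of multiplying to avoid numeric underflow.

```python
import math
from collections import Counter

# Hypothetical labeled training emails: (text, label).
TRAINING = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(examples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = {}
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def classify(text, doc_counts, word_counts, vocab):
    """Return the label with the highest log-probability score."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        # log P(label): the prior, from how often the label appears in training.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # log P(word | label) with add-one smoothing (Counter returns 0 if unseen).
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("free cash prize", *model))   # spam
print(classify("agenda for lunch", *model))  # ham
```

The "naive" part is the assumption that each word's probability is independent of the others given the label. That assumption is rarely true of real email, but it keeps training down to simple counting.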