classification_kNN/

directory
v0.0.0-...-361c87b
Published: Jan 4, 2017 License: Apache-2.0

README

Classification - k Nearest Neighbors

Classification is distinct from regression in that the target variable is typically categorical (a label) rather than a continuous number. For example, a classification model may classify emails as spam or not spam, or classify network traffic as fraudulent or not fraudulent. In general, a classification model can assign observations to any number of categories. Here we will explore classification using the k Nearest Neighbors (kNN) algorithm.

k Nearest Neighbors

from Machine Learning Mastery

Notes

  • Classification is typically a "supervised" machine learning task. That is, typically you need to have a labeled data set (e.g., spam, not spam).
  • Closeness or similarity implies a metric. We will use Euclidean distance here for kNN, but, in general, the choice of metric will depend on what types of features you have (categorical, numeric, etc.). For more on distance metrics see here.
  • kNN is easy to understand and thus a good place to start for classification problems.
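  • kNN calculates the distances from the new observation to the labeled observations at prediction time, so everything happens on the fly. There isn't really a "trained" model; the labeled data set itself plays that role.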
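The notes above can be sketched in a few lines of Go. This is a minimal illustration using only the standard library, not the code from this repository: the `Point` type, the `euclidean` and `predict` functions, and the toy values in `main` are all assumptions made up for the example. Ties in the vote are broken by picking the alphabetically smallest label, which is an arbitrary choice made only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Point is a hypothetical labeled observation: numeric features plus a class label.
type Point struct {
	Features []float64
	Label    string
}

// euclidean returns the Euclidean distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// predict returns the majority label among the k labeled points closest to query.
// All distances are computed at call time; nothing is trained in advance.
func predict(data []Point, query []float64, k int) string {
	type neighbor struct {
		dist  float64
		label string
	}
	neighbors := make([]neighbor, 0, len(data))
	for _, p := range data {
		neighbors = append(neighbors, neighbor{euclidean(p.Features, query), p.Label})
	}
	// Sort all candidates from nearest to farthest.
	sort.SliceStable(neighbors, func(i, j int) bool {
		return neighbors[i].dist < neighbors[j].dist
	})
	if k > len(neighbors) {
		k = len(neighbors)
	}
	// Count the votes of the k nearest neighbors.
	votes := make(map[string]int)
	for _, n := range neighbors[:k] {
		votes[n.label]++
	}
	// Pick the label with the most votes; ties go to the smallest label.
	best, bestCount := "", 0
	for label, count := range votes {
		if count > bestCount || (count == bestCount && label < best) {
			best, bestCount = label, count
		}
	}
	return best
}

func main() {
	// Made-up toy data: two well-separated clusters.
	data := []Point{
		{[]float64{1.0, 1.1}, "a"},
		{[]float64{1.2, 0.9}, "a"},
		{[]float64{0.8, 1.0}, "a"},
		{[]float64{5.0, 5.2}, "b"},
		{[]float64{5.1, 4.9}, "b"},
		{[]float64{4.8, 5.0}, "b"},
	}
	fmt.Println(predict(data, []float64{1.1, 1.0}, 3))
	fmt.Println(predict(data, []float64{5.0, 5.0}, 3))
}
```

Sorting every distance is the simplest approach and is fine for small data sets; for larger ones, a partial selection or a spatial index would likely be a better fit.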

How the kNN algorithm works
k Nearest Neighbors - Classification

Code Review

Profile the data
Train and use cross-validation to validate a kNN model
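As a rough idea of what cross-validation does, here is a standard-library sketch; it is not the sample program linked above. The `Sample` type, `nearestLabel` (a 1-nearest-neighbor classifier used only to keep the example short), `crossValidate`, and the toy data are all hypothetical. The folds are assigned by index modulo the fold count, which assumes the data is already in a reasonable order; a real run would normally shuffle first.

```go
package main

import (
	"fmt"
	"math"
)

// Sample is a hypothetical labeled observation.
type Sample struct {
	X []float64
	Y string
}

// nearestLabel returns the label of the single closest training sample
// (1-nearest-neighbor, using squared Euclidean distance).
func nearestLabel(train []Sample, x []float64) string {
	best, bestDist := "", math.Inf(1)
	for _, s := range train {
		var d float64
		for i := range x {
			diff := s.X[i] - x[i]
			d += diff * diff
		}
		if d < bestDist {
			best, bestDist = s.Y, d
		}
	}
	return best
}

// crossValidate holds out each fold in turn, classifies the held-out samples
// using the remaining samples, and returns the accuracy for each fold.
func crossValidate(data []Sample, folds int, classify func([]Sample, []float64) string) []float64 {
	accuracies := make([]float64, 0, folds)
	for f := 0; f < folds; f++ {
		var train, test []Sample
		for i, s := range data {
			if i%folds == f {
				test = append(test, s)
			} else {
				train = append(train, s)
			}
		}
		if len(test) == 0 {
			continue // more folds than samples
		}
		correct := 0
		for _, s := range test {
			if classify(train, s.X) == s.Y {
				correct++
			}
		}
		accuracies = append(accuracies, float64(correct)/float64(len(test)))
	}
	return accuracies
}

func main() {
	// Made-up toy data: two well-separated clusters.
	data := []Sample{
		{[]float64{1.0, 1.1}, "a"},
		{[]float64{1.2, 0.9}, "a"},
		{[]float64{0.8, 1.0}, "a"},
		{[]float64{5.0, 5.2}, "b"},
		{[]float64{5.1, 4.9}, "b"},
		{[]float64{4.8, 5.0}, "b"},
	}
	for i, acc := range crossValidate(data, 3, nearestLabel) {
		fmt.Printf("fold %d accuracy: %.2f\n", i, acc)
	}
}
```

Averaging the per-fold accuracies gives a single score that can be compared across model settings.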

Exercises

Exercise 1

Find an optimal k value for the above model of iris species. That is, search over various k values and evaluate the predictions.

Template | Answer
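One possible shape for such a search is sketched below. It is not the linked answer and does not use the iris data: the `Obs` type, the functions `knnLabel`, `looAccuracy`, and `searchK`, and the one-dimensional toy values are all assumptions for illustration. Each candidate k is scored with leave-one-out accuracy, and the first candidate with the highest score is kept.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Obs is a hypothetical observation with a single numeric feature.
type Obs struct {
	X     float64
	Label string
}

// knnLabel predicts a label by majority vote among the k nearest observations.
func knnLabel(train []Obs, x float64, k int) string {
	sorted := append([]Obs(nil), train...) // copy so the caller's slice is not reordered
	sort.SliceStable(sorted, func(i, j int) bool {
		return math.Abs(sorted[i].X-x) < math.Abs(sorted[j].X-x)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	votes := map[string]int{}
	for _, o := range sorted[:k] {
		votes[o.Label]++
	}
	// Ties go to the alphabetically smallest label (arbitrary but deterministic).
	best, bestCount := "", 0
	for label, c := range votes {
		if c > bestCount || (c == bestCount && label < best) {
			best, bestCount = label, c
		}
	}
	return best
}

// looAccuracy predicts each observation from all the others and
// returns the fraction predicted correctly.
func looAccuracy(data []Obs, k int) float64 {
	correct := 0
	for i, o := range data {
		train := make([]Obs, 0, len(data)-1)
		train = append(train, data[:i]...)
		train = append(train, data[i+1:]...)
		if knnLabel(train, o.X, k) == o.Label {
			correct++
		}
	}
	return float64(correct) / float64(len(data))
}

// searchK returns the candidate k with the highest leave-one-out accuracy.
// When several candidates tie, the earliest one in the list wins.
func searchK(data []Obs, candidates []int) (int, float64) {
	best, bestAcc := 0, -1.0
	for _, k := range candidates {
		if acc := looAccuracy(data, k); acc > bestAcc {
			best, bestAcc = k, acc
		}
	}
	return best, bestAcc
}

func main() {
	// Made-up toy data, not the iris data set.
	data := []Obs{
		{1.0, "a"}, {1.2, "a"}, {1.4, "a"},
		{5.0, "b"}, {5.2, "b"}, {5.4, "b"},
	}
	k, acc := searchK(data, []int{1, 3, 5})
	fmt.Printf("best k = %d with accuracy %.2f\n", k, acc)
}
```

On this toy data k = 5 does poorly because, once one observation is held out, its own class is always in the minority among the remaining five. That illustrates why k needs to be evaluated rather than guessed.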


All material is licensed under the Apache License Version 2.0, January 2004.

Directories

Path Synopsis
Sample program to profile our data set.
Sample program to train and validate a kNN model with cross validation.
exercises
exercise1
Program for finding an optimal k value for a k nearest neighbors model.
template1
Template program for finding an optimal k value for a k nearest neighbors model.
