A lazy learner that classifies new instances by majority vote of their k closest training examples.
Nearest Neighbor Pattern Classification
T.M. Cover & P.E. Hart · IEEE Transactions on Information Theory, Vol. 13 · 1967
View paper →IEEE Xplore — institutional access may be required
Algorithm
k-NN is a lazy learner — it stores the entire training set and classifies a new instance by finding the k closest training examples (by distance) and taking a majority vote of their class labels. There is no training phase; all computation happens at query time.
Algorithm: k-NN Classify
for each training instance xi: compute d(x,xi)
Sort by distance; select the k nearest neighbours Nk(x)
returnargmaxc#{c∈Nk(x)} (majority vote)
Cover & Hart prove that the 1-NN error rate asymptotically cannot exceed twice the Bayes error rate — making k-NN a theoretical lower bound for non-parametric classification.
Distance Metric
Numeric attributes use normalized Euclidean distance — each attribute is scaled to [0, 1] by its training-set min/max range so that no single attribute dominates. Nominal attributes use the overlap metric (0 if equal, 1 otherwise). Missing values contribute a worst-case penalty of 1.
Four implementation steps map directly to the paper's assumptions:
1
Normalize so attribute scale does not dominate distance
// buildNorm() precomputes per-attribute min/max over the training set
const range = norm.maxs[i] - norm.mins[i] || 1;
sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean contribution
2
Mixed-attribute distance — numeric, nominal, and missing values
if (attributes[i].type === 'numeric') {
sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean
} else {
sum += a[i] === b[i] ? 0 : 1; // overlap metric for nominal
}
// a[i] === null → sum += 1 (maximum possible distance contribution)
3
Lazy model — the dataset IS the model (no training phase)
export function train(ds) {
// No learning happens here. train() captures the dataset and
// precomputes normalization bounds — O(n·d) once, not per query.
return { instances: ds.instances, attributes: ds.attributes,
classIndex: ds.classIndex,
norm: buildNorm(ds.instances, ds.attributes, ds.classIndex) };
}
4
k-majority vote at classify time
export function classify(model, instance, k = 1) {
const dists = instances.map(tr => ({ d: dist(tr, instance, ...), c: tr[classIndex] }));
dists.sort((a, b) => a.d - b.d); // sort ascending by distance
const votes = {};
for (const { c } of dists.slice(0, k)) // tally top-k class labels
votes[c] = (votes[c] || 0) + 1;
return Object.entries(votes)
.sort((a, b) => b[1] - a[1])[0][0]; // return plurality class
}
Theory
Theorem 1.(Cover & Hart, 1967) Let R∗ be the Bayes error rate and R1 the 1-NN error rate. As the training set size n→∞:
R∗≤R1≤2R∗(1−R∗)≤2R∗
The upper bound shows 1-NN can never be worse than twice the Bayes error — making it a theoretical lower bound on classification difficulty for any non-parametric method.
Corollary 1.For odd k with k→∞ and k/n→0 as n→∞, the k-NN error rate converges to the Bayes error rate R∗.
Complexity
Complexity
Training
O(n⋅d)— store dataset, precompute per-attribute min/max
Query
O(n⋅d)— full distance scan; no index structure
Space
O(n⋅d)— entire training set kept in memory
k-NN pays no cost at training time — all work happens at query time. For large datasets, approximate nearest-neighbour structures (kd-trees, ball trees) reduce query cost toO(dlogn) on average, but are not implemented here.
Parameters
k
Number of neighbours. Default 3. Odd values avoid ties on binary classification tasks.
Evaluation
The Explorer supports leave-one-out cross-validation and a percentage hold-out split. Reported metrics include overall accuracy, confusion matrix, and per-class precision, recall, and F1 score.
k-NN is sensitive to irrelevant attributes and non-normalised data. Try the Preprocess tab first to check attribute distributions.