k-Nearest Neighbor

A lazy learner that classifies new instances by majority vote of their k closest training examples.

Nearest Neighbor Pattern Classification

T.M. Cover & P.E. Hart · IEEE Transactions on Information Theory, Vol. 13 · 1967

View paper →IEEE Xplore — institutional access may be required

Algorithm


k-NN is a lazy learner — it stores the entire training set and classifies a new instance by finding the k closest training examples (by distance) and taking a majority vote of their class labels. There is no training phase; all computation happens at query time.

Algorithm: k-NN Classify

for each training instance xi\mathbf{x}_i: compute d(x,xi)d(\mathbf{x},\, \mathbf{x}_i)
Sort by distance; select the kk nearest neighbours Nk(x)\mathcal{N}_k(\mathbf{x})
return argmaxc  #{cNk(x)}\arg\max_c \;\#\{c \in \mathcal{N}_k(\mathbf{x})\} (majority vote)

Cover & Hart prove that the 1-NN error rate asymptotically cannot exceed twice the Bayes error rate — making k-NN a theoretical lower bound for non-parametric classification.

Distance Metric


Numeric attributes use normalized Euclidean distance — each attribute is scaled to [0, 1] by its training-set min/max range so that no single attribute dominates. Nominal attributes use the overlap metric (0 if equal, 1 otherwise). Missing values contribute a worst-case penalty of 1.

d(a,b)=idi(ai,bi)2d(\mathbf{a},\, \mathbf{b}) = \sqrt{\sum_{i} d_i(a_i, b_i)^2}

Per-attribute distance dᵢ

numeric: di=aibirange(i)d_i = \dfrac{a_i - b_i}{\operatorname{range}(i)} (normalised Euclidean)
nominal: di=0 if ai=bi,  else 1d_i = 0 \text{ if } a_i = b_i,\; \text{else } 1 (overlap metric)
missing: di=1d_i = 1 (maximum penalty)

Theory → Code


Four implementation steps map directly to the paper's assumptions:

1

Normalize so attribute scale does not dominate distance

// buildNorm() precomputes per-attribute min/max over the training set const range = norm.maxs[i] - norm.mins[i] || 1; sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean contribution

2

Mixed-attribute distance — numeric, nominal, and missing values

if (attributes[i].type === 'numeric') { sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean } else { sum += a[i] === b[i] ? 0 : 1; // overlap metric for nominal } // a[i] === null → sum += 1 (maximum possible distance contribution)

3

Lazy model — the dataset IS the model (no training phase)

export function train(ds) { // No learning happens here. train() captures the dataset and // precomputes normalization bounds — O(n·d) once, not per query. return { instances: ds.instances, attributes: ds.attributes, classIndex: ds.classIndex, norm: buildNorm(ds.instances, ds.attributes, ds.classIndex) }; }

4

k-majority vote at classify time

export function classify(model, instance, k = 1) { const dists = instances.map(tr => ({ d: dist(tr, instance, ...), c: tr[classIndex] })); dists.sort((a, b) => a.d - b.d); // sort ascending by distance const votes = {}; for (const { c } of dists.slice(0, k)) // tally top-k class labels votes[c] = (votes[c] || 0) + 1; return Object.entries(votes) .sort((a, b) => b[1] - a[1])[0][0]; // return plurality class }

Theory


Theorem 1.(Cover & Hart, 1967) Let RR^* be the Bayes error rate and R1R_1 the 1-NN error rate. As the training set size nn \to \infty:
R    R1    2R ⁣(1R)    2RR^* \;\le\; R_1 \;\le\; 2R^*\!\left(1 - R^*\right) \;\le\; 2R^*

The upper bound shows 1-NN can never be worse than twice the Bayes error — making it a theoretical lower bound on classification difficulty for any non-parametric method.

Corollary 1.For odd kk with kk \to \infty and k/n0k/n \to 0 as nn \to \infty, the k-NN error rate converges to the Bayes error rate RR^*.

Complexity


Complexity

Training

O(nd)O(n \cdot d)store dataset, precompute per-attribute min/max

Query

O(nd)O(n \cdot d)full distance scan; no index structure

Space

O(nd)O(n \cdot d)entire training set kept in memory

k-NN pays no cost at training time — all work happens at query time. For large datasets, approximate nearest-neighbour structures (kd-trees, ball trees) reduce query cost toO(dlogn)O(d \log n) on average, but are not implemented here.

Parameters


k
Number of neighbours. Default 3. Odd values avoid ties on binary classification tasks.

Evaluation


The Explorer supports leave-one-out cross-validation and a percentage hold-out split. Reported metrics include overall accuracy, confusion matrix, and per-class precision, recall, and F1 score.

machinelearning.js.org · open source · MIT · Marin's Web Site