MachineLearning.js

k-NN is a lazy learner — it stores the entire training set and classifies a new instance by finding the k closest training examples (by distance) and taking a majority vote of their class labels. There is no training phase; all computation happens at query time.

Algorithm: k-NN Classify

for each training instance

\mathbf{x}_i

: compute

d(\mathbf{x},\, \mathbf{x}_i)

Sort by distance; select the

k

nearest neighbours

\mathcal{N}_k(\mathbf{x})

return

\arg\max_c \;\#\{c \in \mathcal{N}_k(\mathbf{x})\}

(majority vote)

Cover & Hart prove that the 1-NN error rate asymptotically cannot exceed twice the Bayes error rate — making k-NN a theoretical lower bound for non-parametric classification.

Distance Metric

Numeric attributes use normalized Euclidean distance — each attribute is scaled to [0, 1] by its training-set min/max range so that no single attribute dominates. Nominal attributes use the overlap metric (0 if equal, 1 otherwise). Missing values contribute a worst-case penalty of 1.

d(\mathbf{a},\, \mathbf{b}) = \sqrt{\sum_{i} d_i(a_i, b_i)^2}

Per-attribute distance dᵢ

numeric:

d_i = \dfrac{a_i - b_i}{\operatorname{range}(i)}

(normalised Euclidean)

nominal:

d_i = 0 \text{ if } a_i = b_i,\; \text{else } 1

(overlap metric)

missing:

d_i = 1

(maximum penalty)

Theory → Code

Four implementation steps map directly to the paper's assumptions:

Normalize so attribute scale does not dominate distance

// buildNorm() precomputes per-attribute min/max over the training set const range = norm.maxs[i] - norm.mins[i] || 1; sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean contribution

Mixed-attribute distance — numeric, nominal, and missing values

if (attributes[i].type === 'numeric') { sum += ((a[i] - b[i]) / range) ** 2; // normalized Euclidean } else { sum += a[i] === b[i] ? 0 : 1; // overlap metric for nominal } // a[i] === null → sum += 1 (maximum possible distance contribution)

Lazy model — the dataset IS the model (no training phase)

export function train(ds) { // No learning happens here. train() captures the dataset and // precomputes normalization bounds — O(n·d) once, not per query. return { instances: ds.instances, attributes: ds.attributes, classIndex: ds.classIndex, norm: buildNorm(ds.instances, ds.attributes, ds.classIndex) }; }

k-majority vote at classify time

export function classify(model, instance, k = 1) { const dists = instances.map(tr => ({ d: dist(tr, instance, ...), c: tr[classIndex] })); dists.sort((a, b) => a.d - b.d); // sort ascending by distance const votes = {}; for (const { c } of dists.slice(0, k)) // tally top-k class labels votes[c] = (votes[c] || 0) + 1; return Object.entries(votes) .sort((a, b) => b[1] - a[1])[0][0]; // return plurality class }

Theory

Theorem 1.(Cover & Hart, 1967) Let

R^*

be the Bayes error rate and

R_1

the 1-NN error rate. As the training set size

n \to \infty

R^* \;\le\; R_1 \;\le\; 2R^*\!\left(1 - R^*\right) \;\le\; 2R^*

The upper bound shows 1-NN can never be worse than twice the Bayes error — making it a theoretical lower bound on classification difficulty for any non-parametric method.

Corollary 1.For odd

k

with

k \to \infty

and

k/n \to 0

n \to \infty

, the k-NN error rate converges to the Bayes error rate

R^*

Complexity

Training

O(n \cdot d)

— store dataset, precompute per-attribute min/max

Query

O(n \cdot d)

— full distance scan; no index structure

Space

O(n \cdot d)

— entire training set kept in memory

k-NN pays no cost at training time — all work happens at query time. For large datasets, approximate nearest-neighbour structures (kd-trees, ball trees) reduce query cost to $O(d \log n)$ on average, but are not implemented here.

Parameters

Number of neighbours. Default 3. Odd values avoid ties on binary classification tasks.

Evaluation

The Explorer supports leave-one-out cross-validation and a percentage hold-out split. Reported metrics include overall accuracy, confusion matrix, and per-class precision, recall, and F1 score.

k-NN is sensitive to irrelevant attributes and non-normalised data. Try the Preprocess tab first to check attribute distributions.

On this page

Original Paper Algorithm Distance Metric Theory → Code Theory Complexity Parameters Evaluation

Overview Naïve Bayes

machinelearning.js.org · open source · MIT · Marin's Web Site