MachineLearning.js

Recursively splits the training data on the attribute that gives the highest information gain, building a tree that can be traversed at classify time.

Induction of Decision Trees

J.R. Quinlan · Machine Learning, Vol. 1 · 1986


Algorithm

ID3 (Iterative Dichotomiser 3) builds a decision tree top-down by greedily selecting, at each node, the attribute that maximises information gain — the reduction in entropy when the data is partitioned by that attribute.

Algorithm: ID3 BuildTree

BuildTree($S$, $A$):
    if all instances in $S$ share one class → return Leaf(class)
    if $A = \emptyset$ or max depth → return Leaf(majority class in $S$)
    $a^* \leftarrow \arg\max_{a \in A}\, IG(S, a)$
    create node splitting on $a^*$
    for each value $v$ of $a^*$:
        $S_v \leftarrow \{\,\mathbf{x} \in S : \mathbf{x}[a^*] = v\,\}$
        attach BuildTree($S_v$, $A \setminus \{a^*\}$) for branch $v$
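The pseudocode above can be sketched in JavaScript for categorical attributes only. This is a minimal illustration, not the library's source: the helper names (`entropy`, `partition`, `majorityClass`, `informationGain`, `buildTree`), the row layout (arrays with the class label at `classIdx`) and the `type: 'categorical'` tag are assumptions made for the example. It also assumes `rows` is non-empty.

```javascript
// Shannon entropy (bits) of the class labels stored at classIdx.
function entropy(rows, classIdx) {
  const counts = new Map();
  for (const r of rows) counts.set(r[classIdx], (counts.get(r[classIdx]) ?? 0) + 1);
  let h = 0;
  for (const k of counts.values()) {
    const p = k / rows.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Group rows by their value for one attribute (Map: value -> rows).
function partition(rows, attrIdx) {
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r[attrIdx])) groups.set(r[attrIdx], []);
    groups.get(r[attrIdx]).push(r);
  }
  return groups;
}

// Most frequent class label; ties go to the label that reached the top count first.
function majorityClass(rows, classIdx) {
  const counts = new Map();
  let best = null;
  for (const r of rows) {
    const c = r[classIdx];
    counts.set(c, (counts.get(c) ?? 0) + 1);
    if (best === null || counts.get(c) > counts.get(best)) best = c;
  }
  return best;
}

// IG(S, a) = H(S) - sum_v |S_v|/|S| * H(S_v)
function informationGain(rows, attrIdx, classIdx) {
  let remainder = 0;
  for (const subset of partition(rows, attrIdx).values()) {
    remainder += (subset.length / rows.length) * entropy(subset, classIdx);
  }
  return entropy(rows, classIdx) - remainder;
}

// attrs: indices of the attributes still available for splitting.
function buildTree(rows, attrs, classIdx, depth = 0, maxDepth = 20) {
  const majority = majorityClass(rows, classIdx);
  const pure = rows.every((r) => r[classIdx] === rows[0][classIdx]);
  if (pure || attrs.length === 0 || depth >= maxDepth) {
    return { leaf: true, value: majority };
  }
  // Greedy step: pick the attribute with the highest information gain.
  let best = attrs[0];
  let bestGain = -Infinity;
  for (const a of attrs) {
    const g = informationGain(rows, a, classIdx);
    if (g > bestGain) { bestGain = g; best = a; }
  }
  const remaining = attrs.filter((a) => a !== best);
  const children = {};
  for (const [value, subset] of partition(rows, best)) {
    children[value] = buildTree(subset, remaining, classIdx, depth + 1, maxDepth);
  }
  return { leaf: false, type: 'categorical', attrIdx: best, fallback: majority, children };
}

// Toy data: [outlook, temperature, class]
const data = [
  ['sunny', 'hot', 'no'],
  ['sunny', 'mild', 'no'],
  ['rainy', 'hot', 'yes'],
  ['rainy', 'mild', 'yes'],
];
const tree = buildTree(data, [0, 1], 2);
console.log(tree.attrIdx);              // 0 (outlook separates the classes perfectly)
console.log(tree.children.sunny.value); // 'no'
```

Each recursive call receives the attribute list without the one just used, which is the $A \setminus \{a^*\}$ step of the pseudocode.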

Information Gain

IG(S,\, a) = H(S) - \sum_{v} \frac{|S_v|}{|S|} \cdot H(S_v)

Shannon Entropy

H(S) = -\sum_{c} p_c \log_2 p_c
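The two formulas can be checked on a four-row toy set. The sketch below is illustrative only: `entropy` here takes a plain array of labels, which may differ from the signature used in the library, and `informationGain` and the data are made up for the example.

```javascript
// Shannon entropy (bits) of an array of class labels.
function entropy(labels) {
  const counts = new Map();
  for (const c of labels) counts.set(c, (counts.get(c) ?? 0) + 1);
  let h = 0;
  for (const k of counts.values()) {
    const p = k / labels.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Information gain of splitting rows on the categorical attribute at attrIdx.
// Each row is an array; the class label sits at classIdx.
function informationGain(rows, attrIdx, classIdx) {
  const groups = new Map();
  for (const row of rows) {
    const v = row[attrIdx];
    if (!groups.has(v)) groups.set(v, []);
    groups.get(v).push(row[classIdx]);
  }
  let remainder = 0; // weighted entropy of the subsets S_v
  for (const labels of groups.values()) {
    remainder += (labels.length / rows.length) * entropy(labels);
  }
  return entropy(rows.map((r) => r[classIdx])) - remainder;
}

// Toy data: [a, b, class]
const rows = [
  ['x', 'p', 'yes'],
  ['x', 'q', 'yes'],
  ['y', 'p', 'no'],
  ['y', 'q', 'no'],
];
console.log(informationGain(rows, 0, 2)); // 1: attribute a separates the classes
console.log(informationGain(rows, 1, 2)); // 0: attribute b tells us nothing
```

With two equally likely classes $H(S) = 1$ bit. Splitting on attribute a leaves two pure subsets, so the full bit is gained; splitting on b leaves two subsets as mixed as the parent, so the gain is zero.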

Numeric attributes are handled with binary splits — the algorithm tries every midpoint between adjacent sorted values and picks the threshold with the highest information gain.
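A self-contained sketch of that threshold search follows. The function name `bestThreshold`, the label-array form of `entropy` and the step that skips equal neighbouring values are assumptions for illustration; the library's own code may differ. Subsets are re-sliced on every iteration for clarity rather than speed.

```javascript
// Shannon entropy (bits) of an array of class labels.
function entropy(labels) {
  const counts = new Map();
  for (const c of labels) counts.set(c, (counts.get(c) ?? 0) + 1);
  let h = 0;
  for (const k of counts.values()) {
    const p = k / labels.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Try the midpoint between each pair of adjacent distinct sorted values and
// return the threshold with the highest information gain.
function bestThreshold(rows, attrIdx, classIdx) {
  const sorted = [...rows].sort((a, b) => a[attrIdx] - b[attrIdx]);
  const labels = sorted.map((r) => r[classIdx]);
  const parentH = entropy(labels);
  const n = sorted.length;
  let bestGain = -Infinity;
  let best = null;
  for (let i = 0; i < n - 1; i++) {
    // Assumption: equal neighbours give no usable midpoint, so skip them.
    if (sorted[i][attrIdx] === sorted[i + 1][attrIdx]) continue;
    const t = (sorted[i][attrIdx] + sorted[i + 1][attrIdx]) / 2;
    const left = labels.slice(0, i + 1); // instances with value <= t
    const right = labels.slice(i + 1);   // instances with value > t
    const gain = parentH
      - (left.length / n) * entropy(left)
      - (right.length / n) * entropy(right);
    if (gain > bestGain) { bestGain = gain; best = t; }
  }
  // threshold stays null when every value is identical.
  return { threshold: best, gain: bestGain };
}

// Toy data: [value, class]
const samples = [[1, 'a'], [2, 'a'], [8, 'b'], [9, 'b']];
console.log(bestThreshold(samples, 0, 1)); // { threshold: 5, gain: 1 }
```

The candidate thresholds here are 1.5, 5 and 8.5; only 5 yields two pure subsets.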

Theory → Code

Four implementation steps map to Quinlan's paper:

Entropy — measure impurity of a set of instances

Best split — find the threshold/value with maximum information gain

// Numeric: try every midpoint between adjacent sorted values
for (let i = 0; i < sorted.length - 1; i++) {
  const t = (sorted[i][attrIdx] + sorted[i+1][attrIdx]) / 2;
  const gain = parentH
    - (left.length / n) * entropy(left, classIndex)
    - (right.length / n) * entropy(right, classIndex);
  if (gain > bestGain) { bestGain = gain; bestThreshold = t; }
}

Recursive tree building — greedy top-down splitting

Classify — traverse the tree by following branches

export function classify(model, instance) {
  let node = model.tree;
  while (!node.leaf) {
    const val = instance[node.attrIdx];
    if (node.type === 'numeric') {
      node = (val !== null && val <= node.threshold) ? node.children.lte : node.children.gt;
    } else {
      node = node.children[val] ?? { leaf: true, value: node.fallback };
    }
  }
  return node.value;
}
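To see the traversal in action, the sketch below repeats the classify logic (without the module export, so it runs standalone) and applies it to a hand-built model. The node fields are the ones the excerpt reads (`leaf`, `value`, `type`, `attrIdx`, `threshold`, `children.lte`, `children.gt`, `fallback`); the `'categorical'` tag and the model itself are invented for illustration.

```javascript
// Same traversal as the classify excerpt above, minus the export keyword.
function classify(model, instance) {
  let node = model.tree;
  while (!node.leaf) {
    const val = instance[node.attrIdx];
    if (node.type === 'numeric') {
      node = (val !== null && val <= node.threshold) ? node.children.lte : node.children.gt;
    } else {
      node = node.children[val] ?? { leaf: true, value: node.fallback };
    }
  }
  return node.value;
}

// Hypothetical model: attribute 0 is numeric, attribute 1 is categorical.
const model = {
  tree: {
    leaf: false, type: 'numeric', attrIdx: 0, threshold: 5,
    children: {
      lte: { leaf: true, value: 'small' },
      gt: {
        leaf: false, type: 'categorical', attrIdx: 1, fallback: 'large',
        children: {
          red: { leaf: true, value: 'large-red' },
          blue: { leaf: true, value: 'large-blue' },
        },
      },
    },
  },
};

console.log(classify(model, [3, 'red']));    // 'small'
console.log(classify(model, [9, 'blue']));   // 'large-blue'
console.log(classify(model, [9, 'green']));  // 'large' (unseen value uses the fallback)
console.log(classify(model, [null, 'red'])); // 'large-red' (null takes the gt branch)
```

Two behaviours are worth noting: a categorical value never seen in training ends at a synthetic leaf carrying the node's fallback, and a null numeric value always follows the `gt` branch.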

Theory

Lemma 1. Information gain is non-negative: $IG(S, a) \ge 0$ for all attributes $a$.

Proof sketch. Entropy $H$ is concave on probability distributions, and the class distribution of $S$ is the average of the class distributions of the subsets $S_v$, weighted by $|S_v|/|S|$. By Jensen's inequality applied to the concave function $H$,

H(S) \;\ge\; \sum_{v} \frac{|S_v|}{|S|} H(S_v)

Therefore $IG(S, a) = H(S) - \sum_v \frac{|S_v|}{|S|}H(S_v) \ge 0$. □

Lemma 2. With majority-class leaves, a split never increases training error, and by Lemma 1 it never increases the weighted entropy of the partition. ID3 provides no bound on generalisation error, however: the tree will overfit if grown without depth or leaf-size limits.

Complexity

Training: $O(n \cdot d \cdot \log n \cdot h)$, with $n$ instances, $d$ attributes and max depth $h$

Query: $O(h)$, a single root-to-leaf traversal

Space: $O(n)$, tree nodes proportional to training set

For categorical attributes, scoring all $d$ attributes at a node costs $O(d)$ per instance reaching that node. The nodes at any one depth partition the training set, so each level costs $O(n \cdot d)$ in total. Numeric attributes add an $O(n \log n)$ sort per attribute, raising the cost per level to $O(n \cdot d \cdot \log n)$. Over $h$ levels, total training is $O(n \cdot d \cdot \log n \cdot h)$.

Parameters

Max depth

Maximum tree depth (default 20). Shallower trees are more interpretable but may underfit.

Notes

Overfitting: ID3 grows trees until leaves are pure, which often overfits noisy training data. The max-depth limit is the only regularisation here. C4.5 (Quinlan's successor) adds post-pruning to address this.

Bias toward high-cardinality attributes: Information gain favours attributes with many values. C4.5 corrects this with Gain Ratio instead of raw gain.
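The bias can be seen with a unique-per-row ID column. The sketch below is not part of this implementation; it only illustrates the C4.5 idea of dividing the gain by the split information, the entropy of the subset sizes. The function names and data are hypothetical.

```javascript
// Entropy (bits) of a list of counts, e.g. class counts or subset sizes.
function entropyOfCounts(counts) {
  const total = counts.reduce((s, k) => s + k, 0);
  return counts.reduce(
    (h, k) => (k === 0 ? h : h - (k / total) * Math.log2(k / total)),
    0,
  );
}

// Count how often each distinct value occurs.
function countBy(values) {
  const m = new Map();
  for (const v of values) m.set(v, (m.get(v) ?? 0) + 1);
  return [...m.values()];
}

// Returns raw information gain and the gain ratio for one categorical attribute.
function gainAndRatio(rows, attrIdx, classIdx) {
  const n = rows.length;
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r[attrIdx])) groups.set(r[attrIdx], []);
    groups.get(r[attrIdx]).push(r[classIdx]);
  }
  let remainder = 0;
  for (const labels of groups.values()) {
    remainder += (labels.length / n) * entropyOfCounts(countBy(labels));
  }
  const gain = entropyOfCounts(countBy(rows.map((r) => r[classIdx]))) - remainder;
  // Split information: entropy of the subset sizes themselves.
  const splitInfo = entropyOfCounts([...groups.values()].map((g) => g.length));
  // A single-valued attribute has splitInfo 0; report ratio 0 to avoid dividing by zero.
  return { gain, ratio: splitInfo === 0 ? 0 : gain / splitInfo };
}

// Columns: [id, colour, class]. The id column is unique per row.
const records = [
  [1, 'red', 'yes'],
  [2, 'red', 'yes'],
  [3, 'blue', 'no'],
  [4, 'blue', 'no'],
];
console.log(gainAndRatio(records, 0, 2)); // { gain: 1, ratio: 0.5 }
console.log(gainAndRatio(records, 1, 2)); // { gain: 1, ratio: 1 }
```

Both attributes reach the maximum raw gain of 1 bit, so information gain cannot tell the useless ID column from the genuinely predictive colour. The ratio halves the ID column's score because splitting four rows four ways carries 2 bits of split information.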

Decision trees are highly interpretable — you can follow the path from root to leaf to understand exactly why an instance was classified as it was.
