MachineLearning.js

Naïve Bayes applies Bayes' theorem with the assumption that all attributes are conditionally independent given the class. Despite this rarely being true in practice, it often performs surprisingly well — especially on text and high-dimensional data.

Bayes Rule (conditional independence)

P(c \mid \mathbf{x}) \propto P(c)\prod_{i} P(x_i \mid c)

Numeric — Gaussian likelihood (John & Langley §3)

P(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Nominal — Laplace-smoothed counts

P(x_i = v \mid c) = \frac{\#(v,\,c) + 1}{\#(c) + |V_i|}

John & Langley's key contribution is extending the classic discrete Naïve Bayes model to handle continuous attributes with Gaussian density estimation, making it practical for real-world datasets with mixed attribute types.

Theory → Code

Four steps map John & Langley's formulation to the implementation:

Compute class priors P(class) from training frequencies

for (const cv of classValues) { const subset = instances.filter(i => i[classIndex] === cv); priors[cv] = subset.length / instances.length; // MLE prior }

Gaussian likelihood for numeric attributes (John & Langley §3)

// Estimate μ and σ² per (class, attribute) pair const mean = vals.reduce((s, v) => s + v, 0) / (vals.length || 1); const variance = vals.reduce((s, v) => s + (v - mean) ** 2, 0) / (vals.length || 1) || 1e-9; // floor prevents div-by-zero likelihoods[cv][i] = { mean, variance }; // At classify time — log-form avoids floating-point underflow: function gaussianLog(x, mean, variance) { return -0.5 * Math.log(2 * Math.PI * variance) - ((x - mean) ** 2) / (2 * variance); }

Laplace-smoothed counts for nominal attributes

// Seed every nominal value with count=1 before observing any data. // Prevents P(unseen value | class) = 0 from wiping out the posterior. const counts = {}; for (const v of attrVals) counts[v] = 1; // Laplace prior for (const v of vals) counts[v] = (counts[v] || 1) + 1; likelihoods[cv][i] = { counts, total: vals.length + attrVals.length }; // At classify time: logP += Math.log((lk.counts[instance[i]] || 1) / lk.total);

Classify in log-probability space — prevents floating-point underflow

let logP = Math.log(priors[cv]); // log P(class) for (let i = 0; i < attributes.length; i++) { if (attributes[i].type === 'numeric') logP += gaussianLog(instance[i], lk.mean, lk.variance); else logP += Math.log((lk.counts[instance[i]] || 1) / lk.total); } if (logP > bestScore) { bestScore = logP; best = cv; } // argmax

Theory

Theorem 1.(Optimality under independence) When the class-conditional distributions are truly independent, Naïve Bayes achieves Bayes-optimal classification. Formally, if

P(\mathbf{x} \mid c) = \prod_i P(x_i \mid c)

exactly, then the MAP classifier

\arg\max_c P(c \mid \mathbf{x})

is the Bayes classifier.

Theorem 2.(Domingos & Pazzani, 1997) Even when the independence assumption is violated, Naïve Bayes achieves the Bayes-optimal decision boundary under milder conditions than those required for correct probability estimates — it needs only the correct ranking of class posteriors, not their calibrated values.

Complexity

Training

O(n \cdot d)

— single pass: compute priors, means, variances, counts

Query

O(c \cdot d)

— score each of c classes over d attributes

Space

O(c \cdot d)

— one Gaussian or count table per (class, attribute) pair

Notes

Laplace smoothing prevents zero-probability issues for nominal values not seen during training for a given class. Without it, a single unseen value drives the entire posterior to zero regardless of all other evidence.

The variance floor (1e-9) prevents division by zero in the Gaussian formula when all training instances in a class share exactly the same value for a numeric attribute.

Naïve Bayes trains in a single pass over the data — O(n·d) — making it very fast even on large datasets. It's a good baseline before trying more complex models.

On this page

Original Paper Algorithm Theory → Code Theory Complexity Notes

k-Nearest Neighbor Decision Tree (ID3)

machinelearning.js.org · open source · MIT · Marin's Web Site