Naïve Bayes
Applies Bayes' theorem under conditional independence: Gaussian for numeric attributes, Laplace-smoothed counts for nominal.
Estimating Continuous Distributions in Bayesian Classifiers
G.H. John & P. Langley · UAI-95 — 11th Conference on Uncertainty in Artificial Intelligence · 1995
Algorithm
Naïve Bayes applies Bayes' theorem under the assumption that all attributes are conditionally independent given the class. Although this assumption rarely holds in practice, the classifier often performs well, especially on text and other high-dimensional data.
Bayes Rule (conditional independence)
Numeric — Gaussian likelihood (John & Langley §3)
Nominal — Laplace-smoothed counts
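For reference, the three expressions named above take the following standard textbook forms. The notation is an assumption made here rather than taken from the page: c is a class, x_i is the value of attribute i, d is the number of attributes, μ and σ² are the per-class mean and variance of a numeric attribute, N_c is the number of training instances in class c, and k_i is the number of distinct values of nominal attribute i.

```latex
% Bayes rule under conditional independence
P(c \mid x_1, \dots, x_d) \;\propto\; P(c) \prod_{i=1}^{d} P(x_i \mid c)

% Gaussian likelihood for a numeric attribute
P(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}}
                \exp\!\left( -\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}} \right)

% Laplace-smoothed estimate for a nominal attribute
P(x_i = v \mid c) = \frac{\mathrm{count}(x_i = v,\, c) + 1}{N_c + k_i}
```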
John & Langley's paper addresses continuous attributes in Naïve Bayes through density estimation rather than discretization. The Gaussian form used here makes the model practical for real-world datasets with mixed attribute types.
Theory → Code
Four steps map John & Langley's formulation to the implementation:
1. Compute class priors P(class) from training frequencies
2. Gaussian likelihood for numeric attributes (John & Langley §3)
3. Laplace-smoothed counts for nominal attributes
4. Classify in log-probability space to prevent floating-point underflow
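The four steps can be sketched roughly as follows. This is a minimal illustration in Python, not the site's own implementation: the function names (`train`, `predict`, `log_gaussian`), the model layout, the use of population variance, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter

VARIANCE_FLOOR = 1e-9  # floor value from the notes; how it is applied here is an assumption


def train(rows, labels, numeric):
    """rows: list of attribute lists; numeric: set of column indices treated as numeric."""
    class_counts = Counter(labels)
    # Step 1: class priors from training frequencies.
    priors = {c: class_counts[c] / len(rows) for c in class_counts}

    n_attrs = len(rows[0])
    # Distinct values per nominal attribute, needed for the Laplace denominator.
    values = {j: {r[j] for r in rows} for j in range(n_attrs) if j not in numeric}

    gauss = {}   # (class, attribute) -> (mean, variance)
    counts = {}  # (class, attribute) -> Counter of nominal values
    for c in class_counts:
        members = [r for r, y in zip(rows, labels) if y == c]
        for j in range(n_attrs):
            col = [r[j] for r in members]
            if j in numeric:
                # Step 2: per-class mean and (population) variance for the Gaussian.
                mean = sum(col) / len(col)
                var = sum((x - mean) ** 2 for x in col) / len(col)
                gauss[(c, j)] = (mean, max(var, VARIANCE_FLOOR))
            else:
                # Step 3: raw counts; the +1 smoothing is applied at query time.
                counts[(c, j)] = Counter(col)
    return {"priors": priors, "class_counts": class_counts, "gauss": gauss,
            "counts": counts, "values": values, "numeric": numeric}


def log_gaussian(x, mean, var):
    """Log of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    # Step 4: sum log-probabilities instead of multiplying probabilities.
    scores = {}
    for c, prior in model["priors"].items():
        score = math.log(prior)
        for j, x in enumerate(row):
            if j in model["numeric"]:
                mean, var = model["gauss"][(c, j)]
                score += log_gaussian(x, mean, var)
            else:
                k = len(model["values"][j])
                smoothed = (model["counts"][(c, j)][x] + 1) / (model["class_counts"][c] + k)
                score += math.log(smoothed)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data (hypothetical): column 0 is numeric, column 1 is nominal.
rows = [[1.0, "a"], [1.2, "a"], [3.0, "b"], [3.3, "b"]]
labels = ["x", "x", "y", "y"]
model = train(rows, labels, numeric={0})
print(predict(model, [1.1, "a"]))  # x
print(predict(model, [3.1, "b"]))  # y
```

Storing raw counts and smoothing at query time keeps training a single pass over the data. Precomputing smoothed log-probabilities would be an equally reasonable design.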
Complexity
Training: single pass to compute priors, means, variances, and counts
Query: score each of c classes over d attributes
Space: one Gaussian or count table per (class, attribute) pair
Notes
Laplace smoothing prevents zero-probability issues for nominal values not seen during training for a given class. Without it, a single unseen value drives the entire posterior to zero regardless of all other evidence.
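A small numeric illustration of this effect, assuming add-one smoothing in its usual form. The helper `nominal_likelihood` and all of the numbers are hypothetical and chosen only to show the contrast.

```python
def nominal_likelihood(count, class_total, n_values, alpha=1):
    """P(value | class) with add-alpha smoothing; alpha=0 disables smoothing."""
    return (count + alpha) / (class_total + alpha * n_values)


# Hypothetical class with 10 training instances and a 3-valued nominal attribute.
# The query value never occurred in this class, so its count is 0.
prior = 0.5
other_evidence = 0.9 * 0.8  # likelihoods of two other attributes that favor the class

unsmoothed = prior * other_evidence * nominal_likelihood(0, 10, 3, alpha=0)
smoothed = prior * other_evidence * nominal_likelihood(0, 10, 3, alpha=1)

print(unsmoothed)  # 0.0, the strong evidence from the other attributes is wiped out
print(smoothed)    # small but non-zero, so the other attributes still count
```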
The variance floor (1e-9) prevents division by zero in the Gaussian formula when all training instances in a class share exactly the same value for a numeric attribute.
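The following sketch shows where such a floor might be applied. It assumes the floor is a simple `max` on the population variance, which the page does not confirm; `gaussian_params` and `log_gaussian` are hypothetical helper names.

```python
import math

VARIANCE_FLOOR = 1e-9  # value quoted in the note above


def gaussian_params(values, floor=VARIANCE_FLOOR):
    """Mean and floored population variance of one attribute within one class."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, max(var, floor)


def log_gaussian(x, mean, var):
    """Log of the Gaussian density; undefined for var == 0 without the floor."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


# Every training instance in this class shares the value 5.0, so the raw variance is 0.
mean, var = gaussian_params([5.0, 5.0, 5.0])
at_mean = log_gaussian(5.0, mean, var)   # finite, strongly favors the class
off_mean = log_gaussian(5.1, mean, var)  # finite, strongly penalizes the class
```

With the floor in place both scores stay finite. The resulting density is an extremely narrow spike around the shared value, so any query value that differs even slightly is heavily penalized.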
machinelearning.js.org · open source · MIT · Marin's Web Site