Classification
Regression
Clustering
Structure Learning
Support Vector Machine
Finds the maximum-margin hyperplane separating classes, with a soft-margin penalty for misclassifications — trained by stochastic gradient descent on hinge loss.
Support Vector Networks
C. Cortes & V. Vapnik · Machine Learning, Vol. 20 · 1995
Algorithm
A linear SVM finds the hyperplane w·x + b = 0 that separates classes with the largest possible margin (2/‖w‖). The soft-margin variant (Cortes & Vapnik) allows misclassifications via slack variables ξᵢ, controlled by the penalty parameter C.
Primal objective (soft-margin)
Hinge loss (equivalent unconstrained form)
SGD update per instance
Multi-class problems use one-vs-rest: one SVM per class. The class whose decision function w·x + b gives the highest score wins.
Theory → Code
1
Hinge loss — zero when correctly classified outside the margin
2
SGD with decaying learning rate per epoch
3
One-vs-rest — train one linear SVM per class
4
Classify — argmax of the raw margin scores
Theory
where is the margin, is the radius of the smallest enclosing ball, and is the number of training examples. Maximising the margin minimises this generalisation bound — the theoretical justification for the SVM objective.
Complexity
Complexity
Training (SGD)
— n instances, d features, e epochsTraining (QP)
— exact dual — not used hereQuery
— one dot product per class (one-vs-rest)Space
— one weight vector per classParameters
Notes
This implementation is a linear SVM only — it learns a hyperplane boundary. Kernel SVMs (RBF, polynomial) can separate non-linearly separable data but require computing a kernel matrix; they are not yet implemented here.
Only numeric attributes are used as features. Feature normalisation to [0, 1] is applied automatically — this is critical for SGD convergence and for the margin to be meaningful across attributes.
machinelearning.js.org · open source · MIT · Marin's Web Site