ARFF Format
The Attribute-Relation File Format (ARFF) is Weka's native dataset format and is supported directly by the Explorer.
Introduction
ARFF was developed by the Machine Learning Project at the University of Waikato for use with the Weka machine learning software. It is a human-readable ASCII text format that describes instances sharing a fixed set of attributes.
An ARFF file has two sections: the Header (relation name and attribute declarations) and the Data section (one instance per line, comma-separated).
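As a rough illustration of the two-section layout, the sketch below splits ARFF text at the @data marker. The sample file (relation "weather" and its attributes) is made up for this example, and split_arff is a hypothetical helper, not part of this tool or of Weka.

```python
# Hypothetical sample file; the relation and attribute names are invented.
SAMPLE_ARFF = """\
% Comment lines start with a percent sign.
@relation weather

@attribute temperature numeric
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}

@data
85, sunny, no
64, overcast, yes
?, rainy, yes
"""


def split_arff(text):
    """Split ARFF text into (header_lines, data_lines) at the @data marker."""
    header, data = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data and line.lower() == "@data":
            in_data = True  # everything after this marker is instance data
            continue
        (data if in_data else header).append(line)
    return header, data


header_lines, data_lines = split_arff(SAMPLE_ARFF)
print(len(header_lines), "header lines,", len(data_lines), "instances")
```

The marker is matched case-insensitively because ARFF keywords are not case-sensitive.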
Structure
The @relation line names the dataset. Each @attribute line declares one attribute by name and type. The class attribute is conventionally last.
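A minimal sketch of reading these declarations follows. The header lines are a made-up example, parse_header is a hypothetical helper, and the sketch assumes attribute names contain no spaces or quotes (the full format allows quoted names, which this does not handle).

```python
# Hypothetical header lines; names are invented for illustration.
HEADER = [
    "@relation weather",
    "@attribute temperature numeric",
    "@attribute outlook {sunny, overcast, rainy}",
    "@attribute play {yes, no}",
]


def parse_header(lines):
    """Return (relation_name, [(attribute_name, type_spec), ...])."""
    relation = None
    attributes = []
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        keyword = keyword.lower()  # keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            # Assumes an unquoted name without spaces.
            name, _, type_spec = rest.strip().partition(" ")
            attributes.append((name, type_spec.strip()))
    return relation, attributes


relation, attributes = parse_header(HEADER)
class_name = attributes[-1][0]  # by convention, the last attribute is the class
print(relation, class_name)
```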
Attribute Types
numeric: Continuous real-valued attributes. May also be declared as 'real' or 'integer'.
{value1, value2, ...}: Nominal (categorical). Lists all possible values in braces. Used for the class attribute.
string: Free-form string values. Treated as nominal in this tool.
date: Date/time values with an optional Java SimpleDateFormat pattern. Parsed as numeric timestamps.
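The four kinds of declaration described above could be told apart roughly as follows. This is a sketch under assumptions: classify_type is a hypothetical helper, and the returned tuples are just one convenient representation, not how this tool stores types internally.

```python
def classify_type(type_spec):
    """Map the type part of an @attribute line to (kind, detail)."""
    spec = type_spec.strip()
    if spec.startswith("{") and spec.endswith("}"):
        # Nominal: the braces enumerate every allowed value.
        values = [v.strip() for v in spec[1:-1].split(",")]
        return ("nominal", values)
    keyword, _, rest = spec.partition(" ")
    keyword = keyword.lower()
    if keyword in ("numeric", "real", "integer"):
        return ("numeric", None)
    if keyword == "string":
        return ("string", None)
    if keyword == "date":
        # The optional pattern may be quoted; None means no pattern was given.
        pattern = rest.strip().strip("\"'") or None
        return ("date", pattern)
    raise ValueError(f"unrecognized attribute type: {type_spec!r}")


print(classify_type("{yes, no}"))
print(classify_type('date "yyyy-MM-dd"'))
```

The sketch only extracts the date pattern; converting a Java SimpleDateFormat pattern into timestamps is left out.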
For the full specification see the official Weka ARFF documentation.
Missing Values
Missing values are represented by a question mark (?) in the data section. The Preprocess tab reports missing-value counts per attribute. During classification and clustering, a missing value receives a maximum-distance penalty.
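The two behaviours above can be sketched as follows. Everything here is illustrative: the rows and names are invented, count_missing and euclidean are hypothetical helpers, and the penalty of 1.0 assumes attribute values normalized to the range [0, 1]. The tool's actual penalty and distance function may differ.

```python
import math

# Hypothetical parsed rows; "?" marks a missing value.
NAMES = ["temperature", "outlook", "play"]
ROWS = [
    ["85", "sunny", "no"],
    ["64", "?", "yes"],
    ["?", "rainy", "yes"],
]


def count_missing(rows, names):
    """Count "?" entries per attribute."""
    counts = {name: 0 for name in names}
    for row in rows:
        for name, value in zip(names, row):
            if value == "?":
                counts[name] += 1
    return counts


def attribute_distance(a, b):
    """Per-attribute distance for values in [0, 1]; None means missing."""
    if a is None or b is None:
        return 1.0  # assumed maximum-distance penalty
    return abs(a - b)


def euclidean(x, y):
    """Euclidean distance that penalizes missing values."""
    return math.sqrt(sum(attribute_distance(a, b) ** 2 for a, b in zip(x, y)))


missing_counts = count_missing(ROWS, NAMES)
print(missing_counts)
print(euclidean([0.2, None], [0.5, 0.9]))
```

Treating a missing value as maximally distant keeps incomplete instances usable without imputing a value for them.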
machinelearning.js.org · open source · MIT · Marin's Web Site