Revision as of 13:47, 24 March 2012

Stream mining

The core assumption is that each training example can be inspected only once, and only briefly.
Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
The algorithm must update its model incrementally as each example is inspected.
Ideally, it should be able to apply the model at any time between training examples.
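
The single-pass, incremental, anytime behaviour described above can be sketched as a loop that predicts on each example, trains on it, and then lets it go. This is only an illustration under assumptions: the class `MajorityClassLearner`, the method names `learn_one` and `predict_one`, and the generated data are hypothetical and are not taken from any particular stream mining library.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

Example = Tuple[tuple, Hashable]  # (attribute tuple, class label)


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()

    def learn_one(self, x: tuple, y: Hashable) -> None:
        # Constant-time update; the example itself is not stored.
        self.label_counts[y] += 1

    def predict_one(self, x: tuple) -> Optional[Hashable]:
        # Usable at any time, even before the first example arrives.
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]


def run_stream(stream: Iterable[Example], model: MajorityClassLearner) -> float:
    """Single pass: predict on each example, then train on it, then drop it."""
    correct = seen = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        model.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0


def example_stream():
    # Hypothetical data; a generator, so examples are never held in memory together.
    for i in range(1000):
        yield (i % 7, i % 3), ("spam" if i % 4 == 0 else "ham")


if __name__ == "__main__":
    learner = MajorityClassLearner()
    print("accuracy so far:", run_stream(example_stream(), learner))
```

Predicting before training on each example means the model is always evaluated on data it has not yet seen, which suits a setting where examples cannot be revisited.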
The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose label (class) is supplied.
The data is assumed to have a small, fixed number of columns (attributes or features), so each example can be thought of as a tuple.
The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
The number of distinct labels that can be applied to the data is limited, typically fewer than ten.
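
One consequence of these assumptions is that a classifier can summarise an unbounded stream with a bounded set of counters, because the tuple width and the number of labels are both small. The sketch below uses a count-based naive Bayes classifier to show this. It is a hypothetical illustration, not a method named in the text: it additionally assumes that every attribute is categorical with few distinct values, and the names `StreamingNaiveBayes`, `learn_one`, `predict_one`, and `n_counters` are invented for the example.

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Categorical naive Bayes that keeps only counts, never the examples."""

    def __init__(self, n_attributes: int) -> None:
        self.n_attributes = n_attributes
        self.class_counts = defaultdict(int)
        # value_counts[i][(value, label)] = times attribute i had `value` with `label`
        self.value_counts = [defaultdict(int) for _ in range(n_attributes)]
        self.values_seen = [set() for _ in range(n_attributes)]

    def learn_one(self, x: tuple, label) -> None:
        assert len(x) == self.n_attributes, "tuples have a fixed width"
        self.class_counts[label] += 1
        for i, value in enumerate(x):
            self.value_counts[i][(value, label)] += 1
            self.values_seen[i].add(value)

    def predict_one(self, x: tuple):
        if not self.class_counts:
            return None  # nothing learned yet
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for i, value in enumerate(x):
                # Add-one smoothing so an unseen combination does not zero the score.
                count = self.value_counts[i].get((value, label), 0)
                score += math.log((count + 1) / (n_label + len(self.values_seen[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def n_counters(self) -> int:
        # Model size: grows with attributes, values and labels, not with stream length.
        return len(self.class_counts) + sum(len(c) for c in self.value_counts)


if __name__ == "__main__":
    model = StreamingNaiveBayes(n_attributes=2)
    for i in range(10_000):  # hypothetical stream of two-attribute tuples
        model.learn_one((i % 3, i % 2), "a" if i % 3 == 0 else "b")
    print(model.predict_one((0, 1)), model.n_counters())
```

Under these assumptions the number of counters stops growing once every attribute value has been seen with every label, no matter how many further examples arrive.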
