Revision as of 14:10, 24 March 2012

Stream mining

Conventional machine learning assumes that the training data is available as a whole set.
In stream mining, the core assumption is that each training example can be inspected only once, and only briefly.
Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
The algorithm must update its model incrementally as each example is inspected.
Ideally, it should be able to apply the model at any time between training examples.
The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose label (class) is supplied.
The data is assumed to have a small, fixed number of columns (attributes or features), so each example can be thought of as a tuple.
The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
The set of labels that can be applied to the data is limited, typically fewer than ten.
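
The constraints above (one brief look at each example, incremental updates, a model that is usable at any time) can be illustrated with a minimal sketch. It assumes categorical attributes and uses a naive Bayes learner with add-one smoothing purely as an illustration; the class name `IncrementalNaiveBayes`, the methods `learn_one` and `predict_one`, and the toy stream are hypothetical and do not come from any particular library.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes that learns from one example at a time.

    Only counts are stored, so each example can be discarded after use.
    """

    def __init__(self):
        self.n_seen = 0
        self.class_counts = defaultdict(int)   # label -> count
        self.value_counts = defaultdict(int)   # (attr index, value, label) -> count
        self.seen_values = defaultdict(set)    # attr index -> distinct values seen

    def learn_one(self, x, label):
        """Update the counts from a single labelled tuple."""
        self.n_seen += 1
        self.class_counts[label] += 1
        for i, value in enumerate(x):
            self.value_counts[(i, value, label)] += 1
            self.seen_values[i].add(value)

    def predict_one(self, x):
        """Return the most probable label, or None if nothing has been seen."""
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_seen)
            for i, value in enumerate(x):
                k = len(self.seen_values[i]) or 1
                hits = self.value_counts.get((i, value, label), 0)
                score += math.log((hits + 1) / (count + k))  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy stream of (attributes, label) pairs.
stream = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
]

model = IncrementalNaiveBayes()
first_guess = model.predict_one(("sunny", "hot"))  # allowed before any training
for x, y in stream:
    model.learn_one(x, y)  # the example is not kept after this call
```

Memory use depends on the number of attributes, distinct values, and labels, not on the number of examples, which is what makes an unbounded stream feasible under the assumptions listed above.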

@@ Line 10: / Line 10: @@
 * The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
 * The set of labels that can be applied to the data is limited, typically fewer than ten.
+== Approaches ==
+* Wrapper approach: collect examples from the stream into a batch, then apply a conventional batch ML algorithm to it.
+* Adaptation approach: use an algorithm custom-built for stream learning.
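
The difference between the two approaches can be sketched as follows. This is an illustration under simplifying assumptions, not a reference implementation: the "batch learner" is a stand-in that just returns the majority label of a batch, and the names `train_batch_majority`, `WrapperLearner`, and `AdaptedLearner` are hypothetical.

```python
from collections import Counter


def train_batch_majority(batch):
    """Stand-in batch learner: the 'model' is the most common label in the batch."""
    labels = Counter(label for _, label in batch)
    return labels.most_common(1)[0][0]


class WrapperLearner:
    """Wrapper approach: buffer the stream, then run a batch learner on each full batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.model = None  # no model until the first batch is complete

    def learn_one(self, x, label):
        self.buffer.append((x, label))
        if len(self.buffer) == self.batch_size:
            self.model = train_batch_majority(self.buffer)
            self.buffer = []  # discard the batch to make room for new examples

    def predict_one(self, x):
        return self.model


class AdaptedLearner:
    """Adaptation approach: the model itself is updated after every example."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, label):
        self.counts[label] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


wrapper = WrapperLearner(batch_size=3)
adapted = AdaptedLearner()

for example in [(("x",), "a"), (("x",), "a")]:
    wrapper.learn_one(*example)
    adapted.learn_one(*example)

wrapper_early = wrapper.predict_one(("x",))  # still None: batch not yet full
adapted_early = adapted.predict_one(("x",))  # already reflects both examples

wrapper.learn_one(("x",), "b")  # third example completes the first batch
wrapper_after_batch = wrapper.predict_one(("x",))
```

In this sketch the wrapper cannot predict until its first batch fills and only changes its model at batch boundaries, while the adapted learner reflects every example as soon as it is seen.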