Latest revision as of 13:12, 24 March 2012

Stream mining

Conventional machine learning assumes that the entire training set is available at once.
In stream mining, the core assumption is that each training example can be inspected only once, and only briefly.
Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
The algorithm must update its model incrementally as each example is inspected.
Ideally, the model should be usable for prediction at any time between training examples.
The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose class label is supplied.
The data is assumed to have a small, fixed number of columns (attributes or features), so each example can be thought of as a tuple.
The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
The number of distinct class labels is limited, typically fewer than ten.
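
The requirements above (one look at each example, incremental updates, prediction available at any time) can be illustrated with a minimal sketch. It is not taken from any particular library: the names `MajorityClassLearner`, `learn_one`, `predict_one`, `train_on_stream` and `toy_stream` are hypothetical, and the "model" is deliberately trivial (it only counts labels) so that the protocol itself stays visible.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Optional, Tuple

# Assumed layout of one example: (fixed-length attribute tuple, class label).
Example = Tuple[Tuple[float, ...], Hashable]


class MajorityClassLearner:
    """Toy incremental model: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()

    def learn_one(self, attributes: Tuple[float, ...], label: Hashable) -> None:
        # Constant-time update; the example itself is not stored.
        self.label_counts[label] += 1

    def predict_one(self, attributes: Tuple[float, ...]) -> Optional[Hashable]:
        # Usable at any time, even before the first training example.
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]


def train_on_stream(model: MajorityClassLearner, stream: Iterable[Example]) -> int:
    """Inspect each example exactly once, update the model, then let it go."""
    seen = 0
    for attributes, label in stream:
        model.learn_one(attributes, label)
        seen += 1
    return seen


def toy_stream() -> Iterator[Example]:
    # Stand-in for an unbounded source; a generator never holds the whole data set.
    yield (1.0, 0.5), "a"
    yield (0.2, 0.1), "b"
    yield (0.9, 0.4), "a"


model = MajorityClassLearner()
print(model.predict_one((0.3, 0.3)))        # None: nothing learned yet
print(train_on_stream(model, toy_stream()))  # 3
print(model.predict_one((0.3, 0.3)))        # a
```

Memory use depends only on the number of distinct labels, not on the number of examples seen, which is the property a stream learner needs when the stream never ends.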

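Approaches

Wrapper approach: collect the stream into batches and then apply a conventional batch learner. Not very good, because the model is refreshed only once per batch rather than as each example is inspected.
Adaptation approach: algorithms custom-built for stream learning.
- Decision trees: a good choice.
- Neural networks: also a good choice.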
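
The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `wrapper_learn`, `majority_label` and `OnlinePerceptron` are hypothetical names, labels are assumed to be 0 or 1, and a single perceptron stands in for a neural network because it is the simplest unit that can be updated one example at a time.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed layout of one example: (attributes, label in {0, 1}).
Example = Tuple[Sequence[float], int]


def wrapper_learn(stream: Iterable[Example], batch_size: int,
                  fit_batch: Callable[[List[Example]], object]) -> List[object]:
    """Wrapper approach: buffer examples and hand each full batch to a batch learner."""
    models: List[object] = []
    buffer: List[Example] = []
    for example in stream:
        buffer.append(example)
        if len(buffer) == batch_size:
            models.append(fit_batch(buffer))  # the model is only refreshed here
            buffer = []
    return models  # a trailing partial batch is never learned from in this sketch


class OnlinePerceptron:
    """Adaptation approach: a single perceptron updated after every example."""

    def __init__(self, n_attributes: int, learning_rate: float = 1.0) -> None:
        self.weights = [0.0] * n_attributes
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, attributes: Sequence[float]) -> int:
        activation = self.bias + sum(w * x for w, x in zip(self.weights, attributes))
        return 1 if activation > 0 else 0

    def learn_one(self, attributes: Sequence[float], label: int) -> None:
        error = label - self.predict_one(attributes)  # -1, 0 or +1
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * x for w, x in zip(self.weights, attributes)]
            self.bias += step


def majority_label(batch: List[Example]) -> int:
    # Hypothetical stand-in for a real batch learner: returns the majority label.
    ones = sum(label for _, label in batch)
    return 1 if 2 * ones >= len(batch) else 0


stream = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((1.0, 0.2), 1),
          ((0.1, 1.0), 0), ((0.9, 0.1), 1)]

batch_models = wrapper_learn(stream, batch_size=2, fit_batch=majority_label)

perceptron = OnlinePerceptron(n_attributes=2)
for attributes, label in stream:
    perceptron.learn_one(attributes, label)  # model changes as soon as it errs
```

With five examples and a batch size of two, the wrapper produces two models and ignores the fifth example until another one arrives, whereas the perceptron has already used every example it has seen.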

@@ Line 1: / Line 1: @@
 = Stream mining =
-* Core assumption is that '''training''' examples can be briefly inspected for a single time only.
+* Conventional machine learning assumes that the entire training set is available at once.
+* In stream mining, the core assumption is that each '''training''' example can be inspected only once, and only briefly.
 * Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
 * The algorithm must update its model incrementally as each example is inspected.
+* Ideally, the model should be usable for prediction at any time between training examples.
+* The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose class label is supplied.
+* The data is assumed to have a small, fixed number of columns (attributes or features), so each example can be thought of as a tuple.
+* The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
+* The number of distinct class labels is limited, typically fewer than ten.
+== Approaches ==
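+* Wrapper approach: collect the stream into batches and then apply a conventional batch learner. Not very good, because the model is refreshed only once per batch rather than as each example is inspected.
+* Adaptation approach: algorithms custom-built for stream learning.
+** Decision trees: a good choice.
+** Neural networks: also a good choice.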