Difference between revisions of "Online machine learning"
From Suhrid.net Wiki
Latest revision as of 14:12, 24 March 2012
Stream mining
- Conventional machine learning assumes that the training data is available as a whole set.
- In stream mining, the core assumption is that each training example can be inspected only once, and only briefly.
- Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
- The algorithm must update its model incrementally as each example is inspected.
- Ideally, the model should be usable for prediction at any time between training examples.
- The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose label/class is supplied.
- The data is assumed to have a small, fixed number of columns (attributes/features); each example can be thought of as a tuple.
- The number of examples (rows) is very large, and they are assumed to keep arriving in the stream indefinitely.
- The set of labels that can be applied to the data is small, typically fewer than ten.
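The constraints above (one look at each example, incremental updates, a model that can predict at any time) can be illustrated with a small sketch. It is not taken from the original notes: the class name `IncrementalNaiveBayes`, its methods `learn_one` and `predict_one`, and the toy weather stream are all assumptions chosen for illustration, and categorical Naive Bayes is used only because its counts are easy to update one example at a time.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Hypothetical categorical Naive Bayes that learns one example at a time."""

    def __init__(self):
        self.n_seen = 0
        self.class_counts = defaultdict(int)
        # feature_counts[label][(column_index, value)] -> number of occurrences
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        # values_seen[column_index] -> distinct values observed in that column
        self.values_seen = defaultdict(set)

    def learn_one(self, x, label):
        """Update the counts from a single (tuple, label) pair."""
        self.n_seen += 1
        self.class_counts[label] += 1
        for i, value in enumerate(x):
            self.feature_counts[label][(i, value)] += 1
            self.values_seen[i].add(value)

    def predict_one(self, x):
        """Return the most probable label, or None before any training."""
        if self.n_seen == 0:
            return None
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_seen)
            for i, value in enumerate(x):
                seen = self.feature_counts[label].get((i, value), 0)
                k = len(self.values_seen[i]) or 1
                # Laplace smoothing so unseen values do not zero out a class.
                score += math.log((seen + 1) / (count + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def example_stream():
    """Toy stand-in for an unbounded stream of (tuple, label) pairs."""
    rows = [
        (("sunny", "hot"), "no"),
        (("rainy", "mild"), "yes"),
        (("sunny", "mild"), "no"),
        (("rainy", "hot"), "yes"),
    ]
    for row in rows:
        yield row


model = IncrementalNaiveBayes()
for x, label in example_stream():
    guess = model.predict_one(x)  # the model can be applied at any time
    model.learn_one(x, label)     # incremental update from this one example
    # x is not stored anywhere; only the counts are kept
```

Memory use in this sketch depends on the number of columns, distinct values, and labels, not on the number of examples seen, which is what makes the small, fixed schema and the limited label set useful assumptions.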
Approaches
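- Wrapper approach: collect the stream into batches and then apply batch ML to each batch. Not very good, since examples have to be buffered and the model is only refreshed once per batch.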
- Adaptation approach: algorithms custom-built for stream learning.
  - Decision trees: a good choice.
  - Neural networks: also good.
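As a rough illustration of the wrapper approach, the sketch below cuts a stream into fixed-size batches and hands each batch to a batch learner. The helper names `batches` and `train_majority_class` are hypothetical, and the majority-class "learner" is only a stand-in for whatever batch algorithm would actually be wrapped.

```python
from collections import Counter
from itertools import islice


def batches(stream, batch_size):
    """Yield successive lists of up to batch_size examples from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def train_majority_class(batch):
    """Stand-in batch learner: the 'model' is the most common label in the batch."""
    labels = Counter(label for _, label in batch)
    return labels.most_common(1)[0][0]


# Toy stream of (tuple, label) pairs; a real stream would not fit in a list.
stream = [
    ((1,), "a"), ((2,), "a"), ((3,), "b"),
    ((4,), "b"), ((5,), "b"), ((6,), "a"),
]

# One model per batch: nothing is learned until a whole batch has been buffered.
models = [train_majority_class(batch) for batch in batches(stream, 3)]
```

The sketch makes the cost of wrapping visible: every example sits in a buffer until its batch is full, and the model changes only at batch boundaries, whereas an adapted algorithm updates after every example.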