Difference between revisions of "Online machine learning"

From Suhrid.net Wiki
 

Latest revision as of 14:12, 24 March 2012

Stream mining

  • Conventional machine learning assumes that the whole training set is available at once.
  • In stream mining, the core assumption is that each training example can be inspected only briefly, and only once.
  • Examples arrive in a high-speed stream and must then be discarded to make room for subsequent examples.
  • The algorithm must update its model incrementally as each example is inspected.
  • Ideally, the model should be usable for prediction at any point between training examples.
  • The goal of classification is to produce a model that can predict the class of unlabelled examples by training on examples whose label (class) is supplied.
  • The data is assumed to have a small, fixed number of columns (attributes/features), so each example can be thought of as a tuple.
  • The number of examples (rows) is very large, and examples are assumed to keep arriving in the stream indefinitely.
  • The number of distinct labels is limited, typically fewer than ten.
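
The assumptions above (inspect each example once, update incrementally, predict at any time) can be illustrated with a minimal sketch. It is not taken from any particular library: the class StreamingNaiveBayes, the methods learn_one and predict_one, the helper prequential_accuracy and the generator toy_stream are all hypothetical names chosen for this example, and the smoothing constant assumes roughly binary attributes.

```python
import math


class StreamingNaiveBayes:
    """Categorical naive Bayes that learns from one example at a time.

    Only counts are kept; the examples themselves are never stored.
    """

    def __init__(self):
        self.class_counts = {}   # label -> number of examples seen
        self.value_counts = {}   # (attribute index, value, label) -> count
        self.seen = 0

    def learn_one(self, x, y):
        """Update the counts from a single (tuple, label) pair."""
        self.class_counts[y] = self.class_counts.get(y, 0) + 1
        for i, v in enumerate(x):
            key = (i, v, y)
            self.value_counts[key] = self.value_counts.get(key, 0) + 1
        self.seen += 1

    def predict_one(self, x):
        """Return the most probable label, or None before any training."""
        best_label, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / self.seen)
            for i, v in enumerate(x):
                count = self.value_counts.get((i, v, y), 0)
                # Add-one smoothing; the "+ 2" assumes two values per attribute.
                score += math.log((count + 1) / (n_y + 2))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def prequential_accuracy(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        model.learn_one(x, y)   # after this call the example can be discarded
        total += 1
    return correct / total if total else 0.0


def toy_stream(n):
    """Hypothetical stream: two binary attributes, label decided by the first."""
    for i in range(n):
        a, b = i % 2, (i // 2) % 2
        yield (a, b), ("pos" if a == 1 else "neg")


model = StreamingNaiveBayes()
accuracy = prequential_accuracy(toy_stream(200), model)
```

Because the model only stores counts, its memory use depends on the number of attributes, attribute values and labels, not on the length of the stream, and predict_one can be called between any two learn_one calls.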

Approaches

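  • Wrapper approach: collect the stream into batches and then apply a batch ML algorithm to each batch. Generally a poor fit, because examples must be buffered rather than inspected once and discarded.
  • Adaptation approach: algorithms custom-built for stream learning.
    • Decision trees: a good choice.
    • Neural networks: also a good choice.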
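
The wrapper approach can be sketched as follows, under stated assumptions: BatchWrapper and MajorityClassBatchLearner are hypothetical names, and the majority-class learner is only a stand-in for whatever batch algorithm would actually be wrapped. The sketch is meant to show where the buffering happens, not to recommend a design.

```python
from collections import Counter


class MajorityClassBatchLearner:
    """Stand-in batch learner (hypothetical): predicts the most common label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, x):
        return self.label_


class BatchWrapper:
    """Buffers stream examples and refits a batch learner on each full batch."""

    def __init__(self, make_learner, batch_size):
        self.make_learner = make_learner
        self.batch_size = batch_size
        self.buffer_X, self.buffer_y = [], []
        self.model = None
        self.refits = 0

    def learn_one(self, x, y):
        # Examples have to be stored until the batch is full.
        self.buffer_X.append(x)
        self.buffer_y.append(y)
        if len(self.buffer_y) == self.batch_size:
            self.model = self.make_learner().fit(self.buffer_X, self.buffer_y)
            self.refits += 1
            self.buffer_X, self.buffer_y = [], []

    def predict_one(self, x):
        # No model exists until the first batch has been collected.
        return None if self.model is None else self.model.predict_one(x)


wrapper = BatchWrapper(MajorityClassBatchLearner, batch_size=3)
predictions = []
for x, y in [((0,), "a"), ((1,), "a"), ((0,), "b"),
             ((1,), "b"), ((0,), "b"), ((1,), "b")]:
    predictions.append(wrapper.predict_one(x))  # predict before learning
    wrapper.learn_one(x, y)
```

In this sketch the model is unavailable until the first batch is complete and then only changes once per batch, so its predictions lag behind the stream. An adaptation-style learner would instead update its model on every example.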