Skip to main content


Showing posts from November, 2010

Data Streams and VFML

We live in a technological world crowded of information. Every device we can think of can give us a bunch of such data, usually in the form of a flow or stream of information in, more or less, real time. In this particular situation classical knowledge discovery mechanisms (like our loved C4.5, a decision tree developed by Quinlan) are completely unable of extract a correct model of the situation. But, what is so special with flows of data?
Following the words of Gama and Rodriques: a data stream is an ordered sequence of instances that can be read only once or a small number of times using limited computing and storage capabilities. These sources of data are characterized by being open-ended, following at high speed, and generated by non-stationary distributions in dynamic environments.
So, to properly handle this kind of knowledge the learning algorithm has to learn on line and process massive amounts of data increasing the challenges to be faced. Let's hold one's breath with…