Data Streams and VFML

We live in a technological world crowded with information. Almost every device we can think of produces data, usually as a flow, or stream, of information arriving in more or less real time. In this setting, classical knowledge discovery mechanisms (like our beloved C4.5, the decision tree learner developed by Quinlan) are unable to extract a correct model of the situation. But what is so special about flows of data?


In the words of Gama and Rodrigues: a data stream is an ordered sequence of instances that can be read only once or a small number of times using limited computing and storage capabilities. These sources of data are characterized by being open-ended, flowing at high speed, and generated by non-stationary distributions in dynamic environments.
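
To make "read only once with limited storage" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the sensor_readings generator is a hypothetical stand-in for a real, open-ended source, and the running mean is just the simplest statistic that can be maintained in constant memory.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, reading each value exactly once."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no past value is stored, so memory stays constant.
        mean += (value - mean) / count
        yield mean


def sensor_readings() -> Iterator[float]:
    """Hypothetical source; a real stream would be open-ended, this one is finite."""
    for reading in (2.0, 4.0, 6.0, 8.0):
        yield reading


for current_mean in running_mean(sensor_readings()):
    print(current_mean)  # prints 2.0, 3.0, 4.0, 5.0 (one per line)
```

The point of the sketch is that the loop never looks back: once a reading has updated the mean, it can be thrown away.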


So, to properly handle this kind of data, the learning algorithm has to learn online and process massive amounts of data, which increases the challenges to be faced. Hold your breath for the following example: the controller of a nuclear device. The decay of heavy particles inside a nuclear reactor generates a flow of data that the controller must keep an eye on in order to adjust physical parameters, such as the neutron moderators, and so sustain (or stop, if necessary) the nuclear fission inside the reaction chamber. Clearly, a classical offline approach could be a very dangerous business.


Usually, data streams come from sensor networks and contain an undesirable amount of noise, which degrades the model of the system. And this is not the only issue. There is a major challenge: the distribution of the problem's categories can change over time. This effect is called concept drift and, combined with noisy input, it can completely destroy the predictions of classical knowledge discovery algorithms.
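
A toy illustration of why drift hurts, written as a small Python sketch. Everything in it is an assumption made for the example: the stream is synthetic (the dominant category flips from "a" to "b" halfway through), the "model" is just a majority-class predictor, and the sliding window is a generic way of forgetting old data, not the mechanism used by any particular algorithm.

```python
from collections import Counter, deque
from typing import Deque, List, Sequence


def majority(labels: Sequence[str]) -> str:
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def accuracy_static(stream: List[str], train_size: int) -> float:
    """Learn once on the first train_size labels, then never update."""
    model = majority(stream[:train_size])
    rest = stream[train_size:]
    return sum(1 for label in rest if label == model) / len(rest)


def accuracy_windowed(stream: List[str], train_size: int, window: int) -> float:
    """Predict the majority label of the last `window` labels, updating as we go."""
    recent: Deque[str] = deque(stream[:train_size], maxlen=window)
    rest = stream[train_size:]
    hits = 0
    for label in rest:
        if label == majority(recent):  # predict before the true label is revealed
            hits += 1
        recent.append(label)  # the oldest label falls out of the window
    return hits / len(rest)


# Synthetic stream with an abrupt drift after 50 instances.
drifting_stream = ["a"] * 50 + ["b"] * 50

print(accuracy_static(drifting_stream, train_size=20))              # 0.375
print(accuracy_windowed(drifting_stream, train_size=20, window=9))  # 0.9375
```

The static learner keeps predicting the old concept forever, while the windowed one recovers after a handful of mistakes. Real drift is rarely this clean, and with noise on top it becomes much harder to tell a genuine change from a bad batch of readings.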


Several online learning algorithms have been proposed so far, but not all of them can handle concept drift. The current state of the art is the so-called "Concept-adapting Very Fast Decision Tree" (CVFDT), designed by Hulten and others: a decision tree learner that extends VFDT to handle serious data stream problems with concept drift. To test its capabilities, Hulten and Domingos developed a toolkit for mining high-speed data streams and very large data sets. This software is called "Very Fast Machine Learning" (VFML) and is available under a BSD license here.
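
As background, the VFDT family of learners decides when it has seen enough examples to split a node by using the Hoeffding bound. The following Python sketch shows only that test; the function names and the example numbers are mine, and it leaves out everything else a real learner does (tie handling, the alternate subtrees CVFDT grows to cope with drift, and so on), so take it as an illustration rather than as VFML's code.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative numbers only: gains measured on 1000 examples at a leaf.
print(hoeffding_bound(1.0, 1e-7, 1000))        # roughly 0.09
print(can_split(0.30, 0.10, 1.0, 1e-7, 1000))  # True: the gap is clear enough
print(can_split(0.30, 0.25, 1.0, 1e-7, 1000))  # False: wait for more examples
```

Since epsilon shrinks as n grows, a leaf that cannot decide yet simply waits for more of the stream to arrive.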


I tested this software, and I have to warn you that you will probably have some trouble compiling it. The makefiles provided contain some minor mistakes, and you have to run make several times (I ran make four times before I got the binaries).

Comments

  1. Hi! I need to run the VFML toolkit, and I get a lot of errors when I try to compile it. How did you build the binaries (the changes to the makefiles, etc.)? The official VFML email address is deactivated.
    I have found some errors in the makefile inside the src folder, I needed to install bison on my Linux machine, etc.
    Please email me! andre.p.bandeira at gmail.com

  2. I think that it will be easier if I just send you the compiled package via email. It is a tar.gz file containing everything. You will find the binaries inside the folder /vfml/bin/.

    Anyway, for your information, the things I did in order to compile the original package properly were, more or less, the following:
    First, you have to modify the file /vfml/src/core/BeliefNet.c at line 929:
    ((int)(child->tmpInternalData))--;
    Replace it with:
    child->tmpInternalData--;
    After this, you have to fix the makefiles. Line 6 of the makefile placed in /vfml/ is not set up right: it exports the PYTHON variable with the wrong path. Change it to: export PYTHON=/usr/bin/python ("usr", not "urs"!).
    Next, open the makefile placed in /vfml/src/ and add the following at the beginning of the file:
    export ARCH = UNIX
    export PYTHON=/usr/bin/python
    export CC=gcc
    Then, do make to the first makefile (I did this twice) and then do make inside /vfml/src/. Next, you have to copy /vfml/lib/*.a to /vfml/src/.
    Finally, you have to do make again and, eventually, the package should work.

    So, check the email :)
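
For anyone who prefers to script the text edits above instead of doing them by hand, here is a rough Python sketch. It only transforms text that you pass in; it does not touch files or run make. The function names are hypothetical, and I am assuming that the typo in the top-level makefile appears literally as /urs/bin/python, so check your copy before relying on it.

```python
# Statements and lines taken from the steps described in the comment above.
BROKEN_STATEMENT = "((int)(child->tmpInternalData))--;"
FIXED_STATEMENT = "child->tmpInternalData--;"
EXPORT_LINES = [
    "export ARCH = UNIX",
    "export PYTHON=/usr/bin/python",
    "export CC=gcc",
]


def fix_beliefnet(source: str) -> str:
    """Replace the cast-and-decrement statement in the text of BeliefNet.c."""
    return source.replace(BROKEN_STATEMENT, FIXED_STATEMENT)


def fix_python_path(makefile: str) -> str:
    """Correct the misspelled path (assumed to read /urs/bin/python)."""
    return makefile.replace("/urs/bin/python", "/usr/bin/python")


def prepend_exports(makefile: str) -> str:
    """Add the three export lines at the beginning of the src makefile text."""
    return "\n".join(EXPORT_LINES) + "\n" + makefile


# Small in-memory demonstration; the sample strings are made up for the example.
print(fix_beliefnet("   ((int)(child->tmpInternalData))--;"))
print(fix_python_path("export PYTHON=/urs/bin/python"))
print(prepend_exports("all:\n"))
```

After applying the edits, the make and copy steps still have to be run as described above.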

  3. Hello Andreu

    I tried compiling the toolkit too, but I received the same errors as Bandeira. I tried fixing the bugs as you said, but I get new errors at this step:
    "Then, do make to the first makefile (I did this twice) and then do make inside /vfml/src/."

    This is what I get when I do the first make:

    risto@nyah:~/vfml$ make
    (cd src ; make all)
    make[1]: Entering directory `/home/risto/vfml/src'
    cp vfml.a ../lib/libvfml.a
    cp: cannot create regular file `../lib/libvfml.a': No such file or directory
    make[1]: *** [libvfml.a] Error 1
    make[1]: Leaving directory `/home/risto/vfml/src'
    make: *** [all] Error 2

    This is what I get with the second make (inside /vfml/src/):

    risto@nyah:~/vfml/src$ make
    cp vfml.a ../lib/libvfml.a
    cp: cannot create regular file `../lib/libvfml.a': No such file or directory
    make: *** [libvfml.a] Error 1


    Could you tell me PLEASE how I can fix this or maybe send me the binaries too. My e-mail is anastasovskigoce@gmail.com

    I'd appreciate any help I can get.

    Thanks,
    Goce

  4. Oops, sorry Goce, I was away all this time. I'm sending you an email with the modified VFML package.

    I think I may do a special section with the package here in this blog. Maybe some day...

  5. Hello Andreu,

    I followed the steps you mentioned and I get the following error :

    make[2]: Entering directory `/home/kirthanaa/vfml/src/learners/naivebayes'
    gcc -DUNIX -I ../../../include/ naivebayes.c ../../../lib/vfml.a -lm -o naivebayes
    ../../../lib/vfml.a: file not recognized: File truncated
    collect2: ld returned 1 exit status
    make[2]: *** [naivebayes] Error 1
    make[2]: Leaving directory `/home/kirthanaa/vfml/src/learners/naivebayes'
    make[1]: *** [naivebayes] Error 2
    make[1]: Leaving directory `/home/kirthanaa/vfml/src/learners'
    make: *** [learners] Error 2

    Could you please tell me how to fix it?? Can you send me the modified package?? My email id is kirthan18@gmail.com

    Thanks a lot in advance!
    Kirthanaa.

  6. Oh, sorry Kirthanaa, I completely forgot to check the comments on the site. I know it is a bit late, but I'll send you the binaries anyway. Again, sorry!

    Replies
    1. Andreu, can you also send me the complete VFML package along with the binaries? My email ID is rabia.msis6@mcs.edu.pk

    2. Hello Rabia Latif and sorry for the delay, I'll send you the package ASAP.
