Andreu Sancho Homepage

Posts

Stanford offers online machine learning course for free!

- August 19, 2011

" Machine learning is the science of getting computers to act without being explicitly programmed. " This is a very good definition of what ML is. But let's see what topics are included in the course: (1) supervised learning, (2) unsupervised learning, (3) best practices in machine learning, and (4) reinforcement learning (the course syllabus and the scheduling is found here .) A quite complete course; I only miss evolutive computation, anyway I think this course is a good way to enter to this wonderful discipline. You can sign up freely here (I did and I will post the experience here.) Stanford also offers another two free online courses: introduction to artificial intelligence and introduction to data bases .

Not so fast...

- May 01, 2011

After a trip of nearly two years into the deep mysteries of LCSs, one expects to comprehend the inner workings of this family of techniques. Dozens of papers and unimaginable hours later, I am still stuck on the very basics. These are the thoughts of a wannabe researcher taking baby steps into the research world. But first let me introduce myself briefly: I have spent a reasonable amount of time and effort on XCS-like algorithms. I implemented the most relevant ones myself, from scratch, and tested them countless times. But I am still learning the basics. Every single time. Recently I re-implemented the good old XCSR, and I ran into a lot of trouble with it. The devil is in every detail, and I met him face to face. What is one supposed to do when stuck with a technique one is supposed to have mastered (or, at least, to have a certain amount of experience with)? I confess I was very upset with myself. Depressed, with thoughts of failure flooding i...

Schemata, Building Blocks, and Everything Else

- March 19, 2011

Genetic Algorithms (GAs) are a search and optimization method inspired by the way nature works with living entities, using evolution-based operators. These operators exchange genetic information across generations until an ending condition, typically finding the desired solution, is met. In this entry, the formalism of why GAs work is described as proposed by Holland in the mid-seventies and later by Goldberg. To do so, we first need to introduce some key concepts, assuming the classical ternary representation {0, 1, *}, where * is the don't care symbol. A fundamental concept in GA theory is that of a schema. A schema is a particular subset of the set of all possible binary strings, described by a template over the ternary alphabet {0, 1, *}. For instance, the schema 01**1 corresponds to the set of strings of length five (that is, strings composed of five symbols from the ternary alphabet) with a 0 in the first position, a 1 in the second position ...
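
To make the schema idea concrete, here is a minimal Python sketch of the template described above. The function names `matches` and `instances` are my own, chosen for illustration; they do not come from Holland's or Goldberg's work. The sketch only checks whether a binary string belongs to a schema and enumerates the strings a schema describes.

```python
from itertools import product


def matches(schema: str, string: str) -> bool:
    """Return True if the binary string is an instance of the schema.

    A '*' in the schema accepts either bit; '0' and '1' must match exactly.
    """
    if len(schema) != len(string):
        return False
    return all(s == "*" or s == c for s, c in zip(schema, string))


def instances(schema: str) -> list[str]:
    """Enumerate every binary string described by the schema."""
    stars = schema.count("*")
    result = []
    for bits in product("01", repeat=stars):
        fill = iter(bits)
        # Replace each '*' with the next bit of the current combination.
        result.append("".join(next(fill) if s == "*" else s for s in schema))
    return result


if __name__ == "__main__":
    print(matches("01**1", "01101"))  # True
    print(matches("01**1", "11101"))  # False
    print(instances("01**1"))         # ['01001', '01011', '01101', '01111']
```

A schema with k don't care symbols describes 2^k strings, which is why 01**1 yields four of them.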

From Market Baskets to Databases: Association Rule Mining

- March 07, 2011

What do customers buy? Which products are bought together? With these two short questions, the field of association rule (AR) mining makes its appearance. In this field of ML, the original aim was to find associations and correlations between the different items that customers place in their shopping baskets. More generally, the goal of AR mining is to find frequent and interesting patterns, associations, correlations, or causal structures among sets of items or elements in large databases, and to express these relationships as association rules. AR mining is an important part of the unsupervised learning paradigm, so the algorithm has no expert to teach it during the training stage. Why might AR mining be so important? Many commercial applications generate huge amounts of unlabeled data (just think of Facebook for a moment), so our favorite classifier system will not work in this environment. With AR we can exploit such databases and extract any kind of useful in...
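
As a small illustration of "which products are bought together", the sketch below computes support and confidence, the two standard measures used to rank association rules. The excerpt above does not define them, so treat them as my assumption about what the full post covers; the baskets and the function names are made up for the example.

```python
# Hypothetical toy data: each basket is the set of items in one purchase.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


if __name__ == "__main__":
    print(support({"bread", "milk"}, BASKETS))       # 0.5
    print(confidence({"bread"}, {"milk"}, BASKETS))  # about 0.667
```

Note that no labels are needed anywhere: the rule "bread -> milk" is scored purely from co-occurrence counts, which is what makes this an unsupervised task.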

Modern Learning Classifier Systems

- January 09, 2011

From the classic point of view, machine learning algorithms are classified according to the desired outcome of the algorithm. There are three main types of learning: supervised learning, where an expert or teacher provides feedback during the learning process; unsupervised learning, where there is no expert or teacher while the learning process is running; and reinforcement learning, where the program learns by interacting with the environment. The latter kind of learning is a fundamental mechanism in learning classifier systems (LCSs). These are cognitive systems that receive perceptions from the environment and, in response to these, perform actions to solve the problem they are facing. Originally proposed by John Holland and later simplified by David Goldberg and others, LCSs are computer programs based on observations of how natural selection and Darwinian evolution solve complex tasks. The original purpose was to create true artificial intelligence mimicking the adapt...

Data Streams and VFML

- November 11, 2010

We live in a technological world crowded with information. Every device we can think of can give us a bunch of such data, usually in the form of a flow or stream of information in, more or less, real time. In this particular situation, classical knowledge discovery mechanisms (like our beloved C4.5, a decision tree algorithm developed by Quinlan) are completely unable to extract a correct model of the situation. But what is so special about flows of data? In the words of Gama and Rodrigues: a data stream is an ordered sequence of instances that can be read only once or a small number of times using limited computing and storage capabilities. These sources of data are characterized by being open-ended, flowing at high speed, and generated by non-stationary distributions in dynamic environments. So, to properly handle this kind of knowledge, the learning algorithm has to learn online and process massive amounts of data, which increases the challenges to be faced. Let's hold one's breath w...
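
To give a feel for the "read only once, limited storage" constraint, here is a minimal sketch that summarizes a stream in a single pass with constant memory. It uses Welford's online algorithm for the mean and variance, which is my choice of illustration and has nothing to do with VFML's actual API; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0        # number of instances seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Consume one instance; it is never stored or revisited."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far (0.0 with fewer than two instances)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an open-ended stream
        stats.update(value)
    print(stats.mean, stats.variance)
```

The memory used is three numbers regardless of how long the stream runs. A stream learner needs the same property, plus some way to forget old data when the distribution drifts, which this sketch does not attempt.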

$\LaTeX$ on Blogger? Yes!

- September 20, 2010

$\LaTeX$ is a powerful tool for properly expressing our ideas and thoughts, and one that every wannabe researcher should know. A question I had some time ago was whether $\LaTeX$ works in Blogger. Today I found the answer: yes, we can use $\LaTeX$ here: $y_{k} = \sum_{i}^{l} \alpha_{i} + 1$ The two things you have to do are: (1) create a new HTML/Javascript third-party application, and (2) enter the code found here. The script replaces the $\LaTeX$ code with images, easy and fast :-)