A Vectorized Version of the Mighty Logistic Regressor

Neural nets, with their flexible data representation capable of approximating virtually any function, are enjoying a new renaissance with the development of Deep Learning. These architectures stack many layers (typically dozens of quasi-independent layers) of traditional neurons to achieve something very remarkable: the automatic detection of important features in the input data. The application of Deep Learning is not restricted to plain classification or regression; this family of techniques is applied in much broader fields such as machine vision and speech recognition. Recent advances go even further: combining a Deep Learning architecture with a Reinforcement Learning algorithm produced a program capable of playing classic Atari video games at the level of a top human player, without being explicitly programmed for the task. It is not surprising, therefore, that this work made the cover of a recent issue of Nature (vol. 518, no. 7540, pp. 456-568). But how does it all work? To find out, one must first take a close look at the humble beginnings: the mighty logistic regressor. Below is a vectorized R implementation of this classic algorithm, which is the very basis of any neural net architecture.
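For reference, the gradient-descent loop in the listing minimizes the usual regularized cross-entropy cost of logistic regression. Roughly, in the notation of the code (m examples, design matrix X whose first column is the bias of ones, weight row vector W, regularization penalty lambda, and with the bias weight left out of the penalty):

h_W(x) = \sigma(Wx) = \frac{1}{1 + e^{-Wx}}

J(W) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_W(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_W(x^{(i)})\big) \Big] + \frac{\lambda}{2m}\sum_{j=1}^{n} W_j^2

\nabla_W J = \frac{1}{m}\Big[\big(\sigma(W X^\top) - y^\top\big)\,X + \lambda\,(0, W_1, \ldots, W_n)\Big]

These are exactly the quantities computed, in vectorized form, by the gradW and J lines of the code.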


rm(list = ls())

#' Computes the sigmoid function of the given input.
#' @param z the input (scalar, vector or matrix).
#' @return the element-wise sigmoid of z.
sigmoid <- function(z) {
  return(1 / (1 + exp(-z)))
}

# Start the process by generating 250 samples.
set.seed(10097)
x1 <- runif(250)
x2 <- runif(250)
y <- ifelse(x1 + x2 > 0.8, 1, 0)
dataset <- data.frame("BIAS" = rep(1, length(y)), "X1" = x1, "X2" = x2)
a1 <- as.matrix(dataset)
features <- ncol(a1) - 1
numExamples <- nrow(a1)
epsilon <- 0.05   # Half-width of the uniform interval used for random initialization.
alpha <- 0.8      # The learning rate.
lambda <- 0.001   # The regularization penalty.
epochs <- 5000
frac <- 1 / numExamples

# Let's plot the data set.
plot(x = a1[, 2], y = a1[, 3], col = (y + 1), pch = 19, xlab = "X1", ylab = "X2")

# Initialize the weights (bias included) with small uniform random values.
W <- matrix(runif(features + 1, min = -epsilon, max = epsilon), nrow = 1, ncol = features + 1, byrow = TRUE)

# Train the logistic regressor with batch gradient descent.
for (epoch in 1:epochs) {
  # First compute the hypothesis...
  z2 <- W %*% t(a1)
  # ...then the regularized gradient (the bias weight is not penalized)...
  gradW <- frac * ((sigmoid(z2) - y) %*% a1 + lambda * cbind(0, t(W[, -1])))
  # ...and finally take a gradient descent step.
  W <- W - alpha * gradW
}

# Compute the regularized cross-entropy cost of the trained model.
h <- sigmoid(W %*% t(a1))
l_x <- log(h)
l_inv <- log(1 - h)
J <- -frac * (sum(y * l_x) + sum((1 - y) * l_inv)) + (lambda / (2 * numExamples)) * sum(W[, -1]^2)

# Plot the separating hyperplane.
abline(a = -W[, 1] / W[, 3], b = -W[, 2] / W[, 3], col = "green", lwd = 3)
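Once trained, the same hypothesis can be reused for prediction. The snippet below is a minimal sketch (predict_class is a helper name of my own, not part of the listing above): it prepends the bias term to new inputs and thresholds the sigmoid output at 0.5.

# Sketch, not part of the original listing: classify new points with the trained weights W.
predict_class <- function(W, x1, x2) {
  newX <- cbind(1, x1, x2)                  # prepend the bias column, as in a1
  as.numeric(sigmoid(W %*% t(newX)) > 0.5)  # 1 above the boundary, 0 below
}
predict_class(W, c(0.9, 0.1), c(0.7, 0.2))  # should give 1 0 for points far from the boundary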

Figure 1 depicts the problem space and the solution found by the logistic regressor, shown as a green line. In this problem there are two input variables (x1 and x2) that take random values in the range [0, 1]. The class is labeled 1 if x1 + x2 > 0.8 and 0 otherwise.

Figure 1. The data set, consisting of two random variables, and the solution given by the algorithm.
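As a quick sanity check (again an addition, not part of the original listing), the thresholded hypothesis can be compared against the generated labels to obtain the training accuracy:

# Training accuracy of the learned boundary against the generated labels.
pred <- as.numeric(sigmoid(W %*% t(a1)) > 0.5)
mean(pred == y)  # should be close to 1 when the green line separates the classes well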
