
The homework for this week:

Using the Multilayer Perceptron implementation I’ve provided (or your own), identify a learning problem and dataset that is of interest to you, determine how to integrate this data with the MLP implementation, and write any code necessary to train and test the MLP on your data.

After browsing the UCI dataset repository, I decided to use the Wine Quality dataset. Each row in the dataset corresponds to 11 measurements of a wine, together with a 12th value: the wine’s “quality” score. For this task I would therefore be solving a regression problem: given a set of measurements, what is the numerical quality of the wine?

Getting the CSV data into Python was relatively easy thanks to the csv module documentation. However, I had to try a few versions of the loading code before it worked (e.g. at first I loaded the inputs and outputs separately, but once I realized that shuffling them independently would break the correspondence between rows and labels, I switched to loading everything into a single matrix, shuffling it, and only then splitting it into inputs and outputs).
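As a toy illustration of that correspondence problem (not the wine data itself), the sketch below contrasts the broken approach with the load-then-shuffle-then-split one I ended up using:

import numpy as np
#
# Toy data: 6 examples with 2 features each, plus matching labels
X = np.arange(12).reshape(6, 2).astype(float)
y = np.arange(6).astype(float)
#
# Wrong: shuffling X and y separately uses two different random orders,
# so each row of X no longer lines up with its label in y.
# np.random.shuffle(X); np.random.shuffle(y)
#
# What I ended up doing: stack everything into one matrix, shuffle the rows,
# and only then split back into inputs and outputs.
data = np.column_stack((X, y))
np.random.shuffle(data)          # shuffles rows in place
inputs  = data[:, :-1]
outputs = data[:, -1]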

I struggled for a while with the array dimensions. In the end I had to do an awkward reshape plus transpose so that the outputs array really had two dimensions, one of them with size 1 (a 1 × N row vector), instead of being a flat 1-D array. I suspected there was a simpler way to achieve this, but I couldn’t find the right wording for my problem and/or a good answer.
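For what it’s worth, here is a small standalone example of the reshape+transpose I ended up with, together with two simpler NumPy alternatives that produce the same 1 × N row vector (the toy numbers are just for illustration):

import numpy as np
#
outputs = np.array([5.0, 6.0, 5.0, 7.0, 4.0])        # flat 1-D array, shape (5,)
#
# What I did: reshape into an N x 1 column vector, then transpose
a = np.reshape(outputs, (outputs.shape[0], 1)).T      # shape (1, 5)
#
# Simpler alternatives that give the same 1 x N row vector
b = outputs.reshape(1, -1)                            # shape (1, 5)
c = outputs[np.newaxis, :]                            # shape (1, 5)
#
print(a.shape, b.shape, c.shape)                      # (1, 5) (1, 5) (1, 5)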

I adapted the code from the examples given; the one in the GitHub repository had strange dependencies, and the other one didn’t have a function to evaluate the error, which I needed in order to use a testing set.

Once everything was apparently working, I ran some tests with different learning rates to see how the training error and the testing error would change. The full results are here, but the table below is a summary (a sketch of how such a sweep could be automated follows the table).

Experiment                         Learning Rate   Final Training Error   Final Testing Error
#1, 1 hidden layer, 100k epochs    0.01            4.817587               24.049231
#2, 1 hidden layer, 20k epochs     0.01            4.817587               24.049233
#3, 1 hidden layer, 20k epochs     0.10            4.820666               23.893846
#4, 1 hidden layer, 20k epochs     0.90            4.812007               24.211538
#5, 1 hidden layer, 20k epochs     0.05            4.823552               23.825385
#6, 2 hidden layers, 20k epochs    0.05            4.819319               23.893077
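For reference, this is a minimal sketch of how a sweep like the one above could be automated; it assumes the Mlp class and the prepared training/testing arrays from the listing further down (the actual script only runs a single fixed learning rate):

# Assumes Mlp, input_size, hidden_size, output_size, epoch_cnt, batch_size,
# report_freq and the prepared arrays from the listing below.
for lr in [0.01, 0.05, 0.10, 0.90]:
    # Build a fresh network for each run so earlier training doesn't carry over
    mlp = Mlp([input_size, hidden_size, output_size], "tanh")
    print("--- learning rate %.2f ---" % lr)
    mlp.train(training_inputs, training_outputs,
              testing_inputs, testing_outputs,
              lr, epoch_cnt, batch_size, report_freq)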

The lowest training error corresponded to the learning rate of 0.9, and it also had the worst testing error. On the other hand, the lowest testing error corresponded to the learning rate of 0.05. I tried the same learning rate with an additional hidden layer, but the improvement showed up in the training error and not in the testing error. I still don’t completely understand the units of the error and what they imply for a real-world case; looking at the code, the training error is a mean absolute error per example and per output dimension, while the testing error comes from getErrorRate, which is a mean squared error, so the two columns are not directly comparable. It probably also matters that the final tanh layer can only produce outputs between -1 and 1 while the quality scores are on a 0-to-10 scale, which would explain why the training error stays close to 5. At the end I wanted to include an explicit test of one example to compare the predicted output with the expected output, but I ran into more dimension problems and had to let it go (see the note after the code listing).

The code that I wrote/remixed can be found here.

import numpy as np
import csv
#
# Activation function definitions:
#
def sigmoid_fn(x):
    return 1.0 / ( 1.0 + np.exp( -x ) )
#
def sigmoid_dfn(x):
    y = sigmoid_fn( x )
    return y * ( 1.0 - y )
#
def tanh_fn(x):
    return np.tanh( x )  # equivalent to sinh/cosh but doesn't overflow for large |x|
#
def tanh_dfn(x):
    return 1.0 - np.power( tanh_fn( x ), 2.0 )
#
# Remix between https://github.com/Hebali/learning_machines/blob/master/hyperparameter_hunt/Supervised.py and http://www.patrickhebron.com/learning-machines/code/mlp.txt
# MLP Layer Class:
#
class MlpLayer:
    def __init__(self,input_size,output_size):
        self.weights = np.random.rand( output_size, input_size ) * 2.0 - 1.0
        self.bias    = np.zeros( ( output_size, 1 ) )
#
# MLP Class:
#
class Mlp:
    def __init__(self,layer_sizes,activation_fn_name):
        # Create layers:
        self.layers = []
        for i in range( len( layer_sizes ) - 1 ):
            self.layers.append( MlpLayer( layer_sizes[ i ], layer_sizes[ i + 1 ] ) )
        # Set activation function:
        if activation_fn_name == "tanh":
            self.activation_fn  = tanh_fn
            self.activation_dfn = tanh_dfn
        else:
            self.activation_fn  = sigmoid_fn
            self.activation_dfn = sigmoid_dfn
#
    def predictSignal(self,input):
        # Setup signals:
        activations = [ input ]
        outputs     = [ input ]
        # Feed forward through layers:
        for i in range( 1, len( self.layers ) + 1 ):
            # Compute activations:
            curr_act = np.dot( self.layers[ i - 1 ].weights, outputs[ i - 1 ] ) + self.layers[ i - 1 ].bias
            # Append current signals:
            activations.append( curr_act )
            outputs.append( self.activation_fn( curr_act ) )
        # Return signals:
        return activations, outputs
#
    def predict(self,input):
        # Feed forward:
        activations, outputs = self.predictSignal( input )
        # Return final layer output:
        return outputs[ -1 ]
#
    def getErrorRate(self, labels, guesses):
        return np.mean( np.square( labels - guesses ) )
#
    def trainEpoch(self,input,target,learn_rate):
        num_outdims  = target.shape[ 0 ]
        num_examples = target.shape[ 1 ]
        # Feed forward:
        activations, outputs = self.predictSignal( input )
        # Setup deltas:
        deltas = []
        count  = len( self.layers )
        # Back propagate from final outputs:
        deltas.append( self.activation_dfn( activations[ count ] ) * ( outputs[ count ] - target ) )
        # Back propagate remaining layers:
        for i in range( count - 1, 0, -1 ):
            deltas.append( self.activation_dfn( activations[ i ] ) * np.dot( self.layers[ i ].weights.T, deltas[ -1 ] ) )
        # Compute batch multiplier:
        batch_mult = learn_rate * ( 1.0 / float( num_examples ) )
        # Apply deltas:
        for i in range( count ):
            self.layers[ i ].weights -= batch_mult * np.dot( deltas[ count - i - 1 ], outputs[ i ].T )
            self.layers[ i ].bias    -= batch_mult * np.expand_dims( np.sum( deltas[ count - i - 1 ], axis=1 ), axis=1 )
        # Return error rate:
        return ( np.sum( np.absolute( target - outputs[ -1 ] ) ) / num_examples / num_outdims )
#
    def train(self,input,target,validation_samples,validation_labels,learn_rate,epochs,batch_size = 10,report_freq = 10):
        num_examples = target.shape[ 1 ]
        # Iterate over each training epoch:
        print ("Epoch,\tTraining Error,\tValidation Error")
#
        for epoch in range( epochs ):
            error = 0.0
            # Iterate over each training batch:
            for start in range( 0, num_examples, batch_size ):
                # Compute batch stop index:
                stop = min( start + batch_size, num_examples )
                # Perform training epoch on batch:
                batch_error = self.trainEpoch( input[ :, start:stop ], target[ :, start:stop ], learn_rate )
                # Add scaled batch error to total error:
                error += batch_error * ( float( stop - start ) / float( num_examples ) )
            # Report error, if applicable:
            if epoch % report_freq == 0:
                validation_guesses = self.predict( validation_samples )
                validation_error = self.getErrorRate( validation_labels, validation_guesses )
                # Print report:
                print ("%d,\t%f,\t%f" % ( epoch, error, validation_error ))
#
# Testing with Wine Quality
# https://archive.ics.uci.edu/ml/datasets/Wine+Quality
#Hyperparameters
number_of_samples = 6497
input_size = 11
hidden_size = 20
hidden_size_2 = 20   # size of an optional second hidden layer (not used in the network built below)
output_size = 1
percentage_training_data = 0.8
batch_size  = 100
epoch_cnt   = 20000
report_freq = 100
learn_rate  = 0.05
#
training_size = int(number_of_samples*percentage_training_data)
testing_size = number_of_samples - training_size
#
# Create MLP
mlp = Mlp([input_size, hidden_size, output_size],"tanh")
#
# dataset
# Create empty matrix for inputs
data = np.zeros( (number_of_samples, input_size+output_size) )
counter = 0
# CSV reference: https://docs.python.org/3/library/csv.html
with open("winequality-red.csv",newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for index,row in enumerate(reader):
        if(index != 0): # If it is not the first row
            data[counter] = row
            counter += 1
#
with open("winequality-white.csv",newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for index,row in enumerate(reader):
        if(index != 0): # If it is not the first row
            data[counter] = row 
            counter += 1
#
# After the data has been loaded, shuffle it
np.random.shuffle(data)
#
# Separate the data in inputs and outputs
inputs = data[:, 0:-1]
outputs = data[:, -1]
#
# Separate further into training and testing
training_inputs = inputs[0:training_size]
training_outputs = outputs[0:training_size]
#
# Adjust dimensions: the MLP expects examples as columns,
# so inputs become (11, N) and outputs a (1, N) row vector
training_inputs = training_inputs.T
training_outputs = np.reshape(training_outputs, (training_outputs.shape[0],1)).T
#
testing_inputs = inputs[training_size:]
testing_outputs = outputs[training_size:]
#
testing_inputs = testing_inputs.T
testing_outputs = np.reshape(testing_outputs, (testing_outputs.shape[0],1)).T
#
# Train
#mlp.train( training_inputs, training_outputs, learn_rate, epoch_cnt, batch_size, report_freq )
mlp.train( training_inputs, training_outputs, testing_inputs, testing_outputs, learn_rate, epoch_cnt, batch_size, report_freq )
# Predict a single example (not working as written; see the note after the listing)
#print("Given output")
#print(mlp.predict(testing_inputs[0]))
#print("Expected output")
#print(testing_outputs[0])
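A note on the broken single-example test above: I think the issue is that testing_inputs[0] picks the first row of the transposed matrix, i.e. one feature across all examples, rather than one example. Slicing a single column instead keeps the 2-D shape the network expects; a sketch using the variables from the listing:

# Predict a single example: take one column (shape (11, 1)), not one row
sample = testing_inputs[:, 0:1]       # first test example, kept 2-D
print("Given output")
print(mlp.predict(sample))            # network prediction, shape (1, 1)
print("Expected output")
print(testing_outputs[:, 0])          # the corresponding quality score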