
Building a Matrix-based Neural Network From the Ground Up

Earlier in the summer I began my quest to learn more about machine learning, originally planning to just dive into the deep end with TensorFlow. I very quickly realized the error in this approach and changed course: I would instead build a neural network from the ground up. Not only did I want to better understand the fundamental math, I also wanted to familiarize myself with the mathematics libraries needed to work effectively with the mountain of data used to train a network.

Machine learning by xkcd

Looking for guidance in my endeavor, I naturally turned to Google. Glancing over the search results, I found that almost all of them could be described by a Venn diagram with two categories: “Mathematically rigorous” and “Computationally well-designed.” Sadly, this Venn diagram features very little overlap. Wanting to have my cake and eat it too, I found the best of each category and spent what little free time I had this past summer melding them together (because c’mon, who doesn’t like Cake? They’re probably my favorite '90s alt-rock band).

In the “Mathematically rigorous” corner, Michael Nielsen’s neuralnetworksanddeeplearning.com is one of the best resources for anyone looking to get their feet wet in artificial neural networks, and for the can’t-be-beat price of free (But seriously, if you do read this book, please kick the author a few bucks. If a broke college student can do it, you can too)! Obviously, having some calculus and linear algebra under your belt is nice, but even if you don’t, Nielsen’s explanations of the underlying mathematical principles are very readable. The code in the book, on the other hand, is exactly what you would expect research code to look like. That isn’t inherently bad, but PEP 8 isn’t too complicated and it greatly improves readability, a central tenet of Python. Indentation and whitespace aside, vanilla Python’s memory management leaves more and more to be desired as the amount of data it has to store grows, and its time complexity is a similar story. Thankfully Nielsen addresses this early on, challenging the reader to implement the code using a more efficient library, such as NumPy.

Great! What’s NumPy? This led me to our top contender in the “Computationally well-designed” corner, a series of articles by machine learning researcher Brian Dolhansky. His code is beautiful: fully vectorized with NumPy and not a God object to be found! …nor, however, an explanation of why a softmax activation function is applied in the output layer while the rest are sigmoid. It’s assumed either that you already know the basics or that you can learn them later. This is perfectly fine, as Nielsen’s book gives an excellent explanation of softmax neurons, as well as other more advanced techniques like weight regularization. Reverse-engineering Dolhansky’s code gave me an understanding of the framework needed to implement neural-network-based algorithms in a computationally efficient way.
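
Just how big a difference does that make? Here’s a rough, unscientific sketch comparing a pure-Python triple-loop matrix multiplication against a single call to np.dot (the exact timings will depend on your machine; it’s the order-of-magnitude gap that matters):

import time
import numpy as np

a = np.random.rand(200, 200)
b = np.random.rand(200, 200)


def matmul_pure_python(a, b):
    # Naive triple loop over rows, columns, and the shared inner dimension
    n, m, p = a.shape[0], a.shape[1], b.shape[1]
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i, k] * b[k, j]
    return c


start = time.time()
matmul_pure_python(a, b)
print("Pure Python: {0:.3f} seconds".format(time.time() - start))

start = time.time()
np.dot(a, b)
print("NumPy:       {0:.5f} seconds".format(time.time() - start))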

So without further ado, here is my attempt at a ground-up implementation of a neural network.

import numpy as np
import activations as sigma
import costs as cost


class Layer:
    def __init__(self, size, minibatch_size, input_layer=False,
                 output_layer=False, activation=sigma.prelu):
        self.size = size
        self.input_layer = input_layer
        self.output_layer = output_layer
        self.activation = activation
        self.z = np.zeros((minibatch_size, size[0]))
        if not input_layer:
            self.s = np.zeros((minibatch_size, size[1]))
            self.delta = np.zeros((minibatch_size, size[1]))
            # Initialize weights from a Gaussian distribution scaled by the
            # number of inputs to each neuron (1/sqrt(n_in))
            self.weights = np.random.normal(size=size,
                                            scale=1/np.sqrt(size[0]))
            self.del_w = np.zeros_like(self.weights)
        if not input_layer and not output_layer:
            self.f_prime = np.zeros((minibatch_size, size[1]))
            # Initialize neurons with no bias, allowing biases to be learned
            # through backpropagation
            self.biases = np.zeros((1, size[1]))
            self.del_b = np.zeros_like(self.biases)

    def forward_propagate(self):
        if self.input_layer:  # No activation function applied to input layer
            return self.z
        if not self.input_layer and not self.output_layer:
            self.s = np.dot(self.s, self.weights)
            self.s = np.add(self.s, self.biases)
            self.f_prime = self.activation(self.s, deriv=True)
        else:
            self.s = np.dot(self.s, self.weights)
        self.z = self.activation(self.s)
        return self.z


class Network:
    def __init__(self, sizes, minibatch_size, cost=cost.cross_entropy):
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.layers = np.empty(self.num_layers, dtype=object)
        self.minibatch_size = minibatch_size
        self.cost = cost
        print "Initializing network..."
        for i in range(self.num_layers-1):
            if i == 0:
                print "\tInitializing input layer of size {0}.".format(
                    sizes[i])
                self.layers[i] = Layer([sizes[i]], minibatch_size,
                                       input_layer=True)
            else:
                print "\tInitializing hidden layer of size {0}.".format(
                    sizes[i])
                self.layers[i] = Layer([sizes[i-1], sizes[i]], minibatch_size)
        print "\tInitializing output layer of size {0}.".format(sizes[-1])
        self.layers[-1] = Layer([sizes[-2], sizes[-1]], minibatch_size,
                                output_layer=True, activation=sigma.softmax)
        print "Done!"

    def forward_propagate(self, input_data):
        self.layers[0].z = input_data
        for i in range(self.num_layers-1):
            # Each layer's weighted input is the previous layer's activation
            self.layers[i+1].s = self.layers[i].forward_propagate()
        return self.layers[-1].forward_propagate()

    def backpropagate(self, y_hat, label):
        # Calculate the output layer's error signal from the derivative of
        # the cost function
        self.layers[-1].delta = (self.cost).error(y_hat, label)
        # Propagate the error backward through the hidden layers
        for i in range(self.num_layers-2, 0, -1):
            self.layers[i].delta = np.dot(self.layers[i+1].delta,
                                          self.layers[i+1].weights.T) * \
                self.layers[i].f_prime

    def update_weights(self, n_examples, eta, lmbd):
        for i in range(1, self.num_layers):
            if i < self.num_layers-1:
                self.layers[i].del_b = \
                    np.dot(np.ones((1, self.minibatch_size)),
                           self.layers[i].delta)
                self.layers[i].biases = self.layers[i].biases - \
                    (eta / self.minibatch_size) * self.layers[i].del_b
            self.layers[i].del_w = np.dot(self.layers[i-1].z.T,
                                          self.layers[i].delta)
            # Apply L2 regularization to weight updates with regularization
            # rate lambda
            self.layers[i].weights = (1 - eta * (lmbd / n_examples)) * \
                self.layers[i].weights - (eta / self.minibatch_size) * \
                self.layers[i].del_w

    def train(self, data, n_examples, eta, lmbd):
        inputs, labels = data
        output = self.forward_propagate(inputs)
        self.backpropagate(output, labels)
        self.update_weights(n_examples, eta=eta, lmbd=lmbd)
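
Before wiring up any real data, a quick smoke test of the forward pass is reassuring. This is just a sketch, and it assumes the code above is saved as matrix_network.py with the activations and costs modules shown below sitting alongside it:

import numpy as np
import matrix_network

# A tiny network fed a random minibatch of 5 "images"; the softmax output
# layer should produce a (5, 10) array of class probabilities
net = matrix_network.Network([784, 30, 10], 5)
dummy_input = np.random.rand(5, 784)
output = net.forward_propagate(dummy_input)
print(output.shape)            # (5, 10)
print(np.sum(output, axis=1))  # each row sums to (approximately) 1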

The activations module contains a few different activation functions to play around with.

# Module of activation functions for use in matrix-based neural networks
import numpy as np


def sigmoid(s, deriv=False):
    if not deriv:
        return 1.0 / (1.0 + np.exp(-s))
    else:
        return sigmoid(s) * (1.0 - sigmoid(s))


def softmax(s):
    # Normalize each row so the outputs form a probability distribution
    z = np.sum(np.exp(s), axis=1)
    z = z.reshape(z.shape[0], 1)
    return np.exp(s) / z


def relu(s, deriv=False):
    # Vectorized, so the input array is left untouched
    if not deriv:
        return np.where(s <= 0, 0.0, s)
    else:
        return np.where(s <= 0, 0.0, 1.0)


def prelu(s, alpha=0.1, deriv=False):
    # "Leaky" variant of relu that scales negative inputs by alpha instead
    # of zeroing them out
    if not deriv:
        return np.where(s <= 0, alpha * s, s)
    else:
        return np.where(s <= 0, alpha, 1.0)
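
Here’s a quick way to play around with them (assuming the module is saved as activations.py, as the network code above expects):

import numpy as np
import activations as sigma

s = np.array([[-2.0, 0.5, 3.0]])
print(sigma.sigmoid(s))            # every value squashed into (0, 1)
print(sigma.prelu(s))              # negative entries scaled by alpha = 0.1
print(sigma.prelu(s, deriv=True))  # derivative: alpha where s <= 0, 1 elsewhere
print(np.sum(sigma.softmax(s)))    # the softmax outputs sum to 1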

Likewise, the cost module includes a couple of different cost functions for use with different activation functions.

# Module of cost functions for use in matrix-based neural networks
import numpy as np


class quadratic(object):
    @staticmethod
    def func(y_hat, label):
        # Return cost of actual output "y_hat" with respect to expected
        # output "label"
        return 0.5*np.linalg.norm(y_hat - label)**2

    @staticmethod
    def error(y_hat, label):
        # Return the error signal "delta" from the output layer
        return (y_hat - label)


class cross_entropy(object):
    @staticmethod
    def func(y_hat, label):
        # Return cost of actual output "y_hat" with respect to expected
        # output "label"
        return np.sum(np.nan_to_num(-label * np.log(y_hat) - (1 - label) *
                                    np.log(1 - y_hat)))

    @staticmethod
    def error(y_hat, label):
        # Return the error signal "delta" from the output layer
        return (y_hat - label)
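
A similar sanity check shows the cross-entropy cost doing its job, staying small for a confident correct prediction and blowing up for a confident wrong one (assuming the module is saved as costs.py):

import numpy as np
import costs as cost

label = np.array([[0.0, 1.0]])             # the desired one-hot output
confident_right = np.array([[0.1, 0.9]])
confident_wrong = np.array([[0.9, 0.1]])

print(cost.cross_entropy.func(confident_right, label))  # ~0.21
print(cost.cross_entropy.func(confident_wrong, label))  # ~4.61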

Although I’m not working with the raw MNIST dataset, the pickled data still needs to be loaded and requires some additional pre-processing before training.

import cPickle
import gzip
import numpy as np


def load_data():
    data = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(data)
    data.close()
    training_inputs, training_labels = training_data
    # Convert each training label from a digit into a one-hot vector
    vectorized_labels = np.zeros((len(training_labels), 10))
    for i in range(len(training_labels)):
        vectorized_labels[i] = vectorize_label(training_labels[i])
    training_data = zip(training_inputs, vectorized_labels)
    validation_data = zip(*validation_data)
    test_data = zip(*test_data)
    return training_data, validation_data, test_data


def vectorize_label(i):
    vectorized_label = np.zeros((10))
    vectorized_label[i] = 1
    return vectorized_label
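
Assuming mnist.pkl.gz is sitting in ../data/ as the loader expects, here’s a peek at what comes back:

import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data()
first_image, first_label = training_data[0]
print(len(training_data))  # number of training examples
print(first_image.shape)   # (784,) -- a flattened 28x28 pixel image
print(first_label)         # the label as a one-hot vector of length 10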

The oh-so-cleverly named evaluator module handles the training and evaluation of the network, reporting the results of each epoch.

# Evaluates neural networks with given hyper-parameters and reports results
import numpy as np


def evaluate(network, epochs, eta, lmbd, training_data,
             validation_data=None, test_data=None):
    n_training = len(training_data)
    print "Training for {0} epochs...".format(epochs)
    for t in range(epochs):
        np.random.shuffle(training_data)
        out_str = "\tEpoch {0:2d}:".format(t+1)
        for i in range(n_training/network.minibatch_size):
            data = create_minibatch(training_data, i, network.minibatch_size,
                                    training_data=True)
            network.train(data, n_training, eta, lmbd)
        if validation_data:
            n_validation = len(validation_data)
            n_correct = 0
            for i in range(n_validation/network.minibatch_size):
                inputs, labels = create_minibatch(validation_data, i,
                                                  network.minibatch_size)
                output = network.forward_propagate(inputs)
                # The prediction is the output neuron with the highest
                # activation
                y_hat = np.argmax(output, axis=1)
                n_correct += np.sum(y_hat == labels)
            out_str = "{0} Validation accuracy: {1:.2f}%".format(
                out_str, float(n_correct)/n_validation * 100)
        if test_data:
            n_test = len(test_data)
            n_correct = 0
            for i in range(n_test/network.minibatch_size):
                inputs, labels = create_minibatch(test_data, i,
                                                  network.minibatch_size)
                output = network.forward_propagate(inputs)
                y_hat = np.argmax(output, axis=1)
                n_correct += np.sum(y_hat == labels)
            out_str = "{0} Test accuracy: {1:.2f}%".format(
                out_str, float(n_correct)/n_test * 100)
        print out_str


def create_minibatch(data, i, minibatch_size, training_data=False):
    inputs, labels = zip(*data)
    n = np.size(inputs[0], axis=0)
    minibatch_inputs = np.zeros((minibatch_size, n))
    if training_data:
        # Training labels are one-hot vectors; validation and test labels
        # are plain digits
        minibatch_labels = np.empty((minibatch_size, 10))
        for j in range(minibatch_size):
            minibatch_inputs[j, :] = inputs[i*minibatch_size + j]
            minibatch_labels[j, :] = labels[i*minibatch_size + j]
    else:
        minibatch_labels = np.empty(minibatch_size)
        for j in range(minibatch_size):
            minibatch_inputs[j, :] = inputs[i*minibatch_size + j]
            minibatch_labels[j] = int(labels[i*minibatch_size + j])
    return minibatch_inputs, minibatch_labels

Finally, I wrote a quick script to initialize networks and evaluate them on the MNIST dataset with given hyper-parameters, greatly reducing the amount of repetitive typing, though sadly not eliminating it entirely.

import mnist_loader
import matrix_network
import evaluator

training_data, validation_data, test_data = mnist_loader.load_data()
network = matrix_network.Network([784, 100, 10], 100)
evaluator.evaluate(network, 30, 1.0, 0.1, training_data,
                   validation_data, test_data)

There’s quite a bit going on in the code overall, despite the relatively simple algorithms underpinning it. The power of these networks lies in the scale of data that can be leveraged by these algorithms when implemented using modern matrix libraries. In the next few weeks, I’ll attempt to shed some light on what’s happening under the hood.

❋❋❋

Roll Your Own Python IDE

I’ve found myself using Python significantly more in the past week and quickly grew tired of working in a shell in Terminal, so I started looking around for a good IDE that supports Python. Since I already use Xcode for iOS development, I tried it first and was surprised to discover that you can specify whatever build tool you’d like for a project; however, wrestling with auto-indentation appearing in all the wrong places became frustrating, to say the least. Going back to the drawing board, I started scouring python.org’s wiki of various IDEs, trying just about every macOS-supported IDE, extension, and plugin listed. My wishlist of requirements and features might be a little long, but hey, almost all of my time coding in Python will be spent within this integrated development environment (hence the name), so I think I’m allowed to be a bit picky. Here it is:

  • Source code editor
  • Syntax-highlighting
  • Linter
  • Autocomplete
  • Compiler
  • Debugger
  • Autosave
  • Version control (Preferably through Git)
  • Free and open-source
  • Native application

Ok, so those last two aren’t deal-breakers, but they’d be greatly appreciated, especially that last one, given how atrocious Java app UIs tend to be (Looking at you, CrashPlan).

I had all but given up hope of finding an IDE and started looking into text editors that could fulfill at least part of my wishlist, which is when I found Atom, “A hackable text editor for the 21st century.” Now, I had heard of Atom before as one of the contenders in the modern incarnation of the “vim vs emacs” war, facing off against Sublime. As a result, I had always written it off as another glorified iteration of TextEdit, until I found this post about configuring it for Python. Configuring Atom with just the packages mentioned in the post crossed off quite a few of my wishlist requirements, even the optional points regarding a native, open-source application. Seeing the plethora of other packages available, I tried taking things a step further and rolled my own Python IDE in Atom.


Atom, a hackable text editor for the 21st century.

Installing Packages

There are two ways to install packages in Atom: you can either search for packages published on atom.io from the Install tab in Preferences, or use apm in Terminal. For example, to install Linter for Atom from Terminal, just enter:

$ apm install linter

Atom needs to be restarted after packages are installed for changes to take effect.

Configuring Atom

If you haven’t already, install Linter for Atom, mentioned above. Additionally, we’re going to need a Python source code checker, flake8, which we can get through pip:

$ pip install flake8

We’ll also need an interface between linter and flake8, linter-flake8. Again, you can use the Install tab in Preferences, or:

$ apm install linter-flake8

For autocomplete, you can install the autocomplete-python package, while the script and python-debugger packages provide compiling and debugging functionality, respectively. There are a few dependency packages that Atom might ask you to install after restarting, such as intentions and busy-signal. Finally, autosave is a pre-installed package in Atom but is disabled by default. To enable it, go to the Packages tab in Preferences and search for “autosave,” which will appear under Core Packages; in its settings, select “Enabled.” But what about version control? I almost forgot! Atom comes with Git support and core APIs already baked in.
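
As with the earlier packages, the ones mentioned above can also be grabbed from Terminal if you prefer (or through the Install tab in Preferences):

$ apm install autocomplete-python
$ apm install script
$ apm install python-debugger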

Atom is incredibly customizable thanks to the wealth of packages available, and this process could easily be applied to other languages. I look forward to seeing how far this text editor can go, as I explore and tinker with everything Atom has to offer.

❋❋❋

Machine Learning 101

Computers have become ubiquitous in this day and age, permeating every facet of our lives. Considered to be Turing complete, they are capable of carrying out any set of given instructions, unwaveringly. While this is perfect for tasks involving exact mathematical processes, say, calculating pi to over 22 trillion digits, they can’t perform simple tasks a child could, such as distinguishing a cat from a bird or reading handwriting. Machine learning is the field of computer science that aims to achieve these feats by giving computers the ability to operate without explicit instruction.

There are a variety of different approaches and algorithms being studied, from decision trees to inductive logic programming. One technique in particular, artificial neural networks, tries to emulate what some might say is the most advanced computer ever known, the human brain. Neural networks, as the name implies, are simply networks of interconnected nodes known as neurons. Neurons process multiple inputs, and decide whether or not to pass a signal on to the next one. The simplicity of individual neurons allows them to be interconnected into vast networks, amplifying their processing power with each layer.


A simple three layer neural network, courtesy Wikimedia Commons.

As ever-larger artificial neural networks are built with more and more layers, their hidden layers grow deeper and deeper; this is where the term “Deep Learning” comes from.

From Biology to Bits

Biological neurons communicate through chemicals called neurotransmitters. These molecules bind with receptors on the cell’s dendrites and alter the electric potential across its membrane by allowing ions to flow in and out of the cell. When a large enough electric potential is produced, it propagates through the neuron and down its axon, where the neuron releases neurotransmitters across a synapse to the dendrites of the next neuron.

All of that is a little convoluted to try to simulate, but we can reduce the process to something mathematically model-able. Inputs can be anything from a dimension of a dataset to a simple binary yes or no. An artificial neuron multiplies each input by a variable weight, then sums the results. Input weights are adjusted through a process called backpropagation (we’ll get more into this later), which allows the network to evolve and produce “better” results (by whatever metric that’s measured) over time. To determine whether or not the neuron will “fire,” and what signal to pass to the next, the sum of the weighted inputs is passed through an activation function. A biological neuron’s activation function is simply a threshold that must be surpassed, but an artificial neuron’s can be more complicated. One activation function commonly employed is the sigmoid function.


The sigmoid function, courtesy Wikimedia Commons.

The sigmoid function is an “S”-shaped curve that maps the weighted input sum to a value between 0 and 1 (which helps simplify calculations with normalized data).
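
To make that concrete, here’s a minimal sketch of a single artificial neuron in Python (the weights and inputs are made up purely for illustration):

import numpy as np


def sigmoid(s):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-s))


# Made-up weights and inputs for a neuron with three inputs
weights = np.array([0.8, -0.4, 0.3])
inputs = np.array([1.0, 0.5, 2.0])

# Multiply each input by its weight, sum the results, then pass the sum
# through the activation function to see how strongly the neuron "fires"
output = sigmoid(np.dot(weights, inputs))
print(output)  # roughly 0.77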

Making Mistakes

Much like how no one is born knowing how to ride a bike, artificial neural networks must be taught and trained how to process inputs correctly. There are two schools of thought regarding learning: supervised and unsupervised. Backpropagation is a common training method that can be used in both settings, though it is generally considered a supervised learning technique. A two-part process, it begins with error propagation. The network’s output is compared to a desired output, and the error in the output is calculated by a loss function. This error is propagated backward through the network to determine each neuron’s contribution, and these error values are then used to calculate the gradient of the loss function. The gradient is the slope or “steepness” of the loss function, and the lowest point of the loss function is where the error is smallest. The goal of the second part of the process is to find the input weights that lead to this minimum. Optimization methods, such as gradient descent, are used to calculate new input weights for the network. The input weights are updated with these new, “optimized” values, and the process is repeated.
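
For a stripped-down illustration of that loop, here’s a single sigmoid neuron learning one made-up training example by gradient descent (the inputs, target, starting weights, and learning rate are all invented for demonstration):

import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


# One invented training example: three inputs and the output we want to see
inputs = np.array([1.0, 0.5, 2.0])
target = 1.0

weights = np.array([0.1, -0.2, 0.05])  # arbitrary starting weights
eta = 0.5                              # learning rate

for step in range(101):
    # Forward pass: weighted sum passed through the activation function
    output = sigmoid(np.dot(weights, inputs))
    # Loss function: squared error between actual and desired output
    loss = 0.5 * (output - target) ** 2
    if step % 25 == 0:
        print("step {0:3d}: output {1:.3f}, loss {2:.4f}".format(
            step, output, loss))
    # Backward pass: the gradient of the loss with respect to each weight,
    # via the chain rule (d loss/d output * d output/d sum * d sum/d weight)
    gradient = (output - target) * output * (1.0 - output) * inputs
    # Gradient descent: step the weights "downhill" and repeat
    weights -= eta * gradient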


3-D plot of a neuron's error with two inputs, courtesy Wikimedia Commons.

For a step-by-step example of backpropagation, check out this tutorial.

The Tip of the Iceberg

Machine learning and artificial neural networks are exciting tools that are already being integrated into the technology we use every day, and the scope of their use and abilities is only growing. While there are still many unique and perplexing challenges to overcome, one day our interactions with them may become as casual and universal as smartphones are today.