My last blog post was about neural networks and the work I was doing with them in a College Course.
That project is now over. Here are some of the experiments I’ve made and what I’ll do next regarding this topic.
The first experiment was very simple. I used it more to see the implementation working as a whole than to test the correctness of the neural network, the cost function, etc.
I wanted to see if I could create a network, train it, feed it some input and get some output. One interesting thing I implemented and tested was serialization. Consider the following code:
net = NeuralNetwork([1, 1])
net.train(training, its=300, step=0.5, verbose=True)
with open("neg_network_file", "wb") as f:
    net.dump(f)
So what is happening here is that a very simple network is created, trained, and dumped to a file. Dumping the network to a file means that the configuration of the network, along with the values of its weights and biases, is stored on disk for later use. After training a network and dumping it, it is possible to load it from the file in some other execution, like so:
new_net = NeuralNetwork()
with open("neg_network_file", "rb") as f:
    new_net.load(f)
# Example input: the network takes a single value between 0 and 1.
new_net.feedForward(np.array([0.2]))
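To make the dump and load calls above more concrete, here is a minimal sketch of how such a round trip could work. It is not the code from my implementation: the class name TinyNetwork, its attributes, and the use of pickle are all assumptions made for illustration, and the example writes to an in-memory buffer instead of a real file.

```python
import io
import pickle

import numpy as np


class TinyNetwork:
    """Hypothetical stand-in for the NeuralNetwork class (storage only)."""

    def __init__(self, sizes=None):
        self.sizes = list(sizes) if sizes else []
        rng = np.random.default_rng(0)
        # One weight matrix and one bias vector per pair of adjacent layers.
        self.weights = [rng.standard_normal((n_out, n_in))
                        for n_in, n_out in zip(self.sizes[:-1], self.sizes[1:])]
        self.biases = [rng.standard_normal(n_out) for n_out in self.sizes[1:]]

    def dump(self, f):
        """Write layer sizes, weights and biases to a binary file object."""
        pickle.dump({"sizes": self.sizes,
                     "weights": self.weights,
                     "biases": self.biases}, f)

    def load(self, f):
        """Restore the state previously written by dump()."""
        state = pickle.load(f)
        self.sizes = state["sizes"]
        self.weights = state["weights"]
        self.biases = state["biases"]


# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
net = TinyNetwork([1, 1])
net.dump(buffer)
buffer.seek(0)
restored = TinyNetwork()
restored.load(buffer)
```

The point is only that the layer sizes, weights and biases are everything needed to rebuild a trained network later.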
The negation experiment itself just creates a network with 1 input, 1 output, and no middle layer. It then creates a dummy training dataset which contains 100 entries of each:
[1, 0]
[0, 1]
So 100 entries state that for an input of 1 the output should be 0, and another 100 entries state the opposite. The idea is that the input is negated. Feeding any value between 0 and 1 to this network will yield the opposite value with little to no error. Feeding a value like 0.2, for example, will yield a value close to 0.8.
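The negation setup can be sketched in a few lines of NumPy. This is an illustration under assumptions, not my actual code: the (input, target) pair layout, the function names, and the plain batch gradient descent on a single sigmoid neuron are all made up for the example. The defaults of 300 iterations and a step of 0.5 mirror the train call shown earlier.

```python
import numpy as np


def make_negation_dataset(copies=100):
    """Return `copies` entries of (1 -> 0) followed by `copies` of (0 -> 1)."""
    pairs = [(1.0, 0.0)] * copies + [(0.0, 1.0)] * copies
    # Assumed layout: each entry is an (input, target) pair of 1-element arrays.
    return [(np.array([x]), np.array([y])) for x, y in pairs]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_single_neuron(data, its=300, step=0.5):
    """Batch gradient descent on the quadratic cost for one sigmoid neuron."""
    w, b = 0.0, 0.0
    xs = np.array([x[0] for x, _ in data])
    ys = np.array([y[0] for _, y in data])
    for _ in range(its):
        a = sigmoid(w * xs + b)
        # Derivative of 0.5 * (a - y)^2 with respect to the weighted input.
        delta = (a - ys) * a * (1.0 - a)
        w -= step * np.mean(delta * xs)
        b -= step * np.mean(delta)
    return w, b


training = make_negation_dataset()
w, b = train_single_neuron(training)
low_input_output = sigmoid(w * 0.0 + b)   # should lean towards 1
high_input_output = sigmoid(w * 1.0 + b)  # should lean towards 0
```

With no middle layer, the whole network is just one weight and one bias, which is why it works as a quick check that training moves things in the right direction.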
The second experiment was a toy problem to test whether the network was actually being trained and whether it could make correct classifications. To begin with, a random dataset is created. The dataset consists of points in a 2D space, with each coordinate between 0.0 and 1.0. Each point is assigned to one of four classes. So, if a point’s x coordinate is between, say, 0.0 and 0.5, and y is between 0.0 and 0.1, then that point is in class 1. In some other region, the point would be in class 2, and so on.
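A dataset like this can be generated as in the sketch below. The class boundaries here are an assumption chosen for simplicity (the unit square split into four quadrants at 0.5), not the regions I actually used, and the function names are hypothetical.

```python
import numpy as np


def classify_point(x, y):
    """Hypothetical rule: split the unit square into four quadrants at 0.5."""
    if x < 0.5:
        return 0 if y < 0.5 else 1
    return 2 if y < 0.5 else 3


def make_dataset(n_points, seed=0):
    """Random 2D points with their class labels and one-hot targets."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, 2))  # coordinates in [0.0, 1.0)
    labels = np.array([classify_point(x, y) for x, y in points])
    targets = np.eye(4)[labels]  # one output neuron per class
    return points, labels, targets


points, labels, targets = make_dataset(1000)
```

Each row of targets has a single 1 in the position of the point's class, which is the form a network with four output neurons would be trained against.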
This experiment was run with different class configurations, numbers of iterations, learning steps, etc. In the end it was verified that the implementation worked.
The real point of the project was solving the classical digit recognition problem using the MNIST dataset. Of course, being a classical problem, the very same dataset has been used with many different models with great success, and problems more complex than this one have been solved over the years. This project was just for educational purposes.
I’ve run a few experiments with networks without a middle layer, with a middle layer of 15 neurons (like in the book), of 10, 100, and 500 neurons, and a few other things. The best results came from the network without a middle layer, which I find quite strange, and from networks with big middle layers.
In the end I noticed that Kaggle had a competition open for this dataset, so I made a submission. My submission is in place 597 out of 651 with a hit rate of 87%, which sucks. But as this was just an educational project, I am satisfied for now.
In the future I may try to get better results by trying other training methods and different network configurations.
This has been a very busy semester at UÉ, so I haven’t written much for this blog. In this post I give an update on something that has kept me occupied recently: Neural Networks.
I’ve taken a course on Machine Learning. One of the topics that I wanted to explore was Neural Networks. To do so I started by reading books on the subject. Next I found a few things online and worked from that.
The book “Machine Learning”, by Tom Mitchell, was a great starting point. It talked about Perceptrons and Sigmoids, gave an introduction to Gradient Descent, introduced the concept of a Neural Network built from Sigmoid Neurons, and finally talked about Back Propagation and how this algorithm is used to train the network.
After that I looked for things online. Of course, there are many resources available. I found two websites to be of particular interest.
The first website, “Neural Networks and Deep Learning”, really caught my attention. It explains NNs in full detail. It goes on to show an example of using NNs to recognize handwritten numbers using the famous MNIST Dataset. What the network does is take a handwritten number from 0 to 9 in an image of 28×28 pixels and recognize which number it actually represents. The site explains all the equations used in detail and also provides a Python+Numpy implementation.
I wanted to implement my own version of the algorithms. I prefer to do so because I learn much more that way than by just using an implementation found on the Internet. My version is very similar to the one by the author of the book, but it does have a few differences.
Implementing this algorithm took a few hours of understanding every little detail of the equations that describe how Back Propagation calculates the derivatives of the quadratic error function, how Gradient Descent uses those results, how training is done, etc. It was worth it.
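To give an idea of what those equations look like in code, here is a minimal sketch of Back Propagation for a network of sigmoid neurons with the quadratic error function, followed by one Gradient Descent step. It is written for this post as an illustration and is not the code from my repository: the function names and the weight layout (one matrix per layer, shaped outputs × inputs) are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feed_forward(weights, biases, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(w @ activations[-1] + b))
    return activations


def quadratic_cost(weights, biases, x, y):
    """C = 0.5 * ||a_L - y||^2 for a single training example."""
    output = feed_forward(weights, biases, x)[-1]
    return 0.5 * np.sum((output - y) ** 2)


def backprop(weights, biases, x, y):
    """Gradients of the quadratic cost with respect to weights and biases."""
    activations = feed_forward(weights, biases, x)
    out = activations[-1]
    # Error at the output layer: dC/da times the sigmoid derivative.
    delta = (out - y) * out * (1.0 - out)
    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)
    for layer in reversed(range(len(weights))):
        grad_w[layer] = np.outer(delta, activations[layer])
        grad_b[layer] = delta
        if layer > 0:
            # Propagate the error back through this layer's weights.
            a_prev = activations[layer]
            delta = (weights[layer].T @ delta) * a_prev * (1.0 - a_prev)
    return grad_w, grad_b


def gradient_descent_step(weights, biases, x, y, step):
    """Move every weight and bias a small step against its gradient."""
    grad_w, grad_b = backprop(weights, biases, x, y)
    new_weights = [w - step * gw for w, gw in zip(weights, grad_w)]
    new_biases = [b - step * gb for b, gb in zip(biases, grad_b)]
    return new_weights, new_biases
```

A useful sanity check for code like this is to compare one entry of grad_w against a finite-difference estimate of the cost.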
The code can be found on my GitHub page.
I’ve made two simple experiments. They consist of creating a training set of random points, each labelled with a given class. The network then learns from that training set and is able to classify new points into their correct classes.
I don’t have anything too formal to show yet, or any real example. In the following weeks I will be looking into real applications of NNs and I will use my implementation on the MNIST Dataset, like in the book.