My last blog post was about neural networks and the work I was doing with them in a college course.
That project is now over. Here are some of the experiments I ran and what I’ll do next regarding this topic.
The first experiment was very simple. I used it more to see the implementation working as a whole than to test the correctness of the neural network, the cost functions, etc.
I wanted to see if I could create a network, train it, feed it some input and get some output. One interesting thing I implemented and also tested was serialization. Consider the following code:
net = NeuralNetwork([1, 1])
net.train(training, its=300, step=0.5, verbose=True)
with open("neg_network_file", "wb") as f:
    net.dump(f)
What is happening here is that a very simple network is created, trained, and dumped into a file. Dumping the network to a file means that the configuration of the network, along with the values of its weights and biases, is stored on disk for later use. After training a network and dumping it, it is possible to load it from that file in some other execution, like so:
new_net = NeuralNetwork()
with open("neg_network_file", "rb") as f:
    new_net.load(f)
new_net.feedForward(np.array())
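A dump/load pair like the one above could be sketched with `pickle`. This is only a minimal sketch, not the project's actual code: `TinyNetwork` and its attribute names are hypothetical stand-ins, and it assumes the saved state is just the layer sizes plus the weight and bias arrays.

```python
import io
import pickle

import numpy as np


class TinyNetwork:
    """Hypothetical stand-in for the post's NeuralNetwork class."""

    def __init__(self, sizes=None):
        # sizes lists the neurons per layer, e.g. [1, 1] for 1 input and 1 output.
        self.sizes = list(sizes) if sizes else []
        rng = np.random.default_rng(0)
        pairs = list(zip(self.sizes[:-1], self.sizes[1:]))
        self.weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in pairs]
        self.biases = [rng.standard_normal(n_out) for _, n_out in pairs]

    def dump(self, f):
        # Write the configuration and the parameters to a binary file object.
        state = {"sizes": self.sizes, "weights": self.weights, "biases": self.biases}
        pickle.dump(state, f)

    def load(self, f):
        # Restore everything from a file object previously written by dump().
        state = pickle.load(f)
        self.sizes = state["sizes"]
        self.weights = state["weights"]
        self.biases = state["biases"]


# Round trip through an in-memory buffer; a real file opened
# with "wb" and then "rb" would be used the same way.
net = TinyNetwork([1, 1])
buffer = io.BytesIO()
net.dump(buffer)
buffer.seek(0)

new_net = TinyNetwork()
new_net.load(buffer)
```

Storing the layer sizes next to the parameters is what lets the loading side start from an empty network, as in the snippet above.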
The negation experiment itself just creates a network with 1 input, 1 output, and no middle layer. It then creates a dummy training dataset which contains 100 entries of each:
[1, 0]
[0, 1]
So 100 entries state that for an input of 1 the output should be 0, and another 100 entries state the opposite. The idea is that the input is negated. Feeding any value between 0 and 1 to this network will yield the opposite value with little to no error. Feeding a value like 0.2, for example, will yield a value close to 0.8.
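The negation setup is small enough to sketch in plain NumPy. This is a rough sketch under assumptions: a single sigmoid neuron, full-batch gradient descent, and a cross-entropy cost (the post does not say which cost function the project used). The names `sigmoid` and `feed_forward` are illustrative and not taken from the project.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# 100 entries of [1, 0] and 100 entries of [0, 1], as (input, target) pairs.
training = np.array([[1.0, 0.0]] * 100 + [[0.0, 1.0]] * 100)
x, y = training[:, 0], training[:, 1]

# One input, one output, no middle layer: a single weight and a single bias.
w, b = 0.0, 0.0
step = 0.5

for _ in range(300):
    a = sigmoid(w * x + b)
    # Gradient of the cross-entropy cost with a sigmoid output
    # (an assumption; the project may have used a different cost).
    error = a - y
    w -= step * np.mean(error * x)
    b -= step * np.mean(error)


def feed_forward(value):
    return sigmoid(w * value + b)


out_one = feed_forward(1.0)   # expected to be low
out_zero = feed_forward(0.0)  # expected to be high
out_low = feed_forward(0.2)   # expected to land on the high side
```

The training pushes the weight negative and the bias positive, so small inputs map to large outputs and the other way around.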
The next experiment was a toy problem to test whether the network was actually being trained and whether it could make correct classifications. To begin with, a random dataset is created. The dataset consists of points in a 2D space, with each coordinate between 0.0 and 1.0. Each point belongs to one of four classes, depending on where it falls. So, if a point’s x coordinate is between, say, 0.0 and 0.5, and y is between 0.0 and 0.1, then that point is in class 1. In some other region, the point would be in class 2, and so on.
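A dataset like this could be generated along the following lines. The class layout here is an assumption made for illustration (four quadrants split at 0.5, numbered 0 to 3 so they can index a one-hot vector), and `label_point` and `make_dataset` are hypothetical names. The actual experiment used several different class configurations.

```python
import numpy as np


def label_point(x, y):
    # Hypothetical layout: split the unit square into four quadrants at 0.5.
    return int(x >= 0.5) + 2 * int(y >= 0.5)  # classes 0..3


def make_dataset(n_points, seed=0):
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, 2))  # coordinates in [0.0, 1.0)
    labels = np.array([label_point(px, py) for px, py in points])
    # One-hot targets: one output neuron per class.
    targets = np.eye(4)[labels]
    return points, labels, targets


points, labels, targets = make_dataset(200)
```

With one-hot targets, the network needs 2 inputs and 4 outputs, and the predicted class is the index of the largest output.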
This experiment was run with different class configurations, iteration counts, learning steps, etc. In the end it verified that the implementation worked.
The real point of the project was to solve the classical digit recognition problem using the MNIST dataset. Of course, being a classical problem, the very same dataset has been used with many different models with great success, and problems more complex than this one have been solved over the years. This project was just for educational purposes.
I’ve run a few experiments: networks without a middle layer, with a middle layer of 15 neurons (like in the book), of 10, 100, and 500 neurons, and a few other things. The best results came from the network without a middle layer, which I find quite strange, and from networks with big middle layers.
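To get a feel for how different these configurations are, the number of trainable parameters can be counted from the layer sizes. This is a small sketch that assumes fully connected layers described by a list of sizes, in the style of `NeuralNetwork([1, 1])` above, with 784 inputs (28x28 pixel images) and 10 outputs (one per digit). `parameter_count` is a hypothetical helper, not part of the project.

```python
def parameter_count(sizes):
    # Each pair of adjacent layers contributes n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


# MNIST images are 28x28 pixels (784 inputs) and there are 10 digit classes.
configs = {
    "no middle layer": [784, 10],
    "10 neurons": [784, 10, 10],
    "15 neurons": [784, 15, 10],
    "100 neurons": [784, 100, 10],
    "500 neurons": [784, 500, 10],
}
counts = {name: parameter_count(sizes) for name, sizes in configs.items()}
```

Under these assumptions the network without a middle layer has 7,850 parameters, while the one with a 500-neuron middle layer has 397,510.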
In the end I noticed that Kaggle had an open competition for this dataset, so I made a submission. My submission is in place 597 out of 651 with a hit rate of 87%, which sucks. But as this was just an educational project, I am satisfied for now.
In the future I may try to get better results by trying other training methods and different network configurations.