My last blog post was about neural networks and the work I was doing with them in a College Course.
That project is now over. Here are some of the experiments I’ve made and what I’ll do next regarding this topic.
The first experiment was very simple. I used it more to see the implementation in action as whole as opposed to test the correctness of the Neural Network and the cost functions, etc.
I wanted to see if I could create a network, train it, feed some input and get some output, etc. One interesting thing implement and also tested were was serialization. Suppose the following code:
net = NeuralNetwork([1, 1]) net.train(training, its=300, step=0.5, verbose=True) with open("neg_network_file", "wb") as f: net.dump(f)
So what is happening here is that a very simple network is created, trained, and dumped into a file. Dumping the network to a file means that the configuration of the network along with the values of weights and biases are stored in memory for later use. After training a network and dumping it, it is possible to load it from memory for later use in some other execution, like so:
new_net = NeuralNetwork() with open("neg_network_file", "rb") as f: new_net.load(f) new_net.feedForward(np.array())
The negation experiment itself just creates a network with 1 input, 1 output, and no middle layer. It then creates a dummy training dataset which contains 100 entries of each:
[1, 0] [0, 1]
So 100 entries state that for and input of 1, the output should be 0, and another 100 entries state the opposite. The idea is that the input is negated. Feeding any values between 0 and 1 to this network will yield the opposite value with little to no error. Feeding a value like, 0.2, for example, will yield a value close to 0.8.
This experiment was made as a toy problem to actually test if the network was being trained and if it could actually make correct classifications. To begin with, a random dataset is created. The dataset consists of points in a 2D space. Each coordinate is between 0.0 and 1.0. It is stated that the points should be placed in one of four classes. So, if a point’s x coordinate is between, say, 0.0 and 0.5, and y is between 0.0 and 0.1, then that point is in class 1. In some other space, the point would be class 2, and so on.
This experiment was made with different class configurations, iterations number, learning steps, etc. In the end it was verified that the implementation worded.
The point of the project was really solving the classical digit recognition problem using the MNIST dataset. Off course, being a classical problem, the very same dataset has been used with many different models with great success, and problems more complex then this has been solved over the years. This project was just for educational purposes.
I’ve made a few experiments with networks without middle layer, with a middle with 15 neurons (like in the book), 10, 100, 500, a few other things. The best results was the network without a middle layer, which I find to be quite strange, and with networks with big middle layers.
In the end I noticed that Kaggle had a competition opened for this dataset. So I made a submission. My submission is in place 597 out of 651 with a hit-rate of 87%, which sucks. But has this was just an educational project I am satisfied for now.
In the future I may try to get better results by trying other training methods and different network configuration.
This as been a very busy semester at UÉ, so I haven’t written much for this blog. In this post I give an update on something in which I’ve been recently occupied, Neural Networks.
I’ve taken a course on Machine Learning. One of the topics that I wanted to explore was Neural Networks. To do so I started by reading books on the subject. Next I found a few things online and work from that.
The book “Machine Learning”, by Tom Mitchel, was a great starting point. It talked about Perceptrons and Sigmoids, gave an introduction for Gradient Descent, introduced the concept of Neural Network built from Sigmoid Neurons, and finally talked a about the Back Propagation and how this algorithm is used to train the network.
After that I’ve looked for things online. Off course, there are many resources available. I found two particular websites to be of interest. These:
The first website, “Neural Networks and Deep Learning”, really caught my attention. It explains NNs in full detail. It goes on to show an example of using NNs to recognize handwritten numbers using the famous MNIST Dataset. What the network does is to take a handwritten number from 0 to 9 in an image of 28×28 pixels and recognize which number it actually represents. The site explains all the equations used in details and also provides a Python+Numpy implementation.
I myself wanted to implement my own version of the algorithms. I prefer to do so because I learn way much more then just by using an implementation found on the Internet. My version is very similar to the one by the author of the book, but it does have a few differences.
Implementing this algorithm took a few hours of understanding every little detail of the equations that describe how Back Propagation calculates the derivatives of the quadratic error function, how Gradient Descent uses those results, how training is done, etc. It was worth it.
The code can be found in my GitHub page.
I’ve made two simple experiments. They consist in creating a training set of random points that get classified as being in a given class. The network then learns from that training set and is able to classify new points into their correct classes.
I don’t have anything to formal to show yet, or any real example. In the following weeks I will be looking into real applications of NNs and I will use my implementation in the MMNIST Dataset, like in the book.
During the last semester of my undergraduate I had a course on Artificial Intelligence. A cool thing about this course is that Prolog was used to implement everything. This had a great advantage. Because it is a logical programming language, Prolog makes it almost trivial to implement any of the algorithms used in that course. The first problem deals with a simple path finding algorithm, the second with moving boxes around.
Every code in this post is hosted at GitHub.
There are two problems to be solved. The first one is very simple. There is a n-by-m maze like the following one:
+-----+ |A XX| |X | | X | | XXX| | B| +-----+
So it is a 5-by-5 space with a few obstacles marked by X making it an incredible maze. The objective here is to have an intelligent agent going from one location A to a location B. So, a simple path finder algorithm, barely intelligent, should solve the problem.
The second problem is similar. Having another maze, like the one before, the objective of the agent is to push a box around from its starting position to a final position, a bit like Sokoban.
A formal language is used to describe the actions of the agent. In the first example those actions are just, move up, move left, etc. Formally, they are written has:
m(a(X, Y), Action).
The first argument of the m/2 term is a term describing the current position of the agent. The second is the action it self.
The second problem has not only the move action but also four more actions that state how the block is pushed.
The Prolog implementation should result in a list of these actions that allow the agent to go from its starting point to its ending point.
A trivial and unintelligent solution for both problems would just consider the maze, the starting position of the agent, and start spamming possible paths. Like this a tree of states can be built.
The root state is represented with the location of the agent at the beginning of the simulation. Supposing the maze in given in the example, the starting state would just be:
m(a(0, 0), start).
So the agent starts in coordinates (0, 0) with action Start. The ending state could be:
m(a(4, 4), \_).
Which means that the agent must end when it reaches the coordinates (4, 4). The second argument of the term is a singleton because it doesn’t matter from where the agent comes, only that it gets there. If it did matter, one could assert that the agent could only finish the maze if it reached the (4, 4) coordinates if it entered from the left, in which case the end state would be:
m(a(4, 4), left).
This is a terrible solution. It will start spamming paths all over the place and soon it would have calculated millions of paths. Even if the system is programmed to not spam related paths, which it is, it is still not a good solution.
The first intelligent thing the agent can do in this situation is to try and take a guess about where it should go next. So instead of trying every possible path until it finds the end, try only the paths that reduce the distance between the starting point and the ending point.
Supposing the agent is in an empty 10-by-10 maze at coordinates (5, 5), and the ending point is at coordinates (5, 0). The naive approach would have the agent going in every direction, spamming paths from (5, 5) to every other point in the maze. The informed search would make the agent only try to go in the direction of (5, 0).
So in this situation, the agent would go to coordinates (5, 4), because these are the ones closer to the ending state. Continuing with the same reasoning, the agent would now go to (5, 3), and so on until it reached the ending state.
To calculate the distance between the current state and the ending state simple metrics can be used. In this implementation the Euclidean Distance and the Manhattan Distance are used. They yield basically the same results, for they are somewhat similar.
This is consistent with the A* algorithms. There is still not a very intelligent solution, but it is a starting point to solve such a simple problem.
There are five files in this implementation. The first file to be considered is boards.pl. This file simply asserts a few boards to be used by the agents. The name of this should be mazes, but I don’t fell like refracting those names right now.
Next are the files part1.pl and part2.pl. This files are the ones that implement some basic rules that describe what the agents can do. This is done with the move predicate. In part 1 there are four move predicates, one of them is this:
move(m(a(X, Y), M), m(a(NX, Y), right), 1) :- M \= left, NX is X + 1, validateMove(NX, Y).
This states that the agent can go from (X, Y) to (X+1, Y) if it didn’t came from the left
(M \= left) and if the move is somehow valid, which is tested by the
validateMove/2 predicate. The third argument in this predicate, that number 1 in the code, just states that this move has cost 1.
The file bfs_algo.pl implements the search algorithms, both naive and informed. bfs actually stands for Breath First Search, but the file has all. Again, I don’t feel like refracting this right now.
The util.pl file just has a few simple predicates that are used around the implementation.
This implementation runs on Swipl and gprolog. It almost runs on Yap, but I need to do a few things first. (May not be implemented as of the writing of this post.
To execute, just import either part1 or part2, like so:
Then you can execute one of the implemented tests. The tests are implemented in the bottom of both files. One example from the first part is:
test1 :- test(nai_s, b1, m(a(1, 6), start), m(a(4, 0), _)).
Analysing each argument, this test uses the naive search (arg 1), and board 1 (arg2). The starting position for the agent (arg 3), is coordinates (1, 6). The ending position (arg 4) should be (4, 0), where the last action is irrelevant.
The tests for the second part are similar.
test1 :- test(nai_s, b4, m(a(0, 0), b(1, 1), start), m(_, b(2, 2), _)).
Again, this particular example uses the naive search and uses board one. The starting position states that the agent is at (0, 0) and there is a box which can be moved at (1, 1). The ending position states that the box needs to be at (2, 2), the last action and the last agent position are irrelevant, only the position of the box is important.
And that’s it. A very simple intelligent agent that moves in mazes and moves boxes. It should be noted that going from part 1 to part 2 in Prolog is actually very simple, once you mastered the usage of logical programming. You don’t have to work on new structures and functions and adapt the algorithms, you just define new predicates that define movement. I happen to find this very interesting! This allows reasoning of algorithms in new and interesting ways!!
Refracting the code to their proper names is left as an exercise to the reader.