Archive | October 2014

ProbPy Tests – Part 1

Now that ProbPy is working and being used, (by me) two things need to be assured which are paramount for the library. One is its correctness, meaning that the library must produce correct results and must be reliable in doing so. The other is temporal and spacial efficiency. In this blog post I’ll make a first address regarding the second point.

Scripting languages such as Python always have hidden things working behind the scenes that may increase execution time without us realizing. In Python’s example, and in many other scripting languages, we see simple syntactic constructions that take up so few lines of code and end up consuming a lot of execution time. Naturally we need to be careful with what is used in a Python program in order to make it as fast as possible.

To begin with, ProbPy intends to offer an abstraction for factors and probabilistic operations. Every time a product between two factors is made, it takes a little more then just the element wise product between the factors to get to the result. So how is ProbPy in terms of speed?

To test that the same script was executed in the current version of ProbPy and in an older version that was in the repository. That older version was made right before a few improvements that speedup the algorithms. To set up these experiments the main ProbPy repository was cloned to another directory.

Setting Up

To clone the repository just do:

git clone /path/to/ProbPy

Now, there are two important commits for this test, one is the current master:


And the other is just an older commit made before some improvements:


To jump from one commit to another just do one of the following commands:

git checkout e366ecd78f91702a836efb30a9771bb8190326a2
git checkout aec6f44b1cd557bc238bc8eb5f3a2211705872bd

To get results in terms of execution time two scripts were made. They are exactly the same, but one is used for the older version of ProbPy while the other is used for the current master. They are in this repository at github. The scripts do three operations at each time.

The first operation creates two factors with n random variables. Both factors will have the same variables and in the same order. Their values are irrelevant. The script will iterate through n from 0 to 20 and measure how long will a factor product take to finish. The first iteration will just be the product between two scalar factors, that is, factors that have no variables and are just scalars. The second will have only one variable, the next will have two, and so on all the up to twenty random variables. Each script executes for the two commits specified.

The second operation is similar. The only difference is the order of the variables in the factor. The second factor will have exactly the same variables but in a reverse order. This is done to test whether the order of the variables is important or not.

Finally, the third test will run the product between two factors with the same variable in the same order, but it will make it so using Python’s profile tool. With this it will be easy to observe where is python taking to much time to execute the code, which functions are being called a lot, among other things.

Results of the First Test

The results of the first test will be displayed from 15 variables to 20. The execution time before 15 is to small so it is not displayed. The results for the older version:

15  0.47489118576049805
16  1.0099499225616455
17  2.180561065673828
18  4.577056884765625
19  9.783910989761353
20  20.155898094177246

For the master version:

15  0.3872871398925781
16  0.8108508586883545
17  1.7335929870605469
18  3.6766469478607178
19  7.790879011154175
20  16.35800790786743

In here it simple to see how the master version improves quite a bit. In the last example the gain between the master and the older version is almost four seconds. In this situation we are looking at the product between two factors that each have 20 random variables. That is a lot of variables and it is probable that no one will work with factors of these dimensions. However, the observed gain shows that the newer algorithm is factors, which is good.

Results of the Second Test

The second test, where one of the factors has the random variables reversed, has the following results. For the older version:

15  0.4831809997558594
16  1.0258629322052002
17  2.1770758628845215
18  4.642666816711426
19  9.749383211135864
20  20.390637159347534

For the master version:

15  0.3916609287261963
16  0.8321959972381592
17  1.8294868469238281
18  3.765069007873535
19  7.900938034057617
20  16.56574010848999

The differences between these results are irrelevant. It is simple to see that the order of the variables may not be very important, which was what was expected. The fact that the operations must take the order of variables into account is important because it is something that makes the algorithms a little more complex. However, choosing the order of the variables is not important.

Results of the Third Test

The third test uses the Python’s Profile tool and runs the product between two factors each with 20 variables. The results for the older:

        52429122 function calls in 132.335 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008  132.335  132.335 <string>:1(<module>)
        1    0.000    0.000  132.327  132.327
  2097152    1.972    0.000    1.972    0.000<lambda>)
        1   88.247   88.247  132.327  132.327
        1    0.000    0.000  132.327  132.327
        1    0.453    0.453    0.453    0.453
        1    0.000    0.000  132.335  132.335 {built-in method exec}
 48234748   39.872    0.000   39.872    0.000 {built-in method len}
  2097215    1.782    0.000    1.782    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

For the master:

        4194670 function calls in 42.300 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008   42.300   42.300 <string>:1(<module>)
        1    0.000    0.000   42.292   42.292
  2097152    1.992    0.000    1.992    0.000<lambda>)
        1   38.110   38.110   42.292   42.292
        1    0.445    0.445    0.445    0.445
        1    0.000    0.000    0.000    0.000
        1    0.000    0.000   42.292   42.292
        1    0.000    0.000   42.300   42.300 {built-in method exec}
      274    0.000    0.000    0.000    0.000 {built-in method len}
  2097236    1.744    0.000    1.744    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This results are very complete. They contain lots of useful information that may help decide where improvement needs to be done.

The first big difference between the two results is the usage of the len function. In the older implementation that functions was placed in the middle of loops where it wasn’t needed. The simple decision of taking the function out of the loops and calling them only once saved a lot of time. This is a trivial mistake anyone can make, placing a len function inside a loop.

Improvements Made Taking Profile Into Account

One of the worst things in ProbPy’s code is the final loop of the factorOp function. That function is the one that takes two factors and makes whatever operation it receives by parameter. The parameter is either a function or a lambda. The final loop is where the values of the operation are calculated. So, for very big factors the lambda will be called many many times, which is terrible.

Looking at the Profile output, the total amount of time it takes to call a lambda 2097152 times is not that much, but it is still something that can easily be eliminated with a different factorOp implementation. One thing to do is, instead of having the operation as a parameter, just implement the operation using the operators. The + operator for sums, * for products, etc. Like this the loop doesn’t call anything. The problem with this is that there needs to be a loop replicated for each operation, and the version using the parameter must still exist for other operations.

Another thing that is noticeable is the excessive amount of time the append function is called. There is some time wasted in there, but still there might be some parts of the code that may be blotted with to many unnecessary appends.

Future Improvements

Two would be cool if implemented in ProbPy. One is implementing a few method using parallelism. I have though about a way of making factorOp parallel, but I am yet to implement it. Using the Python’s library for this very end, a parallel version of ProbPy could be put up online and still have run in every machine with ProbPy. It is just a question of implementing what is necessary.

Another cool thing would be to implement some function using the C programming language. Like that, some of the most executed parts of the code could gain much speed.

In later post further test will be made not only to speed but also to the correctness of some inference algorithms which are already implemented.