Archive | July 2014

Factors in ProbPy – Part 1

Every data structure and algorithm in the ProbPy library builds on two very important classes, the Factor class and the RandVar class. This article, and the next, will explain what these classes are, how they work, and why they are used this way.


Let us consider a binary variable X which has a domain of either True or False. Let us now consider a distribution over that variable P(X) for which we assign the following probabilities:

P(X=True)  = 0.8,
P(X=False) = 0.2.

So we have a distribution over a binary variable. Nothing new here, it all adds up to 1, meaning:

P(X=True) + P(X=False) = 1.

For many reasons, which I’m not going to get into now, it is quite convenient to represent these distributions not as individual values but as a whole vector. That vector is indexed by the very domain of the variable, meaning that the first element corresponds to the probability of X=True and the second element corresponds to the probability of X=False. More compactly, we can write:

P(X) = [0.8, 0.2].

To expand on this idea we can start considering distributions that have two variables, like the joint distribution P(X, Y) or the conditional distribution P(X | Y). To represent the values of such a distribution a vector is no longer enough, so we use a matrix instead. That matrix will be indexed by the domains of both X and Y. The following shows an example. For simplicity’s sake we consider both variables to be binary.

P(X, Y) = | 0.2 0.4 |
          | 0.3 0.1 |

So if we consider the rows to be indexed by X and the columns to be indexed by Y we have, for example, the probabilities P(X=True, Y=True) = 0.2 and P(X=True, Y=False) = 0.4. If we consider the distribution P(X | Y) we would need different values from the ones shown above, but the representation would be the same. The domains of the variables would index a matrix and the elements of that matrix would be the values of the distribution.

This leads to the notion of a factor. This is a representation that doesn’t consider whether the variables are conditioning variables or not. It also doesn’t consider whether the matrix being represented is a probability distribution. It can simply be a relation between a set of random variables and some values, or a function mapping random variables to some values.

So, rather than writing P(X, Y) or P(X | Y), we simply write f(X, Y). The probabilistic significance of the variables is therefore not represented because it is not needed for the actual operations between the values.

Continuing this generalization, we can now consider what happens with a factor with three variables, like f(X, Y, Z). The idea is of course the same; the representation would use a three-dimensional matrix, like the following:

f(X, Y, Z=True) =  | 0.05 0.1 |
                   | 0.1  0.1 |

f(X, Y, Z=False) = | 0.15 0.1 |
                   | 0.2  0.2 |

Having an arbitrary number of random variables in the factor, we can now start thinking about the generalized data representation of a factor.

How Data is Represented

To figure out how to store a factor in memory, we can start by looking at how some programming languages store matrices.

In the C programming language it is possible to define a matrix in a very simple way. Suppose the following 2×2 integer matrix:

mat = | 2 4 |
      | 3 1 |

The definition in the C programming language would be:

int mat[2][2] = {{2, 4}, {3, 1}};

In memory, the matrix gets vectorized. That means that its values, which were arranged in rows and columns, are laid out contiguously, one after another. The GDB output confirms this vectorization by showing the values of the matrix in memory:

(gdb) x/4x mat
0xbffff6a8:    2    4    3    1

We would get something similar with three dimensions. The matrix:

mat[0][][] = | 2 4 |
             | 3 1 |

mat[1][][] = | 7 5 |
             | 6 8 |

Would become:

int mat[2][2][2] = {{{2, 4}, {3, 1}}, {{7, 5}, {6, 8}}};


(gdb) x/8x mat
0xbffff698:   2   4   3   1
0xbffff6a8:   7   5   6   8
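The same row-major flattening can be sketched in plain Python, using nested lists in place of the C array; the resulting order matches the GDB dump above:

```python
# The 2x2x2 matrix from the C example, as nested Python lists
mat = [[[2, 4], [3, 1]], [[7, 5], [6, 8]]]

# Row-major flattening: outermost index varies slowest,
# innermost index varies fastest
flat = [v for plane in mat for row in plane for v in row]
print(flat)  # [2, 4, 3, 1, 7, 5, 6, 8], matching the memory dump
```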

Vectorization follows a simple process. The first element placed in memory is the element at mat[0][0][0]; next, one of the indexes is incremented, the one on the far right, giving mat[0][0][1]. So at this point we have the array:

mat = [2, 4, ...].

Continuing this, if we increment the index on the right again we go out of bounds, so we reset that index to its first value and increment the index in the middle, giving mat[0][1][0].

It should be clear how the rest plays out. To gain a better visualization of how a multidimensional matrix can be vectorized, the following representation is used for a matrix with indexes mat[z][y][x]:

mat=[ 2   4   3   1   7   5   6   8 ]
  x | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
  y |   0   |   1   |   0   |   1   |
  z |       0       |       1       |
  i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

The first line in the representation corresponds to the values of the matrix itself. The three lines starting with x, y, and z correspond to the values of the indexes for each element. For example, using the tuple of indexes (x, y, z), the number 1 falls at indexes (1, 1, 0), the number 5 falls at indexes (1, 0, 1), and so on.

The question now is: how do we take our indexing variables and find the i index? We do this with a series of sums and multiplications. For the example above we have:

i = x + 2*y + 2*2*z.

So we take the value of x and start by adding 2*y. We multiply by two because each index of y encapsulates every index of x, hence the 2 comes from the size of x.

After that we do the same for the z variable: we add its value multiplied by 2 once because of the x variable, and multiplied by 2 again because of the y variable.
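This computation generalizes to any list of dimension sizes. Here is a small Python sketch, with dimensions listed fastest-changing first; the function name flat_index is just for illustration:

```python
def flat_index(indexes, sizes):
    """Map a tuple of per-dimension indexes to the flat index i.

    Both tuples list the fastest-changing dimension first, so the
    stride for each dimension is the product of all earlier sizes.
    """
    i, stride = 0, 1
    for idx, size in zip(indexes, sizes):
        i += idx * stride
        stride *= size
    return i

# For the (x, y, z) example above: i = x + 2*y + 2*2*z
print(flat_index((1, 1, 0), (2, 2, 2)))  # 3, where the value 1 sits
print(flat_index((1, 0, 1), (2, 2, 2)))  # 5, where the value 5 sits
```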

To finish this section, suppose we do the same with a factor in which index A has size 3 and index B has size 2.

mat = | 4 2 6 |
      | 8 1 9 |

The representation would be:

mat=[ 4   2   6   8   1   9 ]
  A | 0 | 1 | 2 | 0 | 1 | 2 |
  B |     0     |     1     |
  i | 0 | 1 | 2 | 3 | 4 | 5 |

But what if we start by changing the B variable first instead of changing A? We would have:

mat=[ 4   8   2   1   6   9 ]
  B | 0 | 1 | 0 | 1 | 0 | 1 |
  A |   0   |   1   |   2   |
  i | 0 | 1 | 2 | 3 | 4 | 5 |

What do we have? The order of the values is different, the order of the variables is different, and some i indexes correspond to different values of mat. But each tuple of indexes (A, B) still corresponds to the same value. That is, the number 8 has indexes A=1 and B=0 in the first representation and the same indexes in the second, and the same holds for every other value in mat.
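This invariance can be checked directly. The sketch below stores the same factor in both orderings, using the index formula from the previous section, and confirms that every (A, B) tuple maps to the same value:

```python
# The factor's values stored with two different variable orderings
vals_A_first = [4, 2, 6, 8, 1, 9]  # A (size 3) changes fastest
vals_B_first = [4, 8, 2, 1, 6, 9]  # B (size 2) changes fastest

def lookup_A_first(a, b):
    # i = a + 3*b, since each step of B spans all 3 values of A
    return vals_A_first[a + 3 * b]

def lookup_B_first(a, b):
    # i = b + 2*a, since each step of A spans both values of B
    return vals_B_first[b + 2 * a]

# Every (A, B) assignment resolves to the same value in both layouts
for a in range(3):
    for b in range(2):
        assert lookup_A_first(a, b) == lookup_B_first(a, b)
print("same (A, B) -> value mapping in both layouts")
```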

Concluding, we see that the order of the variables indexing the factor is very important. So to store a factor using this method we need to store both the actual values and the order of the variables. This may seem cumbersome, but in the next post we will see that this method brings flexibility when working with multi-variable factors. That flexibility is what allows one to define an arbitrary factor over some variables and then perform operations between such factors in an easy way, like multiplications, additions, variable instantiations, among other things.

Working with ProbPy

This article shows a few things that can be done with the ProbPy library, currently in development. The reader should have a basic knowledge of probability theory to be able to work with this library.

Random Variables and Factors

Before defining probability distributions we must first think about random variables. In something like P(X, Y), the variables would be X and Y. So what are these random variables?

Let’s take a simple example of a coin toss. When tossed, a coin can fall either with the heads side up or the tails side up; we disregard any other highly improbable outcome. In this case the outcome of the coin toss is represented using a random variable. Let’s call that random variable C. If the coin falls with the heads side up, C will take the value H, and T otherwise.

A random variable is characterized by two things. First, its actual name, which is C, and second, its domain. The domain of the variable can be thought of as an array [H, T] of size two. For now that is all that is needed to characterize a variable. Defining a variable in ProbPy is done like this:

Coin = RandVar("Coin", ["H", "T"])

The arguments for the constructor are the name and the domain of the variable, respectively.

Variables by themselves are not very useful, so we move on to the implementation of probability distributions. A probability distribution for the coin toss scenario would be very simple:

P(Coin = H) = 0.8
P(Coin = T) = 0.2

Here we have two events and their probabilities. We also note that this is an unfair coin. To represent this with ProbPy the following can be done:

PCoin = factor.Factor(Coin, [0.8, 0.2])

A few questions arise. Why is it called a factor and not a probability distribution? And what are the two arguments? More on the factor later. The first argument of this constructor is the random variable defined above. The second is a list of probability values. Because the domain of the variable is "H" followed by "T", we know that the 0.8 in the probabilities list is the probability of the coin falling with its heads side up, and the 0.2 the probability otherwise. It’s a correspondence, index by index.

But now suppose we need to represent something with many variables, like P(X, Y, Z). First, the variables:

X = RandVar("X", ["T", "F"])
Y = RandVar("Y", ["T", "F"])
Z = RandVar("Z", ["T", "F"])
PCoin = Factor([X, Y, Z], values)

First notice that the first argument is now a list of variables instead of a single variable. Also, the second argument is just a variable named values. For this to work that variable needs to be a list of probability values just like the previous one, but how you make that list is not exactly trivial and will be the subject of a follow-up article. One of the next things to be implemented in this project is an easy way to write that list of probabilities.
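Until that is implemented, one way to build such a list by hand is to enumerate the assignments in the layout order described earlier. The sketch below assumes, as in the vectorization tables above, that the first variable in the list changes fastest; the prob function and its numbers are hypothetical placeholders, not part of ProbPy:

```python
import itertools

# Domains of the three binary variables, in factor order
domains = {"X": ["T", "F"], "Y": ["T", "F"], "Z": ["T", "F"]}

def prob(x, y, z):
    # Hypothetical distribution: uniform over the 8 assignments
    return 0.125

# itertools.product varies its LAST argument fastest, so pass the
# domains in reverse order and flip each assignment back around.
values = [prob(x, y, z)
          for z, y, x in itertools.product(domains["Z"],
                                           domains["Y"],
                                           domains["X"])]
print(len(values))  # 8 entries: X varies fastest, then Y, then Z
```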

One last point: why is the class called Factor? Mathematically we have probability distributions and conditional probabilities, like the following:

P(X | Y)
P(X, Y | Z, W)

Their meaning is different from non-conditional probabilities, as you should know, but their implementation in this library is the same. It’s just a list of variables and a list of values. Whether the variables are conditioning the distribution or not is irrelevant. Also, you can have a function that maps the variables to some values that do not sum to one, for example:

Q(X=T) = 10
Q(X=F) = 20

Here, Q maps the values of X, T and F, to 10 and 20, making it not a probability distribution but still something that can be implemented in this library in exactly the same way. Objects like this are quite common when working with algorithms over probability distributions. Because of this we drop the notion of left-side and right-side variables and just stick with the term factor, which is a relation between random variables and their values.

Using ProbPy

We’ve seen what can be done with variables and factors, but what can be implemented with them? To start off, suppose the following:

P(X | Y) P(Y) = P(X, Y)

We have the trivial chain rule, where the product of the conditional distribution of X given Y and the distribution of Y is the joint distribution. This is very common. This is how we do the same in ProbPy:

X = RandVar("X", ["T", "F"])
Y = RandVar("Y", ["T", "F"])
fy = Factor(Y, [0.2, 0.8]) # P(Y)
fx_y = Factor([X, Y], [0.1, 0.9, 0.5, 0.5]) # P(X | Y)
fxy = fx_y * fy # Results in factor that represents P(X, Y)

So we define the first two distributions and calculate the third. Moving on to something more complex, consider Bayes’ theorem:

P(X | Y) P(Y) 1/P(X) = P(X | Y) P(Y) 1/n = P(Y | X)

In the above definition 1/n is just a normalization constant. From the previous code we only need to add one extra line to get the final result:

fy_x = fxy.normalize(Y)

This normalization is equivalent to dividing fxy by a factor with the probability of X; the method is just simpler. Alternatively we could do:

fx = fxy.marginal(X)
fy_x = fxy / fx
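To sanity-check this Bayes step without the library, the same computation can be done in plain Python with the numbers from the example above. The dictionary layout here is my own illustration, not ProbPy’s internal representation, and it assumes the values list of fx_y is ordered with X varying fastest:

```python
# P(Y) and P(X | Y), with the numbers from the ProbPy example
p_y = {"T": 0.2, "F": 0.8}
p_x_given_y = {("T", "T"): 0.1, ("F", "T"): 0.9,
               ("T", "F"): 0.5, ("F", "F"): 0.5}

# Chain rule: P(X, Y) = P(X | Y) P(Y)
p_xy = {(x, y): p_x_given_y[(x, y)] * p_y[y]
        for x in "TF" for y in "TF"}

# Bayes: P(Y | X) obtained by normalizing over Y for each fixed X
p_y_given_x = {}
for x in "TF":
    n = sum(p_xy[(x, y)] for y in "TF")  # n is the marginal P(X=x)
    for y in "TF":
        p_y_given_x[(x, y)] = p_xy[(x, y)] / n

print(round(p_y_given_x[("T", "T")], 4))  # 0.0476, i.e. 0.02 / 0.42
```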

For more features of factor, such as calculating the expected value, instantiating variables in factors, and more, check the documentation in ProbPy/doc/_build/html/index.html, which is distributed with this project.

For now, note that an implementation of a Bayesian Network is done, along with an implementation of the Elimination Ask algorithm. Examples of these features in use are in the examples directory distributed with this project, along with an example of ProbPy being used with Bayes’ theorem.

Future Work

One of the next things to be done is to write about the factorOp method, which allows the implementation of products and other operations between factors. That method is basically what allows us to have factors with an arbitrary number of random variables of different sizes and still work with them in a logical way.

Other things to be done are the implementation of more algorithms and examples to expand the library. The next few additions will still be algorithms over Bayesian Networks and examples demonstrating those same algorithms. I would also like to implement a method that makes it easy to work with index variables. Index variables are practical when implementing HMMs, for example.