Reversible-Jump Markov Chain Monte Carlo Multi-Object Tracking Tutorial
The goal of this tutorial is to provide an understanding of Reversible-Jump Markov Chain Monte Carlo (RJMCMC) using multi-object video tracking as an example application. RJMCMC, also known as trans-dimensional MCMC, is a method of approximate inference for a Dynamic Bayesian Network (DBN). I have tried to present the RJMCMC approach in clear and simple terms in this tutorial, with the aid of some graphics and animations to illustrate some of the more difficult concepts. I have also tried to give a general algorithmic description of the RJMCMC approach, so that the reader is free to implement it in the style and programming language of his or her choice.
What is Multi-Object Video Tracking?
Multi-object video tracking is the process of locating a variable number of objects by analyzing the video output of one or more cameras. This process can be online, where only past and present video information is available for processing, or offline, where all video information is available. By their nature, only online tracking methods can be run in real-time.
To serve as a simple real-world example of a multi-object tracking problem, we will consider the scenario shown in the animation of Figure 1 below, in which the goal is to track four billiard balls. We'll treat the animation as the camera view, with the balls free to roll about, collide with one another, and enter or exit the scene.
Figure 1. Billiard ball example.
Modeling the Problem: Dynamic Bayesian Network
We start by modeling the tracking problem using a Dynamic Bayesian Network (DBN). A DBN is a directed acyclic graph, which is a type of graphical model. Graphical models represent dependencies betwteen random variables through a graph structure, where random variables are represented by nodes (drawn as circles), and the edges between the nodes represent the conditional dependencies (drawn as lines). A typical DBN for tracking systems is shown below in Figure 4.
For our approach to multi-object tracking, there are two types of random variables in the graph structure. The first represents the multi-object configuration, which we denote as X. The multi-object configuration describes the state of the objects in the scene through a set of parameters (which might include object indexes, position, height, width, shape, etc). These parameters are unknown (or hidden), and tracking is the task of estimating these parameters from what we can observe. For our billiard ball example, the multi-object configuration will contain the indexes of the balls in the scene (which can vary as some balls enter or exit the scene), and the parameters describing each ball present in the scene. For this simple example, these parameters include the position in the image (x,y) and the radius of the ball r. See the animation in Figure 2.
Figure 2. The multi-object configuration.
The other random variable in our graphical model represents the observations, which we denote Z. Whereas the multi-object configuration was unknown to us, the observations represent what is known. For visual tracking, the observations amount to information taken from the image. There is a plethora of image features to choose from, but common features include shape, color, texture, and motion information. For the billiard ball example, we will use edges and color as our observations. To detect edges, we will use a Canny edge detector. For color we will look in the HSV color space. In the images below, you can see what some of these observations might look like for the first frame of the animation from Figure 1.
Above: original image from billiard ball example |
Below: various observations computed from the image |
Edges | Hue |
Color Saturation | Brightness |
It is important to understand that the multi-object configration X represents the actual state of the objects we are tracking, and the observations Z represent what we can observe about the scene. Intuitively, we can say that what we can observe depends on the true state of the multi-object configuration. This dependency is represented by the arrow in Figure 3.
Figure 3. Graphical Model for a single image. The observations Z depend on the multi-object configuration X.
Thus far, we have defined X and Z as representing the multi-object configuration and observations for a single image, but we are concerned with tracking through a sequence of images. To model the sequence of images, we construct a graphical model consisting of a series of single-image models (like that in Figure 3) for the length of the image sequence t = {1, ... ,T}. To link the single image models, we make a markovian assumption, where it is assumed that the multi-object configuration (or state) of time t (Xt) depends only on the previous state (Xt-1). That is to say, the position of the 4 ball in frame 100 depends only on the position of the 4 ball in frame 99. The 4 ball position in frame 98 is irrelevent. This dependency is represented by the arrows between X for sequential time steps in the multi-object tracking graphical model shown in Figure 4.
Figure 4. A graphical model representation of the tracking problem.
Approaching a Solution: Recursive Bayesian Filter
Now that we have a graphical model of the tracking problem, let's investigate how to apply it to solve the problem of tracking. In the Bayesian approach, this is done by probabilistically modeling the dependencies of the graph structure. Essentially, we are asking the following question mathematically:
Figure 5. The Bayesian filtering distribution.
p(Xt|Z1:t) is a probability distribution function, or pdf, called the posterior. Another way to think of the posterior pdf is as a collection of probabilities for each current possible multi-object configuration knowing all past and present image information. Some configurations are more probable than others. Those that are better supported by the observations should be more probable. The animation in Figure 6 shows what a posterior pdf for the horizontal location of the 3-ball might look like.
Figure 6. 1-D posterior pdf illustration of the horizontal position of the 3 ball.
The posterior pdf represents a belief about the state of the objects, and finding it is the key to solving the tracking problem. So how do we find the posterior pdf? Using the process of recursive Bayesian inference. Using the structure of dependencies from our graphical model, this process defines a set of equations describing the posterior. There are two steps in recursive Bayesian inference. The first step is prediction, where a new multi-object configuration is predicted based on the previous multi-object configuration and previous observations:
Equation 1. Prediction step.
The second step, update, comes from Bayes Theorem, and gives us the posterior pdf in terms of the prediction from Equation 1. It is expressed as:
Equation 2. Update step.
The bottom term of Equation 2 p(z_{t}|z_{1:t-1}) is constant because the observations are independent of each other (there are no links between Zs in the graph structure). If we call this constant term C and plug Equation 1 into Equation 2, we arrive at an expression for the posterior pdf called the recursive Bayesian filtering distribution:
Equation 3. recursive Bayesian filtering distribution.
This equation gives the posterior pdf in terms three quantities, briefly described below (we will discuss them later in greater detail):
- Likelihood. The likelihood measures how likely a set of observations are given a proposed multi-object configuration. Another way of thinking of the likelihood is as a measure of how well the observations support a hypothesis about the current state of the objects.
- Evolution. The state evolution, or state dynamics, govern the evolution of the multi-object configuration between time steps. It predicts the current state X_{t} from the previous state X_{t-1}. The state evolution is responsible for modeling motion, shape change, and interaction between objects.
- Prior. The prior is the posterior pdf for the previous time step. This term provides knowledge of the past to give the recursive Bayesian filtering distribution a frame of reference to predict the current state from.
Approximation by Markov Chain Monte Carlo
The Markov Chain Monte Carlo (MCMC) method approximates the recursive Bayesian filtering distribution of Equation 3 as a set of discrete samples known as a Markov Chain. In order to do this, we follow the Monte Carlo approximation, where the prior (the posterior pdf at time t-1) is approximated by a set of N samples, each of which contains a hypothesis about the multi-object configuration, {X_{t}^{(n)}, n = 1, ... , N}.
Equation 4. Posterior pdf of previous time step is approximated using Monte Carlo.
Following the assumption that the continuous posterior pdf can be approximated by a Markov Chain, the continuous posterior pdf of Equation 3 can be rewritten as a collection of discrete samples in the Monte Carlo approximation of the recursive Bayesian filtering distribution (note that the integral from Equation 3 is now a summation.)
Equation 5. Monte Carlo approximation of the recursive Bayesian
filtering distribution.
In the animation below, we show how the continuous posterior pdf is approximated via MCMC for a single time step. As in Figure 6, we consider a simple example for only the horizontal (x) position for a single ball.
Figure 7. MCMC approximates the 1-D posterior pdf for the horizontal position of the 3 ball.
In the animation in Figure 7, the Markov Chain is constructed using a random walk algorithm such as Metropolis-Hastings, which we will treat as a black box for the moment, but will look at later in more detail. The animation in Figure 7 only showed how to approximate the pdf for a single time step. To track objects through a series of images, MCMC is applied successively to each frame in the video. For each time step, MCMC uses information from the Markov Chain representing the pdf of the previous time step as well as observation information and the dynamic model. The animation in Figure 8 shows a 1-D example of estimating the horizontal position of the cue ball in the example from Figure 1.
Figure 8. Repeatedly applying MCMC to each video frame estimates the horizontal position of the cue ball (white) over time.
Assuming that we have used MCMC to construct Markov Chains representing the posterior pdf for each image in the video, we still need a single practical answer to the tracking problem (i.e. coordinates for each ball and the sizes of their radii). A point estimate is a single multi-object configuration caculated from the Markov Chain which serves as the output of the tracker. There are several ways to compute a point estimate, some of which may be more appropriate depending on the circumstances. Often for multi-object tracking, computing the Marginal Mean of the various parameters of the multi-object configuration gives a smooth, accurate answer. The marginal mean of the x parameter of the cue ball is shown in Figure 9.
Figure 9. The point estimate can be found by computing the marginal mean for
each parameter of the state. Here the x position of the cue ball is computed by finding the
mean x value of the samples in the Markov Chain.
Metropolis-Hastings MCMC
We've established that the key to solving the multi-object tracking problem lays in finding the recursive Bayesian filtering distribution, which can be approximated by a discrete set of samples known as a Markov Chain. If we somehow knew the recursive Bayesian filtering distribution beforehand, approximating it would just be a matter of sampling from it (of course, ignoring the fact that we would already know the distribution we're trying to approximate).
Unfortunately, we find ourselves in the difficult position of trying to find an approximation of the recursive Bayesian filtering distribution which is unknown. Luckily, this is exactly what MCMC algorithms are designed to do. The idea behind MCMC is to generate samples while exploring the state space (the space of possible object configurations) using a Markov Chain mechanism. The mechanism for constructing the chain is designed so that the chain spends more time in the important (high probability) regions. The Metropolis-Hastings (MH) algorithm is the most popular 'brand' of MCMC, which explores the state-space in a random walk fashion. In fact, most variations of MCMC can be interpreted as special cases or extensions of MH.
Classical Metropolis-Hastings
The classic MH algorithm is fairly straightforward. Starting from an initial configuration X^{0}, a new sample X^{*} is generated by sampling from a proposal distribution, q(X^{*}|X^{(n-1)}). The sample is then either accepted and added to the Markov Chain as the n^{th} sample, or rejected and the previous sample is added to the chain as the n^{th} sample. The decision to accept or reject the proposed sample depends on the acceptance ratio, a. The acceptance ratio gauges the improvement in the quality of the sample over the previous sample, as measured by the samples' respective likelihoods. It automatically accepts samples showing improvement, and accepts the rest with probability a.
When modeling a single object, the proposal distribution is the same as the single-object state evolution. To keep things simple, we will model the state evolution of a billiard ball to be the previous position of the ball with a perturbation defined by a zero-mean Gaussian distribution. Of course the state evolution can be more complex if we like: to include velocity in our motion model, we could use position information from t-2 to recenter the Gaussian distribution. The equations for the state evolution are given below in Equation 6, and the process of the state evolution can be seen in the animation of Figure 10.
Equation 6. Simple state evolution equations for a billiard ball.
Figure 10. The state evolution for a single billiard ball.
Tracking more than one object (but still a fixed number of objects) complicates matters a bit. When computing the acceptance ratio, it is important that some terms in the proposal density cancel to keep the acceptance ratio tractable. For this reason, it is common to pertrub just one target at a time when the scene contains multiple objects. This introduces an extra step in the MH algorithm, in which a target object is chosen from a target proposal distribution, before perturbing the sample in the proposal. Typically the target proposal distribution is a uniform distribution over all the objects. The MH algorithm for an arbitrary (but fixed) number of objects is summarized below.
For each time step, approximate the recursive Bayesian filtering distribution by constructing a Markov Chain from drawing N = N_{burn} + N_{mix} samples according to the following schedule (where N_{burn} is the number of burn-in samples and N_{mix} is the number of samples required to reach the mixing time after burn-in):
Initialize the MH sampler by choosing a sample from the t-1 Markov Chain and move all targets according to their state evoution. Use the result as the seed for the new chain, X^{0}.- Begin with the state of the previous sample X^{*} = X^{n-1}.
- Select a target object m^{*} from the target proposal distribution.
- Propose a new configuration for X^{*} by sampling a new configuration for object m^{*} from the proposal distribution (state evolution), while keeping all other objects fixed.
- Compute the acceptance ratio as follows:
- Add the n^{th} sample to the Markov Chain. If a ≥ 1, add the proposed configuration X^{*} → X^{n}. If not, add the proposed configuration with probability a. If the proposed configuration is rejected, add the previous configuration X^{n-1} → X^{n}.
which reduces to
Algorithm 1. Metropolis-Hastings for an arbitrary (but fixed) number of objects.
Notice that the acceptance ratio is just the ratio of the likelihood term from proposed configuration of the target billiard ball to the previous configuration of the target billiard ball. This is because we have defined our likelihood so that the likelihood terms from other billiard balls cancel because their states have not changed. For more information on likelihood modeling, see the section entitled "Appendix: Modeling the Likelihood".
The animation in Figure 11 illustrates how the MH algorithm constructs the Markov Chain for each time step, thus approximating the recursive Bayesian filtering distribution for a fixed number of objects.
Figure 11. Multi-object tracking for a fixed number of targets with Metroplis-Hastings algorithm.
As shown in the animation of Figure 11, the Metropolis-Hastings algorithm explores areas of high likelihood by proposing a change to one target at a time and automatically accepting samples which increase the overall likelihood, while rejecting samples which decrease the overall likelihood with probability a. However, it is important to note that samples resulting in lower likelihoods can still be added to the Markov Chain, though the chances of this happening gets worse as the likelihood of the proposal gets worse. This feature keeps the MH algorithm from becoming stuck in local maxima.
What are Reversible Jumps?
The dimensionality of the examples in Figures 7 and 8 were reduced to one dimension (horizontal position) for simplicity. According to our model, the multi-object configuration (or state) for a single ball, and thus the posterior pdf, is three dimensional (to model position and radius x,y,r). For multiple balls the state and posterior pdf is 3*M dimensional (where M is the number of balls).
To further complicate matters, for multi-object tracking the dimension of the multi-object configuration must be able to change. If it cannot change, the number of objects in the scene would be fixed. Changes in dimension between samples within a Markov Chain are known as dimension jumps. Unfortunately, traditionl MCMC methods such as MH cannot accomodate dimension jumps. But Reversible Jump Markov Chain Monte Carlo (RJMCMC) can, and this allows it to model scenes with varying numbers of objects.
Figure 12. In previous examples, we restricted ourselves to 1-D state vectors. In practice the state vector is usually much larger and can change dimension.
RJMCMC extension of Metropolis-Hastings
Now, let's look at how RJMCMC extends the MH algorithm to handle variable dimensional state spaces. Recall that in order to insert and delete balls from the scene, we must be able to insert them and remove them from the multi-object configuration X_{t}. The MH algorithm of Algorithm 1 (and shown in Figure 11) only works for a fixed dimensional state space.
In 1995, Peter J. Green proposed a method for RJMCMC which allows the sampler to construct a Markov Chain that jumps dimensions. This is accomplished through defining a set of moves, known as reversible jumps. In this scheme, any move which can change the dimension of the chain must be reversible, i.e. it must be possible to revert back to the previous state in a later move.
We can define any RJMCMC move types we want, as long as they are reversible (though sometimes defining a move type can be difficult). This flexibility makes RJMCMC very powerful. For our billiard ball example, we will define a typical set of RJMCMC move types, including an update move, which is equivalent to the classic MH algorithm process of choosing a target object and proposing a new configuration from the state evolution. The move set includes: {update, birth, death, merge, and split}.
Figure 13. UPDATE MOVE. Similar to the classic Metropolis-Hastings algorithm, a target object is chosen and a proposed configuration generated according to the state evolution. Here, the position of the 3-ball has been updated. The dimension of the state space is unchanged.
Figure 14. BIRTH MOVE. A new object is added to the multi-object configuration. Here, the 3-ball is added to the multi-object configuration. The dimension of the state space increases.
Figure 15. DEATH MOVE. An existing object is removed from the multi-object configuration. Here, the extraneous tracker is removed from the multi-object configuration. The dimension of the state space decreases.
Figure 16. MERGE MOVE. Two objects are merged into a single object. Here, the configurations of two trackers following the cue ball are merged into one. The dimension of the state space decreases.
Figure 17. SPLIT MOVE. Two objects are created from a single object. Here, the configuartion of a single tracker is split to track the cue ball and the 3-ball. The dimension of the state space increases.
With the move types defined, we can now give the algorithm for RJMCMC multiple object tracking. To generate a new sample in the Markov Chain, the first step is to choose a move type from the set of move types. This is typically done by sampling uniformly from the set. Then, a multi-object configuraiton proposal X* is generated according to the chosen move type, the acceptance ratio is calculated, and either the proposal X* or the previous sample X^{n-1} are added to the Markov Chain. Note that the classic MH algorithm can be seen as a special case of RJMCMC where only update moves are chosen. The algorithm for RJMCMC multiple object tracking is given below.
For each time step, approximate the recursive Bayesian filtering distribution by constructing a Markov Chain from drawing N = N_{burn} + N_{mix} samples according to the following schedule (where N_{burn} is the number of burn-in samples and N_{mix} is the number of samples required to reach the mixing time after burn-in):
Initialize the RJMCMC sampler by choosing a sample from the t-1 Markov Chain and move all targets according to their state evoution. Use the result as the seed for the new chain, X^{0} .- Begin with the state of the previous sample X^{*} = X^{n-1}.
- Choose a move type from the set of possible move types (by sampling from a move type distribution p_{move type}).
- Apply the chosen move. This involves selecting a target object or target objects and proposing a new configuration X^{*}. Selecting the target(s) is done through a move-specific target proposal distribution q_{target}, and proposing X^{*} is done through a move-specific proposal distribution Q.
- Compute the acceptance ratio, a, keeping in mind it is defined differently for the various move types.
- Add the n^{th} sample to the Markov Chain. If a ≥ 1, add the proposed configuration X^{*} → X^{n}. If not, add the proposed configuration with probability a. If the proposed configuration is rejected, add the previous configuration X^{n-1} → X^{n}.
Algorithm 2. RJMCMC for a variable number of objects.
Acceptance Ratio Considerations
In the classic MH algorithm, the acceptance ratio reduces to a ratio of the likelihood of the proposed configuration to the likelihood of the previous configuration. But now, the proposed and previous configurations may not be of the same dimension. Comparing likelihoods of different dimension would be meaningless; as if instead of comparing one circle to another circle, we are now trying to compare a circle to a sphere. [REF]. To overcome this problem, if we attempt a move from m-dimensional space to n-dimensional space, we must define dimension matching functions Fm→n and Fn→m, which allow us to extend one space into the other, and vice versa.
We can rewrite the multi-object configuration to include the dimension (or number of objects) as X_{t}={m, X_{m,t}}. Doing so, the full expression for the acceptance ratio when moving from an m-dimensional space to n-dimensional space is as follows:
Equation 7. The full RJMCMC acceptance ratio moving from an m-dimensional space to n-dimensional space.
Depending on how we define our moves, a lot of the terms above can cancel. If we choose our move types by sampling from a uniform distribution, the 2nd term cancels (p_{reverse move type} / p*_{move type})
For more details, refer to the chapter regarding RJMCMC move types from my PhD thesis.
Chain Length
Determining the appropriate length of the Markov Chain is not easy. If the chain is too short, it may not give a good approximation of the posterior pdf, as seen in Figure 19. The number of samples needed to approach the target distribution is known as the mixing time. Adding extra samples after reaching the mixing time may marginally improve the approximation, but still cost computation time, so it may not be advantageous to do so.
Some researchers have defined statistical tests to check if the Markov Chain has stabilized, which may indicate it has reached the mixing time. Also, several attempts have been made at fixing the mixing time, but none of these tests guarantee satisfactory results. In practice, it is common to experimentally determine an appropriate chain length.
It may be useful to discard an initial set of burn-in samples to allow the Markov Chain to settle and avoid bias introduced from the starting position. The benefits of discarding burn-in samples can be seen in the animation in Figure 18. In Figure 19, we can see the adverse effects of a Markov Chain which has not reached the mixing time as well as bias effects from not discarding burn-in samples.
Figure 18. Burn-in removes bias from samples early in the Markov Chain.
Figure 19. Chain Length Effects.
Further Reading
I hope you found this tutorial useful. I'd love to hear any feedback, questions, or comments. Please feel free to contact me at my email address found at the top of this page.
Below are some links to a few other useful publications regarding RJMCMC and multi-object tracking. This tutorial is an IDIAP Research Institute Communication. If you would like to reference it, feel free to use the bibtex entry provided below.
P.J. Green. Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika, 82(4):711-732, 1995.
K. Smith, D. Gatica-Perez, and J.M. Odobez. Using particles to track varying numbers of objects. In Proc. of Computer Vision and Pattern Recognition (CVPR), San Diego, June 2005.
K. Smith, S. Ba, D. Gatica-Perez, and J.M. Odobez. Tracking the multi-person wandering visual focus of attention. In Intl. Conference on Multimodal Interfaces (ICMI), Banff, Canada, Nov 2006.
Z. Khan, T. Balch, and F. Dellart. An mcmc-based particle filter for tracking multiple interacting targets. IEEE transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 27:1805-1819, 2005.
T. Zhao and R. Nevatia. Tracking multiple humans in a crowded environment. In Proc. of Computer Vision and Pattern Recognition (CVPR), Washington D.C., June 2004.
Bibtex for this article:@TechReport{ | |
author | = "Smith, K.", |
title | = "Reversible-jump markov chain monte carlo multi-object tracking tutorial", |
institution | = "IDIAP Research Institute", |
type | = "Communication", |
number | = "IDIAP-COM-06-07", |
year | = "2006" |
url | = "http://www.kev-smith.com/tutorial/rjmcmc.php" |
} |
© 2011 Kevin Smith. All rights reserved.