MCMC for Hierarchical Bayesian Models Using Non-reversible Langevin Methods
Hamiltonian Monte Carlo (HMC) is an attractive MCMC method for continuous distributions because it uses the gradient of the log probability density to propose points far from the current point, avoiding slow exploration by a random walk (RW). The Langevin method, equivalent to HMC with a single leapfrog step, also uses gradient information, but is slow due to RW behaviour. In this talk, I discuss how the Langevin method can be made competitive with HMC using two modifications that suppress RW behaviour by making the Markov chain non-reversible. This modified Langevin method can be better than HMC when gradient-based updates must be combined with other updates. One such case is when other updates are needed for discrete variables. Another application is learning hierarchical Bayesian neural network models, in which the hyperparameters controlling the properties of the learned function are difficult to update using HMC. If they are instead updated by, for example, Gibbs sampling, these updates can be interleaved more often with a non-reversible Langevin method than with HMC's long trajectories, improving sampling efficiency.
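As an illustration of the kind of scheme the abstract describes, the sketch below implements one well-known way to make a Langevin update non-reversible: partial momentum refreshment with momentum negation on rejection (Horowitz's method). The target density, step size, and persistence parameter are illustrative choices, and the talk's two specific modifications may differ from this particular construction.

```python
import numpy as np

def log_density(x):
    # Illustrative target: standard 2D Gaussian (not from the talk)
    return -0.5 * np.sum(x**2)

def grad_log_density(x):
    return -x

def nonrev_langevin_step(x, p, eps, alpha, rng):
    """One non-reversible Langevin update: partial momentum refresh,
    a single leapfrog step (Langevin = HMC with one step), then a
    Metropolis test, negating the momentum on rejection."""
    # Partially refresh momentum; alpha near 1 keeps momentum persistent
    # across updates, which suppresses random-walk behaviour.
    p = alpha * p + np.sqrt(1 - alpha**2) * rng.standard_normal(p.shape)
    # One leapfrog step.
    p_half = p + 0.5 * eps * grad_log_density(x)
    x_new = x + eps * p_half
    p_new = p_half + 0.5 * eps * grad_log_density(x_new)
    # Metropolis accept/reject on the joint (x, p) energy.
    h_old = -log_density(x) + 0.5 * np.sum(p**2)
    h_new = -log_density(x_new) + 0.5 * np.sum(p_new**2)
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new, p_new
    # Negating the momentum on rejection keeps the chain valid while
    # preserving persistence on acceptance.
    return x, -p

rng = np.random.default_rng(0)
x = np.zeros(2)
p = rng.standard_normal(2)
samples = []
for _ in range(5000):
    x, p = nonrev_langevin_step(x, p, eps=0.5, alpha=0.9, rng=rng)
    samples.append(x.copy())
samples = np.array(samples)
print("mean:", samples.mean(axis=0), "var:", samples.var(axis=0))
```

Because momentum is only partially refreshed, successive steps tend to continue in the same direction, so the chain traverses the distribution in fewer gradient evaluations than a reversible Langevin update would. Crucially, other update types (such as Gibbs updates for hyperparameters) can be interleaved between these single-step updates without discarding a long trajectory.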