Why do Markov chains with larger spectral gaps mix faster?
Starting from the basics of Markov chains, this article will help you develop intuitions about why Markov chains with large spectral gaps mix faster.
A Markov chain is a stochastic model describing a sequence of state transitions in which the probability of each transition depends only on the current state, not on the history that preceded it. Markov chains have proven useful in a wide range of fields, such as biology (e.g., DNA evolution models), chemistry (e.g., chemical reaction networks), finance (e.g., stock market forecasting), speech recognition, and even music composition. In my research field, researchers use Markov state models to analyze molecular dynamics simulations and gain insights into biomolecular systems. There are a number of classes of Markov chains or Markov models. In this article, we will restrict our discussion to finite-state Markov chains, which are more common than their infinite-state analogs.

As exemplified in Figure 1, the state transitions in a Markov chain are frequently described by a transition/stochastic matrix, which encodes key characteristics of the Markov chain. For example, the spectral gap of a transition matrix is closely related to the mixing time of the corresponding Markov chain, which is relevant to one of the topics of my Ph.D. research - assessing the sampling efficiency of different algorithms in molecular dynamics. Roughly, the mixing time is the number of steps it takes for the deviation from the stationary distribution to drop by a factor of $e$. (Conventions vary; another common definition requires the total variation distance from the stationary distribution to fall below $1/4$, but the conventions agree up to constant factors.)
The spectral gap of a transition matrix is commonly defined as $\gamma = 1 - |\lambda_2|$, where $\lambda_2$ is the eigenvalue with the second-largest modulus. Notably, this definition takes for granted that for any transition matrix, the largest eigenvalue is always 1, a fact we will prove below. While this statement, along with the definition and implications of the spectral gap, is almost common sense in relevant fields, introductory materials tend not to delve into the complicated theory behind the relation between the spectral gap and the mixing time. Still, some mathematical details that are generally missing from introductory materials could help beginners develop intuitions about why and how the spectral gap and mixing time are related. To fill this gap, this article aims to equip you with sufficient mathematical background to develop intuitions about the following statement:
Markov chains with larger spectral gaps mix faster.
In the following sections, I will first review some conventions and definitions of stochastic matrices and the eigenvalue equation. Then, I will prove that 1 is always the largest eigenvalue of a transition matrix and provide intuitions about why Markov chains with larger spectral gaps mix faster. Finally, I will conclude the article with a note about transition matrices in molecular dynamics. If you are already familiar with Markov chains and just curious about intuitions about the spectral gap and mixing time, feel free to jump to the last section of the article. For a deep dive into mixing times in Markov chains, I recommend the textbook written by Montenegro and Tetali or this handout by Dabbs.
Different definitions of stochastic/transition matrices
A stochastic matrix, or a transition matrix, is a square matrix frequently used to describe the transitions of a Markov chain. There are several definitions of a stochastic/transition matrix:
- A right/row stochastic matrix is a real square matrix with each row summing to 1.
- A left/column stochastic matrix is a real square matrix with each column summing to 1.
- A doubly stochastic matrix is a real square matrix with each row and each column summing to 1. (Note that a doubly stochastic matrix is not necessarily symmetric; for example, the permutation matrix of a 3-cycle is doubly stochastic but not symmetric.)
In this article, I will adopt the row-stochastic convention, with $P_{ij}$ denoting the probability of transitioning from state $i$ to state $j$ in one step, so that each row of the transition matrix $P$ sums to 1, i.e., $\sum_j P_{ij} = 1$.
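To make the convention concrete, here is a minimal NumPy sketch; the transition probabilities are made up for illustration:

```python
import numpy as np

# A hypothetical 3-state transition matrix in the row-stochastic convention:
# P[i, j] is the probability of moving from state i to state j in one step.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

# Each row must sum to 1: the outgoing probabilities of every state are exhaustive.
assert np.allclose(P.sum(axis=1), 1.0)

# In this convention, a probability distribution is a row vector that evolves
# by right-multiplication: p_{t+1} = p_t P.
p0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with probability 1
p1 = p0 @ P                     # distribution after one step (= row 0 of P)
```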
Eigenvalues and eigenvectors
For an $n \times n$ square matrix $A$, a nonzero column vector $v$ satisfying $Av = \lambda v$ is called a right eigenvector of $A$ with eigenvalue $\lambda$. Similarly, a nonzero row vector $w$ satisfying $wA = \lambda w$ is called a left eigenvector. The eigenvalues are the roots of the characteristic polynomial $\det(A - \lambda I) = 0$. In the row-stochastic convention, a distribution is a row vector $p$ that evolves as $p \mapsto pP$, so a stationary distribution is a left eigenvector of $P$ with eigenvalue 1, i.e., $\pi P = \pi$.
Supplementary note: The left and right eigenvalues of a matrix are equivalent.
For a square matrix $A$, the left eigenvalues are the roots of $\det(A^{\mathsf T} - \lambda I) = 0$. Since $\det(A^{\mathsf T} - \lambda I) = \det\big((A - \lambda I)^{\mathsf T}\big) = \det(A - \lambda I)$, the left and right eigenvalues of a matrix are identical, although the corresponding left and right eigenvectors are generally different.
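This equivalence is easy to check numerically; the sketch below uses an arbitrary random matrix and compares the spectra of $A$ and $A^{\mathsf T}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))  # an arbitrary square matrix

# Right eigenvalues solve det(A - lambda*I) = 0; left eigenvalues solve
# det(A^T - lambda*I) = 0. Since det(M) = det(M^T), the two sets coincide.
right_eigs = np.sort_complex(np.linalg.eigvals(A))
left_eigs = np.sort_complex(np.linalg.eigvals(A.T))
assert np.allclose(right_eigs, left_eigs)
```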
The largest eigenvalue of a transition matrix is always 1.
To prove the statement in the section title, let’s first prove that any transition matrix has an eigenvalue of 1, then prove that 1 is always the largest eigenvalue of a transition matrix.
For a right stochastic matrix $P$, every row sums to 1 by definition, i.e., $\sum_j P_{ij} = 1$ for all $i$. Equivalently, $P\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the all-ones column vector. This means $\mathbf{1}$ is a right eigenvector of $P$ with eigenvalue 1, so 1 is always an eigenvalue of a transition matrix.
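This is straightforward to verify numerically; in the NumPy sketch below, the transition probabilities are made up for illustration:

```python
import numpy as np

# A hypothetical row-stochastic matrix (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.5, 0.1],
    [0.0, 0.3, 0.7],
])

# P @ 1 simply computes each row sum, so P @ 1 = 1: the all-ones vector
# is a right eigenvector of P with eigenvalue 1.
ones = np.ones(3)
assert np.allclose(P @ ones, ones)

# 1 indeed appears among the eigenvalues of P.
assert np.any(np.isclose(np.linalg.eigvals(P), 1.0))
```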
Proving that 1 is always an eigenvalue of a transition matrix is not sufficient, as we still need to prove that it is the largest eigenvalue, where I found the Gershgorin circle theorem quite handy:
Gershgorin circle theorem: Every eigenvalue of a square matrix lies within at least one of the Gershgorin discs.
The Gershgorin circle theorem is a useful tool for bounding the spectrum of a square matrix. To understand the definition of a Gershgorin disc, let's consider a complex $n \times n$ matrix $A$ with entries $a_{ij}$. The $i$-th Gershgorin disc is the closed disc $D(a_{ii}, R_i)$ in the complex plane, centered at the diagonal entry $a_{ii}$ with radius $R_i = \sum_{j \neq i} |a_{ij}|$, the sum of the moduli of the off-diagonal entries in row $i$.

To prove the theorem, we let $\lambda$ be an eigenvalue of $A$ with eigenvector $x$, and let $i$ be the index of the component of $x$ with the largest modulus, so that $|x_j| \le |x_i|$ for all $j$ (note that $x_i \neq 0$ since $x$ is nonzero). Writing out the $i$-th row of $Ax = \lambda x$ and moving the diagonal term to the left-hand side gives $(\lambda - a_{ii})\,x_i = \sum_{j \neq i} a_{ij} x_j$.
By applying the triangle inequality and recalling that $|x_j|/|x_i| \le 1$ for all $j$, we obtain $|\lambda - a_{ii}| \le \sum_{j \neq i} |a_{ij}|\,\frac{|x_j|}{|x_i|} \le \sum_{j \neq i} |a_{ij}| = R_i$. That is, $\lambda$ lies in the $i$-th Gershgorin disc, which completes the proof.
Interestingly, for a transition matrix $P$, the expression for the radius simplifies to $R_i = \sum_{j \neq i} P_{ij} = 1 - P_{ii}$, since each row sums to 1. Every Gershgorin disc is therefore centered at $P_{ii}$ on the real axis with radius $1 - P_{ii}$, and hence is contained in the closed unit disc: any eigenvalue $\lambda$ satisfies $|\lambda| \le P_{ii} + (1 - P_{ii}) = 1$. Combined with the previous result that 1 is always an eigenvalue, this proves that the largest eigenvalue of a transition matrix is always 1.
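The following NumPy sketch, again with a made-up transition matrix, verifies that every eigenvalue lies in at least one Gershgorin disc and hence within the closed unit disc:

```python
import numpy as np

# A hypothetical 3-state transition matrix.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

centers = np.diag(P)             # disc centers P_ii
radii = P.sum(axis=1) - centers  # R_i = sum_{j != i} P_ij = 1 - P_ii
assert np.allclose(radii, 1.0 - centers)

eigs = np.linalg.eigvals(P)
for lam in eigs:
    # Gershgorin: each eigenvalue lies in at least one disc ...
    assert any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii))
    # ... and each disc lies inside the closed unit disc, so |lambda| <= 1.
    assert abs(lam) <= 1.0 + 1e-12
```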
Perron-Frobenius theorem: A positive square matrix has a unique maximal eigenvalue, which corresponds to a positive eigenvector.
In a Markov chain, where the transition matrix is frequently not positive but only nonnegative, a variation of the Perron-Frobenius theorem is more relevant: for a nonnegative square matrix, the Perron-Frobenius eigenvalue (the spectral radius) is still an eigenvalue with a nonnegative eigenvector, but it is not necessarily unique, and the eigenvector is not guaranteed to be strictly positive. If the matrix is additionally irreducible, the maximal eigenvalue is simple and its eigenvector can be chosen strictly positive. For a transition matrix, this maximal eigenvalue is exactly 1, and the corresponding normalized positive left eigenvector is the stationary distribution.
Supplementary note: The existence and uniqueness of stationary distributions
A Markov chain can have no stationary distribution, exactly one, or multiple (in fact, infinitely many) stationary distributions:
(1) No stationary distribution: A classic example is an infinite-state Markov chain with state space $\{1, 2, 3, \dots\}$ in which the chain moves deterministically from state $i$ to state $i+1$. The probability mass keeps drifting toward infinity, so no stationary distribution exists.
(2) One unique stationary distribution: This is true for any finite-state, irreducible Markov chain, where all states are positive recurrent. (A Markov chain is irreducible if every state can be reached from every other state within a finite number of steps.)
(3) Multiple stationary distributions: This is true for any finite-state, reducible Markov chain that contains at least two disjoint closed communicating classes of positive recurrent states. Each closed class supports its own stationary distribution, and any convex combination of these distributions is also stationary, so there are infinitely many. For example, if some transition probabilities in Figure 1 were set to zero so that two groups of states could no longer reach each other, each isolated group would support its own stationary distribution.
In fact, the principle for distinguishing these three cases is that a Markov chain has at least one stationary distribution if and only if at least one state is positive recurrent. Also, we can say that any finite-state Markov chain has at least one stationary distribution. (Check the next Supplementary note for the definition of recurrence.)
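In practice, the unique stationary distribution of a finite-state, irreducible chain can be computed as the left eigenvector of $P$ associated with eigenvalue 1. A NumPy sketch with a made-up (strictly positive, hence irreducible) transition matrix:

```python
import numpy as np

# A hypothetical irreducible transition matrix (all entries positive).
P = np.array([
    [0.5, 0.4, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Left eigenvectors of P are right eigenvectors of P^T, so we solve
# the eigenproblem for P^T and pick the eigenvalue closest to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()  # normalize to a probability distribution

assert np.allclose(pi @ P, pi)  # stationarity: pi P = pi
assert np.all(pi > 0)           # strictly positive, as Perron-Frobenius predicts
```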
Supplementary note: The transience and recurrence of Markov chains
A state in a Markov chain is either transient or recurrent. Specifically, a Markov chain starting from a transient state has a probability strictly less than 1 of ever returning to that state, so with nonzero probability it never returns. On the other hand, a Markov chain starting from a recurrent state is guaranteed (with probability 1) to return to that state. Recurrent states can be further divided into positive recurrent states (finite expected return time) and null recurrent states (infinite expected return time). To understand this, let's take a look at the example in Figure S1.

In Figure S1, the behavior of the random walk depends on the transition probability $p$: if $p > 1/2$, the walk drifts away and every state is transient; if $p = 1/2$, every state is recurrent, but the expected return time is infinite, so the states are null recurrent; if $p < 1/2$, the expected return time is finite, so the states are positive recurrent.
Additionally, here are some facts about the transience and recurrence of Markov chains:
- Only infinite Markov chains can have null recurrent states.
- The states in an irreducible Markov chain are either all transient, all null recurrent (both possible only in an infinite chain), or all positive recurrent (always the case in a finite chain). That is, an irreducible Markov chain has at most one stationary distribution.
- Any finite Markov chain must have at least one (positive) recurrent state. (If all states are transient, then each of them is either not visited at all or visited only finitely many times, which is not possible.)
For any reversible Markov chain, all eigenvalues of its transition matrix are real and the matrix is diagonalizable.
Before we explore the relation between the spectral gap and mixing time, I want to insert this supplementary section about reversible Markov chains, which is the type of Markov chain that most relevant discussions revolve around.
Before we prove the statement in the section title, we need to first understand the reversibility of a Markov chain: a Markov chain with stationary distribution $\pi$ is reversible if and only if the detailed balance condition $\pi_i P_{ij} = \pi_j P_{ji}$ holds for all pairs of states $i$ and $j$.
To prove that all eigenvalues of a reversible transition matrix $P$ are real and that $P$ is diagonalizable, define $D = \operatorname{diag}(\sqrt{\pi_1}, \dots, \sqrt{\pi_n})$ and consider the similar matrix $S = DPD^{-1}$, whose entries are $S_{ij} = \sqrt{\pi_i}\,P_{ij}/\sqrt{\pi_j}$. The detailed balance condition $\pi_i P_{ij} = \pi_j P_{ji}$ implies $S_{ij} = S_{ji}$, so $S$ is a real symmetric matrix. By the spectral theorem, $S$ has real eigenvalues and an orthonormal basis of eigenvectors; since $P$ is similar to $S$, $P$ has the same real eigenvalues and is likewise diagonalizable.
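The symmetrization argument above is easy to check numerically. The sketch below uses a made-up birth-death (tridiagonal) chain, which is reversible, and verifies that $S = DPD^{-1}$ is symmetric and that all eigenvalues of $P$ are real:

```python
import numpy as np

# A hypothetical reversible chain: birth-death (tridiagonal) chains are
# always reversible with respect to their stationary distribution.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# Detailed balance: pi_i P_ij = pi_j P_ji for all i, j.
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

# S = D P D^{-1} with D = diag(sqrt(pi)) is symmetric, so P must have
# real eigenvalues and be diagonalizable.
D = np.diag(np.sqrt(pi))
S = D @ P @ np.linalg.inv(D)
assert np.allclose(S, S.T)
assert np.allclose(np.imag(np.linalg.eigvals(P)), 0.0)
```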
So, why is a large spectral gap indicative of faster mixing?
Finally, we are approaching the core question of this article: Why do Markov chains with larger spectral gaps mix faster?
Typically, discussions for answering this question are restricted to ergodic (meaning irreducible and aperiodic) and reversible Markov chains, which are not uncommon in the context of molecular dynamics. (See the Supplementary note below.) As a reminder, such Markov chains have the following properties:
- An irreducible finite-state Markov chain always has one unique stationary distribution $\pi$. (Note that the mixing time is only meaningful when a stationary distribution exists.)
- If a Markov chain is irreducible and aperiodic, the eigenvalue 1 is unique (i.e., has a multiplicity of 1) and the moduli of all other eigenvalues are strictly less than 1. (This is not elaborated in this article due to limited space.)
- For any reversible Markov chain, all eigenvalues of its transition matrix $P$ are real and $P$ is diagonalizable.
Now, let's consider the distribution of an ergodic, reversible Markov chain after $t$ steps. Given an initial distribution $p_0$ (a row vector), the distribution after $t$ steps is $p_t = p_0 P^t$. Since $P$ is diagonalizable, $p_0$ can be expanded in the basis of left eigenvectors $w_k$ of $P$, with the eigenvalues ordered so that $1 = \lambda_1 > |\lambda_2| \ge \dots \ge |\lambda_n|$: $p_0 = \pi + \sum_{k=2}^{n} c_k w_k$, where the coefficient of $w_1 = \pi$ is 1 because total probability is conserved. Applying $P^t$ then gives $p_t = p_0 P^t = \pi + \sum_{k=2}^{n} c_k \lambda_k^t w_k$.
As $t \to \infty$, each term $\lambda_k^t$ with $|\lambda_k| < 1$ decays to zero, so $p_t \to \pi$ regardless of the initial distribution. The slowest-decaying term is the one associated with $\lambda_2$, whose magnitude shrinks like $|\lambda_2|^t = (1 - \gamma)^t$, where $\gamma = 1 - |\lambda_2|$ is the spectral gap. Therefore, the larger the spectral gap, the smaller $|\lambda_2|$ and the faster the deviation from the stationary distribution vanishes. This is exactly why Markov chains with larger spectral gaps mix faster: the mixing time scales like $1/\gamma$, up to logarithmic factors.
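The geometric decay at rate $|\lambda_2|$ can be observed directly. The sketch below compares two made-up 2-state chains that share the same stationary distribution but have different spectral gaps:

```python
import numpy as np

def second_eigenvalue_modulus(P):
    """Modulus of the second-largest eigenvalue of a transition matrix."""
    mods = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return mods[1]

# Two hypothetical reversible 2-state chains with different spectral gaps.
P_fast = np.array([[0.5, 0.5], [0.5, 0.5]])  # |lambda_2| = 0.0, gap = 1.0
P_slow = np.array([[0.9, 0.1], [0.1, 0.9]])  # |lambda_2| = 0.8, gap = 0.2

p0 = np.array([1.0, 0.0])
pi = np.array([0.5, 0.5])  # both chains share this stationary distribution

for P, lam2 in [(P_fast, 0.0), (P_slow, 0.8)]:
    assert np.isclose(second_eigenvalue_modulus(P), lam2)
    p = p0.copy()
    for t in range(1, 6):
        p = p @ P
        # The deviation from pi shrinks by a factor of |lambda_2| per step
        # (exactly so here, since p0 - pi is aligned with the second eigenvector).
        assert np.allclose(np.abs(p - pi).sum(),
                           lam2**t * np.abs(p0 - pi).sum())
```

The chain with the larger spectral gap (here, `P_fast`) reaches its stationary distribution in a single step, while the small-gap chain's deviation decays slowly as $0.8^t$.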
Supplementary note: Transition matrices in molecular dynamics
Notably, a reversible, irreducible, and aperiodic Markov chain is not uncommon in molecular dynamics, given the following:
- Reversibility: Many advanced sampling methods (e.g., replica exchange molecular dynamics) strictly enforce detailed balance by governing state transitions in the space of interest using the Metropolis-Hastings algorithm or similar methods, in which case the transition matrix of the state space does satisfy $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i$ and $j$. Additionally, reversibility is also guaranteed in irreducible Markov chains with tridiagonal transition matrices, i.e., $P_{ij} = 0$ for all $|i - j| > 1$, which are also not uncommon in molecular dynamics. (See Theorem 5.3.2 in this post by LibreTexts for the proof of this.)
- Irreducibility: Straightforwardly, a tridiagonal matrix $P$ with $P_{i,i+1} > 0$ and $P_{i+1,i} > 0$ for all $i$ is irreducible. This condition is generally satisfied unless the sampling in the state space is extremely slow.
- Aperiodicity: A transition matrix must be aperiodic if it is irreducible and contains at least one self-loop (i.e., $P_{ii} > 0$ for some $i$). Intuitively, this is almost always true in molecular dynamics.
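As a closing sketch, the checks below verify all three properties for a made-up tridiagonal transition matrix of the kind that might arise from discretizing a slow coordinate in molecular dynamics:

```python
import numpy as np

# A hypothetical tridiagonal (birth-death-like) transition matrix.
P = np.array([
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.3, 0.7],
])
assert np.allclose(P.sum(axis=1), 1.0)

# Irreducibility: a standard test for a nonnegative matrix is that
# (I + P)^(n-1) has all entries positive.
n = P.shape[0]
assert np.all(np.linalg.matrix_power(np.eye(n) + P, n - 1) > 0)

# Aperiodicity: an irreducible chain with at least one self-loop (P_ii > 0)
# is aperiodic.
assert np.any(np.diag(P) > 0)

# Reversibility: compute pi and verify detailed balance pi_i P_ij = pi_j P_ji,
# which holds for any irreducible tridiagonal transition matrix.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)
```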
This is the end of the article! 🎉🎉 If you enjoyed this article, you are welcome to share it or leave a comment below, so I will be more motivated to write more! Thank you for reading this far! 😃