Sunday, 27 July 2025

Markov Chain

by W. B. Meitei, PhD


Have you ever made a decision based purely on your current situation, without worrying about how you got there? That’s the basic idea behind Markov chains. A Markov chain is a way of modelling a sequence of events where each event (or state) depends only on the one that came right before it, not the whole history. This is known as the Markov property, often summarised as,

the future is independent of the past, given the present

For example, think of a board game where our move depends only on our current position. If we're at square 5, our next step is based on that square alone, not on how many rolls it took to get there. That’s how a Markov chain works.

Markov chains help us understand random processes that evolve step-by-step, like predicting the weather, modelling customer behaviour, or simulating population movement. They’re used in many fields, including statistics, economics, machine learning, and even Google’s search algorithms.

Markov chains or Markov processes were named in honour of the Russian mathematician Andrey Markov.

This article aims to provide a concise introduction to key concepts frequently encountered in applied statistics, particularly within methods such as Markov Chain Monte Carlo (MCMC). Readers seeking a more comprehensive treatment of these topics are encouraged to review the references listed at the end of the page.

Formal definition: Let {Xn} be a sequence of random vectors. The sequence {Xn} is said to be a Markov chain if and only if any given term of the sequence Xn is independent of the terms preceding Xn-1, conditional on Xn-1:

\[F\left( x_{n}|x_{n - 1},x_{n - 2},\ldots,x_{1} \right) = F\left( x_{n}|x_{n - 1} \right)\]

Where F denotes a conditional distribution function.

Fig. 1: A simple Markov chain model

State Space: The state space S of a Markov chain {Xn} is the set of all possible values that the elements of the chain can take. In other words, for any given Xn in {Xn}, the set of values it can assume is contained within S.

In the following sections, we will explore key properties and concepts of Markov chains across three types of state spaces:

  1. A finite state space
  2. An infinite but countable state space
  3. An uncountable state space

1. Markov chains with a finite state space

Consider a set S,

\[\mathbf{S} = \left\{ s_{1},s_{2},\ldots,s_{J} \right\}\]

Where each term of the Markov chain can take one of the J values s1, s2,…, sJ.

a. Time homogeneity

To fully define a Markov chain with a finite state space, we need to specify two key components: the initial distribution and the transition probability matrix.

Initial distribution: The initial distribution of the chain is specified as π1​, a 1×J vector of probabilities, such that,

\[\sum_{j = 1}^{J}{\pi_{1}(j)} = \sum_{j = 1}^{J}{P\left( X_{1} = s_{j} \right)} = 1\]

This represents the probability distribution over the possible states at the initial time point.

Transition probability matrix: The transition probability matrix P is a J×J matrix, where each row corresponds to a probability distribution over the next state, given the current state. That is, each row sums to one:

\[\sum_{j = 1}^{J}{P(i,j)} = 1\]

Where P(Xn+1 = sj | Xn = si) = P(i, j), for all n and all i, j ≤ J.

A common simplifying assumption is that P(i, j) is the same for all n; this is called time-homogeneity (i.e., the transition probabilities do not depend on the time index n).

The initial distribution π1​ and the transition matrix P together completely characterise the joint distribution of the entire Markov chain.

In fact, for any n​, the marginal distribution of the chain at time n can be computed recursively as:

\[\pi_{n} = \pi_{1}P^{n - 1}\]

Note: \(\pi_{2} = \pi_{1}P\), \(\pi_{3} = \pi_{2}P = \pi_{1}P^{2}\), ..., \(\pi_{n} = \pi_{n - 1}P = \pi_{1}P^{n - 2}P = \pi_{1}P^{n - 1}\).
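
To make the recursion concrete, here is a minimal Python sketch, using a hypothetical three-state transition matrix (the numbers are arbitrary and purely illustrative):

```python
import numpy as np

# Hypothetical 3-state transition matrix: row i is the distribution of
# the next state given the current state is i (each row sums to one).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])

pi_1 = np.array([1.0, 0.0, 0.0])  # initial distribution: start in the first state

# Marginal distribution at time n, computed step by step: pi_n = pi_{n-1} P.
pi_n = pi_1.copy()
for _ in range(49):               # 49 transitions take pi_1 to pi_50
    pi_n = pi_n @ P

print(pi_n)                                   # marginal distribution of X_50
print(pi_1 @ np.linalg.matrix_power(P, 49))   # same result via pi_1 P^(n-1)
```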

b. Stationary distribution

If, for a given transition probability matrix P, there is an initial distribution π such that the distribution of all the terms of the chain is equal to the initial distribution, then π is called a stationary distribution (or invariant distribution) of the chain.

When π is a stationary distribution, we have,

\[\pi_{n} = \pi_{n - 1} = \pi\]

Since πn = πn-1P, this implies π = πP.
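
Numerically, a stationary distribution can be found by solving π = πP, i.e., by extracting the left eigenvector of P associated with the eigenvalue 1. A sketch, reusing the hypothetical matrix from the previous snippet:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],    # same hypothetical matrix as above
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])

# Left eigenvectors of P are right eigenvectors of P transpose.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvector with eigenvalue 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                     # normalise to a probability vector

print(pi)        # stationary distribution
print(pi @ P)    # equals pi, confirming pi = pi P
```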

c. Detailed balance

A Markov chain with a finite state space is said to satisfy the detailed balance condition if and only if there exists a probability distribution πb over the state space such that, for all states i and j, the following equality holds:

\[\pi_{b}(i)P(i,j) = \pi_{b}(j)P(j,i)\]

For all n and all i, j ≤ J; πb(i) and πb(j) denote the probabilities of being in states i and j, respectively, under the distribution πb; P(i, j) is the probability of transitioning from state i to state j.

This condition implies that, under distribution πb, the flow of probability from state i to state j is exactly balanced by the flow from j to i. When this condition holds, the chain is said to be reversible with respect to πb​, and πb​ is a stationary distribution of the Markov chain.

Detailed balance is particularly important in applications like MCMC, where it is used to ensure that the target distribution is stationary for the chain being constructed.

Summing both sides, we get

\[\sum_{j = 1}^{J}{\pi_{b}(i)P(i,j)} = \sum_{j = 1}^{J}{\pi_{b}(j)P(j,i)}\]

\[\pi_{b}(i)\sum_{j = 1}^{J}{P(i,j)} = \sum_{j = 1}^{J}{\pi_{b}(j)P(j,i)}\]

Since, \(\sum_{j = 1}^{J}{P(i,j)} = 1\),

\[\pi_{b}(i) = \sum_{j = 1}^{J}{\pi_{b}(j)P(j,i)}\]

This can be written in the form of a matrix as,

\[\pi_{b} = \pi_{b}P\]

Thus, πb​ is a stationary distribution of the chain.

It is important to note that the detailed balance condition is a stronger requirement than having a stationary distribution. While satisfying detailed balance guarantees a stationary distribution, the reverse is not necessarily true.
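
The sketch below checks detailed balance numerically for a hypothetical three-state birth-death chain; chains of this type are always reversible, so the check succeeds and the balancing distribution is stationary:

```python
import numpy as np

# Hypothetical birth-death chain on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

pi_b = np.array([0.25, 0.50, 0.25])   # candidate balancing distribution

# Detailed balance: pi_b(i) P(i,j) == pi_b(j) P(j,i) for every pair (i, j).
flows = pi_b[:, None] * P             # flows[i, j] = pi_b(i) P(i, j)
print(np.allclose(flows, flows.T))    # True: the chain is reversible
print(np.allclose(pi_b @ P, pi_b))    # True: pi_b is therefore stationary
```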

Certain properties of Markov chains are frequently used to study their asymptotic behaviour, namely, irreducibility, recurrence, and aperiodicity.

d. Irreducibility

Let x, y ∈ S, where S is the state space. We define the hitting time of state y starting from x as:

\[T_{xy} = \min\left\{ n > 1\ :\ X_{n} = y \right\}\]

This represents the first time the chain visits state y, assuming it begins in state x. Thus, state x leads to state y if and only if,

\[P_{x}\left( T_{xy} < \infty \right) > 0\]

i.e., the probability of eventually reaching y from x is positive.

A Markov chain is irreducible if every state in the chain can be reached from every other state with positive probability in a finite number of steps. In other words, for all x, y ∈ S, we must have Px(Txy < ∞) > 0. This ensures that the chain has a single communicating class and does not decompose into isolated subsets of states.

Hitting time: The number of steps required for the chain to reach a particular state for the first time, starting from a given initial state.
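
Hitting times are straightforward to estimate by simulation. A sketch, again using the hypothetical three-state matrix from the earlier snippets (states are indexed 0, 1, 2 in the code):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])

def hitting_time(x, y, max_steps=10_000):
    """Simulate the chain from state x; return the first time n with X_n = y."""
    state = x
    for n in range(1, max_steps + 1):
        state = rng.choice(3, p=P[state])   # one transition of the chain
        if state == y:
            return n
    return np.inf                            # y was never reached (transient case)

times = [hitting_time(0, 2) for _ in range(10_000)]
print(np.mean(times))   # Monte Carlo estimate of the expected hitting time
```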

e. Recurrence

A state y ∈ S is said to be recurrent if:

\[P_{y}\left( T_{yy} < \infty \right) = 1\]

i.e., starting from state y, the probability that the Markov chain returns to y in a finite number of steps is equal to 1. If this condition is not met, i.e., if the probability of returning to y is less than 1, then y is said to be transient.

In simple terms, a recurrent state is one that the chain is guaranteed to revisit eventually. In contrast, a transient state may be visited only a finite number of times, possibly never again.

A Markov chain is called recurrent if every state in its state space is recurrent.

However, the expected return time may be infinite even when a state is recurrent, i.e.,

\[E_{y}\left( T_{yy} \right) = \infty\]

A state is said to be positive recurrent if and only if

\[E_{y}\left( T_{yy} \right) < \infty\]

A Markov chain is called positive recurrent if every state in its state space is positive recurrent.

f. Aperiodicity

Let y ∈ S. Then, the period of state y, denoted by dy​, is defined as:

\[d_{y} = \gcd\left\{ n > 1\ :\ P_{y}\left( X_{n} = y \right) > 0 \right\}\]

Where gcd is the greatest common divisor. In other words, dy is the greatest common divisor of all the times at which the chain can return to y (after starting from y itself), not merely the first such time.

If a Markov chain is irreducible (i.e., every state can be reached from every other state), then all states in the chain share the same period. In this case, the shared value d is referred to as the period of the chain.

A Markov chain is called aperiodic if d = 1, meaning that the chain does not follow a fixed cycle and can return to a state at irregular intervals. Aperiodicity is a key condition for ensuring certain long-term convergence properties of the chain.
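
The period of a state can be computed from the powers of the transition matrix: collect the times n at which the n-step return probability is positive and take their greatest common divisor. A sketch, using a hypothetical two-state "flip" chain whose period is 2, with the horizon N as a practical truncation:

```python
import numpy as np
from math import gcd
from functools import reduce

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # deterministic flip between the two states

def period(P, x, N=100):
    """gcd of the return times n <= N for which P^n(x, x) > 0."""
    returns = []
    Pn = np.eye(len(P))
    for n in range(1, N + 1):
        Pn = Pn @ P                 # Pn now holds the n-step matrix P^n
        if Pn[x, x] > 0:
            returns.append(n)
    return reduce(gcd, returns) if returns else 0

print(period(P, 0))   # 2: the chain is periodic, hence not aperiodic
```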

Fig. 2: (a) An aperiodic Markov chain with positive recurrent states,
(b) A Markov chain with one transient state and two recurrent states, (c) An irreducible Markov chain

g. Some propositions

Proposition 1: If a Markov chain with a finite state space is irreducible, then it has a unique stationary distribution.

Proposition 2: If a Markov chain with a finite state space is irreducible and aperiodic, then, irrespective of the initial distribution π1,

\[\lim_{n \rightarrow \infty}{\pi_{n} = \pi}\]

Where π is the unique stationary distribution of the chain.

i.e., even when the initial distribution of a Markov chain differs from its stationary distribution, the terms Xn of the sequence become less and less dependent on the initial value X1 as n increases and their distributions converge to the stationary distribution.

Proposition 3: If a Markov chain {Xn} with a finite state space S is irreducible, then for any bounded function f : S → ℝ, we have,

\[\frac{1}{n}\sum_{i = 1}^{n}{f\left( X_{i} \right){\overset{a.s.}{\rightarrow}\sum_{j = 1}^{J}{\pi_{j}f\left( s_{j} \right)}}}\]

Where π is the unique stationary distribution of the chain and a.s. denotes almost sure convergence as n → ∞.

Thus, irreducibility in the finite state space guarantees that the sample averages across the chain converge to population averages across the states. This kind of proposition is often called an ergodic theorem.
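
A simulation makes the ergodic theorem tangible: the average of f along a single long path of the chain approaches the stationary average. A sketch, with an arbitrary test function f on the hypothetical three-state chain used above:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])
f = np.array([1.0, 5.0, -2.0])        # f(s_j) for each state s_j (arbitrary)

# One long realisation of the chain, started (arbitrarily) in state 0.
n, state, total = 200_000, 0, 0.0
for _ in range(n):
    total += f[state]
    state = rng.choice(3, p=P[state])

# Stationary average, via the eigenvector computation shown earlier.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

print(total / n)      # sample average along the path
print(pi @ f)         # population average: the two should be close
```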

Almost sure convergence: A sequence of random variables X1, X2, X3, … converges almost surely to a random variable X, written Xn \(\overset{a.s.}{\rightarrow}\) X, if

\[P\left\{ \omega \in \Omega\ :\ \lim_{n \rightarrow \infty}{X_{n}(\omega) = X(\omega)} \right\} = 1\]

Where Ω denotes the underlying sample space (written Ω here to avoid confusion with the state space S).

2. Markov chains with an infinite but countable state space

Consider a sequence S,

\[\mathbf{S} = \left\{ s_{j} \right\}\]

Where the elements of the state space can be arranged into a sequence {sj}.

a. Time homogeneity

Here, the initial distribution of the chain is the sequence of initial probabilities, {π1(j)}, such that

\[P\left( X_{1} = s_{j} \right) = \pi_{1}(j)\]

Then, the transition probabilities form a double sequence P(i, j) such that, for any i,

\[\sum_{j = 1}^{\infty}{P(i,j) = 1}\]

Where P(Xn+1 = sj | Xn = si) = P(i, j), for all n and all i, j = 1, 2, …

Note that the time homogeneity property states that P(i, j) is independent of n.

Sequence {π2(j)} is the distribution of X2, whose elements can be derived as,

\[\pi_{2}(j) = P\left( X_{2} = s_{j} \right) = \sum_{i = 1}^{\infty}{\pi_{1}(i)P(i,j)}\]

Thus, {πn(j)} is the distribution of Xn, whose elements can be derived recursively as,

\[\pi_{n}(j) = P\left( X_{n} = s_{j} \right) = \sum_{i = 1}^{\infty}{\pi_{n - 1}(i)P(i,j)}\]

b. Stationary distribution

Similar to the concept introduced in the case of the finite state space discussed above, if, for a given transition probability matrix P(i, j), there is an initial distribution {πj} such that the distribution of all the terms of the chain is equal to the initial distribution, then {πj} is called a stationary distribution of the chain.

Furthermore, we have,

\[\pi_{j} = \sum_{i = 1}^{\infty}{\pi_{i}P(i,j)}\]
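
As a concrete (and hypothetical) countable example, consider a reflecting random walk on {0, 1, 2, …} that steps up with probability p and down with probability 1 − p, staying at 0 instead of stepping below it. For p < 1/2 the chain is positive recurrent, and detailed balance gives the geometric stationary distribution π(j) = (1 − r)r^j with r = p/(1 − p). The sketch below checks π = πP on a finite truncation of the infinite transition matrix:

```python
import numpy as np

p, J = 0.3, 200                     # truncate the infinite state space at J states
r = p / (1 - p)
pi = (1 - r) * r ** np.arange(J)    # candidate stationary distribution pi(j)

P = np.zeros((J, J))
P[0, 0], P[0, 1] = 1 - p, p         # reflecting boundary at state 0
for i in range(1, J - 1):
    P[i, i - 1], P[i, i + 1] = 1 - p, p
P[J - 1, J - 2], P[J - 1, J - 1] = 1 - p, p   # truncation: upward moves loop back

print(np.allclose(pi @ P, pi))      # True: pi is stationary (up to truncation)
```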

c. Detailed balance

As in the case of the finite state space discussed above, a Markov chain with an infinite but countable state space is said to satisfy the detailed balance condition if and only if there exists a probability distribution {πb(j)} over the state space such that, for all states i and j, the following equality holds:

\[\pi_{b}(i)P(i,j) = \pi_{b}(j)P(j,i)\]

Since, \(\sum_{j = 1}^{\infty}{P(i,j)} = 1\),

\[\pi_{b}(i) = \sum_{j = 1}^{\infty}{\pi_{b}(j)P(j,i)}\]

for any i. So, {πb(j)} is a stationary distribution of the chain.

The concept of irreducibility, recurrence, and aperiodicity for a Markov chain remains identical when extended from a finite state space to an infinite but countable state space.

d. Some propositions

Proposition 1: If a Markov chain with an infinite but countable state space is irreducible and positive recurrent, then it has a unique stationary distribution.

The key distinction between Markov chains with finite versus infinite but countable state spaces lies in the conditions required for a unique stationary distribution. In the finite case, irreducibility alone is sufficient (an irreducible chain on a finite state space is automatically positive recurrent). However, for an infinite but countable state space, the chain must be irreducible and positive recurrent to ensure a unique stationary distribution.

Proposition 2: If a Markov chain with an infinite but countable state space is irreducible, positive recurrent and aperiodic, then, irrespective of the initial distribution π1(j),

\[\lim_{n \rightarrow \infty}{\pi_{n}(j) = \pi(j)}\]

Where {π(j)} is the unique stationary distribution of the chain.

Again, the key distinction between Markov chains with finite versus infinite but countable state spaces lies in the conditions required for convergence to the stationary distribution. In the finite case, irreducibility and aperiodicity are sufficient. However, for an infinite but countable state space, the chain must be irreducible, positive recurrent, and aperiodic to ensure convergence to the stationary distribution.

Proposition 3: If a Markov chain {Xn} with an infinite but countable state space S is irreducible and positive recurrent, then for any bounded function f : S → ℝ, such that the expected value \(\sum_{j = 1}^{\infty}{\pi(j)\left| f\left( s_{j} \right) \right|}\) exists and is finite, we have,

\[\frac{1}{n}\sum_{i = 1}^{n}{f\left( X_{i} \right){\overset{a.s.}{\rightarrow}\sum_{j = 1}^{\infty}{\pi(j)f\left( s_{j} \right)}}}\]

Where {π(j)} is the unique stationary distribution of the chain and a.s. denotes almost sure convergence as n → ∞.

Often, establishing positive recurrence is analytically complex. However, since an irreducible chain that admits a stationary distribution is positive recurrent, this proposition may be equivalently expressed as: if a Markov chain {Xn} with an infinite but countable state space S is irreducible and has a unique stationary distribution {π(j)}, then for any bounded function f : S → ℝ, such that the expected value \(\sum_{j = 1}^{\infty}{\pi(j)\left| f\left( s_{j} \right) \right|}\) exists and is finite, we have

\[\frac{1}{n}\sum_{i = 1}^{n}{f\left( X_{i} \right){\overset{a.s.}{\rightarrow}\sum_{j = 1}^{\infty}{\pi(j)f\left( s_{j} \right)}}}\]

3. Markov chains with an uncountable state space

a. Time homogeneity

In this setting, the initial distribution of the chain is specified by a probability measure ​π1, such that

\[P\left( X_{1} \in A \right) = \pi_{1}(A)\]

Where A ⊆ S is any event.

Then, the function P(x, A), called the transition kernel, is chosen such that,

\[P\left( X_{n} \in A\ |\ X_{n - 1} = x \right) = P(x,A)\]

for all x ∈ S and all events A ⊆ S.

If the transition kernel is independent of n, then it is time-homogeneous.

As in the cases of finite and countable state spaces, the initial distribution π1 and the transition kernel P(x, A) completely determine the distribution of all the terms of the chain.

The distribution of X2 can be derived as,

\[\pi_{2}(A) = P\left( X_{2} \in A \right) = \int_{x_{1} \in S}^{}{P(x_{1},A)d}\pi_{1}\left( x_{1} \right)\]

Where A ⊆ S is any event, and the integral is a Lebesgue integral with respect to the probability measure π1.

The distribution of other terms of the chain Xn can be derived recursively as,

\[\pi_{n}(A) = P\left( X_{n} \in A \right) = \int_{x_{n - 1} \in S}^{}{P(x_{n - 1},A)d}\pi_{n - 1}\left( x_{n - 1} \right)\]

When the terms of the sequence are continuous variables (or vectors), then X1 has a p.d.f. f1(x1), such that,

\[\pi_{1}(A) = \int_{x_{1} \in A}^{}{f_{1}\left( x_{1} \right)dx_{1}}\]

And the kernel density can be expressed as,

\[P\left( x_{n - 1},A \right) = \int_{x_{n} \in A}^{}{f\left( x_{n}|x_{n - 1} \right)dx_{n}}\]

Thus, the marginal density function can be expressed as,

\[f_{n}\left( x_{n} \right) = \int_{x_{n - 1} \in S}^{}{f\left( x_{n}|x_{n - 1} \right)f_{n - 1}\left( x_{n - 1} \right)dx_{n - 1}}\]

b. Stationary distribution

If there exists an initial distribution π(A) such that, under a given transition kernel P(x, A), the distribution of every term in the Markov chain is equal to π(A), then π(A) is referred to as a stationary distribution of the chain.

This condition implies that π(A) satisfies the following integral equation:

\[\pi(A) = \int_{x \in S}^{}{P(x,A)d\pi(x)}\]

For any A ⊆ S.

In cases where both the transition kernel and the stationary distribution have densities, the above equation can be rewritten as,

\[f(y) = \int_{x \in S}^{}{f(y|x)f(x)dx}\]
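
As a standard (hypothetical) continuous example, a Gaussian AR(1) chain Xn = ρXn−1 + εn, with εn ~ N(0, σ²), has transition density f(y|x) = N(y; ρx, σ²) and stationary density N(0, σ²/(1 − ρ²)). The sketch below verifies the integral equation numerically at a few points:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rho, sigma = 0.8, 1.0
stat_sd = sigma / np.sqrt(1 - rho ** 2)   # stationary standard deviation

f = lambda x: norm.pdf(x, 0.0, stat_sd)            # candidate stationary density
kernel = lambda y, x: norm.pdf(y, rho * x, sigma)  # transition density f(y|x)

for y in (-2.0, 0.0, 1.5):
    lhs = f(y)
    rhs, _ = quad(lambda x: kernel(y, x) * f(x), -np.inf, np.inf)
    print(y, lhs, rhs)   # lhs and rhs agree, so f solves the integral equation
```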

c. Detailed balance

A Markov chain with an uncountable state space and a transition kernel is said to satisfy the detailed balance condition if and only if there exists a probability measure πb such that, for all x, y ∈ S, the following equality of measures holds:

\[P(x,dy)\pi_{b}(dx) = P(y,dx)\pi_{b}(dy)\]

In cases where both the transition kernel and the stationary distribution have densities, the detailed balance condition can be rewritten as,

\[f\left( y \middle| x \right)f(x) = f(x|y)f(y)\]

Integrating both sides of the equation with respect to y,

\[\int_{y \in S}^{}{f\left( y \middle| x \right)f(x)dy} = \int_{y \in S}^{}{f(x|y)f(y)dy}\]

or

\[f(x)\int_{y \in S}^{}{f\left( y \middle| x \right)dy} = \int_{y \in S}^{}{f(x|y)f(y)dy}\]

Since \(\int_{y \in S}^{}{f\left( y \middle| x \right)dy} = 1\)

\[f(x) = \int_{y \in S}^{}{f(x|y)f(y)dy}\]

Thus, πb is a stationary distribution.
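
This is exactly the mechanism exploited in MCMC: the Metropolis-Hastings acceptance rule is constructed so that the resulting transition kernel satisfies detailed balance with respect to the target density, which makes the target stationary for the chain. A minimal random-walk Metropolis sketch (the standard-normal target is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
target = lambda x: np.exp(-0.5 * x ** 2)      # unnormalised N(0, 1) density

n, x = 100_000, 0.0
samples = np.empty(n)
for i in range(n):
    y = x + rng.normal(0.0, 1.0)              # symmetric random-walk proposal
    if rng.uniform() < target(y) / target(x): # Metropolis acceptance rule
        x = y                                  # accept; otherwise stay at x
    samples[i] = x

print(samples.mean(), samples.var())          # close to 0 and 1, as expected
```

Averages of these draws estimate expectations under the target, in line with the ergodic propositions discussed below.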

d. Phi-Irreducibility

Let x ∈ S and let A ⊆ S be an event. We can define the hitting time as,

\[T_{xA} = \min\left\{ n > 1\ :\ X_{n} \in A \right\}\]

Where the subscript x indicates that the chain is assumed to start from X1​=x.

We say that x leads to A if and only if

\[P_{x}\left( T_{xA} < \infty \right) > 0\]

Where Px​ denotes the probability computed assuming that the chain starts at x.

A Markov chain is said to be φ-irreducible if and only if there exists a measure φ such that for every state xS, and every measurable set AS with φ(A)>0, the chain starting from x reaches A with positive probability in finite time.

e. Harris-Recurrence

When working with uncountable state spaces, a stronger notion of recurrence known as Harris recurrence is often employed. This concept extends the idea of recurrence beyond what is captured by positive recurrence in countable state spaces.

An event A ⊆ S is said to be Harris recurrent if the probability that the Markov chain returns to A infinitely often (starting from any point within A) equals 1.

A Markov chain is called Harris recurrent if it satisfies two conditions:

  1. It is φ-irreducible, meaning it has the potential to reach any measurable set A with positive φ measure from any starting state.
  2. Every measurable set A ⊆ S with φ(A) > 0 is Harris recurrent, i.e., the chain will return to such sets infinitely often with probability 1.

f. Aperiodicity

A Markov chain is said to have period d if its state space can be partitioned into d mutually exclusive events such that the chain takes exactly d steps to cycle through these events.

If the events are A1, A2, …, Ad, then

\[X_{n} \in A_{1} \Rightarrow P\left( X_{n + 1} \in A_{2} \right) = 1\]

\[X_{n + 1} \in A_{2} \Rightarrow P\left( X_{n + 2} \in A_{3} \right) = 1\]

\[\vdots\]

\[X_{n + d - 1} \in A_{d} \Rightarrow P\left( X_{n + d} \in A_{1} \right) = 1\]

A chain is said to be aperiodic if its period d = 1.

In the case of an uncountable state space, φ-irreducibility and Harris recurrence alone do not guarantee the existence and uniqueness of a stationary distribution; its existence must be assumed separately, as in the propositions below.

g. Some propositions

Proposition 1: If a Markov chain with an uncountable state space is φ-irreducible, Harris recurrent, aperiodic, and has a stationary distribution π, then πn converges to π as n → ∞.

Proposition 2: If a Markov chain with an uncountable state space is φ-irreducible, Harris recurrent, aperiodic, and has a stationary distribution π, then

\[\frac{1}{n}\sum_{i = 1}^{n}{f\left( X_{i} \right){\overset{a.s.}{\rightarrow}\int_{x \in S}^{}{f(x)d\pi(x)}}}\]

for any function f : S → ℝ such that the expected value \(\int_{x \in S}^{}{|f(x)|d\pi(x)}\) exists and is finite.



Suggested Readings:

  1. Roberts, G. O., & Rosenthal, J. S. (2006). Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains.
  2. Norris, J. R. (1998). Markov chains (No. 2). Cambridge University Press.
  3. Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). Markov chain Monte Carlo in practice. CRC Press.
  4. Pishro-Nik, H. (2014). Introduction to probability, statistics, and random processes (p. 732). Blue Bell, PA, USA: Kappa Research, LLC.
  5. Taboga, M. (2021). Markov chain. Fundamentals of Statistics.
  6. Hoel, P. G., Port, S. C., & Stone, C. J. (1986). Introduction to stochastic processes. Waveland Press.

Suggested Citation: Meitei, W. B. (2025). Markov Chain. WBM STATS.
