Markov Chains: A Gentle Introduction

4 minute read

Added: February 25, 2023

Markov Chains

Goal. By the end, you should be able to define a (discrete-time) Markov chain, read and build a transition matrix, forecast with powers of that matrix, and explain when/why long-run behavior stabilizes.

Prereqs. Basic probability (events, conditional probability), matrices at the level of multiplying compatible dimensions.

What is a Markov Process?

Many systems evolve step-by-step: today → tomorrow, minute $t$ → minute $t+1$. A Markov chain models such a system when the distribution of the next step depends only on the present state, not on the full history. This is the Markov property.

Formal setup: Discrete Markov Chains

Time: discrete steps $t=0,1,2,\dots$
States: a set $S$ (finite or countable), e.g. ${1,2,\dots,n}$ or ${\text{Sunny},\text{Rainy}}$
Random variables: $X_t\in S$ is the state at time $t$
Markov property: for all states $i,j$ and all histories, $\Pr(X_{t+1}=j\mid X_t=i, X_{t-1}=x_{t-1},\dots,X_0=x_0)=\Pr(X_{t+1}=j\mid X_t=i).$
Time-homogeneous (assumed here): the one-step probabilities don’t depend on $t$.

Transition matrix

Collect the one-step probabilities in a matrix $P$ with entries $P_{ij}=\Pr(X_{t+1}=j\mid X_t=i).$ We’ll use row-stochastic convention: rows sum to 1. If $s^{(t)}$ is the row vector of state probabilities at time $t$ (so $s^{(t)}_i=\Pr(X_t=i)$), then $\boxed{s^{(t+1)} = s^{(t)}P}\qquad\text{and}\qquad s^{(t)}=s^{(0)}P^t.$

Pedantics: If you prefer column vectors, transpose everything consistently. Mixing conventions causes errors.

Chapman–Kolmogorov: $P^{t+s}=P^tP^s$ and $(P^t)_{ij}=\Pr(X_t=j\mid X_0=i)$.

A first example (two-state weather)

Let $S={\text{Sunny}=1,\;\text{Rainy}=2}$ and $P=\begin{bmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{bmatrix}.$

If today is Sunny with certainty, $s^{(0)}=[1\;0]$. Then $s^{(1)}=s^{(0)}P=[0.7\;0.3]$: tomorrow is Sunny with prob. $0.7$, Rainy with prob. $0.3$.
Two days out: $s^{(2)}=s^{(0)}P^2$. You can compute $P^2$ or multiply step-by-step.

Stationary distribution. A distribution $\pi$ with $\pi=\pi P$ (and entries nonnegative summing to 1) is called stationary. For the $P$ above, $\pi_1=0.7\pi_1+0.4\pi_2 \;\Rightarrow\; 0.3\pi_1=0.4\pi_2 \;\Rightarrow\; 3\pi_1=4\pi_2.$ With $\pi_1+\pi_2=1$, solve to get $\pi=(4/7,\;3/7)\approx(0.5714,\;0.4286)$.

Interpretation: in the long run ~57% of days are Sunny and ~43% Rainy, regardless of the starting day.

Long-run behavior: Do we converge?

For finite chains, three properties are central:

Irreducible: from every state you can eventually reach every other state with positive probability.
Aperiodic: the chain doesn’t get trapped in a deterministic cycle (formally, each state has period 1).
Ergodic theorem (informal): if the chain is finite, irreducible, and aperiodic, then for any initial distribution $s^{(0)}$, $s^{(t)} \to \pi\quad\text{as }t\to\infty,$ where $\pi$ is the unique stationary distribution.

If these conditions are satisfied, it means that the system “forgets” its starting point and settles into a stable mix of time spent in each state.

Classifying states

Absorbing: $i$ is absorbing if $P_{ii}=1$. Once you enter, you never leave.
Transient vs. recurrent: Starting from $i$, if the probability of ever returning to $i$ is $<1$, it’s transient; if it’s $1$, it’s recurrent. In any finite irreducible chain, all states are recurrent.
Communicating classes: states partition into classes that can reach each other. A chain is irreducible iff there is a single communicating class.

Building a Markov model from data (recipe)

Choose states so that the next step plausibly depends only on the current state. If the Markov property looks false, your state is probably too small; expand it.
Estimate $P$: for each state $i$, set $\widehat{P}_{ij}=\frac{\#\,\text{of observed transitions }i\to j}{\#\,\text{of times you were in }i},\quad \text{then ensure each row sums to 1.}$
Validate: hold out some sequences and compare forecasts $s^{(t)}=s^{(0)}\widehat{P}^t$ to observed frequencies.
Analyze and simulate: compute $\pi$ (solve $\pi=\pi\widehat{P}$), check irreducibility/aperiodicity, and simulate sample paths to build intuition.

Examples from the real world

Queues (coffee shop): State = number of people in line. Arrivals increment, services decrement. With “memoryless” arrival/service assumptions, this becomes a birth–death chain.
Board games: State = square plus any special flags (e.g., “in jail”). Dice make transitions depend only on the current state once you model the rules properly.
Random surfer (PageRank): State = webpage. Transition probabilities follow links plus occasional random jumps; $\pi$ ranks pages by long-run visit rate.
Credit ratings: State = ${\mathrm{AAA},\mathrm{AA},\mathrm{A},\dots,\mathrm{D}}$. $P$ captures annual upgrade/downgrade/default probabilities.

Share on

Twitter Facebook LinkedIn

Shreeyesh Menon

Markov Chains: A Gentle Introduction

Markov Chains

What is a Markov Process?

Formal setup: Discrete Markov Chains

Transition matrix

A first example (two-state weather)

Long-run behavior: Do we converge?

Classifying states

Building a Markov model from data (recipe)

Examples from the real world

Share on

You May Also Enjoy

The Geometry of Information

Dynamical Systems and Linear Algebra

Markov Chain Monte Carlo

Linear Algebra: Four perspectives