Moment Generating Functions vs Characteristic Functions: Theory & R Code
Moment generating functions (MGFs) and characteristic functions (CFs) both encode a probability distribution into a single function, but they differ in one critical way: characteristic functions always exist, while MGFs may not.
What are moment generating functions and characteristic functions?
When you study a random variable, you can describe it through its mean, variance, and higher moments, or through its full density. Generating functions take a different route: they compress everything about the distribution into one function of a dummy variable. The MGF and the characteristic function are the two most common choices, and they look almost identical on paper. The difference shows up the moment you try to use them.
Both functions are expectations of an exponential. The MGF uses a real exponent, while the characteristic function uses an imaginary one:
$$M_X(t) = E\left[e^{tX}\right], \qquad \varphi_X(t) = E\left[e^{itX}\right]$$
For a standard normal $X \sim N(0,1)$, both have known closed forms. The MGF is $e^{t^2/2}$ and the CF is $e^{-t^2/2}$. Let us evaluate the MGF at a few points to see how it behaves.
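Here is a quick check in R (the grid of $t$ values is an arbitrary choice):

```r
# Closed-form MGF of the standard normal: M(t) = exp(t^2 / 2)
mgf_norm <- function(t) exp(t^2 / 2)

t_grid <- c(-2, -1, -0.5, 0, 0.5, 1, 2)
round(mgf_norm(t_grid), 4)
# [1] 7.3891 1.6487 1.1331 1.0000 1.1331 1.6487 7.3891
```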
The function is symmetric around $t=0$ where it equals 1, which is true for every MGF (since $e^{0 \cdot X}=1$). It grows rapidly as $|t|$ increases. The fact that we can write it in closed form is a property of the normal distribution, not of MGFs in general.
We can also estimate the MGF empirically with Monte Carlo. The empirical MGF is just the sample average of $e^{tX_i}$ across many draws. If our theory is right, the two should agree to within sampling noise.
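A minimal sketch of that comparison (the sample size, seed, and grid are arbitrary choices):

```r
set.seed(42)
x <- rnorm(1e6)                       # draws from N(0, 1)

t_grid      <- seq(-2, 2, by = 0.5)
empirical   <- sapply(t_grid, function(t) mean(exp(t * x)))  # sample average of e^{tX}
theoretical <- exp(t_grid^2 / 2)                             # closed form

data.frame(t = t_grid,
           empirical   = round(empirical, 4),
           theoretical = round(theoretical, 4))
```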
The two columns agree to about three decimal places over most of the grid, with the gap widening near $|t|=2$, where the integrand $e^{tX}$ becomes very large and noisy. That increasing variance in the tails is a hint of why MGFs can fail outright for heavy-tailed distributions, which we will see next.
Try it: Compute the MGF of an Exponential(1) random variable at $t = 0.5$ using its closed form $M(t) = 1/(1-t)$, valid for $t < 1$.
Solution:
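A one-line check in R:

```r
# Closed-form MGF of Exponential(1), valid only for t < 1
mgf_exp <- function(t) 1 / (1 - t)
mgf_exp(0.5)
# [1] 2
```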
Explanation: The MGF $1/(1-t)$ is finite only for $t<1$. At $t=1$ the integral $\int_0^\infty e^{tx} e^{-x} dx$ diverges, which gives the boundary of the MGF's region of existence.
Why does the characteristic function always exist when the MGF may not?
The MGF works only when $E[e^{tX}]$ is finite for $t$ in some open interval around 0. For distributions with heavy tails, that integral diverges and the MGF simply does not exist. The characteristic function dodges this trap by replacing $t$ with $it$, which makes the integrand bounded.
The reason is plain. For any real $x$ and real $t$, $|e^{itx}| = 1$. So the CF integral $\int e^{itx} f(x) dx$ is bounded in modulus by $\int f(x) dx = 1$. The CF always exists, for every probability distribution, no exceptions.
The Cauchy distribution is the standard example. Its tails decay so slowly that no moment is finite, and the MGF blows up for any $t \neq 0$. Let us watch this happen empirically.
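A minimal sketch (the seed and sample size are arbitrary, and the exploding MGF estimates will differ wildly from run to run):

```r
set.seed(1)
x <- rcauchy(1e3)   # modest sample: larger samples overflow exp() to Inf

# Empirical MGF at a few t values -- the true MGF is infinite for any t != 0
sapply(c(0.1, 0.2, 0.3), function(t) mean(exp(t * x)))

# The CF of the standard Cauchy has the closed form exp(-|t|)
cf_cauchy <- function(t) exp(-abs(t))
cf_cauchy(c(0, 0.5, 1, 2))
# [1] 1.0000000 0.6065307 0.3678794 0.1353353
```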
In one run, the Monte Carlo MGF estimate reaches the order of $10^{11}$ at $t = 0.3$, and it only grows worse as we add more samples, eventually overflowing to Inf outright. There is nothing the simulation can do, because the true MGF really is infinite. The CF, in contrast, shrinks smoothly from 1 at $t=0$ toward 0 as $|t|$ grows, and stays bounded everywhere.
Try it: Run a Monte Carlo MGF estimate for the Cauchy at $t = 1$ with 100,000 samples. Watch what happens.
Solution:
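One way to run the experiment (the seed is arbitrary, which is exactly the point):

```r
set.seed(2024)
x <- rcauchy(1e5)   # 100,000 standard Cauchy draws
mean(exp(x))        # MGF estimate at t = 1: typically Inf, since some draws
                    # exceed ~709, the largest x with a finite exp(x) in doubles
max(x)              # the single largest draw is enormous
```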
Explanation: The estimate is dominated by the single largest sample, since $e^x$ explodes for large $x$; with 100,000 draws, some samples exceed the double-precision overflow threshold ($x \approx 709$), so the estimate typically comes back as Inf outright. The true MGF is $+\infty$, and Monte Carlo is just sampling from a divergent integral.
How do you compute moments from MGF and characteristic functions?
The whole reason these functions are called moment generating is that you can recover all the moments of $X$ by differentiating them at zero. The relationships are:
$$E[X^n] = M^{(n)}(0), \qquad E[X^n] = i^{-n}\,\varphi^{(n)}(0)$$
The MGF gives moments directly, while the CF requires multiplying by $i^{-n}$ to undo the imaginary unit. In R we can approximate these derivatives numerically with finite differences.
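A minimal sketch using central differences (the step size $h$ is an arbitrary choice):

```r
mgf_norm <- function(t) exp(t^2 / 2)   # standard normal MGF

h  <- 1e-4
m1 <- (mgf_norm(h) - mgf_norm(-h)) / (2 * h)                 # M'(0)  = E[X]
m2 <- (mgf_norm(h) - 2 * mgf_norm(0) + mgf_norm(-h)) / h^2   # M''(0) = E[X^2]

m1           # mean: ~0
m2 - m1^2    # variance: ~1
```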
The mean comes back as 0 and the variance as 1, exactly the parameters of the standard normal. The trick is that differentiating under the expectation gives $M^{(n)}(t) = E[X^n e^{tX}]$, which at $t=0$ collapses to the raw moment $E[X^n]$.
We can do the same with the characteristic function. Because $\varphi$ is complex-valued, we need to track the real and imaginary parts separately.
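The same finite-difference sketch, now in complex arithmetic. For $N(0,1)$ the CF happens to be real-valued, but adding `0i` keeps the code valid for general complex CFs:

```r
cf_norm <- function(t) exp(-t^2 / 2) + 0i   # standard normal CF, kept complex

h  <- 1e-4
d1 <- (cf_norm(h) - cf_norm(-h)) / (2 * h)               # phi'(0)
d2 <- (cf_norm(h) - 2 * cf_norm(0) + cf_norm(-h)) / h^2  # phi''(0)

Re(d1 / 1i)      # E[X]   = i^{-1} * phi'(0)  -> ~0
Re(d2 / 1i^2)    # E[X^2] = i^{-2} * phi''(0) -> ~1
```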
The CF route gives the same answer but with an extra step: multiply by $i^{-n}$ (which is $-i$, $-1$, $i$, $1$ as $n$ goes 1, 2, 3, 4) and then take the real part. For the standard normal this collapses to a sign flip on $\varphi''(0)$, recovering variance 1.
Try it: The third moment of Exponential(1) is 6 (this is $3! / \lambda^3$ for $\lambda=1$). Recover it from the MGF $M(t)=1/(1-t)$ using a third-order finite difference.
Solution:
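A sketch using the standard central-difference formula for a third derivative:

```r
mgf_exp <- function(t) 1 / (1 - t)   # Exponential(1) MGF, valid for t < 1

h  <- 1e-3
m3 <- (mgf_exp(2*h) - 2*mgf_exp(h) + 2*mgf_exp(-h) - mgf_exp(-2*h)) / (2 * h^3)
m3   # third raw moment: ~6
```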
Explanation: The result lands within a few times $10^{-5}$ of the true value 6. Note we used $h=10^{-3}$ rather than $10^{-4}$; for higher-order derivatives a slightly larger step balances truncation error against floating-point noise.
How do MGFs and characteristic functions handle sums of independent random variables?
This is where these functions earn their keep. If $X$ and $Y$ are independent, the MGF (and CF) of their sum is just the product of the individual MGFs (or CFs):
$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t), \qquad \varphi_{X+Y}(t) = \varphi_X(t) \cdot \varphi_Y(t)$$
This turns convolution of densities, which is awkward, into multiplication, which is trivial. Computing the density of a sum from scratch requires an integral; computing its MGF requires a product.
Let us check this with two independent standard normals. The sum $X+Y$ should be $N(0, 2)$, whose MGF is $e^{t^2}$. The product of the individual MGFs is $e^{t^2/2} \cdot e^{t^2/2} = e^{t^2}$, which matches.
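A minimal sketch of that check (seed, sample size, and grid are arbitrary choices):

```r
set.seed(7)
x <- rnorm(1e6)
y <- rnorm(1e6)                      # independent of x
s <- x + y                           # should be N(0, 2)

t_grid    <- seq(-1, 1, by = 0.25)
empirical <- sapply(t_grid, function(t) mean(exp(t * s)))
product   <- exp(t_grid^2 / 2)^2     # M_X(t) * M_Y(t) = exp(t^2)

round(cbind(t = t_grid, empirical, product), 4)
```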
The empirical MGF of the simulated sum lines up tightly with the theoretical product MGF $e^{t^2}$. We did not need to convolve any densities or run a numerical integral; multiplication of two scalar functions did the job.
Try it: Two independent Exponential(1) variables have individual MGFs $M(t)=1/(1-t)$. Compute the MGF of their sum at $t=0.3$ using the multiplication property.
Solution:
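A direct application of the multiplication property:

```r
mgf_exp <- function(t) 1 / (1 - t)
mgf_exp(0.3) * mgf_exp(0.3)   # MGF of the sum at t = 0.3
# [1] 2.040816
```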
Explanation: Multiplying gives $1/(1-t)^2$, which is the MGF of a Gamma(2,1) distribution. This recovers the well-known fact that a sum of $n$ independent Exp($\lambda$) variables follows Gamma($n,\lambda$).
When should you use MGF vs characteristic function?
In day-to-day applied work, the MGF is usually enough. It is real-valued, easier to differentiate by hand, and the moment recovery formula has no $i^{-n}$ correction. Most introductory probability books rely on it. But the moment your distribution has heavy tails or you need rigorous limit theorems, the CF takes over.
The table at the end of this article gives a complete head-to-head. Here is one numerical demonstration: for the Gamma(2,1) distribution, both routes recover the same density when you invert them.
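Here is a minimal sketch of the Fourier half of that demonstration. Numerical Laplace inversion of the MGF takes heavier machinery, so only the CF route is shown, and the truncation point and step size are arbitrary choices:

```r
# CF of Gamma(2, 1): (1 - it)^{-2}
cf_gamma <- function(t) (1 - 1i * t)^(-2)

# Fourier inversion: f(x) = (1/pi) * integral_0^Inf Re[exp(-itx) * cf(t)] dt,
# approximated by a midpoint Riemann sum on a truncated grid
invert_cf <- function(x, t_max = 200, dt = 0.01) {
  t <- seq(dt / 2, t_max, by = dt)
  sum(Re(exp(-1i * t * x) * cf_gamma(t))) * dt / pi
}

invert_cf(2)                     # numerical inversion at x = 2
dgamma(2, shape = 2, rate = 1)   # true density: 2 * exp(-2) = 0.2706706
```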
The two functions are different at any specific $t$, but they encode the same underlying distribution. Working from either, you can recover the full Gamma(2,1) density via inversion. The MGF route uses Laplace inversion, the CF route uses Fourier inversion, and they agree.
Try it: You are modeling stock returns with a Cauchy-like heavy-tailed distribution. Should you build inference around the MGF or the CF?
Solution:
Explanation: Cauchy-like distributions have undefined moments and the MGF blows up. The CF is the only generating function that survives, which is why CF-based estimators are the standard tool for fitting stable distributions to financial data.
Practice Exercises
Exercise 1: Recover Poisson moments from its MGF
The MGF of a Poisson($\lambda$) is $M(t) = e^{\lambda(e^t - 1)}$. For $\lambda=4$, use numerical differentiation at 0 to estimate the mean and variance, then verify against the known values (both equal $\lambda=4$). Save them to my_pois_mean and my_pois_var.
Solution:
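One possible solution (the step size $h$ is an arbitrary choice):

```r
mgf_pois <- function(t) exp(4 * (exp(t) - 1))   # Poisson(4) MGF

h  <- 1e-4
m1 <- (mgf_pois(h) - mgf_pois(-h)) / (2 * h)                 # E[X]
m2 <- (mgf_pois(h) - 2 * mgf_pois(0) + mgf_pois(-h)) / h^2   # E[X^2]

my_pois_mean <- m1
my_pois_var  <- m2 - m1^2

my_pois_mean   # ~4
my_pois_var    # ~4
```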
Explanation: The Poisson($\lambda$) has mean and variance both equal to $\lambda$. Numerical differentiation of $M(t)$ at 0 recovers them within rounding error, confirming the equidispersion property without any direct integration.
Exercise 2: Density of a sum of 20 Exp(1) via Fourier inversion of the CF
Write the CF of a sum of 20 independent Exp(1) variables (using the multiplication property), then invert it via FFT to approximate the density at $x=20$. Compare against dgamma(20, shape = 20, rate = 1). Save the CF function to my_cf_sum and your density estimate to my_density.
Solution:
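One possible solution. Since we only need the density at the single point $x = 20$, this sketch uses a direct Riemann-sum inversion of the CF rather than a full FFT pipeline (an FFT would recover the density on an entire grid at once; the truncation point and step size here are arbitrary):

```r
# CF of one Exp(1) is 1 / (1 - it); the 20-fold product follows by independence
my_cf_sum <- function(t) (1 - 1i * t)^(-20)

# Fourier inversion at x = 20:
# f(x) = (1/pi) * integral_0^Inf Re[exp(-itx) * cf(t)] dt
x  <- 20
dt <- 0.005
t  <- seq(dt / 2, 40, by = dt)   # the CF decays like |t|^{-20}, so 40 is ample
my_density <- sum(Re(exp(-1i * t * x) * my_cf_sum(t))) * dt / pi

my_density
dgamma(20, shape = 20, rate = 1)   # reference value
```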
Explanation: Multiplying 20 copies of $1/(1-it)$ gives the CF of a Gamma(20,1). Numerically inverting on a wide $t$-grid recovers the density to five decimal places. This is the same Fourier-inversion idea that powers option pricing in the Carr-Madan framework.
Complete Example
Let us put everything together. Suppose we want the density of $S = X_1 + X_2 + X_3$ where each $X_i \sim \text{Exp}(1)$. The textbook answer is Gamma(3,1), but let us derive it numerically through the CF route, end to end.
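Here is one way to carry out that pipeline, sketched end to end (the grid sizes are arbitrary choices):

```r
# Step 1: CF of a single Exp(1) variable
cf_exp <- function(t) 1 / (1 - 1i * t)

# Step 2: CF of S = X1 + X2 + X3 via the multiplication property
cf_sum <- function(t) cf_exp(t)^3

# Step 3: Fourier-invert the CF on a grid of x values
invert_at <- function(x, t_max = 100, dt = 0.01) {
  t <- seq(dt / 2, t_max, by = dt)
  sum(Re(exp(-1i * t * x) * cf_sum(t))) * dt / pi
}

x_grid    <- seq(0.5, 10, by = 0.5)
inverted  <- sapply(x_grid, invert_at)
true_dens <- dgamma(x_grid, shape = 3, rate = 1)   # textbook Gamma(3, 1) answer

max(abs(inverted - true_dens))   # worst-case gap across the grid
```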
The CF-inverted density matches the true Gamma(3,1) density to about five decimal places across the full grid. We never needed to convolve three exponentials directly. The pipeline is: write down each individual CF, multiply, then invert. This same pattern scales to any number of summands and any base distribution whose CF is known.
Summary
| Aspect | MGF $M_X(t)=E[e^{tX}]$ | CF $\varphi_X(t)=E[e^{itX}]$ |
|---|---|---|
| Existence | Only if $E[e^{tX}]<\infty$ on an interval | Always exists |
| Formula type | Real-valued | Complex-valued |
| Recover moments | $E[X^n]=M^{(n)}(0)$ | $E[X^n]=i^{-n}\varphi^{(n)}(0)$ |
| Sums of independents | $M_{X+Y}=M_X M_Y$ | $\varphi_{X+Y}=\varphi_X \varphi_Y$ |
| Inversion | Laplace inverse | Fourier inverse (FFT) |
| Typical use | Light-tailed applied stats | Theory, CLT, heavy tails |
Use the MGF whenever it exists and you want easy moment formulas. Switch to the CF for distributions like Cauchy, Pareto, log-normal, or any time you are proving a limit theorem. Both encode the same information when both are defined, but the CF works universally.
References
- Wikipedia. Moment-generating function. Link
- Wikipedia. Characteristic function (probability theory). Link
- Casella, G., & Berger, R. L. (2002). Statistical Inference, 2nd ed., Chapter 2. Duxbury.
- Billingsley, P. (1995). Probability and Measure, 3rd ed., Chapter 26. Wiley.
- StatLect. Characteristic function. Link
- ProbabilityCourse §6.1.4, Characteristic Functions. Link
- MIT 18.600 Lecture Notes, MGFs and CFs. Link
Continue Learning
- Moment Generating Functions in R: Theory, Computation & Applications. A deeper dive into MGF mechanics, including all the derivative tricks we used here.
- Method of Moments in R. Estimator that flows directly from the moment-recovery property of the MGF.
- The Exponential Distribution in R. Background on the Exp(1) variable used throughout this post.