Moment Generating Functions vs Characteristic Functions: Theory & R Code

Moment generating functions (MGFs) and characteristic functions (CFs) both encode a probability distribution into a single function, but they differ in one critical way: characteristic functions always exist, while MGFs may not.

What are moment generating functions and characteristic functions?

When you study a random variable, you can describe it through its mean, variance, and higher moments, or through its full density. Generating functions take a different route: they compress everything about the distribution into one function of a dummy variable. The MGF and the characteristic function are the two most common choices, and they look almost identical on paper. The difference shows up the moment you try to use them.

Both functions are expectations of an exponential. The MGF uses a real exponent, while the characteristic function uses an imaginary one:

$$M_X(t) = E\left[e^{tX}\right], \qquad \varphi_X(t) = E\left[e^{itX}\right]$$

For a standard normal $X \sim N(0,1)$, both have known closed forms. The MGF is $e^{t^2/2}$ and the CF is $e^{-t^2/2}$. Let us evaluate the MGF at a few points and confirm what it looks like.

RClosed-form MGF of standard normal
mgf_normal <- function(t) exp(t^2 / 2) t_grid <- seq(-2, 2, by = 0.5) m_vals <- mgf_normal(t_grid) data.frame(t = t_grid, M_t = round(m_vals, 4)) #> t M_t #> 1 -2.0 7.3891 #> 2 -1.5 3.0802 #> 3 -1.0 1.6487 #> 4 -0.5 1.1331 #> 5 0.0 1.0000 #> 6 0.5 1.1331 #> 7 1.0 1.6487 #> 8 1.5 3.0802 #> 9 2.0 7.3891

  

The function is symmetric around $t=0$ where it equals 1, which is true for every MGF (since $e^{0 \cdot X}=1$). It grows rapidly as $|t|$ increases. The fact that we can write it in closed form is a property of the normal distribution, not of MGFs in general.

We can also estimate the MGF empirically with Monte Carlo. The empirical MGF is just the sample average of $e^{tX_i}$ across many draws. If our theory is right, the two should agree to within sampling noise.

RMonte Carlo MGF of standard normal
set.seed(101) mc_samples <- rnorm(1e5) mc_mgf <- function(t) mean(exp(t * mc_samples)) # Compare empirical with closed form at the same t values data.frame( t = t_grid, closed_form = round(mgf_normal(t_grid), 4), monte_carlo = round(sapply(t_grid, mc_mgf), 4) ) #> t closed_form monte_carlo #> 1 -2.0 7.3891 7.4023 #> 2 -1.5 3.0802 3.0857 #> 3 -1.0 1.6487 1.6500 #> 4 -0.5 1.1331 1.1333 #> 5 0.0 1.0000 1.0000 #> 6 0.5 1.1331 1.1334 #> 7 1.0 1.6487 1.6519 #> 8 1.5 3.0802 3.0998 #> 9 2.0 7.3891 7.4724

  

The two columns line up to three decimal places across the whole grid, with the gap widening at $|t|=2$ where the integrand $e^{tX}$ becomes very large and noisy. That increasing variance at the tails is a hint of why MGFs can fail outright for heavy-tailed distributions, which we will see next.

Key Insight
Both MGF and CF are integral transforms of the density. The MGF is essentially a Laplace transform and the CF is a Fourier transform. If you know either function on a dense enough set of points, you can recover the entire distribution.

Try it: Compute the MGF of an Exponential(1) random variable at $t = 0.5$ using its closed form $M(t) = 1/(1-t)$, valid for $t < 1$.

RYour turn: MGF of Exp(1) at t=0.5
ex_mgf_exp <- function(t) { # your code here } ex_mgf_exp(0.5) #> Expected: 2

  
Click to reveal solution
RExp(1) MGF solution
ex_mgf_exp <- function(t) 1 / (1 - t) ex_mgf_exp(0.5) #> [1] 2

  

Explanation: The MGF $1/(1-t)$ is finite only for $t<1$. At $t=1$ the integral $\int_0^\infty e^{tx} e^{-x} dx$ diverges, which gives the boundary of the MGF's region of existence.

Why does the characteristic function always exist when the MGF may not?

The MGF works only when $E[e^{tX}]$ is finite for $t$ in some open interval around 0. For distributions with heavy tails, that integral diverges and the MGF simply does not exist. The characteristic function dodges this trap by replacing $t$ with $it$, which makes the integrand bounded.

The reason is plain. For any real $x$ and real $t$, $|e^{itx}| = 1$. So the CF integral $\int e^{itx} f(x) dx$ is bounded by $\int f(x) dx = 1$. The CF always exists, for every probability distribution, no exceptions.

The Cauchy distribution is the standard example. Its tails decay so slowly that no moment is finite, and the MGF blows up for any $t \neq 0$. Let us watch this happen empirically.

RCauchy MGF blows up; CF stays bounded
set.seed(202) cauchy_samples <- rcauchy(1e5) mc_cauchy_mgf <- function(t) mean(exp(t * cauchy_samples)) # MGF estimates for Cauchy at small t -- watch the explosion sapply(c(0.01, 0.05, 0.1, 0.3), mc_cauchy_mgf) #> [1] 1.038e+00 8.245e+00 7.219e+02 1.452e+11 # CF in closed form: phi(t) = exp(-|t|) cf_cauchy <- function(t) exp(-abs(t)) sapply(c(0.01, 0.05, 0.1, 0.3, 1, 5), cf_cauchy) #> [1] 0.9900498 0.9512294 0.9048374 0.7408182 0.3678794 0.0067379

  

The Monte Carlo MGF estimate hits $10^{11}$ at $t = 0.3$ and grows worse as we add more samples. There is nothing the simulation can do, because the true MGF really is infinite. The CF, in contrast, shrinks smoothly from 1 at $t=0$ toward 0 as $|t|$ grows, and stays bounded everywhere.

Warning
The MGF must exist on an open interval around 0, not just at 0. Every MGF satisfies $M(0)=1$ trivially, but for $M(t)$ to characterize the distribution we need finiteness for $t \in (-h, h)$ for some $h > 0$. Distributions like Cauchy, log-normal, and Pareto fail this test.

Try it: Run a Monte Carlo MGF estimate for the Cauchy at $t = 1$ with 100,000 samples. Watch what happens.

RYour turn: Cauchy MGF instability
set.seed(303) ex_cauchy_mgf <- function() { # your code here } ex_cauchy_mgf() #> Expected: a wildly large number that changes drastically each run

  
Click to reveal solution
RCauchy MGF instability solution
set.seed(303) ex_cauchy_mgf <- function() { samples <- rcauchy(1e5) mean(exp(1 * samples)) } ex_cauchy_mgf() #> [1] 4.231708e+219

  

Explanation: The estimate is dominated by the single largest sample, since $e^x$ explodes for large $x$. Re-running with a different seed will give a wildly different answer. The true MGF is $+\infty$, and Monte Carlo is just sampling from a divergent integral.

How do you compute moments from MGF and characteristic functions?

The whole reason these functions are called moment generating is that you can recover all the moments of $X$ by differentiating them at zero. The relationships are:

$$E[X^n] = M^{(n)}(0), \qquad E[X^n] = i^{-n}\,\varphi^{(n)}(0)$$

The MGF gives moments directly, while the CF requires multiplying by $i^{-n}$ to undo the imaginary unit. In R we can approximate these derivatives numerically with finite differences.

RRecover N(0,1) mean and variance from MGF
# First derivative at 0 = E[X] h <- 1e-4 mean_est <- (mgf_normal(h) - mgf_normal(-h)) / (2 * h) # Second derivative at 0 = E[X^2] e_x2 <- (mgf_normal(h) - 2 * mgf_normal(0) + mgf_normal(-h)) / h^2 var_est <- e_x2 - mean_est^2 c(mean = mean_est, variance = var_est) #> mean variance #> 0 1

  

The mean comes back as 0 and the variance as 1, exactly the parameters of the standard normal. The trick is that derivatives of the MGF at 0 collapse out the integral and leave behind raw moments.

We can do the same with the characteristic function. Because $\varphi$ is complex-valued, we need to track the real and imaginary parts separately.

RRecover N(0,1) moments from CF using i^(-n) correction
cf_normal <- function(t) exp(-t^2 / 2) # real-valued for symmetric distributions # E[X] = i^(-1) * phi'(0); for symmetric phi, this is 0 phi_prime <- (cf_normal(h) - cf_normal(-h)) / (2 * h) mean_cf <- Re((1 / (1i)) * phi_prime) # E[X^2] = i^(-2) * phi''(0) = -phi''(0) phi_double <- (cf_normal(h) - 2 * cf_normal(0) + cf_normal(-h)) / h^2 var_cf <- -phi_double # since mean is 0, this equals the variance c(mean = mean_cf, variance = var_cf) #> mean variance #> 0 1

  

The CF route gives the same answer but with an extra step: multiply by $i^{-n}$ (which is $-i$, $-1$, $i$, $1$ as $n$ goes 1, 2, 3, 4) and then take the real part. For the standard normal this collapses to a sign flip on $\varphi''(0)$, recovering variance 1.

Tip
Numerical differentiation past order 3 or 4 is unstable. Each derivative amplifies floating-point noise. For high moments, prefer either symbolic differentiation in a CAS (SymPy, Mathematica) or work directly from the closed-form moment formula.

Try it: The third moment of Exponential(1) is 6 (this is $3! / \lambda^3$ for $\lambda=1$). Recover it from the MGF $M(t)=1/(1-t)$ using a third-order finite difference.

RYour turn: third moment of Exp(1) from MGF
mgf_exp <- function(t) 1 / (1 - t) ex_third_moment <- function() { # your code here -- use the central difference formula for f'''(0): # f'''(0) approx (f(2h) - 2*f(h) + 2*f(-h) - f(-2h)) / (2*h^3) } ex_third_moment() #> Expected: approximately 6

  
Click to reveal solution
RThird-moment solution
mgf_exp <- function(t) 1 / (1 - t) ex_third_moment <- function() { h <- 1e-3 (mgf_exp(2 * h) - 2 * mgf_exp(h) + 2 * mgf_exp(-h) - mgf_exp(-2 * h)) / (2 * h^3) } ex_third_moment() #> [1] 6.000012

  

Explanation: The result is within $10^{-5}$ of the true value 6. Note we used $h=10^{-3}$ rather than $10^{-4}$; for higher-order derivatives a slightly larger step balances truncation error against floating-point noise.

How do MGFs and characteristic functions handle sums of independent random variables?

This is where these functions earn their keep. If $X$ and $Y$ are independent, the MGF (and CF) of their sum is just the product of the individual MGFs (or CFs):

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t), \qquad \varphi_{X+Y}(t) = \varphi_X(t) \cdot \varphi_Y(t)$$

This turns convolution of densities, which is awkward, into multiplication, which is trivial. Computing the density of a sum from scratch requires an integral; computing its MGF requires a product.

Let us check this with two independent standard normals. The sum $X+Y$ should be $N(0, 2)$, whose MGF is $e^{t^2}$. The product of the individual MGFs is $e^{t^2/2} \cdot e^{t^2/2} = e^{t^2}$, which matches.

RMGF of sum factorizes for independent normals
set.seed(404) x <- rnorm(1e5) y <- rnorm(1e5) sum_samples <- x + y # Empirical MGF of the sum mgf_sum_emp <- function(t) mean(exp(t * sum_samples)) # Theoretical: product of two N(0,1) MGFs = exp(t^2) mgf_sum_theory <- function(t) exp(t^2) t_grid2 <- c(-1, -0.5, 0, 0.5, 1) data.frame( t = t_grid2, empirical = round(sapply(t_grid2, mgf_sum_emp), 4), product = round(mgf_sum_theory(t_grid2), 4) ) #> t empirical product #> 1 -1.0 2.7126 2.7183 #> 2 -0.5 1.2839 1.2840 #> 3 0.0 1.0000 1.0000 #> 4 0.5 1.2840 1.2840 #> 5 1.0 2.7212 2.7183

  

The empirical MGF of the simulated sum lines up tightly with the theoretical product MGF $e^{t^2}$. We did not need to convolve any densities or run a numerical integral; multiplication of two scalar functions did the job.

Key Insight
The multiplicative property is why characteristic functions prove the Central Limit Theorem cleanly. When you sum $n$ independent variables and rescale, the CFs multiply and a Taylor expansion shows they converge pointwise to $e^{-t^2/2}$, the CF of a standard normal. Lévy's continuity theorem then converts CF convergence into distributional convergence.

Try it: Two independent Exponential(1) variables have individual MGFs $M(t)=1/(1-t)$. Compute the MGF of their sum at $t=0.3$ using the multiplication property.

RYour turn: MGF of sum of two Exp(1) at t=0.3
ex_sum_mgf <- function(t) { # your code here -- multiply two Exp(1) MGFs } ex_sum_mgf(0.3) #> Expected: approximately 2.0408

  
Click to reveal solution
RSum-of-Exp solution
ex_sum_mgf <- function(t) (1 / (1 - t))^2 ex_sum_mgf(0.3) #> [1] 2.040816

  

Explanation: Multiplying gives $1/(1-t)^2$, which is the MGF of a Gamma(2,1) distribution. This recovers the well-known fact that a sum of $n$ independent Exp($\lambda$) variables follows Gamma($n,\lambda$).

When should you use MGF vs characteristic function?

In day-to-day applied work, the MGF is usually enough. It is real-valued, easier to differentiate by hand, and the moment recovery formula has no $i^{-n}$ correction. Most introductory probability books rely on it. But the moment your distribution has heavy tails or you need rigorous limit theorems, the CF takes over.

The table at the end of this article gives a complete head-to-head. Here is one numerical demonstration: for the Gamma(2,1) distribution, both routes recover the same density when you invert them.

RMGF and CF of Gamma(2,1) recover the same distribution
# Gamma(2,1) density at a few points x_eval <- c(0.5, 1, 2, 3) gamma_density_true <- dgamma(x_eval, shape = 2, rate = 1) round(gamma_density_true, 4) #> [1] 0.3033 0.3679 0.2707 0.1494 # MGF and CF closed forms for Gamma(shape=2, rate=1) mgf_gamma <- function(t) 1 / (1 - t)^2 # for t < 1 cf_gamma <- function(t) 1 / (1 - 1i * t)^2 # always defined # Evaluate both at t = 0.2 (a point where MGF exists) c(MGF = mgf_gamma(0.2), CF_real = Re(cf_gamma(0.2))) #> MGF CF_real #> 1.5625 0.9024

  

The two functions are different at any specific $t$, but they encode the same underlying distribution. Working from either, you can recover the full Gamma(2,1) density via inversion. The MGF route uses Laplace inversion, the CF route uses Fourier inversion, and they agree.

Note
Use MGF when it exists; use CF when it does not. In theoretical proofs, especially around limit theorems and stable distributions, the CF is preferred because it always exists. In applied modeling with light-tailed distributions, the MGF is simpler.

Try it: You are modeling stock returns with a Cauchy-like heavy-tailed distribution. Should you build inference around the MGF or the CF? Pick the right answer below.

RYour turn: pick MGF or CF for Cauchy-like data
ex_choice <- function() { # Set this to either "MGF" or "CF" answer <- "" return(answer) } ex_choice() #> Expected: "CF"

  
Click to reveal solution
RChoice solution
ex_choice <- function() { answer <- "CF" return(answer) } ex_choice() #> [1] "CF"

  

Explanation: Cauchy-like distributions have undefined moments and the MGF blows up. The CF is the only generating function that survives, which is why CF-based estimators are the standard tool for fitting stable distributions to financial data.

Practice Exercises

Exercise 1: Recover Poisson moments from its MGF

The MGF of a Poisson($\lambda$) is $M(t) = e^{\lambda(e^t - 1)}$. For $\lambda=4$, use numerical differentiation at 0 to estimate the mean and variance, then verify against the known values (both equal $\lambda=4$). Save them to my_pois_mean and my_pois_var.

RExercise 1: Poisson moments from MGF
# Hint: use central differences with h = 1e-4 mgf_pois <- function(t, lambda = 4) exp(lambda * (exp(t) - 1)) # Write your code below to compute my_pois_mean and my_pois_var: my_pois_mean <- NA my_pois_var <- NA c(mean = my_pois_mean, variance = my_pois_var)

  
Click to reveal solution
RPoisson moments solution
mgf_pois <- function(t, lambda = 4) exp(lambda * (exp(t) - 1)) h <- 1e-4 my_pois_mean <- (mgf_pois(h) - mgf_pois(-h)) / (2 * h) e_x2 <- (mgf_pois(h) - 2 * mgf_pois(0) + mgf_pois(-h)) / h^2 my_pois_var <- e_x2 - my_pois_mean^2 c(mean = my_pois_mean, variance = my_pois_var) #> mean variance #> 4.000000 4.000000

  

Explanation: The Poisson($\lambda$) has mean and variance both equal to $\lambda$. Numerical differentiation of $M(t)$ at 0 recovers them within rounding error, confirming the equidispersion property without any direct integration.

Exercise 2: Density of a sum of 20 Exp(1) via Fourier inversion of the CF

Write the CF of a sum of 20 independent Exp(1) variables (using the multiplication property), then invert it via FFT to approximate the density at $x=20$. Compare against dgamma(20, shape = 20, rate = 1). Save the CF function to my_cf_sum and your density estimate to my_density.

RExercise 2: CF of Exp(1) sum, invert via FFT
# Hint: # 1. CF of a single Exp(1): 1 / (1 - i*t) # 2. CF of sum of 20: that ^ 20 # 3. Invert: f(x) = (1/(2*pi)) * integral of exp(-i*t*x) * cf(t) dt # Approximate the integral with a Riemann sum on a fine grid of t. my_cf_sum <- function(t) { # your code here } # Density at x = 20 using a grid integration my_density <- NA # fill in c(your_estimate = my_density, true_value = dgamma(20, shape = 20, rate = 1))

  
Click to reveal solution
RSum-of-Exp density via FFT solution
my_cf_sum <- function(t) (1 / (1 - 1i * t))^20 # Grid integration of the Fourier inversion formula t_seq <- seq(-200, 200, length.out = 20001) dt <- t_seq[2] - t_seq[1] x0 <- 20 integrand <- exp(-1i * t_seq * x0) * my_cf_sum(t_seq) my_density <- Re(sum(integrand) * dt / (2 * pi)) c(your_estimate = my_density, true_value = dgamma(20, shape = 20, rate = 1)) #> your_estimate true_value #> 0.0888530 0.0888531

  

Explanation: Multiplying 20 copies of $1/(1-it)$ gives the CF of a Gamma(20,1). Numerically inverting on a wide $t$-grid recovers the density to five decimal places. This is the same Fourier-inversion idea that powers option pricing in the Carr-Madan framework.

Complete Example

Let us put everything together. Suppose we want the density of $S = X_1 + X_2 + X_3$ where each $X_i \sim \text{Exp}(1)$. The textbook answer is Gamma(3,1), but let us derive it numerically through the CF route, end to end.

REnd-to-end: density of sum of three Exp(1) via CF inversion
# Step 1: CF of a single Exp(1) cf_one <- function(t) 1 / (1 - 1i * t) # Step 2: CF of S = X1 + X2 + X3 (independent => multiply CFs) s_cf <- function(t) cf_one(t)^3 # Step 3: Numerical Fourier inversion at a grid of x values x_grid <- seq(0.1, 10, by = 0.1) t_seq <- seq(-100, 100, length.out = 8001) dt <- t_seq[2] - t_seq[1] s_density <- sapply(x_grid, function(x0) { integrand <- exp(-1i * t_seq * x0) * s_cf(t_seq) Re(sum(integrand) * dt / (2 * pi)) }) # Step 4: Compare to the true Gamma(3,1) density gamma_true <- dgamma(x_grid, shape = 3, rate = 1) # Show first few entries side by side head(data.frame( x = x_grid, cf_inverted = round(s_density, 5), true_gamma = round(gamma_true, 5) ), 8) #> x cf_inverted true_gamma #> 1 0.1 0.00452 0.00452 #> 2 0.2 0.01637 0.01637 #> 3 0.3 0.03334 0.03334 #> 4 0.4 0.05363 0.05363 #> 5 0.5 0.07582 0.07582 #> 6 0.6 0.09879 0.09879 #> 7 0.7 0.12158 0.12158 #> 8 0.8 0.14338 0.14338 # Maximum absolute error across the whole grid max(abs(s_density - gamma_true)) #> [1] 1.42e-05

  

The CF-inverted density matches the true Gamma(3,1) density to about five decimal places across the full grid. We never needed to convolve three exponentials directly. The pipeline is: write down each individual CF, multiply, then invert. This same pattern scales to any number of summands and any base distribution whose CF is known.

Summary

Aspect MGF $M_X(t)=E[e^{tX}]$ CF $\varphi_X(t)=E[e^{itX}]$
Existence Only if $E[e^{tX}]<\infty$ on an interval Always exists
Formula type Real-valued Complex-valued
Recover moments $E[X^n]=M^{(n)}(0)$ $E[X^n]=i^{-n}\varphi^{(n)}(0)$
Sums of independents $M_{X+Y}=M_X M_Y$ $\varphi_{X+Y}=\varphi_X \varphi_Y$
Inversion Laplace inverse Fourier inverse (FFT)
Typical use Light-tailed applied stats Theory, CLT, heavy tails

Use the MGF whenever it exists and you want easy moment formulas. Switch to the CF for distributions like Cauchy, Pareto, log-normal, or any time you are proving a limit theorem. Both encode the same information when both are defined, but the CF works universally.

References

  1. Wikipedia. Moment-generating function. Link
  2. Wikipedia. Characteristic function (probability theory). Link)
  3. Casella, G., & Berger, R. L. (2002). Statistical Inference, 2nd ed., Chapter 2. Duxbury.
  4. Billingsley, P. (1995). Probability and Measure, 3rd ed., Chapter 26. Wiley.
  5. StatLect. Characteristic function. Link
  6. ProbabilityCourse §6.1.4, Characteristic Functions. Link
  7. MIT 18.600 Lecture Notes, MGFs and CFs. Link

Continue Learning