broom fix_data_frame() in R: Rownames to Term Column
The broom::fix_data_frame() helper turns a matrix or data frame with informative rownames into a tidy tibble whose first column holds those names as values. It is the small but load-bearing utility that powers every tidy.lm(), tidy.glm(), and tidy.htest() method inside broom.
    fix_data_frame(coef(summary(fit)))                # default term column
    fix_data_frame(mat, newnames = c("est","se"))     # rename data columns
    fix_data_frame(mat, newcol = "predictor")         # rename the term column
    fix_data_frame(df_with_rownames)                  # data frame input works
    tibble::rownames_to_column(df, var = "term")      # modern public replacement
    as_tibble(mat, rownames = "term")                 # tibble-native one-liner
    broom:::fix_data_frame(x)                         # access from broom 1.0+
Need explanation? Read on for examples and pitfalls.
What fix_data_frame() does in one sentence
fix_data_frame() takes any rectangular object with meaningful rownames and returns a tibble in which those rownames are an explicit first column. R model objects expose their coefficients as a numeric matrix where each row name is a predictor ((Intercept), wt, cyl). That layout is hostile to dplyr, ggplot2, and report tooling because rownames are a side channel, not a column.
When a tidier supplies them through newnames, the helper renames the data columns to broom's canonical names (estimate, std.error, statistic, p.value). It then prepends a term column built from the rownames and returns the result as a tibble. Broom methods that wrap base-R models use it to standardize output. As of broom 1.0+ the function lives in the package's internal namespace, so user code reaches it via broom:::fix_data_frame() or the public tibble stand-ins: tibble::as_tibble(mat, rownames = "term") and tibble::rownames_to_column().
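To make the "side channel" concrete, here is a small base-R look at the kind of matrix the helper receives. It uses the built-in mtcars data and a model chosen only for illustration.

```r
# Fit a small model on a built-in dataset
fit <- lm(mpg ~ wt + cyl, data = mtcars)

# coef(summary(fit)) is a numeric matrix, not a data frame
mat <- coef(summary(fit))

is.matrix(mat)   # TRUE
rownames(mat)    # "(Intercept)" "wt" "cyl"
colnames(mat)    # "Estimate" "Std. Error" "t value" "Pr(>|t|)"

# The predictor names live in the rownames, so no column holds them
"term" %in% colnames(mat)  # FALSE
```

Anything that works column by column (filtering, joining, plotting) cannot see the predictor names until they are lifted into a column.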
Syntax
fix_data_frame() has three arguments and zero hidden behavior. The first is the matrix or data frame to convert, the second renames the data columns, and the third names the rownames column.
The three arguments are:
- x: a matrix or data frame whose rownames carry information (required)
- newnames: a character vector of column names for the data columns; length must match ncol(x) (optional; default keeps existing colnames)
- newcol: a string naming the new column built from rownames(x); default "term"
The return value is a tibble (a plain data frame in pre-1.0 broom) with newcol first, followed by the data columns, renamed if newnames was given.
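The argument contract above can be sketched in base R. The function name fix_data_frame_sketch is hypothetical and this is not broom's source code; it returns a plain data frame and fills missing rownames with NA, which is an assumption taken from the behavior described in this article.

```r
# Hypothetical stand-in for illustration; NOT broom's implementation
fix_data_frame_sketch <- function(x, newnames = NULL, newcol = "term") {
  # Refuse a newnames vector that does not line up with the data columns
  if (!is.null(newnames) && length(newnames) != ncol(x)) {
    stop("newnames must be NULL or have one entry per column of x")
  }

  # Rownames become the values of the new first column
  # (assumption: missing rownames turn into NA)
  rn <- rownames(x)
  if (is.null(rn)) rn <- rep(NA_character_, nrow(x))

  # Coerce to a data frame and optionally relabel the data columns
  ret <- as.data.frame(x)
  if (!is.null(newnames)) names(ret) <- newnames

  # Prepend the label column and drop the old rownames
  out <- data.frame(rn, ret, row.names = NULL, check.names = FALSE)
  names(out)[1] <- newcol
  out
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
res <- fix_data_frame_sketch(m, newnames = c("lo", "hi"), newcol = "id")
names(res)  # "id" "lo" "hi"
```

The sketch keeps the three arguments separate on purpose: x supplies the values, newnames only relabels, and newcol only names the lifted column.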
Note: on broom 1.0+, call the helper with the ::: operator or switch to tibble::rownames_to_column() and tibble::as_tibble(..., rownames = "term").

Common patterns
1. Convert a coefficient matrix to a tidy tibble
The first column is term, lifted straight from the matrix rownames. The next four columns carry broom's canonical names (estimate, std.error, statistic, p.value) when those are passed through newnames; without newnames the matrix's own headers are kept. From here, downstream verbs in dplyr, geoms in ggplot2, and report tables in gt all consume the result without rownames gymnastics.
This is the entire reason broom uses fix_data_frame as a building block. Base R coefficient matrices store the predictor name as a rowname rather than a column, and every modern reporting tool assumes columns. Doing the lift in one tested helper rather than hand-rolling rownames extraction in every method keeps broom internally consistent.
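A minimal sketch of this pattern follows. The commented line shows the call shape described in the text and assumes the helper is reachable in your broom version; the executed lines are a base-R stand-in for the same lift, so the result is a plain data frame rather than a tibble.

```r
fit <- lm(mpg ~ wt + cyl, data = mtcars)
mat <- coef(summary(fit))

# Call shape from the text (assumption: helper is available via :::)
# broom:::fix_data_frame(mat, newnames = c("estimate", "std.error", "statistic", "p.value"))

# Base-R stand-in: rownames become the first column, then relabel the rest
tidy_coefs <- data.frame(term = rownames(mat), mat, row.names = NULL)
names(tidy_coefs)[-1] <- c("estimate", "std.error", "statistic", "p.value")

# Column-oriented code can now filter on the term like any other value
tidy_coefs[tidy_coefs$term != "(Intercept)", c("term", "estimate")]
```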
2. Rename the data columns explicitly
Use newnames when you want snake_case columns, or when the matrix has nonstandard headers that broom would otherwise leave untouched. The vector length must equal the number of data columns, and off-by-one errors abort with a clear length-mismatch message rather than producing a corrupted tibble. That strictness is a feature, not a bug, because silent column misalignment is the kind of error that survives review and shows up months later in a report.
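The renaming step, with its length check made explicit, might look like the following base-R sketch. The snake_case names are illustrative choices, not names broom prescribes.

```r
mat <- coef(summary(lm(mpg ~ wt, data = mtcars)))

# Illustrative snake_case labels, one per data column
new_names <- c("estimate", "std_error", "t_value", "p_value")

# Fail loudly instead of misaligning columns
stopifnot(length(new_names) == ncol(mat))

colnames(mat) <- new_names
out <- data.frame(term = rownames(mat), mat, row.names = NULL)
names(out)  # "term" "estimate" "std_error" "t_value" "p_value"
```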
3. Rename the term column
Calling the column predictor, coefficient, or feature makes downstream code more readable in reports written for non-statisticians. The default term is what the broom tidy generic always emits, so keep that name if you plan to bind tibbles from many models into one long data frame later. A simple rule covers both cases: keep the default for any tibble that will be combined with broom output elsewhere, and override only at the last step that produces a finished report.
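The rule of thumb above can be illustrated with two models. The helper name lift is hypothetical and only exists to keep the sketch short: it keeps the default term name while results are being combined, and the rename to predictor happens once, at the reporting step.

```r
# Hypothetical helper: pull term and estimate out of a fitted lm
lift <- function(fit) {
  mat <- coef(summary(fit))
  data.frame(term = rownames(mat), estimate = mat[, "Estimate"], row.names = NULL)
}

# Keep "term" while stacking results from several models
both <- rbind(
  cbind(model = "wt_only", lift(lm(mpg ~ wt, data = mtcars))),
  cbind(model = "wt_hp",   lift(lm(mpg ~ wt + hp, data = mtcars)))
)

# Rename only in the final, reader-facing table
report <- both
names(report)[names(report) == "term"] <- "predictor"
names(report)  # "model" "predictor" "estimate"
```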
4. Data-frame input passes through untouched
The helper does not require a matrix. When you hand it a data frame, it copies the columns verbatim and prepends the term column, which is useful for cleaning up the output of base aggregation and summary calls that emit data frames with informative rownames. This is especially handy when working with older packages that return tables with rownames containing group labels, factor levels, or time stamps; one helper covers all of them without custom extraction code.
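As a small illustration of the data-frame case, mtcars itself stores car names as rownames. The base-R lines below mimic the lift described above; the commented call is the form the text refers to and assumes the helper is reachable.

```r
# A data frame whose rownames carry information (car names)
df <- head(mtcars[, c("mpg", "wt")], 3)
rownames(df)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

# Call shape from the text (assumption: helper is available via :::)
# broom:::fix_data_frame(df)

# Base-R stand-in: columns are copied as they are, labels move to "term"
out <- data.frame(term = rownames(df), df, row.names = NULL)
names(out)  # "term" "mpg" "wt"
```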
fix_data_frame() vs the modern alternatives
The two replacements live in the tibble package and cover the same ground. Choose based on whether you start with a matrix or a data frame.
| Goal | fix_data_frame() | tibble alternative |
|---|---|---|
| Matrix to tibble with term column | broom:::fix_data_frame(mat) | tibble::as_tibble(mat, rownames = "term") |
| Data frame, move rownames to column | broom:::fix_data_frame(df) | tibble::rownames_to_column(df, "term") |
| Rename data columns at the same time | newnames = c(...) | `as_tibble() \|> rename(...)` |
| Available without ::: | No, internal since broom 1.0 | Yes, exported tibble API |
For new code, prefer the tibble verbs since they ship as part of an exported public interface and survive broom version bumps without warnings. Reach for the broom helper only when reading or maintaining package code that already depends on it, or when writing a new tidier method that should match the broom internal style.
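Both tibble routes can be tried side by side. This sketch assumes the tibble package is installed; the renaming step uses base names<- rather than dplyr::rename() only to keep the dependencies to one package.

```r
library(tibble)

# Matrix input: as_tibble() lifts rownames via its rownames argument
mat <- coef(summary(lm(mpg ~ wt, data = mtcars)))
from_matrix <- as_tibble(mat, rownames = "term")

# Optional relabel of the four data columns
names(from_matrix)[-1] <- c("estimate", "std.error", "statistic", "p.value")

# Data-frame input: rownames_to_column() does the same job
df <- head(mtcars[, c("mpg", "wt")], 3)
from_df <- rownames_to_column(df, var = "term")

names(from_matrix)  # "term" "estimate" "std.error" "statistic" "p.value"
names(from_df)      # "term" "mpg" "wt"
```

Note that as_tibble() returns a tibble, while rownames_to_column() keeps the class of its input, so from_df here is still a plain data frame.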
Common pitfalls
broom::fix_data_frame() without the triple colon fails on broom 1.0+. The function is no longer exported, so the public form errors with 'fix_data_frame' is not an exported object from 'namespace:broom'. Use broom:::fix_data_frame() for direct calls, or switch to tibble::as_tibble(mat, rownames = "term").

A second trap: rownames silently collapse when a matrix has none. If is.null(rownames(x)) is TRUE, the term column is filled with NA, which then breaks bind_rows() deduplication downstream. Always inspect rownames(x) before passing the matrix in.
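A defensive check for the missing-rownames case could look like this. The fallback labels (row1, row2, ...) are an illustrative choice; in real code you would usually supply names that mean something.

```r
# A matrix with column names but no rownames
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
is.null(rownames(m))  # TRUE

# Give it explicit labels before any rownames-to-column conversion
if (is.null(rownames(m))) {
  rownames(m) <- paste0("row", seq_len(nrow(m)))
}

out <- data.frame(term = rownames(m), m, row.names = NULL)
out$term  # "row1" "row2" "row3"
```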
The third trap is newnames length mismatch. The function does not recycle short vectors; a length-3 newnames against a four-column matrix aborts with a hard error. When you do not know ncol(x) ahead of time, build the vector with paste0("col", seq_len(ncol(x))).
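The paste0() idiom above always yields a vector of the right length, which is easy to confirm on a coefficient matrix:

```r
mat <- coef(summary(lm(mpg ~ wt, data = mtcars)))

# Build one placeholder name per column, whatever ncol(mat) turns out to be
new_names <- paste0("col", seq_len(ncol(mat)))
new_names                        # "col1" "col2" "col3" "col4"
length(new_names) == ncol(mat)   # TRUE by construction

# A hand-written vector can drift out of sync; check it before use
short <- c("estimate", "se", "t")
length(short) == ncol(mat)       # FALSE: one name short
```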
Try it yourself
Try it: Fit lm(mpg ~ hp + wt, data = mtcars), extract the coefficient matrix, and use fix_data_frame() to produce a tibble whose columns are term, estimate, se, t, and p.
Solution
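One possible solution is sketched here. The commented line is the call the exercise asks for and assumes the helper is reachable in your broom version; the executed lines are a base-R stand-in that produces the same column layout as a plain data frame.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)
mat <- coef(summary(fit))

# Intended call (assumption: helper is available via :::)
# out <- broom:::fix_data_frame(mat, newnames = c("estimate", "se", "t", "p"))

# Base-R stand-in with the same column layout
out <- data.frame(term = rownames(mat), mat, row.names = NULL)
names(out) <- c("term", "estimate", "se", "t", "p")
names(out)  # "term" "estimate" "se" "t" "p"
```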
Explanation: coef(summary(fit)) returns a 4-column numeric matrix with predictor rownames. fix_data_frame() moves those rownames into a term column and applies your newnames to the four data columns in order.
Related broom functions
For day-to-day model output, the higher-level wrappers do more in one call:
- tidy(): one row per term with estimate, std.error, statistic, p.value
- glance(): one-row model summary with R-squared, AIC, BIC, residual df
- augment(): per-observation fitted values, residuals, and influence diagnostics
- tidy.kmeans(): cluster centers, sizes, withinss as a tibble
- tidy.prcomp(): principal component loadings as a tibble
FAQ
Why was fix_data_frame() unexported?
Broom 1.0 reorganized the package around the tidy(), glance(), and augment() generics. The maintainers moved low-level helpers into the internal namespace so the public API would be smaller and easier to document. fix_data_frame() still ships with broom and is called by dozens of internal methods; only direct user calls now need the ::: operator.
What is the difference between fix_data_frame() and tidy()?
tidy() is a generic that dispatches on the class of its argument (lm, glm, htest, etc.) and applies a model-specific method. fix_data_frame() is one building block those methods reach for after they have already extracted the coefficient matrix. You almost never need fix_data_frame() directly when a tidier exists for your model.
Does fix_data_frame() preserve matrix attributes?
No. The helper coerces the input to a data frame, drops the dim, dimnames, and any custom attributes, and returns a fresh tibble. If you need to keep, say, a units attribute on a column, extract it first, run fix_data_frame(), then re-attach the attribute after the conversion.
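The extract-convert-reattach sequence can be sketched in base R. The units attribute and its value are made up for the example, and the conversion line stands in for the fix_data_frame() call.

```r
# A one-column matrix with a custom attribute (illustrative)
m <- matrix(c(1.2, 3.4), nrow = 2, dimnames = list(c("a", "b"), "len"))
attr(m, "units") <- "cm"

# 1. Extract the attribute before converting
saved_units <- attr(m, "units")

# 2. Convert (stand-in for the fix_data_frame() call)
out <- data.frame(term = rownames(m), m, row.names = NULL)

# 3. Re-attach it to the column it describes
attr(out$len, "units") <- saved_units
attr(out$len, "units")  # "cm"
```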
How do I write a new tidier that uses fix_data_frame() under the hood?
Define a method like tidy.myclass <- function(x, ...) broom:::fix_data_frame(my_coef_matrix(x), newnames = c("estimate", "std.error", "statistic", "p.value")). Register the method with S3method(tidy, myclass) in the package NAMESPACE. The convention is to expose the canonical four columns so downstream broom helpers and reporting code work without per-model branches.
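A self-contained sketch of such a tidier follows. The class myclass, its coefs field, and my_coef_matrix() are hypothetical, the local tidy generic is a stand-in so the script runs without broom loaded, and the base-R lift replaces the broom:::fix_data_frame() call.

```r
# Stand-in generic so the sketch runs alone;
# in a package you would import the tidy generic instead
tidy <- function(x, ...) UseMethod("tidy")

# Hypothetical extractor for a hypothetical model class
my_coef_matrix <- function(x) x$coefs

# The tidier: lift rownames, then expose the canonical four columns
tidy.myclass <- function(x, ...) {
  mat <- my_coef_matrix(x)
  out <- data.frame(term = rownames(mat), mat, row.names = NULL)
  names(out) <- c("term", "estimate", "std.error", "statistic", "p.value")
  out
}

# Toy object that borrows an lm coefficient matrix
obj <- structure(
  list(coefs = coef(summary(lm(mpg ~ wt, data = mtcars)))),
  class = "myclass"
)
res <- tidy(obj)
names(res)  # "term" "estimate" "std.error" "statistic" "p.value"
```

Inside a package the method also needs the S3method(tidy, myclass) registration mentioned above; the script form works here because the generic and the method are defined in the same environment.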
Can I use fix_data_frame() outside of broom?
Yes, but the public alternatives are friendlier. tibble::as_tibble(mat, rownames = "term") covers the matrix case in one call, and tibble::rownames_to_column(df, var = "term") covers data frames. Both are exported, both survive package updates, and both are documented in the tibble reference.