readr read_file() in R: Read a Whole File Into a String
The readr read_file() function in R reads an entire file into a single character string, keeping every line break and space intact. It is the fastest way to pull a whole text file into one object you can search, parse, or print.
read_file("notes.txt") # whole file as one string
read_file_raw("notes.txt") # whole file as a raw vector
read_file("data.csv.gz") # auto-decompresses .gz/.zip/.bz2
read_file("https://site.com/x.txt") # auto-downloads from a URL
read_lines("notes.txt") # one element per line instead
write_file(txt, "out.txt") # write a string back to disk
nchar(read_file("notes.txt")) # length of file contents in chars
Need explanation? Read on for examples and pitfalls.
What read_file() does in one sentence
read_file() collapses a whole file into one string. Unlike functions that return a row per record, it hands back a length-one character vector containing every byte of the file, newlines and all. That makes it the right choice when the file is not tabular: a log you want to grep, an HTML page, a template, or free-form text you plan to parse yourself.
The example below writes a small file into the session, then reads it back. Every later block reuses these files, so run them in order.
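The setup code is not reproduced in this text, so the block below is a minimal stand-in rather than the original. The file contents are an assumption: three short lines that total 29 characters once the final newline is counted, which lines up with the exercise near the end. It writes to a temporary directory through a hypothetical `notes_path` variable; swap in "notes.txt" if you prefer to work in your project folder.

```r
library(readr)

# Hypothetical stand-in for notes.txt: three short lines, 29 characters
# in total once the final newline is counted.
notes_path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", notes_path)

# Read it straight back as one string.
notes <- read_file(notes_path)
nchar(notes)
```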
read_file() syntax and arguments
The signature is short. read_file() takes just two arguments, and you will rarely touch the second one.
| Argument | What it controls |
|---|---|
| file | Path, connection, URL, or literal data to read. Compressed files (.gz, .bz2, .xz, .zip) are decompressed automatically. |
| locale | Controls encoding, decimal marks, and time zone. Pass locale(encoding = "latin1") for non-UTF-8 files. |
A path is the common case, but file also accepts an http:// or https:// URL, which read_file() downloads before reading. The companion read_file_raw() takes only file and returns a raw vector instead of a string.
In Python, the equivalent of read_file() is plain open(path).read(). There is no pandas call because the result is not a DataFrame, just text.
read_file() examples by use case
Reading the file is one line. Pass the path and store the result. The return value is always a character vector of length one.
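A short sketch of that call follows. The path and file contents are stand-ins (assumptions, not the article's original file) so the block runs by itself.

```r
library(readr)

# Hypothetical sample file in a temporary directory.
path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", path)

txt <- read_file(path)

is.character(txt)  # the result is a character vector ...
length(txt)        # ... with exactly one element
cat(txt)           # newlines are embedded, so cat() prints three lines
```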
Use read_file_raw() for binary or unknown encoding. It returns a raw vector of bytes, which is safe for images, PDFs, or text whose encoding you have not confirmed.
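As an illustration, the sketch below reads an assumed sample text file as bytes, then converts it to a string with base R's rawToChar() once the encoding is known. The file and its contents are hypothetical.

```r
library(readr)

# Hypothetical sample file in a temporary directory.
path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", path)

bytes <- read_file_raw(path)

class(bytes)    # "raw"
length(bytes)   # one element per byte in the file
head(bytes, 5)  # the first five bytes, shown as hex

# Once the encoding is known (plain ASCII for this sample), convert to text.
txt <- rawToChar(bytes)
```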
Once you have the string, ordinary string tools take over. Because the file is now a single value, you can split it, search it, or pull pieces out with base R or stringr.
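A minimal base R sketch of splitting, searching, and extracting follows. The sample text is an assumption; stringr offers equivalents such as str_split() and str_extract() if you prefer that interface.

```r
library(readr)

# Hypothetical sample file in a temporary directory.
path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", path)
txt <- read_file(path)

# Split on newlines to get one element per line.
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Search: does the word "second" appear anywhere in the file?
has_second <- grepl("second", txt, fixed = TRUE)

# Extract: pull out the first run of letters followed by " line".
first_match <- regmatches(txt, regexpr("[a-z]+ line", txt))
```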
write_file() is the round trip. It writes a string straight to disk, so read_file() and write_file() form a matched pair for whole-file work.
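The round trip might look like the sketch below. Both file names and the uppercase edit are illustrative assumptions.

```r
library(readr)

# Hypothetical input and output paths in a temporary directory.
src <- file.path(tempdir(), "notes.txt")
out <- file.path(tempdir(), "out.txt")
write_file("first line\nsecond line\nthird\n", src)

txt <- read_file(src)

# Edit the string in memory, then write the whole thing back out.
shouted <- toupper(txt)
write_file(shouted, out)

# Reading the new file returns exactly what was written.
identical(read_file(out), shouted)
```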
read_file() vs read_lines() vs read_csv()
The three readr readers differ only in shape. They all open the same file; they hand it back structured differently. Picking the wrong one means extra cleanup later.
| Function | Returns | Best for |
|---|---|---|
| read_file() | One string, length 1 | Templates, logs, HTML, free text to parse yourself |
| read_lines() | Character vector, one element per line | Line-oriented files where each line is a record |
| read_csv() | A tibble (data frame) | Rectangular comma-separated data |
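To make the difference in shape concrete, the sketch below points all three readers at the same file. The tiny CSV is a hypothetical example, and show_col_types = FALSE assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical two-row CSV used only for this comparison.
csv_path <- file.path(tempdir(), "scores.csv")
write_file("name,score\nana,90\nben,85\n", csv_path)

as_string <- read_file(csv_path)                         # one string
as_lines  <- read_lines(csv_path)                        # one element per line
as_table  <- read_csv(csv_path, show_col_types = FALSE)  # parsed tibble

length(as_string)  # 1
length(as_lines)   # 3 (header plus two rows)
dim(as_table)      # 2 rows, 2 columns
```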
If the file is actually tabular, read_csv() or read_delim() will parse types and headers for you, saving a manual cleanup step.
Common pitfalls
read_file() keeps the trailing newline. Most text files end with a final \n, and read_file() preserves it. A string comparison that ignores this will fail unexpectedly.
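A small sketch of the failure and two ways around it, using a hypothetical one-line status file:

```r
library(readr)

# Hypothetical file holding a single word and a final newline.
path <- file.path(tempdir(), "status.txt")
write_file("ok\n", path)

status <- read_file(path)

status == "ok"          # FALSE: the string is really "ok\n"
trimws(status) == "ok"  # TRUE once surrounding whitespace is stripped

# To drop only a single final newline and keep other whitespace:
sub("\n$", "", status) == "ok"
```

trimws() removes all leading and trailing whitespace, so prefer the sub() form when indentation or trailing spaces matter.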
For very large files, use read_lines_chunked() instead of reading it all in one call.
A second trap is encoding. If non-ASCII characters come back garbled, the file is probably not UTF-8. Pass an explicit locale, for example read_file("data.txt", locale = locale(encoding = "latin1")), or fall back to read_file_raw() and decode the bytes yourself.
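Both encoding routes can be sketched as follows. The file is fabricated on the spot from raw Latin-1 bytes, so its name and contents are assumptions; the manual decode uses base R's iconv().

```r
library(readr)

# Write the Latin-1 bytes for "café" plus a newline (0xE9 is é in Latin-1).
path <- file.path(tempdir(), "latin1.txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)

# Declaring the encoding lets readr convert the text to UTF-8.
txt <- read_file(path, locale = locale(encoding = "latin1"))

# Fallback: read the bytes and decode them by hand with base R.
bytes <- read_file_raw(path)
decoded <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")
```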
Try it yourself
Try it: Read notes.txt into a string, then count how many characters it holds after the trailing newline is removed. Save the count to ex_count.
Solution
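The solution code is not reproduced in this text, so here is one way to do it. It assumes a stand-in notes.txt whose contents total 29 characters including the final newline.

```r
library(readr)

# Stand-in notes.txt (assumed contents, 29 characters with the final newline).
path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", path)

# Read, strip the trailing newline, then count characters.
ex_count <- nchar(trimws(read_file(path)))
ex_count
```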
Explanation: read_file() returns the 29-character string including the final newline. trimws() strips that newline, leaving 28 characters, and nchar() measures the result.
Related readr functions
These functions cover the file-reading jobs that read_file() does not:
- read_lines() reads a file into a vector with one element per line.
- read_file_raw() reads a file into a raw vector of bytes.
- read_csv() reads comma-separated tabular data into a tibble.
- read_rds() restores a saved R object from an .rds file.
- write_file() writes a single string back to disk.
For the bigger picture of getting data into R, see the Importing Data in R guide. The official reference is the readr read_file() documentation.
FAQ
What is the difference between read_file() and read_lines()?
read_file() returns the entire file as one character string with newlines embedded inside it. read_lines() splits the file on newline characters and returns a character vector with one element per line. Use read_file() when you want to treat the file as a single block of text, and read_lines() when each line is a separate record you want to loop over or filter.
Can read_file() read a file from a URL?
Yes. If the file argument starts with http://, https://, ftp://, or ftps://, read_file() downloads the file first and then reads it. Remote files that are also gzip compressed are downloaded and decompressed in one step, so read_file("https://site.com/data.txt.gz") works without any manual handling.
How is read_file() different from base R readLines()?
Base readLines() returns a character vector of lines, similar to read_lines(), not a single string. Base R has no dedicated whole-file reader; the usual workarounds are paste(readLines(path), collapse = "\n"), which drops the final newline, or readChar(path, file.size(path)). read_file() is also typically faster than the readLines() route, decompresses files automatically, and handles encoding through the locale argument.
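For comparison, the sketch below lines up read_file() against two base R routes to a single string, on an assumed sample file. Note that the readLines() route does not reproduce the file exactly.

```r
library(readr)

# Hypothetical sample file in a temporary directory.
path <- file.path(tempdir(), "notes.txt")
write_file("first line\nsecond line\nthird\n", path)

via_readr     <- read_file(path)
via_readLines <- paste(readLines(path), collapse = "\n")
via_readChar  <- readChar(path, file.size(path))

identical(via_readr, via_readChar)       # same string
nchar(via_readr) - nchar(via_readLines)  # 1: readLines() lost the final "\n"
```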
Does read_file() work on large files?
It works, but it reads the whole file into memory at once. For files of a few megabytes that is fine. For very large logs, reading everything into one string can exhaust RAM, so prefer read_lines_chunked() or a streaming approach when files run into the gigabytes.
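A chunked read might look like the sketch below, which counts error lines without holding the whole file in memory. The log file, its contents, the count_errors helper, and the tiny chunk size are all illustrative assumptions; AccumulateCallback is one of readr's chunk callbacks and carries a running value between chunks.

```r
library(readr)

# Hypothetical log file; a real one would be far larger.
log_path <- file.path(tempdir(), "app.log")
write_lines(c("INFO start", "ERROR disk full", "INFO retry",
              "ERROR timeout", "INFO done"), log_path)

# Count ERROR lines two at a time, carrying the running total in `acc`.
count_errors <- function(x, pos, acc) acc + sum(grepl("^ERROR", x))

n_errors <- read_lines_chunked(
  log_path,
  callback = AccumulateCallback$new(count_errors, acc = 0),
  chunk_size = 2
)
n_errors
```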
How do I read a file with a non-UTF-8 encoding?
Pass a locale that names the encoding. For a Latin-1 file, call read_file("data.txt", locale = locale(encoding = "latin1")). If you do not know the encoding, read the raw bytes with read_file_raw() and decode them once you have identified the character set.