readxl read_xls() in R: Read Legacy .xls Excel Files
The readxl read_xls() function reads a legacy binary .xls Excel file into a tibble. It targets the old pre-2007 format only, so it fails fast on a modern .xlsx workbook instead of guessing.
read_xls("data.xls") # read the first sheet
read_xls("data.xls", sheet = "Sales") # read a sheet by name
read_xls("data.xls", sheet = 2) # read a sheet by position
read_xls("data.xls", range = "A1:D20") # read a fixed cell range
read_xls("data.xls", skip = 4) # skip junk rows above header
read_xls("data.xls", col_names = FALSE) # treat row 1 as data
read_xls("data.xls", col_types = "text") # force every column to text
read_xls("data.xls", n_max = 100) # read only the first 100 rowsNeed explanation? Read on for examples and pitfalls.
What read_xls() does
The readxl read_xls() function imports one sheet of a legacy .xls workbook as a tibble. You give it a file path and, optionally, which sheet and which cells to read. It returns a tidy data frame with one column per field and detected column types.
The .xls format is the binary file Excel used before 2007. read_xls() parses it through the bundled libxls C library, so it needs no Java, no Perl, and no Excel install.
read_excel() peeks at the file, then calls read_xls() or read_xlsx() for you. Call read_xls() directly when you know the file is the old format and want a clear error if it is not.Syntax and key arguments
Most calls need only the path argument; the rest control which cells to read and how to parse them. A workbook can hold many sheets, so the arguments below tell read_xls() exactly what to pull.
The arguments you reach for most are sheet (pick the tab you want), range (read a precise block of cells), skip (jump past export junk), and col_types (stop the type guessing when you know the schema).
read_xls("data.xls") is pandas.read_excel("data.xls"). The argument names differ: pandas uses sheet_name where readxl uses sheet, and nrows where readxl uses n_max.read_xls() examples
readxl ships example .xls workbooks, so every example below runs without a file of your own. The helper readxl_example() returns the path to a bundled file.
With no sheet argument, read_xls() returns the first tab, which holds the iris data here.
A workbook usually has more than one sheet. List them with excel_sheets(), then pass a name or a position to sheet.
Real .xls exports often have title text above the table. Use range to read an exact block, or skip to jump past the junk rows so row one becomes the header.
readxl_example() to prototype before the real file arrives. The bundled datasets.xls, deaths.xls, and type-detection.xls files let you test sheet, range, and column-type code without a workbook of your own.Stop the type guessing when you know the schema. Pass a vector to col_types so every column is read exactly as you intend.
Here the cyl column is forced to text. A single value, such as "text", applies that type to every column.
read_xls() vs read_excel() vs read_xlsx()
All three functions return the same tibble; they differ in which file format they accept. Pick the specific reader when you know the format, and the dispatcher when you do not.
| Function | Reads | Behavior on the wrong format |
|---|---|---|
read_xls() |
Legacy binary .xls only |
Errors on a .xlsx file |
read_xlsx() |
Modern .xlsx only |
Errors on a .xls file |
read_excel() |
Both .xls and .xlsx |
Detects the format, then dispatches |
Use read_excel() in scripts that accept either format from users. Use read_xls() in a pipeline where every file should be the old format, so a stray .xlsx fails loudly instead of slipping through.
Common pitfalls
Passing a .xlsx file to read_xls() throws an error. The libxls engine cannot parse the newer zipped XML format, so read_xls("modern.xlsx") stops with a "file is not a valid binary" error. Switch to read_xlsx() or the format-agnostic read_excel().
.xls extension. read_xls() will reject it. Open the file in a text editor to check, then use read_html() or read_tsv() instead.Date columns can arrive as numbers. Excel stores dates as serial numbers, and if a date column also contains text, read_xls() may guess numeric. Set col_types = "date" for that column to force correct parsing.
Try it yourself
Try it: Read only the mtcars sheet from the bundled datasets.xls, keeping just the first 5 rows. Save the result to ex_cars.
Click to reveal solution
Explanation: sheet = "mtcars" selects the tab by name, and n_max = 5 caps the read at five data rows. The header row does not count toward n_max.
Related readxl and readr functions
read_xls() is one reader in a family of import functions. These cover the formats you meet alongside legacy Excel files:
read_xlsx()reads modern.xlsxworkbooks.read_excel()auto-detects.xlsor.xlsxand dispatches.excel_sheets()lists sheet names before you read.read_csv()reads comma-separated text files.read_sav()from haven reads SPSS data files.
See the official readxl reference for the full argument list.
FAQ
What is the difference between read_xls() and read_excel()?
read_xls() reads only the legacy binary .xls format and errors on anything else. read_excel() inspects the file first, then calls read_xls() or read_xlsx() for you. Use read_excel() when a script must accept either format, and read_xls() when every file should be the old format so a wrong one fails loudly.
Can read_xls() read .xlsx files?
No. read_xls() uses the libxls engine, which only understands the pre-2007 binary format. Calling it on a .xlsx file raises a "not a valid binary" error. Use read_xlsx() for modern workbooks, or read_excel() if you want automatic detection of either format.
How do I read a specific sheet with read_xls()?
Pass the sheet argument either a name or a position. read_xls("data.xls", sheet = "Sales") reads the tab called Sales, and read_xls("data.xls", sheet = 2) reads the second tab. Call excel_sheets("data.xls") first to see every sheet name in the workbook.
Why does read_xls() return dates as numbers?
Excel stores dates as serial numbers counted from 1899 or 1904. When a column mixes dates with text, the type guesser may settle on numeric and leave the serials raw. Fix it by setting col_types = "date" for that column so read_xls() converts the serials to proper date-times.