Connect R to Any Database: DBI + RSQLite, RPostgres, and RMySQL
DBI is R's universal database interface — it lets you connect to SQLite, PostgreSQL, MySQL, and dozens of other databases using the same set of functions: dbConnect(), dbGetQuery(), dbWriteTable(), and dbDisconnect().
Introduction
Every real-world data project eventually outgrows CSV files. Databases store millions of rows that you can query selectively, without loading everything into memory at once. If you work with data in R, knowing how to talk to a database is a fundamental skill.
DBI (Database Interface) is the R package that provides a consistent API for talking to any SQL database. You write the same R code regardless of whether the database is SQLite, PostgreSQL, MySQL, or one of dozens of others. Driver packages like RSQLite, RPostgres, and RMySQL handle the database-specific details behind the scenes.
In this tutorial, you will learn how to connect to a database, run SQL queries, write R data frames to tables, prevent SQL injection with parameterised queries, and use familiar dplyr syntax on database tables via the dbplyr package. All interactive code blocks use RSQLite with in-memory databases, so you can run every example directly in your browser.

Figure 1: DBI sits between your R code and database-specific drivers.
How does DBI connect R to a database?
DBI is an abstraction layer. You call dbConnect() with a driver object, and DBI handles communication with the database. The driver translates your requests into the protocol that the specific database understands.
Let's create an in-memory SQLite database. This is the simplest way to get started because SQLite requires no server, no username, and no password.
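A minimal sketch of that first step, using the packages named above:

```r
library(DBI)
library(RSQLite)

# ":memory:" creates a temporary database in RAM rather than on disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")
```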
The con object is your connection handle. Every DBI function takes this handle as its first argument. The string ":memory:" tells SQLite to create a temporary database in RAM rather than on disk.
Let's verify the connection works by listing the tables. A fresh database has none.
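One way to check, assuming the con from the step above:

```r
dbListTables(con)
#> character(0)
```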
The empty result confirms the connection is active and the database is empty. You will fill it with data in the next section.
For PostgreSQL, the connection call looks like this (runs in local RStudio only):
```r
# PostgreSQL (local R only — not interactive here)
# con <- dbConnect(RPostgres::Postgres(),
#                  dbname = "mydb", host = "localhost",
#                  user = "myuser", password = "mypass")
```
For MySQL, the pattern is identical:
```r
# MySQL (local R only — not interactive here)
# con <- dbConnect(RMySQL::MySQL(),
#                  dbname = "mydb", host = "localhost",
#                  user = "myuser", password = "mypass")
```
How do you query data with dbGetQuery()?
dbGetQuery() sends a SQL statement to the database and returns the result as an R data frame. It combines two steps — sending the query and fetching the result — into one convenient function.
First, let's load some data into our database so we have something to query. Then we will run a SELECT statement.
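A sketch of both steps, reusing the con from the connection section and the built-in mtcars dataset:

```r
# load the built-in mtcars data frame into the database as a table
dbWriteTable(con, "mtcars", mtcars)

# the WHERE clause is evaluated inside SQLite, not in R
efficient <- dbGetQuery(con, "SELECT * FROM mtcars WHERE mpg > 30")
nrow(efficient)
#> [1] 4
```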
The query returned only the 4 rows where mpg exceeds 30. The data stayed in the database until the query filtered it — R only received the matching rows.
Now let's write a more targeted query with ORDER BY to sort the results.
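For example (the column choice here is illustrative):

```r
top5 <- dbGetQuery(con, "
  SELECT mpg, cyl, hp FROM mtcars
  ORDER BY mpg DESC
  LIMIT 5
")
nrow(top5)
#> [1] 5
```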
Notice how the SQL does the sorting and limiting. Only 5 rows travel from the database to R. With large databases, this is far more efficient than loading everything and filtering in R.
Note: dbGetQuery() is for SELECT statements. For statements that modify data, such as INSERT, UPDATE, or DELETE, use dbExecute() instead. It returns the number of affected rows.
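A quick illustration of that return value, using a no-op UPDATE so no data actually changes:

```r
# every row matches, so SQLite reports all 32 mtcars rows as affected
n_affected <- dbExecute(con, "UPDATE mtcars SET gear = gear")
n_affected
#> [1] 32
```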
Figure 2: The core DBI workflow: connect, query or write, then disconnect.
How do you write data to a database with dbWriteTable()?
dbWriteTable() creates a new database table from an R data frame. It handles column types automatically, mapping R types to SQL types.
Let's write the iris dataset to our database and verify the result.
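A sketch of the write, again reusing the open con:

```r
dbWriteTable(con, "iris", iris)
dbListTables(con)  # should now list both "iris" and "mtcars"
```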
The database now has two tables: mtcars (from earlier) and iris. The column names match the R data frame exactly.
You can also append rows to an existing table. Let's add some extra observations to the iris table.
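For instance, appending two made-up observations (the values below are illustrative):

```r
extra <- data.frame(
  Sepal.Length = c(5.0, 6.1), Sepal.Width = c(3.3, 2.9),
  Petal.Length = c(1.4, 4.7), Petal.Width = c(0.2, 1.4),
  Species = c("setosa", "versicolor")
)
dbWriteTable(con, "iris", extra, append = TRUE)
dbGetQuery(con, "SELECT COUNT(*) AS n FROM iris")$n
#> [1] 152
```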
The original iris dataset has 150 rows. After appending 2 more, the count is 152. The append = TRUE argument tells DBI to add rows rather than replace the table.
Warning: when you call dbWriteTable() with overwrite = TRUE, all existing data in that table is gone with no confirmation prompt. Always check with dbExistsTable(con, "tablename") before writing.
Why should you use parameterised queries instead of paste()?
When you build SQL by pasting user input directly into a query string, you create a SQL injection vulnerability. An attacker (or a careless typo) can insert malicious SQL that drops tables or leaks data.
Here is a dangerous example. Imagine a Shiny app where users type a species name.
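A sketch of the unsafe pattern, where user_input stands in for whatever the app receives:

```r
# DANGEROUS: never build SQL like this from user input
user_input <- "setosa"
query <- paste0("SELECT * FROM iris WHERE Species = '", user_input, "'")
result <- dbGetQuery(con, query)
```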
This looks harmless with "setosa", but an attacker could type setosa'; DROP TABLE iris; -- and destroy your data. The paste() approach blindly trusts the input.
The safe approach uses parameterised queries. DBI escapes the input for you.
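The same query, rewritten with a placeholder:

```r
safe <- dbGetQuery(
  con,
  "SELECT * FROM iris WHERE Species = ?",
  params = list("setosa")
)
```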
The ? placeholder tells DBI "a value goes here." The params argument provides the actual value. DBI escapes special characters automatically, so SQL injection is impossible.

Figure 3: Parameterised queries prevent SQL injection; paste() does not.
Rule of thumb: never build SQL with paste() on user input; always pass values through the params argument in dbGetQuery() or dbExecute(). This is especially critical in Shiny apps and APIs where input comes from untrusted users.
How do you use dplyr syntax on a database with dbplyr?
If you already know dplyr, you can query databases without writing SQL at all. The dbplyr package translates dplyr verbs — filter(), select(), group_by(), summarise() — into SQL behind the scenes.
First, create a reference to a database table using tbl(). This does not load any data into R.
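A sketch, assuming the mtcars table from earlier and that dbplyr is installed:

```r
library(dplyr)

# a lazy reference: no rows are fetched from the database yet
mtcars_db <- tbl(con, "mtcars")
mtcars_db
```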
The [?? x 11] tells you dbplyr does not know the row count yet. It has not fetched the data. Now let's build a query using familiar dplyr verbs.
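One possible pipeline, ending with show_query() to see the translation:

```r
summary_query <- mtcars_db %>%
  filter(mpg > 15) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

show_query(summary_query)  # prints the generated SELECT ... GROUP BY
```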
Look at the output: dbplyr translated your dplyr pipeline into valid SQL. The data is still in the database. No rows have been transferred to R yet.
When you are ready for the results, call collect() to execute the query and bring the data into R.
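Continuing from the pipeline above:

```r
result <- collect(summary_query)
result  # a regular tibble with one row per cyl value (4, 6, 8)
```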
Now you have a regular tibble in R. The database did the filtering, grouping, and averaging. R only received 3 rows.
How do you manage connections and clean up properly?
Every dbConnect() call opens a connection that consumes resources. Forgetting to close it can lock database files (SQLite) or exhaust connection pools (PostgreSQL, MySQL).
Always call dbDisconnect() when you are done. The best practice is to use on.exit() inside a function, which guarantees cleanup even if your code throws an error.
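A sketch of that pattern wrapped in a function; the file and table names here are illustrative:

```r
count_rows <- function(dbfile, table) {
  con2 <- dbConnect(RSQLite::SQLite(), dbfile)
  on.exit(dbDisconnect(con2))  # runs on normal return AND on error
  # table name comes from trusted code here, never from user input
  dbGetQuery(con2, paste0("SELECT COUNT(*) AS n FROM ", table))$n
}
```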
The on.exit(dbDisconnect(con2)) line runs when the function exits — whether it returns normally or throws an error. This prevents connection leaks.
Let's disconnect our main connection too. We are done with it.
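One final call closes it:

```r
dbDisconnect(con)
```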
Common Mistakes and How to Fix Them
Mistake 1: Forgetting to disconnect
Leaving connections open causes SQLite file locks and PostgreSQL "too many connections" errors.
❌ Wrong:
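A sketch of the leaky pattern; the file and table names are illustrative:

```r
con <- dbConnect(RSQLite::SQLite(), "mydata.sqlite")
dbGetQuery(con, "SELECT COUNT(*) FROM sales")
# script ends here; the connection is never closed
```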
Why it is wrong: The connection stays open. If you run this script repeatedly, you accumulate stale connections. SQLite may refuse to let other processes access the file.
✅ Correct:
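The same workflow with an explicit close:

```r
con <- dbConnect(RSQLite::SQLite(), "mydata.sqlite")
result <- dbGetQuery(con, "SELECT COUNT(*) FROM sales")
dbDisconnect(con)  # release the file lock / connection slot
```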
Mistake 2: Pasting user input into SQL
This creates SQL injection vulnerabilities that can destroy data.
❌ Wrong:
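A sketch of the vulnerable pattern, with user_input standing in for untrusted data:

```r
# user_input could contain "'; DROP TABLE iris; --"
query <- paste0("SELECT * FROM iris WHERE Species = '", user_input, "'")
dbGetQuery(con, query)
```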
Why it is wrong: The pasted string becomes valid SQL that drops your table.
✅ Correct:
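The parameterised version:

```r
dbGetQuery(con, "SELECT * FROM iris WHERE Species = ?",
           params = list(user_input))
```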
Mistake 3: Using overwrite=TRUE without checking
Silent data loss when you overwrite a table you meant to append to.
❌ Wrong:
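A sketch; new_rows stands in for the data frame you meant to append:

```r
# replaces the ENTIRE iris table with new_rows; old data is gone
dbWriteTable(con, "iris", new_rows, overwrite = TRUE)
```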
Why it is wrong: The overwrite argument deletes the entire table before writing. If you intended to add rows, you just lost all historical data.
✅ Correct:
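A guarded version of the same write:

```r
if (dbExistsTable(con, "iris")) {
  dbWriteTable(con, "iris", new_rows, append = TRUE)  # add rows, keep history
} else {
  dbWriteTable(con, "iris", new_rows)                 # first write creates it
}
```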
Mistake 4: Calling collect() too early on a large table
Loading millions of rows into R defeats the purpose of using a database.
❌ Wrong:
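A sketch; the sales table and amount column are illustrative:

```r
# transfers every row to R before filtering
big <- collect(tbl(con, "sales"))
big %>% filter(amount > 1000)
```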
Why it is wrong: collect() transfers every row to R. If the table has 50 million rows, R tries to allocate memory for all of them.
✅ Correct:
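Filter first, collect last:

```r
# the database does the filtering; only matching rows travel to R
tbl(con, "sales") %>%
  filter(amount > 1000) %>%
  collect()
```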
Practice Exercises
Exercise 1: Connect and query
Create an in-memory SQLite database, write the airquality dataset to it, and query all rows where Temp is greater than 90.
Solution:
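One possible solution:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "airquality", airquality)
hot_days <- dbGetQuery(con, "SELECT * FROM airquality WHERE Temp > 90")
hot_days
dbDisconnect(con)
```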
Explanation: dbConnect() opens the connection, dbWriteTable() loads the data, and dbGetQuery() filters server-side. Always disconnect when done.
Exercise 2: Explore table structure
Using a connection to an in-memory database with mtcars loaded, find out how many tables exist and list the column names of each table.
Solution:
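One possible solution:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

tables <- dbListTables(con)
length(tables)                   # number of tables
for (t in tables) {
  print(dbListFields(con, t))   # column names of each table
}
dbDisconnect(con)
```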
Explanation: dbListTables() returns all table names. dbListFields() returns column names for a specific table. Together they let you explore an unfamiliar database.
Exercise 3: Round-trip write and read
Create a custom data frame with 3 columns (name, score, passed), write it to a database, read it back, and verify the result matches the original.
Solution:
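One possible solution (the names and scores are made up):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
df <- data.frame(
  name   = c("Ana", "Ben", "Caro"),
  score  = c(88, 72, 95),
  passed = c(TRUE, FALSE, TRUE)
)
dbWriteTable(con, "results", df)
back <- dbReadTable(con, "results")

identical(back$name, df$name)   # TRUE
all(back$passed == df$passed)   # TRUE, though passed comes back as 0/1 integers
dbDisconnect(con)
```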
Explanation: dbReadTable() reads the full table back. Notice that SQLite stores TRUE/FALSE as 1/0 integers, so the passed column changes type on the round-trip.
Exercise 4: Parameterised query
Using the airquality table, write a parameterised query that takes a month number as input and returns all rows for that month.
Solution:
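One possible solution, set up from scratch so it runs on its own:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "airquality", airquality)

my_month <- 7   # pretend this came from user input
dbGetQuery(con,
           "SELECT * FROM airquality WHERE Month = ?",
           params = list(my_month))
dbDisconnect(con)
```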
Explanation: The ? placeholder is replaced by the value in params. DBI escapes the input, preventing SQL injection even if my_month came from user input.
Exercise 5: dbplyr grouped summary
Using dbplyr, compute the average Temp and total Ozone for each Month in the airquality table. Do not call collect() until the very end.
Solution:
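One possible solution:

```r
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "airquality", airquality)

monthly <- tbl(con, "airquality") %>%   # lazy reference, nothing fetched yet
  group_by(Month) %>%
  summarise(avg_temp    = mean(Temp, na.rm = TRUE),
            total_ozone = sum(Ozone, na.rm = TRUE)) %>%
  collect()                             # only now does the SQL execute
monthly
dbDisconnect(con)
```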
Explanation: The tbl() reference and dplyr verbs stay lazy. Only collect() triggers the SQL execution. The database computed the averages and sums, and R received just 5 rows.
Putting It All Together
Here is a complete end-to-end workflow. You will connect to a database, load data, run a parameterised query, build a dbplyr summary, and clean up.
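A sketch of that full cycle, using the built-in airquality dataset:

```r
library(DBI)
library(dplyr)

# 1. connect and load
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "airquality", airquality)

# 2. parameterised query: safe even with untrusted input
july <- dbGetQuery(con,
                   "SELECT * FROM airquality WHERE Month = ?",
                   params = list(7))

# 3. dbplyr summary: computed in the database, collected at the end
monthly <- tbl(con, "airquality") %>%
  group_by(Month) %>%
  summarise(avg_temp = mean(Temp, na.rm = TRUE)) %>%
  collect()

# 4. clean up
dbDisconnect(con)
```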
This workflow shows the full DBI cycle: connect, write, query with parameters, analyze with dbplyr, and disconnect. Every function you need for daily database work in R is here.
Summary
| Function | Purpose | Returns |
|---|---|---|
| dbConnect() | Open a database connection | Connection object |
| dbDisconnect() | Close a connection | TRUE (invisible) |
| dbListTables() | List all tables in the database | Character vector |
| dbListFields() | List column names of a table | Character vector |
| dbGetQuery() | Run a SELECT query | Data frame |
| dbExecute() | Run INSERT/UPDATE/DELETE | Rows affected (integer) |
| dbWriteTable() | Write a data frame to a table | TRUE (invisible) |
| dbReadTable() | Read an entire table | Data frame |
| dbExistsTable() | Check if a table exists | TRUE/FALSE |
| tbl() | Create a lazy dbplyr reference | tbl_dbi object |
| collect() | Execute dbplyr query, return tibble | Tibble |
| show_query() | Show the SQL dbplyr generates | SQL string |
Key takeaways:
- DBI provides one API for all SQL databases. Only the driver changes.
- Use dbGetQuery() for data retrieval and dbExecute() for data modification.
- Always use parameterised queries (params = list(...)) to prevent SQL injection.
- dbplyr lets you use dplyr syntax on database tables. Data stays in the database until collect().
- Always disconnect with dbDisconnect(). Use on.exit() for automatic cleanup.
FAQ
What is the difference between dbGetQuery() and dbSendQuery()?
dbGetQuery() sends a query and fetches all results in one step. dbSendQuery() only sends the query and returns a result set object. You then call dbFetch() to retrieve rows in chunks. Use dbGetQuery() for most tasks. Use dbSendQuery() + dbFetch() when results are too large to load at once.
Can I use DBI with Microsoft SQL Server?
Yes. Install the odbc package and use dbConnect(odbc::odbc(), ...) with an ODBC driver for SQL Server. The DBI functions work identically. The older RSQLServer package offered a native connection but is no longer actively maintained, so odbc is the recommended route.
Does dbplyr support all dplyr verbs?
dbplyr supports most dplyr verbs including filter(), select(), mutate(), group_by(), summarise(), arrange(), left_join(), and distinct(). It focuses on SELECT statements. Verbs that modify data (like rows_insert()) have limited database support. Check show_query() to verify the SQL looks correct.
How do I store database credentials safely?
Never hard-code passwords in your scripts. Use environment variables: store credentials in an .Renviron file (DB_PASSWORD=secret) and access them with Sys.getenv("DB_PASSWORD"). For enterprise settings, use the keyring package or RStudio's connection pane, which stores credentials securely.
What is the difference between RSQLite and DBI?
DBI is the interface specification — it defines the functions. RSQLite is a driver that implements those functions for SQLite databases. You always load both: DBI provides dbConnect(), dbGetQuery(), etc., and RSQLite provides the SQLite() driver that knows how to talk to SQLite files. The same relationship applies to RPostgres (for PostgreSQL) and RMySQL (for MySQL).
What's Next?
- R Joins — Learn how inner_join(), left_join(), and other join types work in dplyr to combine data from multiple tables.
- Importing Data in R — Read CSV, Excel, JSON, and other file formats into R data frames.
- dplyr filter & select — Master the two most-used dplyr verbs for subsetting rows and columns.