Open Science with R: OSF Integration, Preprints & Sharing Code
Open science means making your research data, code, and findings freely accessible. R's ecosystem — osfr, renv, targets, and R Markdown — makes it easier than ever to do research that anyone can inspect, reproduce, and build upon.
The traditional model of science — collect data privately, analyze in secret, publish only the results — is being replaced by a transparent approach where every step is open. This isn't just idealism: open science produces more reliable results, gets more citations, and is increasingly required by funders and journals.
The FAIR Principles
FAIR data is Findable, Accessible, Interoperable, and Reusable.
| Principle | Meaning | Implementation |
|-----------|---------|----------------|
| Findable | Others can discover your work | Assign DOIs, use metadata, register on OSF |
| Accessible | Anyone can access it | Open repositories, clear licenses |
| Interoperable | Works with other tools/data | Standard formats (CSV, JSON), ontologies |
| Reusable | Others can actually use it | Documentation, licenses, provenance |
# Check: Is your project FAIR?
fair_checklist <- data.frame(
  Principle = c(rep("Findable", 3), rep("Accessible", 3),
                rep("Interoperable", 3), rep("Reusable", 3)),
  Criterion = c(
    "Has a persistent identifier (DOI)?",
    "Rich metadata describing the data?",
    "Registered in a searchable repository?",
    "Data retrievable by identifier?",
    "Open access (or clear access procedure)?",
    "Metadata accessible even if data isn't?",
    "Uses standard file formats (CSV, JSON)?",
    "Uses standard vocabularies/ontologies?",
    "References other related datasets?",
    "Clear and accessible usage license?",
    "Provenance information included?",
    "Meets community standards?"
  )
)

cat("=== FAIR Data Checklist ===\n")
for (p in unique(fair_checklist$Principle)) {
  cat(sprintf("\n%s:\n", p))
  sub <- fair_checklist[fair_checklist$Principle == p, ]
  for (i in seq_len(nrow(sub))) {
    cat(sprintf("  [ ] %s\n", sub$Criterion[i]))
  }
}
OSF: The Open Science Framework
OSF (osf.io), run by the Center for Open Science, is a free hub for the whole research lifecycle. It stores your data, code, pre-registrations, and preprints — all in one place with version control and DOIs.
Using osfr in R
# osfr package workflow
cat("=== osfr Package ===\n\n")
cat("Installation:\n")
cat(' install.packages("osfr")\n')
cat(' library(osfr)\n\n')
cat("Authentication:\n")
cat(' # Create a token at osf.io/settings/tokens\n')
cat(' # Set in .Renviron: OSF_PAT=your_token_here\n\n')
cat("Create a project:\n")
cat(' project <- osf_create_project("My R Study 2026")\n\n')
cat("Upload files:\n")
cat(' osf_upload(project, "analysis.R")\n')
cat(' osf_upload(project, "data/clean_data.csv")\n')
cat(' osf_upload(project, "manuscript.pdf")\n\n')
cat("Create components (sub-projects):\n")
cat(' data_comp <- osf_create_component(project, "Data")\n')
cat(' code_comp <- osf_create_component(project, "Code")\n')
cat(' osf_upload(data_comp, "data.csv")\n')
cat(' osf_upload(code_comp, "analysis.R")\n\n')
cat("Download from OSF:\n")
cat(' osf_retrieve_node("abc12") |> osf_ls_files() |> osf_download()\n')
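The GUID passed to osf_retrieve_node() is the five-character code in a project's URL. A small helper can extract it from a link you've been sent; note that osf_guid() below is our own convenience function, not part of osfr:

```r
# Extract the 5-character node GUID from an OSF project URL.
# osf_guid() is a hypothetical helper, not an osfr function.
osf_guid <- function(url) {
  hit <- regmatches(url, regexpr("osf\\.io/([a-z0-9]{5})", url))
  sub("osf\\.io/", "", hit)
}

osf_guid("https://osf.io/abc12/")  # "abc12"
```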
Recommended OSF Project Structure
cat("=== Recommended OSF Project Structure ===\n\n")
structure <- data.frame(
  Component = c("Pre-Registration", "Data", "Code", "Materials",
                "Results", "Manuscript"),
  Contents = c(
    "Pre-analysis plan, power analysis, registered hypotheses",
    "Raw data (if shareable), processed data, codebook",
    "R scripts, renv.lock, _targets.R pipeline",
    "Survey instruments, stimuli, protocols",
    "Figures, tables, supplementary analyses",
    "Paper draft, preprint, final published version"
  ),
  Access = c("Public", "Public or Restricted", "Public", "Public",
             "Public", "Public after acceptance")
)

for (i in seq_len(nrow(structure))) {
  cat(sprintf("/%s/\n  Contents: %s\n  Access: %s\n\n",
              structure$Component[i], structure$Contents[i], structure$Access[i]))
}
Zenodo: Permanent Archiving with DOIs
Zenodo (zenodo.org) is a CERN-hosted repository that assigns DOIs to any research output — datasets, code, papers, presentations.
cat("=== Zenodo Integration ===\n\n")
cat("Why Zenodo + OSF?\n")
cat(" OSF: Collaboration, project management, pre-registration\n")
cat(" Zenodo: Permanent archival, DOIs, citation tracking\n\n")
cat("GitHub → Zenodo (automatic archiving):\n")
cat(" 1. Connect your GitHub account at zenodo.org\n")
cat(" 2. Enable Zenodo for your repository\n")
cat(" 3. Create a GitHub release (e.g., v1.0.0)\n")
cat(" 4. Zenodo automatically archives it with a DOI\n\n")
cat("Manual upload:\n")
cat(" 1. Go to zenodo.org/deposit/new\n")
cat(" 2. Upload your files (data, code, paper)\n")
cat(" 3. Fill in metadata (title, authors, license, keywords)\n")
cat(" 4. Publish → get a DOI\n\n")
cat("Citing your DOI:\n")
cat(' "Data and code available at https://doi.org/10.5281/zenodo.XXXXXXX"\n')
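Once Zenodo mints the DOI, you can generate a reference entry for it directly in R. The zenodo_bibtex() function below is our own sketch (not a package function), and the key, author, title, and DOI are placeholders:

```r
# Build a minimal BibTeX entry for a Zenodo deposit.
# zenodo_bibtex() is a hypothetical helper; all field values are placeholders.
zenodo_bibtex <- function(key, author, title, year, doi) {
  paste0(
    "@misc{", key, ",\n",
    "  author = {", author, "},\n",
    "  title  = {", title, "},\n",
    "  year   = {", year, "},\n",
    "  doi    = {", doi, "}\n",
    "}"
  )
}

cat(zenodo_bibtex("smith2026code", "Smith, Jane",
                  "Analysis code for My R Study",
                  2026, "10.5281/zenodo.1234567"))
```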
Sharing Code Responsibly
The Research Compendium
A research compendium is a standardized R project structure that contains everything needed to reproduce your work.
cat("=== Research Compendium Structure ===\n\n")
cat("my-study/\n")
cat("├── DESCRIPTION         # Project metadata (like an R package)\n")
cat("├── README.md           # How to reproduce the analysis\n")
cat("├── LICENSE             # CC-BY-4.0 or MIT\n")
cat("├── renv.lock           # Locked package versions\n")
cat("├── _targets.R          # Analysis pipeline\n")
cat("├── R/                  # Analysis functions\n")
cat("│   ├── clean.R\n")
cat("│   ├── model.R\n")
cat("│   └── visualize.R\n")
cat("├── data/\n")
cat("│   ├── raw/            # Unmodified data (read-only)\n")
cat("│   └── processed/      # Cleaned data\n")
cat("├── analysis/\n")
cat("│   └── main.Rmd        # Main analysis notebook\n")
cat("├── output/\n")
cat("│   ├── figures/\n")
cat("│   └── tables/\n")
cat("└── paper/\n")
cat("    └── manuscript.Rmd  # Paper in R Markdown\n")
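The tree above can be scaffolded with a few lines of base R. This is a sketch: it writes into a temporary directory for safety, and you would point `root` at your real project directory in practice.

```r
# Scaffold the research compendium directories and placeholder files.
# Uses a temp dir for safety; set `root` to "." in a real project.
root <- file.path(tempdir(), "my-study")
dirs <- c("R", "data/raw", "data/processed", "analysis",
          "output/figures", "output/tables", "paper")
for (d in file.path(root, dirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(root, c("DESCRIPTION", "README.md", "LICENSE")))

list.files(root, recursive = TRUE, include.dirs = TRUE)
```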
Choosing a License
| License | Allows | Best For |
|---------|--------|----------|
| CC-BY 4.0 | Any use with attribution | Data, text, figures |
| CC0 (Public Domain) | Any use, no restrictions | Maximizing reuse |
| MIT | Any use with license notice | Code |
| GPL-3.0 | Use if derivatives are also open | Code (copyleft) |
| CC-BY-NC 4.0 | Non-commercial use with attribution | Restricted sharing |
cat("=== License Selection Guide ===\n\n")
cat("For maximum impact:\n")
cat(" Code: MIT license (permissive, widely understood)\n")
cat(" Data: CC0 or CC-BY-4.0 (maximizes reuse)\n")
cat(" Paper: CC-BY-4.0 (allows sharing with attribution)\n\n")
cat("How to add a license:\n")
cat(" 1. Create a LICENSE file in your project root\n")
cat(" 2. State the license in your README\n")
cat(" 3. Add license metadata to Zenodo/OSF deposits\n")
cat(" 4. Include in DESCRIPTION file: License: MIT + file LICENSE\n")
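Step 1 can be scripted. The write_license() function below is our own sketch with a placeholder copyright holder; the usethis package's use_mit_license() does the same job (including the DESCRIPTION update) if you prefer a packaged solution.

```r
# Write a minimal MIT LICENSE stub into a project.
# write_license() is a hypothetical helper; "Jane Smith" is a placeholder.
# In a real project, paste the full MIT text from opensource.org.
write_license <- function(holder, year = format(Sys.Date(), "%Y"),
                          path = "LICENSE") {
  writeLines(c(
    "MIT License",
    "",
    sprintf("Copyright (c) %s %s", year, holder),
    "",
    "Full license text: https://opensource.org/license/mit"
  ), path)
  invisible(path)
}

write_license("Jane Smith", path = file.path(tempdir(), "LICENSE"))
```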
Preprints
Preprints are papers shared before peer review. They accelerate science by making findings available immediately.
cat("=== Preprint Servers ===\n\n")
servers <- data.frame(
  Server = c("OSF Preprints", "arXiv", "bioRxiv", "medRxiv",
             "PsyArXiv", "SocArXiv", "EarthArXiv"),
  Fields = c("All disciplines", "Math, CS, Physics, Stats",
             "Biology", "Medicine & health",
             "Psychology", "Social sciences", "Earth sciences"),
  URL = c("osf.io/preprints", "arxiv.org", "biorxiv.org",
          "medrxiv.org", "psyarxiv.com", "socarxiv.org", "eartharxiv.org")
)

for (i in seq_len(nrow(servers))) {
  cat(sprintf("  %-15s %-30s %s\n",
              servers$Server[i], servers$Fields[i], servers$URL[i]))
}
cat("\nPreprint benefits:\n")
cat(" - Immediate visibility (no 6-12 month review wait)\n")
cat(" - Gets a DOI (citable immediately)\n")
cat(" - Feedback before formal review\n")
cat(" - Establishes priority\n")
cat(" - Still compatible with journal submission\n")
Complete Open Science Workflow
cat("=== End-to-End Open Science Workflow ===\n\n")
steps <- data.frame(
  Phase = c("Design", "Design", "Design",
            "Collect", "Collect",
            "Analyze", "Analyze", "Analyze",
            "Share", "Share", "Share", "Share"),
  Step = c(
    "1. Write pre-analysis plan",
    "2. Register on OSF/AsPredicted",
    "3. Run power analysis",
    "4. Collect data per protocol",
    "5. Store raw data (read-only)",
    "6. Clean with documented code",
    "7. Run pre-registered analysis",
    "8. Run exploratory analyses (labeled)",
    "9. Post preprint",
    "10. Upload data + code to OSF/Zenodo",
    "11. Get DOIs for everything",
    "12. Submit to journal with all links"
  ),
  Tool = c(
    "R Markdown, DeclareDesign",
    "osf.io, aspredicted.org",
    "power.t.test(), DeclareDesign",
    "Qualtrics, REDCap, etc.",
    "data/raw/ directory, Git",
    "R scripts, targets pipeline",
    "Pre-specified R code",
    "Additional R scripts",
    "OSF Preprints, bioRxiv",
    "osfr, zenodo.org",
    "Zenodo, OSF",
    "Journal submission system"
  )
)

for (phase in unique(steps$Phase)) {
  cat(sprintf("--- %s Phase ---\n", phase))
  sub <- steps[steps$Phase == phase, ]
  for (i in seq_len(nrow(sub))) {
    cat(sprintf("  %-45s [%s]\n", sub$Step[i], sub$Tool[i]))
  }
  cat("\n")
}
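Step 3's power analysis needs no extra packages: base R's power.t.test() covers the standard two-sample case. For example, detecting a medium effect (0.5 SD) with 80% power at a two-sided alpha of .05:

```r
# Sample size for a two-sample t-test: medium effect (delta = 0.5 SD),
# 80% power, two-sided alpha of .05 (stats::power.t.test, base R)
pwr <- power.t.test(delta = 0.5, sd = 1, power = 0.80, sig.level = 0.05)
ceiling(pwr$n)  # participants needed per group: 64
```

Pre-registering this number makes your planned sample size part of the public record.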
Reproducibility Toolkit Summary
| Tool | Purpose | Install |
|------|---------|---------|
| renv | Lock package versions | install.packages("renv") |
| targets | Pipeline automation | install.packages("targets") |
| osfr | OSF integration | install.packages("osfr") |
| rmarkdown | Literate programming | install.packages("rmarkdown") |
| here | Relative file paths | install.packages("here") |
| Git | Version control | git-scm.com |
| Docker | Environment capture | docker.com |
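Whichever subset of these tools you adopt, archiving a sessionInfo() snapshot next to your results costs one line and records the exact R and package versions that produced them. A minimal sketch (writing to a temporary directory here; point out_dir at output/ in a real project):

```r
# Record the computational environment next to the results it produced.
# Writes to a temp dir for safety; use output/ in an actual project.
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE)
writeLines(capture.output(sessionInfo()),
           file.path(out_dir, "session_info.txt"))
```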
FAQ
Do I have to share all my data? No. Some data can't be shared due to privacy, consent, or legal restrictions. In those cases, share synthetic data, aggregated data, or detailed metadata. Always share as much as ethically possible, and explain any restrictions clearly.
Will sharing my code before publication let others scoop me? The fear is common, but scooping rarely happens in practice. A timestamped pre-registration actually protects your priority, and open-science community norms strongly discourage it. The benefits of openness (more citations, collaborations, trust) far outweigh the risk.
How much extra work is open science? Less than you think. If you're already writing R scripts, using Git, and documenting your analysis, you're 80% there. The main additional steps are: uploading to OSF/Zenodo (10 minutes), choosing a license (5 minutes), and writing a good README (30 minutes). The upfront investment pays off in easier collaboration and fewer "but I can't reproduce it" headaches.
What's Next
Data Ethics in R — The ethical foundations underlying open science