Reproducible Data Workflows With Drake

July 19, 2019

A gentle introduction to reproducible data workflows with the {drake} package.
Date

July 19, 2019

Time

12:00 AM

Location

Tampa, FL

drake is an R package that provides a powerful, flexible workflow management tool for reproducible data analysis pipelines. drake alleviates the pain of managing large (and even small) data analyses, speeding up iteration and development while providing reproducibility guarantees that are essential for modern research.

https://ropensci.github.io/drake/

In this session, we’ll learn how to use drake to manage a data analysis workflow by writing functions that define the steps of the analysis. We’ll then learn how drake can keep track of all of these steps, from start to finish, and intelligently update only the outdated steps when your data or code change.
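As a rough illustration of that idea, here is a minimal sketch of a drake workflow. The function names (`clean_rows()`, `fit_model()`), the target names, and the use of the built-in `mtcars` data are assumptions made for this example and are not taken from the talk materials.

```r
library(drake)

# Hypothetical analysis steps, written as ordinary R functions
clean_rows <- function(raw) {
  # Drop rows with a missing response value
  raw[!is.na(raw$mpg), ]
}

fit_model <- function(clean) {
  # Simple linear model of fuel economy on weight
  lm(mpg ~ wt, data = clean)
}

# The plan lists each target and the command that builds it;
# drake infers the dependencies between targets from the commands
plan <- drake_plan(
  raw_cars   = mtcars,
  clean_cars = clean_rows(raw_cars),
  model      = fit_model(clean_cars),
  coefs      = coef(model)
)

make(plan)  # first run: builds every target and caches the results
make(plan)  # second run: nothing changed, so up-to-date targets are skipped

# Read a finished target back from drake's cache
readd(coefs)
```

If you later edit `fit_model()`, the next call to `make(plan)` should rebuild only `model` and `coefs`, since `raw_cars` and `clean_cars` do not depend on that function.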

Meeting prerequisites

We’ll work through a few examples together, so please bring a laptop with the drake and visNetwork packages installed. (If you don’t have a laptop, you can share with someone who does at the session.) You would also benefit from installing the tidyverse package for the session. The required packages are listed below.

required_packages <- c(
  # "tidyverse",  # For data processing, etc. (you probably have this)
  "here",         # For sane path management
  "cowplot",      # For composing ggplot2 plots
  "visNetwork",   # For visualizing drake plans
  "drake"         # Because drake
)

install.packages(required_packages)

Note: if you’ve used drake before, please ensure that you have version 7.0.0 or later installed.
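One way to check which version you have, as a small sketch using base R's `packageVersion()`:

```r
# Installed version of drake
installed <- packageVersion("drake")
print(installed)

# Reinstall from CRAN if the installed version is older than 7.0.0
if (installed < "7.0.0") {
  install.packages("drake")
}
```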

Meeting materials

The slides from this talk are available online at https://pkg.garrickadenbuie.com/drake-intro/, and the drake source code and RStudio project are available on GitHub at https://github.com/gadenbuie/drake-intro. There is also an RStudio Cloud project containing the drake project with all of the required dependencies pre-installed, which you can use to explore and run the code from the talk.
