Let’s move on from iris

About iris and how to move on.
R
Scripts
Author

Garrick Aden-Buie

Published

June 9, 2020

Keywords

rstats

It’s time for iris to go. Use de_iris_my_repos() to help find references to iris in your public GitHub code so you can replace it with something better.

It only takes two lines to get started. First, check the source code on https://git.io/de-iris-my-repos. Then run these two lines in your R console:

source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()

Last week, motivated by the Black Lives Matter movement and protests around the United States, Daniela Witten wrote a long and insightful Twitter thread about the origins of an often-used and completely boring dataset: iris.

I’ve long known about Ronald Fisher’s eugenicist past, but I admit that I have often thoughtlessly turned to iris when needing a small, boring data set to demonstrate a coding or data principle.

But Daniela and Timothée Poisot are right: it’s time to retire iris.

Other Options

I read Daniela’s thread and Timothée’s blog post and immediately realized that I needed to be more thoughtful in my choice of datasets. There is absolutely no need for iris in my examples; there are plenty of other options available.

I’m particularly excited about a new penguins dataset announced on Twitter by the amazing Allison Horst.

Here’s a short list of other data sets you can turn to instead:

  • Anything else in data().

  • ggplot2::mpg

  • ggplot2::diamonds

  • dplyr::starwars

  • nycflights13

  • fivethirtyeight

  • Any of the many #TidyTuesday datasets
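As a quick illustration of the first option, here is a minimal sketch that uses only base R. It lists the datasets that ship with the datasets package and then runs the kind of small grouped summary that iris often gets used for, with mtcars standing in. The choice of mtcars and the variable names are just for illustration.

```r
# Names of the datasets that ship with base R's datasets package
dataset_names <- data(package = "datasets")$results[, "Item"]
head(dataset_names)

# A small grouped summary that might otherwise have used iris:
# mean miles per gallon for each cylinder count in mtcars
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl
```

Any of the other datasets in the list above would work just as well for a demo like this.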

De-Iris Your Repos

To help us move on into an iris-free world, I’ve created a small command-line utility called de_iris_my_repos().

The code is available on GitHub at gadenbuie/de-iris-my-repos, and it only takes two lines in your console to find any references to iris in your repositories and open an issue in each repo reminding you to kick iris out.

de_iris_my_repos() won’t do anything without your explicit consent, but you should still probably check the R script before you source it.

source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()

When you run de_iris_my_repos(), it searches your public code for mentions of iris and asks whether you want to open an issue in each repo. If you do, it opens an issue using the template in the screenshot below so that you can remember to remove iris.

An example issue opened by de_iris_my_repos().


Options

A few options are available in de_iris_my_repos():

  • Choose which GitHub user name to review; by default, this is the user associated with the GitHub PAT used by gh

  • Set dry_run = TRUE to return results without doing anything

  • Set ask = FALSE to go ahead and open issues in all repositories

  • Use extensions to provide a list of file types where iris might be found.
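To get a feel for what the search step does, here is a minimal local sketch in base R. It is not the code behind de_iris_my_repos(), which searches your public code on GitHub; it only scans files on disk. The helper find_iris_mentions() and its path argument are hypothetical names, and the c("R", "Rmd") default for extensions is an assumption, not the script's documented default.

```r
# Hypothetical helper, not part of de-iris-my-repos:
# scan local files for standalone mentions of `iris`.
find_iris_mentions <- function(path = ".", extensions = c("R", "Rmd")) {
  # Build a file name pattern such as "\\.(R|Rmd)$" from the extensions
  ext_pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  files <- list.files(path, pattern = ext_pattern, recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)

  hits <- lapply(files, function(file) {
    lines <- readLines(file, warn = FALSE)
    # Word boundaries match `iris` and `iris$Species`,
    # but skip `irises` and `iris_issues`
    matched <- grep("\\biris\\b", lines, perl = TRUE)
    if (length(matched) == 0) return(NULL)
    data.frame(file = file, line = matched, text = trimws(lines[matched]),
               stringsAsFactors = FALSE)
  })

  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    # Return an empty result with the same columns when nothing matches
    return(data.frame(file = character(), line = integer(),
                      text = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}

# Try it on a throwaway directory with two small files
demo_dir <- tempfile("de-iris-demo")
dir.create(demo_dir)
writeLines(c("library(ggplot2)", "head(iris)", "plot(mtcars)"),
           file.path(demo_dir, "analysis.R"))
writeLines("irises are flowers", file.path(demo_dir, "notes.Rmd"))

mentions <- find_iris_mentions(demo_dir)
mentions[, c("line", "text")]
```

Like dry_run = TRUE, this only reports what it finds and changes nothing. Pointing find_iris_mentions() at a local clone of a repository is one way to see where iris shows up before deciding whether to open an issue.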