Let's move on from iris
By Garrick Aden-Buie in Blog
June 9, 2020
It’s time for iris to go. Use de_iris_my_repos() to help find references to iris in your public GitHub code so you can replace it with something better.
It only takes two lines to get started. First, check the source code on https://git.io/de-iris-my-repos. Then run these two lines in your R console:
source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()
Last week, motivated by the Black Lives Matter movement and protests around the United States, Daniela Witten wrote a long and insightful Twitter thread about the origins of an often-used and completely boring dataset: iris.
Like many people, I have spent the last 10 days watching so much tragedy unfold. So much anguish from Black colleagues here on twitter.
And so I've been trying to think of ways that *I* can improve my tiny corner of the world.
A thread on why change is hard in academia 1/
— Daniela Witten (@daniela_witten) June 4, 2020
I’ve long known about Ronald Fisher’s eugenicist past, but I admit that I have often thoughtlessly turned to iris when needing a small, boring dataset to demonstrate a coding or data principle. But Daniela and Timothée Poisot are right: it’s time to retire iris.
Other Options
I read Daniela’s thread and Timothée’s blog post and immediately realized that I needed to be more thoughtful in my choice of datasets. There is absolutely no need for iris in my examples; there are plenty of other options available.
I’m particularly excited about a new penguins dataset announced on Twitter by the amazing Allison Horst.
The Iris dataset feels really gross now.
— Chris Albon (@chrisalbon) June 4, 2020
Here’s a short list of other datasets you can turn to instead:
- Anything else in data()
- ggplot2::mpg
- ggplot2::diamonds
- dplyr::starwars
- nycflights13
- fivethirtyeight
- Any of the many #TidyTuesday datasets
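Swapping a dataset is usually a small change. As a sketch, here is a grouped summary that might once have used iris, rewritten with mtcars from R's built-in datasets package. The ggplot2 part only runs if that package is installed, and any other dataset from the list above could be substituted the same way.

```r
# Before (iris): aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# After: the same grouped-mean pattern with mtcars, which ships with R
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# ggplot2::mpg works the same way, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  hwy_by_class <- aggregate(hwy ~ class, data = ggplot2::mpg, FUN = mean)
  print(hwy_by_class)
}
```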
De-Iris Your Repos
To help us move on into an iris-free world, I’ve created a small utility function, de_iris_my_repos().
The code is available on GitHub at gadenbuie/de-iris-my-repos, and it only takes two lines in your R console to find any references to iris in your repositories and open an issue in each repo reminding you to kick iris out.
de_iris_my_repos() won’t do anything without your explicit consent, but you should still probably check the R script before you source it.
source("https://git.io/de-iris-my-repos")
iris_issues <- de_iris_my_repos()
When you run de_iris_my_repos(), it searches your public code for mentions of iris and asks whether you want to open an issue in each repo. If you do, it opens an issue using the template in the screenshot below so that you remember to remove iris.
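The internals aren't described here, so the following is only a rough sketch of how a search step like this could work, not the code in gadenbuie/de-iris-my-repos. It assumes the gh package and GitHub's code search endpoint (GET /search/code) with its user: and extension: qualifiers, and it defaults to the authenticated user via GET /user. The helper names iris_search_queries(), search_iris() and group_hits_by_repo() are hypothetical.

```r
# Hypothetical sketch, not the code in gadenbuie/de-iris-my-repos.
# Assumes the gh package and GitHub's code search API (GET /search/code).

# One search query per file extension, limited to a single user's repos
iris_search_queries <- function(user, extensions = c("R", "Rmd")) {
  paste0("iris user:", user, " extension:", extensions)
}

# Run each query against the GitHub API (needs network access and a PAT).
# The default user is whoever owns the PAT that gh is using.
search_iris <- function(user = gh::gh("GET /user")$login,
                        extensions = c("R", "Rmd")) {
  queries <- iris_search_queries(user, extensions)
  results <- lapply(queries, function(q) gh::gh("GET /search/code", q = q))
  # Each response stores its matches in the `items` field; flatten them
  unlist(lapply(results, function(res) res$items), recursive = FALSE)
}

# Collect the unique matching file paths for each repository ("owner/repo")
group_hits_by_repo <- function(items) {
  repos <- vapply(items, function(x) x$repository$full_name, character(1))
  paths <- vapply(items, function(x) x$path, character(1))
  lapply(split(paths, repos), unique)
}
```

Only search_iris() touches the network; group_hits_by_repo() turns the raw matches into one entry per repository, which is the natural unit for opening one issue per repo.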
Options
A few options are available in de_iris_my_repos():
- Set user to choose which GitHub user name to review (by default, the user associated with the GitHub PAT used by gh).
- Set dry_run = TRUE to return results without doing anything.
- Set ask = FALSE to go ahead and open issues in all repositories.
- Use extensions to provide a list of file types where iris might be found.
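To make dry_run and ask concrete, here is a hypothetical sketch of how options like these could gate issue creation. It is not the script's implementation: process_hits() and open_iris_issue() are made-up names, the issue title and body are placeholders rather than the real template, and it assumes gh accepts the POST /repos/{owner}/{repo}/issues endpoint with {owner}/{repo} placeholders filled from named arguments.

```r
# Hypothetical sketch, not the code in gadenbuie/de-iris-my-repos.

# Open one issue listing the files that mention iris (placeholder text)
open_iris_issue <- function(repo, paths) {
  parts <- strsplit(repo, "/", fixed = TRUE)[[1]]
  gh::gh(
    "POST /repos/{owner}/{repo}/issues",
    owner = parts[1], repo = parts[2],
    title = "Replace iris in examples",
    body = paste0("Files that mention iris:\n",
                  paste0("- ", paths, collapse = "\n"))
  )
}

# `hits` is a named list: "owner/repo" -> character vector of file paths
process_hits <- function(hits, dry_run = FALSE, ask = TRUE,
                         open_issue = open_iris_issue,
                         confirm = function(repo) {
                           isTRUE(utils::askYesNo(
                             paste0("Open an issue in ", repo, "?")
                           ))
                         }) {
  # dry_run: return what was found and change nothing
  if (dry_run) return(hits)

  opened <- character(0)
  for (repo in names(hits)) {
    # ask: skip any repo the user declines (a cancel/NA answer also skips)
    if (ask && !confirm(repo)) next
    open_issue(repo, hits[[repo]])
    opened <- c(opened, repo)
  }
  # Names of the repos where an issue was opened
  opened
}
```

Passing open_issue and confirm in as arguments is just a convenience in this sketch: it lets you exercise the dry_run and ask branches with stand-in functions before any real issue is created.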