Read and Visualize your Twitter Archive

By Garrick Aden-Buie in Blog

December 7, 2022


Twitter finds itself in an… interesting… transition period. Whether or not you’re considering jumping ship to another service — you can find me lurking on Mastodon — you should download an archive of your Twitter data. Not only does the archive include all of your tweets, it also contains a variety of other interesting data about your account: who you followed and who followed you; the tweets you liked; the ads you were served; and much more.

This post, very much inspired by the awesome Observable notebook, Planning to leave Twitter?, shows you how to use R to read and explore your archive, using my own archive as an example.

Read on to learn how to read your Twitter archive into R and how to tidy your tweets. The second half of the post showcases a collection of plots covering monthly tweet volume, popular tweets, the time of day when tweets were sent, and the app used to send each tweet.

I’ve also included a section on using rtweet to collect a full dataset about the tweets you’ve liked and another section about the advertising data in your Twitter archive.

Reading your Twitter archive #

Get your Twitter data archive #

First things first, you need to have your Twitter data archive. If you don’t have it yet, go to Settings and Privacy and click Download an archive of your data. After you submit the request, it takes about a day or so for an email to show up in your inbox.

@grrrck your Twitter data is ready

Your Twitter archive is ready for you to download and view using your desktop browser. Make sure you download it before Nov 12, 2022, 9:46:31 PM

The archive downloads as a zip file containing a standalone web page — called Your archive.html — for exploring your data. But the real archive lives in the included data/ folder as a bunch of .js files. I’ve copied that data/ directory into my working directory for this post.

Setup #

On the R side, we’ll need the usual suspects: tidyverse and glue.

library(tidyverse)
#> ── Attaching core tidyverse packages ───────────────────────
#>  dplyr     1.0.10             readr     2.1.3        
#>  forcats   0.5.2              stringr   1.4.1        
#>  ggplot2   3.4.0              tibble    3.1.8        
#>  lubridate 1.9.0              tidyr     1.2.1        
#>  purrr     0.9000.0.9000     
#> ── Conflicts ────────────────────── tidyverse_conflicts() ──
#>  dplyr::filter() masks stats::filter()
#>  dplyr::lag()    masks stats::lag()
#>  Use the conflicted package to force all conflicts to become errors
library(glue)

(I’m using the dev version of tidyverse (1.3.2.9000), which loads lubridate automatically, and the dev version of purrr that is slated to become version 1.0.0.)

To read in the data files, I’ll use jsonlite to read the archive JSON data, with a small assist from brio for fast file reading. I’m also going to have some fun with ggiraph for turning static ggplot2 plots into interactive plots.

Finally, exploring the Twitter archive doesn’t require API access to Twitter, but the API can be used to augment the data in the archive. The rtweet package is excellent for this, even though it takes a little effort to get it set up.

Read the manifest #

The data/ folder is surprisingly well structured! There are two key files to help you find your way around the archive. First, the README.txt file explains the structure and layout of the files, and includes descriptions of the data contained in all of the files.

Here’s how the README describes the account.js data file:

account.js
- email: Email address currently associated with the account if an email address has been provided.
- createdVia: Client application used when the account was created. For example: “web” if the  account was created from a browser.
- username: The account’s current @username. Note that the @username may change but the account ID will remain the same for the lifetime of the account.
- accountId: Unique identifier for the account.
- createdAt: Date and time when the account was created.
- accountDisplayName: The account’s name as displayed on the profile.

The data/ folder also contains a manifest.js file that can be used to help read the data included in the archive. Let’s start by assuming this file is JSON and reading it in.

jsonlite::fromJSON("data/manifest.js")
#> Error in parse_con(txt, bigint_as_char): lexical error: invalid char in json text.
#>                                        window.__THAR_CONFIG = {   "use
#>                      (right here) ------^

Here we hit our first snag. The archive files are packaged as JSON, but they’re not strictly compliant JSON files; they include some JavaScript to assign JSON objects to the global namespace (called window in the browser). Here’s the data/manifest.js file as an example.

window.__THAR_CONFIG = {
  // ... data ...
}

If we just remove everything up to the first { (or sometimes [) on the first line, we can turn the data into valid JSON.

lines <- brio::read_lines("data/manifest.js")
lines[1] <- sub("^[^{[]+([{[])", "\\1", lines[1])
manifest <- jsonlite::fromJSON(lines)

This worked, but… jsonlite was designed for statistical work, so it transforms the data structure when reading in the JSON. For example, by default it converts arrays that look like JSON-ified data frames into actual data.frames.

manifest$dataTypes[1:2] |> str()
#> List of 2
#>  $ account          :List of 1
#>   ..$ files:'data.frame':    1 obs. of  3 variables:
#>   .. ..$ fileName  : chr "data/account.js"
#>   .. ..$ globalName: chr "YTD.account.part0"
#>   .. ..$ count     : chr "1"
#>  $ accountCreationIp:List of 1
#>   ..$ files:'data.frame':    1 obs. of  3 variables:
#>   .. ..$ fileName  : chr "data/account-creation-ip.js"
#>   .. ..$ globalName: chr "YTD.account_creation_ip.part0"
#>   .. ..$ count     : chr "1"

That’s often quite helpful! But when trying to generalize data reading, I find it’s safer to disable the simplification and know for certain that the data structure matches the original JSON. For that reason, I tend to disable the matrix and data.frame simplifications and only allow jsonlite to simplify vectors.

Here’s a quick helper function that includes those setting changes and the first line substitution needed to read the archive JSON files.

read_archive_json <- function(path) {
  lines <- brio::read_lines(path)
  lines[1] <- sub("^[^{[]+([{[])", "\\1", lines[1])
  
  jsonlite::fromJSON(
    txt = lines, 
    simplifyVector = TRUE, 
    simplifyDataFrame = FALSE,
    simplifyMatrix = FALSE
  )
}

Now we’re ready to read the manifest again.

manifest <- read_archive_json("data/manifest.js")
names(manifest)
#> [1] "userInfo"    "archiveInfo" "readmeInfo"  "dataTypes"

The manifest file contains some information about the user and the archive,

str(manifest$userInfo)
#> List of 3
#>  $ accountId  : chr "47332433"
#>  $ userName   : chr "grrrck"
#>  $ displayName: chr "garrick aden-buie"

plus details about all of the various data included in the archive, like the data about my account.

str(manifest$dataTypes$account)
#> List of 1
#>  $ files:List of 1
#>   ..$ :List of 3
#>   .. ..$ fileName  : chr "data/account.js"
#>   .. ..$ globalName: chr "YTD.account.part0"
#>   .. ..$ count     : chr "1"

Each dataType in the manifest points us to a file (or files) in the archive and helpfully tells us how many records are included.
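
For example, assuming the like item follows the same files structure we saw for account above, its record count can be pulled straight from the manifest (note that the archive stores counts as character strings):

```r
# Record count for the "like" archive item, read from the manifest
manifest$dataTypes$like$files[[1]]$count
#> [1] "11773"
```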

Here are the data files with the most records.

Code: Manifest, Top Records
manifest$dataTypes |>
  # All data types we can read have a "files" item
  keep(~ "files" %in% names(.x)) |>
  # We keep the files objects but still as a list of lists within a list
  map("files") |>
  # Turn the files into tibbles (list of tibbles within a list)
  map_depth(2, as_tibble) |>
  # Then combine the files tables for each item keeping track of the file index
  map(list_rbind, names_to = "index") |>
  # And finally combine files for all items
  list_rbind(names_to = "item") |>
  mutate(across(count, as.integer)) |>
  select(-globalName, -index) |>
  slice_max(count, n = 15) |>
  knitr::kable(
    format.args = list(big.mark = ","),
    table.attr = 'class="table"',
    format = "html"
  )
item                  fileName                          count
like                  data/like.js                     11,773
follower              data/follower.js                  9,030
tweetHeaders          data/tweet-headers.js             6,225
tweets                data/tweets.js                    6,225
ipAudit               data/ip-audit.js                  3,787
following             data/following.js                 1,519
contact               data/contact.js                     645
listsMember           data/lists-member.js                254
block                 data/block.js                       242
adImpressions         data/ad-impressions.js              173
adEngagements         data/ad-engagements.js              171
directMessageHeaders  data/direct-message-headers.js       97
directMessages        data/direct-messages.js              97
userLinkClicks        data/user-link-clicks.js             67
connectedApplication  data/connected-application.js        63

Reading the account data file #

For a first example, let’s read the data/account.js archive file. We start by inspecting the manifest, where manifest$dataTypes$account tells us which files hold the account data and how many records are in each.

manifest$dataTypes$account |> str()
#> List of 1
#>  $ files:List of 1
#>   ..$ :List of 3
#>   .. ..$ fileName  : chr "data/account.js"
#>   .. ..$ globalName: chr "YTD.account.part0"
#>   .. ..$ count     : chr "1"

Here there’s only one file containing a single account record: data/account.js. Inside that file is a small bit of JavaScript. Like the manifest, it’s almost JSON, except that it assigns the JavaScript object to window.YTD.account.part0.

window.YTD.account.part0 = [
  {
    "account" : {
      "email" : "my-email@example.com",
      "createdVia" : "web",
      "username" : "grrrck",
      "accountId" : "47332433",
      "createdAt" : "2009-06-15T13:21:50.000Z",
      "accountDisplayName" : "garrick aden-buie"
    }
  }
]

And again, if we clean up the first line, this is valid JSON that we can read in directly with jsonlite.

account <- read_archive_json("data/account.js")
str(account)
#> List of 1
#>  $ :List of 1
#>   ..$ account:List of 6
#>   .. ..$ email             : chr "my-email@example.com"
#>   .. ..$ createdVia        : chr "web"
#>   .. ..$ username          : chr "grrrck"
#>   .. ..$ accountId         : chr "47332433"
#>   .. ..$ createdAt         : chr "2009-06-15T13:21:50.000Z"
#>   .. ..$ accountDisplayName: chr "garrick aden-buie"

This leads us to our first fun fact: I created my Twitter account on June 15, 2009, which means that I’ve been using Twitter (on and off) for 13.5 years. That’s 4,923 days of twittering!
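
That day count is easy to double-check with lubridate, using this post’s publication date as the endpoint:

```r
created <- lubridate::ymd_hms("2009-06-15T13:21:50.000Z")
# Days between account creation and the date of this post
as.integer(lubridate::as_date("2022-12-07") - lubridate::as_date(created))
#> [1] 4923
```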

Read any archive item #

Let’s generalize what we learned into a few helper functions we can reuse. I’ve placed everything into a single code block so that you can copy and paste it into your R session or script to use it right away.

#' Read the Twitter Archive JSON 
#' 
#' @param path Path to a Twitter archive `.js` file
read_archive_json <- function(path) {
  lines <- brio::read_lines(path)
  lines[1] <- sub("^[^{[]+([{[])", "\\1", lines[1])
  
  jsonlite::fromJSON(
    txt = lines, 
    simplifyVector = TRUE, 
    simplifyDataFrame = FALSE,
    simplifyMatrix = FALSE
  )
}

#' Read a Twitter archive data item
#' 
#' @param manifest The list from `manifest.js`
#' @param item The name of an item in the manifest
read_twitter_data <- function(manifest, item) {
  manifest$dataTypes[[item]]$files |> 
    purrr::transpose() |>
    purrr::pmap(\(fileName, ...) read_archive_json(fileName))
}

#' Simplify the data, if possible and easy
#' 
#' @param x A list of lists as returned from `read_twitter_data()`
#' @param simplifier A function that's applied to each item in the
#'   list of lists and that can be used to simplify the output data.
simplify_twitter_data <- function(x, simplifier = identity) {
   x <- purrr::flatten(x)
   item_names <- x |> purrr::map(names) |> purrr::reduce(union)
   if (length(item_names) > 1) return(x)
   
   x |>
    purrr::map(item_names) |>
    purrr::map_dfr(simplifier)
}

Quick recap: to use the functions above, load your archive manifest with read_archive_json() and then pass it to read_twitter_data() along with an item name from the archive. If the data in the archive item is reasonably structured, you can call simplify_twitter_data() to get a tidy tibble.

manifest <- read_archive_json("data/manifest.js")
account <- read_twitter_data(manifest, "account")

simplify_twitter_data(account)
#> # A tibble: 1 × 6
#>   email              creat…¹ usern…² accou…³ creat…⁴ accou…⁵
#>   <chr>              <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1 my-email@example.… web     grrrck  473324… 2009-0… garric…
#> # … with abbreviated variable names ¹​createdVia, ²​username,
#> #   ³​accountId, ⁴​createdAt, ⁵​accountDisplayName

Example: my followers #

Let’s use this on another archive item to find the earliest Twitter adopters among my followers.

# These tables are wide, you may need to scroll to see the preview
options(width = 120)

followers <- 
  read_twitter_data(manifest, "follower") |>
  simplify_twitter_data()

Then we can arrange the rows of followers by accountId as a proxy for date of account creation.

early_followers <- 
  followers |>
  arrange(as.numeric(accountId)) |>
  slice_head(n = 11)

# Top 11 earliest followers
early_followers
#> # A tibble: 11 × 2
#>    accountId userLink                                      
#>    <chr>     <chr>                                         
#>  1 1496      https://twitter.com/intent/user?user_id=1496  
#>  2 11309     https://twitter.com/intent/user?user_id=11309 
#>  3 37193     https://twitter.com/intent/user?user_id=37193 
#>  4 716213    https://twitter.com/intent/user?user_id=716213
#>  5 741803    https://twitter.com/intent/user?user_id=741803
#>  6 755726    https://twitter.com/intent/user?user_id=755726
#>  7 774234    https://twitter.com/intent/user?user_id=774234
#>  8 787219    https://twitter.com/intent/user?user_id=787219
#>  9 799574    https://twitter.com/intent/user?user_id=799574
#> 10 860921    https://twitter.com/intent/user?user_id=860921
#> 11 944231    https://twitter.com/intent/user?user_id=944231

As you can see, some parts of the Twitter archive include only the barest minimum of data. Thankfully, we can still use rtweet to gather additional data about these users. I’m looking at a small subset of my 9,030 followers here, but you might want to do this for all your followers and save the collected user data alongside your archive.
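
A sketch of what that could look like (untested at full scale; rtweet::lookup_users() batches the requests internally, but Twitter’s rate limits mean a list this size may take a while, and the output path is just an illustration):

```r
# Hypothetical: look up every follower via the API and cache the result
all_follower_accounts <- rtweet::lookup_users(followers$accountId)
saveRDS(all_follower_accounts, "data/follower-accounts.rds")
```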

early_followers_accounts <- 
  early_followers |>
  pull(accountId) |>
  rtweet::lookup_users()

early_followers_accounts |>
  select(id, name, screen_name, created_at, followers_count, description)
#> # A tibble: 11 × 6
#>        id name                                    screen_name    created_at          followers_count description        
#>     <int> <chr>                                   <chr>          <dttm>                        <int> <chr>              
#>  1   1496 Aelfrick                                Aelfrick       2006-07-16 14:44:05              25 ""                 
#>  2  11309 Aaron Khoo                              aklw           2006-11-02 07:14:47             244 "I am a weapon of …
#>  3  37193 Rob                                     coleman        2006-12-02 11:54:15             645 "data science / la…
#>  4 716213 Tim Dennis                              jt14den        2007-01-27 16:54:12             976 "Data librarian/di…
#>  5 741803 @AlgoCompSynth@ravenation.club by znmeb znmeb          2007-02-01 00:03:16            9923 "https://t.co/rZhZ…
#>  6 755726 Travis Dawry                            tdawry         2007-02-06 23:45:01             273 "data, politics, o…
#>  7 774234 Shea's Coach Beard                      mandoescamilla 2007-02-15 13:51:10            1185 "my anger is a gif…
#>  8 787219 Jonathan                                jmcphers       2007-02-21 15:56:20             595 "Software engineer…
#>  9 799574 @dietrich@mastodon.social               dietrich       2007-02-27 18:41:20            6086 "A lifestyle brand…
#> 10 860921 ⌜will⌟                                  wtd            2007-03-09 23:20:43             720 "👋 I'm an optimis…
#> 11 944231 Christopher Peters 🇺🇦                   statwonk       2007-03-11 14:49:39            4472 "Lead Econometrici…

My tweets #

Now we get to the main course: the tweets themselves. We can read them in the same way that we imported accounts and followers with read_twitter_data(), but for now we won’t simplify them.

To see why, let’s take a look at a single tweet. The file of tweets (outer list, [[1]]) contains an array (inner list, e.g. [[105]]) of tweets (named item, $tweet). Here’s that example tweet:

# Tweets are a list of a list of tweets...
tweet <- read_twitter_data(manifest, "tweets")[[1]][[105]]$tweet
str(tweet, max.level = 2)
#> List of 16
#>  $ edit_info         :List of 1
#>   ..$ initial:List of 4
#>  $ retweeted         : logi FALSE
#>  $ source            : chr "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>"
#>  $ entities          :List of 5
#>   ..$ user_mentions:List of 1
#>   ..$ urls         :List of 1
#>   ..$ symbols      : list()
#>   ..$ media        :List of 1
#>   ..$ hashtags     :List of 1
#>  $ display_text_range: chr [1:2] "0" "236"
#>  $ favorite_count    : chr "118"
#>  $ id_str            : chr "1276198597596459018"
#>  $ truncated         : logi FALSE
#>  $ retweet_count     : chr "33"
#>  $ id                : chr "1276198597596459018"
#>  $ possibly_sensitive: logi FALSE
#>  $ created_at        : chr "Thu Jun 25 17:00:30 +0000 2020"
#>  $ favorited         : logi FALSE
#>  $ full_text         : chr "Thanks to prodding from @dsquintana, I added `include_tweet()` to {tweetrmd}. Automatically embed the HTML twee"| __truncated__
#>  $ lang              : chr "en"
#>  $ extended_entities :List of 1
#>   ..$ media:List of 1

There’s quite a bit of data in each tweet, so we’ll pause here and figure out how we want to transform the nested list into a flat list that will rectangle nicely.

tidy_tweet_raw <- function(tweet_raw) {
  basic_items <- c(
    "created_at",
    "favorite_count",
    "retweet_count",
    "full_text",
    "id",
    "lang",
    "source"
  )
  
  # start with a few basic items
  tweet <- tweet_raw[basic_items]

  # and collapse a few nested items into a single string
  tweet$user_mentions <- tweet_raw |> 
    purrr::pluck("entities", "user_mentions") |>
    purrr::map_chr("screen_name") |>
    paste(collapse = ",")
  
  tweet$hashtags <- tweet_raw |> 
    purrr::pluck("entities", "hashtags") |>
    purrr::map_chr("text") |>
    paste(collapse = ",")
  
  tweet
}

When we apply this function to the example tweet, we get a nice, flat list.

tidy_tweet_raw(tweet) |> str()
#> List of 9
#>  $ created_at    : chr "Thu Jun 25 17:00:30 +0000 2020"
#>  $ favorite_count: chr "118"
#>  $ retweet_count : chr "33"
#>  $ full_text     : chr "Thanks to prodding from @dsquintana, I added `include_tweet()` to {tweetrmd}. Automatically embed the HTML twee"| __truncated__
#>  $ id            : chr "1276198597596459018"
#>  $ lang          : chr "en"
#>  $ source        : chr "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>"
#>  $ user_mentions : chr "dsquintana"
#>  $ hashtags      : chr "rstats"

Each flattened tweet becomes a row in a tidy table, thanks to simplify_twitter_data(), which applies tidy_tweet_raw() to every tweet and binds the results into a tibble. Once combined into a single table, we use our good friends dplyr, lubridate and stringr to convert columns to their correct types and to extract a few features.

tidy_tweets <- 
  read_twitter_data(manifest, "tweets") |> 
  simplify_twitter_data(tidy_tweet_raw) |> 
  mutate(
    across(contains("_count"), as.integer),
    retweet = str_detect(full_text, "^RT @"),
    reply = str_detect(full_text, "^@"),
    type = case_when(
      retweet ~ "retweet",
      reply ~ "reply",
      TRUE ~ "tweet"
    ),
    created_at = strptime(created_at, "%a %b %d %T %z %Y"),
    hour = hour(created_at),
    day = wday(created_at, label = TRUE, abbr = TRUE, week_start = 1),
    month = month(created_at, label = TRUE, abbr = FALSE),
    day_of_month = day(created_at),
    year = year(created_at)
  )

The result… a nice tidy table of tweets!

tidy_tweets
#> # A tibble: 6,223 × 17
#>    created_at          favori…¹ retwe…² full_…³ id    lang  source user_…⁴ hasht…⁵ retweet reply type   hour day   month
#>    <dttm>                 <int>   <int> <chr>   <chr> <chr> <chr>  <chr>   <chr>   <lgl>   <lgl> <chr> <int> <ord> <ord>
#>  1 2022-11-05 10:02:17        0       0 "RT @g… 1588… en    "<a h… "georg… ""      TRUE    FALSE retw…    10 Sat   Nove…
#>  2 2022-11-04 19:42:01        4       0 "@JonT… 1588… en    "<a h… "JonTh… ""      FALSE   TRUE  reply    19 Fri   Nove…
#>  3 2022-11-04 15:21:23        1       0 "@tjma… 1588… en    "<a h… "tjmah… ""      FALSE   TRUE  reply    15 Fri   Nove…
#>  4 2022-11-03 12:39:09        1       0 "@trav… 1588… en    "<a h… "trave… ""      FALSE   TRUE  reply    12 Thu   Nove…
#>  5 2022-11-03 06:45:53        5       0 "@mcca… 1588… en    "<a h… "mccar… ""      FALSE   TRUE  reply     6 Thu   Nove…
#>  6 2022-11-03 06:36:56        2       0 "@trav… 1588… en    "<a h… "trave… ""      FALSE   TRUE  reply     6 Thu   Nove…
#>  7 2022-11-02 12:26:46        0       0 "RT @p… 1587… en    "<a h… "posit… ""      TRUE    FALSE retw…    12 Wed   Nove…
#>  8 2022-11-02 12:20:50        4       0 "And I… 1587… en    "<a h… ""      ""      FALSE   FALSE tweet    12 Wed   Nove…
#>  9 2022-10-31 11:47:57        0       0 "RT @D… 1587… en    "<a h… "Dante… ""      TRUE    FALSE retw…    11 Mon   Octo…
#> 10 2022-10-30 19:32:22        8       0 "At fi… 1586… en    "<a h… "pomol… ""      FALSE   FALSE tweet    19 Sun   Octo…
#> # … with 6,213 more rows, 2 more variables: day_of_month <int>, year <dbl>, and abbreviated variable names
#> #   ¹​favorite_count, ²​retweet_count, ³​full_text, ⁴​user_mentions, ⁵​hashtags

If you’ve seen the Observable notebook that inspired this post, you’ll notice that I’ve mostly recreated their data structure, but in R. Next, let’s recreate some of the plots in that notebook, too!

Monthly tweets, replies and retweets #

Code: Set Blog Theme

Yeah, so real quick, I’m going to set up a plot theme for the rest of this post. Here it is, if you’re interested in this kind of thing!

blog_theme <-
  theme_minimal(18, base_family = "IBM Plex Mono") +
  theme(
    plot.background = element_rect(fill = "#f9fafa", color = NA),
    plot.title.position = "plot",
    plot.title = element_text(size = 24, margin = margin(b = 1, unit = "line")),
    legend.position = c(0, 1),
    legend.direction = "horizontal",
    legend.justification = c(0, 1),
    legend.title.align = 1,
    axis.title.y = element_text(hjust = 0),
    axis.title.x = element_text(hjust = 0),
    panel.grid.major = element_line(color = "#d3d9db"),
    panel.grid.minor = element_blank()
  )

theme_set(blog_theme)

The first chart shows the number of tweets, replies and retweets sent in each month from 2009 to 2022. From 2009 to 2015, I sent about 25 total tweets per month, with one large spike in January 2014 when a grad school course I was taking decided to do a “Twitter seminar.” My Twitter usage dropped off considerably between 2015 and 2018: the result of a mix of grad school grinding, and then, when my son was born in 2016, tweeting practically stopped altogether.

My Twitter usage picked up again in 2018, which also coincided with my realization that academia wasn’t my ideal future. In 2018 and 2019 you can see my baseline usage pick up considerably at the start of the year — the effects of a lot of tweeting and networking during rstudio::conf. Since 2019, my usage has been fairly stable; I typically send between 50 and 100 tweets a month. Finally, there’s a noticeable recent drop in activity: since Twitter changed ownership I still read Twitter but only occasionally tweet.

Hover or tap on a bar above to see the top 5 tweets in each segment.

Code: Plot Monthly Tweets
type_colors <- c(reply = "#5e5b7f", tweet = "#ef8c02", retweet = "#7ab26f")

top_5_tweets_text <- function(data) {
  slice_max(
    data,
    n = 5,
    order_by = retweet_count * 2 + favorite_count,
    with_ties = FALSE
  ) |>
    pull(full_text) |>
    str_trunc(width = 120)
}

plot_monthly <-
  tidy_tweets |>
  # Group nest by month and tweet type ---
  mutate(dt_month = sprintf("%d-%02d", year, month(created_at))) |>
  group_nest(dt_month, month, year, type) |>
  mutate(
    # Calculate number of tweets per month/type
    n = map_int(data, nrow),
    # and extract the top 5 tweets
    top = map(data, top_5_tweets_text)
  ) |>
  select(-data) |>
  # Then build the tooltip (one row per month/type)
  rowwise() |>
  mutate(
    type_pl = plu::ral(type, n = n),
    tooltip = glue::glue(
      "<p><strong>{month} {year}: ", 
      "<span style=\"color:{type_colors[type]}\">{n} {type_pl}</span></strong></p>",
      "<ol>{tweets}</ol>",
      tweets = paste(sprintf("<li>%s</li>", top), collapse = "")
    ),
    tooltip = htmltools::HTML(tooltip)
  ) |> 
  ungroup() |>
  # Finally ensure the order of factors (including month!)
  mutate(type = factor(type, rev(c("tweet", "reply", "retweet")))) |>
  arrange(dt_month, type) |>
  mutate(dt_month = fct_inorder(dt_month)) |>
  # Plot time! ----
  ggplot() +
  aes(x = dt_month, y = n, fill = type, color = type, group = type) +
  ggiraph::geom_col_interactive(
    width = 1,
    aes(tooltip = tooltip)
  ) +
  scale_fill_manual(values = type_colors) +
  scale_color_manual(values = type_colors) +
  # The x-axis is factors for each month,
  # we need labels for each year, e.g. 2010-01 => 2010
  scale_x_discrete(
    breaks = paste0(seq(2008, 2022, by = 1), "-01"),
    labels = seq(2008, 2022, by = 1)
  ) +
  scale_y_continuous(expand = expansion(add = c(1, 1))) +
  labs(
    title = "Tweets per month",
    x = "Month Tweeted →",
    y = "Count →",
    fill = NULL,
    color = NULL
  ) +
  theme(
    plot.title = element_text(size = 24, margin = margin(b = 2, unit = "line")),
    legend.position = c(0, 1.14)
  )

ggiraph::girafe(
  ggobj = plot_monthly, 
  width_svg = 14, 
  height_svg = 6,
  desc = knitr::opts_current$get("fig.alt")
)

Which tweets earned the most internet points? The next plot displays tweets that had at least 5 retweets or 5 favorites. Note that I’ve fiddled with the axis scales; both are log-scales and each break shows (roughly) a doubling of internet points in each direction. Interestingly, for “popular” tweets (please note the air-quotes) retweets and favorites appear to be log-linear: a doubling of one generally corresponds to a doubling of the other, although my tweets tended to receive about 4 times as many likes as retweets.

There’s also some pretty interesting stuff going on in the low-retweets but high-favorites area. Popular tweets are cool, but the tweets that got lots of likes without being retweeted are the feel-good tweets that made me feel like I was part of a community online.
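
The code for that plot isn’t shown here, but the doubling axes can be sketched along these lines (a minimal, illustrative version; the + 1 offsets avoid taking the log of zero, and the break sequences are assumptions):

```r
tidy_tweets |>
  filter(retweet_count >= 5 | favorite_count >= 5) |>
  ggplot(aes(retweet_count + 1, favorite_count + 1)) +
  geom_point(alpha = 0.5) +
  # each axis break represents (roughly) a doubling of internet points
  scale_x_continuous(trans = "log2", breaks = 2^(0:8)) +
  scale_y_continuous(trans = "log2", breaks = 2^(0:11)) +
  labs(x = "Retweets →", y = "Favorites →")
```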