Remembering what I did this week with LLMs

Using the GitHub API and {ellmer} to remember what I did this week.

R
LLMs
Personal Data
Productivity
GitHub
ellmer
Published

January 24, 2025

The object impermanence of work

My least favorite meetings are status meetings. I love seeing the people I work with and hearing about what they’ve been up to, but I hate trying to remember what the hell I’ve been working on.

When I joined the Shiny team, we’d have a short daily status meeting and you’d think it’d be easy to remember what I did in the last 24 hours. But apparently, that’s not how I live my life. I’m sure I thought of things to say, but every daily update I felt like I’d say some variation of “I did a bunch of stuff and now I’m… doing more stuff”.

My perception is that I work in short bursts of deep focus punctuated by many short breaks to answer questions on Slack or GitHub issues or the Shiny Discord server or Posit Community or any of the many other places that people post questions and sometimes code.

Between the deep focus work, I usually maintain one thread of maintenance tasks, like babysitting a CI process, or making a small change to fix a bug, or turning screenshots into reproducible bug reports. And I also tend to have another more speculative thread going for learning something new or playing with a new idea.

I don’t recommend my way of working to others. Whole books have been written just to say “don’t work like this”! Software engineers rail against context switching, interruptions, and multitasking.

I compensate for the known perils of my work style in two ways. First, even if I do a lot of different things throughout the day, there’s generally only ever one task that I’m working on at any given moment. That task gets all the focus.

Second, I try to leave context behind in my work. If I’m working on a pull request, that means submitting the PR with enough description that I could forget that I even wrote the PR and most of what I’d need to pick it back up is there in the description or comments. If I had some code in front of me when I found the issue or was testing the PR, that code ends up in the description. If the context is recoverable, it’s at least not lost forever in the switching.

But these two strategies combine in a strange way, resulting in the object impermanence of work: if it’s not in front of me, it doesn’t exist.

I just don’t keep a running inventory of what I was working on or what I did; I know that I can find everything I need to know from the open GitHub issues or PRs. Which then get summarized in stand-ups as… stuff.

What did I even do this week?

It turns out I can answer the question of what I did this week, as long as I’m allowed to run some R code.

Since my entire life is lived on GitHub, I can get a pretty good sense of what I’ve been working on by looking at the PRs and issues I opened and closed in a week. Even better, this information is readily obtained from GitHub with a search like author:gadenbuie created:>2025-01-17T00:00:00Z or author:gadenbuie closed:>2025-01-17T00:00:00Z. And I can create those search queries with R:

author <- "gadenbuie"
date <- Sys.Date() - 7
search_created <- sprintf("author:%s created:>%sT00:00:00Z", author, date)
search_closed <- sprintf("author:%s closed:>%sT00:00:00Z", author, date)

But of course I don’t want to have to actually click around on GitHub to remember what I was doing. What I want is some rectangular data that I can work on in R.

Enter the gh package, which lets you interact with the GitHub API from R. We can use gh to run the created and closed search queries and collect all of the results.

library(gh)

created <- gh("/search/issues", q = search_created, .limit = Inf)
closed <- gh("/search/issues", q = search_closed, .limit = Inf)

Why not do this in a single API request? Unfortunately, GitHub search doesn’t let you use OR for created: and closed: in the same query, but that’s okay; we can put the results together ourselves. Each result has an $items field that contains a list of issues and pull requests, each with a whole bunch of fields. I’ll use purrr to map over the issues and extract the fields I care about.

library(purrr)
library(dplyr, warn.conflicts = FALSE)

work_data <-
  c(created$items, closed$items) |>
    map(function(issue) {
      tibble(
        repository = sub(
          "https://api.github.com/repos/",
          "",
          issue$repository_url
        ),
        title = issue$title,
        created_at = issue$created_at,
        closed_at = issue$closed_at,
        url = issue$html_url,
        state = issue$state,
        is_issue = is.null(issue$pull_request),
        body = issue$body
      )
    }) |>
    list_rbind()
work_data
# A tibble: 83 × 8
   repository              title created_at url   state is_issue body  closed_at
   <chr>                   <chr> <chr>      <chr> <chr> <lgl>    <chr> <chr>    
 1 r-lib/systemfonts       Repo… 2025-01-2… http… open  TRUE     "A s… <NA>     
 2 rstudio/shiny           fix(… 2025-01-2… http… open  FALSE    "Thi… <NA>     
 3 rstudio/chromote        v0.4… 2025-01-2… http… open  FALSE    "For… <NA>     
 4 rstudio/chromote        Rele… 2025-01-2… http… open  TRUE     "Pre… <NA>     
 5 quarto-ext/bluesky-com… refa… 2025-01-2… http… open  FALSE    "Fix… <NA>     
 6 quarto-ext/bluesky-com… Add … 2025-01-2… http… open  TRUE     "The… <NA>     
 7 rstudio/bslib           Bran… 2025-01-2… http… clos… FALSE    "Fix… 2025-01-…
 8 rstudio/bslib           chor… 2025-01-2… http… clos… FALSE    "Mov… 2025-01-…
 9 rstudio/bslib           Bran… 2025-01-2… http… clos… TRUE     "The… 2025-01-…
10 quarto-dev/quarto-cli   All … 2025-01-2… http… open  TRUE     "###… <NA>     
# ℹ 73 more rows

Now that I have a nice rectangular summary of my work over the last week, I’ll organize it a bit to differentiate between my Posit/Shiny work and my personal open-source projects. I’ll also add an action column indicating the primary action this week.

work_data <-
  work_data |>
    mutate(
      personal = grepl("(^gadenbuie)|bluesky-comments", repository),
      action_at = coalesce(closed_at, created_at),
      action = recode(state, open = "opened"),
      action = paste(action, if_else(is_issue, "an issue", "a PR"))
    ) |>
    arrange(personal, repository, is_issue, action_at)
# Preview the work data
work_data |>
  select(repository, action, action_at, title)
# A tibble: 83 × 4
   repository                  action          action_at            title       
   <chr>                       <chr>           <chr>                <chr>       
 1 posit-dev/air               opened an issue 2025-01-17T19:11:42Z Try to brea…
 2 posit-dev/air               opened an issue 2025-01-17T20:11:24Z Empty or sh…
 3 posit-dev/air               closed an issue 2025-01-21T19:56:11Z Selectively…
 4 posit-dev/air               closed an issue 2025-01-21T19:56:11Z Selectively…
 5 posit-dev/brand-yml         closed a PR     2025-01-21T17:32:35Z docs: Add g…
 6 posit-dev/brand-yml         closed a PR     2025-01-21T17:32:35Z docs: Add g…
 7 posit-dev/brand-yml         closed a PR     2025-01-21T19:58:30Z docs: Use G…
 8 posit-dev/brand-yml         closed a PR     2025-01-21T19:58:30Z docs: Use G…
 9 posit-dev/py-shiny          opened an issue 2025-01-21T16:25:43Z `render.Dat…
10 posit-dev/py-shiny-validate closed a PR     2025-01-22T17:28:32Z fix: Disabl…
# ℹ 73 more rows

Rectangular data FTW! This is great, I have a nice table of all the things I’ve done this week. But it’s still a bit hard to read and make sense of it all. We’ve reached the “I did some stuff and it’s in this tibble” stage, which is admittedly a little better than “I did some stuff”.
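
You can already poke at this tibble in the usual dplyr ways, of course. As a quick sketch, here’s one way to tally where the week went, using nothing fancier than count():

work_data |>
  count(repository, action, sort = TRUE)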

What I’d love to do is give this to a robot assistant who could read it all for me and they could summarize my week, or at least get me started on a summary. Sounds like a great task for a large language model!

Hiring my executive robot assistant

Now we get to the fun part!1 We’re going to use the newly released ellmer package to send my work summary to a large language model and ask it to summarize my week.

If you’re following along at home, here’s where you’ll need to pause and get yourself set up. First, install ellmer:

install.packages("ellmer")

Second, you’ll need access to an LLM. The coolest part of ellmer is that it supports a wide range of LLM providers, which are all listed in the Chatbots reference. To use your favorite LLM, you’ll need to set up an account and get an API key. But once you’ve completed that step, switching between LLMs is as easy as changing out the chat_*() function that you use to start the chat.

For this demo, I’ll use Anthropic with chat_claude(). The help page for chat_claude() includes instructions for setting up a developer account, which I’ve already followed.
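
If you haven’t used Anthropic from R before, the short version is that ellmer looks for your key in the ANTHROPIC_API_KEY environment variable. Here’s a minimal sketch for the current session (the value below is a placeholder; usethis::edit_r_environ() is a better place to store the real key permanently):

# Set for this session only; the value here is just a placeholder
Sys.setenv(ANTHROPIC_API_KEY = "sk-ant-...")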

library(ellmer)

chat <- chat_claude()

If you prefer OpenAI’s ChatGPT, you’d use chat_openai(), or if you have ollama running a local model you’d use chat_ollama().
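
To make that concrete, here’s a rough sketch of what the swap looks like; the model names are just examples, not recommendations, and chat_ollama() assumes you’ve already pulled the model locally:

# OpenAI, assuming OPENAI_API_KEY is set
chat <- chat_openai(model = "gpt-4o")

# A local model served by ollama
chat <- chat_ollama(model = "llama3.2")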

Once you have a chat object, you can use the $chat() method to talk to the LLM. The answers are streamed back to your R session and the chat is cumulative, so you can keep “talking”.

chat$chat("Say hello three languages.")
Here's "hello" in three languages:

1. English: Hello
2. Spanish: Hola
3. French: Bonjour
chat$chat("Say goodbye in those same languages.")
Here's "goodbye" in those same languages:

1. English: Goodbye
2. Spanish: Adiós
3. French: Au revoir

Our final task is to give the LLM all of the information we’ve collected from my GitHub issues and then to convince it to summarize it all in a meaningful way. I can further refine this task into three distinct components:

  1. Use a system prompt to vibe check the LLM. System prompts are a good place to store initial instructions that the LLM treats as a sort of “light instruction set”, a kind of zero-th prompt.

    I’ll use the system prompt to convince the LLM that it’s my assistant.

  2. Turn the work summary into something a little bit more readable that I can send to the LLM as part of the user prompt.

  3. Write up a user prompt that gets the LLM to do what I want (most of the time).

System prompt

This is the system prompt that I settled on:

_prompt-system.md
You are an assistant who keeps track of the work I do each week. Your goal is to
faithfully and accurately report the work I do to my boss while making sure that
the impact of my work is clearly articulated. You are clear and concise and use
direct language, preferring to match the terminology and phrases that I use. You
respond directly without any preamble.

The goal is to try to help the LLM align with the vibe of the task:

  • It needs to summarize my work
  • Please don’t make things up
  • None of that “Sure, here’s your summary of…” fluff.

Work context

The next step is to get the work summary into a format that’s easier for LLMs to work with. There’s a good chance that I could just throw a CSV or JSON version of the work_data table at it, but I think I’ll get better answers if I format it as text. Notice that I’ve already grouped projects together and tried to make sure related parts of the work data sit close to each other. I’m going off vibes here, but I expect I’ll get better summaries with a little pre-work than if I asked the LLM to make connections across great distances in the text.
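
For the record, the throw-the-whole-table-at-it route would be a one-liner; here’s a sketch of the JSON version, assuming the jsonlite package is installed (I didn’t pursue this):

# An alternative I didn't pursue: hand the model the raw table as JSON
work_json <- jsonlite::toJSON(work_data, pretty = TRUE)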

For this task, I’ll turn to my favorite string templating package, epoxy. I’ve created a template that I’ll apply to each row of work_data:

_prompt-work-summary.md
---
engine: epoxy
---

<work-item>
Action: {action}
Title: {title}
Type: {ifelse(personal, 'Personal Work', 'Professional Work')}
Link: {url}

<work-description>
{body}
</work-description>
</work-item>

I can take work_data and pass it to epoxy::epoxy_use_file() to fill in the templating bits with the values from work_data, one row at a time.

library(epoxy)

work_data |>
  slice(15) |>
  epoxy_use_file("_prompt-work-summary.md")

Here’s an example of a single work summary using the 15th row.

<work-item>
Action: closed a PR
Title: chore: Add `done.` signal when package installation is completed
Type: Professional Work
Link: https://github.com/posit-dev/shinylive/pull/195

<work-description>
In #194 we went back to calling

```patch
- micropip.install(pkg_name)
+ micropip.install(req)
```

because otherwise we lose information like the requested package version.

I thought this broke installation of package extras because it appeared that `micropip.install()` doesn't actually install the packages listed in an extra. 

It turns out that extras aren't installed when the main package is already installed. In other words, `micropip.install("shiny[theme]")` will _not install_ `brand_yml` and `libsass` because `shiny` is already installed, but `micropip.install("httpx[socks]")` _will install_ `socksio` along with `httpx` because `httpx` isn't initially installed.

So in the end, this PR mostly just adjusts a few comments and improves the UX of the console message. As [pointed out on Discord](https://discord.com/channels/1109483223987277844/1318357786379161731/1318536291750641674) the messages say `Installing {req} ...` and the last message seems to hang. This PR adds `done` to each line when the package install finishes.
</work-description>
</work-item>

Notice that I’ve used <work-item> and <work-description> pseudo-tags to give the text a bit of structure.

Now we can put the work summary together into a single text string.

work_summary <-
  work_data |>
    epoxy_use_file("_prompt-work-summary.md", .collapse = "\n\n")

The user prompt

The last step is to give the LLM some instructions about what I want it to write. Again, I’ll use a prompt file for this task.

_prompt.md
Summarize my work over the last week for a team update in prose.
Focus more on pull requests; issues are less important and just provide context.
Write in prose with full paragraphs.
Focus on the user impact of the work first and secondarily on the internal impact of dev changes.

Summarize the work using markdown under two headings:
'## Professional Work' and '## Personal Work'.
Include direct links to any mentioned pull requests using markdown syntax.
Use the first-person and try to sound like me.

<work-summary>
{{ work_summary }}
</work-summary>

For this prompt, I’ll use ellmer::interpolate_file(), which works a lot like epoxy_use_file() but comes built-in with ellmer. It reads in a file and interpolates the {{ expr }} strings, which in this case will inject the work_summary into the prompt.
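
If you want to sanity-check the prompt before spending any tokens on it, you can interpolate it yourself and peek at the result. A small sketch (the 300-character cutoff is arbitrary):

# Build the prompt without sending it anywhere
prompt <- interpolate_file("_prompt.md")
cat(substr(prompt, 1, 300))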

The final result

Finally, we’re ready to put it all together and find out if it works!

chat <- chat_claude(
  system_prompt = readLines("_prompt-system.md")
)

work_summarized <-
  chat$chat(interpolate_file("_prompt.md"), echo = FALSE)

# I'm caching the result so I don't rack up costs while writing
saveRDS(chat, "chat.rds")

I’ve set echo=FALSE so that I can store the result and show it to you with a big reveal, but by default the response streams right into your console.

Sending my whole work summary plus the prompt took about 23,297 tokens and Claude responded with 594 tokens. Using LLM Price Check, 25k input tokens and 500 output tokens end up costing about $0.08 per call to claude-3-sonnet or about $0.13 per call to gpt-4o.
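
The arithmetic behind that estimate is easy to check yourself. Here’s a sketch using Anthropic’s published Claude 3 Sonnet prices at the time of writing, which I’m treating as assumptions (roughly $3 per million input tokens and $15 per million output tokens):

input_tokens <- 23297
output_tokens <- 594

# Assumed prices: ~$3 per million input tokens, ~$15 per million output tokens
input_tokens / 1e6 * 3 + output_tokens / 1e6 * 15
# about 0.08 (USD per call)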

Here’s the final output! ellmer has a really neat feature: you can turn any part of the conversation into markdown with contents_markdown(). I’ve used code like what’s listed below to embed the response here in this post.

chat$last_turn() |>
  contents_markdown()
My Work Summary

Professional Work

I made significant improvements across multiple projects this week. In bslib, I improved navbar compatibility for Bootstrap 5 and light/dark modes through PR #1145, which brought bslib in line with Bootstrap 5’s expectations around navbar background color control while adding appropriate controls for various scenarios. I also contributed to better code organization by moving navbar options to a dedicated file through PR #1173.

In chromote, I enhanced the developer experience by adding the chromote_info() helper function via PR #190 to provide useful system and configuration information. I also improved session management by setting user-specific crash dump directories in PR #194.

For shinylive and Shiny, I made several improvements to enhance stability and user experience. I fixed Windows path handling in shinylive through PR #47 and improved the insertion of nav items in Shiny with PR #4179.

I also worked on improving code formatting across our repositories by setting up automated R code formatting with air through PR #43 in the shiny-workflows repository.

Personal Work

I made substantial contributions to the bluesky-comments Quarto extension, implementing several key features and improvements:

  • Enhanced content label handling and UI through PR #29
  • Added support for images and GIFs in PR #40
  • Implemented nested comment depth control via PR #41
  • Improved header configuration options in PR #39

I also enhanced shrtcts to support better workflow integration in Positron through PR #26, adding shortcut_run() and shortcut_get() functions.

I have to admit, that’s not a bad summary2 of my week!

What’s cool about this approach is that I could continue to chat with Claude about my week, asking follow-up questions to have it further remind me about what I did.
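
Because the chat is cumulative, a follow-up is just another $chat() call. A hypothetical example:

chat$chat("Which of these PRs had the biggest user-facing impact, and why?")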

Another idea I have now is to run this script in a weekly cron job, just before our larger team meeting, to make sure that I know what I did this week when I get there. Not only that, but this would make a great running summary that might come in handy for a performance review or promotion request.

Using local LLMs

Just to prove that it’s easy to switch models, I’m going to repeat the above process with a local model. This is the same code as above, but this time using ollama to run phi4, a new 14B parameter model from Microsoft.

Before I show you the results, I have to admit something: they’re … not stellar. I tried a few other local models, and phi4 generally did the best. A 14B parameter model isn’t the smallest model on the block, but it’s still pretty small; by comparison, Claude 3.5 Sonnet reportedly has roughly 175 billion parameters. Smaller models have almost no chance of acing this task.

I also found that if I tried to summarize both professional and personal work in the same chat, the local models would get hung up on the personal work and would end up summarizing only that. After playing around with it for a bit, I decided to focus just on the professional work summary, where phi4 does okay.

professional_work <-
  work_data |>
    filter(!personal) |>
    epoxy_use_file("_prompt-work-summary.md", .collapse = "\n\n")

chat_llama <- chat_ollama(
  system_prompt = brio::read_file("_prompt-system.md"),
  model = "phi4",
  api_args = list(temperature = 0.2)
)

work_summarized_llama <-
  chat_llama$chat(
    interpolate_file(
      "_prompt.md",
      work_summary = professional_work
    ),
    echo = FALSE
  )

Summary of Recent Work

Professional Activities:

  1. Shiny Core CI Snapshots Update:
    • Updated snapshots due to Chrome v132’s deprecation of --headless=old mode.
    • Chromote now uses the new --headless mode by default (see Chromote PR #189).
    • Reviewed all updated snapshots; no behavior changes detected, except for potential issues with absolute positioning in panel layouts (027-absolutely-positioned-panels).
  2. Shiny Workflows Enhancement:
    • Introduced a new composite workflow rstudio/shiny-workflows/format-r-code to format R code using air.
    • This workflow is part of the routine process, activated via inputs.format-r-code, which defaults to false.
    • Users can enable it by setting format-r-code: true in their configuration.
    • Additional inputs for this workflow include:
      • version: Specifies the version of air (default: latest).
      • check: When set to "true", checks if files change without committing (air format --check).
      • path: Defines paths for formatting, defaulting to all R files in the repository.
    • Demonstrated usage in the bslib project (bslib PR #1167).
  3. Shiny Core CI Snapshots Update (Duplicate Entry):
    • Reiterated update due to Chrome v132 changes, with similar outcomes and considerations as previously noted.

Open Source Contributions:

  1. Shiny PR #4952:
    • Addressed a bug where shiny::runApp() would fail if the app directory was not writable.
    • Implemented a fix that allows apps to run in read-only directories by copying files into a temporary location for execution.
  2. Shiny Core CI Snapshots Update (Duplicate Entry):
    • Repeated update details regarding Chrome v132 and Chromote’s headless mode transition, with no additional changes noted beyond the initial entry.

These activities reflect ongoing efforts to maintain and enhance the Shiny ecosystem, ensuring compatibility with new browser versions and improving developer workflows.

The local model takes quite a bit longer to provide a response, but I can run it on my M3 MacBook Air. The summarization quality is nowhere near what I got from Claude, either, but I didn’t need to pay for the query or send my data to a third party. (I also didn’t spend much time on this; I could probably get better results with more prompt engineering.) At the very least, the local model is a great way to practice setting up the pipeline before I’m ready to pay for queries with Anthropic or others.

Final script

One last thing: here’s all of the code from the meandering blog post above in one place, with all of the prompts inlined. Amazingly, it’s less than 100 lines of code.

# Query issues and pull requests from this week -------------------------------
library(dplyr)
library(purrr)
library(gh)

author <- "gadenbuie"
start_date <- Sys.Date() - 7

search_created <- sprintf("author:%s created:>%sT00:00:00Z", author, start_date)
search_closed <- sprintf("author:%s closed:>%sT00:00:00Z", author, start_date)

created <- gh("/search/issues", q = search_created, .limit = Inf)
closed <- gh("/search/issues", q = search_closed, .limit = Inf)

work_data <-
  c(created$items, closed$items) |>
    map(function(issue) {
      tibble(
        repository = sub(
          "https://api.github.com/repos/",
          "",
          issue$repository_url
        ),
        title = issue$title,
        created_at = issue$created_at,
        closed_at = issue$closed_at,
        url = issue$html_url,
        state = issue$state,
        is_issue = is.null(issue$pull_request),
        body = issue$body
      )
    }) |>
    list_rbind() |>
    mutate(
      personal = grepl("(^gadenbuie)|bluesky-comments", repository),
      action_at = coalesce(closed_at, created_at),
      action = recode(state, open = "opened"),
      action = paste(action, if_else(is_issue, "an issue", "a PR"))
    ) |>
    arrange(personal, repository, is_issue, action_at)

# Prepare the work data summary -----------------------------------------------
library(epoxy)

work_summary <-
  epoxy(
    "<work-item>
    Action: {action}
    Title: {title}
    Type: {ifelse(personal, 'Personal Work', 'Professional Work')}
    Link: {url}

    <work-description>
    {body}
    </work-description>
    </work-item>",
    .data = work_data,
    .collapse = "\n\n"
  )

# Summarize with the LLM ------------------------------------------------------
library(ellmer)

chat <- chat_claude(
  system_prompt = epoxy(
    "You are an assistant who keeps track of the work I do each week. Your goal is to
    faithfully and accurately report the work I do to my boss while making sure that
    the impact of my work is clearly articulated. You are clear and concise and use
    direct language, preferring to match the terminology and phrases that I use. You
    respond directly without any preamble."
  )
)

chat$chat(
  interpolate(
    "Summarize my work over the last week for a team update in prose.
    Focus more on pull requests; issues are less important and just provide context.
    Write in prose with full paragraphs.
    Focus on the user impact of the work first and secondarily on the internal impact of dev changes.

    Summarize the work using markdown under two headings:
    '## Professional Work' and '## Personal Work'.
    Include direct links to any mentioned pull requests using markdown syntax.
    Use the first-person and try to sound like me.

    <work-summary>
    {{ work_summary }}
    </work-summary>"
  )
)

# To get the LLM response as markdown:
contents_markdown(chat$last_turn())

Footnotes

  1. I know what you’re thinking and you’re totally right. The fun part was getting all this data in a nice tidy tibble! Think of all the things we could do with that tibble. But dplyr isn’t the new hotness and this blog needs to meet its buzzword quota, so we’re forced to do something extra.↩︎

  2. Does the summary have links to the PRs? This one was hit-or-miss. Sometimes Claude includes these and sometimes they’re omitted. I’m sure a little more prompt massaging could improve the consistency.↩︎