~ before you arrive

Prerequisites

Install these before the workshop. If you get stuck, bring your laptop and we will help you at the start.

Required

All participants

A laptop with a terminal

macOS, Linux, or Windows with WSL. You need a working terminal where you can run shell commands. On Windows, install WSL 2 first.

Version control

Git

We use git throughout the workshop. Install it if you do not have it already.

    # macOS
    xcode-select --install
    # Ubuntu / Debian
    sudo apt install git
    # Check
    git --version

Hands-on

An API key

You need access to at least one LLM. We recommend OpenRouter (one key, many models), or you can get a key directly from Anthropic or OpenAI.

Budget about $5–10 of credit.
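Most agents read the key from an environment variable. The sketch below reports which keys are visible in your current shell without printing their values. The variable names are assumptions: ANTHROPIC_API_KEY and OPENAI_API_KEY are the names those vendors' tools conventionally read, and OPENROUTER_API_KEY is the name commonly used with OpenRouter. Check your agent's docs for the name it actually expects.

```shell
# Report whether an environment variable is set, without printing its value.
key_status() {
  eval "value=\${$1:-}"        # look up the variable whose name is in $1
  if [ -n "$value" ]; then
    echo "$1: set"
  else
    echo "$1: missing"
  fi
}

# Variable names are assumptions; adjust them to your provider and agent.
for name in OPENROUTER_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY; do
  key_status "$name"
done
```

To set a key for the current terminal session, run something like `export OPENROUTER_API_KEY="paste-your-key-here"` and then run the check again. Only one of the three needs to be set.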

Install one coding agent

Pick at least one. We demo with Claude Code, but OpenCode is a strong open-source alternative that works with any provider.

Recommended

Claude Code

Anthropic's agentic coding tool. Requires Node.js 18+ and an Anthropic API key (or a Claude Pro/Max subscription).
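If you are not sure whether your Node.js is new enough, a quick check along these lines may help. It is only a sketch: the helper name node_major is made up for this example, and it assumes `node --version` prints the usual form such as v20.11.1.

```shell
# Extract the major version from a string such as "v20.11.1".
node_major() {
  v=${1#v}                     # drop the leading "v" if present
  printf '%s\n' "${v%%.*}"     # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 18 ]; then
    echo "Node.js $major: OK"
  else
    echo "Node.js $major is too old; Claude Code needs 18 or newer"
  fi
else
  echo "Node.js not found; install it before running npm install"
fi
```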

    # Install via npm
    npm install -g @anthropic-ai/claude-code
    # Verify
    claude --version

Docs

Open source

OpenCode

Open-source AI coding agent. Works with any LLM provider — OpenRouter, Anthropic, OpenAI, or local models.

    # Install
    curl -fsSL https://opencode.ai/install | bash
    # Verify
    opencode --version

Docs

Before you bring data

An LLM API is a third party. Anything you send leaves your machine. Decide what is safe to share before the workshop — especially if you work with restricted administrative or human-subjects data.

Decide first

Can this data leave your machine?

Data use agreements, IRB protocols, and statistical-agency contracts often forbid sending microdata to an external service. If in doubt, treat it as a no — the exercises work fine on synthetic or public data.

Safe default

Mock or anonymized data

Ask the agent to generate a synthetic dataset with the same schema, or anonymize a sample. You debug the code against fake data, then run it yourself on the real data, locally.
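You can also produce fake data yourself before the agent ever sees your project. The sketch below writes random rows as CSV using awk. The schema (person_id, age, region, income) and the function name make_fake_csv are invented for illustration; replace the columns with the ones in your real file. The values are uniform random noise, so they mimic the shape of the data, not its distributions.

```shell
# Write N rows of fake microdata to stdout as CSV.
# Usage: make_fake_csv N [SEED]
# The columns are a made-up schema, not one used in the workshop.
make_fake_csv() {
  awk -v n="$1" -v seed="${2:-42}" 'BEGIN {
    srand(seed)                                   # fixed seed for repeatable output
    split("north south east west", regions, " ")
    print "person_id,age,region,income"
    for (i = 1; i <= n; i++) {
      age    = 18 + int(rand() * 63)              # integer in 18..80
      region = regions[1 + int(rand() * 4)]       # one of the four labels
      income = int(rand() * 100000)               # integer in 0..99999
      printf "%d,%d,%s,%d\n", i, age, region, income
    }
  }'
}

# Show a small sample on screen.
make_fake_csv 5
```

To save a larger file, redirect the output, for example `make_fake_csv 1000 > fake_sample.csv`, and point the agent at that file instead of the real one.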

When you can't share

Local models + guardrails

For restricted data, run an open-weight model locally (Ollama, LM Studio) or your institution's private instance. Either way, set permissions so the agent cannot read or delete files outside its sandbox.
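One simple guardrail you can set up with ordinary file permissions is to make the raw data read-only, so that nothing running as your user can overwrite or delete it by accident. The sketch below demonstrates this on a throwaway directory; the directory layout and the function name lock_readonly are made up for the example. It is a speed bump, not a hard boundary: it does not stop the agent from reading the files, it does not apply to root, and a process running as you could change the permissions back. Restricting reads needs the agent's own permission settings or a container.

```shell
# Remove write permission from a directory tree so its files cannot be
# modified or deleted without first changing the permissions back.
lock_readonly() {
  chmod -R a-w "$1"
}

# Demo on a temporary directory (hypothetical layout, not a workshop path).
demo=$(mktemp -d)
mkdir "$demo/raw"
echo "id,value" > "$demo/raw/sample.csv"

lock_readonly "$demo/raw"
ls -l "$demo/raw"              # the permission column no longer shows "w"
```

To undo the lock later, run `chmod -R u+w` on the same directory.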

Optional but useful

Act 3 exercises

DuckDB

We use DuckDB in the dodo demo and exercise. Install it to follow along hands-on in Act 3.

    # macOS
    brew install duckdb

Install guide

Likely pre-installed

Make

GNU Make for running pipelines. Already installed on most macOS and Linux systems.

    make --version

Bring your own

A project to work on

The exercises work best with a real research project — ideally a git repo with data-wrangling scripts (.do, .py, .R, or .jl). We provide a sample dataset if you do not have one.
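To confirm everything on this page in one pass, a small check along these lines may help. It only tests whether each command is on your PATH, not whether the version is recent enough. The command names are assumed to match the tools above (claude for Claude Code, opencode for OpenCode); you only need one of the two agents, and duckdb is optional.

```shell
# Print whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

# Command names assumed from the install steps on this page.
for tool in git node claude opencode duckdb make; do
  check_tool "$tool"
done
```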