~ one-day workshop · economists & social scientists

Agentic AI for Researchers

One-day workshop for economists and social scientists. Balanced demos and guided exercises.

9:00 – 16:10 ~25 participants Demos + exercises

Workshop arc

  1. Intelligence is a commodity — understand the stack, unbundle it, pick your tools.
  2. Agents work only when structure underneath them is good — your Day-1 practices are the prerequisite.
  3. Agents introduce new reproducibility risks — pinned models, saved prompts, structured IRs, and assertions are the cure.

Schedule

Act 1 How LLMs actually work 80 min
9:00 45 min
Talk Everything is a chat completion

The technical core without the hype. A model is a next-token predictor. Context is what fits in the window — everything outside it is gone. Attention is how the model decides what matters. Tool calling is structured output with a feedback loop. System prompt, user turn, assistant turn: the anatomy of every interaction. Why context filling causes failure — and why the fix is a fresh session, not more prompting.

context window attention tool calls system prompt
9:45 35 min
Talk The ecosystem — and the unbundling argument

Four separable layers: model (GPT, Claude, Gemma, Qwen, DeepSeek), harness (Claude.ai, Cursor, Claude Code, Zed), MCP (tools the model can call at runtime), skill (plain-text instructions that live in your repo). Intelligence is commoditizing fast — Chinese models match frontier at 1/25th the cost, rankings change weekly. The opinionated prescription: pick one harness you are comfortable in, swap models freely for each task, write skills as readable markdown, keep very good structure in your work. Lock-in is the risk; model loyalty is a mistake.

model harness MCP skill OpenRouter unbundling
Act 2 Working with agents 110 min
10:20 20 min
Break Coffee
10:40 40 min
Demo The escape–enter loop in practice

Live Claude Code session on a real research task. The two-layer model: natural language in, structured output out. Review diffs, not code. When to hit Escape. The “toxic context” failure mode — when to start fresh. The independence principle: never let the same session verify its own work. Permission boundaries — approve reads and edits inside the project, deny destructive shell commands — so the agent can never delete or overwrite your data. CLAUDE.md vs AGENTS.md: what belongs in each (project conventions, data assumptions, niche tools) and what does not (everything else — irrelevant instructions actively hurt).

Claude Code diff review permissions CLAUDE.md AGENTS.md context poisoning independence principle
11:20 30 min
Talk Skills and project configuration

Two kinds of plain text that make agents smarter. Project config (CLAUDE.md / AGENTS.md): your conventions, data assumptions, preferred tools — start with 3–5 rules, add one each time you correct the agent twice. Skills: reusable task templates with a fixed output location, commit convention, and explicit stopping rules. Show the paper-writing skill set as a worked example: five domain skills (WRITING, MODEL, LITERATURE, EMPIRICS, EDITING), each with a task queue. Contrast with dumping everything into one file — that actively degrades performance.

CLAUDE.md domain skills task queues commit conventions separation of concerns
11:50 20 min
Exercise Write a project config and a skill

10 minutes: draft a CLAUDE.md for your project — tools the agent does not know, naming conventions, data paths, verification rules. 5 minutes: draft a minimal skill for one recurring task. 5 minutes: share one thing that was hard to write down. The act of writing conventions in plain text for a model forces a precision that is valuable even if you never use the skill.

hands-on CLAUDE.md skill.md your own project
Act 3 Structure, reproducibility, and the agentic risk 130 min
12:10 50 min
Break Lunch
13:00 30 min
Talk Reproducibility from Day 1 — the prerequisite

Quick recap of the Vilhuber/Koren principles as the necessary foundation: no hard-coded paths, no overwriting raw data, main.do with step flags, README from day 1, secrets in environment variables, checksum-guarded downloads. These are not just journal requirements — they are what makes your project legible to an agent. An agent working in a project with hard-coded paths, no folder structure, and one giant script will fail even with the best model. Day-1 practices are agent-compatible practices. But they handle only the old reproducibility risk. Agents add a new one: the same prompt does not produce the same code twice — models are non-deterministic, and the model you used is deprecated within a year. The cure is to treat the interaction as data — pin the exact model version, and commit the prompt and the transcript alongside the diff. The prompt is part of your methods section.

TIER protocol no hard-coded paths main.do README model pinning prompt as artifact
13:30 25 min
Demo dodo: structured IR for your .do pipelines

Live demo. Run an existing .do file through the dodo DuckDB extension unchanged. Use show sql to see what the agent actually produced — not what you asked for. Use assert to catch silent failures: agent dropped observations to force a merge, agent treated panel as cross-section, agent used an ID that is not stable over time. Use dodoc to compile the pipeline to SQL for CI. The point: .do syntax is a structured intermediate representation — the agent generates it, you audit the SQL it produces, the Makefile runs it.

dodo show sql assert dodoc CI
13:55 75 min
Exercise Find the silent failure in your own pipeline

Participants bring a .do file (or use a provided Hungarian company registry dataset). Task: (1) run it through dodo, (2) use show sql to find one thing the agent generated that you would not have caught by reading the .do file, (3) write two assert statements that would catch it next time, (4) add those asserts to a make test target. Debrief: each participant names their silent failure. Collect these — they form the workshop's shared artifact and illustrate why “it ran without errors” is not a test.

hands-on dodo assert make test own data

Backup dataset provided for participants without a ready .do file.

Closing What comes next 40 min
15:10 20 min
Break Coffee
15:30 40 min
Discussion What do you actually do on Monday?

Structured around five choices: (1) which harness to start with, (2) which model for which task, (3) what goes in your CLAUDE.md — the minimum viable project config, (4) what structural change your project needs before an agent can work in it, (5) which failure modes to watch for — hallucination, silent data decisions, scope creep, context degradation, confident domain errors, restricted data leaving your machine. Each participant writes their five answers. Collected answers become the shared takeaway. Close by restating the arc: Day-1 structure was already right. Skills make it agent-legible. Assertions make it auditable. The models will keep getting cheaper; the structure is the durable investment.

harness choice model selection CLAUDE.md structural prerequisite failure modes
16:10
End