Project Structure
Clone the template
Open Terminal, navigate to where you want the project, and clone:
gh repo create my-project --template Black-JL/Research-Project-Flow --private --clone
cd my-project
This creates a private repo on your GitHub account with the full template structure. You now have a project folder that looks like this:
my-project/
├── README.md                  ← Project overview, pipeline, replication
├── CLAUDE.md                  ← AI agent instructions
├── run_all.sh                 ← Master execution script
├── .gitignore
│
├── data/
│   ├── raw/                   ← Untouched source data. NEVER modify.
│   │   └── README.md          ← Source, date, access instructions
│   └── processed/             ← Created by scripts
│
├── scripts/
│   ├── 00_run.do              ← Master do-file: globals and pipeline
│   ├── params.do              ← Research parameters
│   └── programs/              ← Reusable helper functions
│
├── output/
│   ├── logs/                  ← Execution logs
│   ├── figures/               ← Plots and maps
│   ├── tables/                ← LaTeX table fragments
│   └── results/               ← Stored estimation results
│
├── manuscript/
│   ├── manuscript.tex         ← Active manuscript
│   ├── references.bib         ← Auto-exported from Zotero
│   └── aea_style_guide.md     ← Formatting reference
│
└── scratch/                   ← Throwaway work. Not committed.
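If you ever need to recreate this skeleton by hand, for example when retrofitting an existing project rather than cloning the template, the directory layout above can be sketched in shell. This is only a sketch: the directory names come from the tree, while the function name scaffold_project is hypothetical and not part of the template. It creates directories only, not files such as README.md or 00_run.do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the template): recreate the
# directory skeleton shown in the tree above under a given project root.
scaffold_project() {
  local root="$1"
  # mkdir -p creates parent directories as needed and is safe to re-run.
  mkdir -p \
    "$root/data/raw" \
    "$root/data/processed" \
    "$root/scripts/programs" \
    "$root/output/logs" \
    "$root/output/figures" \
    "$root/output/tables" \
    "$root/output/results" \
    "$root/manuscript" \
    "$root/scratch"
}
```

Calling scaffold_project my-project would create the nine leaf directories and their parents. Because mkdir -p does not touch existing directories, running it on an existing project leaves current contents in place.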
Why this structure
Six principles govern the layout:
- Every file has one home. Outputs in output/, data in data/, scripts in scripts/. No duplicates, no ambiguity.
- Data scripts write to data/processed/. Analysis scripts write to output/. No script crosses this boundary.
- Absolute paths live in one place. Machine-specific paths go in 00_run.do. Everything else uses globals defined there.
- One command reproduces everything. run_all.sh executes the full pipeline from raw data to final output.
- Fail loudly. Every script logs its execution. When something breaks, the log shows where and why.
- Structure enforces discipline. If a file doesn’t have an obvious home, the structure needs updating, not the file.
These conventions draw from Gentzkow & Shapiro, the TIER Protocol, and the AEA Data Editor’s requirements. Following them gets your project closer to replication-ready from the start.
Key files
README.md
The project README is the single source of truth. It contains:
- The pipeline table (every script, its inputs, its outputs)
- The parameters table (every hardcoded research value and its source)
- The table/figure map (which script produces which output)
- Replication instructions
When the AI adds a script or modifies an output, it updates the README automatically. You should verify these updates are correct.
CLAUDE.md
This file tells the AI how to behave in your project. It specifies:
- The project root path
- Rules (e.g., data/raw/ is read-only)
- Key file locations
- How to execute scripts
- Writing standards
The AI reads this file at the start of every session. It is the contract between you and the AI. Edit it when you want to change the AI’s behavior. If you are starting a project without the template, you can create a fresh CLAUDE.md by typing /init in Claude Code.
Other tools have their own equivalents. Codex uses AGENTS.md. Gemini CLI uses GEMINI.md. The content is the same idea: project instructions the AI reads at startup. If you use multiple tools on the same project, create the appropriate file for each. The template ships with CLAUDE.md; adapt it for your tool of choice.
scripts/00_run.do
The master do-file. It defines path globals for every collaborator’s machine and calls each pipeline step in order. When you or the AI add a new script, it gets registered here.
scripts/params.do
This file holds all hardcoded research parameters: treatment dates, sample restrictions, outcome definitions. Centralizing them here means a parameter change propagates to every script that uses them. The values must match the Parameters table in the README.
run_all.sh
The shell script that executes the pipeline. It calls Stata, R, and Python scripts in sequence, captures logs, and reports success or failure.
Run a single step:
./run_all.sh "01_import"
Run everything:
./run_all.sh --all
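To make the "captures logs, and reports success or failure" behavior concrete, here is a minimal sketch of how one step of such a script could be wrapped. It is not the template's actual run_all.sh: the function name run_step and the LOG_DIR variable are hypothetical, and the real script may dispatch to Stata, R, and Python differently.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the template's run_all.sh.
# LOG_DIR and run_step are hypothetical names chosen for this example.
LOG_DIR="${LOG_DIR:-output/logs}"

# Run one pipeline step: the first argument is the step name, the rest is
# the command to execute. stdout and stderr both go to a per-step log,
# and the command's exit code is passed back to the caller on failure.
run_step() {
  local name="$1"
  shift
  mkdir -p "$LOG_DIR"
  local log="$LOG_DIR/$name.log"
  if "$@" >"$log" 2>&1; then
    echo "OK      $name (log: $log)"
  else
    local status=$?
    echo "FAILED  $name with exit code $status (log: $log)" >&2
    return "$status"
  fi
}
```

Under these assumptions, a call such as run_step 01_import python3 scripts/01_import.py (a hypothetical script name) would write output/logs/01_import.log and return the script's exit code, so a caller can stop the pipeline at the first failing step.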
First steps after cloning
- Set your machine path in scripts/00_run.do. Replace the placeholder path with your actual project directory.
- Edit the README. Replace placeholder content with your project details: data sources, computational requirements, parameters.
- Configure CLAUDE.md. Update the project root path. Review the rules and adjust if needed.
- Set up .gitignore. The template includes sensible defaults. Add any large data files or sensitive content.
- If using Dropbox: Move the project folder into Dropbox and exclude .git:
xattr -w com.dropbox.ignored 1 .git
- Launch Claude Code and start your first session:
claude
The AI will read your CLAUDE.md, orient itself, and tell you if anything needs attention. You’re ready to work.
