The Anatomy of a Harbor Task

How tasks are structured and what each piece does

The Anatomy of a Task

└── example-task/
    ├── task.toml
    ├── instruction.md
    ├── environment/
    │   ├── Dockerfile
    │   └── main.go
    ├── solution/
    │   └── solve.sh
    └── tests/
        └── test.sh

Convention: the directory name (example-task) is also the task name.
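
The layout above can be created by hand. What follows is a minimal sketch of doing so; `scaffold_task` is a hypothetical helper invented for this illustration, not a Harbor command, and the files it creates are empty placeholders. Task-specific inputs such as main.go are left out on purpose.

```shell
#!/usr/bin/env bash
# scaffold_task is a hypothetical helper, not part of Harbor.
# It creates the empty skeleton of a task directory.
scaffold_task() {
    local task_dir="$1"
    mkdir -p "$task_dir/environment" "$task_dir/solution" "$task_dir/tests" || return 1
    touch "$task_dir/task.toml" \
          "$task_dir/instruction.md" \
          "$task_dir/environment/Dockerfile" \
          "$task_dir/solution/solve.sh" \
          "$task_dir/tests/test.sh" || return 1
    # Marking the scripts executable is a precaution, not a documented requirement.
    chmod +x "$task_dir/solution/solve.sh" "$task_dir/tests/test.sh"
}
```

Calling `scaffold_task tasks/example-task` would then produce the tree shown above, with the directory name doubling as the task name.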

task.toml

└── example-task/
    ├── task.toml        ◄
    ├── instruction.md
    ├── environment/
    │   ├── Dockerfile
    │   └── main.go
    ├── solution/
    │   └── solve.sh
    └── tests/
        └── test.sh

Task metadata

[metadata]
author_name = "Przemyslaw Hejman"
difficulty = "easy"
tags = ["trivial", "golang"]

[agent]
timeout_sec = 900.0

[environment]
cpus = 1
memory_mb = 4096

instruction.md

└── example-task/
    ├── task.toml
    ├── instruction.md   ◄
    ├── environment/
    │   ├── Dockerfile
    │   └── main.go
    ├── solution/
    │   └── solve.sh
    └── tests/
        └── test.sh

The prompt for the LLM agent, containing the actual assignment.

  • Define role and goal clearly.
  • Specify constraints (e.g., "DO NOT MODIFY main.go").
  • Mention expected output location.

You are an expert Go programmer.
Your task is to compile a Go program from the source
code located in `/app/src` and produce a working binary at `/app/bin/main`.
Do not modify any Go files.

environment/

└── example-task/
    ├── task.toml
    ├── instruction.md
    ├── environment/     ◄
    │   ├── Dockerfile   ◄
    │   └── main.go      ◄
    ├── solution/
    │   └── solve.sh
    └── tests/
        └── test.sh

The world your task lives in

  • Dockerfile: the environment in which the task runs.
  • Input files: code and data (must be copied into the image explicitly).
  • docker-compose.yaml (optional) for complex setups or network isolation.

tests/

└── example-task/
    ├── task.toml
    ├── instruction.md
    ├── environment/
    │   ├── Dockerfile
    │   └── main.go
    ├── solution/
    │   └── solve.sh
    └── tests/           ◄
        └── test.sh      ◄

How we know the agent succeeded

  • test.sh: the main verification script.
  • It executes tests (e.g. pytest, diff) inside the environment.
  • It writes a score of 1 (success) or 0 (failure) to /logs/verifier/reward.txt.
  • Multidimensional metrics are supported via reward.json.
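
For the example task, whose instruction asks for a binary at /app/bin/main, the verification step could be sketched roughly as below. This is an illustration only: the helper names and the BIN_PATH / REWARD_FILE overrides are assumptions added for local dry runs, and a real test.sh would usually also run the binary and check its output.

```shell
#!/usr/bin/env bash
# Sketch of tests/test.sh for the example task.
# The environment-variable overrides are hypothetical; the defaults are the
# paths named in the instruction and in the reward-file convention.
BIN_PATH="${BIN_PATH:-/app/bin/main}"
REWARD_FILE="${REWARD_FILE:-/logs/verifier/reward.txt}"

# Print 1 if the given path is a non-empty executable file, otherwise 0.
score_binary() {
    if [ -f "$1" ] && [ -x "$1" ] && [ -s "$1" ]; then
        echo 1
    else
        echo 0
    fi
}

# Write the score to the reward file, creating its directory if needed.
write_reward() {
    mkdir -p "$(dirname "$2")" && printf '%s\n' "$1" > "$2"
}

write_reward "$(score_binary "$BIN_PATH")" "$REWARD_FILE" \
    || echo "could not write $REWARD_FILE" >&2
```

Keeping the scoring and the writing in separate functions makes it easy to swap in a stricter check (pytest, diff) without touching the reward-file handling.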

solution/

└── example-task/
    ├── task.toml
    ├── instruction.md
    ├── environment/
    │   ├── Dockerfile
    │   └── main.go
    ├── solution/        ◄
    │   └── solve.sh     ◄
    └── tests/
        └── test.sh

Reference (oracle) solution

  • Running solve.sh should make all tests pass.
  • Optional, but useful for validating the tests.
  • Required for public benchmarks, e.g. TerminalBench.

harbor run -p "tasks/example-task" --agent oracle
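
For a task this small, the oracle script can be very short. The sketch below assumes that the Dockerfile copies main.go to /app/src/main.go and that the Go toolchain is installed in the image; neither is stated explicitly above. The `build_binary` helper and the SRC_FILE / OUT_BIN / GO_CMD overrides are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch of solution/solve.sh for the example task.
# Assumption: the source file lives at /app/src/main.go inside the container.
SRC_FILE="${SRC_FILE:-/app/src/main.go}"
OUT_BIN="${OUT_BIN:-/app/bin/main}"
GO_CMD="${GO_CMD:-go}"   # hypothetical override, handy for dry runs

# Compile one Go source file ($1) into a binary at the given path ($2).
build_binary() {
    mkdir -p "$(dirname "$2")" && "$GO_CMD" build -o "$2" "$1"
}

if [ -f "$SRC_FILE" ]; then
    build_binary "$SRC_FILE" "$OUT_BIN"
else
    # Outside the task container there is nothing to build.
    echo "source file $SRC_FILE not found; nothing to build" >&2
fi
```

The script only compiles and never edits the source, which matches the constraint in the instruction.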

Running the example task

export OPENROUTER_API_KEY=...
harbor run -p "tasks/example-task" \
  --agent terminus-2 \
  --model openrouter/anthropic/claude-haiku-4.5

An interactive preview of the results:

harbor view jobs/

Agents

export ANTHROPIC_API_KEY=sk-ant-api03-...
harbor run -p "tasks/example-task" --agent claude-code

The default agent is terminus-2, by Laude Institute.

Multiple agents are available: gemini-cli, claude-code, codex, cursor-cli.

Environments

harbor run -p "tasks/example-task" \
  --agent terminus-2 \
  --model openrouter/anthropic/claude-haiku-4.5 \
  --env daytona

  • The default is a local Docker environment.
  • Remote environments such as Daytona are supported out of the box.
  • Daytona requires DAYTONA_API_KEY to be set.

Task inspiration