NAME
ralph — the KAHN retry loop. Exponential backoff scheduler for failed done_when conditions.
DESCRIPTION
Ralph is the component responsible for retrying nodes that fail their done_when condition. It implements exponential backoff with a configurable budget (maximum iterations per node) and respects interrupts cleanly.
Every failed done_when evaluation increments that node's retry counter. When the counter reaches the node's budget, Ralph marks the node as FAILED and notifies its dependents.
BACKOFF STRATEGY
iteration 1: 0ms (immediate)
iteration 2: 100ms
iteration 3: 200ms
iteration 4: 400ms
iteration 5: 800ms
iteration 6: 1600ms
iteration 7: 3200ms
...
iteration N (N >= 2): min(100ms * 2^(N-2), 60s) # capped at 60 seconds; the cap is first reached at iteration 12
All delays are subject to jitter (±10%) to avoid a thundering herd during mass retries.
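The schedule above can be sketched in a few lines of Python. This is an illustration only, not Ralph's actual implementation: the names base_delay_ms and jittered_delay_ms are hypothetical, and two details are assumptions that the text does not specify, namely that jitter is drawn uniformly and that it is applied after the 60-second cap (so a capped delay may land slightly above or below 60s).

```python
import random
from typing import Optional

BASE_MS = 100      # nominal delay before iteration 2
CAP_MS = 60_000    # 60-second ceiling
JITTER = 0.10      # +/-10%

def base_delay_ms(iteration: int) -> int:
    """Nominal delay before a 1-based iteration, without jitter."""
    if iteration < 1:
        raise ValueError("iteration is 1-based")
    if iteration == 1:
        return 0  # the first attempt runs immediately
    return min(BASE_MS * 2 ** (iteration - 2), CAP_MS)

def jittered_delay_ms(iteration: int, rng: Optional[random.Random] = None) -> float:
    """Nominal delay scaled by a uniform factor in [0.9, 1.1] (assumed distribution)."""
    source = rng if rng is not None else random
    return base_delay_ms(iteration) * source.uniform(1 - JITTER, 1 + JITTER)
```

For example, base_delay_ms(7) gives 3200 and base_delay_ms(12) gives 60000, matching the table and the cap.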
BUDGET TRACKING
Each node has a retry budget (default: 10 iterations). After 10 failed evaluations, Ralph marks the node as FAILED and logs the reason. The run continues with dependent nodes if possible.
Budgets are per-node and per-run. They reset when a new run begins.
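A minimal sketch of per-node, per-run budget tracking, assuming the node is marked FAILED on the evaluation that brings its failure count up to the budget. The class BudgetTracker and its method names are hypothetical and are not part of Ralph's interface.

```python
from collections import defaultdict

DEFAULT_BUDGET = 10  # default iterations per node, as stated above

class BudgetTracker:
    """Counts failed done_when evaluations per node for a single run."""

    def __init__(self, budget: int = DEFAULT_BUDGET):
        self.budget = budget
        self.failures = defaultdict(int)  # node name -> failed evaluations
        self.failed = set()               # nodes that exhausted their budget

    def record_failure(self, node: str) -> bool:
        """Record one failure; return True if the node may still be retried."""
        self.failures[node] += 1
        if self.failures[node] >= self.budget:
            self.failed.add(node)
            return False
        return True

    def reset(self) -> None:
        """Clear all counters; call this when a new run begins."""
        self.failures.clear()
        self.failed.clear()
```

Keeping the counters in one object per run makes the reset rule trivial: a new run either calls reset() or creates a fresh tracker.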
DONE_WHEN EVALUATION
Ralph executes a shell expression (the node's done_when field) in the node's working directory. The expression must exit with code 0 to indicate success.
done_when: "cargo test && ./scripts/smoke.sh"
done_when: "[ -f build/output.txt ]"
done_when: "curl http://localhost:8080/health | jq .status | grep -q online"
If the expression exits with a non-zero code, Ralph logs the failure and schedules a retry after the backoff delay.
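The evaluation step can be approximated as follows. This sketch assumes the expression is handed to the system shell (/bin/sh on POSIX, via shell=True) and that only the exit code matters; which shell Ralph really uses is not stated here, and the function name done_when_satisfied is hypothetical.

```python
import subprocess

def done_when_satisfied(expression: str, workdir: str) -> bool:
    """Run a done_when expression in workdir; True only if it exits with code 0."""
    result = subprocess.run(expression, shell=True, cwd=workdir)
    return result.returncode == 0
```

For a pipeline such as the curl example above, the shell reports the exit code of the last command (grep) unless an option like pipefail is set, so a failing curl is only detected because grep then finds no match.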
EXIT CODES
0    all nodes converged (all done_when satisfied)
1    at least one node exhausted its budget without converging
SEE ALSO
COLOPHON
Ralph is the retry loop component of the KAHN orchestrator. Named after Ralph Langley, because determination and persistence in the face of obstacles are features, not bugs. See https://kahn.tools for details.