A continuous shipping platform for engineering teams. Workflows live in your repo. Agents are first-class. Tickets, alerts, and PRs trigger pipelines directly, with no webhook glue.
Shipfox is a GitOps workflow orchestration engine for engineering teams. Define workflows and agents in YAML under .shipfox/. They're versioned with your code, reviewed in PRs, and snapshotted at every trigger. If you've written a CI pipeline, you already know how this works.
Workflows are YAML files in .shipfox/. Version-controlled, reviewable, auditable. No separate platform to configure.
Define reusable agents with a model, tools, MCP servers, system prompt, and structured output schema. Reuse the CLAUDE.md / AGENTS.md and Claude skills already in your repo; agents pick them up automatically.
Each job runs in its own VM with your code checked out: isolated, reproducible, and parallel by default. Compose jobs into arbitrarily complex graphs: fan-out, fan-in, branches, loops, matrices, event-driven waits.
A Sentry alert, a Linear ticket, a GitHub push, a Slack message, a cron schedule, a raw webhook. Every integration is a native trigger, not glue code.
.shipfox/
├── workflows/
│   ├── triage-sentry.yml
│   ├── review-pr.yml
│   ├── plan-and-build.yml
│   └── best-of-n.yml
└── agents/
    ├── coder.yml
    ├── reviewer.yml
    ├── planner.yml
    └── diagnostician.yml
Tickets in, pull requests out. Sentry alerts in, fixes out. Every integration is a first-class trigger or a tool an agent can call.
Four patterns that cover most of what engineering teams automate. Each one is a single YAML file in your repo.
A Sentry issue fires the workflow. A diagnostician agent analyzes the error and decides: auto-fixable or needs a human? The fix path writes a patch, runs tests, and opens a PR. The escalate path pages oncall with full context.
triggers:
  - source: sentry
    event: new_issue
jobs:
  triage:
    agent: diagnostician
    output:
      action: enum(fix, escalate)
  route:
    needs: triage
    branch:
      fix: coder → pytest → gh pr create
      # Quoted so that "#oncall" is not parsed as a YAML comment.
      escalate: "page #oncall with context"
A producer agent writes a fix. Ordered gates verify it: first tests, then a reviewer agent. If anything fails, the producer gets the feedback and tries again. Sessions persist across rounds so the reviewer remembers what it already flagged.
loop:
  max_rounds: 3
  producer:
    agent: coder
    session: persistent
  gates:
    - type: shell
      run: pytest tests/ -x
    - type: agent
      agent: reviewer
      session: persistent
      approve_when: verdict == "approve"
A Linear ticket assigned to @shipfox triggers a planner agent that posts a proposed plan as a GitHub issue. The workflow then sleeps, releasing its runner. When a human comments, it wakes, revises the plan, and sleeps again. When the reviewer comments /approve, implementation begins with full context. A pipeline that runs over days, not minutes.
jobs:
  review-plan:
    on:
      events:
        - source: github
          event: issue.comment
      until: event.body contains "/approve"
      debounce: 5m
    agent: planner
    session: persistent
    inherit_session: create-plan.planner
The same bug is sent to three different LLMs in parallel. Each gets its own copy-on-write worktree, so there are no conflicts. A reviewer agent compares all patches and picks the best one.
jobs:
  race:
    matrix:
      each: model
      in: [opus-4.7, gpt-5.4, qwen3.6-max]
      concurrency: 3
    isolation: worktree
    agent: coder
    model: "{{ matrix.model }}"
  judge:
    needs: race
    agent: reviewer
    prompt: Pick the best patch.
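The fan-out and fan-in shape of this pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not how Shipfox runs it: `best_of_n`, `toy_attempt`, and `toy_judge` are hypothetical names, the model identifiers are placeholders, and worktree isolation is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def best_of_n(
    models: List[str],
    attempt: Callable[[str], str],
    judge: Callable[[Dict[str, str]], str],
    concurrency: int = 3,
) -> str:
    """Run one attempt per model in parallel, then let the judge pick a winning model."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, so zip pairs each model with its patch.
        patches = dict(zip(models, pool.map(attempt, models)))
    return judge(patches)


def toy_attempt(model: str) -> str:
    # Stand-in for a coder agent: patch size grows with the length of the model name.
    return "fix\n" * len(model)


def toy_judge(patches: Dict[str, str]) -> str:
    # Stand-in for a reviewer agent: prefer the smallest patch.
    return min(patches, key=lambda m: len(patches[m]))


MODELS = ["model-a", "model-bb", "model-ccc"]  # placeholder names
winner = best_of_n(MODELS, toy_attempt, toy_judge, concurrency=3)
```

A real judge would compare the patches on correctness rather than size; the point of the sketch is only that the attempts are independent and the judge sees all of them at once.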
Workflows aren't write-once. After every execution, Shipfox reflects on what happened, drafts changes, and accumulates a project memory that all your agents share.
After every run, the system reviews what worked, what failed, what was slow, and what cost too much. No manual triage required.
It proposes concrete changes: better prompts, tighter output schemas, missing edge cases in your YAML, or fixes directly in the codebase. Changes are submitted as PRs you review like anything else.
Patterns, past failures, conventions, and decisions accumulate into a shared project memory that every agent in every workflow can read. The tenth workflow is cheaper and faster than the first.
The control plane teams need to operate this safely at scale: every run logged, every model swappable, every dollar accounted for.
Every run logged end-to-end: trigger source, agent sessions, tool calls, outputs, duration, cost. Full audit trail across your entire fleet. No black boxes.
Not locked into one provider. Use Anthropic, OpenAI, Google, Mistral, Qwen, DeepSeek, or any open-weight model. Plug in your own inference APIs or API keys. Mix models within a single workflow. Switch without rewriting.
Set budgets per workflow, per team, or per model. Hard caps and soft alerts. Real-time spend dashboard. No surprise bills from a runaway agent loop.
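As a rough illustration of how a hard cap and a soft alert differ, here is a small Python sketch. The `Budget` class, its field names, and the dollar figures are hypothetical and are not part of Shipfox. It assumes a hard cap rejects any charge that would exceed it, while a soft alert only records a warning.

```python
from dataclasses import dataclass, field
from typing import List


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the hard cap."""


@dataclass
class Budget:
    hard_cap_usd: float
    soft_alert_usd: float
    spent_usd: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def charge(self, cost_usd: float) -> None:
        # Hard cap: refuse the charge before any money is spent.
        if self.spent_usd + cost_usd > self.hard_cap_usd:
            raise BudgetExceeded(f"charge of ${cost_usd:.2f} would exceed the cap")
        self.spent_usd += cost_usd
        # Soft alert: record a warning but let the work continue.
        if self.spent_usd >= self.soft_alert_usd:
            self.alerts.append(f"soft alert: ${self.spent_usd:.2f} spent")


# A runaway loop that keeps calling a model at $3.00 per call.
budget = Budget(hard_cap_usd=10.0, soft_alert_usd=8.0)
calls_made = 0
try:
    while True:
        budget.charge(3.0)
        calls_made += 1
except BudgetExceeded:
    pass  # the loop is cut off instead of running up the bill
```

Here the loop is stopped on its fourth call: three calls go through, the third one trips the soft alert, and the fourth is refused by the hard cap.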
The same engine, same YAML, same agents. Start free on your own infra, move to managed cloud when you stop wanting to wake up at 3am, or keep it all in your VPC.