Shipfox: continuous shipping platform

Your software factory.

A continuous shipping platform for engineering teams. Workflows live in your repo. Agents are first-class. Tickets, alerts, and PRs trigger pipelines directly, with no webhook glue.

MIT licensed · Self-host or managed · Use any model · Full GitOps
workflow.triage-sentry · run #4,128 · acme/payments-api · Live · elapsed 12.4s
sentry.new_issue · SEN-2812 · NullPointer · 0.4s
triage · diagnostician · 8.2s · action: fix
route → fix · branch on output
coder · claude-sonnet · writing patch · 2.1s
pytest tests/ · queued
page #oncall · skipped (branch)
gh pr create · acme/payments-api · queued
running · 8a3f2c1 · defined in .shipfox/workflows/triage-sentry.yml · $0.043 spent · 3 of 5 jobs
/how-it-works

A workflow engine that lives in your codebase.

Shipfox is a GitOps workflow orchestration engine for engineering teams. Define workflows and agents in YAML under .shipfox/. They're versioned with your code, reviewed in PRs, and snapshotted at every trigger. If you've written a CI pipeline, you already know how this works.

01

Lives in your repo

Workflows are YAML files in .shipfox/. Version-controlled, reviewable, auditable. No separate platform to configure.

02

Agents are first-class

Define reusable agents with a model, tools, MCP servers, system prompt, and structured output schema. Reuse the CLAUDE.md / AGENTS.md and Claude skills already in your repo; agents pick them up automatically.

03

Jobs form a pipeline

Each job runs in its own VM with your code checked out: isolated, reproducible, parallel by default. Compose them into arbitrarily complex graphs: fan-out, fan-in, branches, loops, matrices, event-driven waits.

04

Starts from your tools

A Sentry alert, a Linear ticket, a GitHub push, a Slack message, a cron schedule, a raw webhook. Every integration is a native trigger, not glue code.

your-repo/.shipfox/
.shipfox/
├── workflows/
│   ├── triage-sentry.yml
│   ├── review-pr.yml
│   ├── plan-and-build.yml
│   └── best-of-n.yml
└── agents/
    ├── coder.yml
    ├── reviewer.yml
    ├── planner.yml
    └── diagnostician.yml
versioned, reviewed in PRs
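
To make "Jobs form a pipeline" concrete, here is a minimal Python sketch of how a dependency graph of jobs can be grouped into parallel waves. It is not Shipfox's scheduler: the job names and the `execution_waves` helper are hypothetical, and the only assumption taken from the page is that jobs declare what they need and otherwise run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key is a job, each value is the set of jobs it needs.
needs = {
    "triage": set(),
    "fix": {"triage"},
    "test-unit": {"fix"},            # fan-out: two test jobs depend on the same fix
    "test-integration": {"fix"},
    "open-pr": {"test-unit", "test-integration"},  # fan-in: waits for both
}


def execution_waves(graph):
    """Group jobs into waves; every job in a wave has all of its needs satisfied."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(needs)
```

Each inner list in `waves` is a set of jobs that could run at the same time, which is the sense in which a graph like this is "parallel by default".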
/integrations

Plugs into the tools your team already ships with.

Tickets in, pull requests out. Sentry alerts in, fixes out. Every integration is a first-class trigger or a tool an agent can call.

Source control: GitHub · GitLab · Bitbucket
Ticketing: Linear · Jira · Asana · Shortcut
Alerting & monitoring: Sentry · Datadog · PagerDuty · OpsGenie
Comms: Slack · Teams · Discord
Generic: Webhook · Cron
Shipfox engine

User-defined pipelines
Code changes: Open a pull request · Merge when checks pass
Tickets: Create a ticket · Comment on a ticket · Update status
Communication: Send a message · Page oncall
Deploy: Trigger a deploy · Roll back
Anything else: MCP servers · Skills · CLI tools
/use-cases

From alert to PR. From ticket to production.

Four patterns that cover most of what engineering teams automate. Each one is a single YAML file in your repo.

01 auto-triage and fix

Sentry alert → triage → fix or escalate.

A Sentry issue fires the workflow. A diagnostician agent analyzes the error and decides: auto-fixable or needs a human? The fix path writes a patch, runs tests, and opens a PR. The escalate path pages oncall with full context.

~12s end-to-end · branch on agent output
triage-sentry.yml
triggers:
  - source: sentry
    event: new_issue

jobs:
  triage:
    agent: diagnostician
    output:
      action: enum(fix, escalate)

  route:
    needs: triage
    branch:
      fix:      coder → pytest → gh pr create
      escalate: "page #oncall with context"
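
The branching in this workflow can be sketched in Python as a function that validates the agent's structured output against the `enum(fix, escalate)` schema and returns the steps for the chosen branch. This is only an illustration of the pattern: `Action` and `route` are hypothetical names, and the step strings are copied from the YAML above rather than from any real API.

```python
from enum import Enum


class Action(Enum):
    """The two values the triage output schema allows."""
    FIX = "fix"
    ESCALATE = "escalate"


def route(triage_output: dict) -> list[str]:
    """Pick the branch to run from the agent's structured output."""
    # Action(...) raises ValueError for anything outside the enum,
    # so a malformed agent answer fails loudly instead of picking a branch.
    action = Action(triage_output["action"])
    if action is Action.FIX:
        return ["coder", "pytest", "gh pr create"]
    return ["page #oncall with context"]
```

Constraining the output to an enum is what makes the branch safe to automate: the router never has to interpret free-form text.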
02 adversarial code review

Coder vs. reviewer. Until approved.

A producer agent writes a fix. Ordered gates verify it: first tests, then a reviewer agent. If anything fails, the producer gets the feedback and tries again. Sessions persist across rounds so the reviewer remembers what it already flagged.

max 3 rounds · persistent sessions
review-pr.yml
loop:
  max_rounds: 3
  producer:
    agent: coder
    session: persistent
  gates:
    - type: shell
      run:  pytest tests/ -x
    - type: agent
      agent: reviewer
      session: persistent
      approve_when: verdict == "approve"
03 ticket to plan to production

Linear ticket → plan → human review → ship.

A Linear ticket assigned to @shipfox triggers a planner agent that posts a proposed plan as a GitHub issue. The workflow then sleeps, releasing its runner. When a human comments, it wakes, revises the plan, and sleeps again. When the reviewer comments /approve, implementation begins with full context. A pipeline that runs over days, not minutes.

wakes on comments · runs over days
plan-and-build.yml
jobs:
  review-plan:
    on:
      events:
        - source: github
          event: issue.comment
      until: event.body contains "/approve"
      debounce: 5m
    agent: planner
    session: persistent
    inherit_session: create-plan.planner
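
One plausible reading of `debounce: 5m` combined with `until` is sketched below in Python. The semantics are an assumption, not documented behavior: a burst of comments spaced less than the debounce window apart triggers a single wake-up once the window has been quiet, and waiting ends when a burst contains `/approve`. `Comment` and `plan_wakeups` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    at: float   # seconds since the workflow started waiting
    body: str


def plan_wakeups(comments, debounce_s=300.0, stop_token="/approve"):
    """Return (wake_times, approved) for a stream of issue comments."""
    wakes = []
    approved = False
    burst = []

    def flush():
        nonlocal approved
        # Wake once, debounce_s after the last comment in the burst.
        wakes.append(burst[-1].at + debounce_s)
        approved = any(stop_token in c.body for c in burst)

    for comment in sorted(comments, key=lambda c: c.at):
        if burst and comment.at - burst[-1].at > debounce_s:
            flush()
            if approved:
                return wakes, True  # the until-condition was met; stop waiting
            burst.clear()
        burst.append(comment)
    if burst:
        flush()
    return wakes, approved


history = [
    Comment(0, "rename the table"),
    Comment(120, "also add an index"),   # same burst: only 2 minutes later
    Comment(4000, "/approve"),
]
wakes, approved = plan_wakeups(history)
```

In this example the two early comments produce a single wake-up, so the planner revises once rather than twice, and the later `/approve` ends the wait.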
04 multi-model best-of-N

3 models. 3 patches. 1 winner.

The same bug is sent to three different LLMs in parallel. Each gets its own copy-on-write worktree, so there are no conflicts. A reviewer agent compares all patches and picks the best one.

matrix execution · COW worktrees
best-of-n.yml
jobs:
  race:
    matrix:
      each: model
      in: [opus-4.7, gpt-5.4, qwen3.6-max]
    concurrency: 3
    isolation: worktree
    agent: coder
    model: "{{ matrix.model }}"

  judge:
    needs: race
    agent: reviewer
    prompt: Pick the best patch.
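
The fan-out and judge steps follow a familiar pattern, sketched here in Python. Everything in it is a placeholder: the model names are generic, the diff sizes are made-up numbers that say nothing about any real model, and "smallest diff wins" is a stand-in for whatever the reviewer agent actually decides.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up results keyed by placeholder model names; not measurements.
FAKE_DIFF_SIZES = {"model-a": 40, "model-b": 12, "model-c": 25}


def run_coder(model):
    """Stand-in for one matrix job: a coder agent working in its own worktree."""
    return {"model": model, "lines_changed": FAKE_DIFF_SIZES[model]}


def race(models, concurrency=3):
    """Fan out one job per model, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_coder, models))  # map preserves input order


def judge(patches):
    """Fan in: pick one winner. Here, simply the smallest diff."""
    return min(patches, key=lambda p: p["lines_changed"])


patches = race(["model-a", "model-b", "model-c"])
winner = judge(patches)
```

The `judge` step only starts once every `race` job has finished, which mirrors `needs: race` in the YAML.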
/self-improvement

Every run makes the next one better.

Workflows aren't write-once. After every execution, Shipfox reflects on what happened, drafts changes, and accumulates a project memory that all your agents share.

1

Review · automatic

After every run, the system reviews what worked, what failed, what was slow, and what cost too much. No manual triage required.

2

Suggest · opens PRs

It proposes concrete changes: better prompts, tighter output schemas, missing edge cases in your YAML, or fixes directly in the codebase. Changes are submitted as PRs you review like anything else.

3

Remember · project memory

Patterns, past failures, conventions, and decisions accumulate into a shared project memory that every agent in every workflow can read. The tenth workflow is cheaper and faster than the first.

/platform

Full visibility. Any model. Predictable costs.

The control plane teams need to operate this safely at scale: every run logged, every model swappable, every dollar accounted for.

Workflow observability

Every run logged end-to-end: trigger source, agent sessions, tool calls, outputs, duration, cost. Full audit trail across your entire fleet. No black boxes.

triage-sentry · 12.4s
review-pr · 2m 04s
plan-and-build · 6d 02h
best-of-n · 48.1s
  • Run history with full replay
  • Agent session logs
  • Tool call traces
  • Cost attribution per workflow

Use any model

Not locked into one provider. Use Anthropic, OpenAI, Google, Mistral, Qwen, DeepSeek, or any open-weight model. Plug in your own inference APIs or API keys. Mix models within a single workflow. Switch without rewriting.

Anthropic · OpenAI · Google · Mistral · Qwen · DeepSeek · Self-hosted (vLLM, TGI, Ollama)
  • Any public model provider
  • Bring your own API keys
  • Custom or self-hosted endpoints
  • Mix models per workflow

Cost control

Set budgets per workflow, per team, or per model. Hard caps and soft alerts. Real-time spend dashboard. No surprise bills from a runaway agent loop.

this month · platform-team · $3,840 / $6,000
⚠ alert at 80% · hard cap at 90%
  • Per-workflow budgets
  • Per-team limits
  • Spend alerts and hard caps
  • Real-time cost dashboard
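
The budget example above (an alert at 80%, a hard cap at 90%, $3,840 spent of $6,000) can be worked through with a small Python sketch. It assumes both thresholds are percentages of the monthly budget; `budget_status` is a hypothetical helper, and amounts are in integer cents to avoid floating-point comparisons.

```python
def budget_status(spent_cents, budget_cents, alert_pct=80, cap_pct=90):
    """Classify spend as 'ok', 'alert' (soft), or 'blocked' (hard cap reached)."""
    # Compare spent/budget >= pct/100 using integers only.
    if spent_cents * 100 >= budget_cents * cap_pct:
        return "blocked"
    if spent_cents * 100 >= budget_cents * alert_pct:
        return "alert"
    return "ok"


# $3,840 of $6,000 is 64% of the budget, below both thresholds.
current = budget_status(384_000, 600_000)
```

Under this reading, the alert would fire at $4,800 and new runs would be refused from $5,400.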
/deployment

Run it your way.

The same engine, same YAML, same agents. Start free on your own infra, move to managed cloud when you stop wanting to wake up at 3am, or keep it all in your VPC.

/open-source

Open source

The core engine, fully open source. Fork it, audit it, extend it. Run on your own infrastructure.
$0 forever · MIT licensed
  • Full source code
  • Community plugins
  • Self-managed runners
  • Community support
Most teams
/cloud

Cloud

Managed control plane, runners, and inference. A predictable per-seat baseline covers most teams; you pay for overage only when usage spikes.
Per developer · baseline · included
Each seat includes a monthly compute allocation and a pool of token-inference credits. Most teams stay within the baseline.
Above baseline · usage · overage
Heavy compute or inference workloads are billed by the minute and by the token. Set hard caps in the dashboard so spend can't surprise you.
  • Managed or self-hosted runners
  • Managed or BYO inference
  • Org-wide spend caps & alerts
  • SOC 2 Type II · 99.9% SLA
/enterprise

Enterprise

Cloud or fully self-hosted. Enterprise license with SSO, RBAC, air-gapped deployments, and a dedicated CSM.
Custom · talk to sales
  • Cloud or on-prem deployment
  • SSO / SAML
  • Role-based access control
  • Air-gapped support
  • Priority support · dedicated CSM
/shipfox · 2026