99.1% accuracy across 200,000+ tasks. The number gets cited a lot. Here is exactly how we produce it.
The core formula
Every task is routed to N workers (default 3). Each worker submits a result and a confidence score. We run a weighted consensus calculation:
winnerVotes = votes_for_winner / total_votes
weightedConf = mean(confidence scores for winner votes)
finalConfidence = winnerVotes × weightedConf
if finalConfidence >= min_confidence → completed
else if attempts < 3 → re_queued
else → failed
The default min_confidence is 0.85. Developers can raise or lower it per task. Raising it (e.g. to 0.95) improves accuracy at the cost of higher re-queue rates and latency. Lowering it (e.g. to 0.70) increases throughput for tasks where near-perfect precision isn't required.
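The formula and routing logic above can be sketched in a few lines of Python. This is an illustrative sketch, not the production code: `Vote` and `decide` are hypothetical names, and the thresholds match the defaults described above.

```python
# Sketch of the weighted-consensus calculation. `Vote` and `decide`
# are illustrative names, not the production API.
from dataclasses import dataclass
from collections import Counter

@dataclass
class Vote:
    result: str        # the worker's submitted answer
    confidence: float  # the worker's self-reported confidence, 0.0 to 1.0

def decide(votes, min_confidence=0.85, attempts=1):
    # winnerVotes: share of workers that agree on the most common result
    tally = Counter(v.result for v in votes)
    winner, winner_count = tally.most_common(1)[0]
    winner_votes = winner_count / len(votes)

    # weightedConf: mean confidence among workers that voted for the winner
    winner_confs = [v.confidence for v in votes if v.result == winner]
    weighted_conf = sum(winner_confs) / len(winner_confs)

    final_confidence = winner_votes * weighted_conf
    if final_confidence >= min_confidence:
        return "completed", final_confidence
    elif attempts < 3:
        return "re_queued", final_confidence
    else:
        return "failed", final_confidence
```

For a unanimous 3-0 vote with confidences 0.90, 0.94, and 0.92, this yields 1.0 × 0.92 = 0.92, which clears the default 0.85 threshold and completes.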
Why we use confidence weighting, not simple majority
Simple majority voting has a well-known failure mode: three workers who are all uncertain can outvote one worker who is highly confident. Our formula accounts for this. A 3-0 vote where all three workers submit 0.55 confidence produces a finalConfidence of 0.55 — which fails a 0.85 threshold and triggers re-queue. A 2-1 vote where the two agreeing workers submit 0.95 confidence produces a finalConfidence of about 0.63 — which may still fail, depending on your threshold.
This means high-confidence minority results surface for review rather than getting silently outvoted. Edge cases get caught, not buried.
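The two worked examples above reduce to plain arithmetic with the finalConfidence = winnerVotes × weightedConf formula:

```python
# Case 1: 3-0 vote, all three workers at 0.55 confidence
case1 = (3 / 3) * ((0.55 + 0.55 + 0.55) / 3)  # 1.0 * 0.55 = 0.55

# Case 2: 2-1 vote, the two agreeing workers at 0.95 confidence
case2 = (2 / 3) * ((0.95 + 0.95) / 2)          # ~0.667 * 0.95 ~ 0.633

# Both fall below the default 0.85 threshold and would re-queue.
```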
Re-queuing and the 3-attempt limit
When confidence falls below threshold, the task enters re_queued status with a fresh set of workers — different workers than the previous attempt to avoid anchoring effects. If confidence fails on all three attempts, the task transitions to failed status and the developer is not charged.
In practice, less than 0.4% of tasks reach 3 failed attempts. Most low-confidence tasks resolve on the second attempt when ambiguous cases are clarified by workers with more domain familiarity.
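The attempt lifecycle can be sketched as a simple loop. `run_consensus` and `assign_fresh_workers` are hypothetical stand-ins for the real pipeline; the 3-attempt cap and worker exclusion match the behavior described above.

```python
# Sketch of the 3-attempt lifecycle: each below-threshold consensus
# re-queues the task with fresh workers until the attempt cap is hit.
# `run_consensus` and `assign_fresh_workers` are hypothetical stand-ins.
MAX_ATTEMPTS = 3
MIN_CONFIDENCE = 0.85

def process_task(task, run_consensus, assign_fresh_workers):
    excluded = set()  # workers from prior attempts, to avoid anchoring
    for attempt in range(1, MAX_ATTEMPTS + 1):
        workers = assign_fresh_workers(task, exclude=excluded)
        excluded.update(workers)
        confidence = run_consensus(task, workers)
        if confidence >= MIN_CONFIDENCE:
            return "completed"
        # below threshold: task is re_queued for the next attempt
    return "failed"  # cap reached; the developer is not charged
```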
The AI fallback and consensus
If the AI fallback fires (default: 5 minutes with no human assignment), the Claude agent submits a result and confidence score into the same consensus pipeline. From the developer's perspective, the result looks identical. The optional include_completion_source field reveals whether the final result was human or AI.
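To make the "looks identical" point concrete, here is a hypothetical pair of result payloads. The exact response format is an assumption for illustration; only the `include_completion_source` field name comes from the description above.

```python
# Hypothetical result payloads; the exact shape is an assumption,
# not a documented API. By default the source is not exposed:
result_without_source = {
    "status": "completed",
    "result": "label_a",
    "confidence": 0.91,
}

# With include_completion_source enabled, the otherwise-identical
# result also reveals whether a human or the AI fallback produced it:
result_with_source = {
    "status": "completed",
    "result": "label_a",
    "confidence": 0.91,
    "completion_source": "ai",  # or "human"
}
```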