AXME Cloud · 3 min read

I Stopped Building Webhook Retry Logic. Here's What I Use Instead.

Exponential backoff, jitter, dead letter queues, idempotency keys, HMAC verification - all to deliver one message reliably. There are better options now.

Every backend team eventually builds the same thing: reliable message delivery between services. And every team builds it wrong at least once.

The Webhook Retry Stack

Here’s what “just use webhooks” actually means in production:

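# Receiver: build an HTTP endpoint (assuming FastAPI; helpers like
# verify_hmac, db, and process_order are yours to write)
import json

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/webhooks/orders")
async def receive_order(req: Request):
    body = await req.body()  # raw bytes, needed for signature verification

    # Verify HMAC signature (or get spoofed)
    signature = req.headers.get("x-webhook-signature")
    if not signature or not verify_hmac(signature, body, WEBHOOK_SECRET):
        return JSONResponse({"error": "invalid signature"}, status_code=401)

    # Idempotency check (webhooks arrive twice, sometimes three times)
    idempotency_key = req.headers.get("x-idempotency-key")
    if db.exists("processed_webhooks", idempotency_key):
        return {"status": "already processed"}

    process_order(json.loads(body))
    db.insert("processed_webhooks", idempotency_key)
    return {"status": "ok"}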

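# Sender: retry with backoff
import asyncio
import random

import requests

async def send_with_retry(url, payload, max_retries=5):
    for attempt in range(max_retries):
        try:
            # requests is blocking, so run it in a thread off the event loop
            resp = await asyncio.to_thread(
                requests.post, url, json=payload, headers=sign(payload), timeout=10
            )
            if 200 <= resp.status_code < 300:
                return resp
            if resp.status_code < 500:
                break  # most 4xx responses will not succeed on retry
        except (requests.ConnectionError, requests.Timeout):
            pass  # network failure: retry like a 5xx
        # Exponential backoff with jitter, capped at 5 minutes
        delay = min(2 ** attempt + random.uniform(0, 1), 300)
        await asyncio.sleep(delay)
    dlq.send(payload)  # dead letter queue
    alert("Webhook delivery failed; payload moved to DLQ")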
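The snippets above call sign() and verify_hmac() without defining them. A minimal sketch of what they might look like, using only the standard library. The secret value, the canonical_body() helper, and the choice of SHA-256 with a hex digest are assumptions for illustration, not part of the original example:

```python
import hashlib
import hmac
import json
from typing import Optional

WEBHOOK_SECRET = b"replace-me"  # hypothetical shared secret


def canonical_body(payload: dict) -> bytes:
    """Serialize deterministically so both sides sign identical bytes."""
    return json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()


def sign(payload: dict, secret: bytes = WEBHOOK_SECRET) -> dict:
    """Return the signature header for a payload."""
    digest = hmac.new(secret, canonical_body(payload), hashlib.sha256).hexdigest()
    return {"x-webhook-signature": digest}


def verify_hmac(signature: Optional[str], body: bytes,
                secret: bytes = WEBHOOK_SECRET) -> bool:
    """Recompute the digest over the raw body and compare in constant time."""
    if not signature:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature.encode(), expected.encode())
```

One catch with this sketch: the sender has to transmit exactly the bytes it signed (for example data=canonical_body(payload) instead of json=payload), otherwise the receiver's digest over the raw body will not match.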

And this is the simplified version. Production adds:

  • DLQ consumer that retries or alerts
  • Monitoring for delivery success rate
  • Alerting on DLQ depth
  • Cleanup cron for the idempotency table
  • Secret rotation for HMAC keys
  • Circuit breaker when receiver is down
  • Thundering herd protection when receiver comes back up
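To give a sense of what one of these items costs, here is a rough sketch of a circuit breaker. The class name, thresholds, and injectable clock are illustrative assumptions; a production version would also need per-receiver state and thread safety:

```python
import time


class CircuitBreaker:
    """Stop calling a failing receiver, then allow a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through once the cooldown has passed
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

The sender would check allow_request() before each POST and call record_success() or record_failure() afterwards. That is one more piece of state to store, test, and monitor.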

That’s 200+ lines of infrastructure code. For every pair of services that need to talk.

The Alternative: Let the Platform Deliver

import os

from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

intent_id = client.send_intent({
    "intent_type": "intent.order.process.v1",
    "to_agent": "agent://myorg/production/order-processor",
    "payload": {
        "order_id": "ORD-2026-00142",
        "customer": "acme-corp",
        "total": 4999.50,
    },
})
result = client.wait_for(intent_id)

No webhook endpoint on the receiver. No HMAC. No idempotency table. No retry logic. No DLQ. No monitoring for delivery failures.

The platform handles at-least-once delivery on all channels.
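For readers unfamiliar with the term, "at-least-once" means a message is redelivered until it is acknowledged, so it can arrive more than once. The toy in-memory queue below illustrates the general idea only. It is not AXME's implementation, and every name in it is made up for the example:

```python
from collections import deque


class AtLeastOnceQueue:
    """Toy queue: a message is redelivered until the handler acknowledges it."""

    def __init__(self, max_attempts=5):
        self.pending = deque()
        self.dead = []  # message ids that exhausted their attempts
        self.max_attempts = max_attempts

    def send(self, message_id, payload):
        self.pending.append((message_id, payload, 0))

    def deliver_all(self, handler):
        while self.pending:
            message_id, payload, attempts = self.pending.popleft()
            try:
                acked = handler(message_id, payload)
            except Exception:
                acked = False
            if acked:
                continue
            if attempts + 1 >= self.max_attempts:
                self.dead.append(message_id)
            else:
                self.pending.append((message_id, payload, attempts + 1))


processed = set()
deliveries = []


def flaky_handler(message_id, payload):
    deliveries.append(message_id)
    if message_id not in processed:
        processed.add(message_id)  # the side effect happens once
        return False               # simulate an ack that got lost
    return True                    # duplicate delivery: skip the work, just ack


queue = AtLeastOnceQueue()
queue.send("ORD-2026-00142", {"total": 4999.50})
queue.deliver_all(flaky_handler)
```

The message is delivered twice but processed once. Something has to absorb the duplicate, and the point of a delivery platform is that this bookkeeping lives there instead of in each service.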

Five Ways to Receive (Not Just Webhooks)

The receiver picks the delivery mode that fits their architecture:

| Mode | Transport | Best For |
| --- | --- | --- |
| stream | SSE (server-sent events) | Real-time agents, always-on services |
| poll | GET request | Serverless functions, cron jobs |
| http | Webhook POST | Traditional services (but platform handles retry) |
| inbox | Human queue | Approvals, reviews, manual tasks |
| internal | Platform-handled | Reminders, escalations, notifications |

The sender doesn’t care which mode the receiver uses. send_intent() is the same regardless.

This is the key difference from webhooks: the receiver chooses how to get messages, not the sender. The sender doesn’t need to know if the receiver is a Lambda function, a Kubernetes pod, or a human with an email inbox.
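The decoupling can be modeled in a few lines. The sketch below is a toy broker, not the AXME API: ToyBroker and its methods are hypothetical names, and only two of the five modes (a push callback and a poll mailbox) are represented:

```python
from typing import Callable, Dict, List


class ToyBroker:
    """Receivers register how they want messages; senders only know the address."""

    def __init__(self):
        self._push: Dict[str, Callable[[dict], None]] = {}
        self._mailboxes: Dict[str, List[dict]] = {}

    def register_push(self, address: str, callback: Callable[[dict], None]):
        self._push[address] = callback

    def register_poll(self, address: str):
        self._mailboxes[address] = []

    def send(self, address: str, payload: dict):
        # The sender's call is identical whatever mode the receiver picked
        if address in self._push:
            self._push[address](payload)
        elif address in self._mailboxes:
            self._mailboxes[address].append(payload)
        else:
            raise KeyError(f"unknown receiver: {address}")

    def poll(self, address: str) -> List[dict]:
        # Hand over everything waiting and empty the mailbox
        messages, self._mailboxes[address] = self._mailboxes[address], []
        return messages


broker = ToyBroker()
pushed = []
broker.register_push("order-processor", pushed.append)
broker.register_poll("nightly-report")

for address in ("order-processor", "nightly-report"):
    broker.send(address, {"order_id": "ORD-2026-00142"})

polled = broker.poll("nightly-report")
```

Switching a receiver from push to poll changes one registration call on the receiver side and nothing in the sender.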

What You Stop Building

| Component | With Webhooks | With AXME |
| --- | --- | --- |
| HTTP endpoint on receiver | You build it | Not needed (for stream/poll/inbox) |
| HMAC verification | You build it | Platform handles |
| Idempotency table | You build it | Built into intent lifecycle |
| Retry with backoff | You build it | Platform handles (configurable) |
| Dead letter queue | You build it | Platform handles |
| Delivery monitoring | You build it | Built-in lifecycle events |
| Secret rotation | You manage | Platform manages |
| Thundering herd protection | You build it | Platform handles |

When Webhooks Are Still Fine

Webhooks work well when:

  • The receiver is always up (99.9%+ uptime)
  • Occasional message loss is acceptable
  • You only have 2-3 service pairs communicating
  • You already have the retry infrastructure built

Webhooks break down when:

  • You have 10+ services that need reliable delivery
  • Receivers go down for minutes/hours (deploys, incidents)
  • You need delivery guarantees (financial transactions, compliance)
  • You need human approval gates in the delivery chain
  • You’re tired of debugging “why didn’t the webhook arrive”

Try It

Working example - sender submits an order, receiver processes it via SSE stream, no webhook endpoint needed:

github.com/AxmeAI/reliable-delivery-without-webhooks

Python, TypeScript, and Go implementations included.

Built with AXME - 5 delivery bindings with at-least-once guarantees. Alpha - feedback welcome.
