Project Details

Alertive Webhook Service - Real-Time Event Delivery for NHS Trusts

I built a serverless webhook service that delivers real-time conversation events to each trust's own systems, handling encryption, authentication, concurrent delivery, and automatic retries.

Alertive is a secure messaging platform used by NHS hospitals to replace legacy pager systems. As trusts adopted the platform, they needed a way to feed conversation activity into their own infrastructure (patient management systems, analytics dashboards, internal tooling) in real time, as events happened.

The Alertive Webhook Service answers that need. It's a lightweight, fault-tolerant event-processing pipeline that listens for events from the Alertive messaging platform, things like new messages, read receipts, and conversation closures, and pushes them securely to each trust's own systems with automatic retries and flexible authentication.

The Challenge

NHS trusts needed to know the moment something happened in a conversation (a message arrived, a response was sent, a case was closed) and have that information land reliably in their own systems. Every trust had different security requirements, different endpoints, and different preferences for which events mattered to them. The service needed to handle a firehose of encrypted conversation data, deliver webhooks to multiple destinations per trust simultaneously, and keep going when downstream systems were temporarily unavailable, all within a tight 30-second processing window.

Key requirements included:

  • Decrypting sensitive conversation data on the fly using AES-256 encryption
  • Supporting multiple authentication methods per endpoint (OAuth2, token-based, username/password)
  • Delivering to several endpoints at once, each with its own independent retry logic
  • Letting each trust choose which conversation events trigger notifications
  • Running across three isolated deployment environments with tightly scoped permissions
  • Automatically cleaning up temporary conversation data after 30 days
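To make the first requirement concrete, here is a minimal sketch of on-the-fly decryption with a graceful fallback. The cipher mode (AES-256-GCM), the payload layout (IV and auth tag prepended to the ciphertext), and the function name are illustrative assumptions, not the production design:

```javascript
// Hypothetical sketch: assumes AES-256-GCM with a 12-byte IV and 16-byte
// auth tag prepended to the base64-encoded ciphertext.
import crypto from 'node:crypto';

function decryptField(key, payloadB64) {
  try {
    const buf = Buffer.from(payloadB64, 'base64');
    const iv = buf.subarray(0, 12);    // 96-bit IV, the GCM convention
    const tag = buf.subarray(12, 28);  // 128-bit authentication tag
    const ciphertext = buf.subarray(28);
    const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
  } catch {
    // Graceful fallback: an undecryptable field never crashes the batch
    return null;
  }
}
```

The important property is the catch-all fallback: a corrupt or mis-keyed payload yields `null` rather than an exception, so one bad field can't take down a whole batch of events.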

Technical Architecture

Event Processing

The service picks up batched events from a Kinesis stream, unpacks each one, and decides what to do based on the event type. When a new conversation starts, its webhook settings are saved for later. When something happens during a conversation (a message comes in, someone reads it, the conversation closes), the service looks up which endpoints care about that event and kicks off delivery. Encrypted message content and user names are decrypted using AES-256, and the service falls back gracefully if decryption hits a snag.
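The routing step might be sketched like this. The event type names and handler actions are illustrative assumptions, not the production schema; the base64 unwrapping, however, is how Kinesis hands records to Lambda:

```javascript
// Hypothetical event-type routing table (names are assumptions)
const routes = {
  'conversation.started': 'storeWebhookBindings',
  'message.received': 'deliverWebhooks',
  'message.read': 'deliverWebhooks',
  'conversation.closed': 'deliverWebhooks',
};

function routeRecord(record) {
  // Kinesis delivers each payload base64-encoded under record.kinesis.data
  const event = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'));
  // Unknown event types are skipped (and logged) rather than failing the batch
  return { type: event.type, action: routes[event.type] ?? 'skip' };
}
```

Treating unknown types as a skip rather than an error is what lets new event types roll out upstream without breaking existing deployments.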

Data Storage

Two DynamoDB tables keep track of webhook state. One stores trust-level configuration: endpoint URLs, credential references, which events to listen for, and timeout and retry settings. The other stores per-conversation webhook bindings that automatically expire after 30 days, so there's no need for manual cleanup jobs. This separation keeps stable trust config cleanly apart from short-lived conversation data.
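The 30-day expiry falls out of DynamoDB's TTL feature: write an epoch-seconds timestamp on each item and point the table's TTL setting at that attribute. A minimal sketch of building such a binding item (field names like `expiresAt` are assumptions):

```javascript
// Hypothetical per-conversation binding item with a DynamoDB TTL attribute.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

function buildConversationBinding(conversationId, endpoints, now = Date.now()) {
  return {
    conversationId,  // partition key
    endpoints,       // snapshot of the trust's webhook config at creation time
    // DynamoDB deletes the item in the background once this timestamp passes
    expiresAt: Math.floor(now / 1000) + THIRTY_DAYS_SECONDS,
  };
}
```

With TTL enabled on `expiresAt`, DynamoDB handles deletion itself, which is why no scheduled cleanup job is needed.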

Authentication

Each webhook endpoint points to a named secret in AWS Secrets Manager holding the credentials for that destination. The service supports three authentication styles: OAuth2 (the full client credentials handshake, with token caching so it doesn't re-authenticate on every request), Bearer tokens (a simple pass-through), and Basic auth (username and password). Trusts can rotate credentials in Secrets Manager without any code changes.
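The OAuth2 token caching might look something like the following. The cache shape, the 60-second expiry margin, and the injected `fetchToken` callback are assumptions for the sketch; in the real service the client credentials come from AWS Secrets Manager:

```javascript
// Hypothetical token cache keyed by secret name (all names are assumptions)
const tokenCache = new Map();

async function getAccessToken(secretName, fetchToken, now = Date.now()) {
  const cached = tokenCache.get(secretName);
  // Reuse the cached token until shortly before it expires (60s safety margin)
  if (cached && cached.expiresAt - 60_000 > now) return cached.accessToken;

  // fetchToken performs the client-credentials POST against the token endpoint
  const { access_token, expires_in } = await fetchToken(secretName);
  tokenCache.set(secretName, {
    accessToken: access_token,
    expiresAt: now + expires_in * 1000,
  });
  return access_token;
}
```

Caching per secret name means each destination keeps its own token, and the safety margin avoids sending a token that expires mid-flight.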

Delivery

Webhooks fire off to all configured endpoints at the same time using concurrent requests. Each endpoint gets its own retry logic: if delivery fails, the service waits and tries again with progressively longer delays (1s, 2s, 4s, capped at 5s). It stops retrying immediately on permanent errors (like a bad request) but keeps trying on temporary failures. Connection pooling keeps things efficient across retries.
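A sketch of that delivery logic, under stated assumptions: the delay schedule matches the description above, but the error classification (4xx permanent except 429, 5xx and network errors retryable) and the `send` callback are simplifications of whatever the production HTTP layer does:

```javascript
// Exponential backoff: 1s, 2s, 4s, then capped at 5s
const backoffMs = (attempt) => Math.min(1000 * 2 ** attempt, 5000);

// Assumption: 4xx (except 429) won't succeed on retry; 5xx and
// network errors (no status at all) might
const isRetryable = (status) => status === undefined || status === 429 || status >= 500;

async function deliverWithRetry(send, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (!isRetryable(err.status) || attempt === maxAttempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
}

// All endpoints are attempted concurrently; one failure never blocks the rest
const deliverToAll = (endpoints, sendTo) =>
  Promise.allSettled(endpoints.map((ep) => deliverWithRetry(() => sendTo(ep))));
```

`Promise.allSettled` (rather than `Promise.all`) is the key choice for fan-out: a rejected delivery to one endpoint is recorded as a failure without aborting the deliveries still in flight.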

Key Features

  • Flexible Authentication: OAuth2, Bearer tokens, and Basic auth on a per-endpoint basis, with credentials managed entirely through AWS Secrets Manager; rotate them any time without touching code.
  • Smart Retries: Configurable retry attempts with progressive backoff delays, and intelligent behaviour that stops retrying when the error isn't going to fix itself.
  • Event Filtering: Trusts choose exactly which events trigger webhooks, and each conversation inherits those preferences automatically at creation time.
  • Encrypted Payload Handling: Sensitive message content decrypted on the fly with environment aware key management.
  • Built to Keep Going: Failed records, decryption hiccups, missing configurations, and delivery failures are all logged but never crash the service; it processes what it can and moves on.
  • Self-Cleaning Storage: Conversation webhook bindings expire automatically after 30 days, preventing data buildup without needing scheduled maintenance.
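The event-filtering feature reduces to a simple lookup against each trust's subscription list. A minimal sketch, with hypothetical field names (`endpoints`, `events`):

```javascript
// Only endpoints subscribed to this event type receive a webhook
function endpointsForEvent(trustConfig, eventType) {
  return trustConfig.endpoints.filter((ep) => ep.events.includes(eventType));
}
```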

Results

  • Processes up to 100 events per batch within the 30 second execution window
  • Delivers concurrently to multiple authenticated endpoints per trust
  • Runs across three isolated environments (testing, staging, production) in separate AWS accounts
  • Thorough test coverage across unit and integration tests with mocked AWS services and HTTPS calls

Technologies Used

Backend: Node.js 22, ES Modules, AWS Lambda, Kinesis
Data: DynamoDB, AWS Secrets Manager, AES-256 encryption
Networking: Axios with connection pooling, OAuth2 client credentials flow, exponential backoff retries
Infrastructure: AWS SAM (CloudFormation), scoped IAM policies, X-Ray tracing, structured CloudWatch logging
CI/CD: Bitbucket Pipelines, manual deployment gates, multi account deployments
Testing: Jest, aws-sdk-client-mock, Nock HTTP mocking, ESLint
