
Building a Self-Hosted Event-Driven Activity Stream with Kafka

Dephyned

How we implemented reliable activity tracking for $15/month with Docker.

The Problem

Every SaaS application needs activity tracking. Whether it’s for audit logs, user analytics, or debugging, knowing who did what and when is essential. But as your application grows, writing activity logs synchronously to your database creates problems:

  • Increased latency on every API request
  • Database contention during high-traffic periods
  • Lost activities if the database write fails
  • Tight coupling between your API and logging infrastructure

We faced this exact challenge with our home services platform. With thousands of daily API operations—creating appointments, updating customers, generating invoices—we needed a better approach.

The Solution: Self-Hosted Kafka with Docker

We implemented an event-driven activity stream using self-hosted Kafka that decouples activity logging from our main API. Everything runs in Docker containers on a single EC2 instance:

+------------------+                    +----------------------------------------+
|  Express API     |                    |         EC2 t3.small ($15/mo)          |
|  (Heroku)        |------------------->|  +----------+  +-------+  +----------+ |
+------------------+       :9092        |  |  Kafka   |  |Zookpr |  | Consumer | |
         |                              |  |  :9092   |  | :2181 |  |  (Node)  | |
         |                              |  +----------+  +-------+  +----------+ |
         | (fallback if Kafka fails)    |        |                       |       |
         |                              |        v                       v       |
         +----------------------------->|   +--------+            +-----------+  |
                                        |   |  DLQ   |            | Firestore |  |
                                        |   +--------+            +-----------+  |
                                        +----------------------------------------+

The flow:

  1. User performs an action (creates invoice, updates customer, etc.)
  2. API middleware captures the activity and sends it to Kafka
  3. Consumer (in the same Docker network) receives the message and writes to Firestore
  4. Failed messages go to a Dead Letter Queue for investigation
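
Step 2 is just an Express middleware. A minimal sketch of that capture step (the field names and the sendActivity call are illustrative, modeled on the producer described below):

// Hypothetical middleware sketch: capture the activity after the
// response is sent, so tracking adds no latency to the request.
function activityTracker(req, res, next) {
  res.on('finish', () => {
    if (res.statusCode >= 400) return; // only track successful actions

    const activityData = {
      companyId: req.user?.companyId, // doubles as the Kafka message key
      userId: req.user?.id,
      action: `${req.method} ${req.path}`,
      timestamp: new Date().toISOString(),
    };

    // Fire-and-forget: failures are handled by the circuit breaker
    // and the Firestore fallback described below.
    kafkaProducer.sendActivity(activityData).catch(() => {});
  });
  next();
}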

Why Self-Hosted?

We originally planned to use Upstash (managed Kafka), but discovered they no longer offer Kafka. After evaluating alternatives:

  • Confluent Cloud: $75+/month minimum
  • AWS MSK Serverless: $0.10/hr (~$75/month)
  • Redpanda Cloud: No free tier
  • CloudKarafka: Limited free tier

We decided to self-host. For $15/month, we get:

  • Full control over our infrastructure
  • No message limits or throttling
  • Same Docker Compose setup for local dev and production
  • Native Kafka protocol (faster than REST APIs)

Key Design Decisions

1. Circuit Breaker Pattern

We implemented a circuit breaker to prevent cascading failures. If Kafka becomes unavailable, we don’t want every API request waiting for timeouts.

Circuit breaker states:

State        Description
CLOSED       Normal operation, requests flow through
OPEN         Service failing, skip Kafka immediately
HALF_OPEN    Testing recovery, allow one request

After 5 consecutive failures, the circuit “opens” and all requests bypass Kafka for 30 seconds. This protects our API’s response times even when Kafka is down.
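
Here's a minimal sketch of the breaker with those thresholds (not our exact implementation):

// Minimal circuit breaker sketch using the thresholds above
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = 0;
  }

  canRequest() {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) return false;
      this.state = 'HALF_OPEN'; // timeout elapsed: allow one probe request
    }
    return true;
  }

  recordSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN'; // trip the breaker
      this.openedAt = Date.now();
    }
  }
}

The producer checks canRequest() before each send and reports the outcome back via recordSuccess() or recordFailure().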

2. Graceful Fallback

The producer only signals success or failure. The calling code (our middleware) decides what to do:

// Producer returns result, doesn't handle fallback
const result = await kafkaProducer.sendActivity(activityData);

if (result.success) {
  return; // Activity will be written by consumer
}

// Kafka failed - fall back to direct write
logger.warn('Kafka send failed, falling back to Firestore');
await activityService.createActivity(activityData);

This separation of concerns makes each component simpler and more testable.

3. Lazy Connection

The producer doesn’t connect to Kafka at startup. It waits until the first message is sent. This means:

  • Faster application startup
  • No connection errors if Kafka isn’t running yet
  • Graceful handling when Kafka is unavailable
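
With kafkajs, lazy connection is only a few lines (a sketch; error handling omitted):

// Connect on first send instead of at startup
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'api',
  brokers: (process.env.KAFKA_BROKERS || 'localhost:9092').split(','),
});

const producer = kafka.producer();
let connected = false;

async function sendActivity(activityData) {
  if (!connected) {
    await producer.connect(); // first send pays the one-time connection cost
    connected = true;
  }
  await producer.send({
    topic: 'activities',
    messages: [{ value: JSON.stringify(activityData) }],
  });
}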

4. Dead Letter Queue

Failed messages go to a Dead Letter Queue (DLQ) instead of being lost or retrying forever.

The DLQ message includes:

  • Original message content
  • Error details and stack trace
  • Metadata (original topic, partition, offset)
  • Timestamp of failure

This lets us investigate failures without blocking the main processing pipeline.
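
In the consumer, publishing to the DLQ can look roughly like this (a sketch; the topic name matches the activities-dlq topic created above):

// Wrap the failed message with error context and park it in the DLQ
// instead of retrying forever
async function sendToDlq(producer, message, error, meta) {
  await producer.send({
    topic: 'activities-dlq',
    messages: [{
      key: message.key,
      value: JSON.stringify({
        original: message.value.toString(),   // original message content
        error: { message: error.message, stack: error.stack },
        meta,                                 // original topic, partition, offset
        failedAt: new Date().toISOString(),
      }),
    }],
  });
}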

5. Message Keys for Ordering

We use companyId as the Kafka message key. This ensures all activities for a single company are processed in order (they go to the same partition).
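
In kafkajs terms, that's just setting the key on each message (sketch):

// Same key -> same partition -> in-order processing per company
await producer.send({
  topic: 'activities',
  messages: [{
    key: activityData.companyId,
    value: JSON.stringify(activityData),
  }],
});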

The All-in-One Docker Setup

Everything runs in a single Docker Compose file:

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    # Memory: 256MB limit

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    # Memory: 768MB limit
    # Exposes :9092 for external connections

  kafka-init:
    # Creates topics on startup
    # activities (3 partitions)
    # activities-dlq (1 partition)

  activity-consumer:
    # Our Node.js consumer
    # Memory: 512MB limit

Total memory footprint: ~1.5GB, fits comfortably on t3.small (2GB RAM).
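
One way to hold those budgets is to cap the JVM heap and the containers themselves (values are illustrative; KAFKA_HEAP_OPTS is the standard knob on the Confluent images):

# Sketch: enforcing per-service memory budgets
kafka:
  environment:
    KAFKA_HEAP_OPTS: "-Xmx512m -Xms256m"  # cap the broker's JVM heap
  mem_limit: 768m                          # hard container limit

activity-consumer:
  mem_limit: 512m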

Infrastructure: Why t3.small

For self-hosted Kafka, you need enough resources for Zookeeper + Kafka + Consumer:

              t3.nano    t3.micro    t3.small
RAM           0.5 GB     1 GB        2 GB
CPU Baseline  5%         10%         20%
Monthly Cost  ~$3.80     ~$7.60      ~$15.00

Why t3.small is the minimum:

  • Kafka needs ~512MB heap minimum
  • Zookeeper needs ~128MB
  • Consumer + Node.js needs ~256MB
  • OS and Docker overhead ~256MB
  • Total: ~1.2GB, with headroom for spikes

t3.micro (1GB) doesn't even cover the ~1.2GB the stack needs, let alone leave margin. One traffic spike and you're OOM. The extra $7/month for t3.small buys reliability.

The Cost Breakdown

Component                          Cost/Month
EC2 t3.small (Kafka + Consumer)    ~$15.00
TOTAL                              ~$15/month

Compare this to managed alternatives:

  • Confluent Cloud: $75+/month
  • AWS MSK Serverless: ~$75/month
  • Heroku Kafka add-on: $100+/month

We’re running a production-grade event streaming system for the cost of a few coffees.

Local Development

The same Docker Compose works locally:

# Start everything
cd consumer
docker-compose up -d

# View logs
docker-compose logs -f activity-consumer

# Stop everything
docker-compose down

Your local backend connects to localhost:9092; the consumer connects to kafka:29092 over the internal Docker network.
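
That split works because the broker advertises two listeners. Roughly, in the kafka service (standard cp-kafka environment variables; in production, EXTERNAL advertises the EC2 IP instead of localhost):

# Sketch: dual listeners so the host and the containers each get
# an address they can actually route to
kafka:
  environment:
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
    KAFKA_LISTENERS: INTERNAL://0.0.0.0:29092,EXTERNAL://0.0.0.0:9092
    KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:29092,EXTERNAL://localhost:9092
    KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL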

Deployment Steps

  1. Launch EC2 t3.small (Ubuntu 22.04)
  2. Install Docker
  3. Clone your repository
  4. Add Firebase credentials to consumer/credentials/
  5. Run: docker compose up -d --build
  6. Set Heroku config: KAFKA_BROKERS=your-ec2-ip:9092

That’s it. Kafka, Zookeeper, and Consumer all start together.
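
Condensed into commands, it's roughly this (repo name and IP are placeholders):

# On the EC2 instance (Ubuntu 22.04)
curl -fsSL https://get.docker.com | sh        # installs Docker + Compose plugin
git clone <your-repo> && cd <your-repo>/consumer
# drop Firebase credentials into credentials/ before starting
docker compose up -d --build

# From your dev machine: point the API at the new broker
heroku config:set KAFKA_BROKERS=your-ec2-ip:9092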

Lessons Learned

1. Self-hosting isn’t scary

With Docker Compose, running Kafka is straightforward. The same file works locally and in production. No vendor lock-in, no surprise bills.

2. Design for failure

Every component can fail. The circuit breaker handles Kafka failures. The fallback handles producer failures. The DLQ handles consumer failures. Build resilience into every layer.

3. Right-size your infrastructure

The difference between t3.micro and t3.small is $7/month. That $7 buys you reliability, headroom, and peace of mind. Don’t cheap out on production infrastructure.

4. Native protocol beats REST

Using kafkajs (native Kafka protocol) instead of a REST wrapper means:

  • Lower latency
  • Better connection handling
  • Native consumer groups (no polling)
  • Proper backpressure
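
For example, a consumer-group subscription in kafkajs is push-style, with no polling loop (minimal sketch; writeToFirestore is a hypothetical persistence helper):

// Consumer groups give us rebalancing across instances for free
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'activity-consumer', brokers: ['kafka:29092'] });
const consumer = kafka.consumer({ groupId: 'activity-consumers' });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'activities', fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const activity = JSON.parse(message.value.toString());
      await writeToFirestore(activity); // hypothetical persistence helper
    },
  });
}

run();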

5. All-in-one simplifies operations

Running Kafka, Zookeeper, and Consumer on one instance means:

  • One server to monitor
  • One place to check logs
  • Simple deployment
  • Lower cost than separate instances

The Code Structure

backend/
  src/
    shared/kafka/
      constants.js          # Configuration
    services/kafka/
      circuit-breaker.js    # Resilience pattern
      client.js             # Kafka client wrapper (kafkajs)
      producer.js           # Activity producer
    middleware/
      activity-tracker.js   # Express middleware
  consumer/
    index.js                # Entry point
    activity-consumer.js    # Kafka consumer
    Dockerfile
    docker-compose.yml      # All-in-one setup

Scaling Considerations

Let’s be honest about the limits of t3.small.

What t3.small handles well:

  • Hundreds to low thousands of messages/minute
  • Activity tracking for small-to-medium SaaS
  • Early-stage products with moderate traffic

Where it breaks down:

Concern         Limitation
Memory          2GB total, ~500MB headroom after Kafka + Zookeeper + Consumer
CPU             20% baseline - sustained load burns credits, then throttles hard
Single broker   No redundancy - Kafka down = no messages
Disk I/O        Limited EBS bandwidth for high throughput

When to upgrade:

  • t3.medium ($30/mo): 4GB RAM, handles 2-3x more throughput
  • t3.large ($60/mo): 8GB RAM, comfortable headroom for growth
  • Multiple instances: Separate Kafka cluster from consumer

How to Scale When the Time Comes

Step 1: Vertical scaling (easiest)

Just upgrade the EC2 instance: stop it, change the instance type to t3.medium or t3.large, and start it again. Your docker-compose.yml works unchanged. This buys you 2-4x headroom with zero code changes.

Step 2: Separate Kafka from Consumer

Run Kafka + Zookeeper on one instance, Consumer on another:

Instance 1 (t3.medium):     Instance 2 (t3.micro):
- Zookeeper                 - activity-consumer
- Kafka                     - (just Node.js, light)

Update consumer’s KAFKA_BROKERS to point to Instance 1’s IP. This lets you scale each component independently.

Step 3: Multi-broker Kafka cluster

For redundancy and higher throughput, run multiple Kafka brokers:

# Add to docker-compose.yml
kafka-2:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 2
    # ... same config, different broker ID

kafka-3:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 3

Update topic replication factor to 2 or 3. Now you have fault tolerance.
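
At that point, creating a replicated topic looks like this (standard kafka-topics CLI, run inside a broker container; raising the replication factor of an existing topic takes a partition reassignment instead):

# Sketch: create the topic replicated across all 3 brokers
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic activities \
  --partitions 3 --replication-factor 3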

Step 4: Scale consumers horizontally

Need more processing power? Run multiple consumer instances:

docker-compose up -d --scale activity-consumer=3

Kafka’s consumer groups automatically distribute partitions across instances. Just make sure your topic has enough partitions (we created 3).

Step 5: Move to managed Kafka

When you’re running 3+ brokers across multiple instances, the operational overhead starts to outweigh the cost savings. At that point, Confluent Cloud or AWS MSK makes sense - let them handle the infrastructure while you focus on your product.

The bottom line: t3.small is a great starting point. It’ll handle years of growth for most small-to-medium apps. But it’s not a “scale forever” solution - it’s a “get started cheap and upgrade when needed” solution.

Don’t prematurely optimize. Start here, monitor your metrics, and upgrade when the numbers tell you to.

Conclusion

Event-driven architecture doesn’t have to be expensive or complex. With self-hosted Kafka in Docker, a t3.small instance, and careful design, we built a reliable activity stream for $15/month.

The key principles:

  • Decouple your main application from ancillary processes
  • Design for failure at every layer
  • Keep components focused on single responsibilities
  • Use Docker for consistent local and production environments
  • Self-host when managed services are too expensive

If you’re building activity tracking, audit logs, or any async processing pipeline, I hope this architecture gives you some ideas.
