Building a Self-Hosted Event-Driven Activity Stream with Kafka
How we implemented reliable activity tracking for $15/month with Docker.
The Problem
Every SaaS application needs activity tracking. Whether it’s for audit logs, user analytics, or debugging, knowing who did what and when is essential. But as your application grows, writing activity logs synchronously to your database creates problems:
- Increased latency on every API request
- Database contention during high-traffic periods
- Lost activities if the database write fails
- Tight coupling between your API and logging infrastructure
We faced this exact challenge with our home services platform. With thousands of daily API operations—creating appointments, updating customers, generating invoices—we needed a better approach.
The Solution: Self-Hosted Kafka with Docker
We implemented an event-driven activity stream using self-hosted Kafka that decouples activity logging from our main API. Everything runs in Docker containers on a single EC2 instance:
+------------------+            +------------------------------------------+
|   Express API    |            |           EC2 t3.small ($15/mo)          |
|    (Heroku)      |----------->| +----------+  +--------+  +-----------+  |
+------------------+   :9092    | |  Kafka   |  | Zookpr |  | Consumer  |  |
        |                       | |  :9092   |  | :2181  |  |  (Node)   |  |
        |                       | +----------+  +--------+  +-----------+  |
        | (fallback if          |      |                         |         |
        |  Kafka fails)         |      v                         v         |
        |                       | +---------+           +-----------+      |
        +---------------------->| |   DLQ   |           | Firestore |      |
                                | +---------+           +-----------+      |
                                +------------------------------------------+
The flow:
- User performs an action (creates invoice, updates customer, etc.)
- API middleware captures the activity and sends it to Kafka
- Consumer (in same Docker network) receives the message and writes to Firestore
- Failed messages go to a Dead Letter Queue for investigation
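To make step 2 concrete, here is a minimal sketch of what the activity-tracking middleware could look like. The require paths and the fields on activityData are illustrative, not our exact production schema; sendActivity() and createActivity() are the helpers discussed in the design sections below.

// activity-tracker.js (sketch, not our exact production code)
const kafkaProducer = require('../services/kafka/producer');
const activityService = require('../services/activity-service'); // hypothetical path
const logger = require('../shared/logger');                       // hypothetical path

function trackActivity(action) {
  return (req, res, next) => {
    res.on('finish', async () => {
      if (res.statusCode >= 400) return; // only record successful requests

      const activityData = {
        action,                          // e.g. 'invoice.created'
        companyId: req.user?.companyId,  // doubles as the Kafka message key
        userId: req.user?.id,
        path: req.originalUrl,
        timestamp: new Date().toISOString(),
      };

      const result = await kafkaProducer.sendActivity(activityData);
      if (!result.success) {
        logger.warn('Kafka send failed, falling back to Firestore');
        await activityService.createActivity(activityData);
      }
    });
    next();
  };
}

module.exports = trackActivity;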
Why Self-Hosted?
We originally planned to use Upstash (managed Kafka), but discovered they no longer offer Kafka. After evaluating alternatives:
- Confluent Cloud: $75+/month minimum
- AWS MSK Serverless: $0.10/hr (~$75/month)
- Redpanda Cloud: No free tier
- CloudKarafka: Limited free tier
We decided to self-host. For $15/month, we get:
- Full control over our infrastructure
- No message limits or throttling
- Same Docker Compose setup for local dev and production
- Native Kafka protocol (faster than REST APIs)
Key Design Decisions
1. Circuit Breaker Pattern
We implemented a circuit breaker to prevent cascading failures. If Kafka becomes unavailable, we don’t want every API request waiting for timeouts.
Circuit breaker states:
| State | Description |
|---|---|
| CLOSED | Normal operation, requests flow through |
| OPEN | Service failing, skip Kafka immediately |
| HALF_OPEN | Testing recovery, allow one request |
After 5 consecutive failures, the circuit “opens” and all requests bypass Kafka for 30 seconds. This protects our API’s response times even when Kafka is down.
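A stripped-down version of that state machine could look like the following; the class and method names are illustrative, but the thresholds match the numbers above. The producer checks canRequest() before every send and reports the outcome back with recordSuccess() or recordFailure().

// circuit-breaker.js (sketch) - 5 failures open the circuit for 30 seconds
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = null;
  }

  canRequest() {
    if (this.state === 'CLOSED') return true;
    if (this.state === 'OPEN' && Date.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // allow exactly one probe request
      return true;
    }
    return false; // still OPEN, or HALF_OPEN probe already in flight
  }

  recordSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}

module.exports = CircuitBreaker;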
2. Graceful Fallback
The producer only signals success or failure. The calling code (our middleware) decides what to do:
// Producer returns result, doesn't handle fallback
const result = await kafkaProducer.sendActivity(activityData);
if (result.success) {
  return; // Activity will be written by consumer
}
// Kafka failed - fall back to direct write
logger.warn('Kafka send failed, falling back to Firestore');
await activityService.createActivity(activityData);
This separation of concerns makes each component simpler and more testable.
3. Lazy Connection
The producer doesn’t connect to Kafka at startup. It waits until the first message is sent. This means:
- Faster application startup
- No connection errors if Kafka isn’t running yet
- Graceful handling when Kafka is unavailable
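With kafkajs, lazy connection is just deferring producer.connect() until the first send. A minimal sketch (circuit breaker and retries omitted; the client id is illustrative):

// producer.js (sketch) - connects on first use, not at startup
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'activity-producer', // illustrative name
  brokers: (process.env.KAFKA_BROKERS || 'localhost:9092').split(','),
});

const producer = kafka.producer();
let connected = false;

async function sendActivity(activityData) {
  try {
    if (!connected) {
      await producer.connect(); // the first message pays the connection cost
      connected = true;
    }
    await producer.send({
      topic: 'activities',
      messages: [{ value: JSON.stringify(activityData) }],
    });
    return { success: true };
  } catch (err) {
    connected = false; // force a reconnect attempt on the next send
    return { success: false, error: err };
  }
}

module.exports = { sendActivity };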
4. Dead Letter Queue
Failed messages go to a Dead Letter Queue (DLQ) instead of being lost or retrying forever.
The DLQ message includes:
- Original message content
- Error details and stack trace
- Metadata (original topic, partition, offset)
- Timestamp of failure
This lets us investigate failures without blocking the main processing pipeline.
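In the consumer, the failure path can be as simple as producing to the activities-dlq topic created at startup. A sketch of the message handler, with illustrative names (activityService and dlqProducer are assumed to be initialized elsewhere):

// activity-consumer.js (sketch) - failed messages are re-published to the DLQ
async function handleMessage({ topic, partition, message }) {
  try {
    const activity = JSON.parse(message.value.toString());
    await activityService.createActivity(activity); // write to Firestore
  } catch (err) {
    await dlqProducer.send({
      topic: 'activities-dlq',
      messages: [{
        key: message.key,
        value: JSON.stringify({
          original: message.value.toString(),                   // original message content
          error: { message: err.message, stack: err.stack },    // error details
          metadata: { topic, partition, offset: message.offset },
          failedAt: new Date().toISOString(),
        }),
      }],
    });
  }
}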
5. Message Keys for Ordering
We use companyId as the Kafka message key. This ensures all activities for a single company are processed in order (they go to the same partition).
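Building on the producer sketch above, only the message shape changes:

// Keyed send (sketch): same companyId -> same partition -> in-order processing
await producer.send({
  topic: 'activities',
  messages: [{
    key: activityData.companyId,        // partitioning key
    value: JSON.stringify(activityData),
  }],
});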
The All-in-One Docker Setup
Everything runs in a single Docker Compose file:
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    # Memory: 256MB limit

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    # Memory: 768MB limit
    # Exposes :9092 for external connections

  kafka-init:
    # Creates topics on startup:
    #   activities (3 partitions)
    #   activities-dlq (1 partition)

  activity-consumer:
    # Our Node.js consumer
    # Memory: 512MB limit
Total memory footprint: ~1.5GB, fits comfortably on t3.small (2GB RAM).
Infrastructure: Why t3.small
For self-hosted Kafka, you need enough resources for Zookeeper + Kafka + Consumer:
| | t3.nano | t3.micro | t3.small |
|---|---|---|---|
| RAM | 0.5 GB | 1 GB | 2 GB |
| CPU Baseline | 5% | 10% | 20% |
| Monthly Cost | ~$3.80 | ~$7.60 | ~$15.00 |
Why t3.small is the minimum:
- Kafka needs ~512MB heap minimum
- Zookeeper needs ~128MB
- Consumer + Node.js needs ~256MB
- OS and Docker overhead ~256MB
- Total: ~1.2GB, with headroom for spikes
t3.micro (1GB) would work but leaves no margin. One traffic spike and you’re OOM. The extra $7/month for t3.small buys reliability.
The Cost Breakdown
| Component | Cost/Month |
|---|---|
| EC2 t3.small (Kafka + Consumer) | ~$15.00 |
| TOTAL | ~$15/month |
Compare this to managed alternatives:
- Confluent Cloud: $75+/month
- AWS MSK Serverless: ~$75/month
- Heroku Kafka add-on: $100+/month
We’re running a production-grade event streaming system for the cost of a few coffees.
Local Development
The same Docker Compose works locally:
# Start everything
cd consumer
docker-compose up -d
# View logs
docker-compose logs -f activity-consumer
# Stop everything
docker-compose down
Your local backend connects to localhost:9092; the consumer connects to kafka:29092 over the internal Docker network.
Deployment Steps
- Launch EC2 t3.small (Ubuntu 22.04)
- Install Docker
- Clone your repository
- Add Firebase credentials to consumer/credentials/
- Run: docker compose up -d --build
- Set Heroku config: KAFKA_BROKERS=your-ec2-ip:9092
That’s it. Kafka, Zookeeper, and Consumer all start together.
Lessons Learned
1. Self-hosting isn’t scary
With Docker Compose, running Kafka is straightforward. The same file works locally and in production. No vendor lock-in, no surprise bills.
2. Design for failure
Every component can fail. The circuit breaker handles Kafka failures. The fallback handles producer failures. The DLQ handles consumer failures. Build resilience into every layer.
3. Right-size your infrastructure
The difference between t3.micro and t3.small is $7/month. That $7 buys you reliability, headroom, and peace of mind. Don’t cheap out on production infrastructure.
4. Native protocol beats REST
Using kafkajs (native Kafka protocol) instead of a REST wrapper means:
- Lower latency
- Better connection handling
- Native consumer groups (no polling)
- Proper backpressure
5. All-in-one simplifies operations
Running Kafka, Zookeeper, and Consumer on one instance means:
- One server to monitor
- One place to check logs
- Simple deployment
- Lower cost than separate instances
The Code Structure
backend/
  src/
    shared/kafka/
      constants.js            # Configuration
    services/kafka/
      circuit-breaker.js      # Resilience pattern
      client.js               # Kafka client wrapper (kafkajs)
      producer.js             # Activity producer
    middleware/
      activity-tracker.js     # Express middleware

consumer/
  index.js                    # Entry point
  activity-consumer.js        # Kafka consumer
  Dockerfile
  docker-compose.yml          # All-in-one setup
Scaling Considerations
Let’s be honest about the limits of t3.small.
What t3.small handles well:
- Hundreds to low thousands of messages/minute
- Activity tracking for small-to-medium SaaS
- Early-stage products with moderate traffic
Where it breaks down:
| Concern | Limitation |
|---|---|
| Memory | 2GB total, ~500MB headroom after Kafka + Zookeeper + Consumer |
| CPU | 20% baseline - sustained load burns credits, then throttles hard |
| Single broker | No redundancy - Kafka down = no messages |
| Disk I/O | Limited EBS bandwidth for high throughput |
When to upgrade:
- t3.medium ($30/mo): 4GB RAM, handles 2-3x more throughput
- t3.large ($60/mo): 8GB RAM, comfortable headroom for growth
- Multiple instances: Separate Kafka cluster from consumer
How to Scale When the Time Comes
Step 1: Vertical scaling (easiest)
Just upgrade the EC2 instance. Stop the instance, change type to t3.medium or t3.large, start it again. Your docker-compose.yml works unchanged. This buys you 2-4x headroom with zero code changes.
Step 2: Separate Kafka from Consumer
Run Kafka + Zookeeper on one instance, Consumer on another:
Instance 1 (t3.medium):          Instance 2 (t3.micro):
  - Zookeeper                      - activity-consumer
  - Kafka                          - (just Node.js, light)
Update the consumer’s KAFKA_BROKERS setting to point to Instance 1’s IP. This lets you scale each component independently.
Step 3: Multi-broker Kafka cluster
For redundancy and higher throughput, run multiple Kafka brokers:
# Add to docker-compose.yml
kafka-2:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 2
    # ... same config, different broker ID
kafka-3:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 3
Update topic replication factor to 2 or 3. Now you have fault tolerance.
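For new topics, kafkajs's admin client lets you set the replication factor at creation time (changing it on an existing topic requires a partition reassignment). A sketch with illustrative broker addresses and client id:

// create-topics.js (sketch) - topics replicated across the 3 brokers
const { Kafka } = require('kafkajs');

async function createReplicatedTopics() {
  const kafka = new Kafka({
    clientId: 'topic-admin',                                     // illustrative name
    brokers: ['kafka:29092', 'kafka-2:29092', 'kafka-3:29092'],  // illustrative broker list
  });
  const admin = kafka.admin();
  await admin.connect();
  await admin.createTopics({
    topics: [
      { topic: 'activities', numPartitions: 3, replicationFactor: 3 },
      { topic: 'activities-dlq', numPartitions: 1, replicationFactor: 3 },
    ],
  });
  await admin.disconnect();
}

createReplicatedTopics().catch(console.error);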
Step 4: Scale consumers horizontally
Need more processing power? Run multiple consumer instances:
docker-compose up -d --scale activity-consumer=3
Kafka’s consumer groups automatically distribute partitions across instances. Just make sure your topic has enough partitions (we created 3).
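Scaling out works because every replica joins the same consumer group, and Kafka assigns each one a share of the partitions. The group id below is illustrative:

// Each consumer replica runs this same code; Kafka balances the 3 partitions among them.
// `kafka` is the kafkajs client instance from the producer sketch earlier.
const consumer = kafka.consumer({ groupId: 'activity-consumer-group' }); // illustrative id

await consumer.connect();
await consumer.subscribe({ topic: 'activities', fromBeginning: false });
await consumer.run({
  eachMessage: handleMessage, // e.g. the handler sketched in the DLQ section
});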
Step 5: Move to managed Kafka
When you’re running 3+ brokers across multiple instances, the operational overhead starts to outweigh the cost savings. At that point, Confluent Cloud or AWS MSK makes sense - let them handle the infrastructure while you focus on your product.
The bottom line: t3.small is a great starting point. It’ll handle years of growth for most small-to-medium apps. But it’s not a “scale forever” solution - it’s a “get started cheap and upgrade when needed” solution.
Don’t prematurely optimize. Start here, monitor your metrics, and upgrade when the numbers tell you to.
Conclusion
Event-driven architecture doesn’t have to be expensive or complex. With self-hosted Kafka in Docker, a t3.small instance, and careful design, we built a reliable activity stream for $15/month.
The key principles:
- Decouple your main application from ancillary processes
- Design for failure at every layer
- Keep components focused on single responsibilities
- Use Docker for consistent local and production environments
- Self-host when managed services are too expensive
If you’re building activity tracking, audit logs, or any async processing pipeline, I hope this architecture gives you some ideas.