Building a Self-Hosted Event-Driven Activity Stream with Kafka
How we implemented reliable activity tracking for $15/month with Docker.
The Problem
Every SaaS application needs activity tracking. Whether it’s for audit logs, user analytics, or debugging, knowing who did what and when is essential. But as your application grows, writing activity logs synchronously to your database creates problems:
- Increased latency on every API request
- Database contention during high-traffic periods
- Lost activities if the database write fails
- Tight coupling between your API and logging infrastructure
We faced this exact challenge with our home services platform. With thousands of daily API operations—creating appointments, updating customers, generating invoices—we needed a better approach.
The Solution: Self-Hosted Kafka with Docker
We implemented an event-driven activity stream using self-hosted Kafka that decouples activity logging from our main API. Everything runs in Docker containers on a single EC2 instance:
+------------------+            +------------------------------------------+
|   Express API    |            |           EC2 t3.small ($15/mo)          |
|    (Heroku)      |----------->| +----------+  +--------+  +-----------+  |
+------------------+   :9092    | |  Kafka   |  | Zookpr |  | Consumer  |  |
        |                       | |  :9092   |  | :2181  |  |  (Node)   |  |
        |                       | +----------+  +--------+  +-----------+  |
        | (fallback if          |      |                         |         |
        |  Kafka fails)         |      v                         v         |
        |                       | +---------+           +-----------+      |
        +---------------------->| |   DLQ   |           | Firestore |      |
                                | +---------+           +-----------+      |
                                +------------------------------------------+
The flow:
- User performs an action (creates invoice, updates customer, etc.)
- API middleware captures the activity and sends it to Kafka
- Consumer (in same Docker network) receives the message and writes to Firestore
- Failed messages go to a Dead Letter Queue for investigation
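To make step 2 concrete, here is a minimal sketch of what the activity-tracking middleware could look like. The require paths and the fields on activityData are illustrative, not our exact production schema; sendActivity() and createActivity() are the helpers discussed in the design sections below.

// activity-tracker.js (sketch, not our exact production code)
const kafkaProducer = require('../services/kafka/producer');
const activityService = require('../services/activity-service'); // hypothetical path
const logger = require('../shared/logger');                       // hypothetical path

function trackActivity(action) {
  return (req, res, next) => {
    res.on('finish', async () => {
      if (res.statusCode >= 400) return; // only record successful requests

      const activityData = {
        action,                          // e.g. 'invoice.created'
        companyId: req.user?.companyId,  // doubles as the Kafka message key
        userId: req.user?.id,
        path: req.originalUrl,
        timestamp: new Date().toISOString(),
      };

      const result = await kafkaProducer.sendActivity(activityData);
      if (!result.success) {
        logger.warn('Kafka send failed, falling back to Firestore');
        await activityService.createActivity(activityData);
      }
    });
    next();
  };
}

module.exports = trackActivity;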
Why Self-Hosted?
We originally planned to use Upstash (managed Kafka), but discovered they no longer offer Kafka. After evaluating alternatives:
- Confluent Cloud: $75+/month minimum
- AWS MSK Serverless: $0.10/hr (~$75/month)
- Redpanda Cloud: No free tier
- CloudKarafka: Limited free tier
We decided to self-host. For $15/month, we get:
- Full control over our infrastructure
- No message limits or throttling
- Same Docker Compose setup for local dev and production
- Native Kafka protocol (faster than REST APIs)
Key Design Decisions
1. Circuit Breaker Pattern
We implemented a circuit breaker to prevent cascading failures. If Kafka becomes unavailable, we don’t want every API request waiting for timeouts.
Circuit breaker states:
| State | Description |
|---|---|
| CLOSED | Normal operation, requests flow through |
| OPEN | Service failing, skip Kafka immediately |
| HALF_OPEN | Testing recovery, allow one request |
After 5 consecutive failures, the circuit “opens” and all requests bypass Kafka for 30 seconds. This protects our API’s response times even when Kafka is down.
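A stripped-down version of that state machine could look like the following; the class and method names are illustrative, but the thresholds match the numbers above. The producer checks canRequest() before every send and reports the outcome back with recordSuccess() or recordFailure().

// circuit-breaker.js (sketch) - 5 failures open the circuit for 30 seconds
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = null;
  }

  canRequest() {
    if (this.state === 'CLOSED') return true;
    if (this.state === 'OPEN' && Date.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // allow exactly one probe request
      return true;
    }
    return false; // still OPEN, or HALF_OPEN probe already in flight
  }

  recordSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}

module.exports = CircuitBreaker;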
2. Graceful Fallback
The producer only signals success or failure. The calling code (our middleware) decides what to do:
// Producer returns result, doesn't handle fallback
const result = await kafkaProducer.sendActivity(activityData);
if (result.success) {
  return; // Activity will be written by consumer
}
// Kafka failed - fall back to direct write
logger.warn('Kafka send failed, falling back to Firestore');
await activityService.createActivity(activityData);
This separation of concerns makes each component simpler and more testable.
3. Lazy Connection
The producer doesn’t connect to Kafka at startup. It waits until the first message is sent. This means:
- Faster application startup
- No connection errors if Kafka isn’t running yet
- Graceful handling when Kafka is unavailable
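With kafkajs, lazy connection is just deferring producer.connect() until the first send. A minimal sketch (circuit breaker and retries omitted; the client id is illustrative):

// producer.js (sketch) - connects on first use, not at startup
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'activity-producer', // illustrative name
  brokers: (process.env.KAFKA_BROKERS || 'localhost:9092').split(','),
});

const producer = kafka.producer();
let connected = false;

async function sendActivity(activityData) {
  try {
    if (!connected) {
      await producer.connect(); // the first message pays the connection cost
      connected = true;
    }
    await producer.send({
      topic: 'activities',
      messages: [{ value: JSON.stringify(activityData) }],
    });
    return { success: true };
  } catch (err) {
    connected = false; // force a reconnect attempt on the next send
    return { success: false, error: err };
  }
}

module.exports = { sendActivity };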
4. Dead Letter Queue
Failed messages go to a Dead Letter Queue (DLQ) instead of being lost or retrying forever.
The DLQ message includes:
- Original message content
- Error details and stack trace
- Metadata (original topic, partition, offset)
- Timestamp of failure
This lets us investigate failures without blocking the main processing pipeline.
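In the consumer, the failure path can be as simple as producing to the activities-dlq topic created at startup. A sketch of the message handler, with illustrative names (activityService and dlqProducer are assumed to be initialized elsewhere):

// activity-consumer.js (sketch) - failed messages are re-published to the DLQ
async function handleMessage({ topic, partition, message }) {
  try {
    const activity = JSON.parse(message.value.toString());
    await activityService.createActivity(activity); // write to Firestore
  } catch (err) {
    await dlqProducer.send({
      topic: 'activities-dlq',
      messages: [{
        key: message.key,
        value: JSON.stringify({
          original: message.value.toString(),                   // original message content
          error: { message: err.message, stack: err.stack },    // error details
          metadata: { topic, partition, offset: message.offset },
          failedAt: new Date().toISOString(),
        }),
      }],
    });
  }
}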
5. Message Keys for Ordering
We use companyId as the Kafka message key. This ensures all activities for a single company are processed in order (they go to the same partition).
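Building on the producer sketch above, only the message shape changes:

// Keyed send (sketch): same companyId -> same partition -> in-order processing
await producer.send({
  topic: 'activities',
  messages: [{
    key: activityData.companyId,        // partitioning key
    value: JSON.stringify(activityData),
  }],
});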
The All-in-One Docker Setup
Everything runs in a single Docker Compose file:
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    # Memory: 256MB limit

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    # Memory: 768MB limit
    # Exposes :9092 for external connections

  kafka-init:
    # Creates topics on startup:
    #   activities (3 partitions)
    #   activities-dlq (1 partition)

  activity-consumer:
    # Our Node.js consumer
    # Memory: 512MB limit
Total memory footprint: ~1.5GB, fits comfortably on t3.small (2GB RAM).
Infrastructure: Why t3.small
For self-hosted Kafka, you need enough resources for Zookeeper + Kafka + Consumer:
| | t3.nano | t3.micro | t3.small |
|---|---|---|---|
| RAM | 0.5 GB | 1 GB | 2 GB |
| CPU Baseline | 5% | 10% | 20% |
| Monthly Cost | ~$3.80 | ~$7.60 | ~$15.00 |
Why t3.small is the minimum:
- Kafka needs ~512MB heap minimum
- Zookeeper needs ~128MB
- Consumer + Node.js needs ~256MB
- OS and Docker overhead ~256MB
- Total: ~1.2GB, with headroom for spikes
t3.micro (1GB) would work but leaves no margin. One traffic spike and you’re OOM. The extra $7/month for t3.small buys reliability.
The Cost Breakdown
| Component | Cost/Month |
|---|---|
| EC2 t3.small (Kafka + Consumer) | ~$15.00 |
| TOTAL | ~$15/month |
Compare this to managed alternatives:
- Confluent Cloud: $75+/month
- AWS MSK Serverless: ~$75/month
- Heroku Kafka add-on: $100+/month
We’re running a production-grade event streaming system for the cost of a few coffees.
Local Development
The same Docker Compose works locally:
# Start everything
cd consumer
docker-compose up -d
# View logs
docker-compose logs -f activity-consumer
# Stop everything
docker-compose down
Your local backend connects to localhost:9092; the consumer connects to kafka:29092 over the internal Docker network.
Deployment Steps
- Launch EC2 t3.small (Ubuntu 22.04)
- Install Docker
- Clone your repository
- Add Firebase credentials to consumer/credentials/
- Run: docker compose up -d --build
- Set Heroku config: KAFKA_BROKERS=your-ec2-ip:9092
That’s it. Kafka, Zookeeper, and Consumer all start together.
Lessons Learned
1. Self-hosting isn’t scary
With Docker Compose, running Kafka is straightforward. The same file works locally and in production. No vendor lock-in, no surprise bills.
2. Design for failure
Every component can fail. The circuit breaker handles Kafka failures. The fallback handles producer failures. The DLQ handles consumer failures. Build resilience into every layer.
3. Right-size your infrastructure
The difference between t3.micro and t3.small is $7/month. That $7 buys you reliability, headroom, and peace of mind. Don’t cheap out on production infrastructure.
4. Native protocol beats REST
Using kafkajs (native Kafka protocol) instead of a REST wrapper means:
- Lower latency
- Better connection handling
- Native consumer groups (no polling)
- Proper backpressure
5. All-in-one simplifies operations
Running Kafka, Zookeeper, and Consumer on one instance means:
- One server to monitor
- One place to check logs
- Simple deployment
- Lower cost than separate instances
The Code Structure
backend/
  src/
    shared/kafka/
      constants.js            # Configuration
    services/kafka/
      circuit-breaker.js      # Resilience pattern
      client.js               # Kafka client wrapper (kafkajs)
      producer.js             # Activity producer
    middleware/
      activity-tracker.js     # Express middleware

consumer/
  index.js                    # Entry point
  activity-consumer.js        # Kafka consumer
  Dockerfile
  docker-compose.yml          # All-in-one setup
Scaling Considerations
Let’s be honest about the limits of t3.small.
What t3.small handles well:
- Hundreds to low thousands of messages/minute
- Activity tracking for small-to-medium SaaS
- Early-stage products with moderate traffic
Where it breaks down:
| Concern | Limitation |
|---|---|
| Memory | 2GB total, ~500MB headroom after Kafka + Zookeeper + Consumer |
| CPU | 20% baseline - sustained load burns credits, then throttles hard |
| Single broker | No redundancy - Kafka down = no messages |
| Disk I/O | Limited EBS bandwidth for high throughput |
When to upgrade:
- t3.medium ($30/mo): 4GB RAM, handles 2-3x more throughput
- t3.large ($60/mo): 8GB RAM, comfortable headroom for growth
- Multiple instances: Separate Kafka cluster from consumer
How to Scale When the Time Comes
Step 1: Vertical scaling (easiest)
Just upgrade the EC2 instance. Stop the instance, change type to t3.medium or t3.large, start it again. Your docker-compose.yml works unchanged. This buys you 2-4x headroom with zero code changes.
Step 2: Separate Kafka from Consumer
Run Kafka + Zookeeper on one instance, Consumer on another:
Instance 1 (t3.medium):          Instance 2 (t3.micro):
  - Zookeeper                      - activity-consumer
  - Kafka                          - (just Node.js, light)
Update the consumer’s KAFKA_BROKERS setting to point to Instance 1’s IP. This lets you scale each component independently.
Step 3: Multi-broker Kafka cluster
For redundancy and higher throughput, run multiple Kafka brokers:
# Add to docker-compose.yml
kafka-2:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 2
    # ... same config, different broker ID
kafka-3:
  image: confluentinc/cp-kafka:7.5.0
  environment:
    KAFKA_BROKER_ID: 3
Update topic replication factor to 2 or 3. Now you have fault tolerance.
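For new topics, kafkajs's admin client lets you set the replication factor at creation time (changing it on an existing topic requires a partition reassignment). A sketch with illustrative broker addresses and client id:

// create-topics.js (sketch) - topics replicated across the 3 brokers
const { Kafka } = require('kafkajs');

async function createReplicatedTopics() {
  const kafka = new Kafka({
    clientId: 'topic-admin',                                     // illustrative name
    brokers: ['kafka:29092', 'kafka-2:29092', 'kafka-3:29092'],  // illustrative broker list
  });
  const admin = kafka.admin();
  await admin.connect();
  await admin.createTopics({
    topics: [
      { topic: 'activities', numPartitions: 3, replicationFactor: 3 },
      { topic: 'activities-dlq', numPartitions: 1, replicationFactor: 3 },
    ],
  });
  await admin.disconnect();
}

createReplicatedTopics().catch(console.error);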
Step 4: Scale consumers horizontally
Need more processing power? Run multiple consumer instances:
docker-compose up -d --scale activity-consumer=3
Kafka’s consumer groups automatically distribute partitions across instances. Just make sure your topic has enough partitions (we created 3).
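Scaling out works because every replica joins the same consumer group, and Kafka assigns each one a share of the partitions. The group id below is illustrative:

// Each consumer replica runs this same code; Kafka balances the 3 partitions among them.
// `kafka` is the kafkajs client instance from the producer sketch earlier.
const consumer = kafka.consumer({ groupId: 'activity-consumer-group' }); // illustrative id

await consumer.connect();
await consumer.subscribe({ topic: 'activities', fromBeginning: false });
await consumer.run({
  eachMessage: handleMessage, // e.g. the handler sketched in the DLQ section
});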
Step 5: Move to managed Kafka
When you’re running 3+ brokers across multiple instances, the operational overhead starts to outweigh the cost savings. At that point, Confluent Cloud or AWS MSK makes sense - let them handle the infrastructure while you focus on your product.
The bottom line: t3.small is a great starting point. It’ll handle years of growth for most small-to-medium apps. But it’s not a “scale forever” solution - it’s a “get started cheap and upgrade when needed” solution.
Don’t prematurely optimize. Start here, monitor your metrics, and upgrade when the numbers tell you to.
Conclusion
Event-driven architecture doesn’t have to be expensive or complex. With self-hosted Kafka in Docker, a t3.small instance, and careful design, we built a reliable activity stream for $15/month.
The key principles:
- Decouple your main application from ancillary processes
- Design for failure at every layer
- Keep components focused on single responsibilities
- Use Docker for consistent local and production environments
- Self-host when managed services are too expensive
If you’re building activity tracking, audit logs, or any async processing pipeline, I hope this architecture gives you some ideas.