How We Built a Scalable Email & SMS System with RabbitMQ
When you’re running a SaaS platform that sends thousands of transactional emails and SMS messages daily—appointment reminders, invoices, estimates, lead notifications—you quickly realize that “just send it” doesn’t scale.
Here’s how we evolved our messaging architecture from direct API calls to a queue-based system that handles failures gracefully, prevents duplicates, and keeps our main application responsive.
The Problem with Direct Sending
Our initial implementation was straightforward:
await emailService.sendInvoiceEmail(customer.email, invoiceData);
await messagingService.sendSMS(customer.phone, reminderText);
This works fine for a few hundred messages a day. But as we scaled, we hit several issues:
1. Request Timeouts
Email and SMS providers sometimes take 2-5 seconds to respond. When a user clicks “Send Invoice,” they shouldn’t wait for Twilio or SendGrid to acknowledge the message.
2. No Retry Logic
If SendGrid returns a 503, the email just… doesn’t send. The user might not even know it failed.
3. Duplicate Messages
Network hiccups could cause a send request to be repeated, so the same appointment reminder went out twice. Customers don’t appreciate that.
4. Scaling Bottlenecks
Our API servers were doing the heavy lifting of email rendering and API calls instead of just handling HTTP requests.
The Solution: Decouple with Message Queues
We introduced RabbitMQ as a message broker between our main backend and a dedicated messaging service. Here’s the architecture:
┌─────────────┐     ┌───────────┐     ┌──────────────────┐
│   Backend   │────▶│ RabbitMQ  │────▶│ Messaging Service│
│    (API)    │     │  (Queue)  │     │    (Consumer)    │
└─────────────┘     └───────────┘     └──────────────────┘
                          │                    │
                          │                    ▼
                          │           ┌──────────────────┐
                          │           │ Email Providers  │
                          │           │ (SendGrid, etc.) │
                          │           └──────────────────┘
                          │                    │
                          ▼                    ▼
                    ┌───────────┐      ┌──────────────┐
                    │  Events   │◀─────│    Twilio    │
                    │   Queue   │      └──────────────┘
                    └───────────┘
How It Works
1. Publishing Messages
When our backend needs to send an email, it publishes a message to RabbitMQ instead of calling the provider directly:
const { getMessagePublisher } = require('./infrastructure/rabbitmq');

async function sendInvoiceEmail(invoice, customer, company) {
  const publisher = getMessagePublisher();
  const result = await publisher.publishEmail({
    to: customer.email,
    subject: `Invoice #${invoice.number} from ${company.name}`,
    html: renderInvoiceTemplate(invoice),
    context: {
      companyId: invoice.companyId,
      entityType: 'invoice',
      entityId: invoice.id,
      recipientId: customer.id
    }
  });
  // Returns immediately - email is queued
  return { queued: true, messageId: result.messageId };
}
The API response is instant. The actual email sending happens asynchronously.
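The publisher itself is not shown here. As a minimal sketch of what `publishEmail` might do internally, assuming an amqplib-style channel is passed in: `createPublisher`, the `priority` flag, the envelope shape, and the default of 3 attempts are all illustrative assumptions (the `retry` field names mirror what the consumer in step 3 reads).

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical publisher factory. `channel` is assumed to expose an
// amqplib-style sendToQueue(queue, buffer, options).
function createPublisher(channel, { maxAttempts = 3 } = {}) {
  return {
    async publishEmail({ to, subject, html, context, priority = false }) {
      const messageId = randomUUID();
      const queue = priority ? 'messaging.email.priority' : 'messaging.email';
      const envelope = {
        payload: { to, subject, html },
        context,
        retry: { currentAttempt: 1, maxAttempts }
      };
      channel.sendToQueue(queue, Buffer.from(JSON.stringify(envelope)), {
        messageId,        // Used by the consumer as the idempotency key
        persistent: true, // Ask the broker to write the message to disk
        contentType: 'application/json'
      });
      return { messageId, queue };
    }
  };
}
```

Generating the ID at publish time, rather than in the consumer, is what lets a redelivered message be recognized as the same message.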
2. Queue Structure
We use three queues for different message types:
| Queue | Purpose |
|---|---|
| messaging.email | Standard transactional emails |
| messaging.email.priority | Urgent emails (password resets, OTPs) |
| messaging.sms | All SMS messages |
Priority emails have their own queue, so time-sensitive messages aren’t stuck behind a large batch of routine emails.
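One way to wire this up is to give each queue its own consumer. The sketch below assumes an amqplib-style channel; `startConsumers` and the prefetch value of 10 are hypothetical, not settings taken from our deployment.

```javascript
// Hypothetical wiring: one consumer per queue, so a backlog on
// 'messaging.email' cannot delay 'messaging.email.priority'.
// `channel` is assumed to be an amqplib-style channel.
async function startConsumers(channel, handlers, prefetchCount = 10) {
  // Cap unacknowledged deliveries per consumer so a slow provider
  // cannot pull an entire queue into memory.
  await channel.prefetch(prefetchCount);
  const started = [];
  for (const [queue, handler] of Object.entries(handlers)) {
    await channel.consume(queue, handler);
    started.push(queue);
  }
  return started;
}
```

It would be called with a map of queue names to handler functions, for example `{ 'messaging.email.priority': handleEmail, 'messaging.email': handleEmail, 'messaging.sms': handleSms }`, where the handler names are placeholders.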
3. The Consumer Service
A separate Node.js service consumes messages from these queues:
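channel.consume('messaging.email', async (message) => {
  if (message === null) return; // Consumer was cancelled by the broker
  const { payload, context, retry } = JSON.parse(message.content.toString());
  // Assuming amqplib, the publisher-assigned ID lives in message.properties
  const { messageId } = message.properties;

  // Idempotency check - prevent duplicates
  if (await wasAlreadySent(messageId)) {
    channel.ack(message);
    return;
  }

  try {
    const result = await sendViaProvider(payload, context);
    await markAsSent(messageId);
    await publishEvent('message.sent', {
      correlationId: messageId,
      provider: result.provider
    });
    channel.ack(message);
  } catch (error) {
    if (retry.currentAttempt < retry.maxAttempts) {
      // A plain requeue redelivers the same body, so the attempt counter
      // would never advance. Republish with the counter incremented, then
      // ack the original delivery.
      const next = {
        payload,
        context,
        retry: { ...retry, currentAttempt: retry.currentAttempt + 1 }
      };
      channel.sendToQueue('messaging.email', Buffer.from(JSON.stringify(next)), {
        messageId,
        persistent: true
      });
      channel.ack(message);
    } else {
      channel.nack(message, false, false); // Dead letter queue
      await publishEvent('message.failed', {
        correlationId: messageId,
        error: error.message
      });
    }
  }
});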
4. Event Feedback Loop
When the consumer sends a message (or fails permanently), it publishes an event back to RabbitMQ. Our backend listens for these events to update notification status:
consumer.on('message.sent', async (event) => {
  await notificationService.updateStatus(event.correlationId, 'sent');
});
consumer.on('message.failed', async (event) => {
  await notificationService.updateStatus(event.correlationId, 'failed');
});
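On the producing side, the `publishEvent` helper used by the consumer could look roughly like this. It is a sketch under assumptions: the channel is amqplib-style, `createEventPublisher` is a hypothetical factory, and `messaging.events` is an assumed topic exchange name.

```javascript
// Hypothetical factory for the publishEvent helper used by the consumer.
// 'messaging.events' is an assumed topic exchange name.
function createEventPublisher(channel, exchange = 'messaging.events') {
  return async function publishEvent(type, data) {
    const event = { ...data, type, occurredAt: new Date().toISOString() };
    // The event type doubles as the routing key, so the backend can bind
    // one queue to 'message.*' and receive both outcomes.
    channel.publish(exchange, type, Buffer.from(JSON.stringify(event)), {
      persistent: true,
      contentType: 'application/json'
    });
    return event;
  };
}
```

Carrying the original message ID as `correlationId` is what lets the backend match an event to the notification record it created when it queued the message.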
Key Design Decisions
Idempotency with Redis
Every message has a unique ID. Before sending, the consumer checks Redis to see if that ID was already processed. This prevents duplicates even if a message is redelivered.
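// Read-only check: has this message ID been recorded as sent?
async function wasAlreadySent(messageId) {
  const exists = await redis.exists(`sent:${messageId}`); // 1 if present, else 0
  return exists === 1;
}

// Record the ID only after the provider accepted the message, so a failed
// attempt is not mistaken for a sent one when it is retried.
async function markAsSent(messageId) {
  await redis.set(`sent:${messageId}`, '1', 'EX', 604800); // 7-day TTL
}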
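A separate check followed by a write takes two round trips, so two consumers handling the same redelivered message could both pass the check. One possible alternative is to claim the ID in a single atomic command. The sketch below assumes an ioredis-style client, where `set(key, value, 'EX', seconds, 'NX')` resolves to `'OK'` or `null`; `claimMessage` and `releaseClaim` are hypothetical names.

```javascript
const SEVEN_DAYS = 7 * 24 * 60 * 60; // 604800 seconds

// Atomically claim a message ID. Resolves to true only for the first caller.
// `redis` is assumed to be an ioredis-style client.
async function claimMessage(redis, messageId) {
  const result = await redis.set(`sent:${messageId}`, '1', 'EX', SEVEN_DAYS, 'NX');
  return result === 'OK';
}

// Release the claim after a failed send so that a retry is not skipped.
async function releaseClaim(redis, messageId) {
  await redis.del(`sent:${messageId}`);
}
```

The trade-off is that a consumer crashing between the claim and the send leaves the ID claimed until the TTL expires, so this variant favors never sending twice over always sending once.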
Graceful Fallback
We kept direct-send code paths intact. If RabbitMQ is down, the system falls back to synchronous sending:
if (publisher.isEnabled()) {
  await publisher.publishEmail(emailData); // Queue-based
} else {
  await emailService.sendRawEmail(emailData); // Direct fallback
}
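The check above covers a publisher that is switched off. A broker that goes away mid-request would more likely surface as a thrown error, so a slightly more defensive variant might also catch publish failures. This is a sketch; `sendEmailWithFallback` is a hypothetical wrapper around the two calls shown above.

```javascript
// Hypothetical wrapper: try the queue first, then fall back to a direct send
// if the publisher is disabled or publishing throws (e.g. broker unreachable).
async function sendEmailWithFallback(publisher, emailService, emailData) {
  if (publisher.isEnabled()) {
    try {
      const { messageId } = await publisher.publishEmail(emailData);
      return { queued: true, messageId };
    } catch (error) {
      console.warn(`Queue publish failed, sending directly: ${error.message}`);
    }
  }
  await emailService.sendRawEmail(emailData);
  return { queued: false };
}
```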
Dead Letter Queue
Messages that fail after 3 retries go to a dead letter queue. A separate process alerts our team and allows manual retry.
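In RabbitMQ, rejecting a message with requeue set to false only dead-letters it if the queue was declared with a dead letter exchange; otherwise the message is dropped. A minimal sketch of that topology, assuming an amqplib-style channel: `declareWithDeadLetter`, the `messaging.dlx` exchange, and the `.dead` queue suffix are illustrative names, not our actual configuration.

```javascript
// Hypothetical topology helper; exchange and queue names are assumptions.
// `channel` is assumed to be an amqplib-style channel.
async function declareWithDeadLetter(channel, queue) {
  const dlx = 'messaging.dlx';
  const deadQueue = `${queue}.dead`;
  await channel.assertExchange(dlx, 'direct', { durable: true });
  await channel.assertQueue(deadQueue, { durable: true });
  await channel.bindQueue(deadQueue, dlx, queue);
  // Messages rejected with requeue=false are routed to the DLX,
  // using the work queue's name as the routing key.
  await channel.assertQueue(queue, {
    durable: true,
    deadLetterExchange: dlx,
    deadLetterRoutingKey: queue
  });
  return { dlx, deadQueue };
}
```

Queue arguments cannot be changed by re-declaring an existing queue, so the dead letter settings need to be present the first time the work queue is created.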
Separate Cron Jobs
Time-based messaging runs in the messaging service:
| Job | Schedule | Purpose |
|---|---|---|
| Appointment Reminders | Every 15 min | Reminders 7 days and 1 day before |
| Lead Nurture | Every hour | Process drip email campaigns |
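To illustrate how a 15-minute job can pick out the reminders that are due, here is a sketch of the selection step. The appointment shape (`{ id, startsAt }`), the `dueReminders` name, and the use of fixed 24-hour days are assumptions for illustration; the actual job may select appointments differently.

```javascript
const MINUTE = 60 * 1000;
const DAY = 24 * 60 * MINUTE;
const REMINDER_OFFSETS = [7 * DAY, 1 * DAY];

// Returns the reminders that became due in the window (now - windowMs, now].
// Appointment shape ({ id, startsAt: Date }) is an assumption.
function dueReminders(appointments, now, windowMs = 15 * MINUTE) {
  const due = [];
  for (const appt of appointments) {
    for (const offset of REMINDER_OFFSETS) {
      const remindAt = appt.startsAt.getTime() - offset;
      if (remindAt > now.getTime() - windowMs && remindAt <= now.getTime()) {
        due.push({ appointmentId: appt.id, daysBefore: offset / DAY });
      }
    }
  }
  return due;
}
```

Using a half-open window means consecutive runs never select the same reminder twice, as long as the job runs exactly on schedule. A message ID derived from the appointment ID and offset would let the idempotency check absorb a delayed or repeated run.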
Results
After deploying this architecture:
- API response times dropped 40% - No more waiting for email providers
- Message delivery rate improved to 99.7% - Retries catch transient failures
- Zero duplicate messages - Idempotency keys work
- Better visibility - Every message tracked with full context
- Easier debugging - Failed messages in DLQ with error details
When You Don’t Need This
This adds complexity. You probably don’t need it if:
- You send fewer than 1,000 messages/day
- Message delivery isn’t business-critical
- You’re a small team without DevOps capacity
- Your providers have built-in retry
Start simple. Add queues when you feel the pain.
Tech Stack
- Message Broker: RabbitMQ
- Backend: Node.js/Express
- Consumer: Separate Node.js service
- Idempotency Store: Redis
- Email Providers: SendGrid, Mailgun, SMTP
- SMS Provider: Twilio
Conclusion
Decoupling message sending from your main application feels like overkill—until it isn’t. The first time your app stays responsive during a SendGrid outage, or you catch a duplicate before it annoys a customer, you’ll be glad you made the investment.
Build it incrementally. Start with queue infrastructure, add consumers, then migrate high-volume messages first. Keep fallback paths working until you trust the system.