How We Built a Scalable Email & SMS System with RabbitMQ

Dephyned
software architecture Node.js RabbitMQ microservices system design backend

When you’re running a SaaS platform that sends thousands of transactional emails and SMS messages daily—appointment reminders, invoices, estimates, lead notifications—you quickly realize that “just send it” doesn’t scale.

Here’s how we evolved our messaging architecture from direct API calls to a queue-based system that handles failures gracefully, prevents duplicates, and keeps our main application responsive.

The Problem with Direct Sending

Our initial implementation was straightforward:

await emailService.sendInvoiceEmail(customer.email, invoiceData);
await messagingService.sendSMS(customer.phone, reminderText);

This works fine for a few hundred messages a day. But as we scaled, we hit several issues:

1. Request Timeouts

Email and SMS providers sometimes take 2-5 seconds to respond. When a user clicks “Send Invoice,” they shouldn’t wait for Twilio or SendGrid to acknowledge the message.

2. No Retry Logic

If SendGrid returns a 503, the email just… doesn’t send. The user might not even know it failed.

3. Duplicate Messages

Network hiccups could cause our code to retry, sending the same appointment reminder twice. Customers don’t appreciate that.

4. Scaling Bottlenecks

Our API servers were doing the heavy lifting of email rendering and API calls instead of just handling HTTP requests.

The Solution: Decouple with Message Queues

We introduced RabbitMQ as a message broker between our main backend and a dedicated messaging service. Here’s the architecture:

┌─────────────┐     ┌───────────┐     ┌──────────────────┐
│   Backend   │────▶│  RabbitMQ │────▶│ Messaging Service│
│   (API)     │     │  (Queue)  │     │   (Consumer)     │
└─────────────┘     └───────────┘     └──────────────────┘
                          │                    │
                          │                    ▼
                          │           ┌──────────────────┐
                          │           │ Email Providers  │
                          │           │ (SendGrid, etc.) │
                          │           └──────────────────┘
                          │                    │
                          ▼                    ▼
                    ┌───────────┐      ┌──────────────┐
                    │  Events   │◀─────│    Twilio    │
                    │  Queue    │      └──────────────┘
                    └───────────┘

How It Works

1. Publishing Messages

When our backend needs to send an email, it publishes a message to RabbitMQ instead of calling the provider directly:

const { getMessagePublisher } = require('./infrastructure/rabbitmq');

// company is passed in explicitly because the subject line below reads company.name
async function sendInvoiceEmail(invoice, customer, company) {
  const publisher = getMessagePublisher();

  const result = await publisher.publishEmail({
    to: customer.email,
    subject: `Invoice #${invoice.number} from ${company.name}`,
    html: renderInvoiceTemplate(invoice),
    context: {
      companyId: invoice.companyId,
      entityType: 'invoice',
      entityId: invoice.id,
      recipientId: customer.id
    }
  });

  // Returns immediately - email is queued
  return { queued: true, messageId: result.messageId };
}

The API response is instant. The actual email sending happens asynchronously.
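To make the publish step concrete, here is a minimal sketch of what a `publishEmail` helper could look like. It assumes an amqplib-style channel (an object with a `sendToQueue` method) is passed in. The `createPublisher` name, the envelope shape, and the retry defaults are assumptions for illustration only.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical factory: `channel` is assumed to expose an amqplib-style
// sendToQueue(queue, contentBuffer, options) method.
function createPublisher(channel, { maxAttempts = 3 } = {}) {
  return {
    async publishEmail({ to, subject, html, context }) {
      const messageId = randomUUID();
      // The envelope carries the three fields the consumer destructures:
      // payload, context, and retry.
      const envelope = {
        payload: { to, subject, html },
        context,
        retry: { currentAttempt: 0, maxAttempts }
      };
      channel.sendToQueue('messaging.email', Buffer.from(JSON.stringify(envelope)), {
        messageId,        // unique ID the consumer can use for idempotency
        persistent: true, // ask the broker to write the message to disk
        contentType: 'application/json'
      });
      return { messageId };
    }
  };
}
```

Generating the ID on the publisher side means the same ID follows the message through every redelivery, which is what makes the duplicate check on the consumer possible.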

2. Queue Structure

We use three queues for different message types:

Queue                      Purpose
messaging.email            Standard transactional emails
messaging.email.priority   Urgent emails (password resets, OTPs)
messaging.sms              All SMS messages

Priority emails are consumed from their own queue, so time-sensitive messages aren’t stuck behind a batch of marketing emails.
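Choosing a queue can be a small pure function. The sketch below uses the three queue names listed above; the `channel` and `priority` fields on the message are hypothetical names chosen for this example.

```javascript
// Queue names come from the queue list above.
const QUEUES = {
  email: 'messaging.email',
  emailPriority: 'messaging.email.priority',
  sms: 'messaging.sms'
};

// Hypothetical routing helper: picks a queue from the message's channel
// ('email' or 'sms') and an optional priority ('normal' or 'high').
function queueFor({ channel, priority = 'normal' }) {
  if (channel === 'sms') return QUEUES.sms;
  if (channel === 'email') {
    return priority === 'high' ? QUEUES.emailPriority : QUEUES.email;
  }
  throw new Error(`Unsupported channel: ${channel}`);
}
```

Throwing on an unknown channel keeps a typo from silently dropping a message into the wrong queue.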

3. The Consumer Service

A separate Node.js service consumes messages from these queues:

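channel.consume('messaging.email', async (message) => {
  if (message === null) return; // null means the broker cancelled this consumer

  // message.content is a Buffer, so decode it before parsing
  const { payload, context, retry } = JSON.parse(message.content.toString());
  // Assumes an amqplib-style delivery, where the publisher-set ID is in properties
  const messageId = message.properties.messageId;

  // Idempotency check - prevent duplicates
  if (await wasAlreadySent(messageId)) {
    channel.ack(message);
    return;
  }

  try {
    const result = await sendViaProvider(payload, context);
    await markAsSent(messageId);
    await publishEvent('message.sent', {
      correlationId: messageId,
      provider: result.provider
    });
    channel.ack(message);
  } catch (error) {
    if (retry.currentAttempt < retry.maxAttempts) {
      // nack-with-requeue redelivers the body unchanged, so the attempt counter
      // would never advance. Republish with the counter incremented, then ack.
      const next = { payload, context, retry: { ...retry, currentAttempt: retry.currentAttempt + 1 } };
      channel.sendToQueue('messaging.email', Buffer.from(JSON.stringify(next)), {
        messageId,
        persistent: true
      });
      channel.ack(message);
    } else {
      channel.nack(message, false, false); // Dead letter queue
      await publishEvent('message.failed', {
        correlationId: messageId,
        error: error.message
      });
    }
  }
});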

4. Event Feedback Loop

When the consumer sends a message (or fails permanently), it publishes an event back to RabbitMQ. Our backend listens for these events to update notification status:

consumer.on('message.sent', async (event) => {
  await notificationService.updateStatus(event.correlationId, 'sent');
});

consumer.on('message.failed', async (event) => {
  await notificationService.updateStatus(event.correlationId, 'failed');
});
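One way the `consumer.on(...)` interface could be backed is a thin bridge from raw broker deliveries to a Node.js `EventEmitter`. This is a sketch under assumptions: the `createEventConsumer` and `handleDelivery` names are hypothetical, and it assumes each event body is JSON with a `type` field such as `message.sent`.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical bridge: turns raw deliveries from an events queue into named
// events, so handlers can be registered with .on('message.sent', ...).
function createEventConsumer() {
  const emitter = new EventEmitter();
  // Call this from the broker's consume callback with the raw message body.
  emitter.handleDelivery = (contentBuffer) => {
    // Assumed body shape: { type: 'message.sent', ...eventFields }
    const { type, ...event } = JSON.parse(contentBuffer.toString());
    emitter.emit(type, event);
  };
  return emitter;
}
```

Note that `emit` does not wait for async handlers, so a production version would need to await the status update before acknowledging the delivery.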

Key Design Decisions

Idempotency with Redis

Every message has a unique ID. Before sending, the consumer checks Redis to see if that ID was already processed. This prevents duplicates even if a message is redelivered.

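// Read-only check. The key is written by markAsSent() only after the provider
// accepts the message, so a failed attempt is not mistaken for a duplicate.
async function wasAlreadySent(messageId) {
  return (await redis.exists(`sent:${messageId}`)) === 1;
}

async function markAsSent(messageId) {
  await redis.set(`sent:${messageId}`, '1', 'EX', 604800); // 7-day TTL
}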
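A separate check and write leaves a small window in which two consumers handling the same message at the same time could both pass the check. One possible way to close it is an atomic claim with Redis `SET ... NX`. The sketch below assumes an ioredis-style client passed in as `redis`; `claimMessage` and `releaseClaim` are hypothetical names, not part of the system described here.

```javascript
// Assumes an ioredis-style client, where set(key, value, 'EX', seconds, 'NX')
// resolves to 'OK' if the key was created and null if it already existed.
const CLAIM_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days, the same TTL as above

// Returns true if this caller claimed the message, false if it was already claimed.
async function claimMessage(redis, messageId) {
  const reply = await redis.set(`sent:${messageId}`, '1', 'EX', CLAIM_TTL_SECONDS, 'NX');
  return reply === 'OK';
}

// Release the claim after a failed send so a retry is not treated as a duplicate.
async function releaseClaim(redis, messageId) {
  await redis.del(`sent:${messageId}`);
}
```

The trade-off is that a consumer that crashes between claiming and sending leaves the key behind, so the message would be skipped until the key expires or is released manually.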

Graceful Fallback

We kept direct-send code paths intact. If RabbitMQ is down, the system falls back to synchronous sending:

if (publisher.isEnabled()) {
  await publisher.publishEmail(emailData);  // Queue-based
} else {
  await emailService.sendRawEmail(emailData);  // Direct fallback
}
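The broker can also fail between the `isEnabled()` check and the publish call itself. A slightly more defensive variant, sketched here with the hypothetical name `sendEmailWithFallback`, catches a failed publish and falls through to the direct path. The `publisher` and `emailService` objects are passed in as parameters and are assumed to have the methods used in the snippet above.

```javascript
// Hypothetical wrapper around the queue-or-direct decision.
async function sendEmailWithFallback(publisher, emailService, emailData) {
  if (publisher.isEnabled()) {
    try {
      const { messageId } = await publisher.publishEmail(emailData);
      return { queued: true, messageId };
    } catch (error) {
      // Publishing failed even though the publisher reported itself enabled.
      console.warn(`Queue publish failed, sending directly: ${error.message}`);
    }
  }
  // Direct fallback: the caller waits for the provider, as in the original design.
  await emailService.sendRawEmail(emailData);
  return { queued: false };
}
```

Returning a `queued` flag lets the caller tell the user whether the message was sent immediately or is still pending.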

Dead Letter Queue

Messages that fail after 3 retries go to a dead letter queue. A separate process alerts our team and allows manual retry.
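For a rejected message to land in a dead letter queue, the work queues have to be declared with a dead-letter exchange. The following is a minimal sketch assuming an amqplib-style channel; the names `messaging.dlx` and `messaging.dead` and the `declareQueues` function are hypothetical.

```javascript
// Assumes an amqplib-style channel with assertExchange, assertQueue, and bindQueue.
// 'messaging.dlx' and 'messaging.dead' are illustrative names only.
async function declareQueues(channel) {
  // A fanout exchange ignores routing keys, so every dead-lettered message
  // ends up in the single queue bound to it.
  await channel.assertExchange('messaging.dlx', 'fanout', { durable: true });
  await channel.assertQueue('messaging.dead', { durable: true });
  await channel.bindQueue('messaging.dead', 'messaging.dlx', '');

  for (const queue of ['messaging.email', 'messaging.email.priority', 'messaging.sms']) {
    // Messages nacked with requeue=false on these queues are routed to the DLX.
    await channel.assertQueue(queue, { durable: true, deadLetterExchange: 'messaging.dlx' });
  }
}
```

RabbitMQ rejects a redeclaration with different arguments, so an existing queue would need to be recreated or configured through a policy before this option takes effect.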

Separate Cron Jobs

Time-based messaging runs in the messaging service:

Job                     Schedule       Purpose
Appointment Reminders   Every 15 min   Reminders 7 days and 1 day before
Lead Nurture            Every hour     Process drip email campaigns
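A job that runs every 15 minutes has to decide which reminders belong to the current run without sending any of them twice. One way to do that, sketched below, is to give each run a half-open 15-minute window and check whether a reminder's send time falls inside it. The `remindersDue` function and the `'7d'` and `'1d'` labels are hypothetical.

```javascript
const MINUTE = 60 * 1000;
const DAY = 24 * 60 * MINUTE;
const RUN_INTERVAL = 15 * MINUTE; // matches the "Every 15 min" schedule
const REMINDER_OFFSETS = { '7d': 7 * DAY, '1d': 1 * DAY };

// Returns the reminder labels whose send time falls inside this run's window
// [runStart, runStart + 15 min). The window is half-open, so back-to-back
// runs never both match the same reminder.
function remindersDue(appointmentStart, runStart) {
  const windowStart = runStart.getTime();
  const windowEnd = windowStart + RUN_INTERVAL;
  return Object.entries(REMINDER_OFFSETS)
    .filter(([, offset]) => {
      const sendAt = appointmentStart.getTime() - offset;
      return sendAt >= windowStart && sendAt < windowEnd;
    })
    .map(([label]) => label);
}
```

This only holds if runs start on schedule. A skipped or delayed run would miss its window, which is one reason to pair a job like this with the idempotency keys described earlier and a catch-up pass.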

Results

After deploying this architecture:

  • API response times dropped 40% - No more waiting for email providers
  • Message delivery rate improved to 99.7% - Retries catch transient failures
  • Zero duplicate messages - Idempotency keys work
  • Better visibility - Every message tracked with full context
  • Easier debugging - Failed messages in DLQ with error details

When You Don’t Need This

This adds complexity. You probably don’t need it if:

  • You send fewer than 1,000 messages/day
  • Message delivery isn’t business-critical
  • You’re a small team without DevOps capacity
  • Your providers have built-in retry

Start simple. Add queues when you feel the pain.

Tech Stack

  • Message Broker: RabbitMQ
  • Backend: Node.js/Express
  • Consumer: Separate Node.js service
  • Idempotency Store: Redis
  • Email Providers: SendGrid, Mailgun, SMTP
  • SMS Provider: Twilio

Conclusion

Decoupling message sending from your main application feels like overkill—until it isn’t. The first time your app stays responsive during a SendGrid outage, or you catch a duplicate before it annoys a customer, you’ll be glad you made the investment.

Build it incrementally. Start with queue infrastructure, add consumers, then migrate high-volume messages first. Keep fallback paths working until you trust the system.
