<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Practical Serverless]]></title><description><![CDATA[**Practical Serverless** is a blog about building real-world serverless systems.

Here you'll find practical insights on designing, building, and operating serverless architectures in production. I write about event-driven systems, cloud-native patterns, scalability, reliability, and the trade-offs that come with distributed systems.

Expect deep dives, architecture breakdowns, lessons learned from real implementations, and pragmatic guidance for engineers building serverless platforms.

If you're interested in serverless, distributed systems, and modern cloud architecture, you're in the right place.
]]></description><link>https://practicalserverless.blog</link><image><url>https://cdn.hashnode.com/uploads/logos/69b3cd56c9e75ce33d841724/a1fe57e2-6356-45b2-823d-a3612b2098ff.png</url><title>Practical Serverless</title><link>https://practicalserverless.blog</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 09 Apr 2026 13:47:50 GMT</lastBuildDate><atom:link href="https://practicalserverless.blog/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How I Built a Serverless Testing Library That Cuts Test Setup by 90%]]></title><description><![CDATA[Every Lambda test starts the same way: you need an event object — and crafting one is annoying. API Gateway v2 events have 30+ fields, SQS needs message IDs, receipt handles, and ARNs, and DynamoDB St]]></description><link>https://practicalserverless.blog/how-i-built-a-serverless-testing-library-that-cuts-test-setup-by-90</link><guid isPermaLink="true">https://practicalserverless.blog/how-i-built-a-serverless-testing-library-that-cuts-test-setup-by-90</guid><category><![CDATA[serverless, testing, lambda]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Testing]]></category><category><![CDATA[lambda]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 08 Apr 2026 09:19:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/c7044fc0-41a6-4c59-932b-9eb80d38d253.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every Lambda test starts the same way: you need an event object — and crafting one is annoying. API Gateway v2 events have 30+ fields, SQS needs message IDs, receipt handles, and ARNs, and DynamoDB Streams expect marshaled AttributeValue maps. The usual options are copy‑pasting a 60‑line JSON fixture or spending 20 minutes hand‑crafting one from memory.</p>
<p>I built <code>@sls-testing</code> to stop that. It provides typed, composable one‑line builders that give sensible defaults, automatic marshaling, and easy overrides so your tests only express what matters.</p>
<p>The payoff: what used to be a 30–60 line fixture becomes a single builder call — cutting test setup by roughly 90%. Below, I’ll show before/after examples, the API surface, and how it handles common event types (API Gateway, SQS, S3, DynamoDB Streams).</p>
<p>Here's what the before/after looks like.</p>
<h2>The Problem: 60 Lines to Say "POST /users"</h2>
<p>Testing a Lambda handler behind API Gateway v2 requires an <code>APIGatewayProxyEventV2</code> object. Here's the minimum viable event most teams copy around:</p>
<pre><code class="language-typescript">const event = {
  version: '2.0',
  routeKey: '$default',
  rawPath: '/users',
  rawQueryString: '',
  headers: {
    'content-type': 'application/json',
    'accept': 'application/json',
  },
  isBase64Encoded: false,
  body: JSON.stringify({ name: 'Lucas' }),
  requestContext: {
    accountId: '123456789012',
    apiId: 'test-api-id',
    domainName: 'test-api-id.execute-api.us-east-1.amazonaws.com',
    domainPrefix: 'test-api-id',
    http: {
      method: 'POST',
      path: '/users',
      protocol: 'HTTP/1.1',
      sourceIp: '127.0.0.1',
      userAgent: 'jest',
    },
    requestId: 'some-uuid-here',
    routeKey: '$default',
    stage: '$default',
    time: '01/Jan/2024:00:00:00 +0000',
    timeEpoch: 1704067200000,
  },
}
</code></pre>
<p>That's <strong>30+ lines</strong> for an event where the only things you actually care about are the method, path, and body. The rest is structural noise — correct enough to not crash, meaningless to your test.</p>
<p>Now multiply that by every event type in your service. SQS needs <code>messageId</code>, <code>receiptHandle</code>, <code>attributes</code>, <code>eventSourceARN</code>. S3 needs <code>bucket</code>, <code>key</code>, <code>responseElements</code>, <code>userIdentity</code>. DynamoDB Streams need marshalled <code>AttributeValue</code> maps where <code>"hello"</code> becomes <code>{ S: "hello" }</code> and <code>42</code> becomes <code>{ N: "42" }</code>.</p>
<p>Most teams solve this one of three ways:</p>
<ol>
<li><p><strong>Copy-paste JSON fixtures</strong> — Brittle, verbose, drift from reality over time.</p>
</li>
<li><p><strong>Hand-roll factory functions</strong> — Every team writes their own, slightly differently, and they're never complete.</p>
</li>
<li><p><strong>Skip testing</strong> — The honest answer when the setup cost exceeds the perceived value.</p>
</li>
</ol>
<p>None of these are good.</p>
<h2>The Solution: Express Intent, Not Structure</h2>
<p>With <code>@sls-testing/core</code>, the same test becomes:</p>
<pre><code class="language-typescript">import { buildApiGatewayEvent } from '@sls-testing/core'

const event = buildApiGatewayEvent({
  method: 'POST',
  path: '/users',
  body: JSON.stringify({ name: 'Lucas' }),
})
</code></pre>
<p><strong>Three lines. Same fully-typed event.</strong> Every field you didn't specify gets a sensible default — a real-looking request ID, a timestamp, valid ARNs. The TypeScript types come from <code>@types/aws-lambda</code>, so your IDE autocompletes every field if you need to override something specific.</p>
<p>The pattern is the same across all six event types:</p>
<pre><code class="language-typescript">// SQS — bodies auto-serialized, each record gets a unique messageId
const sqsEvent = buildSQSEvent({
  records: [
    { body: { orderId: 'abc-123', amount: 99.9 } },
    { body: { orderId: 'def-456', amount: 49.9 } },
  ],
})

// S3 — just bucket and key, everything else filled in
const s3Event = buildS3Event({
  bucket: 'uploads',
  key: 'images/photo.png',
})

// DynamoDB Streams — plain objects auto-marshalled to AttributeValue
const streamEvent = buildDynamoDBStreamEvent({
  records: [{
    eventName: 'INSERT',
    keys: { id: 'abc' },
    newImage: { id: 'abc', name: 'Lucas', count: 42 },
  }],
})

// EventBridge
const ebEvent = buildEventBridgeEvent({
  source: 'app.orders',
  'detail-type': 'OrderPlaced',
  detail: { orderId: 'abc-123' },
})

// SNS
const snsEvent = buildSNSEvent({
  records: [{ message: { action: 'notify' } }],
})
</code></pre>
<p>The DynamoDB builder is where the savings are most dramatic. Manually constructing a <code>DynamoDBStreamEvent</code> with marshalled values is easily 40-50 lines. The builder does the marshalling for you — pass <code>{ count: 42 }</code> and it becomes <code>{ N: "42" }</code> automatically.</p>
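<p>For intuition, here is a minimal sketch of what that marshalling step does. This is an illustrative recursion, not the library's actual implementation (a production marshaller, like the one in AWS's <code>@aws-sdk/util-dynamodb</code>, also handles binary data, sets, and number precision):</p>
<pre><code class="language-typescript">// Illustrative sketch of plain-object -&gt; AttributeValue marshalling.
type AttributeValue =
  | { S: string }
  | { N: string }
  | { BOOL: boolean }
  | { NULL: true }
  | { L: AttributeValue[] }
  | { M: Record&lt;string, AttributeValue&gt; }

function marshall(value: unknown): AttributeValue {
  if (value === null) return { NULL: true }
  if (typeof value === 'string') return { S: value }
  if (typeof value === 'number') return { N: String(value) } // numbers travel as strings
  if (typeof value === 'boolean') return { BOOL: value }
  if (Array.isArray(value)) return { L: value.map(marshall) }
  if (typeof value === 'object') {
    const out: Record&lt;string, AttributeValue&gt; = {}
    for (const [k, v] of Object.entries(value)) out[k] = marshall(v)
    return { M: out }
  }
  throw new Error('Unsupported type: ' + typeof value)
}

marshall({ id: 'abc', count: 42 })
// { M: { id: { S: 'abc' }, count: { N: '42' } } }
</code></pre>
<p>The builder applies this conversion to inputs like <code>keys</code> and <code>newImage</code> in the earlier example, so your test stays in plain objects.</p>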
<h2>Beyond Events: Lambda Context</h2>
<p>Events are half the story. Your handler also receives a <code>Context</code> object, and AWS's type definition has 12 fields. Most tests either ignore it (<code>handler(event, {} as any)</code> — hello, runtime crash) or build an incomplete mock.</p>
<pre><code class="language-typescript">import { buildLambdaContext } from '@sls-testing/core'

const context = buildLambdaContext({
  functionName: 'order-service-dev-processOrder',
  memoryLimitInMB: '512',
  remainingTimeOverride: 5000,
})

context.getRemainingTimeInMillis() // 5000 — actually works
</code></pre>
<p>Every field has a default. <code>getRemainingTimeInMillis()</code> returns the value you configure. The <code>awsRequestId</code> is a real UUID. The <code>logGroupName</code> derives from the function name. It's a real <code>Context</code> object, not a type-cast empty object.</p>
<h2>Assertions That Speak Serverless</h2>
<p>The companion package <code>@sls-testing/jest</code> adds custom Jest matchers that understand Lambda response shapes:</p>
<pre><code class="language-typescript">import '@sls-testing/jest'

const result = await handler(event, context)

// Status code assertions
expect(result).toHaveStatusCode(200)
expect(result).toBeSuccessfulApiResponse()  // any 2xx
expect(result).toBeClientError()             // any 4xx
expect(result).toBeServerError()             // any 5xx

// Deep response matching with asymmetric matchers
expect(result).toMatchLambdaResponse({
  statusCode: 201,
  body: { userId: expect.any(String) },
  headers: { 'content-type': 'application/json' },
})

// SQS batch response assertions
expect(result).toHaveNoFailedMessages()
expect(result).toHaveFailedMessage('msg-id-2')
</code></pre>
<p><code>toMatchLambdaResponse</code> automatically parses the JSON body for comparison — you don't need to <code>JSON.parse(result.body)</code> in every test. Asymmetric matchers like <code>expect.any(String)</code> work inside the body, so you can assert structure without pinning every generated value.</p>
<p>The error messages are designed for Lambda. When <code>toHaveStatusCode</code> fails, it shows you both the expected and actual status codes plus the response body — because when a Lambda returns 500 instead of 200, the first thing you need is the error message, not a generic "expected 200 but received 500".</p>
<h2>What the Numbers Actually Look Like</h2>
<p>Let me do the math on a real scenario — a service with three Lambda functions (API Gateway handler, SQS consumer, DynamoDB Stream processor), each with 3-4 test cases.</p>
<h3>Without @sls-testing</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th>Lines</th>
</tr>
</thead>
<tbody><tr>
<td>API Gateway event fixture</td>
<td>~35</td>
</tr>
<tr>
<td>SQS event fixture (2 records)</td>
<td>~45</td>
</tr>
<tr>
<td>DynamoDB Stream event fixture</td>
<td>~50</td>
</tr>
<tr>
<td>Lambda context mock</td>
<td>~20</td>
</tr>
<tr>
<td>Helper: JSON body parser for assertions</td>
<td>~10</td>
</tr>
<tr>
<td>Helper: status code checker</td>
<td>~8</td>
</tr>
<tr>
<td>Copy-paste overhead across test files</td>
<td>~40</td>
</tr>
<tr>
<td><strong>Total test infrastructure</strong></td>
<td><strong>~208</strong></td>
</tr>
</tbody></table>
<h3>With @sls-testing</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th>Lines</th>
</tr>
</thead>
<tbody><tr>
<td>API Gateway event (per test)</td>
<td>3-4</td>
</tr>
<tr>
<td>SQS event (per test)</td>
<td>3-5</td>
</tr>
<tr>
<td>DynamoDB Stream event (per test)</td>
<td>4-6</td>
</tr>
<tr>
<td>Lambda context (per test)</td>
<td>1-3</td>
</tr>
<tr>
<td>Import + matcher setup</td>
<td>2</td>
</tr>
<tr>
<td><strong>Total test infrastructure</strong></td>
<td><strong>~20</strong></td>
</tr>
</tbody></table>
<p>That's roughly a <strong>90% reduction</strong> in test setup code. But the real win isn't the line count — it's the cognitive load. When a test file is 80% fixture and 20% assertion, you can't see what's being tested. When it's 20% setup and 80% assertion, the intent is obvious.</p>
<h2>Design Decisions</h2>
<p>A few choices I made that shaped the library:</p>
<p><strong>Sensible defaults, full override.</strong> Every builder returns a complete, valid event with zero arguments. Pass a <code>DeepPartial</code> override to change any field. This means the simple case is one line, but you can still construct precise edge cases when you need to test specific header combinations or malformed payloads.</p>
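<p>Mechanically, that pattern is just a complete default event deep-merged with the caller's partial. A toy sketch of the idea (the real builders work over the full AWS event types and a proper <code>DeepPartial</code> type; this version skips arrays-of-objects and other edge cases):</p>
<pre><code class="language-typescript">// Illustrative defaults-plus-override merge. A real DeepPartial type
// would make the override type-safe against the event shape.
function deepMerge(base: any, override: any): any {
  const out: any = { ...base }
  for (const [key, value] of Object.entries(override ?? {})) {
    out[key] =
      value !== null &amp;&amp; typeof value === 'object' &amp;&amp; !Array.isArray(value)
        ? deepMerge(base?.[key] ?? {}, value)
        : value
  }
  return out
}

const defaults = {
  rawPath: '/',
  requestContext: { http: { method: 'GET', path: '/' } },
}

// Override only what the test cares about; everything else keeps its default.
const event = deepMerge(defaults, {
  requestContext: { http: { method: 'POST' } },
})
// event.requestContext.http = { method: 'POST', path: '/' }
</code></pre>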
<p><strong>Auto-serialization.</strong> SQS bodies and SNS messages are automatically <code>JSON.stringify</code>'d. DynamoDB images are automatically marshalled. You pass plain objects; the builder handles the format Lambda actually receives.</p>
<p><strong>Framework-agnostic core.</strong> <code>@sls-testing/core</code> works with Jest, Vitest, Mocha, or any test runner. The Jest-specific matchers are a separate package. Vitest adapters are planned for v2.</p>
<p><strong>Types from the source.</strong> All event types come from <code>@types/aws-lambda</code> — the community-maintained definitions that match the actual AWS runtime. No custom type definitions that could drift.</p>
<p><strong>Unique identifiers per call.</strong> Every <code>buildSQSEvent()</code> call generates unique <code>messageId</code>s, every context gets a unique <code>awsRequestId</code>. This prevents subtle test pollution where two tests accidentally share the same ID.</p>
<h2>Getting Started</h2>
<pre><code class="language-bash">npm install @sls-testing/core @sls-testing/jest --save-dev
</code></pre>
<p>Add the Jest setup (or import per file):</p>
<pre><code class="language-json">{
  "setupFilesAfterEnv": ["@sls-testing/jest"]
}
</code></pre>
<p>Write a test:</p>
<pre><code class="language-typescript">import { buildApiGatewayEvent, buildLambdaContext } from '@sls-testing/core'
import '@sls-testing/jest'
import { handler } from './handler'

it('creates a user', async () =&gt; {
  const event = buildApiGatewayEvent({
    method: 'POST',
    path: '/users',
    body: JSON.stringify({ name: 'Lucas' }),
  })

  const result = await handler(event, buildLambdaContext())

  expect(result).toHaveStatusCode(201)
  expect(result).toMatchLambdaResponse({
    body: { name: 'Lucas', id: expect.any(String) },
  })
})
</code></pre>
<p>That's it. No fixture files. No factory functions. No <code>as any</code> casts.</p>
<h2>What's Next</h2>
<p>The library is at v1 and covers the six most common Lambda event sources. The roadmap includes:</p>
<ul>
<li><p><strong>Vitest adapter</strong> — Same matchers, native Vitest integration</p>
</li>
<li><p><strong>Serverless Framework plugin</strong> — Bridge <code>serverless.yml</code> config into tests so function names, timeouts, and env vars stay in sync automatically</p>
</li>
<li><p><strong>More event types</strong> — Cognito triggers, CloudWatch Events, Kinesis</p>
</li>
<li><p><strong>Snapshot testing</strong> — Assert that response shapes haven't changed across deploys</p>
</li>
<li><p><strong>Error simulation</strong> — Builders for timeout, OOM, and cold start scenarios</p>
</li>
</ul>
<p>The repo is at <a href="https://github.com/brognilucas/sls-testing">github.com/brognilucas/sls-testing</a>. Contributions welcome — especially if you have event types you'd like to see supported.</p>
<hr />
<p><em>Testing serverless applications shouldn't require more boilerplate than the business logic itself. If your test files are 80% fixture setup, something is wrong with the tooling, not with your tests.</em></p>
]]></content:encoded></item><item><title><![CDATA[How to Choose the Right Database for Your Serverless Application]]></title><description><![CDATA[Serverless promises to free teams from infrastructure worries, but picking the wrong database can hurt your performance, increase your costs, and affect developer velocity.
As with everything in softw]]></description><link>https://practicalserverless.blog/how-to-choose-the-right-database-for-your-serverless-application</link><guid isPermaLink="true">https://practicalserverless.blog/how-to-choose-the-right-database-for-your-serverless-application</guid><category><![CDATA[Databases]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 01 Apr 2026 18:56:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/4ee74504-c9a1-48dd-9c05-9decbe82c607.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serverless promises to free teams from infrastructure worries, but picking the wrong database can hurt your performance, increase your costs, and affect developer velocity.</p>
<p>As with everything in software, the database choice comes with trade-offs, and understanding what those are is extremely important. Scaling characteristics, connection handling and concurrency, latency, consistency and transactional needs, operational overhead, and pricing model are all factors to consider.</p>
<p>This article unpacks those trade-offs, compares common patterns (serverless‑native databases, managed relational options, caches, and streaming stores), and offers practical rules of thumb so you can pick a database that fits your application rather than creating new operational headaches. By the end, you’ll have a concise checklist to make the decision faster and more confidently.</p>
<h2>Why database choice matters more in serverless</h2>
<p>In traditional servers, database connections are opened once and reused across thousands of requests. Application instances are long‑lived and predictable. Serverless flips that model: functions spin up, live for seconds or minutes, and vanish. Each invocation may be a fresh process with no previous state, no persistent connection, and no guarantee of locality to previous requests. That changes the calculus: connection limits, cold‑start penalties, and per‑operation pricing matter far more than they did in long‑running servers.</p>
<p>A database that works well behind a long‑lived app can cause connection storms, latency spikes, or runaway costs when used directly from a fleet of ephemeral functions. The goal is to match your workload’s requirements (throughput, latency, consistency, transactions) with a storage option whose trade‑offs align with serverless behavior.</p>
<h2>Key trade-offs to weigh</h2>
<ul>
<li><p>Scaling characteristics: Does the database scale horizontally without connection limits or shard coordination that conflicts with ephemeral clients?</p>
</li>
<li><p>Connection handling and concurrency: Can thousands of short‑lived connections be supported efficiently, or do you need a pooling/proxy layer?</p>
</li>
<li><p>Latency: Are single‑digit‑millisecond reads required, or can you accept higher, variable latency?</p>
</li>
<li><p>Consistency and transactions: Do you need strong ACID guarantees across multiple keys/tables, or is eventual consistency acceptable?</p>
</li>
<li><p>Operational overhead: How much maintenance, tuning, backups, and failover handling will your team manage?</p>
</li>
<li><p>Pricing model: Per‑operation, provisioned capacity, or storage‑centric billing—how do patterns of traffic (spiky vs steady) affect cost?</p>
</li>
</ul>
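<p>Of these, connection handling is usually the first to bite. A common Lambda mitigation is to create the client once per container, at module scope, so warm invocations reuse it. A sketch with a hypothetical stand-in client factory (a real handler would use an actual driver such as <code>pg</code>, ideally behind a pooler or proxy):</p>
<pre><code class="language-typescript">// Stand-in for a real database driver; counts how many connections open.
let connectionsOpened = 0

function createClient() {
  connectionsOpened += 1 // a real driver would open a TCP connection here
  return {
    async query(sql: string) {
      return [{ ok: true }]
    },
  }
}

// Module scope: runs once per cold start, reused while the container is warm.
const client = createClient()

async function handler(event: unknown) {
  // If the client were created inside the handler instead, every
  // invocation would open its own connection.
  return client.query('SELECT 1')
}
</code></pre>
<p>Two warm invocations still mean one connection. Note that 1,000 concurrent containers still mean 1,000 connections, which is why a proxy or pooler in front of a relational database matters at scale.</p>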
<h2>Common patterns and how they map to serverless</h2>
<ul>
<li><p>Serverless‑native databases (e.g., serverless NoSQL or fully serverless managed stores):</p>
<ul>
<li><p>Pros: Auto‑scaling, connectionless or HTTP/SDK access, fine‑grained billing, low operational overhead.</p>
</li>
<li><p>Cons: Weaker transactional guarantees or complex modeling for relational data; can be expensive at very high sustained throughput.</p>
</li>
<li><p>When to use: Spiky workloads, simple access patterns, evented architectures, or when you want minimal ops.</p>
</li>
</ul>
</li>
<li><p>Managed relational databases (serverless variants or provisioned RDS/Aurora/etc.):</p>
<ul>
<li><p>Pros: Familiar SQL, strong transactions, complex queries.</p>
</li>
<li><p>Cons: Connection limits and scaling challenges; may require connection pooling (proxy, pooler, or Data API) and can incur cold‑start latency.</p>
</li>
<li><p>When to use: Applications that require ACID across multiple records or complex joins and cannot be re‑modeled easily.</p>
</li>
</ul>
</li>
<li><p>Caches and in‑memory stores (Redis, Memcached, or managed variants):</p>
<ul>
<li><p>Pros: Extremely low latency for hot reads, useful for rate limiting, sessions, and ephemeral state.</p>
</li>
<li><p>Cons: Not a durable primary store (unless using persistence features), additional operational cost, eventual consistency with origin store.</p>
</li>
<li><p>When to use: Read‑heavy, low‑latency needs, offloading hotspots from a primary datastore.</p>
</li>
</ul>
</li>
<li><p>Streaming/append logs (Kafka, Kinesis, Pulsar, streaming databases):</p>
<ul>
<li><p>Pros: Durable event delivery, great for event‑sourcing, async processing, and decoupling components.</p>
</li>
<li><p>Cons: Not a drop‑in replacement for arbitrary reads/transactions; requires different application patterns.</p>
</li>
<li><p>When to use: Event‑driven architectures, audit logs, long‑running workflows.</p>
</li>
</ul>
</li>
</ul>
<h2>Practical rules of thumb</h2>
<ul>
<li><p>If your functions open many short‑lived DB connections, use a serverless‑friendly datastore or a connection proxy. Don’t rely on direct DB connections from unpooled functions.</p>
</li>
<li><p>For strong multi‑row/multi‑table transactions choose managed relational options—but consider a serverless (Data API) or pooled access pattern to avoid connection storms.</p>
</li>
<li><p>For spiky traffic with bursty reads, prefer serverless‑native stores and caches; they scale on demand and bill for usage.</p>
</li>
<li><p>If your app can tolerate eventual consistency, embracing key‑value or document models often reduces complexity and cost.</p>
</li>
<li><p>Use streaming stores for durable event capture and decoupling; combine with a materialized view or read store for low‑latency queries.</p>
</li>
<li><p>Measure cost at expected traffic patterns—serverless pricing can be higher for sustained, heavy throughput than for bursty, intermittent use.</p>
</li>
</ul>
<h2>Closing thoughts</h2>
<p>Choosing a database for serverless shouldn’t be guesswork. Match your access patterns and operational constraints to the storage option whose trade‑offs you can live with, and use small experiments to validate latency, scaling, and cost under realistic load. This keeps serverless simple, where it should be—letting your team move faster without trading away reliability or spiraling costs.</p>
]]></content:encoded></item><item><title><![CDATA[Events, Messages & Commands: The Concepts That Make or Break Your Serverless Architecture]]></title><description><![CDATA[You might have created a Lambda function that "handles events." But take a moment to question yourself about what an event actually is.
Let's forget the object that you can access on the lambda, and t]]></description><link>https://practicalserverless.blog/events-messages-commands-the-concepts-that-make-or-break-your-serverless-architecture</link><guid isPermaLink="true">https://practicalserverless.blog/events-messages-commands-the-concepts-that-make-or-break-your-serverless-architecture</guid><category><![CDATA[serverless]]></category><category><![CDATA[event-driven-architecture]]></category><category><![CDATA[events]]></category><category><![CDATA[architecture]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software design]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 25 Mar 2026 11:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/896866c8-c91d-4cf3-a27b-413cb2c45870.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You might have created a Lambda function that "handles events." But take a moment to question yourself about what an event actually is.</p>
<p>Let's set aside the object you receive in the Lambda handler and think about the concept itself: what makes something an event, rather than a command or a message?</p>
<p>In serverless, I believe knowing this concept matters a lot. The whole ecosystem is built on a deeply event-driven model. EventBridge, SQS, SNS, DynamoDB Streams, and S3 notifications all depend on events.</p>
<p>In this post, we'll return to the basics. We'll explain what events really are, how they differ from commands and messages, and why these differences matter in every serverless system you create.</p>
<h2>What is an event</h2>
<p>A few years ago, during a talk by James Eastham, I learned something crucial: an event is a fact, and facts cannot be undone. You can't reverse an event. Consider writing a post for this blog: once you publish it, the action is irreversible. The <code>post.published</code> event has already happened.<br />You might wonder: if I delete the post, haven't I undone the action? Not quite. You haven't reversed the publication; you've added another event to the sequence.</p>
<p>That's the essence of an event. In simple terms, an event represents an action that has occurred in the real world within your system.</p>
<h2>What is a Command</h2>
<p>If an event is something that <em>has happened</em>, a command is something you're <em>asking to happen</em>. It's a request, not a fact. And unlike events, commands can be rejected.</p>
<p>Think of it this way: when a user clicks "Publish" on your blog editor, your frontend might send a <code>PublishPost</code> command to your backend. That command can fail. The post might not meet validation rules, the user might not have the right permissions, or the system might be temporarily unavailable. The command is an intention, not a truth.</p>
<p>This distinction has real architectural consequences. Commands generally have an intended recipient. You don't broadcast a command to anyone who might be listening. You send it to the one service or function responsible for handling it. There's an implicit contract: someone is expected to act on it.</p>
<p>In serverless terms, an SQS queue carrying a <code>ResizeImage</code> instruction is a good example of a command channel. One producer, one consumer, one clear responsibility.</p>
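<p>Because a command is a request, its handler owns validation and is allowed to say no. A sketch (names are illustrative, not from any framework):</p>
<pre><code class="language-typescript">// A command handler validates intent before acting; rejection is a
// normal, expected outcome.
type CommandResult = { accepted: true } | { accepted: false; reason: string }

interface PublishPostCommand {
  postId: string
  authorId: string
}

function handlePublishPost(cmd: PublishPostCommand): CommandResult {
  if (!cmd.postId) return { accepted: false, reason: 'postId is required' }
  if (!cmd.authorId) return { accepted: false, reason: 'authorId is required' }
  // ...persist the post, then emit post.published as a fact
  return { accepted: true }
}

handlePublishPost({ postId: '', authorId: 'lucas' })
// { accepted: false, reason: 'postId is required' }
</code></pre>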
<h2><strong>What is a Message</strong></h2>
<p>A message is the broadest of the three. Both events and commands travel as messages. The word "message" tells you about the <em>transport</em>, not the <em>intent</em>.</p>
<p>This is where a lot of confusion creeps in. Developers see SNS delivering a payload and call it "just a message." Technically, yes. But what matters architecturally is what's <em>inside.</em> Is it announcing something that happened, or requesting something to be done?</p>
<p>Getting that wrong leads to systems where consumers start making assumptions they shouldn't. A consumer that receives an event shouldn't be the one deciding whether the action was valid. That ship has sailed. But a consumer that receives a command absolutely should validate it before acting.</p>
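<p>One way to make the transport/intent split concrete in code: the envelope carries transport metadata, while the payload's kind carries the intent. The types here are illustrative, not a standard:</p>
<pre><code class="language-typescript">type OrderPlaced = { kind: 'event'; type: 'order.placed'; orderId: string }
type ResizeImage = { kind: 'command'; type: 'ResizeImage'; key: string }

// The envelope is the message: transport metadata only, no intent.
interface MessageEnvelope {
  messageId: string
  publishedAt: string
  payload: OrderPlaced | ResizeImage
}

// A consumer branches on intent, not on transport details.
function describe(msg: MessageEnvelope): string {
  if (msg.payload.kind === 'event') {
    return 'fact: ' + msg.payload.type + ' has already happened'
  }
  return 'request: please perform ' + msg.payload.type
}

describe({
  messageId: 'm-1',
  publishedAt: new Date().toISOString(),
  payload: { kind: 'event', type: 'order.placed', orderId: 'abc-123' },
})
// 'fact: order.placed has already happened'
</code></pre>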
<h2><strong>Why These Differences Matter in Serverless</strong></h2>
<p>In a distributed architecture, the distinction between events and commands changes how you design your application, how you deal with errors, and how you handle retry logic.</p>
<p>With events, every listener is an observer. They react to facts. If a <code>user.registered</code> event triggers a welcome email Lambda and that function fails, you don't "undo" the registration — you retry the email. The event remains true regardless.</p>
<p>With commands, the listeners are executors. They own the outcome. A failed <code>ProcessPayment</code> command is not something you silently retry without careful thought. The intent hasn't been fulfilled, and that matters.</p>
<p>EventBridge is a great example of an event bus done right: it's designed around broadcasting facts to multiple consumers. SQS, on the other hand, lends itself naturally to commands. It's point-to-point, with visibility timeouts and dead-letter queues that reflect the expectation that <em>someone must handle this</em>.</p>
<h2>Conclusion</h2>
<p>Understanding the difference between events, commands, and messages is more than academic — it's foundational to building reliable, scalable serverless systems.</p>
<p>Events are immutable facts about things that have already happened; commands are intent to perform an action; messages are the vehicles that convey either. Treating them correctly changes how you design APIs, choose services, handle failures, and reason about system behavior.</p>
<p>Key takeaways and practical guidance:</p>
<ul>
<li><p>Name things clearly: events in past tense (e.g., <code>post.published</code>), commands as imperatives (e.g., <code>createPost</code>), messages as contextual envelopes.</p>
</li>
<li><p>Model events as immutable facts: persist them, append rather than overwrite, and use them to drive downstream state and side effects.</p>
</li>
<li><p>Use commands when you need explicit intent and control over execution (and choose queuing patterns that preserve ordering and retries).</p>
</li>
<li><p>Expect duplicates and out-of-order delivery in distributed systems: make consumers idempotent and design for eventual consistency.</p>
</li>
<li><p>Keep schemas explicit and versioned; consider a registry or strict contracts for producers and consumers.</p>
</li>
<li><p>Pick the right tool for the job: event buses like EventBridge for broadcasting facts, point-to-point queues like SQS for commands.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[When Messages Fail: How DLQs Save Your Event-Driven System]]></title><description><![CDATA[In recent interviews, I asked candidates a system-design question about managing failures in a serverless, event-driven architecture. I was surprised by how many didn't include retry mechanisms or a D]]></description><link>https://practicalserverless.blog/when-messages-fail-how-dlqs-save-your-event-driven-system</link><guid isPermaLink="true">https://practicalserverless.blog/when-messages-fail-how-dlqs-save-your-event-driven-system</guid><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 18 Mar 2026 12:04:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/431a5284-e096-4a57-b755-5fcbf3a6ac7c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent interviews, I asked candidates a system-design question about managing failures in a serverless, event-driven architecture. I was surprised by how many didn't include retry mechanisms or a Dead Letter Queue (DLQ) for investigation. In serverless systems, where functions are stateless, and communication often depends on event-driven messaging, failures can be silent and difficult to trace, making proper error handling essential. This gap inspired this article, which explains what a DLQ is, why it is important, and how to use one effectively in your serverless and event-driven workflows.</p>
<h2>What is a DLQ?</h2>
<p>Before explaining the importance of it, let's make sure we are aligned on what a DLQ is.</p>
<p>A Dead Letter Queue, or simply DLQ, is a message queue that stores messages a consumer could not process successfully. When a message fails processing, regardless of the reason, instead of being lost or retried forever, it is redirected to the DLQ and stored there.</p>
<p>Imagine it as a holding area for problem messages. Instead of letting failures vanish or stop your system, the DLQ catches them. This allows engineers to check, fix, and handle them later without affecting the main process.</p>
<h2>Why use a DLQ?</h2>
<p>Now that we understand what a DLQ is, let's talk about why you should use one and why not having one is a red flag in any event-driven or message-based architecture.</p>
<h3>Prevent message loss</h3>
<p>Without a DLQ, a message that fails to be processed can simply disappear. Depending on your configuration, it might be discarded, leaving no trace of what went wrong. A DLQ ensures that no message is silently dropped. You can count on the fact that every failure is preserved and accounted for.</p>
<h3>Avoid infinite retry loops</h3>
<p>Retries are great, and we should absolutely have them. But retries alone are not enough. If a message is fundamentally broken, for instance because it has an invalid format or references data that no longer exists, retrying it indefinitely wastes resources, drives up costs, and can block other messages from being processed. A DLQ acts as the exit door for those unrecoverable failures.</p>
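<p>Conceptually, this is the redrive behavior queues like SQS give you via a policy (e.g., <code>maxReceiveCount</code>). A pure-code simulation of the flow, just to illustrate the mechanics:</p>
<pre><code class="language-typescript">// Simulates a queue's retry-then-dead-letter flow. Real queues do this
// for you; this sketch only exists to show the mechanics.
function processWithDlq(
  messages: any[],
  handle: (msg: any) =&gt; void,
  maxReceiveCount = 3,
) {
  const processed: any[] = []
  const deadLettered: any[] = []
  for (const msg of messages) {
    let attempts = 0
    while (true) {
      attempts += 1
      try {
        handle(msg)
        processed.push(msg)
        break
      } catch {
        if (attempts &gt;= maxReceiveCount) {
          deadLettered.push(msg) // the exit door for unrecoverable failures
          break
        }
      }
    }
  }
  return { processed, deadLettered }
}

const result = processWithDlq(
  [{ orderId: 'abc' }, { orderId: null }],
  function (msg) {
    if (msg.orderId === null) throw new Error('invalid payload')
  },
)
// result.processed has 1 message, result.deadLettered has 1
</code></pre>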
<h3>Improved observability and debugging</h3>
<p>When a message lands in a DLQ, it presents an opportunity: you can examine the payload, understand what caused the failure, and improve your system. Without a DLQ, that context is lost; with one, you gain a valuable feedback loop for your application's reliability.</p>
<p>A useful practice I've learned over the years is to use DLQ payloads when writing tests. A replayed payload pinpoints where the error occurred and serves as documentation for the fix.</p>
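<p>A sketch of that practice (the handler and payload below are hypothetical, not from a real incident): store the dead-lettered payload verbatim as a fixture and turn it into a permanent regression test for the fix.</p>

```python
# A payload captured from the DLQ, stored verbatim as a test fixture.
DEAD_LETTERED_PAYLOAD = {"order_id": None, "amount": "19.90"}

def handle_order(event: dict) -> str:
    """The fixed handler: rejects a missing order_id instead of crashing."""
    if event.get("order_id") is None:
        return "rejected: missing order_id"
    return f"processed {event['order_id']}"

def test_dead_lettered_payload_no_longer_crashes():
    # The exact message that once failed in production now guards against regressions.
    assert handle_order(DEAD_LETTERED_PAYLOAD) == "rejected: missing order_id"

test_dead_lettered_payload_no_longer_crashes()
```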
<h3>Operational safety net</h3>
<p>Systems fail. That is a fact.</p>
<p>Sooner or later, the network will become unreachable, the third-party service you integrate with will go down, or a bug will slip into your application and a previously valid payload will no longer be accepted.</p>
<p>A DLQ will provide architectural resilience and ensure that transient failures don't cause permanent data loss. Once the underlying issue is resolved, messages can be reprocessed from the DLQ as if nothing had happened.</p>
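<p>A minimal sketch of that reprocessing step (the function names are illustrative; the queue operations are injected so the same loop works against an SQS client in production and plain lists in a test. Amazon SQS also offers a managed DLQ redrive that does this for you):</p>

```python
def redrive(receive, send, delete, limit: int = 100) -> int:
    """Move up to `limit` messages from the DLQ back to the source queue.

    `receive`, `send`, and `delete` are injected callables, e.g. thin
    wrappers over an SQS client in production, or list operations in tests.
    """
    moved = 0
    for _ in range(limit):
        msg = receive()  # e.g. receive one message from the DLQ, or None if empty
        if msg is None:
            break
        send(msg)        # e.g. send it back to the source queue
        delete(msg)      # only delete from the DLQ after the send succeeded
        moved += 1
    return moved

# Usage with in-memory stand-ins for the two queues:
dlq, main = [{"id": 1}, {"id": 2}], []
moved = redrive(lambda: dlq[0] if dlq else None, main.append, dlq.remove)
```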
<h2>In short: build for failure, design for resilience</h2>
<p>Dead Letter Queues are a fundamental safety net for event-driven systems: they prevent silent failures, preserve the context needed for diagnosing issues, and allow teams to address problematic messages without disrupting normal processing. When paired with strong observability and clear operational playbooks, DLQs enhance the reliability and maintainability of event-driven systems.</p>
<p>Quick practical checklist:</p>
<ul>
<li><p>Define sensible retry limits and exponential backoff to ensure only truly problematic messages reach the DLQ.</p>
</li>
<li><p>Capture detailed metadata (timestamps, error reasons, processing context) with each dead-lettered message.</p>
</li>
<li><p>Monitor DLQ size and rate, setting alerts for spikes or stagnation.</p>
</li>
<li><p>Provide tools and processes for safe reprocessing, manual inspection, and automated remediation.</p>
</li>
<li><p>Treat DLQs as integral components in architecture reviews and tests.</p>
</li>
</ul>
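<p>For the metadata point in the checklist above, one possible envelope (the field names are illustrative) wraps the failed payload with everything a future debugging session will need:</p>

```python
import json
import time

def to_dead_letter(payload: dict, error: Exception, context: dict) -> str:
    """Wrap a failed payload with the metadata needed to debug it later."""
    return json.dumps({
        "payload": payload,                  # the original message, untouched
        "error_type": type(error).__name__,  # e.g. "ValueError"
        "error_message": str(error),
        "failed_at": time.time(),            # epoch seconds of the failure
        "context": context,                  # e.g. function name, request id, attempt count
    })
```

<p>The consumer sends this envelope to the DLQ instead of the bare payload, so inspection never requires correlating logs by hand.</p>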
<p>Adopting DLQs turns failures into actionable insights, keeping your system resilient and operable under real-world conditions.</p>
<p><em>Lucas Brogni is a Senior Software Engineer with 10+ years of experience building distributed systems.</em></p>
]]></content:encoded></item><item><title><![CDATA[Why I'm Writing This]]></title><description><![CDATA[I've been building with serverless since 2021.
Not just tinkering — using it as the primary architectural choice for production systems, advocating for it in hiring conversations, writing about it, gi]]></description><link>https://practicalserverless.blog/why-i-m-writing-this</link><guid isPermaLink="true">https://practicalserverless.blog/why-i-m-writing-this</guid><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Fri, 13 Mar 2026 09:48:20 GMT</pubDate><content:encoded><![CDATA[<p>I've been building with serverless since 2021.</p>
<p>Not just tinkering — using it as the primary architectural choice for production systems, advocating for it in hiring conversations, writing about it, giving talks about it, and making it the backbone of my graduate thesis on cloud-native architecture.</p>
<p>And yet, the question I get asked most often isn't about DynamoDB access patterns or cold start optimization. It's this:</p>
<p><strong>"How do I actually know when I've done it right?"</strong></p>
<p>That question has a longer answer than most people expect. That's what this blog is for.</p>
<hr />
<h2>The gap nobody warns you about</h2>
<p>There's a very seductive version of serverless that gets sold in conference talks and documentation pages. Deploy a function. It scales. You pay nothing when it's idle. Zero infrastructure to manage.</p>
<p>All of that is true. None of it prepares you for production.</p>
<p>The real learning curve in serverless isn't writing functions — it's understanding the <em>execution model</em> well enough to make good decisions under pressure. Why does your function behave differently under concurrent load? Why is that DynamoDB error only happening in production? Why did your SQS queue suddenly back up overnight with no error rate spike?</p>
<p>The answers to these questions all trace back to the same place: how Lambda actually works, and how the services around it actually behave. Not in theory. In practice.</p>
<hr />
<h2>What "practical" means here</h2>
<p>I'm not going to write tutorials that walk you through creating an S3 bucket. There are plenty of those. What I want to write — and what I wish had existed when I was figuring this out — is the thinking behind the decisions.</p>
<p>Why you should treat the handler as an entry point and nothing more.<br />Why idempotency isn't optional the moment you introduce asynchronous processing.<br />Why that IAM wildcard that "works fine" is a problem you haven't encountered yet.<br />Why your local environment is an approximation, and which differences will actually matter.</p>
<p>Each post here is going to take a concept that looks simple from the outside — and show you what it actually looks like from the inside of a running production system.</p>
<hr />
<h2>Where this comes from</h2>
<p>My day job is backend engineering on a growth team at a SaaS company. We run a serverless-first stack on AWS: Lambda, DynamoDB, SQS, EventBridge, API Gateway. I've shipped billing systems, built MCP-powered tooling, modernized test infrastructure, and handled zero-downtime schema migrations — all within this architecture.</p>
<p>I've also made most of the mistakes worth making. Misconfigured IAM roles that only failed at runtime. A trigger loop I caught in staging, barely. An SQS processor that quietly stopped processing because I hadn't understood partial batch failures. An observability gap that turned a 20-minute incident into a 3-hour one.</p>
<p>That's not a credentials flex. It's context. The patterns I write about here have been tested in the only environment that really matters.</p>
<hr />
<h2>What's coming</h2>
<p>I'll publish roughly twice a month. No rigid structure — just whatever's most worth writing about. Some posts will be conceptual, building the mental models that underpin everything else. Some will be deeply technical: specific patterns, concrete code, tradeoffs spelled out in full.</p>
<p>A few topics already in the pipeline:</p>
<ul>
<li><p><strong>The execution environment, actually explained</strong> — what init, invoke, and shutdown mean for the code you write every day</p>
</li>
<li><p><strong>Why your tests pass and production still breaks</strong> — the serverless testing gap and how to close it</p>
</li>
<li><p><strong>IAM for people who don't want to read the entire IAM docs</strong> — least privilege, per-function roles, and the wildcards that will eventually hurt you</p>
</li>
<li><p><strong>Idempotency from scratch</strong> — because "process it once" is harder than it sounds when Lambda will retry anything that fails</p>
</li>
</ul>
<p>If there's something specific you've been struggling with, I want to hear it. The goal of this blog is to be useful — not to document what I already know, but to address the questions you're actually asking.</p>
<hr />
<p>One more thing.</p>
<p>Serverless isn't perfect. It's not always the right choice. I'll say so when it isn't. The best thing I can offer here isn't enthusiasm — it's honesty about where the edges are and what happens when you hit them.</p>
<p>Let's get into it.</p>
<hr />
<p><em>Lucas Brogni is a Senior Software Engineer with 10+ years of experience building distributed systems.</em></p>
]]></content:encoded></item></channel></rss>