<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Practical Serverless]]></title><description><![CDATA[**Practical Serverless** is a blog about building real-world serverless systems.

Here you'll find practical insights on designing, building, and operating serverless architectures in production. I write about event-driven systems, cloud-native patterns, scalability, reliability, and the trade-offs that come with distributed systems.

Expect deep dives, architecture breakdowns, lessons learned from real implementations, and pragmatic guidance for engineers building serverless platforms.

If you're interested in serverless, distributed systems, and modern cloud architecture, you're in the right place.
]]></description><link>https://practicalserverless.blog</link><image><url>https://cdn.hashnode.com/uploads/logos/69b3cd56c9e75ce33d841724/a1fe57e2-6356-45b2-823d-a3612b2098ff.png</url><title>Practical Serverless</title><link>https://practicalserverless.blog</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 15 May 2026 22:41:57 GMT</lastBuildDate><atom:link href="https://practicalserverless.blog/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How I Test Serverless Applications Without Going Insane]]></title><description><![CDATA[Testing serverless isn't hard. Testing serverless well is a different story.
After years of building Lambda-based systems, I've seen the full spectrum: no tests at all, tests that mock everything and ]]></description><link>https://practicalserverless.blog/how-i-test-serverless-applications-without-going-insane</link><guid isPermaLink="true">https://practicalserverless.blog/how-i-test-serverless-applications-without-going-insane</guid><category><![CDATA[serverless]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 06 May 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/2475e95a-cd10-459b-b874-e7becbc8c2a3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Testing serverless isn't hard. Testing serverless <em>well</em> is a different story.</p>
<p>After years of building Lambda-based systems, I've seen the full spectrum: no tests at all, tests that mock everything and catch nothing, and test suites so slow they're never run in CI. I've made all those mistakes myself.</p>
<p>This post discusses the approach I find most effective. It's not about comparing frameworks or libraries; rather, it's the setup I consider best for incorporating tests into a serverless application.</p>
<hr />
<h2>The Problem With "Just Test Your Functions"</h2>
<p>The most common advice you'll hear is "Lambda functions are just functions — they're easy to test." And that's technically true. You can, and honestly should, add unit tests for each lambda in isolation.</p>
<p>The problem is that most of the interesting bugs in serverless systems don't live inside the function logic. They live at the <em>edges</em>:</p>
<ul>
<li><p>The IAM permission that works locally but not in prod</p>
</li>
<li><p>The SQS message that's shaped differently than you expected</p>
</li>
<li><p>The DynamoDB expression that silently returns nothing instead of erroring</p>
</li>
<li><p>The S3 event notification that fires but your handler throws before processing it</p>
</li>
<li><p>The Lambda stuck in a loop because it writes its output to the same S3 bucket that triggers it</p>
</li>
</ul>
<p>If your entire test strategy is mocking <code>aws-sdk</code> and asserting on return values, you'll miss most of these bugs.</p>
<hr />
<h2>The Three-Layer Model I Actually Use</h2>
<p>I think about serverless testing in three layers, and I'm deliberate about what each one covers.</p>
<h3>Layer 1: Unit Tests — Test the Logic, Not the Infrastructure</h3>
<p>Unit tests should be fast, isolated, and abundant. But "isolated" doesn't mean "mock everything."</p>
<p>Here's the distinction I make: I don't mock my dependencies. I replace them with <strong>Fakes</strong>.</p>
<p>A mock is a hollow stub — you tell it what to return, it returns exactly that, and it never pushes back. A Fake is a real, working implementation of the same interface, just without the infrastructure. It has actual logic: it stores items in memory, enforces constraints, and throws an error on invalid input. It behaves like the real thing, just without DynamoDB behind it.</p>
<pre><code class="language-typescript">// The interface both the real implementation and the Fake satisfy
interface UserRepository {
  save(user: User): Promise&lt;void&gt;;
  findById(id: string): Promise&lt;User | null&gt;;
}

// The Fake — real behavior, no infrastructure
class InMemoryUserRepository implements UserRepository {
  private store = new Map&lt;string, User&gt;();

  async save(user: User): Promise&lt;void&gt; {
    if (!user.id) throw new Error('User must have an id');
    this.store.set(user.id, user);
  }

  async findById(id: string): Promise&lt;User | null&gt; {
    return this.store.get(id) ?? null;
  }
}
</code></pre>
<p>Now your unit tests use <code>InMemoryUserRepository</code> and they actually mean something. If your handler calls <code>findById</code> with the wrong ID, the Fake returns <code>null</code>, just as a DynamoDB-backed repository would when no item matches. You're testing real behavior, not scripted responses.</p>
<pre><code class="language-typescript">describe('processOrder', () =&gt; {
  it('should reject orders with no items', async () =&gt; {
    const repo = new InMemoryUserRepository();
    const result = await processOrder({ items: [] }, repo);

    expect(result.success).toBe(false);
    expect(result.error).toBe('ORDER_EMPTY');
  });

  it('should not persist an order that fails validation', async () =&gt; {
    const repo = new InMemoryUserRepository();
    await processOrder({ items: [] }, repo);

    // The Fake lets us assert on actual state, not on whether a method was called
    expect(await repo.findById('order-123')).toBeNull();
  });
});
</code></pre>
<p>Fast. Deterministic. And honest about what your code actually does.</p>
<p>The payoff of Fakes comes later: you can run <strong>contract tests</strong> that exercise both the Fake and the real implementation against the same suite. If <code>DynamoUserRepository</code> and <code>InMemoryUserRepository</code> both pass the same contract tests, you know the Fake is trustworthy — and your unit tests are as meaningful as your integration tests.</p>
<pre><code class="language-typescript">// The contract — runs against both implementations
function userRepositoryContract(makeRepo: () =&gt; UserRepository) {
  it('should persist and retrieve a user', async () =&gt; {
    const repo = makeRepo();
    await repo.save({ id: 'u1', email: 'test@example.com' });
    expect(await repo.findById('u1')).toMatchObject({ id: 'u1' });
  });

  it('should return null for unknown IDs', async () =&gt; {
    const repo = makeRepo();
    expect(await repo.findById('nonexistent')).toBeNull();
  });
}

// Run the contract against both
describe('InMemoryUserRepository', () =&gt; userRepositoryContract(() =&gt; new InMemoryUserRepository()));
describe('DynamoUserRepository', () =&gt; userRepositoryContract(() =&gt; new DynamoUserRepository(client, 'users-test')));
</code></pre>
<p>This pattern does require more upfront design — your dependencies need to be behind interfaces, and you need to maintain the Fakes. That's not free. But it pays back in test confidence and, honestly, in better code architecture. If something is hard to Fake, it's usually a sign that the dependency boundary is wrong.</p>
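<p>One way to keep dependencies behind interfaces is a small handler factory that receives the repository as an argument. The sketch below is illustrative only: <code>makeCreateUserHandler</code>, <code>HandlerResult</code>, and the simplified event shape are hypothetical names, not part of any library.</p>
<pre><code class="language-typescript">// Hypothetical sketch: makeCreateUserHandler and HandlerResult are illustrative names.
interface User { id: string; email: string; }

interface UserRepository {
  save(user: User): Promise&lt;void&gt;;
  findById(id: string): Promise&lt;User | null&gt;;
}

class InMemoryUserRepository implements UserRepository {
  private store = new Map&lt;string, User&gt;();
  async save(user: User): Promise&lt;void&gt; {
    if (!user.id) throw new Error('User must have an id');
    this.store.set(user.id, user);
  }
  async findById(id: string): Promise&lt;User | null&gt; {
    return this.store.get(id) ?? null;
  }
}

interface HandlerResult { statusCode: number; body: string; }

// The factory takes the dependency as an argument, so nothing inside the
// handler knows whether it is talking to DynamoDB or to the Fake.
function makeCreateUserHandler(repo: UserRepository) {
  return async (event: { body: string | null }): Promise&lt;HandlerResult&gt; =&gt; {
    let input: Partial&lt;User&gt;;
    try {
      input = JSON.parse(event.body ?? '{}') ?? {};
    } catch {
      return { statusCode: 400, body: 'INVALID_JSON' };
    }
    if (!input.id || !input.email) {
      return { statusCode: 400, body: 'INVALID_USER' };
    }
    await repo.save({ id: input.id, email: input.email });
    return { statusCode: 201, body: JSON.stringify({ id: input.id }) };
  };
}

// Tests wire in the Fake; production wiring would pass the DynamoDB-backed repository.
const fakeRepo = new InMemoryUserRepository();
const createUser = makeCreateUserHandler(fakeRepo);
</code></pre>
<p>Because the handler only sees the interface, a test can call <code>createUser</code> and then assert on what <code>fakeRepo.findById</code> returns, with no AWS client involved.</p>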
<p><strong>What I cover at this layer:</strong></p>
<ul>
<li><p>Business logic and domain rules</p>
</li>
<li><p>Input validation and error handling</p>
</li>
<li><p>Event parsing and transformation</p>
</li>
<li><p>Edge cases and branching paths</p>
</li>
</ul>
<p><strong>What I don't bother covering here:</strong></p>
<ul>
<li><p>Whether the DynamoDB call actually works</p>
</li>
<li><p>Whether my IAM role has the right permissions</p>
</li>
<li><p>Whether the Lambda timeout is long enough</p>
</li>
</ul>
<h3>Layer 2: Integration Tests — Test the Real AWS Interactions</h3>
<p>This is where most projects fall apart. Developers either skip integration tests entirely (dangerous) or try to run them against real AWS (slow and expensive).</p>
<p>My approach: run integration tests against a local AWS emulator during development and CI, and run a smaller set against a real staging environment before merging to main.</p>
<p>For local emulation, I've been working on tooling around this exact problem — more on that in a future post. For now, the important thing is the principle: your integration tests should exercise <em>actual</em> AWS API calls, not mocked ones.</p>
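<p>Whatever emulator you pick, the same test code can usually target it or real AWS by switching the client endpoint. A minimal sketch, assuming a hypothetical <code>TEST_AWS_ENDPOINT</code> environment variable (a made-up name for this example) and a config object shaped like the options AWS SDK v3 clients accept:</p>
<pre><code class="language-typescript">// Hypothetical sketch: TEST_AWS_ENDPOINT is an assumed variable name for this example.
interface ClientConfig {
  region: string;
  endpoint?: string;
  credentials?: { accessKeyId: string; secretAccessKey: string };
}

function buildClientConfig(env: Record&lt;string, string | undefined&gt;): ClientConfig {
  const region = env.AWS_REGION ?? 'us-east-1';
  const endpoint = env.TEST_AWS_ENDPOINT;
  if (!endpoint) {
    // No override: the SDK resolves the real AWS endpoint and credentials itself.
    return { region };
  }
  // Local emulators typically accept any credentials, so dummy values are used here.
  return {
    region,
    endpoint,
    credentials: { accessKeyId: 'test', secretAccessKey: 'test' },
  };
}

// With AWS SDK v3, an object like this could be passed to a client constructor
// such as `new DynamoDBClient(config)`.
const localConfig = buildClientConfig({ TEST_AWS_ENDPOINT: 'http://localhost:8000' });
const stagingConfig = buildClientConfig({ AWS_REGION: 'eu-west-1' });
</code></pre>
<p>Keeping this decision in one function means the tests themselves never mention where they run.</p>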
<p><strong>What I cover at this layer:</strong></p>
<ul>
<li><p>DynamoDB reads and writes (real expressions, real shapes)</p>
</li>
<li><p>SQS send and receive (including batch behavior)</p>
</li>
<li><p>S3 put/get/delete</p>
</li>
<li><p>EventBridge publish and filtering rules</p>
</li>
<li><p>Any service-to-service interaction within your system</p>
</li>
</ul>
<pre><code class="language-typescript">// Example: Real DynamoDB interaction, real data shape
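describe('DynamoUserRepository', () =&gt; {
  it('should persist and retrieve a user by ID', async () =&gt; {
    // UserRepository is an interface, so the test instantiates the DynamoDB-backed implementation
    const repo = new DynamoUserRepository(dynamoClient, 'users-test');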
    const user = { id: 'user-123', email: 'test@example.com', createdAt: Date.now() };

    await repo.save(user);
    const retrieved = await repo.findById('user-123');

    expect(retrieved).toMatchObject(user);
  });
});
</code></pre>
<p>Yes, this test talks to a database. That's the point.</p>
<h3>Layer 3: E2E Tests — Test the System, Not the Pieces</h3>
<p>End-to-end tests in serverless are expensive to run and slow to write. I keep this layer thin on purpose.</p>
<p>My rule: <strong>one E2E test per critical user journey, not per endpoint.</strong></p>
<p>For a typical API, that might mean:</p>
<ul>
<li><p>A user can sign up and receive a confirmation email</p>
</li>
<li><p>A payment can be initiated and the webhook handled correctly</p>
</li>
<li><p>A file upload triggers processing and the result is available</p>
</li>
</ul>
<p>These tests deploy real infrastructure (or use a dedicated staging environment) and exercise the full stack. They're not run on every commit — they run on every PR merge to staging, and before every production deploy.</p>
<p><strong>What I cover here:</strong></p>
<ul>
<li><p>Critical paths that involve multiple Lambda functions</p>
</li>
<li><p>Event-driven flows (trigger → process → result)</p>
</li>
<li><p>Anything where the cost of failure is high</p>
</li>
</ul>
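<p>Event-driven flows finish asynchronously, so an E2E test usually has to poll for the result instead of asserting immediately. A minimal sketch of such a helper; <code>waitFor</code> and <code>fetchResult</code> are hypothetical names, and the in-memory counter stands in for a real lookup against S3 or DynamoDB:</p>
<pre><code class="language-typescript">// Hypothetical helper: polls `check` until it returns a non-null value or the timeout expires.
async function waitFor&lt;T&gt;(
  check: () =&gt; Promise&lt;T | null&gt;,
  { timeoutMs = 30000, intervalMs = 500 } = {},
): Promise&lt;T&gt; {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await check();
    if (value !== null) return value;
    if (Date.now() &gt;= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) =&gt; setTimeout(resolve, intervalMs));
  }
}

// Stand-in for "fetch the processed result": succeeds on the third attempt.
let attempts = 0;
const fetchResult = async (): Promise&lt;string | null&gt; =&gt; {
  attempts += 1;
  return attempts &lt; 3 ? null : 'processed';
};

const pending = waitFor(fetchResult, { timeoutMs: 2000, intervalMs: 10 });
</code></pre>
<p>An explicit timeout matters here: without one, a broken pipeline turns into a hung test run instead of a clear failure.</p>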
<hr />
<h2>The Testing Stack</h2>
<p>Here's what I actually use, without the fluff:</p>
<ul>
<li><p><strong>Jest</strong> — test runner, all layers</p>
</li>
<li><p><code>@sls-testing/jest</code> — utilities I built specifically for Lambda testing (event factories, context mocks, assertion helpers)</p>
</li>
<li><p><strong>Local emulator</strong> — for integration tests in CI (more on this soon)</p>
</li>
<li><p><strong>Real AWS staging</strong> — for E2E tests, isolated account or dedicated stage</p>
</li>
</ul>
<p>I'm not going to tell you that you need every tool in this list. Start with Jest and clean unit tests. Add integration tests when you start seeing production bugs that your unit tests didn't catch. Add E2E tests when your system is complex enough to have meaningful user journeys.</p>
<hr />
<h2>Common Mistakes I See (And Made)</h2>
<p><strong>1. Mocking the entire AWS SDK</strong></p>
<p>You end up testing that you called <code>putItem</code> with certain parameters, which says nothing about whether DynamoDB would accept them or store what you expect. Test the outcome, not the invocation.</p>
<p><strong>2. Testing the framework, not your code</strong></p>
<p>If half your tests are verifying that API Gateway parses the request correctly, you're wasting time. API Gateway already has tests. Yours should cover what happens <em>after</em> the parsing.</p>
<p><strong>3. Coupling tests to infrastructure details</strong></p>
<p>If a test breaks because you renamed a DynamoDB table or changed an environment variable, that's a configuration problem, not a testing problem. Keep infrastructure config out of your test logic.</p>
<p><strong>4. No tests on the async paths</strong></p>
<p>Synchronous request/response flows are easy to test. Async flows — SQS consumers, EventBridge subscribers, Step Functions — are where people give up. Don't. These paths often carry your most critical business logic.</p>
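<p>Async handlers are still plain functions that take an event and return a value. A minimal sketch of an SQS consumer that reports partial batch failures, assuming the event source mapping has <code>ReportBatchItemFailures</code> enabled; the local types mirror only the fields used here, and <code>handleOrder</code> is a hypothetical domain step:</p>
<pre><code class="language-typescript">// Minimal local types mirroring the parts of the SQS event shape used here.
interface SqsRecord { messageId: string; body: string; }
interface SqsEvent { Records: SqsRecord[]; }
interface SqsBatchResponse { batchItemFailures: { itemIdentifier: string }[]; }

// Hypothetical domain step: throws on payloads it cannot process.
async function handleOrder(payload: { orderId?: string }): Promise&lt;void&gt; {
  if (!payload.orderId) throw new Error('ORDER_ID_MISSING');
}

async function consumer(event: SqsEvent): Promise&lt;SqsBatchResponse&gt; {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await handleOrder(JSON.parse(record.body));
    } catch {
      // Only the failed message is reported, so successful ones are not redelivered.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}

// One valid message, one with malformed JSON, one missing the order ID.
const outcome = consumer({
  Records: [
    { messageId: 'm1', body: JSON.stringify({ orderId: 'abc-123' }) },
    { messageId: 'm2', body: 'not json' },
    { messageId: 'm3', body: JSON.stringify({}) },
  ],
});
</code></pre>
<p>A test can then assert on exactly which message IDs come back as failures, which is the behavior that decides what SQS redelivers.</p>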
<hr />
<h2>The Bottom Line</h2>
<p>My testing strategy in one sentence: <strong>unit test the logic, integration test the AWS interactions, E2E test the user journeys — and be ruthless about keeping each layer small and honest.</strong></p>
<p>You don't need 100% coverage. You need the right coverage in the right places.</p>
<p>If this resonated, I'm working on a full ebook on the topic — <em>Testing Serverless Applications (TDD in the Real World)</em> — that goes much deeper on each layer, including contract testing, load testing, and infrastructure testing with CDK. More on that soon.</p>
<p>In the meantime, if you have questions or want to share how you're approaching this, reply to this post.</p>
<hr />
<p><em>Enjoyed this? The</em> <a href="https://practicalserverless.blog"><em>Practical Serverless newsletter</em></a> <em>covers this kind of stuff every week — no filler, just real serverless patterns from the field.</em></p>
<p>AI Disclaimer: AI has been utilized to refine the text. The experiences and content are my own.</p>
]]></content:encoded></item><item><title><![CDATA[Your Lambda Was Correct. It Was Also a Disaster.]]></title><description><![CDATA[If you don't know yet, I have a book published about serverless, and this weekend, I got a message from a reader.
He'd been looking at From Zero to Production with AWS Lambda and had a sharp observati]]></description><link>https://practicalserverless.blog/your-lambda-was-correct-it-was-also-a-disaster</link><guid isPermaLink="true">https://practicalserverless.blog/your-lambda-was-correct-it-was-also-a-disaster</guid><category><![CDATA[serverless]]></category><category><![CDATA[ebook]]></category><category><![CDATA[lambda]]></category><category><![CDATA[S3]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 22 Apr 2026 14:30:00 GMT</pubDate><content:encoded><![CDATA[<p>If you don't know yet, I have a book published about serverless, and this weekend, I got a message from a reader.</p>
<p>He'd been looking at <a href="https://lambdainproduction.com">From Zero to Production with AWS Lambda</a> and had a sharp observation: the value wasn't landing fast enough for someone seeing it for the first time. A lot of serverless content already claims to go "beyond Hello World." What made this one actually different?</p>
<p>Fair point. And an honest one.</p>
<p>I gave him an honest answer back: the goal was never "here's how to deploy a function." It was "here's why you'd structure it this way when you have cold starts to worry about, downstream dependencies that can fail, and a team that needs to maintain this six months from now." I wrote the thing I wished existed when I was moving from toy projects to systems that had to survive production traffic, on-call rotations, and Monday morning debugging sessions.</p>
<p>He pushed back — in the best way. He said that the intention is clear when you explain it. But it needs to be <em>visible</em> without explanation. From the first few pages. He wanted to feel the pressure before I offered the structure.</p>
<p>Then he asked: if you had to pick one production issue that best represents the shift from toy to real system, which one would you bring forward first?</p>
<p>I didn't have to think long.</p>
<hr />
<h2>The S3 Loop</h2>
<p>You have a Lambda function triggered by S3 uploads. A user uploads a CSV, your function processes it, and saves the result back to the same bucket as a JSON file.</p>
<p>It's a common, reasonable pattern. The code is clean. The function works. The test coverage? 100%.</p>
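<pre><code class="language-typescript">import { S3 } from 'aws-sdk'; // AWS SDK v2, matching the .promise() calls below
import type { S3Event } from 'aws-lambda';

const s3 = new S3();

// transform() is the CSV-to-JSON step, assumed to be defined elsewhere
export const handler = async (event: S3Event): Promise&lt;void&gt; =&gt; {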
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = record.s3.object.key;

    const csv = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const json = transform(csv.Body.toString());

    await s3.putObject({
      Bucket: bucket,
      Key: key.replace('.csv', '.json'),
      Body: JSON.stringify(json),
    }).promise();
  }
};
</code></pre>
<p>You deploy it. You run a manual test by uploading a file. It processes the CSV, saves the JSON, and exits cleanly. You move on.</p>
<p>Then you check your logs on Monday morning.</p>
<p>Thousands of invocations. Concurrency at the limit. A bill with a line item you weren't expecting. And the function is still running.</p>
<p>Here's what happened: the function processed the CSV and saved <code>result.json</code> to the bucket. That <code>PUT</code> triggered the Lambda again, because the trigger was watching the entire bucket, not just <code>.csv</code> files. Which ran the function on the JSON. The key had no <code>.csv</code> left to replace, so the function wrote straight back to the same <code>.json</code> key. Which triggered it again.</p>
<p>The S3 bucket doesn't know or care that the file your Lambda just wrote was the <em>result</em> of processing, not a new file to process. It sees a new object. It fires an event. Lambda picks it up.</p>
<p>No bug. No typo. No oversight in the logic. The function did exactly what it was told, in exactly the environment it was deployed to — and the environment bit back.</p>
<hr />
<h2>Why This Is the Right Story to Tell First</h2>
<p>This is almost impossible to catch locally. You're not running the full event pipeline in your dev environment. You're invoking the function directly with a crafted event. Everything looks fine.</p>
<p>The feedback loop only exists in production.</p>
<p>And that's the point. This isn't an edge case. It's not a gotcha for beginners. It's the kind of thing that happens to people who know what they're doing, because the mistake isn't in the code — it's in the mental model.</p>
<p>When you're learning Lambda, you think about functions in isolation. Input goes in, output comes out, done. But in production, your function doesn't exist in isolation. It exists inside an event-driven system that reacts to state changes. And state changes can come from <em>anywhere</em> — including from your own function.</p>
<hr />
<h2>The Fix Is Simple. The Lesson Isn't.</h2>
<p>For this specific case, the fix takes thirty seconds: use separate buckets for input and output, or filter the trigger by suffix so it only fires on <code>.csv</code> files.</p>
<pre><code class="language-yaml"># serverless.yml
functions:
  processUpload:
    handler: src/handler.handler
    events:
      - s3:
          bucket: uploads-bucket
          event: s3:ObjectCreated:*
          rules:
            - suffix: .csv
</code></pre>
<p>Done. Loop broken.</p>
<p>But if you stop at "use a suffix filter," you've fixed the incident without absorbing the lesson.</p>
<p>The lesson is this: <strong>your code being correct isn't enough. It also needs to behave well in the environment it actually runs in.</strong></p>
<p>Every trigger you configure, every service you write to, every queue you publish to — these are all potential feedback paths. If you don't think about them deliberately, the system will eventually find one you didn't mean to create.</p>
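<p>A trigger filter can also be backed by a guard inside the handler, so a later configuration change doesn't silently reopen the loop. A minimal sketch; <code>csvKeyOrNull</code> and <code>outputKeyFor</code> are hypothetical helpers, not part of any SDK:</p>
<pre><code class="language-typescript">// Hypothetical guard: returns the decoded key if the object should be processed, otherwise null.
function csvKeyOrNull(rawKey: string): string | null {
  // S3 event notifications URL-encode object keys and use "+" for spaces.
  const key = decodeURIComponent(rawKey.replace(/\+/g, ' '));
  return key.endsWith('.csv') ? key : null;
}

// Replace only a trailing ".csv". A plain string replace('.csv', '.json') would
// instead hit the first match anywhere in the key.
function outputKeyFor(csvKey: string): string {
  return csvKey.replace(/\.csv$/, '.json');
}

const accepted = csvKeyOrNull('reports/q1+summary.csv');
const skipped = csvKeyOrNull('reports/q1+summary.json');
</code></pre>
<p>In the handler, a <code>null</code> result would mean "log and return" before any read or write happens, so the function's own output can never count as new input.</p>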
<hr />
<h2>The Questions I Now Ask Before Every Deploy</h2>
<p>After enough of these moments, I've trained myself to slow down before shipping any new Lambda and ask four things:</p>
<p><strong>What happens if this function fires twice on the same input?</strong> Idempotency isn't optional in event-driven systems. Retries happen. Duplicates happen. At-least-once delivery is the default, not the exception. If your function isn't safe to run twice, it will eventually run twice.</p>
<p><strong>What does this function write to, and could that trigger something?</strong> Trace every write: S3 puts, DynamoDB streams, SQS sends, EventBridge publishes. For each one — what's listening? Could that listener, directly or indirectly, trigger this function again?</p>
<p><strong>What happens at scale?</strong> A function that behaves fine at 10 concurrent invocations can behave very differently at 500. Downstream services start throttling. Timeouts compound. Retries amplify the load rather than absorbing it. Think through the failure modes before traffic does it for you.</p>
<p><strong>What would I be looking at on a Monday morning incident?</strong> If this goes wrong at 2am, what does it look like in CloudWatch? How do I know something is wrong before my bill tells me? Logs and alarms aren't an afterthought — they're part of the feature.</p>
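<p>The first question, running twice on the same input, is the easiest one to turn into code. A minimal sketch of an idempotency guard, using a hypothetical in-memory <code>ProcessedEvents</code> class as a stand-in for a store with atomic "claim if absent" semantics (in DynamoDB this could be a conditional put that fails when the key already exists):</p>
<pre><code class="language-typescript">// Hypothetical in-memory stand-in for a store with atomic "claim if absent" semantics.
class ProcessedEvents {
  private seen = new Set&lt;string&gt;();

  // Returns true only for the first caller that claims a given key.
  claim(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

let sideEffects = 0;

async function processOnce(
  store: ProcessedEvents,
  eventId: string,
): Promise&lt;'processed' | 'duplicate'&gt; {
  if (!store.claim(eventId)) return 'duplicate';
  sideEffects += 1; // the real work (charge a card, write a file) would go here
  return 'processed';
}

// The same event delivered twice: only the first delivery does the work.
const eventStore = new ProcessedEvents();
const runs = Promise.all([processOnce(eventStore, 'evt-1'), processOnce(eventStore, 'evt-1')]);
</code></pre>
<p>Claiming before doing the work is a design choice with a cost: if the function crashes after the claim, the event is marked as handled without being processed. Recording completion separately, or expiring stale claims, are common ways to address that.</p>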
<hr />
<h2>Correctness Is Table Stakes</h2>
<p>The bar for production isn't "does it work." It's "does it behave well when the environment doesn't cooperate."</p>
<p>That's what I tried to answer for my reader. It's also what I try to establish early in everything I write and build. Not "here's how Lambda works." But: here's what happens on a Monday morning when something goes wrong — and here's how you build systems that survive it.</p>
<p>Fix the mental model, and you stop making a whole class of mistakes at once.</p>
<hr />
<p><em>This is the kind of thing I go deep on in</em> <a href="https://lambdainproduction.com"><em>From Zero to Production with AWS Lambda</em></a> <em>— no fluff, written for developers who are done with toy examples. And if you want more of this every week, you're already in the right place.</em></p>
<p>Disclosure: This post was edited with AI assistance for clarity and flow—my experiences, analysis, and conclusions remain my own.</p>
]]></content:encoded></item><item><title><![CDATA[Lambda isn't always the answer]]></title><description><![CDATA[Yes, this is a serverless blog and obviously the main objective here is to spread the serverless word, and make people feel more comfortable to use it.
But still, we are talking about technology, and ]]></description><link>https://practicalserverless.blog/lambda-isn-t-always-the-answer</link><guid isPermaLink="true">https://practicalserverless.blog/lambda-isn-t-always-the-answer</guid><category><![CDATA[lambda]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 15 Apr 2026 12:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/2c577832-4fbe-426e-a8ac-faa797ab35e5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yes, this is a serverless blog and obviously the main objective here is to spread the <em>serverless</em> word, and make people feel more comfortable using it.</p>
<p>But we are still talking about technology, and more precisely about software architecture. An important part of that is knowing the trade-offs so you can choose the right tool for your problem.</p>
<p>Serverless is great, but it doesn't solve every problem.</p>
<h2>The trade-offs</h2>
<p>Let's be real about a few things.</p>
<p>Cold starts hurt in specific situations. If you're building a public-facing API where it's acceptable for a small percentage of requests to hit an uninitialized execution environment, you'll probably never notice. But if you're building a latency-sensitive application, cold starts are a real cost. There are workarounds, such as Provisioned Concurrency, which keeps a set number of environments initialized and ready, but it adds cost.</p>
<p>Lambda also has hard limits. For example, a function can't run for more than 15 minutes, and the maximum memory is 10 GB.</p>
<p>Observability gets more complicated. With serverless, you're dealing with distributed, short-lived functions that can scale to hundreds of concurrent executions in seconds. Debugging that without proper instrumentation (structured logs, distributed tracing, alerting) is a bad time.</p>
<p>And obviously, there's lock-in. Using managed AWS services is genuinely one of the best parts of going serverless, and also the thing that makes migration expensive if you ever need it. You're making that trade whether you're aware of it or not.</p>
<p>Know these going in. Build with them in mind. Serverless is still worth it — just not blindly.</p>
<h2>When Lambda is the wrong choice</h2>
<p>Talking about trade-offs is useful, but let's make it concrete. Here are real scenarios where you should think twice before reaching for Lambda.</p>
<p><strong>You need long-running, stateful processes.</strong> Video encoding, large data migrations, ML model training — anything that may hit the 15-minute limit or needs to maintain state across steps is a bad fit. You'll end up doing gymnastics with Step Functions to work around Lambda's limits, and at some point, you have to ask: Is this still simpler than just running a container?</p>
<p><strong>Your workload is constant and predictable.</strong> Serverless shines when traffic is spiky or unpredictable. If you have a service that runs at steady, high volume 24/7, the pay-per-invocation model stops being cheap. At scale, a reserved EC2 instance or a Fargate service is often more cost-effective. Run the numbers before you commit.</p>
<p><strong>Latency is non-negotiable.</strong> Cold starts are manageable in most cases, but if you're building something like a high-frequency trading system or a real-time gaming backend where every millisecond matters, serverless is not your friend. Provisioned Concurrency helps, but it also chips away at the cost model that made serverless attractive in the first place.</p>
<p><strong>Your team is already struggling with observability.</strong> Lambda doesn't create observability problems, but it amplifies existing ones. If your team doesn't have solid practices around logging, tracing, and alerting, going serverless will make debugging feel like chasing ghosts. Get the foundation right first.</p>
<p><strong>You're lifting and shifting a monolith.</strong> Taking an existing monolithic application and deploying it as a single Lambda function is almost never a good idea. You get the complexity of serverless without any of the benefits. If you're migrating, the real work is in decomposing the application, and simply changing where it runs is probably not worth the effort.</p>
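<p>"Run the numbers" for a steady workload can start as a few lines of arithmetic. The sketch below models Lambda's two billing dimensions (requests and GB-seconds) against a flat always-on cost. The function names are hypothetical, and every price is a placeholder chosen for easy arithmetic, not a real AWS rate, so substitute current pricing for your region.</p>
<pre><code class="language-typescript">// All prices are inputs on purpose: look up current rates before trusting any result.
interface LambdaPricing { perRequest: number; perGbSecond: number; }

interface Workload {
  requestsPerMonth: number;
  avgDurationSeconds: number;
  memoryGb: number;
}

// Monthly Lambda cost = request charge + compute charge (GB-seconds), ignoring any free tier.
function lambdaMonthlyCost(w: Workload, p: LambdaPricing): number {
  const gbSeconds = w.requestsPerMonth * w.avgDurationSeconds * w.memoryGb;
  return w.requestsPerMonth * p.perRequest + gbSeconds * p.perGbSecond;
}

// An always-on alternative is modeled as a flat hourly rate times hours in the month.
function alwaysOnMonthlyCost(hourlyRate: number, hoursPerMonth = 730): number {
  return hourlyRate * hoursPerMonth;
}

// Placeholder numbers chosen for easy arithmetic, NOT real AWS prices.
const lambdaCost = lambdaMonthlyCost(
  { requestsPerMonth: 1000000, avgDurationSeconds: 1, memoryGb: 1 },
  { perRequest: 0.0000002, perGbSecond: 0.00001 },
);
const alwaysOnCost = alwaysOnMonthlyCost(0.01);
</code></pre>
<p>The shape of the comparison is what matters: Lambda cost grows linearly with traffic, while the always-on figure stays flat until you need another instance or task.</p>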
<h2>Serverless is more than Lambda</h2>
<p>If you've been following the serverless conversation online — or even if you've read my book, <em>From Zero to Production with AWS Lambda</em> — you'd be forgiven for thinking serverless is mostly about Lambda. Lambda gets the blog posts, the talks, the hot takes. But Lambda is just one piece of a much bigger picture.</p>
<p>Serverless is a model, not a service. It's about offloading infrastructure management so you can focus on your business logic. Lambda does that for compute, but the same principle applies across your entire stack.</p>
<p>DynamoDB is serverless. You don't manage servers, and in on-demand mode you don't provision capacity upfront; you just store data and pay for what you use. The same applies to API Gateway, S3, SQS, SNS, and EventBridge: they're all serverless. The list goes on.</p>
<p>Other serverless services can complement Lambda, and some can replace it. For instance, if you have a long-running process or predictable high-volume traffic, Fargate is a great choice. You still avoid managing servers, you can scale automatically, and you pay for the vCPU and memory your tasks use while they run. It's serverless, just not Lambda.</p>
<p>When these pieces work together, you unlock the real power. A file lands in S3, triggers a Lambda function, which publishes an event to EventBridge, which fans out to multiple SQS queues, each processed by a different Lambda. A heavy processing job gets handed off to Fargate. No servers. No clusters. No ops team babysitting instances at 3 am.</p>
<p>That's the architecture worth understanding. Lambda is the most visible part of it, but building serverless applications means thinking about the whole ecosystem — and knowing which managed service fits each job, Lambda or otherwise.</p>
]]></content:encoded></item><item><title><![CDATA[How I Built a Serverless Testing Library That Cuts Test Setup by 90%]]></title><description><![CDATA[Every Lambda test starts the same way: you need an event object — and crafting one is annoying. API Gateway v2 events have 30+ fields, SQS needs message IDs, receipt handles, and ARNs, and DynamoDB St]]></description><link>https://practicalserverless.blog/how-i-built-a-serverless-testing-library-that-cuts-test-setup-by-90</link><guid isPermaLink="true">https://practicalserverless.blog/how-i-built-a-serverless-testing-library-that-cuts-test-setup-by-90</guid><category><![CDATA[serverless, testing, lambda]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Testing]]></category><category><![CDATA[lambda]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 08 Apr 2026 09:19:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/c7044fc0-41a6-4c59-932b-9eb80d38d253.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every Lambda test starts the same way: you need an event object — and crafting one is annoying. API Gateway v2 events have 30+ fields, SQS needs message IDs, receipt handles, and ARNs, and DynamoDB Streams expect marshaled AttributeValue maps. The usual options are copy‑pasting a 60‑line JSON fixture or spending 20 minutes hand‑crafting one from memory.</p>
<p>I built <code>@sls-testing</code> to stop that. It provides typed, composable one‑line builders that give sensible defaults, automatic marshaling, and easy overrides so your tests only express what matters.</p>
<p>The payoff: what used to be a 30–60 line fixture becomes a single builder call — cutting test setup by roughly 90%. Below, I’ll show before/after examples, the API surface, and how it handles common event types (API Gateway, SQS, S3, DynamoDB Streams).</p>
<p>Here's what the before/after looks like.</p>
<h2>The Problem: 60 Lines to Say "POST /users"</h2>
<p>Testing a Lambda handler behind API Gateway v2 requires an <code>APIGatewayProxyEventV2</code> object. Here's the minimum viable event most teams copy around:</p>
<pre><code class="language-typescript">const event = {
  version: '2.0',
  routeKey: '$default',
  rawPath: '/users',
  rawQueryString: '',
  headers: {
    'content-type': 'application/json',
    'accept': 'application/json',
  },
  isBase64Encoded: false,
  body: JSON.stringify({ name: 'Lucas' }),
  requestContext: {
    accountId: '123456789012',
    apiId: 'test-api-id',
    domainName: 'test-api-id.execute-api.us-east-1.amazonaws.com',
    domainPrefix: 'test-api-id',
    http: {
      method: 'POST',
      path: '/users',
      protocol: 'HTTP/1.1',
      sourceIp: '127.0.0.1',
      userAgent: 'jest',
    },
    requestId: 'some-uuid-here',
    routeKey: '$default',
    stage: '$default',
    time: '01/Jan/2024:00:00:00 +0000',
    timeEpoch: 1704067200000,
  },
}
</code></pre>
<p>That's <strong>30+ lines</strong> for an event where the only things you actually care about are the method, path, and body. The rest is structural noise — correct enough to not crash, meaningless to your test.</p>
<p>Now multiply that by every event type in your service. SQS needs <code>messageId</code>, <code>receiptHandle</code>, <code>attributes</code>, <code>eventSourceARN</code>. S3 needs <code>bucket</code>, <code>key</code>, <code>responseElements</code>, <code>userIdentity</code>. DynamoDB Streams need marshalled <code>AttributeValue</code> maps where <code>"hello"</code> becomes <code>{ S: "hello" }</code> and <code>42</code> becomes <code>{ N: "42" }</code>.</p>
<p>Most teams solve this one of three ways:</p>
<ol>
<li><p><strong>Copy-paste JSON fixtures</strong> — Brittle, verbose, drift from reality over time.</p>
</li>
<li><p><strong>Hand-roll factory functions</strong> — Every team writes their own, slightly differently, and they're never complete.</p>
</li>
<li><p><strong>Skip testing</strong> — The honest answer when the setup cost exceeds the perceived value.</p>
</li>
</ol>
<p>None of these are good.</p>
<h2>The Solution: Express Intent, Not Structure</h2>
<p>With <code>@sls-testing/core</code>, the same test becomes:</p>
<pre><code class="language-typescript">import { buildApiGatewayEvent } from '@sls-testing/core'

const event = buildApiGatewayEvent({
  method: 'POST',
  path: '/users',
  body: JSON.stringify({ name: 'Lucas' }),
})
</code></pre>
<p><strong>Three fields. Same fully-typed event.</strong> Every field you didn't specify gets a sensible default: a real-looking request ID, a timestamp, valid ARNs. The TypeScript types come from <code>@types/aws-lambda</code>, so your IDE autocompletes every field if you need to override something specific.</p>
<p>The pattern is the same across all six event types:</p>
<pre><code class="language-typescript">import {
  buildSQSEvent,
  buildS3Event,
  buildDynamoDBStreamEvent,
  buildEventBridgeEvent,
  buildSNSEvent,
} from '@sls-testing/core'

// SQS: bodies auto-serialized, each record gets a unique messageId
const sqsEvent = buildSQSEvent({
  records: [
    { body: { orderId: 'abc-123', amount: 99.9 } },
    { body: { orderId: 'def-456', amount: 49.9 } },
  ],
})

// S3: just bucket and key, everything else filled in
const s3Event = buildS3Event({
  bucket: 'uploads',
  key: 'images/photo.png',
})

// DynamoDB Streams: plain objects auto-marshalled to AttributeValue
const streamEvent = buildDynamoDBStreamEvent({
  records: [{
    eventName: 'INSERT',
    keys: { id: 'abc' },
    newImage: { id: 'abc', name: 'Lucas', count: 42 },
  }],
})

// EventBridge
const ebEvent = buildEventBridgeEvent({
  source: 'app.orders',
  'detail-type': 'OrderPlaced',
  detail: { orderId: 'abc-123' },
})

// SNS
const snsEvent = buildSNSEvent({
  records: [{ message: { action: 'notify' } }],
})
</code></pre>
<p>The DynamoDB builder is where the savings are most dramatic. Manually constructing a <code>DynamoDBStreamEvent</code> with marshalled values is easily 40-50 lines. The builder does the marshalling for you — pass <code>{ count: 42 }</code> and it becomes <code>{ N: "42" }</code> automatically.</p>
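<p>To make the marshalling step concrete, here is a minimal sketch of what converting a plain object into <code>AttributeValue</code> maps involves. It is not the library's actual implementation: <code>marshalValue</code> and <code>marshalImage</code> are hypothetical names, and the sketch only covers strings, numbers, booleans, null, lists, and nested objects.</p>
<pre><code class="language-typescript">// Hypothetical helpers; a simplified sketch, not @sls-testing internals.
type AttributeValue =
  | { S: string }
  | { N: string }
  | { BOOL: boolean }
  | { NULL: true }
  | { L: AttributeValue[] }
  | { M: { [key: string]: AttributeValue } }

function marshalValue(value: unknown): AttributeValue {
  if (typeof value === 'string') return { S: value }
  // Numbers travel as strings in the DynamoDB format
  if (typeof value === 'number') return { N: String(value) }
  if (typeof value === 'boolean') return { BOOL: value }
  if (value === null) return { NULL: true }
  if (Array.isArray(value)) return { L: value.map(marshalValue) }
  if (typeof value === 'object') return { M: marshalImage(value as { [key: string]: unknown }) }
  throw new Error('Unsupported value type: ' + typeof value)
}

function marshalImage(image: { [key: string]: unknown }): { [key: string]: AttributeValue } {
  const result: { [key: string]: AttributeValue } = {}
  for (const key of Object.keys(image)) {
    result[key] = marshalValue(image[key])
  }
  return result
}

const newImage = marshalImage({ id: 'abc', name: 'Lucas', count: 42 })
// newImage is { id: { S: 'abc' }, name: { S: 'Lucas' }, count: { N: '42' } }
</code></pre>
<p>Writing this by hand for every record is exactly the kind of noise a builder should absorb.</p>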
<h2>Beyond Events: Lambda Context</h2>
<p>Events are half the story. Your handler also receives a <code>Context</code> object, and AWS's type definition has 12 fields. Most tests either ignore it (<code>handler(event, {} as any)</code> — hello, runtime crash) or build an incomplete mock.</p>
<pre><code class="language-typescript">import { buildLambdaContext } from '@sls-testing/core'

const context = buildLambdaContext({
  functionName: 'order-service-dev-processOrder',
  memoryLimitInMB: '512',
  remainingTimeOverride: 5000,
})

context.getRemainingTimeInMillis() // 5000 — actually works
</code></pre>
<p>Every field has a default. <code>getRemainingTimeInMillis()</code> returns the value you configure. The <code>awsRequestId</code> is a real UUID. The <code>logGroupName</code> derives from the function name. It's a real <code>Context</code> object, not a type-cast empty object.</p>
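<p>For comparison, a hand-rolled context that satisfies a handler looks something like the sketch below. The <code>makeContext</code> helper, its option names, and its default values are hypothetical and only mirror the behaviour described above (configurable remaining time, UUID request ID, log group derived from the function name). The deprecated <code>done</code>, <code>fail</code>, and <code>succeed</code> callbacks are left out to keep it short.</p>
<pre><code class="language-typescript">import { randomUUID } from 'node:crypto'

// Subset of the Lambda Context shape, declared locally so the sketch is self-contained.
interface MinimalContext {
  functionName: string
  functionVersion: string
  invokedFunctionArn: string
  memoryLimitInMB: string
  awsRequestId: string
  logGroupName: string
  logStreamName: string
  callbackWaitsForEmptyEventLoop: boolean
  getRemainingTimeInMillis(): number
}

// Hypothetical factory; option names are assumptions for illustration.
function makeContext(options: { functionName?: string; remainingTimeMs?: number } = {}): MinimalContext {
  const functionName = options.functionName ?? 'test-function'
  const remainingTimeMs = options.remainingTimeMs ?? 30000
  return {
    functionName,
    functionVersion: '$LATEST',
    invokedFunctionArn: 'arn:aws:lambda:us-east-1:123456789012:function:' + functionName,
    memoryLimitInMB: '128',
    awsRequestId: randomUUID(), // unique per context
    logGroupName: '/aws/lambda/' + functionName,
    logStreamName: 'test-log-stream',
    callbackWaitsForEmptyEventLoop: true,
    getRemainingTimeInMillis() {
      return remainingTimeMs
    },
  }
}

const ctx = makeContext({ functionName: 'order-service-dev-processOrder', remainingTimeMs: 5000 })
ctx.getRemainingTimeInMillis() // 5000
</code></pre>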
<h2>Assertions That Speak Serverless</h2>
<p>The companion package <code>@sls-testing/jest</code> adds custom Jest matchers that understand Lambda response shapes:</p>
<pre><code class="language-typescript">import '@sls-testing/jest'

const result = await handler(event, context)

// Status code assertions
expect(result).toHaveStatusCode(200)
expect(result).toBeSuccessfulApiResponse()  // any 2xx
expect(result).toBeClientError()             // any 4xx
expect(result).toBeServerError()             // any 5xx

// Deep response matching with asymmetric matchers
expect(result).toMatchLambdaResponse({
  statusCode: 201,
  body: { userId: expect.any(String) },
  headers: { 'content-type': 'application/json' },
})

// SQS batch response assertions
expect(result).toHaveNoFailedMessages()
expect(result).toHaveFailedMessage('msg-id-2')
</code></pre>
<p><code>toMatchLambdaResponse</code> automatically parses the JSON body for comparison — you don't need to <code>JSON.parse(result.body)</code> in every test. Asymmetric matchers like <code>expect.any(String)</code> work inside the body, so you can assert structure without pinning every generated value.</p>
<p>The error messages are designed for Lambda. When <code>toHaveStatusCode</code> fails, it shows you both the expected and actual status codes plus the response body — because when a Lambda returns 500 instead of 200, the first thing you need is the error message, not a generic "expected 200 but received 500".</p>
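<p>As a rough illustration of how such a matcher can work, the sketch below implements the body-parsing and failure-message idea as a plain function that returns the <code>{ pass, message }</code> shape Jest expects from custom matchers. The name <code>checkStatusCode</code> and the message format are assumptions, not the package's real code.</p>
<pre><code class="language-typescript">interface LambdaResponse {
  statusCode: number
  body?: string
}

interface MatcherResult {
  pass: boolean
  message(): string
}

// Parse the body as JSON when possible, otherwise return it untouched.
function parseBody(body: string): unknown {
  try {
    return JSON.parse(body)
  } catch {
    return body
  }
}

// Hypothetical matcher logic: compares status codes and surfaces the body on failure.
function checkStatusCode(received: LambdaResponse, expected: number): MatcherResult {
  const pass = received.statusCode === expected
  const body =
    received.body === undefined ? '(no body)' : JSON.stringify(parseBody(received.body), null, 2)
  return {
    pass,
    // Only the failure message is sketched here
    message() {
      return (
        'Expected status code ' + expected + ' but received ' + received.statusCode +
        '\nResponse body:\n' + body
      )
    },
  }
}

const failure = checkStatusCode(
  { statusCode: 500, body: JSON.stringify({ error: 'Table not found' }) },
  200,
)
console.log(failure.message())
</code></pre>
<p>With Jest, a function of this shape could be registered through <code>expect.extend</code>, which passes the received value as the first argument and the expected value after it.</p>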
<h2>What the Numbers Actually Look Like</h2>
<p>Let me do the math on a real scenario — a service with three Lambda functions (API Gateway handler, SQS consumer, DynamoDB Stream processor), each with 3-4 test cases.</p>
<h3>Without @sls-testing</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th>Lines</th>
</tr>
</thead>
<tbody><tr>
<td>API Gateway event fixture</td>
<td>~35</td>
</tr>
<tr>
<td>SQS event fixture (2 records)</td>
<td>~45</td>
</tr>
<tr>
<td>DynamoDB Stream event fixture</td>
<td>~50</td>
</tr>
<tr>
<td>Lambda context mock</td>
<td>~20</td>
</tr>
<tr>
<td>Helper: JSON body parser for assertions</td>
<td>~10</td>
</tr>
<tr>
<td>Helper: status code checker</td>
<td>~8</td>
</tr>
<tr>
<td>Copy-paste overhead across test files</td>
<td>~40</td>
</tr>
<tr>
<td><strong>Total test infrastructure</strong></td>
<td><strong>~208</strong></td>
</tr>
</tbody></table>
<h3>With @sls-testing</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th>Lines</th>
</tr>
</thead>
<tbody><tr>
<td>API Gateway event (per test)</td>
<td>3-4</td>
</tr>
<tr>
<td>SQS event (per test)</td>
<td>3-5</td>
</tr>
<tr>
<td>DynamoDB Stream event (per test)</td>
<td>4-6</td>
</tr>
<tr>
<td>Lambda context (per test)</td>
<td>1-3</td>
</tr>
<tr>
<td>Import + matcher setup</td>
<td>2</td>
</tr>
<tr>
<td><strong>Total test infrastructure</strong></td>
<td><strong>~20</strong></td>
</tr>
</tbody></table>
<p>That's roughly a <strong>90% reduction</strong> in test setup code: (208 − 20) / 208 ≈ 0.90. But the real win isn't the line count; it's the cognitive load. When a test file is 80% fixture and 20% assertion, you can't see what's being tested. When it's 20% setup and 80% assertion, the intent is obvious.</p>
<h2>Design Decisions</h2>
<p>A few choices I made that shaped the library:</p>
<p><strong>Sensible defaults, full override.</strong> Every builder returns a complete, valid event with zero arguments. Pass a <code>DeepPartial</code> override to change any field. This means the simple case is one line, but you can still construct precise edge cases when you need to test specific header combinations or malformed payloads.</p>
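<p>The "defaults plus override" idea boils down to a recursive merge. The sketch below shows one way to do it. <code>deepMerge</code>, <code>defaultEvent</code>, and <code>buildEvent</code> are illustrative names and the default event is heavily trimmed, so treat it as an approximation of the pattern rather than the library's code.</p>
<pre><code class="language-typescript">type Plain = { [key: string]: unknown }

function isPlainObject(value: unknown): value is Plain {
  if (typeof value !== 'object') return false
  if (value === null) return false
  return !Array.isArray(value)
}

// Recursively overlay `overrides` on top of `defaults` without mutating either.
function deepMerge(defaults: Plain, overrides: Plain): Plain {
  const result: Plain = { ...defaults }
  for (const key of Object.keys(overrides)) {
    const base = result[key]
    const next = overrides[key]
    if (next === undefined) continue
    if (isPlainObject(base)) {
      if (isPlainObject(next)) {
        result[key] = deepMerge(base, next)
        continue
      }
    }
    result[key] = next
  }
  return result
}

// Trimmed-down default event; a real API Gateway v2 event has many more fields.
const defaultEvent: Plain = {
  version: '2.0',
  rawPath: '/',
  headers: { 'content-type': 'application/json' },
  requestContext: { http: { method: 'GET', path: '/' }, stage: '$default' },
}

function buildEvent(overrides: Plain = {}): Plain {
  return deepMerge(defaultEvent, overrides)
}

const overridden = buildEvent({ requestContext: { http: { method: 'POST' } } })
// requestContext.http is { method: 'POST', path: '/' }; stage is still '$default'
</code></pre>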
<p><strong>Auto-serialization.</strong> SQS bodies and SNS messages are automatically <code>JSON.stringify</code>'d. DynamoDB images are automatically marshalled. You pass plain objects; the builder handles the format Lambda actually receives.</p>
<p><strong>Framework-agnostic core.</strong> <code>@sls-testing/core</code> works with Jest, Vitest, Mocha, or any test runner. The Jest-specific matchers are a separate package. Vitest adapters are planned for v2.</p>
<p><strong>Types from the source.</strong> All event types come from <code>@types/aws-lambda</code> — the community-maintained definitions that match the actual AWS runtime. No custom type definitions that could drift.</p>
<p><strong>Unique identifiers per call.</strong> Every <code>buildSQSEvent()</code> call generates unique <code>messageId</code>s, every context gets a unique <code>awsRequestId</code>. This prevents subtle test pollution where two tests accidentally share the same ID.</p>
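<p>Auto-serialization and per-call unique identifiers are both small pieces of logic. A sketch of an SQS record factory with both properties might look like this. The <code>makeSqsRecord</code> name, the queue ARN, and the trimmed record shape are assumptions for illustration.</p>
<pre><code class="language-typescript">import { randomUUID } from 'node:crypto'

// Trimmed SQS record shape; real records also carry attributes, md5OfBody, and more.
interface SketchSqsRecord {
  messageId: string
  receiptHandle: string
  body: string
  eventSource: 'aws:sqs'
  eventSourceARN: string
}

function makeSqsRecord(body: unknown): SketchSqsRecord {
  return {
    messageId: randomUUID(), // fresh ID on every call, so tests never share one
    receiptHandle: randomUUID(),
    // Strings pass through untouched; everything else is serialized to JSON
    body: typeof body === 'string' ? body : JSON.stringify(body),
    eventSource: 'aws:sqs',
    eventSourceARN: 'arn:aws:sqs:us-east-1:123456789012:test-queue',
  }
}

const first = makeSqsRecord({ orderId: 'abc-123', amount: 99.9 })
const second = makeSqsRecord({ orderId: 'abc-123', amount: 99.9 })
// first.body is '{"orderId":"abc-123","amount":99.9}'
// first.messageId and second.messageId differ
</code></pre>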
<h2>Getting Started</h2>
<pre><code class="language-bash">npm install @sls-testing/core @sls-testing/jest --save-dev
</code></pre>
<p>Add the Jest setup (or import per file):</p>
<pre><code class="language-json">{
  "setupFilesAfterEnv": ["@sls-testing/jest"]
}
</code></pre>
<p>Write a test:</p>
<pre><code class="language-typescript">import { buildApiGatewayEvent, buildLambdaContext } from '@sls-testing/core'
import '@sls-testing/jest'
import { handler } from './handler'

it('creates a user', async () =&gt; {
  const event = buildApiGatewayEvent({
    method: 'POST',
    path: '/users',
    body: JSON.stringify({ name: 'Lucas' }),
  })

  const result = await handler(event, buildLambdaContext())

  expect(result).toHaveStatusCode(201)
  expect(result).toMatchLambdaResponse({
    body: { name: 'Lucas', id: expect.any(String) },
  })
})
</code></pre>
<p>That's it. No fixture files. No factory functions. No <code>as any</code> casts.</p>
<h2>What's Next</h2>
<p>The library is at v1 and covers the six most common Lambda event sources. The roadmap includes:</p>
<ul>
<li><p><strong>Vitest adapter</strong> — Same matchers, native Vitest integration</p>
</li>
<li><p><strong>Serverless Framework plugin</strong> — Bridge <code>serverless.yml</code> config into tests so function names, timeouts, and env vars stay in sync automatically</p>
</li>
<li><p><strong>More event types</strong> — Cognito triggers, CloudWatch Events, Kinesis</p>
</li>
<li><p><strong>Snapshot testing</strong> — Assert that response shapes haven't changed across deploys</p>
</li>
<li><p><strong>Error simulation</strong> — Builders for timeout, OOM, and cold start scenarios</p>
</li>
</ul>
<p>The repo is at <a href="https://github.com/brognilucas/sls-testing">github.com/brognilucas/sls-testing</a>. Contributions welcome — especially if you have event types you'd like to see supported.</p>
<hr />
<p><em>Testing serverless applications shouldn't require more boilerplate than the business logic itself. If your test files are 80% fixture setup, something is wrong with the tooling, not with your tests.</em></p>
]]></content:encoded></item><item><title><![CDATA[How to Choose the Right Database for Your Serverless Application]]></title><description><![CDATA[Serverless promises to free teams from infrastructure worries, but picking the wrong database can hurt your performance, increase your costs, and affect developer velocity.
As with everything in softw]]></description><link>https://practicalserverless.blog/how-to-choose-the-right-database-for-your-serverless-application</link><guid isPermaLink="true">https://practicalserverless.blog/how-to-choose-the-right-database-for-your-serverless-application</guid><category><![CDATA[Databases]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 01 Apr 2026 18:56:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/4ee74504-c9a1-48dd-9c05-9decbe82c607.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serverless promises to free teams from infrastructure worries, but picking the wrong database can hurt your performance, increase your costs, and affect developer velocity.</p>
<p>As with everything in software, the database choice comes with trade-offs, and understanding them is extremely important. Scaling characteristics, connection handling and concurrency, latency, consistency and transactional needs, operational overhead, and pricing model are all factors to consider.</p>
<p>This article unpacks those trade-offs, compares common patterns (serverless‑native databases, managed relational options, caches, and streaming stores), and offers practical rules of thumb so you can pick a database that fits your application rather than creating new operational headaches. By the end, you’ll have a concise checklist to make the decision faster and more confidently.</p>
<h2>Why database choice matters more in serverless</h2>
<p>In traditional servers, database connections are opened once and reused across thousands of requests. Application instances are long‑lived and predictable. Serverless flips that model: functions spin up, live for seconds or minutes, and vanish. Each invocation may be a fresh process with no previous state, no persistent connection, and no guarantee of locality to previous requests. That changes the calculus: connection limits, cold‑start penalties, and per‑operation pricing matter far more than they did in long‑running servers.</p>
<p>A database that works well behind a long‑lived app can cause connection storms, latency spikes, or runaway costs when used directly from a fleet of ephemeral functions. The goal is to match your workload’s requirements (throughput, latency, consistency, transactions) with a storage option whose trade‑offs align with serverless behavior.</p>
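<p>One common mitigation is to create the database client once per execution environment, outside the handler, so warm invocations reuse it instead of opening a new connection each time. The sketch below simulates this with a stand-in <code>FakeConnection</code> class (hypothetical; substitute your real driver) to show that repeated invocations in the same environment connect only once.</p>
<pre><code class="language-typescript">// Stand-in for a real database driver; counts how many connections are opened.
class FakeConnection {
  static opened = 0

  constructor() {
    FakeConnection.opened += 1
  }

  async query(sql: string, params: string[]) {
    return 'rows for ' + sql + ' with ' + params.join(',')
  }
}

// Module scope: survives across invocations while the execution environment stays warm.
let cached: FakeConnection | undefined

function getConnection(): FakeConnection {
  if (cached === undefined) {
    cached = new FakeConnection()
  }
  return cached
}

export async function handler(event: { userId: string }) {
  const db = getConnection()
  const rows = await db.query('SELECT * FROM users WHERE id = $1', [event.userId])
  return { statusCode: 200, body: rows }
}
</code></pre>
<p>This only helps within a single execution environment. Under high concurrency each environment still holds its own connection, which is why a pooling proxy or a connectionless API is often needed on top.</p>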
<h2>Key trade-offs to weigh</h2>
<ul>
<li><p>Scaling characteristics: Does the database scale horizontally without connection limits or shard coordination that conflicts with ephemeral clients?</p>
</li>
<li><p>Connection handling and concurrency: Can thousands of short‑lived connections be supported efficiently, or do you need a pooling/proxy layer?</p>
</li>
<li><p>Latency: Are single‑digit‑millisecond reads required, or can you accept higher, variable latency?</p>
</li>
<li><p>Consistency and transactions: Do you need strong ACID guarantees across multiple keys/tables, or is eventual consistency acceptable?</p>
</li>
<li><p>Operational overhead: How much maintenance, tuning, backups, and failover handling will your team manage?</p>
</li>
<li><p>Pricing model: Per‑operation, provisioned capacity, or storage‑centric billing—how do patterns of traffic (spiky vs steady) affect cost?</p>
</li>
</ul>
<h2>Common patterns and how they map to serverless</h2>
<ul>
<li><p>Serverless‑native databases (e.g., serverless NoSQL or fully serverless managed stores):</p>
<ul>
<li><p>Pros: Auto‑scaling, connectionless or HTTP/SDK access, fine‑grained billing, low operational overhead.</p>
</li>
<li><p>Cons: Weaker transactional guarantees or complex modeling for relational data; can be expensive at very high sustained throughput.</p>
</li>
<li><p>When to use: Spiky workloads, simple access patterns, evented architectures, or when you want minimal ops.</p>
</li>
</ul>
</li>
<li><p>Managed relational databases (serverless variants or provisioned RDS/Aurora/etc.):</p>
<ul>
<li><p>Pros: Familiar SQL, strong transactions, complex queries.</p>
</li>
<li><p>Cons: Connection limits and scaling challenges; may require connection pooling (proxy, pooler, or Data API) and can incur cold‑start latency.</p>
</li>
<li><p>When to use: Applications that require ACID across multiple records or complex joins and cannot be re‑modeled easily.</p>
</li>
</ul>
</li>
<li><p>Caches and in‑memory stores (Redis, Memcached, or managed variants):</p>
<ul>
<li><p>Pros: Extremely low latency for hot reads, useful for rate limiting, sessions, and ephemeral state.</p>
</li>
<li><p>Cons: Not a durable primary store (unless using persistence features), additional operational cost, eventual consistency with origin store.</p>
</li>
<li><p>When to use: Read‑heavy, low‑latency needs, offloading hotspots from a primary datastore.</p>
</li>
</ul>
</li>
<li><p>Streaming/append logs (Kafka, Kinesis, Pulsar, streaming databases):</p>
<ul>
<li><p>Pros: Durable event delivery, great for event‑sourcing, async processing, and decoupling components.</p>
</li>
<li><p>Cons: Not a drop‑in replacement for arbitrary reads/transactions; requires different application patterns.</p>
</li>
<li><p>When to use: Event‑driven architectures, audit logs, long‑running workflows.</p>
</li>
</ul>
</li>
</ul>
<h2>Practical rules of thumb</h2>
<ul>
<li><p>If your functions open many short‑lived DB connections, use a serverless‑friendly datastore or a connection proxy. Don’t rely on direct DB connections from unpooled functions.</p>
</li>
<li><p>For strong multi‑row/multi‑table transactions choose managed relational options—but consider a serverless (Data API) or pooled access pattern to avoid connection storms.</p>
</li>
<li><p>For spiky traffic with bursty reads, prefer serverless‑native stores and caches; they scale on demand and bill for usage.</p>
</li>
<li><p>If your app can tolerate eventual consistency, embracing key‑value or document models often reduces complexity and cost.</p>
</li>
<li><p>Use streaming stores for durable event capture and decoupling; combine with a materialized view or read store for low‑latency queries.</p>
</li>
<li><p>Measure cost at expected traffic patterns—serverless pricing can be higher for sustained, heavy throughput than for bursty, intermittent use.</p>
</li>
</ul>
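<p>To make the last rule concrete, a quick break-even estimate is often enough to see which pricing model fits. The sketch below compares a per-request price against a flat monthly price. Both numbers are placeholders, not real prices from any provider, so plug in figures from your provider's pricing page.</p>
<pre><code class="language-typescript">// Placeholder prices for illustration only; replace with real figures.
const PRICE_PER_MILLION_REQUESTS = 2 // on-demand, in dollars
const PROVISIONED_MONTHLY_COST = 100 // flat, in dollars

function onDemandCost(requestsPerMonth: number): number {
  return (requestsPerMonth / 1_000_000) * PRICE_PER_MILLION_REQUESTS
}

// Requests per month at which both models cost the same.
function breakEvenRequests(): number {
  return (PROVISIONED_MONTHLY_COST / PRICE_PER_MILLION_REQUESTS) * 1_000_000
}

function cheaperModel(requestsPerMonth: number): 'on-demand' | 'provisioned' {
  return onDemandCost(requestsPerMonth) &lt;= PROVISIONED_MONTHLY_COST ? 'on-demand' : 'provisioned'
}

breakEvenRequests() // 50000000
cheaperModel(10_000_000) // 'on-demand' (costs 20 with these placeholder prices)
cheaperModel(200_000_000) // 'provisioned' (on-demand would cost 400)
</code></pre>
<p>Real bills also include storage, data transfer, and read/write asymmetries, so treat this as a first-pass filter before load testing.</p>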
<h2>Closing thoughts</h2>
<p>Choosing a database for serverless shouldn’t be guesswork. Match your access patterns and operational constraints to the storage option whose trade‑offs you can live with, and use small experiments to validate latency, scaling, and cost under realistic load. This keeps serverless simple, where it should be—letting your team move faster without trading away reliability or spiraling costs.</p>
]]></content:encoded></item><item><title><![CDATA[Events, Messages & Commands: The Concepts That Make or Break Your Serverless Architecture]]></title><description><![CDATA[You might have created a Lambda function that "handles events." But take a moment to question yourself about what an event actually is.
Let's forget the object that you can access on the lambda, and t]]></description><link>https://practicalserverless.blog/events-messages-commands-the-concepts-that-make-or-break-your-serverless-architecture</link><guid isPermaLink="true">https://practicalserverless.blog/events-messages-commands-the-concepts-that-make-or-break-your-serverless-architecture</guid><category><![CDATA[serverless]]></category><category><![CDATA[event-driven-architecture]]></category><category><![CDATA[events]]></category><category><![CDATA[architecture]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software design]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 25 Mar 2026 11:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/896866c8-c91d-4cf3-a27b-413cb2c45870.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You might have created a Lambda function that "handles events." But take a moment to question yourself about what an event actually is.</p>
<p>Forget, for a moment, the object your Lambda handler receives, and think about the concept: what makes something an event rather than a command or a message?</p>
<p>In serverless, I believe understanding this distinction matters a lot. The whole ecosystem is built on a deeply event-driven model. EventBridge, SQS, SNS, DynamoDB Streams, and S3 notifications all depend on events.</p>
<p>In this post, we'll return to the basics. We'll explain what events really are, how they differ from commands and messages, and why these differences matter in every serverless system you create.</p>
<h2>What is an Event</h2>
<p>A few years ago, during a talk by James Eastham, I learned something crucial: an event is a fact and cannot be undone. That's right, you can't reverse an event. Consider writing a post for this blog; once you publish it, the action is irreversible. The <code>post.published</code> event has already happened.<br />You might wonder: if I delete the post, have I undone the action? Not quite. You haven't reversed the publication; instead, you've added another event to the sequence.</p>
<p>That's the essence of an event. In simple terms, an event is a record of something that has already happened in your system.</p>
<h2>What is a Command</h2>
<p>If an event is something that <em>has happened</em>, a command is something you're <em>asking to happen</em>. It's a request, not a fact. And unlike events, commands can be rejected.</p>
<p>Think of it this way: when a user clicks "Publish" on your blog editor, your frontend might send a <code>PublishPost</code> command to your backend. That command can fail. The post might not meet validation rules, the user might not have the right permissions, or the system might be temporarily unavailable. The command is an intention, not a truth.</p>
<p>This distinction has real architectural consequences. Commands generally have an intended recipient. You don't broadcast a command to anyone who might be listening. You send it to the one service or function responsible for handling it. There's an implicit contract: someone is expected to act on it.</p>
<p>In serverless terms, an SQS queue carrying a <code>ResizeImage</code> instruction is a good example of a command channel. One producer, one consumer, one clear responsibility.</p>
<h2>What is a Message</h2>
<p>A message is the broadest of the three. Both events and commands travel as messages. The word "message" tells you about the <em>transport</em>, not the <em>intent</em>.</p>
<p>This is where a lot of confusion creeps in. Developers see SNS delivering a payload and call it "just a message." Technically, yes. But what matters architecturally is what's <em>inside.</em> Is it announcing something that happened, or requesting something to be done?</p>
<p>Getting that wrong leads to systems where consumers start making assumptions they shouldn't. A consumer that receives an event shouldn't be the one deciding whether the action was valid. That ship has sailed. But a consumer that receives a command absolutely should validate it before acting.</p>
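<p>One way to keep that distinction visible in code is to tag each message with its intent. The sketch below uses a discriminated union. The <code>kind</code> field, the type names, and the handler are assumptions for illustration rather than any AWS convention.</p>
<pre><code class="language-typescript">// An event states a fact in the past tense; a command asks for something to happen.
interface PostPublishedEvent {
  kind: 'event'
  name: 'post.published'
  postId: string
  occurredAt: string
}

interface PublishPostCommand {
  kind: 'command'
  name: 'PublishPost'
  postId: string
  requestedBy: string
}

type Message = PostPublishedEvent | PublishPostCommand

function handleMessage(message: Message): string {
  if (message.kind === 'command') {
    // Commands can be rejected, so validate before acting.
    if (message.requestedBy === '') {
      return 'rejected: missing author'
    }
    return 'accepted: publishing ' + message.postId
  }
  // Events are facts: react to them, do not re-validate the original action.
  return 'reacting to ' + message.name + ' for ' + message.postId
}

handleMessage({ kind: 'command', name: 'PublishPost', postId: 'p1', requestedBy: '' })
// 'rejected: missing author'
handleMessage({ kind: 'event', name: 'post.published', postId: 'p1', occurredAt: '2026-03-25T11:00:00Z' })
// 'reacting to post.published for p1'
</code></pre>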
<h2>Why These Differences Matter in Serverless</h2>
<p>In a distributed architecture, the distinction between events and commands changes how you design your application, how you deal with errors, and how you handle retries.</p>
<p>With events, every listener is an observer. They react to facts. If a <code>user.registered</code> event triggers a welcome email Lambda and that function fails, you don't "undo" the registration — you retry the email. The event remains true regardless.</p>
<p>With commands, the listeners are executors. They own the outcome. A failed <code>ProcessPayment</code> command is not something you silently retry without careful thought. The intent hasn't been fulfilled, and that matters.</p>
<p>EventBridge is a great example of an event bus done right: it's designed around broadcasting facts to multiple consumers. SQS, on the other hand, lends itself naturally to commands. It's point-to-point, with visibility timeouts and dead-letter queues that reflect the expectation that <em>someone must handle this</em>.</p>
<h2>Conclusion</h2>
<p>Understanding the difference between events, commands, and messages is more than academic — it's foundational to building reliable, scalable serverless systems.</p>
<p>Events are immutable facts about things that have already happened; commands are intent to perform an action; messages are the vehicles that convey either. Treating them correctly changes how you design APIs, choose services, handle failures, and reason about system behavior.</p>
<p>Key takeaways and practical guidance:</p>
<ul>
<li><p>Name things clearly: events in past tense (e.g., <code>post.published</code>), commands as imperatives (e.g., <code>createPost</code>), messages as contextual envelopes.</p>
</li>
<li><p>Model events as immutable facts: persist them, append rather than overwrite, and use them to drive downstream state and side effects.</p>
</li>
<li><p>Use commands when you need explicit intent and control over execution (and choose queuing patterns that preserve ordering and retries).</p>
</li>
<li><p>Expect duplicates and out-of-order delivery in distributed systems: make consumers idempotent and design for eventual consistency.</p>
</li>
<li><p>Keep schemas explicit and versioned; consider a registry or strict contracts for producers and consumers.</p>
</li>
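<li><p>Pick the right tool for the job: EventBridge when you are broadcasting events to many consumers, SQS when you are sending a command to the one consumer responsible for it.</p>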
</li>
</ul>
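<p>The idempotency takeaway is the one most often skipped, so here is a minimal sketch of it. It remembers processed message IDs in an in-memory <code>Set</code>, which is an assumption made to keep the example self-contained. In a real Lambda you would need a durable store (for example, a conditional write to a table), because memory is not shared across execution environments.</p>
<pre><code class="language-typescript">interface IncomingMessage {
  id: string
  payload: { userId: string }
}

// In-memory only for the sketch; use a durable store in production.
const processedIds = new Set()
const emailsSent: string[] = []

function handleUserRegistered(message: IncomingMessage): 'processed' | 'duplicate' {
  if (processedIds.has(message.id)) {
    return 'duplicate' // already handled; safe to acknowledge and move on
  }
  emailsSent.push(message.payload.userId) // the side effect we must not repeat
  processedIds.add(message.id)
  return 'processed'
}

const delivery = { id: 'evt-1', payload: { userId: 'user-42' } }
handleUserRegistered(delivery) // 'processed'
handleUserRegistered(delivery) // 'duplicate', emailsSent still has one entry
</code></pre>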
]]></content:encoded></item><item><title><![CDATA[When Messages Fail: How DLQs Save Your Event-Driven System]]></title><description><![CDATA[In recent interviews, I asked candidates a system-design question about managing failures in a serverless, event-driven architecture. I was surprised by how many didn't include retry mechanisms or a D]]></description><link>https://practicalserverless.blog/when-messages-fail-how-dlqs-save-your-event-driven-system</link><guid isPermaLink="true">https://practicalserverless.blog/when-messages-fail-how-dlqs-save-your-event-driven-system</guid><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Wed, 18 Mar 2026 12:04:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69b3cd56c9e75ce33d841724/431a5284-e096-4a57-b755-5fcbf3a6ac7c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent interviews, I asked candidates a system-design question about managing failures in a serverless, event-driven architecture. I was surprised by how many didn't include retry mechanisms or a Dead Letter Queue (DLQ) for investigation. In serverless systems, where functions are stateless, and communication often depends on event-driven messaging, failures can be silent and difficult to trace, making proper error handling essential. This gap inspired this article, which explains what a DLQ is, why it is important, and how to use one effectively in your serverless and event-driven workflows.</p>
<h2>What is a DLQ?</h2>
<p>Before explaining why it matters, let's make sure we are aligned on what a DLQ is.</p>
<p>A Dead Letter Queue, or simply DLQ, is a message queue that stores messages a consumer could not process successfully. When processing fails, whatever the reason, the message is neither lost nor retried forever; it is redirected to the DLQ and stored there.</p>
<p>Imagine it as a holding area for problem messages. Instead of letting failures vanish or stop your system, the DLQ catches them. This allows engineers to check, fix, and handle them later without affecting the main process.</p>
<h2>Why use a DLQ?</h2>
<p>Now that we understand what a DLQ is, let's talk about why you should use one and why not having one is a red flag in any event-driven or message-based architecture.</p>
<h3>Prevent message loss</h3>
<p>Without a DLQ, a message that fails to be processed can simply disappear. Depending on your configuration, it might be discarded, leaving no trace of what went wrong. A DLQ ensures that no message is silently dropped. You can count on the fact that every failure is preserved and accounted for.</p>
<h3>Avoid infinite retry loops</h3>
<p>Retries are great, and we should absolutely have them. But retries alone are not enough. If a message is fundamentally broken, for instance because it has an invalid format or references data that no longer exists, retrying it indefinitely wastes resources, drives up cost, and can block other messages from being processed. A DLQ acts as the exit door for those unrecoverable failures.</p>
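<p>The mechanics are simple enough to sketch with in-memory arrays: count delivery attempts, and once a message reaches the limit, move it to the DLQ instead of retrying again. <code>MAX_RECEIVE_COUNT</code>, <code>processWithDlq</code>, and the queue shapes below are hypothetical. In SQS the same idea is configured through the redrive policy's <code>maxReceiveCount</code>.</p>
<pre><code class="language-typescript">interface QueuedMessage {
  id: string
  body: string
  receiveCount: number
}

interface MessageHandler {
  (message: QueuedMessage): void
}

const MAX_RECEIVE_COUNT = 3

// Runs one pass over the queue and returns the messages that should be retried later.
function processWithDlq(
  queue: QueuedMessage[],
  deadLetterQueue: QueuedMessage[],
  handle: MessageHandler,
): QueuedMessage[] {
  const retryLater: QueuedMessage[] = []
  for (const message of queue) {
    const attempt = { ...message, receiveCount: message.receiveCount + 1 }
    try {
      handle(attempt)
    } catch {
      if (attempt.receiveCount &gt;= MAX_RECEIVE_COUNT) {
        deadLetterQueue.push(attempt) // give up: park it for inspection
      } else {
        retryLater.push(attempt) // might be transient: try again on the next pass
      }
    }
  }
  return retryLater
}

// A handler that always fails on malformed JSON.
function failOnBadJson(message: QueuedMessage): void {
  JSON.parse(message.body)
}

const dlq: QueuedMessage[] = []
let pending: QueuedMessage[] = [{ id: 'm1', body: 'not-json', receiveCount: 0 }]
while (pending.length &gt; 0) {
  pending = processWithDlq(pending, dlq, failOnBadJson)
}
// dlq now holds m1 with receiveCount 3, and the loop has terminated
</code></pre>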
<h3>Improved observability and debugging</h3>
<p>When a message lands in a DLQ, it presents an opportunity. You can examine the payload to understand what caused the failure and enhance your system. Without a DLQ, that context is lost, but with one, it provides a valuable feedback loop for your application's reliability.</p>
<p>A useful practice I've learned over the years is that you can use DLQ payloads for writing tests. This helps identify where errors occurred and serves as documentation for the fix.</p>
<h3>Operational safety net</h3>
<p>Systems fail. That is a fact.</p>
<p>Sooner or later the network will be unreachable, a third-party service you integrate with will go down, or a bug will slip into your application so that a previously valid payload is no longer accepted.</p>
<p>A DLQ will provide architectural resilience and ensure that transient failures don't cause permanent data loss. Once the underlying issue is resolved, messages can be reprocessed from the DLQ as if nothing had happened.</p>
<h2>In Short: Build for Failure, Design for Resilience</h2>
<p>Dead Letter Queues are a fundamental safety net for event-driven systems: they prevent silent failures, preserve the context needed for diagnosing issues, and allow teams to address problematic messages without disrupting normal processing. When paired with strong observability and clear operational playbooks, DLQs enhance the reliability and maintainability of event-driven systems.</p>
<p>Quick practical checklist:</p>
<ul>
<li><p>Define sensible retry limits and exponential backoff to ensure only truly problematic messages reach the DLQ.</p>
</li>
<li><p>Capture detailed metadata (timestamps, error reasons, processing context) with each dead-lettered message.</p>
</li>
<li><p>Monitor DLQ size and rate, setting alerts for spikes or stagnation.</p>
</li>
<li><p>Provide tools and processes for safe reprocessing, manual inspection, and automated remediation.</p>
</li>
<li><p>Treat DLQs as integral components in architecture reviews and tests.</p>
</li>
</ul>
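<p>Two of the checklist items translate directly into code: exponential backoff with a cap, and wrapping a failed message with metadata before it is dead-lettered. The sketch below shows both. The function names, the base delay, and the envelope fields are assumptions chosen for illustration.</p>
<pre><code class="language-typescript">const BASE_DELAY_MS = 200
const MAX_DELAY_MS = 30000

// Delay doubles with every attempt (200, 400, 800, ...) until it hits the cap.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)
}

interface DeadLetterEnvelope {
  originalBody: string
  errorMessage: string
  attempts: number
  failedAt: string
  source: string
}

function toDeadLetter(body: string, error: Error, attempts: number, source: string): DeadLetterEnvelope {
  return {
    originalBody: body, // keep the payload verbatim so it can be replayed or turned into a test
    errorMessage: error.message,
    attempts,
    failedAt: new Date().toISOString(),
    source,
  }
}

backoffDelayMs(1) // 200
backoffDelayMs(4) // 1600
backoffDelayMs(10) // 30000 (capped)
</code></pre>
<p>Production backoff usually adds random jitter on top of this so that many failing consumers do not retry in lockstep.</p>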
<p>Adopting DLQs turns failures into actionable insights, keeping your system resilient and operable under real-world conditions.</p>
<p>Lucas Brogni is a Senior Software Engineer with 10+ years of experience building distributed systems.</p>
]]></content:encoded></item><item><title><![CDATA[Why I'm Writing This]]></title><description><![CDATA[I've been building with serverless since 2021.
Not just tinkering — using it as the primary architectural choice for production systems, advocating for it in hiring conversations, writing about it, gi]]></description><link>https://practicalserverless.blog/why-i-m-writing-this</link><guid isPermaLink="true">https://practicalserverless.blog/why-i-m-writing-this</guid><dc:creator><![CDATA[Lucas Brogni]]></dc:creator><pubDate>Fri, 13 Mar 2026 09:48:20 GMT</pubDate><content:encoded><![CDATA[<p>I've been building with serverless since 2021.</p>
<p>Not just tinkering — using it as the primary architectural choice for production systems, advocating for it in hiring conversations, writing about it, giving talks about it, and making it the backbone of my graduate thesis on cloud-native architecture.</p>
<p>And yet, the question I get asked most often isn't about DynamoDB access patterns or cold start optimization. It's this:</p>
<p><strong>"How do I actually know when I've done it right?"</strong></p>
<p>That question has a longer answer than most people expect. That's what this blog is for.</p>
<hr />
<h2>The gap nobody warns you about</h2>
<p>There's a very seductive version of serverless that gets sold in conference talks and documentation pages. Deploy a function. It scales. You pay nothing when it's idle. Zero infrastructure to manage.</p>
<p>All of that is true. None of it prepares you for production.</p>
<p>The real learning curve in serverless isn't writing functions — it's understanding the <em>execution model</em> well enough to make good decisions under pressure. Why does your function behave differently under concurrent load? Why is that DynamoDB error only happening in production? Why did your SQS queue suddenly back up overnight with no error rate spike?</p>
<p>The answers to these questions all trace back to the same place: how Lambda actually works, and how the services around it actually behave. Not in theory. In practice.</p>
<hr />
<h2>What "practical" means here</h2>
<p>I'm not going to write tutorials that walk you through creating an S3 bucket. There are plenty of those. What I want to write — and what I wish had existed when I was figuring this out — is the thinking behind the decisions.</p>
<p>Why you should treat the handler as an entry point and nothing more.<br />Why idempotency isn't optional the moment you introduce asynchronous processing.<br />Why that IAM wildcard that "works fine" is a problem you haven't encountered yet.<br />Why your local environment is an approximation, and which differences will actually matter.</p>
<p>Each post here will take a concept that looks simple from the outside and show you what it actually looks like from inside a running production system.</p>
<hr />
<h2>Where this comes from</h2>
<p>My day job is backend engineering on a growth team at a SaaS company. We run a serverless-first stack on AWS: Lambda, DynamoDB, SQS, EventBridge, API Gateway. I've shipped billing systems, built MCP-powered tooling, modernized test infrastructure, and handled zero-downtime schema migrations — all within this architecture.</p>
<p>I've also made most of the mistakes worth making. Misconfigured IAM roles that only failed at runtime. A trigger loop I caught in staging, barely. An SQS processor that quietly stopped processing because I hadn't understood partial batch failures. An observability gap that turned a 20-minute incident into a 3-hour one.</p>
<p>That's not a credentials flex. It's context. The patterns I write about here have been tested in production, the only environment that really matters.</p>
<hr />
<h2>What's coming</h2>
<p>I'll publish roughly twice a month. No rigid structure — just whatever's most worth writing about. Some posts will be conceptual, building the mental models that underpin everything else. Some will be deeply technical: specific patterns, concrete code, tradeoffs spelled out in full.</p>
<p>A few topics already in the pipeline:</p>
<ul>
<li><p><strong>The execution environment, actually explained</strong> — what init, invoke, and shutdown mean for the code you write every day</p>
</li>
<li><p><strong>Why your tests pass and production still breaks</strong> — the serverless testing gap and how to close it</p>
</li>
<li><p><strong>IAM for people who don't want to read the entire IAM docs</strong> — least privilege, per-function roles, and the wildcards that will eventually hurt you</p>
</li>
<li><p><strong>Idempotency from scratch</strong> — because "process it once" is harder than it sounds when Lambda retries failed asynchronous invocations and SQS can deliver the same message more than once</p>
</li>
</ul>
<p>If there's something specific you've been struggling with, I want to hear it. The goal of this blog is to be useful — not to document what I already know, but to address the questions you're actually asking.</p>
<hr />
<p>One more thing.</p>
<p>Serverless isn't perfect. It's not always the right choice. I'll say so when it isn't. The best thing I can offer here isn't enthusiasm — it's honesty about where the edges are and what happens when you hit them.</p>
<p>Let's get into it.</p>
<hr />
<p><em>Lucas Brogni is a Senior Software Engineer with 10+ years of experience building distributed systems.</em></p>
]]></content:encoded></item></channel></rss>