It was a perfectly ordinary Wednesday morning. I was reviewing the logs from our notification service — the one that sends a daily digest email to users — when I noticed something unsettling. Three copies of the same email had gone out to every user. Three. And the timestamps? Within milliseconds of each other.
The culprit wasn’t a bug in the business logic. It was something more subtle: we were running three pods in Kubernetes, and each one had happily fired the @Scheduled task at exactly the same time. The scheduler has no idea that other instances of itself exist. It just does its job — which in this case, was sending three times as many emails as intended.
That’s the day I added ShedLock to our project. And I haven’t looked back since.
The Problem: @Scheduled in a Multi-Pod World
Spring’s @Scheduled annotation is fantastic for single-instance applications. It’s simple, declarative, and works beautifully. But the moment you scale horizontally — which in 2025 means pretty much always, given Kubernetes — you get into trouble:
┌─────────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Pod 1 │ │ Pod 2 │ │ Pod 3 │ │
│ │ │ │ │ │ │ │
│ │ @Scheduled │ │ @Scheduled │ │ @Scheduled │ │
│ │ cron="0 8 * *" │ │ cron="0 8 * *" │ │ cron="0 8 * *" │ │
│ │ │ │ │ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │ ▼ │ │
│ │ sendDigest() │ │ sendDigest() │ │ sendDigest() │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ⚠️ All three pods fire independently at 08:00 │
│ ⚠️ 3 emails sent per user instead of 1 │
└─────────────────────────────────────────────────────────────────────┘
You might think: “just use Quartz with a JDBC cluster store.” And you’d be right — for complex, stateful jobs with cron triggers, persistence, and misfire handling, Quartz is the proper tool. But for simpler, stateless scheduled tasks, Quartz brings a lot of overhead: 11 database tables, a full scheduler setup, job and trigger configuration… For sending a daily digest, that’s a lot of ceremony.
There’s a better-proportioned tool for this job.
What ShedLock Actually Is
Let me be very precise here, because I’ve seen people misunderstand this tool:
ShedLock is not a scheduler. It does not replace @Scheduled. It does exactly one thing: it ensures that a scheduled task is executed at most once at the same time across all running instances of your application.
It does this by writing a time-limited lock to a shared store (a database table, Redis, MongoDB, etc.) before executing the task. The first pod to write the lock wins and runs the task. Every other pod sees the lock, shrugs, and skips silently.
┌─────────────────────────────────────────────────────────────────────┐
│ With ShedLock │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Pod 1 │ │ Pod 2 │ │ Pod 3 │ │
│ │ @Scheduled │ │ @Scheduled │ │ @Scheduled │ │
│ │ @SchedulerLock │ │ @SchedulerLock │ │ @SchedulerLock │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ │ try to acquire lock │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Shared Lock Store │ │
│ │ (PostgreSQL / Redis) │ │
│ └────────────────────────┘ │
│ │
│ ✅ Pod 1 acquires the lock → runs the task │
│ ⏭️ Pod 2 sees the lock → skips silently │
│ ⏭️ Pod 3 sees the lock → skips silently │
└─────────────────────────────────────────────────────────────────────┘
The Mechanism: One Row, One Winner
The way ShedLock works under the hood is elegant in its simplicity. It uses a single database table (one row per lock name) and a clever atomic upsert to determine which pod gets to run the task:
-- What ShedLock executes under the hood (PostgreSQL dialect):
INSERT INTO shedlock (name, lock_until, locked_at, locked_by)
VALUES ('sendDailyDigest', NOW() + INTERVAL '10 minutes', NOW(), 'pod-1-hostname')
ON CONFLICT (name) DO UPDATE
SET lock_until = EXCLUDED.lock_until,
locked_at = EXCLUDED.locked_at,
locked_by = EXCLUDED.locked_by
WHERE shedlock.lock_until <= NOW(); -- only if the existing lock has expired
-- If WHERE is false → 0 rows affected → this pod SKIPS the task
-- If WHERE is true → 1 row updated → this pod RUNS the task
The atomicity of the upsert is the critical piece. Because INSERT ... ON CONFLICT ... WHERE is evaluated as a single atomic statement, there’s no race condition. Even if all three pods fire simultaneously, only one of them will get the row update — the database’s unique constraint and row-level locking take care of the rest.
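The same first-writer-wins-with-expiry rule can be sketched in a few lines of plain Java. This is purely illustrative, not ShedLock’s actual code: an in-memory map stands in for the shedlock table, and ConcurrentHashMap.compute() plays the role of the atomic upsert.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory model of ShedLock's lock semantics.
// The real implementation does this atomically in the database.
public class LockStoreSketch {
    private final ConcurrentHashMap<String, Instant> lockUntil = new ConcurrentHashMap<>();

    /** Try to acquire the named lock until now + lockAtMostFor. */
    public boolean tryLock(String name, Instant now, Duration lockAtMostFor) {
        final boolean[] acquired = {false};
        // compute() is atomic per key, like INSERT ... ON CONFLICT ... WHERE
        lockUntil.compute(name, (k, until) -> {
            if (until == null || !until.isAfter(now)) { // free or expired
                acquired[0] = true;
                return now.plus(lockAtMostFor);         // new lock_until
            }
            return until;                               // still held: keep as-is
        });
        return acquired[0];
    }

    public static void main(String[] args) {
        LockStoreSketch store = new LockStoreSketch();
        Instant t0 = Instant.parse("2026-03-08T08:00:00Z");
        Duration tenMin = Duration.ofMinutes(10);

        System.out.println(store.tryLock("sendDailyDigest", t0, tenMin)); // true: pod 1 wins
        System.out.println(store.tryLock("sendDailyDigest", t0, tenMin)); // false: pod 2 skips
        // 11 minutes later the lock has expired, so it can be taken again
        System.out.println(store.tryLock("sendDailyDigest",
                t0.plus(Duration.ofMinutes(11)), tenMin)); // true
    }
}
```

The key property, in the database and in this toy version alike, is that the check-and-write happens as one indivisible step per lock name.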
The Lock Table Schema
CREATE TABLE shedlock (
name VARCHAR(64) NOT NULL, -- unique lock identity
lock_until TIMESTAMP NOT NULL, -- auto-expiry safety net (lockAtMostFor)
locked_at TIMESTAMP NOT NULL, -- when the lock was acquired
locked_by VARCHAR(255) NOT NULL, -- pod hostname that holds the lock
PRIMARY KEY (name)
);
That’s it. One table, four columns. Compare that to Quartz’s 11 tables and you start to appreciate the proportionality of this tool.
┌──────────────────────────────────────────────────────────────────────────────┐
│ shedlock table — live state │
├──────────────────────┬─────────────────────────┬──────────────────────┬─────┤
│ name │ lock_until │ locked_at │ ... │
├──────────────────────┼─────────────────────────┼──────────────────────┼─────┤
│ "sendDailyDigest" │ 2026-03-08 08:10:00 │ 2026-03-08 08:00:00 │ ... │
│ "generateReport" │ 2026-03-08 02:05:00 │ 2026-03-08 02:00:00 │ ... │
└──────────────────────┴─────────────────────────┴──────────────────────┴─────┘
│ │
│ └─ If NOW() > lock_until → lock is free
└─ one row per distinct lock name
Setting It Up — Step by Step
1. Maven Dependencies
<!-- Core ShedLock + Spring integration -->
<dependency>
<groupId>net.javacrumbs.shedlock</groupId>
<artifactId>shedlock-spring</artifactId>
<version>5.10.2</version>
</dependency>
<!-- JDBC provider (uses your existing DataSource) -->
<dependency>
<groupId>net.javacrumbs.shedlock</groupId>
<artifactId>shedlock-provider-jdbc-template</artifactId>
<version>5.10.2</version>
</dependency>
No Redis, no ZooKeeper, no extra infrastructure. If you already have a relational database in your stack — and you almost certainly do — you’re good to go.
2. Configuration Bean
@Configuration
@EnableScheduling
@EnableSchedulerLock(
defaultLockAtMostFor = "10m", // hard safety cap: max time a lock can be held
defaultLockAtLeastFor = "1m" // minimum hold time even if task finishes fast
)
public class ShedLockConfiguration {
@Bean
public LockProvider lockProvider(DataSource dataSource) {
return new JdbcTemplateLockProvider(
JdbcTemplateLockProvider.Configuration.builder()
.withJdbcTemplate(new JdbcTemplate(dataSource))
.withTableName("shedlock")
.usingDbTime() // use DB clock, not JVM clock (clock-skew resilient)
.build()
);
}
}
One thing I want to highlight: usingDbTime(). This tells ShedLock to use the database server’s clock for all timestamp operations, rather than the JVM’s system clock. In a Kubernetes cluster where pods might have very slight clock drift between them, this eliminates an entire class of subtle timing bugs. Always use it.
3. Annotate Your Task
// Before ShedLock — runs on ALL pods simultaneously
@Scheduled(cron = "0 0 8 * * *")
public void sendDailyDigest() {
userService.findAllActiveUsers()
.forEach(emailService::sendDigest); // ⚠️ 3x sends in a 3-pod cluster
}
// After ShedLock — runs on exactly ONE pod
@Scheduled(cron = "0 0 8 * * *")
@SchedulerLock(
name = "sendDailyDigest", // unique name in the shedlock table
lockAtMostFor = "9m", // override default: max lock duration
lockAtLeastFor = "30s" // min hold: prevents fast-task re-runs
)
public void sendDailyDigest() {
userService.findAllActiveUsers()
.forEach(emailService::sendDigest); // ✅ exactly 1 send across the cluster
}
Two annotations. No changes to the business logic inside the method. No lock acquisition code, no try/finally, no shared state. The entire coordination logic lives in the Spring AOP layer, completely transparent to your code.
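If you want a guard against the annotation ever going missing, ShedLock also ships a small LockAssert helper you can call inside the task body; it fails fast when the method is invoked without a held lock. This is a sketch based on ShedLock’s documented API (userService and emailService are the same hypothetical collaborators as in the example above); check the class location for your version.

```java
import net.javacrumbs.shedlock.core.LockAssert;
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;

@Scheduled(cron = "0 0 8 * * *")
@SchedulerLock(name = "sendDailyDigest", lockAtMostFor = "9m", lockAtLeastFor = "30s")
public void sendDailyDigest() {
    // Throws if no ShedLock lock is held, so an accidentally removed
    // @SchedulerLock annotation surfaces as a loud failure, not 3x emails.
    LockAssert.assertLocked();
    userService.findAllActiveUsers().forEach(emailService::sendDigest);
}
```

In unit tests, ShedLock’s documented LockAssert.TestHelper.makeAllAssertsPass(true) lets the assertion pass without a real lock.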
The Lock Timing Parameters — Getting Them Right
This is where most people get tripped up. ShedLock has two timing parameters, and both matter:
lockAtMostFor — "If I crash while holding this lock, release it after X time"
lockAtLeastFor — "Even if I finish in 1ms, hold the lock for at least X time"
Let me walk through the lock lifecycle on a timeline to make this concrete:
Example: @Scheduled(fixedRate = 600000) // every 10 minutes
lockAtMostFor = "9m"
lockAtLeastFor = "1m"
T=0:00 Pod 1 fires → acquires lock → lock_until = T+9m
Pod 2 fires → sees lock (lock_until > now) → SKIPS
Pod 3 fires → sees lock (lock_until > now) → SKIPS
T=0:15 Pod 1 finishes the task (took only 15 seconds)
→ ShedLock updates lock_until to T=1:00 (lockAtLeastFor kicks in)
→ Pod 2 and 3 would still skip if they re-checked now
T=1:00 lockAtLeastFor expires → lock is naturally released
(no other pod is trying at this moment — they fired at T=0)
T=10:00 Next @Scheduled cycle fires on all pods
Lock has been released long ago → Pod 1 (or 2 or 3) acquires it again ✅
--- Crash scenario ---
T=0:00 Pod 1 acquires lock → lock_until = T+9m
T=2:00 Pod 1 crashes (JVM killed mid-task)
T=9:00 lock_until expires (lockAtMostFor safety net fires)
T=10:00 Pod 2 fires, acquires lock → resumes processing ✅
Key rule: for fixed-rate tasks, keep lockAtMostFor < the scheduling interval
          so a crashed pod's lock expires before the next cycle fires
The lockAtLeastFor parameter is the one I see overlooked most often. Without it, a task that finishes in 50 milliseconds releases the lock almost immediately — so a pod whose scheduler fires a few hundred milliseconds later (slight clock drift, staggered startup) finds the lock already free and runs the task a second time. Set it to something reasonable: at least a few seconds, or a minute for tasks that run on intervals of several minutes.
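Put differently: on completion the lock is released at whichever is later, the finish time or lockedAt + lockAtLeastFor. A tiny sketch of that rule (my own helper for illustration, not ShedLock source):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the release rule: when the task finishes, the lock is kept
// until max(finish time, acquisition time + lockAtLeastFor).
public class ReleaseTimeSketch {
    static Instant releaseAt(Instant lockedAt, Instant finishedAt, Duration lockAtLeastFor) {
        Instant minHold = lockedAt.plus(lockAtLeastFor);
        return finishedAt.isAfter(minHold) ? finishedAt : minHold;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2026-03-08T08:00:00Z");
        Duration oneMin = Duration.ofMinutes(1);

        // Fast task: finishes after 15s, but the lock is held until T+1m
        System.out.println(releaseAt(t0, t0.plusSeconds(15), oneMin));
        // Slow task: finishes after 3m, released immediately at T+3m
        System.out.println(releaseAt(t0, t0.plus(Duration.ofMinutes(3)), oneMin));
    }
}
```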
What Can Go Wrong?
ShedLock is reliable, but it’s not magic. Here are the failure scenarios you should understand before putting it in production:
┌────────────────────────────────────┬─────────────────────────────┬──────────────────────────────────────┐
│ Failure Scenario │ What Happens │ Mitigation │
├────────────────────────────────────┼─────────────────────────────┼──────────────────────────────────────┤
│ Pod crashes while holding lock │ Lock stuck until expiry │ lockAtMostFor safety cap │
│ All pods down at scheduled time │ Task doesn't run at all │ Accept it, or add catch-up logic │
│ Database unavailable │ All pods skip the task │ DB high availability (replication) │
│ Task runs longer than lockAtMostFor│ Two pods run concurrently   │ lockAtMostFor > worst-case run time   │
│ Two services share same lock name │ Silent conflict / one skips │ Namespace your lock names │
│ lockAtLeastFor too short │ Fast task may run twice │ Always set a sane minimum │
└────────────────────────────────────┴─────────────────────────────┴──────────────────────────────────────┘
The Overlap Scenario in Detail
The most dangerous failure mode is when the task takes longer than lockAtMostFor:
⚠️ BAD — lockAtMostFor >= scheduling interval
@Scheduled(fixedRate = 600000) // fires every 10 min
@SchedulerLock(lockAtMostFor = "15m") // lock expires after 15 min
T=0:00 Pod 1 acquires lock → task starts (running slowly...)
T=10:00 Pod 2 fires → lock still held (lock_until = T+15m) → SKIPS ✅
T=15:00 lock_until expires (Pod 1 still running!)
T=20:00 Pod 2 fires → acquires lock → starts task
Pod 1 is STILL running the previous execution 💥 overlap!
✅ CORRECT — lockAtMostFor < scheduling interval
@Scheduled(fixedRate = 600000) // fires every 10 min
@SchedulerLock(lockAtMostFor = "9m") // lock expires after 9 min (1m before next fire)
→ If the task is still running at T=9m, something is seriously wrong.
At least the overlap exposure window is T=9m to T=10m, not indefinite.
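If a task legitimately runs long, an alternative to inflating lockAtMostFor up front is ShedLock’s LockExtender, which pushes lock_until forward from inside the running task. This is a sketch based on the documented API — Batch, batches() and process() are hypothetical names of mine, and note that not every lock provider supports extension (unsupported ones throw):

```java
import java.time.Duration;
import net.javacrumbs.shedlock.core.LockExtender;
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;

@Scheduled(fixedRate = 600_000)
@SchedulerLock(name = "notification-service.longBatch", lockAtMostFor = "9m")
public void longBatch() {
    for (Batch batch : batches()) {
        process(batch);
        // After each chunk, move lock_until to now + 9m. A healthy-but-slow
        // run keeps its lock; a crashed pod stops extending and the lock
        // still expires within 9 minutes of the last heartbeat.
        LockExtender.extendActiveLock(Duration.ofMinutes(9), Duration.ZERO);
    }
}
```

The design trade-off: the lock now behaves like a heartbeat, so expiry measures “time since last progress” instead of “time since start”.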
Honest Pros and Cons
What ShedLock does really well
- Zero infrastructure overhead. Reuses your existing DataSource. One table, four columns. No Redis, no ZooKeeper, no sidecar containers.
- AOP transparency. Your task method has no lock-related code in it. It’s completely business-logic clean, which makes testing straightforward too.
- Dead simple migration. Swap the lock provider (JDBC → Redis → MongoDB) by changing one dependency and one bean. Zero changes in annotated methods. ShedLock supports 20+ providers.
- Clock-skew resilience. With usingDbTime(), all lock timestamps are evaluated by the database clock, not individual pod JVM clocks. Minor clock drift between pods is a non-issue.
- Silent skips. When a pod can’t acquire the lock, nothing happens. No exception, no error log (by default). Multi-pod deployments stay clean and quiet.
- Predictable recovery. A pod crash always resolves within lockAtMostFor. No operator intervention needed, no manual cleanup of stale locks.
Where it falls short
- Not a scheduler — easy to misconfigure. If you remove @Scheduled, the task never runs. If you remove @SchedulerLock, it runs on all pods. Both must be present. This has caught colleagues off guard.
- At-most-once, not at-least-once. If all pods are down when the scheduled time fires, the task simply doesn’t run. Unlike Quartz (which persists job state and handles misfires), ShedLock has no concept of “I missed a run, let me catch up.” For many tasks, this is fine. For critical business jobs, it isn’t.
- Database dependency for the lock itself. If your database is unavailable, every pod will fail to acquire the lock and skip execution. For an app that’s already DB-dependent this is usually an acceptable correlated failure — but worth being explicit about.
- lockAtLeastFor adds latency. If you need to manually re-trigger a task for operations reasons (say, an engineer hits a “/trigger-now” endpoint), lockAtLeastFor means you’re waiting for that minimum time to elapse. It’s a real friction point in incident response.
- No monitoring out of the box. ShedLock provides no metrics, no health endpoints, no dashboard. You’ll need to add your own observability: query the shedlock table periodically, expose lock state in Actuator health, or emit a metric when a lock is skipped.
- Lock name collisions are silent. If two different scheduled tasks in different services share the same lock name (and the same database schema), one will silently prevent the other from running. Namespace your lock names — always prefix with the service or module name.
ShedLock vs. Quartz Clustering — When to Use Which
┌────────────────────────────┬──────────────────────────────┬──────────────────────────────┐
│ Concern │ ShedLock │ Quartz Clustering │
├────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ Setup complexity │ 1 table, 2 annotations │ 11 tables, full config │
│ Job persistence │ No (stateless) │ Yes (state in DB) │
│ Misfire handling │ No │ Yes (configurable policies) │
│ Cron scheduling │ Via @Scheduled │ Native │
│ Job parameters │ Not supported │ JobDataMap │
│ Cluster coordination │ Simple "one wins" lock │ Full distributed scheduler │
│ Good for │ Simple @Scheduled protection │ Complex, persistent jobs │
│ Infrastructure footprint │ Minimal │ Significant │
│ At-least-once guarantee │ No │ Yes (misfire recovery) │
└────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
Rule of thumb:
→ Stateless, simple periodic tasks (reports, digests, cleanup)? → ShedLock
→ Stateful, persistent, complex scheduling with misfire recovery? → Quartz
→ Both in the same service? → Absolutely valid
Operational Tips from the Trenches
- Always namespace lock names. Use a prefix that uniquely identifies your service: "notification-service.sendDailyDigest" instead of just "sendDailyDigest". One day your service will share a database schema with another service, and you’ll thank yourself.
- Monitor the shedlock table. A simple query in your monitoring stack alerts you if locks are stuck or tasks aren’t running as expected:
-- Are any locks currently held?
SELECT name, locked_by,
lock_until,
EXTRACT(EPOCH FROM (lock_until - NOW())) / 60 AS minutes_remaining,
lock_until > NOW() AS is_locked
FROM shedlock
ORDER BY locked_at DESC;
- Force-release a stuck lock when needed. If a pod is confirmed dead but the lock hasn’t expired yet, you can release it manually. Use with caution:
-- Force-expire a stuck lock (use only when the holding pod is confirmed dead)
UPDATE shedlock
SET lock_until = NOW() - INTERVAL '1 second'
WHERE name = 'notification-service.sendDailyDigest';
- Keep lockAtMostFor shorter than your scheduling interval. This is the golden rule for fixed-rate tasks. If your task fires every 10 minutes, set lockAtMostFor to 9 minutes. A crashed pod’s lock then always expires before the next cycle, so recovery never waits more than one interval.
- Use usingDbTime() always. You get clock-skew resilience for free. There’s no downside.
- Test the skipping behavior explicitly. ShedLock provides a SimpleLock interface you can mock in tests. Verify that your application handles the “task was skipped” path correctly — especially for tasks where downstream systems might notice the absence.
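You don’t need a real lock store to unit-test the skip path; a stand-in with the same shape is enough. All names below are mine, not ShedLock’s — in a real test you would mock LockProvider to return an empty Optional instead:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Minimal stand-in for "run the task only if the lock was acquired".
public class SkipPathSketch {
    /** Runs the task only when tryLock reports success; returns whether it ran. */
    static boolean runIfLocked(BooleanSupplier tryLock, Runnable task) {
        if (!tryLock.getAsBoolean()) {
            return false;                // lock held elsewhere: silent skip
        }
        task.run();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger sends = new AtomicInteger();
        runIfLocked(() -> true,  sends::incrementAndGet);  // this pod won the lock
        runIfLocked(() -> false, sends::incrementAndGet);  // another pod holds it
        System.out.println(sends.get()); // 1, the skipped invocation sent nothing
    }
}
```

The assertion worth writing is the second one: when the lock is denied, the task body must not have executed, and nothing downstream should have been touched.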
Wrapping Up
ShedLock is one of those tools that earns its place in the stack not by doing many things, but by doing one thing perfectly. If you run Spring Boot services in a horizontally scaled environment and you have any @Scheduled tasks that should run exactly once per cluster — not per pod — ShedLock is the proportionate, low-ceremony solution.
Just remember what it is and what it isn’t. It’s a distributed lock coordinator for scheduled tasks. It’s not a scheduler, not a job store, and not a replacement for Quartz. Use it for simple periodic tasks where “run once per cluster” is the requirement and “missed execution” is an acceptable edge case. For everything more complex than that, reach for Quartz — or combine both, which is a perfectly valid and often sensible architecture.
After that Wednesday email incident, I’ve added ShedLock to every microservice that has a @Scheduled task in it. The setup takes about 15 minutes, the annotation takes about 30 seconds, and I’ve never had a duplicate-execution incident since. That’s the kind of return on investment I can get behind.
Have a question, or a ShedLock horror story of your own? Drop a comment below — I’m always up for a good distributed systems conversation.