Event Time vs Processing Time

1. What Is It?

Event time is the timestamp embedded in the event itself: when the thing actually happened in the real world (click occurred at 14:03:12.044). Processing time is the wall clock on the machine running the stream processor when it sees the event (the processor received this at 14:03:18.770). They are almost never equal, because networks, retries, mobile-app backgrounding, and processing lag all delay events.

The problem this distinction solves: if you compute "clicks per minute" using processing time, the results shift depending on how fast your pipeline is running and whether your consumers are caught up. Using event time, the answer is the same whether you compute now, after a one-hour outage, or during a backfill. Without event time, your time-window analytics are wrong as soon as anything is late.

QUICK CHECK

A streaming pipeline counts user logins per minute to power a dashboard. During a network outage lasting 30 minutes, events backed up and were delivered late. When the pipeline processed the backlog, the login counts for the outage window appeared artificially low. Which approach would produce accurate counts for the outage window?


2. How It Works

  1. The producer (the mobile app, the web server, the IoT device) stamps the event with its own clock and includes that timestamp in the payload.
  2. The stream processor extracts the event-time field when it ingests the event.
  3. The processor uses event time to assign events to windows ("this click belongs to the 14:03 minute window"), independent of when the event arrived.
  4. Because events can arrive out of order or late, the processor uses watermarks to decide "no more events older than T are likely to arrive, so it is safe to finalize windows up to T."
  5. Processing time, in contrast, is just System.currentTimeMillis() at the operator. It is used when latency is the only concern and skew doesn't matter (e.g. heartbeats, monitoring).
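Steps 3 and 4 can be sketched in plain Java without any streaming framework. This is a toy model under stated assumptions, not how any particular engine implements it: the class `EventTimeWindowSketch` and its methods are hypothetical, windows are fixed-size tumbling windows, the watermark is simply "largest timestamp seen minus a fixed bound," and events that arrive after their window closed are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of event-time tumbling windows; the class and its methods are hypothetical. */
final class EventTimeWindowSketch {
    private final long windowSizeMs;
    private final long maxOutOfOrdernessMs;
    // Open windows keyed by window start (epoch millis); value = event count so far.
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>();
    private long maxTimestampSeen = Long.MIN_VALUE;

    EventTimeWindowSketch(long windowSizeMs, long maxOutOfOrdernessMs) {
        this.windowSizeMs = windowSizeMs;
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Step 3: the window comes from the event's own timestamp, not from its arrival time. */
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    /** Step 4: the watermark trails the largest timestamp seen so far by a fixed bound. */
    long watermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrdernessMs;
    }

    /** Ingests one event; returns "windowStart=count" for each window the watermark just closed. */
    List<String> onEvent(long eventTimeMs) {
        List<String> finalized = new ArrayList<>();
        long start = windowStart(eventTimeMs, windowSizeMs);
        if (start + windowSizeMs <= watermark()) {
            return finalized; // window already finalized: this sketch simply drops the late event
        }
        openWindows.merge(start, 1, Integer::sum);
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMs);
        // Finalize every window whose end is at or before the current watermark.
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowSizeMs <= watermark()) {
            Map.Entry<Long, Integer> closed = openWindows.pollFirstEntry();
            finalized.add(closed.getKey() + "=" + closed.getValue());
        }
        return finalized;
    }
}
```

With 60-second windows and a 10-second bound, feeding timestamps 5 000, 61 000, 30 000, 70 000 (in that arrival order) still counts the out-of-order 30 000 event in the first window. That window closes with a count of 2 only when the 70 000 event pushes the watermark to 60 000.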

Concrete example. "Sessions per minute" from a mobile app:

  • At 14:00:00 the user starts clicking while the phone is offline.
  • From 14:00:00 to 14:05:00 the phone stays offline; the 12 events generated during the 14:00 minute are queued locally.
  • At 14:05:01 the phone reconnects and flushes the batch of 12 events to the backend; they arrive at the stream processor at 14:05:03.
  • Processing-time window: all 12 events fall into the 14:05 minute. The 14:00 minute shows zero, and the chart spikes at 14:05.
  • Event-time window: each event lands in its true minute. The 14:00 minute reaches 12 once the late events arrive, so the chart attributes activity to when it happened.
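The difference between the two charts comes down to which timestamp is truncated to the minute. The sketch below replays the example in plain Java. The class and method names are hypothetical, and the 5-second spacing between the 12 clicks is an assumption chosen so that all of them fall inside the 14:00 minute.

```java
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical replay of the offline-phone example; not tied to any streaming framework. */
final class MinuteBucketsSketch {
    /** Counts timestamps per minute bucket (each timestamp truncated to its minute). */
    static Map<LocalTime, Integer> countPerMinute(List<LocalTime> timestamps) {
        Map<LocalTime, Integer> counts = new TreeMap<>();
        for (LocalTime t : timestamps) {
            counts.merge(t.truncatedTo(ChronoUnit.MINUTES), 1, Integer::sum);
        }
        return counts;
    }

    /** When the 12 clicks happened: 5 seconds apart starting at 14:00:00 (assumed spacing). */
    static List<LocalTime> eventTimes() {
        List<LocalTime> times = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            times.add(LocalTime.of(14, 0, 0).plusSeconds(5L * i));
        }
        return times;
    }

    /** When the processor saw them: the whole batch arrives at 14:05:03. */
    static List<LocalTime> processingTimes() {
        return Collections.nCopies(12, LocalTime.of(14, 5, 3));
    }

    public static void main(String[] args) {
        System.out.println("event time:      " + countPerMinute(eventTimes()));
        System.out.println("processing time: " + countPerMinute(processingTimes()));
    }
}
```

Bucketing by event time puts all 12 clicks in the 14:00 bucket; bucketing by processing time puts all 12 in the 14:05 bucket and leaves 14:00 empty.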
QUICK CHECK

A mobile app tracks user sessions and sends click events to a stream processor. Due to intermittent connectivity, a batch of 12 events that occurred between 14:00 and 14:01 arrives at the processor at 14:06. A stream processor configured to use event time with watermarks processes these events. What happens to those 12 events?


3. What Mid-Senior SWEs Actually Need to Know

  • Embedded timestamps must be trustworthy. If your producer has a wrong clock, your event-time analytics are wrong. For untrusted clients, stamp at the edge (load balancer, gateway) as soon as the event arrives in your trust boundary.
  • Watermarks gate window emission. A 1-minute window keyed by event time does not emit at processing wall-clock 14:01:00; it emits when the watermark crosses 14:01:00, and the configured allowed lateness controls how long after that late events can still update the window before its state is purged.
  • Idle partitions break watermarks. Watermarks are computed per partition and the global watermark is the min across partitions. A partition that stops receiving events freezes the watermark and stalls windowed outputs indefinitely unless you configure idle-partition detection.
  • Late-data handling is a policy decision. You set: "discard events older than N seconds past the watermark," "side-output them to a fix-up stream," or "fire updates retroactively." Each costs something different.
  • Processing time is the right choice when latency matters more than correctness: alerting on "no heartbeat for 30s," rate-limiting, debug counters.
  • Common misunderstanding: "Why does my window not fire?" Almost always: no events arrived to advance the watermark, an idle source held it back, or the timestamp extractor is broken (returning 0 or processing time).
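The interaction between the watermark, a window's end, and allowed lateness can be written as a three-way decision. The following is a minimal sketch, not a framework API: `LatenessPolicySketch`, its `Decision` enum, and the exact boundary conditions are assumptions made for illustration, and what happens to a `TOO_LATE` event (drop or side-output) is left to the caller.

```java
/** Hypothetical classifier for an arriving event relative to its event-time window. */
final class LatenessPolicySketch {
    enum Decision {
        ON_TIME,      // window has not fired yet: the event is counted normally
        LATE_UPDATE,  // window already fired but its state is retained: fire an updated result
        TOO_LATE      // state purged: drop the event or route it to a side output
    }

    /**
     * @param windowEndMs       exclusive end of the window the event belongs to
     * @param watermarkMs       current watermark
     * @param allowedLatenessMs how long window state is kept after the watermark passes the end
     */
    static Decision classify(long windowEndMs, long watermarkMs, long allowedLatenessMs) {
        if (watermarkMs < windowEndMs) {
            return Decision.ON_TIME;
        }
        if (watermarkMs < windowEndMs + allowedLatenessMs) {
            return Decision.LATE_UPDATE;
        }
        return Decision.TOO_LATE;
    }
}
```

For a window ending at 60 000 with 30 000 ms of allowed lateness, an event arriving while the watermark is at 59 000 is on time, at 60 000 it triggers a late update, and at 90 000 it is too late.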

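Quick usage (Flink DataStream API):

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

// clicks is an existing DataStream<Click>; Click carries its event time in epoch milliseconds.
DataStream<Click> withTimestamps = clicks.assignTimestampsAndWatermarks(
    WatermarkStrategy
        .<Click>forBoundedOutOfOrderness(Duration.ofSeconds(10)) // watermark trails the max timestamp by 10 s
        .withTimestampAssigner((click, recordTimestamp) -> click.getEventTimeMillis()) // read event time from the payload
        .withIdleness(Duration.ofMinutes(1)) // after 1 min without events, an input stops holding back the watermark
);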
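To see why the idleness setting matters, the "min across partitions" rule can be modeled in a few lines of plain Java. This is a simplified sketch under assumptions, not Flink's implementation: the class `PartitionWatermarksSketch` is hypothetical, idleness is judged from the wall-clock time of the last update per partition, and the global watermark is kept monotonic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a global watermark computed as the min over non-idle partitions. */
final class PartitionWatermarksSketch {
    private final long idleTimeoutMs;
    private final Map<Integer, Long> watermarkByPartition = new HashMap<>();
    private final Map<Integer, Long> lastUpdateWallClockMs = new HashMap<>();
    private long globalWatermark = Long.MIN_VALUE;

    PartitionWatermarksSketch(long idleTimeoutMs) {
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Records a partition's watermark and the wall-clock time at which it was reported. */
    void onWatermark(int partition, long watermarkMs, long nowMs) {
        watermarkByPartition.merge(partition, watermarkMs, Long::max);
        lastUpdateWallClockMs.put(partition, nowMs);
    }

    /** Min over partitions updated within idleTimeoutMs; never moves backwards. */
    long globalWatermark(long nowMs) {
        long min = Long.MAX_VALUE;
        for (Map.Entry<Integer, Long> entry : watermarkByPartition.entrySet()) {
            boolean idle = nowMs - lastUpdateWallClockMs.get(entry.getKey()) >= idleTimeoutMs;
            if (!idle) {
                min = Math.min(min, entry.getValue());
            }
        }
        if (min != Long.MAX_VALUE) {
            globalWatermark = Math.max(globalWatermark, min);
        }
        return globalWatermark;
    }
}
```

Without the idle check, a partition that reported 1 000 once and then went silent would pin the global watermark at 1 000 no matter how far the other partitions advance.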

4. Tradeoffs & Decisions

If you need... | Use... | Choose the other when...
Correct analytics regardless of pipeline lag, replayability | Event time | You don't trust producer clocks and edge-stamping is impossible
Lowest possible latency, no concept of "lateness" | Processing time | Window correctness depends on real-world time of the event
Reproducible results across reruns / backfills | Event time | The pipeline is purely operational with no historical accuracy requirement
Stable behavior during catch-up after a consumer outage | Event time | You actually want results to reflect "when we processed it" (e.g. SLO violation alerts)

Key tradeoff: correctness vs latency. Event time + watermarks delays window emission until you're reasonably sure events have arrived (or the allowed lateness expires). Processing time emits immediately but the values shift under any pipeline lag.

Secondary tradeoff: trust. Event time is only as honest as the source clock. The further from a controlled environment (server log) toward an uncontrolled one (mobile, IoT), the more you should consider edge-stamping or capping how far back an event can claim to be.
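One way to cap a client's claim is to clamp it against the ingress timestamp recorded at the edge. The sketch below is an assumption about how such a cap might look, not a prescribed design: `EdgeStampSketch`, its parameter names, and the choice to clamp rather than reject are all hypothetical, and the future-skew bound is an addition beyond the "how far back" cap described above.

```java
/** Hypothetical helper that bounds a client-claimed event time using the edge's own clock. */
final class EdgeStampSketch {
    /**
     * Returns the event time to use downstream.
     *
     * @param clientClaimedMs  timestamp taken from the event payload
     * @param ingressMs        timestamp stamped when the event entered the trust boundary
     * @param maxBackdateMs    how far before ingress a claim is still believed
     * @param maxFutureSkewMs  how far after ingress a claim is still believed
     */
    static long effectiveEventTime(long clientClaimedMs, long ingressMs,
                                   long maxBackdateMs, long maxFutureSkewMs) {
        long earliest = ingressMs - maxBackdateMs;
        long latest = ingressMs + maxFutureSkewMs;
        // Clamp the claim into [earliest, latest].
        return Math.max(earliest, Math.min(clientClaimedMs, latest));
    }
}
```

Keeping both the original claim and the clamped value on the record lets downstream jobs audit how often clamping happens.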

QUICK CHECK

A mobile app streams user-activity events to a backend analytics pipeline. After a 2-hour consumer outage, the pipeline catches up by processing a large backlog of events. Your goal is to ensure that windowed aggregations (e.g., hourly active users) reflect when events actually occurred, not when the pipeline happened to process them. Which approach best achieves this, and what is its primary cost?


5. Interview & System Design Cheat Sheet

  • Event time and processing time are different clocks; almost all correctness mistakes in streaming come from mixing them up.
  • Event time is what makes a streaming job deterministic across replays and backfills: process the same input twice, get the same window outputs.
  • Watermarks are the mechanism that makes event-time windows finite. Without watermarks, "the 14:00 minute" stays open forever waiting for one more late event.
  • A frozen watermark = a stalled output. The first place to look is idle partitions, then a broken timestamp extractor, then a single straggler partition lagging behind.
  • For mobile / IoT, edge-stamp event time on ingress as a backstop against bad client clocks while still preserving the producer-claimed timestamp.

Common follow-ups:

  • "What's a watermark, in one sentence?" A promise that no event with timestamp ≤ T will be observed after the watermark passes T (subject to the allowed lateness escape hatch).
  • "How do you choose the allowed lateness?" From the observed distribution of processing_time - event_time in production. P99 of that distribution is a reasonable floor; weigh emission delay against the fraction of events you're willing to drop.
  • "Backfill is dumping a week of historical events into the pipeline. How do windows behave?" With event time, exactly as they would have at the time; the watermark advances rapidly because events come in pre-aged. With processing time, all backfilled events fall into "now" and you get garbage.
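The allowed-lateness answer can be made concrete with a small calculation over sampled lags (processing_time - event_time). This is a sketch under assumptions: `LatenessPercentileSketch` is a hypothetical helper, it uses the nearest-rank percentile definition, and it expects the lags to have been collected elsewhere as milliseconds.

```java
import java.util.Arrays;

/** Hypothetical helper for sizing allowed lateness from observed lag samples. */
final class LatenessPercentileSketch {
    /** Nearest-rank percentile (p in (0, 100]) of the lags; the input array is not modified. */
    static long percentile(long[] lagsMs, double p) {
        if (lagsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = lagsMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Fraction of samples whose lag exceeds the candidate allowed lateness. */
    static double fractionLate(long[] lagsMs, long allowedLatenessMs) {
        long late = Arrays.stream(lagsMs).filter(lag -> lag > allowedLatenessMs).count();
        return (double) late / lagsMs.length;
    }
}
```

Calling `percentile(lags, 99)` gives the floor suggested above, and `fractionLate(lags, candidate)` estimates how many events a given setting would treat as late, which is the quantity to weigh against the extra emission delay.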

If asked to design X, anchor on this: If the answer involves "per minute," "per hour," "session," or "trailing window," you need event time and watermarks. Stating that out loud — and the implied allowed-lateness policy — is what separates senior from junior streaming designs.

QUICK CHECK

A streaming pipeline processes clickstream events and computes hourly session counts. During a backfill, you replay one week of historical events through the pipeline. Which behavior would you observe depending on whether windows are keyed on event time vs. processing time?
