Event Time vs Processing Time
1. What Is It?
Event time is the timestamp embedded in the event itself: when the thing actually happened in the real world (click occurred at 14:03:12.044). Processing time is the wall clock on the machine running the stream processor when it sees the event (the processor received this at 14:03:18.770). They are almost never equal, because networks, retries, mobile-app backgrounding, and processing lag all delay events.
The problem this distinction solves: if you compute "clicks per minute" using processing time, the results shift depending on how fast your pipeline is running and whether your consumers are caught up. Using event time, the answer is the same whether you compute now, after a one-hour outage, or during a backfill. Without event time, your time-window analytics are wrong as soon as anything is late.
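The gap between the two clocks can be made concrete with a small plain-Java sketch. The `ClickEvent` type and the calendar date are assumptions for illustration; only the two clock readings come from the example above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical event carrying its own event-time stamp (names are illustrative). */
final class ClickEvent {
    final String userId;
    final Instant eventTime; // stamped by the producer when the click happened

    ClickEvent(String userId, Instant eventTime) {
        this.userId = userId;
        this.eventTime = eventTime;
    }
}

final class SkewDemo {
    /** Lag between when the event happened and when the processor saw it. */
    static Duration skew(ClickEvent event, Instant processingTime) {
        return Duration.between(event.eventTime, processingTime);
    }

    public static void main(String[] args) {
        // The date is arbitrary; the times of day match the example in the text.
        ClickEvent click = new ClickEvent("u1", Instant.parse("2024-01-01T14:03:12.044Z"));
        Instant seenAt = Instant.parse("2024-01-01T14:03:18.770Z"); // processor wall clock
        System.out.println(skew(click, seenAt).toMillis() + " ms"); // prints "6726 ms"
    }
}
```

Any window boundary that falls inside that 6.7-second gap puts the event in a different window depending on which clock is used.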
A streaming pipeline counts user logins per minute to power a dashboard. During a network outage lasting 30 minutes, events backed up and were delivered late. When the pipeline processed the backlog, the login counts for the outage window appeared artificially low. Which approach would produce accurate counts for the outage window?
2. How It Works
- The producer (the mobile app, the web server, the IoT device) stamps the event with its own clock and includes that timestamp in the payload.
- The stream processor extracts the event-time field when it ingests the event.
- The processor uses event time to assign events to windows ("this click belongs to the 14:03 minute window"), independent of when the event arrived.
- Because events can arrive out of order or late, the processor uses watermarks to decide "no more events older than T are likely to arrive — it is safe to finalize windows up to T."
- Processing time, in contrast, is just `System.currentTimeMillis()` at the operator. It is used when latency is the only concern and skew doesn't matter (e.g. heartbeats, monitoring).
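The window-assignment and watermark steps above can be sketched in plain Java. This is a simplified model under stated assumptions, not any engine's implementation: `WatermarkSketch` and its methods are hypothetical names, and the watermark rule (largest timestamp seen minus a fixed tolerance) is one common heuristic.

```java
/** Plain-Java sketch of event-time windowing; all names are illustrative, not an engine API. */
final class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeenMs = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Start of the tumbling window that an event-time stamp belongs to. */
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    /** Track the largest event time seen; out-of-order events never move it back. */
    void observe(long eventTimeMs) {
        maxTimestampSeenMs = Math.max(maxTimestampSeenMs, eventTimeMs);
    }

    /** Assumed heuristic: everything older than (max seen - tolerance) has probably arrived. */
    long currentWatermark() {
        return maxTimestampSeenMs == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeenMs - maxOutOfOrdernessMs;
    }

    /** Window [start, start + size) can be finalized once the watermark covers its last millisecond. */
    boolean canFinalize(long windowStartMs, long windowSizeMs) {
        return currentWatermark() >= windowStartMs + windowSizeMs - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(10_000);  // tolerate 10 s of disorder
        wm.observe(65_000);
        wm.observe(62_000);                                 // out of order, watermark unchanged
        System.out.println(wm.canFinalize(0, 60_000));      // false: watermark is 55_000
        wm.observe(70_000);
        System.out.println(wm.canFinalize(0, 60_000));      // true: watermark is 60_000
    }
}
```

Note that nothing here reads the wall clock: the first window closes only because later events pushed the watermark forward.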
Concrete example. "Sessions per minute" from a mobile app:
- 14:00:00 — user clicks. Phone offline.
- 14:00:00–14:05:00 — events queued locally.
- 14:05:01 — phone reconnects, batch of 12 events flush to backend, arrive at the stream processor 14:05:03.
- Processing-time window: All 12 events fall into the 14:05 minute. The 14:00 minute shows zero. The chart spikes at 14:05.
- Event-time window: Each event lands in its true minute. The 14:00 minute eventually reaches 12 once the late events arrive. The chart correctly attributes activity to when it happened.
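The same scenario can be replayed as a small simulation. This is a sketch in plain Java with hypothetical names; it assumes the 12 clicks are spread across the 14:00 minute and that the whole batch reaches the processor at 14:05:03. Timestamps are milliseconds since midnight.

```java
import java.util.Map;
import java.util.TreeMap;

final class WindowComparison {
    static final long MINUTE_MS = 60_000L;

    /** Counts events per one-minute tumbling window, keyed by window start in ms. */
    static Map<Long, Integer> countPerMinute(long[] timestampsMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            long windowStart = ts - (ts % MINUTE_MS);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    /** Milliseconds since midnight for a time of day. */
    static long atMs(int hour, int minute, int second) {
        return ((hour * 60L + minute) * 60L + second) * 1000L;
    }

    public static void main(String[] args) {
        long[] eventTimes = new long[12];
        long[] processingTimes = new long[12];
        for (int i = 0; i < 12; i++) {
            eventTimes[i] = atMs(14, 0, i * 4);   // assumed: clicks spread over the 14:00 minute
            processingTimes[i] = atMs(14, 5, 3);  // whole batch is seen at 14:05:03
        }
        // Event time puts all 12 in the window starting at 14:00.
        System.out.println("event time:      " + countPerMinute(eventTimes));
        // Processing time puts all 12 in the window starting at 14:05.
        System.out.println("processing time: " + countPerMinute(processingTimes));
    }
}
```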
A mobile app tracks user sessions and sends click events to a stream processor. Due to intermittent connectivity, a batch of 12 events that occurred between 14:00 and 14:01 arrives at the processor at 14:06. A stream processor configured to use event time with watermarks processes these events. What happens to those 12 events?
3. What Mid-Senior SWEs Actually Need to Know
- Embedded timestamps must be trustworthy. If your producer has a wrong clock, your event-time analytics are wrong. For untrusted clients, stamp at the edge (load balancer, gateway) as soon as the event arrives in your trust boundary.
- Watermarks gate window emission. A 1-minute window keyed by event time does not emit at processing wall-clock 14:01:00; it emits when the watermark crosses 14:01:00, and the configured allowed lateness controls how long after that late events can still update the window before its state is purged.
- Idle partitions break watermarks. Watermarks are computed per partition and the global watermark is the min across partitions. A partition that stops receiving events freezes the watermark and stalls windowed outputs indefinitely, unless you configure idle partition detection.
- Allowed lateness is a policy decision. You choose one: "discard events older than N seconds past the watermark," "side-output them to a fix-up stream," or "fire updates retroactively." Each costs something different.
- Processing time is the right choice when latency matters more than correctness: alerting on "no heartbeat for 30s," rate-limiting, debug counters.
- Common misunderstanding: "Why does my window not fire?" — Almost always: no events arrived to advance the watermark, an idle source held it back, or the timestamp extractor is broken (returning 0 or processing-time).
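The idle-partition problem in the list above can be reproduced with a short plain-Java model. `PartitionWatermarks` and its methods are hypothetical names, and the idleness rule (no event within a timeout) is an assumed simplification of what real engines do.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: global watermark = min over partitions, optionally ignoring idle ones. */
final class PartitionWatermarks {
    private final long idleTimeoutMs;
    private final Map<Integer, Long> watermarkByPartition = new HashMap<>();
    private final Map<Integer, Long> lastSeenAtByPartition = new HashMap<>();
    private long lastGlobalMs = Long.MIN_VALUE;

    PartitionWatermarks(long idleTimeoutMs) {
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Record a partition's watermark and the wall-clock time we last heard from it. */
    void onEvent(int partition, long partitionWatermarkMs, long nowMs) {
        watermarkByPartition.merge(partition, partitionWatermarkMs, Long::max);
        lastSeenAtByPartition.put(partition, nowMs);
    }

    /** Min over non-idle partitions; never moves backwards, holds if all are idle. */
    long globalWatermark(long nowMs) {
        long min = Long.MAX_VALUE;
        for (Map.Entry<Integer, Long> e : watermarkByPartition.entrySet()) {
            boolean idle = nowMs - lastSeenAtByPartition.get(e.getKey()) > idleTimeoutMs;
            if (!idle) {
                min = Math.min(min, e.getValue());
            }
        }
        if (min != Long.MAX_VALUE) {
            lastGlobalMs = Math.max(lastGlobalMs, min);
        }
        return lastGlobalMs;
    }

    public static void main(String[] args) {
        PartitionWatermarks noIdleDetection = new PartitionWatermarks(Long.MAX_VALUE);
        PartitionWatermarks withIdleDetection = new PartitionWatermarks(60_000);
        for (PartitionWatermarks w : new PartitionWatermarks[] {noIdleDetection, withIdleDetection}) {
            w.onEvent(0, 1_000, 0);
            w.onEvent(1, 500, 0);        // partition 1 goes silent after this
            w.onEvent(0, 90_000, 90_000);
        }
        System.out.println(noIdleDetection.globalWatermark(90_000));   // 500: stuck on partition 1
        System.out.println(withIdleDetection.globalWatermark(90_000)); // 90000: partition 1 ignored
    }
}
```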
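Quick usage (Flink DataStream API):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // `clicks` is an existing DataStream<Click>.
    DataStream<Click> withTimestamps = clicks.assignTimestampsAndWatermarks(
        WatermarkStrategy
            // Tolerate events up to 10 s out of order.
            .<Click>forBoundedOutOfOrderness(Duration.ofSeconds(10))
            // Take event time from the payload.
            .withTimestampAssigner((click, recordTimestamp) -> click.getEventTimeMillis())
            // After 1 min of silence, stop letting this source hold back the watermark.
            .withIdleness(Duration.ofMinutes(1))
    );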
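The allowed-lateness policy described in the list above comes down to a three-way decision per event. The following plain-Java sketch uses hypothetical names and assumed boundary conventions (a window ending at `windowEndMs` has last timestamp `windowEndMs - 1`); real engines may draw the boundaries slightly differently.

```java
/** Sketch of a late-event policy; names and boundary choices are assumptions. */
final class LatenessPolicy {
    enum Decision { ON_TIME, LATE_UPDATE, TOO_LATE }

    /**
     * Decide what to do with an event for a window, given the current watermark.
     * ON_TIME: window not finalized yet. LATE_UPDATE: finalized, state still kept.
     * TOO_LATE: state purged, so the event is dropped or sent to a side output.
     */
    static Decision classify(long windowEndMs, long watermarkMs, long allowedLatenessMs) {
        long lastTimestampInWindow = windowEndMs - 1;
        if (watermarkMs < lastTimestampInWindow) {
            return Decision.ON_TIME;
        }
        if (watermarkMs < lastTimestampInWindow + allowedLatenessMs) {
            return Decision.LATE_UPDATE;
        }
        return Decision.TOO_LATE;
    }

    public static void main(String[] args) {
        long windowEnd = 60_000;        // window [0, 60 s)
        long allowedLateness = 30_000;  // keep state 30 s past the watermark crossing
        System.out.println(classify(windowEnd, 50_000, allowedLateness)); // ON_TIME
        System.out.println(classify(windowEnd, 70_000, allowedLateness)); // LATE_UPDATE
        System.out.println(classify(windowEnd, 95_000, allowedLateness)); // TOO_LATE
    }
}
```

A larger `allowedLatenessMs` keeps more window state alive and produces more retroactive updates; a smaller one drops more events.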
4. Tradeoffs & Decisions
| If you need... | Use... | Choose the other when... |
|---|---|---|
| Correct analytics regardless of pipeline lag, replayability | Event time | You don't trust producer clocks and edge-stamping is impossible |
| Lowest possible latency, no concept of "lateness" | Processing time | Window correctness depends on real-world time of the event |
| Reproducible results across reruns / backfills | Event time | The pipeline is purely operational with no historical accuracy requirement |
| Stable behavior during catch-up after a consumer outage | Event time | You actually want results to reflect "when we processed it" (e.g. SLO violation alerts) |
Key tradeoff: correctness vs latency. Event time + watermarks delays window emission until you're reasonably sure events have arrived (or the allowed lateness expires). Processing time emits immediately, but the values shift under any pipeline lag.
Secondary tradeoff: trust. Event time is only as honest as the source clock. The further from a controlled environment (server log) toward an uncontrolled one (mobile, IoT), the more you should consider edge-stamping or capping how far back an event can claim to be.
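Capping a client's claimed timestamp against an edge stamp could look like the sketch below. All names and limits are hypothetical; the cap on future timestamps is an added assumption that the text does not discuss.

```java
/** Sketch: bound a client-claimed event time using the trusted edge receive time. */
final class EdgeStamp {
    /**
     * Returns the claimed time if it lies within [edge - maxAge, edge + maxFutureSkew],
     * otherwise the nearest bound of that range.
     */
    static long effectiveEventTime(long clientClaimedMs, long edgeReceivedMs,
                                   long maxAgeMs, long maxFutureSkewMs) {
        long earliest = edgeReceivedMs - maxAgeMs;
        long latest = edgeReceivedMs + maxFutureSkewMs;
        return Math.max(earliest, Math.min(clientClaimedMs, latest));
    }

    public static void main(String[] args) {
        long edge = 1_000_000;   // when the gateway received the event
        long maxAge = 600_000;   // assumed: accept claims up to 10 min old
        long maxFuture = 5_000;  // assumed: accept claims up to 5 s ahead
        System.out.println(effectiveEventTime(990_000, edge, maxAge, maxFuture));   // 990000 (trusted)
        System.out.println(effectiveEventTime(0, edge, maxAge, maxFuture));         // 400000 (capped)
        System.out.println(effectiveEventTime(2_000_000, edge, maxAge, maxFuture)); // 1005000 (capped)
    }
}
```

Storing both the original claim and the effective value keeps the raw data available if the caps later turn out to be too strict.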
A mobile app streams user-activity events to a backend analytics pipeline. After a 2-hour consumer outage, the pipeline catches up by processing a large backlog of events. Your goal is to ensure that windowed aggregations (e.g., hourly active users) reflect when events actually occurred, not when the pipeline happened to process them. Which approach best achieves this, and what is its primary cost?
5. Interview & System Design Cheat Sheet
- Event time and processing time are different clocks; almost all correctness mistakes in streaming come from mixing them up.
- Event time is what makes a streaming job deterministic across replays and backfills: process the same input twice, get the same window outputs.
- Watermarks are the mechanism that makes event-time windows finite — without watermarks, "the 14:00 minute" stays open forever waiting for one more late event.
- A frozen watermark = a stalled output. The first place to look is idle partitions, then a broken timestamp extractor, then a single straggler lagging behind.
- For mobile / IoT, edge-stamp event time on ingress as a backstop against bad client clocks while still preserving the producer-claimed timestamp.
Common follow-ups:
- "What's a watermark, in one sentence?" A promise that no event with timestamp ≤ T will be observed after the watermark passes T (subject to the allowed lateness escape hatch).
- "How do you choose the allowed lateness?" From the observed distribution of `processing_time - event_time` in production. P99 of that distribution is a reasonable floor; weigh emission delay against the fraction of events you're willing to drop.
- "Backfill is dumping a week of historical events into the pipeline. How do windows behave?" With event time, exactly as they would have at the time: the watermark advances rapidly because events come in pre-aged. With processing time, all backfilled events fall into "now" and you get garbage.
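Deriving a lateness floor from observed lag can be sketched as a percentile over `processing_time - event_time` samples. The class name and the nearest-rank percentile method are assumptions for illustration; the sample data in `main` is synthetic.

```java
import java.util.Arrays;

/** Sketch: pick an allowed-lateness floor from observed arrival lag. */
final class LatenessFloor {
    /** Nearest-rank percentile (0 < p <= 100) of lag samples in milliseconds. */
    static long percentile(long[] lagsMs, double p) {
        if (lagsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = lagsMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] lags = new long[100];
        for (int i = 0; i < lags.length; i++) {
            lags[i] = i + 1; // synthetic lags of 1..100 ms
        }
        System.out.println(percentile(lags, 99)); // 99
    }
}
```

Setting allowed lateness at this value means roughly 1% of events in the sample would have arrived too late to be counted.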
If asked to design X, anchor on this: If the answer involves "per minute," "per hour," "session," or "trailing window," you need event time and watermarks. Stating that out loud — and the implied allowed-lateness policy — is what separates senior from junior streaming designs.
A streaming pipeline processes clickstream events and computes hourly session counts. During a backfill, you replay one week of historical events through the pipeline. Which behavior would you observe depending on whether windows are keyed on event time vs. processing time?