When a switch goes down at 2 AM, the difference between knowing in 30 seconds and knowing in 5 minutes is the difference between a brief blip and a cascading outage that takes half your network with it. Most monitoring dashboards make you wait — you refresh the page, stare at stale data, and hope the next poll cycle catches the problem. That lag is where incidents grow from minor to major.
With Down Device v3.0.0, we're shipping real-time WebSocket updates for device monitoring. Your dashboard now reflects the actual state of your infrastructure within 30 seconds of a change — no manual refresh, no polling lag, no guesswork. This post covers why we built it, how the architecture works under the hood, and the smart alerting features that come with it.
Why Real-Time Matters
Traditional monitoring dashboards rely on the browser polling the API at intervals. You load the page, see a snapshot of your infrastructure, and that snapshot is already aging. If your polling interval is 60 seconds and a device goes offline one second after your last poll, you won't see it for another 59 seconds — at best. In practice, most dashboards poll less frequently to reduce server load, and many require a full page refresh to fetch updated data.
For IT admins managing dozens or hundreds of devices, this creates a real operational problem:
- Delayed incident detection. You're watching the dashboard, but the dashboard is lying to you. A device went down 45 seconds ago and the status still shows green. You don't start investigating until the next refresh, and by then the outage has had time to cascade.
- Manual refresh fatigue. During an active incident, operators compulsively refresh the page to get current data. That's wasted cognitive effort that should be spent diagnosing and resolving the problem.
- Missed state transitions. A device might go down and recover between two poll intervals. You never see the outage, never investigate the root cause, and the intermittent failure continues until it becomes a hard failure.
- Stale NOC displays. If you have a monitoring dashboard on a wall screen in your NOC or server room, a polling-based display can show "all clear" while devices are actively offline. That defeats the entire purpose of a status board.
Real-time delivery via WebSockets addresses these problems. Status changes reach the browser as soon as a check detects them, not on the browser's next poll cycle or the next page refresh. With 30-second checks, that means the dashboard reflects a change within about 30 seconds of the actual event.
What We Built
Down Device v3.0.0 introduces a WebSocket-based real-time update system for device monitoring. Here's what it delivers:
- 30-second check intervals. Monitoring workers check each device every 30 seconds via ICMP or SNMP. Every check result is published the moment it completes.
- Instant dashboard updates. When a device changes state — online to offline, offline to online, or a response time spike — your browser receives the update in real time. No refresh needed. The dashboard simply updates.
- Persistent WebSocket connections. Your browser maintains an open WebSocket connection to the Down Device API. Updates are pushed to you as they happen, rather than you requesting them on a timer.
- Automatic reconnection. If the WebSocket connection drops — due to a network interruption, a laptop waking from sleep, or a brief API restart — the client automatically reconnects and resynchronizes without any user action.
- Redis pub/sub backbone. The entire system is built on Redis pub/sub, which provides the low-latency message routing needed to move check results from distributed workers to connected browsers in milliseconds.
What This Means in Practice
Open your Down Device dashboard and leave it open. When a device goes offline anywhere in your infrastructure, you'll see the status change within 30 seconds — automatically. No clicking, no refreshing, no waiting. This works whether you're watching one device or a thousand.
How It Works: The Real-Time Pipeline
The real-time system is a pipeline with four stages. Each stage is designed to be fast, reliable, and independently scalable. Here's how a device status change goes from your network to your browser.
Stage 1: Workers Execute Checks
ARQ workers — lightweight async task processors — run monitoring checks against your devices on a 30-second cycle. Each worker picks up check tasks from a Redis queue, executes the ICMP ping or SNMP query, and records the result: the device's status (online, offline, degraded), response time, packet loss, and any error information.
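To make the shape of a check result concrete, here is a minimal sketch of how one could be modeled and serialized. The class and field names are hypothetical; the post only tells us which kinds of data a result carries (status, response time, packet loss, error information), not how Down Device actually names or stores them.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class DeviceStatus(str, Enum):
    # The three states named in the post.
    ONLINE = "online"
    OFFLINE = "offline"
    DEGRADED = "degraded"


@dataclass(frozen=True)
class CheckResult:
    # Field names are illustrative assumptions, not the real schema.
    device_id: str
    status: DeviceStatus
    response_time_ms: Optional[float]  # None when the device did not answer
    packet_loss_pct: float
    error: Optional[str] = None

    def to_json(self) -> str:
        # DeviceStatus subclasses str, so json.dumps emits the plain value.
        return json.dumps(asdict(self))


result = CheckResult("sw-1", DeviceStatus.OFFLINE, None, 100.0, "timeout")
payload = result.to_json()
```

Keeping the result immutable and serializing it in one place means every later stage of the pipeline sees the same payload.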
Workers are distributed across monitoring regions and connected via WireGuard VPN, so checks run close to your infrastructure regardless of where the Down Device API is hosted. A worker in your region pings your device, and the result is available in under a second.
Stage 2: Workers Publish to Redis
As soon as a check completes, the worker publishes the result to a Redis pub/sub channel. Redis pub/sub is a message broadcasting system — when a message is published to a channel, every subscriber on that channel receives it immediately. There's no queue to drain, no batch window to wait for. The message goes out the instant it arrives.
Each account has its own pub/sub channel, which means the API server only receives updates relevant to the accounts its connected clients belong to. This keeps message volume manageable even at scale and ensures strict tenant isolation — you never receive data about another customer's devices.
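The per-account channel idea can be sketched without a Redis server. The example below uses a small in-memory stand-in that mimics pub/sub fan-out: messages go to whoever is subscribed right now, and nothing is queued. The channel naming scheme and the `InMemoryPubSub` class are assumptions for illustration, not the Down Device implementation.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List


def channel_for_account(account_id: str) -> str:
    # Hypothetical naming scheme: one channel per account.
    return f"device-updates:{account_id}"


class InMemoryPubSub:
    """Stand-in for Redis pub/sub: fan-out to current subscribers, no queueing."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        # Like Redis PUBLISH, report how many subscribers received the message.
        return len(handlers)


broker = InMemoryPubSub()
received_a: List[str] = []
received_b: List[str] = []
broker.subscribe(channel_for_account("acct-a"), received_a.append)
broker.subscribe(channel_for_account("acct-b"), received_b.append)

update = json.dumps({"device_id": "sw-1", "status": "offline"})
delivered = broker.publish(channel_for_account("acct-a"), update)
```

Only the subscriber on account A's channel sees the update, which is the isolation property the per-account channels provide.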
Stage 3: API Server Broadcasts via WebSocket
The Down Device API server subscribes to the Redis pub/sub channel of every account that has a connected client. When a check result arrives on a channel, the API server immediately forwards it to every WebSocket client authenticated to that account.
This is where the multi-tenant architecture matters. The API server maintains a mapping of WebSocket connections to account IDs. When a message arrives on an account's channel, it's routed only to browsers that belong to that account. The server never broadcasts data to the wrong tenant, and the filtering happens at the connection level — not in the browser.
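A connection-to-account mapping of this kind might look like the sketch below. The `Connection` protocol, the `ConnectionRegistry` class, and the `FakeConnection` used in the demo are all hypothetical; a real server would register its framework's WebSocket objects instead.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List, Protocol, Set


class Connection(Protocol):
    """Anything with an async send_text method (assumed interface)."""

    async def send_text(self, data: str) -> None: ...


class ConnectionRegistry:
    """Maps account IDs to the connections authenticated to them."""

    def __init__(self) -> None:
        self._by_account: DefaultDict[str, Set[Connection]] = defaultdict(set)

    def add(self, account_id: str, conn: Connection) -> None:
        self._by_account[account_id].add(conn)

    def remove(self, account_id: str, conn: Connection) -> None:
        conns = self._by_account.get(account_id)
        if conns is None:
            return
        conns.discard(conn)
        if not conns:
            del self._by_account[account_id]

    async def broadcast(self, account_id: str, payload: str) -> int:
        # Only connections registered under this account are ever touched.
        conns = list(self._by_account.get(account_id, ()))
        await asyncio.gather(*(c.send_text(payload) for c in conns))
        return len(conns)


class FakeConnection:
    """Test double that records what it was sent."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send_text(self, data: str) -> None:
        self.sent.append(data)


async def demo() -> tuple:
    registry = ConnectionRegistry()
    alice, bob = FakeConnection(), FakeConnection()
    registry.add("acct-a", alice)
    registry.add("acct-b", bob)
    delivered = await registry.broadcast("acct-a", '{"device_id": "sw-1", "status": "offline"}')
    return delivered, alice.sent, bob.sent
```

Because the lookup is keyed by account ID on the server, a message for one account has no path to another account's connections.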
Stage 4: Browser Receives and Renders
The browser's WebSocket client receives the JSON payload containing the updated device status, parses it, and updates the dashboard in place. There's no full-page re-render, no API call to fetch the latest state, no flash of loading spinners. The specific device row in your dashboard updates its status indicator, response time, and last-checked timestamp — and that's it.
If the WebSocket connection is interrupted for any reason, the client automatically reconnects using an exponential backoff strategy. On reconnection, it fetches the latest state via a standard API call to ensure nothing was missed during the gap, then resumes listening for real-time updates.
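The reconnect-then-resync behavior can be sketched as follows. The real client runs in the browser; this version is written in Python to show the logic only. The base delay, growth factor, cap, and the `connect` and `resync` callables are all assumptions, since the post does not give the actual values or function names.

```python
import random
import time
from typing import Callable, Iterator, Optional, Tuple


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield exponentially growing delays, capped, with optional random jitter."""
    rng = rng or random.Random()
    delay = base
    while True:
        yield delay + rng.uniform(0.0, jitter * delay)
        delay = min(cap, delay * factor)


def reconnect(
    connect: Callable[[], object],
    resync: Callable[[], None],
    delays: Iterator[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[object, int]:
    """Retry connect() with backoff, then fetch current state before resuming."""
    for attempt, delay in enumerate(delays, start=1):
        try:
            conn = connect()
        except ConnectionError:
            sleep(delay)
            continue
        resync()  # catch up on anything missed while disconnected
        return conn, attempt
    raise ConnectionError("ran out of retry delays")
```

With the defaults above, the waits between attempts are 1, 2, 4, 8, 16, and then 30 seconds. Adding jitter spreads out reconnects so that many clients do not all retry at the same instant after an API restart.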
The Full Path in Numbers
Worker checks device (ICMP round-trip: ~5–50ms) → publishes to Redis (~1ms) → API server receives and routes (~1ms) → WebSocket delivers to browser (~10–50ms over the internet). Total pipeline latency from check completion to dashboard update: typically under 100 milliseconds. Because checks run every 30 seconds, a state change is picked up at the next check, so in the worst case it reaches your dashboard roughly 30 seconds after it happens on your network.
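The figures above can be added up directly. The sketch below sums the quoted per-stage ranges and estimates the worst-case delay between a device changing state and the dashboard showing it. It deliberately ignores how long the check itself takes (for example, a ping timeout on an offline device), because the post does not state that value.

```python
CHECK_INTERVAL_S = 30

# (best, worst) latency per stage in milliseconds, taken from the figures above.
PIPELINE_MS = {
    "redis_publish": (1, 1),
    "api_routing": (1, 1),
    "websocket_delivery": (10, 50),
}


def pipeline_latency_ms() -> tuple:
    """Sum the per-stage ranges from check completion to browser delivery."""
    best = sum(lo for lo, _ in PIPELINE_MS.values())
    worst = sum(hi for _, hi in PIPELINE_MS.values())
    return best, worst


def worst_case_visibility_s() -> float:
    """A device fails just after a check, so it waits a full interval first."""
    _, worst_ms = pipeline_latency_ms()
    return CHECK_INTERVAL_S + worst_ms / 1000


best_ms, worst_ms = pipeline_latency_ms()
```

The stages sum to between 12 and 52 ms, comfortably under the 100 ms figure, and the worst-case visibility delay comes out just over 30 seconds. The check interval dominates; the pipeline itself contributes almost nothing.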
Email Alerts with Smart Cooldown
Real-time dashboard updates are powerful when you're watching the screen, but you can't watch a dashboard 24/7. That's where email alerts come in — and getting email alerts right is harder than it sounds.
Starting with v3.1.0, Down Device sends email notifications when a device goes offline and when it comes back online. Every alert includes the device name, IP address, the time the state change was detected, and the monitoring region that observed it. You get the information you need to start investigating without having to log in and look it up.
But anyone who has managed a monitoring system knows the real problem with email alerts: alert fatigue. A device with an intermittent connection might flap between online and offline every few minutes, generating a flood of emails that bury the alerts that actually matter. Your inbox fills up with "Device X is offline" / "Device X is online" pairs, and you start ignoring all of them — including the one that says your core router just went down.
Down Device addresses this with cooldown protection. After sending an offline alert for a device, the system enforces a cooldown period before sending another alert for the same device. If the device flaps back online and then offline again within the cooldown window, you don't get a second wave of emails. You get one offline alert, one recovery alert when the device stabilizes, and silence in between.
This approach gives you two critical properties:
- You always know about new incidents. The first offline event for any device always triggers an alert. You're never in the dark about a device going down.
- You're never buried by flapping. Repeated state changes for the same device are suppressed during the cooldown period. Your inbox stays manageable, and every alert you do receive is worth reading.
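The suppression of repeat offline alerts can be sketched as a per-device timestamp check. The class name, the 300-second default, and the use of plain numeric timestamps are assumptions; the post does not state the actual cooldown length, and this sketch does not model how recovery alerts wait for a device to stabilize.

```python
from typing import Dict


class OfflineAlertCooldown:
    """Suppress repeat offline alerts for a device within a cooldown window."""

    def __init__(self, cooldown_s: float = 300.0) -> None:  # hypothetical default
        self.cooldown_s = cooldown_s
        self._last_sent: Dict[str, float] = {}

    def should_send(self, device_id: str, now: float) -> bool:
        last = self._last_sent.get(device_id)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down: the device is flapping
        self._last_sent[device_id] = now
        return True


cooldown = OfflineAlertCooldown(cooldown_s=300.0)
first = cooldown.should_send("router-1", now=0.0)         # first incident
flap = cooldown.should_send("router-1", now=120.0)        # same device, inside window
other = cooldown.should_send("switch-7", now=120.0)       # different device
later = cooldown.should_send("router-1", now=300.0)       # window has elapsed
```

Here the first offline event for each device gets through, the flap two minutes later is suppressed, and the same device can alert again once the window has passed. Tracking the timestamp per device is what keeps one noisy device from silencing alerts for the rest.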
Notification Preferences
Not everyone on your team needs the same alerts. The network engineer responsible for core infrastructure wants to know about every switch and router state change. The developer who added a test server to monitoring doesn't want emails about it at 3 AM. The billing admin doesn't need device alerts at all.
With v3.2.0, Down Device introduces per-user notification preferences. Each team member can independently control their alert settings:
- Offline alerts. Toggle email notifications for device-offline events on or off. Enabled by default for all users.
- Online (recovery) alerts. Toggle email notifications for device-recovery events on or off. Some users want to know the moment a device comes back; others only care about the initial outage and will check recovery on the dashboard.
These preferences are per-user, not per-account. Each person on your team configures their own notification settings without affecting anyone else. An admin can have both offline and online alerts enabled, while a viewer on the same account can have alerts disabled entirely.
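Per-user preferences amount to filtering the recipient list at send time. Here is a minimal sketch, assuming hypothetical `NotificationPrefs` and `User` types. The post says offline alerts are enabled by default; the default for recovery alerts is not stated, so the value used here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotificationPrefs:
    offline_alerts: bool = True  # the post says offline alerts default to on
    online_alerts: bool = True   # assumption: default for recovery alerts is not stated


@dataclass
class User:
    email: str
    prefs: NotificationPrefs = field(default_factory=NotificationPrefs)


def recipients(event: str, users: List[User]) -> List[str]:
    """Return the emails of users who opted in to this kind of event."""
    if event == "offline":
        return [u.email for u in users if u.prefs.offline_alerts]
    if event == "online":
        return [u.email for u in users if u.prefs.online_alerts]
    raise ValueError(f"unknown event: {event}")


team = [
    User("admin@example.com"),
    User("oncall@example.com", NotificationPrefs(online_alerts=False)),
    User("viewer@example.com", NotificationPrefs(offline_alerts=False, online_alerts=False)),
]
offline_to = recipients("offline", team)
online_to = recipients("online", team)
```

In this example the admin receives both kinds of alert, the on-call engineer receives only offline alerts, and the viewer receives none, all on the same account.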
This matters for teams of any size. Even a two-person team benefits from being able to have one person receive all alerts while the other only checks the dashboard during business hours. For larger teams with on-call rotations, individual notification preferences mean the on-call engineer gets alerts while everyone else sleeps undisturbed.
Coming Next
Notification preferences in v3.2.0 cover email alerts for device online/offline events. We're working on expanding this to include additional channels (Slack, webhook, SMS) and per-device alert rules in future releases. The foundation is built — the granularity will keep growing.
What This All Adds Up To
Real-time WebSocket updates, smart email alerts with cooldown protection, and granular notification preferences work together to solve a single problem: making sure you know about infrastructure issues the moment they happen, without drowning in noise.
Here's what changes in your day-to-day workflow:
- Your dashboard is always current. Open it once, leave it open. Every status change arrives automatically. No more refresh-and-wait.
- Your inbox is actionable. When you get an alert email, it means something. Cooldown protection ensures you're not sifting through duplicate notifications to find the one that matters.
- Your team isn't over-alerted. Each person controls their own notification preferences. The right people get the right alerts, and nobody else gets woken up.
These features ship as part of Down Device v3.0.0 through v3.2.0. If you're already a Down Device user, WebSocket updates are live now in v3.0.0: just open your dashboard. Email alerts follow in v3.1.0, and notification preferences in v3.2.0.
See Your Infrastructure in Real Time
Down Device delivers device status updates to your browser within 30 seconds via WebSockets. Pair that with smart email alerts and per-user notification preferences, and you have monitoring that keeps you informed without overwhelming you. Free plan available — no credit card required.
Wrapping Up
Monitoring that makes you wait isn't really monitoring; it's periodic checking with a pretty interface. Real-time delivery changes the dynamic. Your dashboard becomes a live view of your infrastructure, not a snapshot that was accurate 60 seconds ago. Email alerts go out as soon as a state change is detected, filtered through cooldown logic so you can trust every notification you receive.
The architecture behind it (workers publishing to Redis pub/sub, the API server routing messages to authenticated WebSocket connections, the browser updating in place) is designed to be fast and reliable at scale. Whether you're monitoring 10 devices or 10,000, the pipeline typically delivers updates in under 100 milliseconds from check completion to dashboard render.
If you've been refreshing your monitoring dashboard to check on devices, that stops today. Start your free trial and see what real-time monitoring actually looks like, or reach out to our team if you have questions about how it fits your infrastructure.