Your room is saturated with radio signals. Every second, your router emits signals that pass through walls, bounce off furniture, and scatter off your body. These signals carry rich information about the space and the objects within it — we just usually ignore the signal itself and only care about the data packets it delivers.
For over a decade, a group of researchers has been asking a question: if we analyze the physical characteristics of these signals carefully, how much can we infer about the people in a space? The answer is more than most people expect, but less than science fiction suggests. WiFi can indeed “see through” walls in a limited sense, but what it sees is not a video feed — it is a set of statistical patterns: whether someone is present, what they are doing, how fast they are breathing. The real state of this field is far more interesting, and far more complicated, than the popular narrative.
What WiFi/RF sensing can do falls along a clear capability gradient.
The easiest is presence detection: is someone in the room? This is nearly trivial at the signal level, because a human body significantly alters the statistical properties of multipath signals. Slightly harder is activity recognition: walking, sitting down, falling, cooking. Further up the ladder is respiration and heart rate monitoring, which exploits the subtle signal fluctuations caused by chest micro-movements, achieving breathing rate accuracy within ±1 BPM. Harder still is skeletal pose estimation — recovering the 3D coordinates of a dozen or more body joints from radio signals. The hardest is static scene reconstruction: using radio signals to generate image-like spatial maps.
This gradient reflects a fundamental physical principle: motion produces change, and change is easy to detect. When a person walks, the Doppler shift and multipath variations in the signal are pronounced. When a person stands still, they are nearly indistinguishable from a piece of furniture at the signal level. Breathing detection works precisely because the chest still exhibits weak periodic motion. Fully static scene reconstruction requires isolating extremely faint target reflections from strong noise — orders of magnitude harder.
This also explains why the vast majority of WiFi sensing results focus on human activity recognition and pose estimation, while through-wall imaging remains largely at the proof-of-concept stage.
The research timeline begins in 2013, when Dina Katabi’s team at MIT published Wi-Vi. Using two transmit antennas and one receive antenna, it employed an elegant interference cancellation scheme to suppress the strong reflections from walls and furniture, isolating only the signal changes caused by moving people. This was the first convincing academic demonstration that radio signals in the WiFi frequency band could detect human presence and coarse position through walls.
After Wi-Vi, the same team iterated rapidly. WiTrack (2014) achieved through-wall 3D tracking by measuring the time of flight of radio reflections, with a median error of 10–13 centimeters. RF-Capture (2015) went further, recovering a 3D skeletal outline of the human body from reflected signals using approximately 20 antennas and under 2 GHz of bandwidth. These early systems ran on USRP software-defined radio platforms costing $2,000 or more per unit, paired with custom antenna arrays. They proved physical feasibility but were far from consumer products.
In parallel, a separate line of work used cheaper commercial NICs. Intel 5300 and Atheros chipsets could export CSI (Channel State Information). Based on temporal patterns in CSI variation, researchers achieved recognition of activities such as walking, sitting, and falling. The hardware barrier was much lower, but CSI extraction depended on specific NIC models and patched drivers — not something any WiFi device could do.
2018 marked a turning point. RF-Pose, also from Katabi’s team, was published at CVPR. It used a CNN to estimate 2D skeletal pose from FMCW radar signals in the WiFi frequency band. Its key innovation was the training methodology: cameras and OpenPose provided labels for the RF signals. Once training was complete, the cameras could be removed and the system operated on RF signals alone, including through walls. RF-Pose3D extended this to 3D skeletons and multi-person scenarios that same year. This cross-modal supervision paradigm had far-reaching influence, solving the fundamental difficulty that humans cannot directly annotate radio signals.
In recent years the trend has shifted toward practicality and standardization. GoPose (2021) achieved 14-joint 3D skeletal estimation with commercial WiFi devices, with an average error of about 4.5 centimeters. SenseFi (2023) released the first open-source WiFi CSI sensing benchmark library. WiMANS (2024) published the first multi-user simultaneous activity sensing dataset. CSI-Bench (2025) collected over 460 hours of data across 26 real-world environments, covering fall detection, breathing monitoring, localization, and more. The research community is moving from individual capability demonstrations to systematic evaluation and generalization testing.
After a decade of accumulated research, the results look impressive. But a conspicuous fact remains: WiFi through-wall sensing has yet to produce a large-scale consumer application. The reasons come in layers.
The first layer is hardware fragmentation and naming confusion. CSI extraction is not an open interface in standard WiFi protocols. Early research relied heavily on the Intel 5300 NIC (discontinued a decade ago), Atheros chipsets requiring custom firmware, or expensive USRP software-defined radios. The more recent ESP32-S3 costs only a few dollars, but it is a 2.4 GHz single-antenna IoT chip — a different device category from a home router. More importantly, landmark systems like RF-Pose actually used FMCW radar rather than standard WiFi signals, just operating in the WiFi frequency band. All of these get categorized as “WiFi sensing” in the literature, but they differ fundamentally in engineering deployability. The vast majority of papers claiming “WiFi sensing” use hardware that bears little resemblance to the router currently working in your home.
The second layer is environmental generalization. CSI is extremely sensitive to room layout. A different room, a moved piece of furniture, an opened door — any of these change the multipath pattern, potentially invalidating a trained model. Most experiments take place in controlled labs with fixed rooms, a handful of subjects, and predefined action lists. Performance drops substantially in real home environments. Cross-domain generalization is widely acknowledged as the greatest technical challenge.
The third layer is multi-person scenarios. When multiple people are present simultaneously, their reflected signals superimpose, making individual separation very difficult. The emergence of the WiMANS dataset shows the community recognizes this problem, but solutions remain at an early exploratory stage.
The fourth layer is privacy tension. WiFi sensing is often positioned as a privacy-friendly alternative to cameras because it captures no images. But through-wall sensing capability itself raises a novel class of privacy concerns: if a neighbor’s router can sense your activity inside your home, does that protect privacy or violate it? This tension has no easy answer.
The preceding sections mapped out the field through its capability gradient and research evolution. If you want to understand the physical foundations of these capabilities — how WiFi signals actually accomplish these feats, and why presence detection is easy while static scene reconstruction is nearly impossible — this section unpacks the underlying signal mechanics.
When a router transmits a signal, it does not travel in a single straight line to your phone. The signal reflects and scatters off walls, floors, ceilings, furniture, and human bodies, arriving at the receiver via multiple paths. Each path has a different length, so each copy arrives at a different time. Each reflection alters the signal’s amplitude and phase. What the receiver sees is the superposition of all these copies.
In communications, this is a nuisance. Signals from different paths interfere with each other — constructively at some frequencies, destructively at others — causing frequency-selective fading. WiFi protocols invest substantial engineering effort to combat this.
From a sensing perspective, however, this multipath structure is the useful signal. Each reflection path encodes information about the position, size, and material of a reflecting object. If something in the environment moves — a person walking from the living room to the kitchen — the paths reflecting off that person change in delay, amplitude, and phase, and the overall channel response shifts accordingly. The question becomes: can we measure this channel response with enough resolution to extract useful spatial information?
This brings us to CSI, the core concept. CSI is a per-frequency sample of the channel transfer function. For each transmit-receive antenna pair, at each OFDM subcarrier frequency k, the channel is described by a complex number H(k) with both amplitude and phase components. Amplitude tells you how much the signal is attenuated at that frequency; phase tells you the effective delay it has experienced. Lining up H(k) across all subcarriers gives a discrete sampling of the channel’s frequency response.
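The superposition picture can be sketched in a few lines of code. The following toy model (with hypothetical path delays and gains, not the output of any real chipset) builds H(k) across 30 subcarriers of a 20 MHz channel as a sum of discrete multipath components:

```python
import numpy as np

def csi_from_paths(delays_ns, gains, n_subcarriers=30, bandwidth_hz=20e6):
    """Toy CSI model (illustrative only): H(k) is the superposition of
    discrete multipath components, each contributing
    gain * exp(-j * 2*pi * f_k * tau) at subcarrier frequency offset f_k."""
    f = np.linspace(0, bandwidth_hz, n_subcarriers)   # subcarrier offsets
    taus = np.asarray(delays_ns, dtype=float) * 1e-9  # path delays in seconds
    g = np.asarray(gains, dtype=float)
    # Sum the per-path phasors at every subcarrier frequency.
    return (g[None, :] * np.exp(-2j * np.pi * f[:, None] * taus[None, :])).sum(axis=1)

# A direct path plus two weaker, longer reflections (made-up values).
H = csi_from_paths(delays_ns=[20, 55, 80], gains=[1.0, 0.4, 0.25])
amplitude = np.abs(H)    # per-subcarrier attenuation
phase = np.angle(H)      # per-subcarrier effective delay
```

Plotting `amplitude` against subcarrier index shows pronounced ripples: the frequency-selective structure that makes CSI informative.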
The number of sample points depends on the hardware. The Intel 5300 NIC, widely used in early research, reports 30 subcarriers per antenna pair across a 20 MHz channel — 270 complex values total in a 3×3 MIMO configuration. The later Atheros chipset provides 114 subcarriers across 40 MHz, roughly four times the frequency-domain resolution.
It is worth explaining how CSI differs from ordinary signal strength (RSSI). RSSI is a single scalar that most WiFi devices can report — it compresses the entire channel’s received power into one number. When a person moves, some subcarriers may strengthen due to constructive interference while others weaken due to destructive interference, but RSSI averages these out, discarding a large amount of information. CSI preserves the per-subcarrier frequency-selective structure, giving downstream sensing algorithms a far richer input. This is why CSI-based sensing dramatically outperforms RSSI-based methods.
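The averaging effect is easy to demonstrate. In this toy two-path channel (hypothetical delays and gains), a small movement pushes some subcarriers up and others down; the opposing changes partially cancel in the RSSI-style power sum, while CSI keeps each one:

```python
import numpy as np

def channel(extra_delay_ns, n_sub=30, bw=20e6):
    """Two-path toy channel: a static direct path plus one reflection
    whose delay shifts as a person moves (made-up values)."""
    f = np.linspace(0, bw, n_sub)
    return 1.0 + 0.5 * np.exp(-2j * np.pi * f * extra_delay_ns * 1e-9)

H_before = channel(40.0)  # person at one position
H_after = channel(43.0)   # person a step later

# RSSI-like view: collapse the whole channel into one power number.
rssi_before = np.sum(np.abs(H_before) ** 2)
rssi_after = np.sum(np.abs(H_after) ** 2)

# CSI view: per-subcarrier change, with both signs present at once.
delta = np.abs(H_after) - np.abs(H_before)
print(f"relative RSSI change: {abs(rssi_after - rssi_before) / rssi_before:.1%}")
print(f"subcarriers up: {(delta > 0).sum()}, down: {(delta < 0).sum()}")
```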
WiFi’s OFDM modulation is naturally suited for channel sensing. OFDM divides the available bandwidth B into N narrowband subcarriers, each experiencing approximately flat fading, so H(k) at each subcarrier is a clean channel sample. Together, they provide a discrete Fourier sampling of the channel frequency response.
Here lies a critical physical constraint: range resolution Δr ≈ c/(2B), where c is the speed of light and B is total bandwidth. At 20 MHz, WiFi’s range resolution is about 7.5 meters. At 80 MHz, about 1.9 meters. At 160 MHz (WiFi 6), about 0.94 meters. For comparison, a 4 GHz FMCW millimeter-wave radar achieves about 3.75 centimeters. This gap is the fundamental bottleneck in WiFi’s spatial resolving power. More subcarriers can improve channel estimation accuracy, but total bandwidth sets the ceiling on resolution.
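The arithmetic behind these figures is a one-liner:

```python
C = 3e8  # speed of light, m/s

def range_resolution_m(bandwidth_hz):
    """Delta_r = c / (2B): total bandwidth, not subcarrier count, sets the floor."""
    return C / (2 * bandwidth_hz)

for label, bw in [("WiFi, 20 MHz", 20e6), ("WiFi, 80 MHz", 80e6),
                  ("WiFi 6, 160 MHz", 160e6), ("FMCW radar, 4 GHz", 4e9)]:
    print(f"{label:>18}: {range_resolution_m(bw):.4f} m")
```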
MIMO provides resolution along a different axis: angle. Multiple transmit and receive antennas form a virtual antenna array, with each pair observing the reflection scene from a slightly different spatial vantage point. Through beamforming or spatial spectrum estimation algorithms (MUSIC, ESPRIT), the system can estimate the angle of arrival of reflected paths. More antennas and a wider array aperture yield finer angular resolution. A 3×3 MIMO system provides 9 independent spatial channels — far fewer than the dozens or hundreds of elements in a phased-array radar, but enough in certain scenarios to distinguish reflection sources at different angles.
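A minimal angle-of-arrival sketch, using classic delay-and-sum scanning rather than MUSIC or ESPRIT, and assuming a noise-free single path hitting a half-wavelength-spaced 3-element array:

```python
import numpy as np

def steering(theta_deg, n_ant=3, spacing_wl=0.5):
    """Steering vector of a uniform linear array (element spacing in wavelengths)."""
    n = np.arange(n_ant)
    phase = 2 * np.pi * spacing_wl * n * np.sin(np.radians(theta_deg))
    return np.exp(1j * phase)

# Simulate one reflected path arriving from 25 degrees, observed across
# the receive antennas of a small array (noise and multipath ignored).
true_angle = 25.0
x = steering(true_angle)

# Delay-and-sum beamforming: scan candidate angles, pick the best match.
scan = np.arange(-90, 90.5, 0.5)
spectrum = [np.abs(np.vdot(steering(a), x)) for a in scan]
estimate = scan[int(np.argmax(spectrum))]
print(estimate)
```

With only three elements the beam is broad; real systems need many more antennas, or super-resolution methods like MUSIC, to separate closely spaced reflectors.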
With the measurement framework established, the core difficulty of through-wall sensing becomes clear. A wall is a large, close, smooth reflector. Its reflected signal is three to five orders of magnitude stronger than the reflection from a human body. This dominant “wall flash” overwhelms H(k) at the receiver, burying the human contribution beneath what is effectively the measurement’s noise floor.
In communications this can be ignored — data transmission does not care whether the channel is shaped by a wall or a person. In sensing, it is the central challenge: the signal contribution from your detection target (a person) is suppressed by several orders of magnitude beneath the background you do not care about (walls, furniture).
Wi-Vi, introduced earlier, solved this with MIMO transmit nulling. The system adjusts the amplitude and phase of the two TX antennas so that all static-object reflections destructively interfere at the receiver. Concretely, if the static-scene channel from TX1 to RX is h₁ and from TX2 to RX is h₂, the system sets the TX2 signal to −(h₁/h₂) times the TX1 signal. The static component sums to zero at the receiver. After calibration, the residual signal comes only from moving people. In communications this technique suppresses interference at a neighboring receiver; Wi-Vi turned it around to suppress the entire static scene.
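The cancellation is easy to verify numerically. With made-up complex channel gains h1 and h2 for the static scene, setting the second antenna's signal to −(h₁/h₂) times the first nulls the static component, while a weak moving-person term survives:

```python
# Static-scene channels from each TX antenna to the receiver (made-up
# complex gains), plus a much weaker human reflection added separately.
h1, h2 = 1.0 + 0.3j, 0.6 - 0.8j

def received(x1, x2, human=0j):
    # Superposition of both transmit signals at the single RX antenna.
    return h1 * x1 + h2 * x2 + human

# Calibration: transmit x2 = -(h1/h2) * x1 so static reflections cancel.
x1 = 1.0 + 0j
x2 = -(h1 / h2) * x1

static_only = received(x1, x2)               # wall flash nulled
with_person = received(x1, x2, human=0.01j)  # tiny moving-person residue
print(abs(static_only), abs(with_person))
```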
Wi-Vi’s approach reveals a more general principle: WiFi sensing is inherently good at detecting motion and inherently poor at sensing static scenes.
A moving person continuously alters the multipath structure. Reflected path lengths change, producing Doppler shifts (a person walking at 1 m/s at 5 GHz creates approximately a 33 Hz shift). Reflected amplitudes and phases change as well. These time-varying signatures are straightforward to isolate through temporal differencing, high-pass filtering, or Doppler spectral analysis. The static background cancels in the differencing; only the moving component survives.
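The quoted figure follows from the round-trip Doppler relation f_d = 2v/λ:

```python
C = 3e8  # speed of light, m/s

def doppler_shift_hz(speed_mps, carrier_hz):
    """Round-trip Doppler for a reflector moving at speed v: f_d = 2*v / lambda."""
    wavelength = C / carrier_hz
    return 2 * speed_mps / wavelength

print(f"{doppler_shift_hz(1.0, 5e9):.1f} Hz")    # walking at 1 m/s, 5 GHz
print(f"{doppler_shift_hz(0.005, 5e9):.3f} Hz")  # chest wall at ~5 mm/s (assumed)
```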
A stationary person produces a constant reflection, indistinguishable in character from a wall or a chair. Temporal differencing and Doppler filtering are powerless against it. The only discriminator is the absolute spatial signature — shape, position, reflectivity — of the person versus the furniture, which demands the kind of fine spatial resolution that WiFi’s limited bandwidth and small antenna arrays struggle to provide.
Breathing detection occupies an interesting middle ground. Normal respiration produces chest wall displacement of approximately 1 to 12 millimeters. At 5 GHz (wavelength 60 mm), a 5 mm chest displacement changes the round-trip path by 10 mm, yielding roughly a 60-degree phase shift. This tiny periodic phase variation (at 0.2 to 0.33 Hz, i.e., 12 to 20 breaths per minute) can be detected with carefully designed phase-tracking algorithms. But if a person’s chest movement is too small to register above the phase noise floor — extremely shallow breathing, or a measurement range too large for sufficient signal-to-noise ratio — that person becomes indistinguishable from a chair at the signal level.
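A sketch of the extraction, with assumed numbers throughout (20 Hz CSI sampling rate, 5 mm chest displacement, synthetic phase noise): simulate the roughly 60-degree phase swing and recover the breathing rate from the spectral peak in the breathing band:

```python
import numpy as np

fs = 20.0                     # assumed CSI sampling rate, Hz
t = np.arange(0, 60, 1 / fs)  # one minute of phase samples

# 5 mm chest displacement at 5 GHz (wavelength 60 mm): 10 mm round-trip
# path change gives a peak phase swing of 2*pi * 10/60, about 60 degrees.
breathing_hz = 0.25           # 15 breaths per minute
phase_swing = 2 * np.pi * 10 / 60
rng = np.random.default_rng(1)
phase = phase_swing * np.sin(2 * np.pi * breathing_hz * t)
phase += 0.05 * rng.standard_normal(t.size)  # synthetic phase noise

# Recover the rate from the dominant spectral peak in 0.1-0.5 Hz.
spectrum = np.abs(np.fft.rfft(phase - phase.mean()))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = (freqs >= 0.1) & (freqs <= 0.5)
est_hz = freqs[band][np.argmax(spectrum[band])]
print(f"{est_hz * 60:.0f} breaths per minute")
```

Shrink `phase_swing` toward the noise level and the spectral peak disappears, which is the signal-level version of "indistinguishable from a chair."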
This is the fundamental reason dynamic sensing and static scene reconstruction differ so dramatically in difficulty. Dynamic sensing can exploit temporal variation as a free background-subtraction mechanism. Static scene reconstruction requires resolving each object’s spatial contribution from absolute channel measurements — demanding bandwidth and antenna counts far beyond what WiFi offers.
With a clear picture of both the technical capability boundary and the commercialization headwinds, we can assess which applications are most likely to leave the lab first. The best candidates are those tolerant of lower precision, deployed in relatively fixed environments, with a clear value proposition.
Fall detection and presence monitoring for elderly care is a strong candidate. It only needs to detect coarse-grained information — is someone there, did they fall — with no requirement for skeletal precision. Deployment environments are typically fixed bedrooms or living rooms, making environmental generalization manageable. And it addresses a real, high-value pain point. Sleep breathing monitoring is similar: tracking sleep quality via breathing-induced signal micro-variations, with a fixed deployment location and a single-purpose scenario.
Room-level occupancy detection for smart homes also has commercial potential: knowing which rooms are occupied to control lighting and HVAC. This is far simpler than activity recognition and demands much less signal resolution.
By contrast, high-precision through-wall pose estimation and real-time multi-person tracking are more likely to remain in specialized domains (security, search and rescue) than to reach the consumer market in the foreseeable future.
In 2025, IEEE ratified the 802.11bf standard — an important milestone for WiFi sensing. It defines sensing-capable protocol modifications at the PHY and MAC layers, covering sub-7 GHz and 60 GHz bands, and is backward-compatible with 802.11ax and 802.11be.
The significance of 802.11bf is this: it gives WiFi chipset vendors a unified sensing interface specification. Future WiFi chips implementing this standard can natively provide CSI data without firmware hacks or specific NIC models. This addresses the hardware fragmentation layer.
But there is a time gap between a standard’s ratification and chips reaching mass production, another gap before consumers purchase new routers, and yet another before an application ecosystem develops around router-level sensing. 802.11bf is the starting point for product infrastructure. It provides the necessary conditions for WiFi sensing to move from academia to engineering, but a standard alone is far from sufficient. Environmental generalization, multi-person scenarios, and privacy frameworks are problems the standard itself does not solve.
WiFi through-wall sensing is a field where the physics has been thoroughly validated, the engineering is not yet ready, and the market is still waiting for conditions to mature. Its story looks more like the typical long cycle from basic research to industrial deployment than a technology that will change the world overnight.