Passive Acoustic Footstep Classification in Tunnel Environments

A human walking through a 1.8-meter concrete tunnel at normal pace — approximately 1.2 m/s — produces a seismic-acoustic signature that is, in principle, identifiable. The footstep contact impulse generates a broadband ground vibration signal peaking in the 10–200 Hz range, with the exact spectral content depending on gait, footwear, floor surface, and the acoustic properties of the tunnel structure. The signal is detectable. The harder problem is classification: distinguishing that signal from the ambient noise environment of an operational tunnel, which may include dripping or running water, rock micro-fracture events (popping sounds from stress redistribution), mechanical ventilation fans, and, in active mining contexts, heavy equipment vibration that can exceed the footstep signal by 30–40 dB.

This article covers what the technical literature reports about footstep detection accuracy, where those reports diverge from operational reality, and what the feature engineering and classifier choices look like in practice.

The Seismic Footstep Signal: What You're Actually Measuring

A footstep generates two coupled signals: an airborne acoustic pressure wave and a ground-coupled seismic wave. In tunnel environments, both propagate simultaneously, but with very different characteristics. The airborne acoustic signal travels at roughly 340 m/s, attenuates at approximately 6 dB per doubling of distance in free field, and undergoes strong multipath reflection in concrete corridors that can create standing-wave patterns in the frequency domain. The seismic ground wave travels at 200–600 m/s in soil and compacted fill, and considerably faster (on the order of 2,000 m/s) in intact concrete, depending on aggregate composition and cure state. It attenuates more slowly with distance (Rayleigh wave amplitude falls off as 1/√r under geometric spreading, rather than the 1/r of spherical spreading) and is less affected by airborne multipath.
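To make the spreading comparison concrete, here is a minimal sketch of geometric spreading loss for the two cases. It covers geometry only: material absorption, reverberation, and coupling losses are ignored, and the function name and reference distance are illustrative choices rather than anything taken from a specific system.

```python
import math

def spreading_loss_db(r, r_ref=1.0, exponent=1.0):
    """Geometric amplitude loss in dB at range r relative to r_ref,
    for amplitude proportional to 1 / r**exponent."""
    return 20.0 * exponent * math.log10(r / r_ref)

# exponent=1.0: spherical spreading (airborne, free field)
# exponent=0.5: cylindrical spreading (Rayleigh surface wave)
for r in (2.0, 4.0, 8.0, 16.0):
    air = spreading_loss_db(r, exponent=1.0)
    ground = spreading_loss_db(r, exponent=0.5)
    print(f"{r:5.1f} m  airborne {air:5.1f} dB  ground {ground:5.1f} dB")
```

Each doubling of distance costs about 6 dB in the airborne channel and about 3 dB in the surface-wave channel, which is the sense in which the ground wave attenuates more slowly.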

For detection and localization purposes, geophones or accelerometers mounted in contact with the floor or wall are generally more reliable than microphones in high-noise-floor environments, because the ground-coupled signal competes against a different (usually lower-energy) noise floor than the airborne channel. The tradeoff is bandwidth: geophones are sensitive in the 1–200 Hz range but have limited response above that; MEMS accelerometers extend to several kHz but have higher noise floors at low frequency. The relevant footstep energy is concentrated below 200 Hz, which makes geophones a natural choice for this application, despite their bulkier form factor relative to microphones.

Feature Extraction: MFCC and Spectral Approaches

The signal processing pipeline for footstep classification follows a general acoustic event recognition architecture: preprocessing (filtering, normalization), feature extraction, and classification. The dominant feature extraction approaches in published literature for footstep and footfall detection are:

Mel-frequency cepstral coefficients (MFCCs): the standard representation for audio event classification, derived by applying a mel-scale filterbank to the power spectrum, taking the logarithm of the filterbank energies, and computing the DCT. MFCCs capture the spectral envelope shape in a compact representation (typically 13–20 coefficients per frame) that is somewhat invariant to recording distance and excitation level. For tunnel footstep detection, the frame window needs to be long enough to capture the heel-strike transient and its decay: a window of 100–200 ms does that, but it does not span a full step, since the interval between steps at normal walking cadence is closer to 0.5 s.
Spectral centroid and bandwidth: simpler single-number features that characterize the "center of mass" and spread of the power spectrum. Less informative than MFCCs for multi-class classification, but computationally cheap and useful as secondary features in ensemble classifiers.
Short-time Fourier transform (STFT) spectrogram features: direct time-frequency representation, often fed to convolutional neural network classifiers trained to recognize the visual pattern of footstep events in spectrogram images. Computationally more expensive than MFCC but can capture temporal-spectral patterns that MFCCs aggregate out.
Envelope-based cadence features: the rhythmic periodicity of walking (approximately 1.7–2.2 steps per second at normal gait) is itself a detection feature. Autocorrelation of the signal envelope over 2–5 second windows can detect periodic footstep cadence against aperiodic background noise. This is particularly useful for detecting presence (is someone walking?) rather than localization (where are they?).
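The cadence feature can be sketched as follows. This is a minimal illustration, assuming the signal envelope has already been extracted and downsampled to 100 Hz; the synthetic pulse shape, noise level, and the 1–3 steps-per-second search band are assumptions made for the example, not measured values.

```python
import math
import random

def autocorr_at(x, lag):
    """Normalized autocorrelation of a mean-removed sequence at one lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    energy = sum(v * v for v in d)
    if energy == 0.0:
        return 0.0
    return sum(d[i] * d[i + lag] for i in range(n - lag)) / energy

def estimate_cadence(envelope, fs_env, min_hz=1.0, max_hz=3.0):
    """Return (steps_per_second, peak_correlation) within the search band."""
    min_lag = max(1, round(fs_env / max_hz))
    max_lag = min(len(envelope) - 1, round(fs_env / min_hz))
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorr_at(envelope, lag))
    return fs_env / best_lag, autocorr_at(envelope, best_lag)

def synthetic_envelope(step_hz=2.0, fs_env=100, seconds=4.0, seed=0):
    """Hypothetical test envelope: decaying pulse per step plus Gaussian noise."""
    rng = random.Random(seed)
    period = round(fs_env / step_hz)
    n = int(fs_env * seconds)
    return [math.exp(-(i % period) / 5.0) + rng.gauss(0.0, 0.05)
            for i in range(n)]

env = synthetic_envelope()
cadence, peak = estimate_cadence(env, fs_env=100)
print(f"estimated cadence: {cadence:.2f} steps/s, peak correlation {peak:.2f}")
```

A high peak correlation inside the walking band suggests periodic activity; aperiodic noise tends to produce a low, flat autocorrelation. Any decision threshold on the peak value would need to be tuned on recordings from the actual deployment.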

Classifier Choices and Published Accuracy

Published accuracy figures for footstep classification in controlled laboratory settings are generally high — reported detection rates of 85–95% with false positive rates below 5% appear in academic literature covering seismic footstep detection in building floors, soil surfaces, and simulated tunnel geometries. Support vector machines (SVMs) with MFCC features and random forest classifiers with multi-feature inputs are the most commonly reported approaches in older literature. More recent work uses convolutional neural networks (CNNs) on spectrogram features, with reported improvements of 5–10 percentage points over SVM baselines in controlled conditions.

These numbers deserve careful interpretation. Controlled laboratory conditions for footstep detection research typically involve quiet environments, a single known surface type, and footstep signals recorded at known short distances. Tunnel environments introduce several confounds that controlled studies rarely capture:

Variable ambient noise floor: water flow in drainage channels, ventilation fan harmonics, and rock stress events produce signals with spectral content overlapping the footstep band. In wet tunnels with active drainage, the noise floor in the 20–100 Hz band can exceed typical footstep signal levels by 10–20 dB at 10+ meter source distances.
Unknown surface-structure coupling: footstep-to-geophone signal transmission efficiency depends strongly on the coupling between tunnel floor and the geophone mounting. A geophone cast into concrete has very different transfer characteristics from one sitting in loose gravel, and the same footstep at the same distance may produce 15–20 dB difference in measured amplitude depending on mounting.
Variable gait and footwear: a person wearing hard-soled boots on a concrete floor produces a very different impulse signature than a person wearing soft-soled shoes on a dirt floor. Classifiers trained on one footwear/surface combination show significant performance degradation on unseen combinations in our internal testing — a finding consistent with what other practitioners report in open-source defense research.

We want to be clear about the limits of what we can claim here: our internal field characterization data covers a limited set of surface types and tunnel geometries, and generalizing from that to universal performance figures would be misleading. The honest statement is that footstep classification in clean controlled conditions is a solved problem; footstep classification in operational tunnels with complex noise environments is still a work in progress across the field, not just for us.

The Multi-Source Noise Problem

The three primary interference sources in tunnel seismic-acoustic environments deserve individual treatment because they interfere in different frequency bands and require different mitigation approaches.

Water flow: dripping water produces discrete impulse events with broadband spectral content that can masquerade as footsteps at high event rates. Running water (streamflow in drainage channels) produces continuous broadband noise from 10 Hz to several kHz with spectral characteristics that shift with flow rate. Adaptive noise cancellation using a reference sensor in the drainage channel can reduce this interference by 15–20 dB in the overlapping band, at the cost of an additional sensor and the complexity of designing an effective cancellation filter.
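One common way to build such a cancellation filter is a normalized LMS (NLMS) adaptive filter. The sketch below is an illustration under strong assumptions: the reference sensor sees only the water noise, the path from reference to primary sensor is a short fixed FIR response (the three coefficients are hypothetical), and the tap count and step size are untuned. Real drainage noise is neither white nor stationary, so the reduction achieved in the field will be well below what this idealized case shows.

```python
import math
import random

def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract the part of `primary` that is predictable from `reference`."""
    w = [0.0] * taps
    buf = [0.0] * taps                  # recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated interference
        e = d - y                                    # cleaned output sample
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def power_db(x):
    """Mean power of a sequence in dB (small floor avoids log of zero)."""
    return 10.0 * math.log10(sum(v * v for v in x) / len(x) + 1e-30)

# Synthetic demonstration with a hypothetical coupling path.
rng = random.Random(0)
n = 4000
ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
path = [0.8, 0.3, -0.2]
interference = [sum(path[j] * ref[k - j] for j in range(len(path)) if k - j >= 0)
                for k in range(n)]
steps = [0.5 if k % 500 == 0 else 0.0 for k in range(n)]   # sparse footstep impulses
primary = [i + s for i, s in zip(interference, steps)]

cleaned = nlms_cancel(primary, ref)
residual = [c - s for c, s in zip(cleaned, steps)]
print(f"interference power: {power_db(interference[-1000:]):.1f} dB")
print(f"residual power:     {power_db(residual[-1000:]):.1f} dB")
```

The footstep impulses survive in `cleaned` because they are uncorrelated with the reference channel, which is the property the approach depends on.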

Rock settlement and micro-fracture: in active tunnels or those under geologically active overburden, stress redistribution produces acoustic emissions that range from single sharp clicks (micro-fracture events) to rolling low-frequency rumbles (large-scale settlement). Micro-fracture events are particularly problematic because their impulse shape is similar to a footstep heel-strike at short distances. Distinguishing them from footsteps requires either multi-node TDOA localization (to check whether the apparent source location moves coherently as a walking person would) or waveform features that exploit the different decay characteristics of mechanical fracture versus floor-coupled footstep impacts.

Mechanical ventilation: ventilation fans in tunnels typically operate with a fundamental frequency of 10–60 Hz and harmonics extending to several hundred Hz. Fan vibration couples into the tunnel structure and produces persistent periodic signals that can saturate STFT spectrogram features in the bands most relevant for footstep detection. Notch filtering at known fan frequencies is the straightforward mitigation, but it requires tracking the fan frequency, which changes with load, and it complicates the signal chain.
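A single notch stage can be sketched as a second-order IIR (biquad) filter using the widely used audio-EQ cookbook notch form. The 30 Hz fan frequency, 1 kHz sample rate, and Q of 10 below are assumptions for illustration; a deployed system would re-derive the coefficients as the tracked fan frequency moves and would cascade one stage per harmonic it needs to suppress.

```python
import math

def notch_coefficients(f0, fs, q=10.0):
    """Biquad notch coefficients (normalized so a[0] == 1)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(x, b, a):
    """Direct-form I second-order filter."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 1000.0
b, a = notch_coefficients(f0=30.0, fs=fs, q=10.0)
fan = [math.sin(2.0 * math.pi * 30.0 * n / fs) for n in range(3000)]
other = [math.sin(2.0 * math.pi * 80.0 * n / fs) for n in range(3000)]
print(f"30 Hz tone after notch: rms {rms(biquad(fan, b, a)[-500:]):.4f}")
print(f"80 Hz tone after notch: rms {rms(biquad(other, b, a)[-500:]):.4f}")
```

A higher Q narrows the notch and removes less footstep energy near the fan line, at the cost of a longer settling time and less tolerance for frequency drift.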

TDOA Localization and Classification Synergy

The most operationally useful configuration for passive acoustic footstep detection in tunnel corridors is not a single-node classifier but a multi-node array with TDOA-based localization feeding into event classification. Here is why: many of the ambiguous events, such as rock settlement and water flow transients, produce seismic signatures that are stationary in space. A walking person produces a sequence of events whose source position advances along the tunnel at 1–1.5 m/s. A TDOA array with three or more nodes spaced 10–20 meters apart can compute the apparent source position for each detected event. If successive events show spatially coherent motion at human walking speed, the classification confidence goes up substantially even if the individual event signatures are ambiguous.
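The motion-coherence check can be sketched in one dimension along the tunnel axis. This is a simplified illustration: it uses a single pair of nodes that bracket the source rather than a full three-node solution, assumes a known and uniform wave speed (the 400 m/s value is a placeholder that would need on-site calibration), and assumes perfectly synchronized node clocks. All function names are hypothetical.

```python
def locate_1d(x_a, x_b, t_a, t_b, wave_speed):
    """Source position between nodes at x_a < x_b from the two arrival times."""
    return 0.5 * (x_a + x_b + wave_speed * (t_a - t_b))

def track_speed(times, positions):
    """Least-squares slope of position against time, in m/s."""
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def looks_like_walker(times, positions, v_min=1.0, v_max=1.5):
    """True if the event sequence moves at a plausible walking speed."""
    return v_min <= abs(track_speed(times, positions)) <= v_max

def simulate(x_a, x_b, wave_speed, start, speed, n_events=7, interval=0.5):
    """Synthetic arrival times for a source moving at `speed` (0 for stationary)."""
    times, positions = [], []
    for k in range(n_events):
        t = k * interval
        x = start + speed * t
        t_a = t + (x - x_a) / wave_speed
        t_b = t + (x_b - x) / wave_speed
        times.append(t_a)
        positions.append(locate_1d(x_a, x_b, t_a, t_b, wave_speed))
    return times, positions

walker = simulate(0.0, 15.0, 400.0, start=3.0, speed=1.2)
drip = simulate(0.0, 15.0, 400.0, start=7.0, speed=0.0)
print("walker:", looks_like_walker(*walker), f"{track_speed(*walker):.2f} m/s")
print("drip:  ", looks_like_walker(*drip), f"{track_speed(*drip):.2f} m/s")
```

In practice the position estimates are noisy, so a fit-quality measure (for example the residual of the line fit) would be needed alongside the speed gate before raising classification confidence.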

This architecture — where localization and classification are coupled rather than independent — is more complex to implement and requires the inter-node timing accuracy discussed in our earlier piece on GPS-denied timing. But it eliminates a whole class of false positives that single-node classification cannot resolve. The operational value of that false positive reduction is significant for tunnel threat detection: an alert threshold that triggers on every ambiguous seismic event in a high-noise-floor tunnel will be ignored after the first week of deployment.

Practical Deployment Parameters

Based on what the literature reports and what we've characterized in internal testing (with the caveats noted above), some practical starting-point parameters for tunnel footstep detection systems:

Geophone spacing for single-person coverage in a 1.8-meter concrete tunnel: 15–25 meters between nodes in quiet environments; 8–12 meters in high-noise-floor environments, where the shorter spacing keeps the footstep SNR at the nearest node workable
Minimum detectable footstep SNR for MFCC-based classification: approximately 6–10 dB above noise floor in the 20–100 Hz band
Detection latency target: under 2 seconds from footstep event to alert generation; this requires on-node or near-node compute, not round-trip to a cloud classifier
Walking speed estimation accuracy via cadence analysis: ±0.2 m/s for steady gait; degrades substantially for stop-and-go movement patterns
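As an illustration of how the SNR figure above might be checked on a node, the sketch below estimates in-band SNR by comparing 20–100 Hz power in an event window against a noise-only window. It uses a plain DFT over the band for clarity rather than speed; the window length, sample rate, and test tones are assumptions for the example.

```python
import math

def band_power(x, fs, f_lo=20.0, f_hi=100.0):
    """Sum of squared DFT magnitudes over bins falling in [f_lo, f_hi]."""
    n = len(x)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2.0 * math.pi * k * i / n) for i, v in enumerate(x))
        total += (re * re + im * im) / (n * n)
    return total

def band_snr_db(event_window, noise_window, fs, f_lo=20.0, f_hi=100.0):
    """In-band power of the event window relative to the noise window, in dB."""
    p_event = band_power(event_window, fs, f_lo, f_hi)
    p_noise = band_power(noise_window, fs, f_lo, f_hi)
    return 10.0 * math.log10((p_event + 1e-30) / (p_noise + 1e-30))

# Hypothetical check: a 50 Hz tone at twice the noise amplitude.
fs, n = 1000.0, 200
noise = [0.1 * math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(n)]
event = [0.2 * math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(n)]
snr = band_snr_db(event, noise, fs)
print(f"in-band SNR: {snr:.1f} dB, above 6 dB: {snr >= 6.0}")
```

Note that the event window here contains signal plus whatever noise is present, so at low SNR this ratio overstates the true signal-to-noise ratio; an on-node implementation would more likely use a band-pass filter and running power estimates.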

The detection problem is tractable in many operational tunnel environments. The classification problem — distinguishing human footsteps from other tunnel events with operational reliability — requires careful noise characterization specific to the deployment environment before committing to a sensor spacing and algorithm design. No footstep classifier designed for one tunnel type should be assumed to transfer without re-validation to a different tunnel, even one that appears physically similar.