Chapter 04

Radio Frequency Interference & Spectral Kurtosis Filtering

Detecting and excising human-made contamination with adaptive statistical tests, known-RFI catalogs, and ON/OFF cadence logic.

Author: Saman Tabatabaeian — Deep Field Labs MitraSETI Tutorial Series

You already read spectrograms as images of power versus time and frequency, and you know that de-Doppler search integrates along lines of constant drift to find narrowband carriers. This chapter addresses the obstacle that dominates every sensitive wideband SETI pipeline: radio frequency interference (RFI)—human-made emission that masquerades as science. We define RFI and its taxonomy, review classical mitigation tricks and their limits, then build spectral kurtosis (SK) from basic statistics. Finally we connect adaptive thresholds, known-RFI catalogs, and ON/OFF cadence into the layered strategy MitraSETI uses in practice.

1. What Is RFI (Radio Frequency Interference)?

Radio frequency interference is any human-made radio signal that contaminates astronomical data. It is not a calibration bug or a software glitch: it is real electromagnetic energy from transmitters, digital electronics, and power systems, captured by the same antennas and backends that record cosmic emission.

Where RFI comes from

Modern civilization floods the spectrum. Representative sources include:

Satellite navigation and communication: GPS (L1 ≈ 1575.42 MHz, L2 ≈ 1227.60 MHz), GLONASS, Galileo, Iridium (~1616–1626.5 MHz), Starlink and other megaconstellations in L band and Ku band.
Cellular infrastructure: LTE and 5G base stations and handsets across allocated bands (roughly hundreds of MHz to a few GHz depending on region and band plan).
Local area networking: Wi‑Fi at ~2.4 GHz and ~5 GHz, Bluetooth, and other ISM-band devices.
Domestic and industrial electronics: Microwave ovens (~2.45 GHz leakage), switching power supplies, poorly shielded USB and display cables.
Radar and sensing: Aircraft radar, weather radar (often S band around ~2.7–3 GHz), automotive radar in the tens of GHz.
Power and broadcast: Power-line harmonics and arcing, analog and digital TV, FM radio, and paging systems where they still exist.

None of these need to be "pointed at" the telescope. Side lobes, scattering, cable pickup, and site electronics couple interference into the data stream.

Scale of the problem

RFI is often vastly stronger than the astrophysical or technosignature-like signals we hope to detect. As an order-of-magnitude guide, terrestrial leakage can exceed weak cosmic carriers by factors of 10⁶ or more in power, depending on band, site, and integration. That is why sensitivity alone is not enough: you must reject enormous numbers of false candidates before human review is tractable.

Why RFI is the number-one enemy of SETI

Narrowband SETI searches look for concentrated power in frequency, sometimes drifting in time because of Doppler. RFI often looks exactly the same:

A terrestrial transmitter produces a narrowband ridge in a spectrogram.
If the source is moving relative to the observatory—think of a low-Earth-orbit satellite passing overhead—the observed frequency can drift, mimicking a celestial line unless you model geometry carefully.

For these reasons, most threshold crossings are not E.T. They are RFI, instrumental effects, or rare astrophysical narrowband emitters. RFI is the default explanation for a candidate until observational design and layered filters say otherwise.

2. Types of RFI

RFI is diverse. Grouping it by morphology helps you choose mitigations.

Figure 4.1 — RFI taxonomy. Each type has a distinct spectrogram signature, guiding the choice of mitigation strategy.

Broadband RFI

Broadband interference affects many frequency channels at once. Examples include arcing, some digital buses, and power-line events. On a spectrogram it often appears as broad horizontal features (elevated power across much of the band for a short time) or a raised noise floor over many channels.

Narrowband RFI

Narrowband RFI sits in one channel or a small cluster of channels—classic carrier-like emission from a radio transmitter, a GPS line, or a local oscillator leaking at a fixed offset. Visually it is a vertical or slowly wandering ridge at nearly constant frequency.

Pulsed and intermittent RFI

Intermittent RFI turns on and off: radar rotations, Wi‑Fi beacon and data frames, frequency-hopping links, and devices that transmit in bursts. The waterfall shows dashes, dots, or regular flashes rather than a solid line.

Persistent RFI

Persistent RFI is always present for the duration of the observation: local oscillator leakage, a continuous broadcast, or a nearby always-on transmitter. It may be narrowband or structured, but it does not "go away" when you wait a few seconds.

Moving-source RFI

Moving-source RFI comes from platforms whose line-of-sight velocity changes: satellite passes, aircraft, and sometimes automobiles. The observed frequency drifts in time for the same reason a Doppler-shifted siren changes pitch—so drift alone does not prove an extraterrestrial origin. This is one reason de-Doppler search must be paired with pointing logic, multi-beam tests, or catalog checks, not used in isolation.
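To put rough numbers on this, the first-order Doppler relation Δf = f × v / c can be evaluated for a hypothetical satellite pass. The 1.6 GHz carrier, the ±7 km/s line-of-sight speed, and the 600 s pass duration below are illustrative assumptions, not measurements from any particular satellite.

```python
C_M_PER_S = 299_792_458.0  # speed of light


def doppler_shift_hz(f_hz: float, v_los_m_per_s: float) -> float:
    """First-order Doppler shift; positive speed means the source approaches."""
    return f_hz * v_los_m_per_s / C_M_PER_S


# Illustrative assumptions: a 1.6 GHz carrier whose line-of-sight speed goes
# from +7 km/s (approaching) to -7 km/s (receding) over a 600 s pass.
f0_hz = 1.6e9
shift_start_hz = doppler_shift_hz(f0_hz, +7_000.0)
shift_end_hz = doppler_shift_hz(f0_hz, -7_000.0)
mean_drift_hz_per_s = (shift_end_hz - shift_start_hz) / 600.0

print(f"Doppler shift at start of pass: {shift_start_hz / 1e3:+.1f} kHz")
print(f"Mean drift rate over the pass:  {mean_drift_hz_per_s:+.1f} Hz/s")
```

Under these assumptions the carrier sweeps through tens of kHz, which is many fine-resolution channels, so a drifting ridge is entirely compatible with a terrestrial or orbital transmitter.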

3. Traditional RFI Mitigation

Before spectral kurtosis and machine-learning stacks, astronomers relied on simpler tools. Each has a place; each has clear failure modes.

Frequency blanking

Frequency blanking means zeroing or masking channels (or whole sub-bands) known to be bad—GPS bands, radar allocations, and so on. It is simple and fast, but crude: you lose sensitivity everywhere those frequencies might have carried science, and static masks do not track new services or drifting leaks.
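A static frequency mask can be sketched in a few lines of NumPy. The band edges below are illustrative placeholders, not an official allocation table, and the function name is hypothetical; blanked samples are set to NaN here rather than zero so that later statistics can ignore them explicitly.

```python
import numpy as np


def frequency_mask(freqs_mhz, bad_bands_mhz):
    """Return a boolean mask that is True for channels inside any (low, high) band."""
    freqs_mhz = np.asarray(freqs_mhz, dtype=float)
    mask = np.zeros(freqs_mhz.shape, dtype=bool)
    for low, high in bad_bands_mhz:
        mask |= (freqs_mhz >= low) & (freqs_mhz <= high)
    return mask


freqs = np.linspace(1500.0, 1700.0, 201)            # 1 MHz channels
bad_bands = [(1574.0, 1577.0), (1616.0, 1626.5)]    # illustrative band edges
mask = frequency_mask(freqs, bad_bands)

spec = np.ones((8, freqs.size))                     # (time, frequency) toy data
blanked = np.where(mask[np.newaxis, :], np.nan, spec)
print(f"blanked {mask.sum()} of {mask.size} channels")
```

Every masked channel is lost for the whole observation, which is exactly the cost described above.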

Time blanking

Time blanking discards time steps when total power or some statistic exceeds a threshold—useful during lightning, radar sweeps, or someone opening a microwave door behind the control room. Again, you lose data; aggressive blanking can chop long integrations needed for weak signals.
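As a minimal sketch, time blanking can be driven by the total power in each time step compared against a robust threshold. The median/MAD rule, the 5σ cut, and the synthetic burst are assumptions chosen for illustration.

```python
import numpy as np


def flag_bad_times(spec, n_sigma=5.0):
    """Flag time steps whose band-summed power is a high outlier."""
    total = spec.sum(axis=1)                          # total power per time step
    med = np.median(total)
    sigma = 1.4826 * np.median(np.abs(total - med))   # MAD scaled to a Gaussian sigma
    return total > med + n_sigma * sigma


rng = np.random.default_rng(1)
spec = rng.exponential(1.0, size=(200, 64))   # synthetic noise, (time, frequency)
spec[50] += 20.0                              # inject a broadband burst at t = 50
bad = flag_bad_times(spec)
cleaned = spec[~bad]                          # drop the flagged time steps
print(f"dropped {bad.sum()} of {bad.size} time steps")
```

The flagged rows are simply removed, so the integration gets shorter each time the threshold fires.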

Median filtering (along time or frequency)

Subtracting the median per channel (or per time slice) removes slowly varying DC offsets and some standing-wave structure. It is robust compared to the mean. It does not by itself remove pulsed RFI that departs from a smooth baseline, and it can distort real broad astronomical structure if applied blindly.
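The sketch below subtracts a per-channel median from synthetic data and shows that a short pulse survives the operation. The array shapes, the linear bandpass, and the pulse amplitude are made up for the example.

```python
import numpy as np


def subtract_channel_median(spec):
    """Remove each channel's median over time from a (time, frequency) array."""
    return spec - np.median(spec, axis=0, keepdims=True)


rng = np.random.default_rng(2)
bandpass = np.linspace(5.0, 10.0, 32)                   # smooth per-channel offset
spec = bandpass + rng.normal(0.0, 0.1, size=(101, 32))
spec[40, 7] += 3.0                                      # one short pulse
flat = subtract_channel_median(spec)

print("largest residual channel median:", np.abs(np.median(flat, axis=0)).max())
print("pulse after subtraction:", flat[40, 7])
```

The baseline is flattened, but the pulse is still there at nearly full amplitude, which is why median filtering needs to be combined with other tests.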

ON/OFF cadence (reference sky)

A powerful observational strategy: point at the target (ON), then at a reference position on the sky (OFF) with the same equipment and duration. Astrophysical emission from the target direction should appear in ON but not in OFF (assuming the reference is empty at that resolution). Terrestrial RFI often appears in both, because it enters through the environment and side lobes, not from the distant source alone.

✦

Fun Fact — ABACAD Cadence

Breakthrough Listen often uses extended cadences such as ABACAD, where A is the ON-target pointing and B, C, and D are distinct OFF pointings rather than one repeated reference position. The six-scan pattern ON → OFF → ON → OFF → ON → OFF accumulates multiple ON realizations while sampling several reference fields, which improves robustness against structured sky emission in any single OFF.

4. What Is Kurtosis?

Spectral kurtosis is built on moments of a distribution—quantities that summarize shape beyond the average.

Moments: mean through kurtosis

Draw a sample x₁, x₂, …, xₙ (for example, power measurements in one frequency channel).

Mean (1st moment): μ = (1/n) Σ xᵢ — the center of mass of the data.
Variance (2nd central moment): σ² = (1/n) Σ (xᵢ − μ)² — typical spread around the mean.
Skewness (3rd standardized moment): measures asymmetry—whether the long tail points left or right compared to a symmetric bell curve.
Kurtosis (4th standardized moment): κ = (1/n) Σ (xᵢ − μ)⁴ / σ⁴, which measures tailedness: how often large deviations from the mean occur relative to a Gaussian.

Textbooks disagree on normalization constants; the important idea is the comparison to Gaussian.

★

Key Concept — What Is Kurtosis?

Kurtosis is the 4th-moment statistic that measures tailedness of a distribution. For a Gaussian, raw kurtosis equals 3 (excess kurtosis = 0). Values above 3 indicate heavy tails (frequent outliers); values below 3 indicate light tails (data compressed toward the center). In SETI, RFI forces one of these departures channel by channel.

Kurtosis and the Gaussian

For a normal (Gaussian) distribution the raw kurtosis is 3, so the common "excess kurtosis" convention subtracts 3 and gives 0. Elsewhere you may see only "kurtosis" without the word "excess", so always check which definition is used.

Heavy tails (outliers, rare huge spikes): kurtosis > 3 in the raw sense, or positive excess kurtosis.
Light tails (values clamped or compressed toward the center): kurtosis < 3, or negative excess kurtosis.
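These cases are easy to check numerically. The sketch below computes raw kurtosis (fourth central moment divided by the squared variance) for Gaussian, Laplace, and uniform samples, whose theoretical values are 3, 6, and 1.8. The sample size and seed are arbitrary choices.

```python
import numpy as np


def raw_kurtosis(x):
    """Fourth central moment divided by the squared variance (Gaussian -> 3)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return ((x - mu) ** 4).mean() / var ** 2


rng = np.random.default_rng(0)
n = 200_000
samples = {
    "gaussian": rng.normal(size=n),              # reference, theory: 3
    "laplace": rng.laplace(size=n),              # heavy tails, theory: 6
    "uniform": rng.uniform(-1.0, 1.0, size=n),   # light tails, theory: 1.8
}
kurt = {name: raw_kurtosis(x) for name, x in samples.items()}
for name, value in kurt.items():
    print(f"{name:9s} raw kurtosis = {value:.2f}, excess = {value - 3.0:+.2f}")
```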

Figure 4.2 — Three distributions with different kurtosis. The Gaussian (blue, κ = 3) is the reference. Heavy-tailed distributions (yellow) produce rare extreme outliers—the signature of bursty RFI. Light-tailed distributions (green) indicate clamped or saturated signals.

Intuition

Imagine a bell curve. Kurtosis asks: "Compared to this bell, how often do extreme values show up?" If very large or very small events happen more often than Gaussian statistics predict, kurtosis is high. If the distribution is too uniform or too tight—little variation—kurtosis is low. RFI often forces one of these departures in time, channel by channel.

5. Spectral Kurtosis (SK) — Nita & Gary (2010)

Spectral kurtosis applies that idea per frequency channel: for each channel, you have a time series of power estimates S₁, S₂, …, Sₘ from successive integrations. You ask whether those Sᵢ behave the way power samples should when the underlying voltages are stationary Gaussian noise.

Per-channel test

There are N_f channels; SK is computed independently for each—no mixing across frequency in the basic formulation. Channels dominated by stationary Gaussian-like noise should yield SK near a reference value (in the radio-SK literature, calibrated estimators map Gaussian noise to SK ≈ 1 for the normalized form used in pipelines).

Interpreting deviations

SK ≫ 1: Bursty or intermittent RFI injects occasional huge powers → heavy tails in the distribution of Sᵢ.
SK ≪ 1: Continuous broadband or saturating behavior can make powers abnormally uniform or clamped → light tails relative to the Gaussian reference.

So SK is a statistical anomaly detector in the time direction, separately in each spectral bin.

Estimator

Let M be the number of time steps and Sᵢ the power in the channel at step i. The ratio form used in teaching and in many derivations is:

★

Key Concept — SK Estimator Formula

The basic spectral kurtosis estimator:

SK = (M × Σ Sᵢ²) / (Σ Sᵢ)² − 1

This compares the second moment of the powers to the square of the mean—a building block for detecting non-stationarity in the time series Sᵢ.

MitraSETI follows Nita & Gary (2010) and applies a finite-sample correction:

SK = ((M + 1) / (M − 1)) × ( (M × Σ Sᵢ²) / (Σ Sᵢ)² − 1 )

The prefactor (M + 1)/(M − 1) is ≈ 1 when M is large. Gaussian noise → SK ≈ 1, constant-power (saturated) → SK ≈ 0, impulsive RFI → SK ≫ 1.
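The corrected estimator can be written directly from the formula. The sketch below is not MitraSETI's code: the function name is hypothetical, and the noise channel assumes each Sᵢ is a single unaveraged power sample of Gaussian-noise voltages, which makes it exponentially distributed.

```python
import numpy as np


def spectral_kurtosis(power):
    """SK per channel for a (time, frequency) array of power samples."""
    power = np.asarray(power, dtype=float)
    m = power.shape[0]                       # number of time steps M
    s1 = power.sum(axis=0)                   # sum of S_i per channel
    s2 = (power ** 2).sum(axis=0)            # sum of S_i^2 per channel
    return (m + 1) / (m - 1) * (m * s2 / s1 ** 2 - 1.0)


rng = np.random.default_rng(3)
m = 4096
power = np.empty((m, 3))
power[:, 0] = rng.exponential(1.0, size=m)   # noise-like channel
power[:, 1] = 1.0                            # constant (saturated) channel
power[:, 2] = rng.exponential(1.0, size=m)   # noise plus sparse strong bursts
power[::100, 2] += 100.0
sk = spectral_kurtosis(power)
print("SK (noise, constant, impulsive):", np.round(sk, 3))
```

The three synthetic channels land near 1, at 0, and far above 1, matching the three regimes listed above.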

6. The Problem with Fixed Thresholds

A naive policy: flag any channel with SK < 0.8 or SK > 1.2 (numbers chosen for illustration).

⚠

Warning — Fixed Thresholds Fail

Fixed SK cuts fail on real observatories because:

Gain and bandpass vary with time, temperature, and hardware state.
Different files have different noise statistics after integration and digitization.
On some Breakthrough Listen extracts, fixed cuts can flag 100% of channels—everything appears "non-Gaussian" to the crude rule—while on others they flag 0% and miss obvious RFI.

One size does not fit all. Thresholds must adapt to the empirical distribution of SK in each chunk or file.
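One mechanism that is easy to reproduce is pre-averaging. This is an assumption for illustration, not a diagnosis of any particular file: if each stored power sample is already the average of N raw spectra, pure noise gives SK near 1/N instead of 1, and a fixed 0.8 to 1.2 cut then flags every channel. The averaging factor of 16 and the array sizes are arbitrary.

```python
import numpy as np


def spectral_kurtosis(power):
    """Finite-sample-corrected SK per channel for a (time, frequency) array."""
    m = power.shape[0]
    s1 = power.sum(axis=0)
    s2 = (power ** 2).sum(axis=0)
    return (m + 1) / (m - 1) * (m * s2 / s1 ** 2 - 1.0)


def fixed_cut_fraction(sk, lower=0.8, upper=1.2):
    """Fraction of channels flagged by a fixed SK acceptance band."""
    return float(np.mean((sk < lower) | (sk > upper)))


rng = np.random.default_rng(4)
m, n_chan = 4096, 256
raw = rng.exponential(1.0, size=(m, n_chan))                      # unaveraged noise
averaged = rng.gamma(shape=16.0, scale=1.0 / 16.0, size=(m, n_chan))  # mean of 16 spectra

frac_raw = fixed_cut_fraction(spectral_kurtosis(raw))
frac_avg = fixed_cut_fraction(spectral_kurtosis(averaged))
print(f"flagged, unaveraged noise: {frac_raw:.1%}")
print(f"flagged, 16x averaged noise: {frac_avg:.1%}")
```

Both inputs are clean noise, yet the same fixed rule accepts one and rejects the other entirely.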

7. MitraSETI's Adaptive Thresholds

MitraSETI (MitraSETIPipeline.compute_spectral_kurtosis() in pipeline.py) uses robust location and scale on the vector of per-channel SK values.

Figure 4.3 — Adaptive SK thresholds. The median and MAD-derived σ shift with each file's noise statistics, catching outlier channels (pink) while keeping clean channels (green) within the acceptance band.

Median and MAD

Compute SK for every channel.
Let median(SK) be the median of those values—more robust than the mean when many channels are contaminated.
Compute MAD = median( |SKᵢ − median(SK)| ) — the median absolute deviation.
Convert to a Gaussian-equivalent scale: σ = 1.4826 × MAD (the constant maps MAD to the standard deviation of a normal distribution).

Thresholds

Lower = median(SK) − N × σ
Upper = median(SK) + N × σ
Default: N = 3 (_SK_N_SIGMA = 3.0)

Channels outside the band are flagged. If MAD is zero (degenerate case), the implementation falls back to the standard deviation of SK as a scale estimate.
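The median/MAD rule, including the zero-MAD fallback, can be sketched as follows. This is an illustration of the steps described above, not the body of compute_spectral_kurtosis(); the function name and the return layout are assumptions.

```python
import numpy as np

SK_N_SIGMA = 3.0  # mirrors the default N quoted above


def adaptive_sk_flags(sk, n_sigma=SK_N_SIGMA):
    """Flag channels whose SK lies outside median ± n_sigma × (1.4826 × MAD)."""
    sk = np.asarray(sk, dtype=float)
    med = np.median(sk)
    mad = np.median(np.abs(sk - med))
    sigma = 1.4826 * mad
    if sigma == 0.0:
        sigma = sk.std()          # degenerate case: fall back to the standard deviation
    lower = med - n_sigma * sigma
    upper = med + n_sigma * sigma
    flags = (sk < lower) | (sk > upper)
    return flags, lower, upper


# Synthetic SK values: small deterministic scatter around 1 plus two bad channels.
sk = 1.0 + 0.01 * np.sin(np.arange(100.0))
sk[10] = 5.0    # impulsive channel
sk[20] = 0.1    # clamped channel
flags, lower, upper = adaptive_sk_flags(sk)
print(f"band = [{lower:.3f}, {upper:.3f}], flagged channels = {np.flatnonzero(flags)}")
```

Because the band is derived from the data, the same code works whether the clean channels cluster near 1 or near some other value.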

Per-channel normalization (bandpass removal)

Before SK, each channel's power is divided by its median over time. That removes slow bandpass shape and gain differences between channels so SK reflects temporal statistics rather than absolute calibration. Small powers are floored to avoid division by zero.
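A minimal sketch of this normalization step is shown below. The floor value of 1e-12 and the choice to apply it to the divisor are assumptions; the text only states that small powers are floored.

```python
import numpy as np


def normalize_by_channel_median(spec, floor=1e-12):
    """Divide each channel of a (time, frequency) array by its median over time."""
    med = np.median(spec, axis=0, keepdims=True)
    return spec / np.maximum(med, floor)     # floor the divisor to avoid 0/0


rng = np.random.default_rng(5)
gains = np.array([0.5, 1.0, 4.0])                       # unequal channel gains
live = gains * rng.exponential(1.0, size=(1001, 3))
dead = np.zeros((1001, 1))                              # a channel with no power
spec = np.hstack([live, dead])

norm = normalize_by_channel_median(spec)
print("channel medians after normalization:", np.median(norm, axis=0))
```

After the division every live channel has a median of 1, and the dead channel stays at zero instead of producing NaN or infinity.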

8. What Happens After Flagging?

Flagged channels are not zeroed. MitraSETI replaces each flagged frequency bin with the median power in that bin across time (the per-channel, or column, median).

★

Key Concept — Why Median-Fill, Not Zero-Fill

Hard zeroing creates sharp spectral notches. Downstream FFT-based processing and de-Doppler integration can ring and scatter energy into neighboring bins, producing artifacts that look like narrowband structure—false detections near band edges and notches.

Median replacement keeps the noise floor statistically more consistent while excising the corrupted samples. If more than 50% of channels would be flagged, the mask is skipped and a warning is logged—better to retain data than to obliterate the band when SK fails catastrophically.
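The replacement rule and the 50% safeguard can be sketched together. The function name, the logger name, and the return convention are hypothetical; only the behavior (median along time, skip when more than half the channels are flagged) comes from the description above.

```python
import logging

import numpy as np

logger = logging.getLogger("rfi_sketch")  # hypothetical logger name


def median_fill(spec, flags, max_flag_fraction=0.5):
    """Replace flagged channels with their median over time.

    Returns (array, applied). If too many channels are flagged, the data are
    returned unchanged and applied is False.
    """
    spec = np.asarray(spec, dtype=float)
    flags = np.asarray(flags, dtype=bool)
    if flags.mean() > max_flag_fraction:
        logger.warning("%.0f%% of channels flagged; skipping mask", 100 * flags.mean())
        return spec.copy(), False
    out = spec.copy()
    chan_median = np.median(spec, axis=0)    # one median per frequency channel
    out[:, flags] = chan_median[flags]       # broadcast along the time axis
    return out, True


spec = np.array([[1, 1, 1, 1],
                 [1, 1, 50, 1],
                 [1, 1, 2, 1],
                 [1, 1, 1, 1],
                 [1, 1, 3, 1]], dtype=float)  # (time, frequency)
filled, applied = median_fill(spec, [False, False, True, False])
skipped, applied_many = median_fill(spec, [True, True, True, False])
print(filled[:, 2], applied, applied_many)
```

The flagged column becomes a constant at its own median level, so no notch or step is introduced relative to its neighbors.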

9. Known RFI Database (MitraSETI v0.2.0)

Statistics cannot encode allocated services and common false positives by name. MitraSETI v0.2.0 ships a catalog (catalog/rfi_database.py) of 27 terrestrial RFI entries with center frequency, bandwidth, and metadata.

✦

Fun Fact — 27-Entry RFI Database

MitraSETI v0.2.0 ships a curated catalog of 27 known terrestrial RFI sources. When a candidate's frequency falls inside a catalog entry, the pipeline labels it immediately as a known terrestrial service—complexity is O(N_candidates × 27), trivial at scale. This catches the most common false positives fast, but unknown or mis-cataloged emitters still require other layers.

Examples you will see cross-matched against candidates include:

GPS L1 (~1575.42 MHz), GPS L2 (~1227.60 MHz)
Iridium (~1621.35 MHz in common summaries; allocations span a broader L-band segment)
Wi‑Fi 2.4 GHz and 5 GHz blocks
LTE band examples and other cellular allocations
Aircraft and weather radar bands
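A catalog lookup of this kind reduces to an interval test per entry. The sketch below is not the contents of catalog/rfi_database.py: the class, the function, and the three entries are hypothetical, the center frequencies come from the figures quoted in this chapter, and the bandwidths are illustrative guesses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RfiEntry:
    name: str
    center_mhz: float
    bandwidth_mhz: float

    def contains(self, freq_mhz: float) -> bool:
        half = self.bandwidth_mhz / 2.0
        return self.center_mhz - half <= freq_mhz <= self.center_mhz + half


# Hypothetical three-entry catalog; bandwidths are illustrative only.
CATALOG: List[RfiEntry] = [
    RfiEntry("GPS L1", 1575.42, 20.0),
    RfiEntry("GPS L2", 1227.60, 20.0),
    RfiEntry("Iridium", 1621.25, 10.5),   # spans roughly 1616 to 1626.5 MHz
]


def label_candidate(freq_mhz: float, catalog: List[RfiEntry] = CATALOG) -> Optional[str]:
    """Return the name of the first catalog entry containing freq_mhz, else None."""
    for entry in catalog:
        if entry.contains(freq_mhz):
            return entry.name
    return None


for f in (1575.0, 1620.0, 1420.4):
    print(f, "->", label_candidate(f))
```

A linear scan is enough here because the catalog is tiny; an unlabeled result (None) means only that the catalog has no entry, not that the candidate is clean.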

10. ON/OFF Cadence Filtering (Detailed)

Breakthrough Listen observing pattern

A typical six-scan cadence is:

ON → OFF → ON → OFF → ON → OFF
A → B → A → C → A → D
Three ON-target (A) integrations alternating with three OFF integrations (reference sky: B, C, D).

The ON/OFF role of each scan is often parsed from standardized filenames (GUPPI-style or target_ON_#.fil / target_OFF_#.fil conventions).

Physical discrimination

A real celestial narrowband signal at the target should appear in all ON scans (subject to sensitivity and scintillation) and in none of the OFF scans at the same frequency and drift (within tolerance)—it is not in the reference beam.
Terrestrial RFI is usually not tied to the target direction in the same way; it often appears in both ON and OFF because it couples through side lobes, diffraction, and local electronics.

MitraSETI implementation

The scripts/cadence_filter.py tool:

Parses Breakthrough Listen–style filenames and groups scans by target.
Identifies ON and OFF sequences.
Runs de-Doppler (or consumes candidates) per file.
Cross-matches hits by frequency and drift rate within tunable tolerances.
Enforces multi-ON consensus: by default a signal must appear in at least two of the three ON scans with matching frequency and drift (CLI: --min-on, default 2).
Compares against OFF: signals that also match OFF detections are treated as RFI (direction-independent in practice).

This stage uses how the telescope was pointed—information no in-file SK statistic can replace.
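The consensus and rejection logic can be sketched on plain lists of hits. This is not the code in scripts/cadence_filter.py: the function names and tolerance values are hypothetical, and for simplicity hits are compared at their reported frequency without extrapolating the drift between scans. Only the default of two matching ON scans comes from the description above.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[float, float]  # (frequency in Hz, drift rate in Hz/s)


def hits_match(a: Hit, b: Hit, freq_tol_hz: float, drift_tol_hz_s: float) -> bool:
    return abs(a[0] - b[0]) <= freq_tol_hz and abs(a[1] - b[1]) <= drift_tol_hz_s


def cadence_filter(
    on_scans: Sequence[Sequence[Hit]],
    off_scans: Sequence[Sequence[Hit]],
    min_on: int = 2,
    freq_tol_hz: float = 2.0,      # hypothetical tolerance
    drift_tol_hz_s: float = 0.1,   # hypothetical tolerance
) -> List[Hit]:
    """Keep hits seen in at least min_on ON scans and in no OFF scan."""

    def seen_in(hit: Hit, scan: Sequence[Hit]) -> bool:
        return any(hits_match(hit, other, freq_tol_hz, drift_tol_hz_s) for other in scan)

    survivors: List[Hit] = []
    for scan in on_scans:
        for hit in scan:
            if seen_in(hit, survivors):
                continue                                   # equivalent hit already kept
            n_on = sum(seen_in(hit, s) for s in on_scans)  # ON scans containing the hit
            in_off = any(seen_in(hit, s) for s in off_scans)
            if n_on >= min_on and not in_off:
                survivors.append(hit)
    return survivors


on_scans = [
    [(1000.0, 0.5), (2000.0, 0.0), (3000.0, 0.1)],
    [(1000.5, 0.5), (2000.0, 0.0)],
    [(2000.0, 0.0)],
]
off_scans = [[(2000.0, 0.0)], [], [(2000.0, 0.0)]]
candidates = cadence_filter(on_scans, off_scans)
print(candidates)
```

In this toy cadence the 2000 Hz hit is rejected because it also appears in OFF scans, the 3000 Hz hit fails the two-ON consensus, and only the 1000 Hz hit survives.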

11. Layered Defense — Why One Method Is Not Enough

No single test closes the book on RFI:

Spectral kurtosis catches temporal non-Gaussianity—bursts and saturation—but can miss narrowband persistent RFI that is stable and Gaussian-like in power from integration to integration within a chunk.
The known RFI database catches cataloged terrestrial services but cannot label new or uncataloged transmitters.
ON/OFF cadence exploits sky geometry but requires paired observations and correct filename grouping; it is useless for single-pointing archives.

Figure 4.4 — MitraSETI's layered RFI defense. Each layer catches failure modes the others miss. In internal benchmarks over 100 Breakthrough Listen files, the combined rejection fraction reached ~99.996%.

Together, these layers (plus additional pipeline stages such as clustering and learned classifiers in the full MitraSETI stack) cover one another's blind spots. In internal benchmarks over 100 Breakthrough Listen files, the combined RFI rejection fraction reached on the order of 99.996%, which illustrates why defense in depth is standard practice, not an optional refinement.

Summary

RFI is human-made contamination; it can exceed weak cosmic signals by orders of magnitude and mimic SETI-like narrowband and drifting features.
Taxonomy (broadband, narrowband, pulsed, persistent, moving) guides mitigation choice.
Classical tools—frequency/time blanking, median baselines, ON/OFF including ABACAD-style cadences—remain important but imperfect.
Kurtosis summarizes tail behavior relative to a Gaussian; spectral kurtosis applies that per channel over time (Nita & Gary 2010).
Fixed SK cuts fail across diverse BL files; median + MAD thresholds adapt per observation, with per-channel median normalization before SK.
Flagged data should be median-filled, not zeroed, to avoid spectral artifacts.
v0.2.0's 27-entry RFI catalog gives fast labels for common bands.
ON/OFF filtering with ≥2 of 3 ON consensus and OFF rejection uses pointing to separate Earth from sky.
Layered filtering is mandatory for high rejection rates without sacrificing the rare real candidate.

References

Nita, G. M., & Gary, D. E. (2010). The generalized spectral kurtosis estimator. Monthly Notices of the Royal Astronomical Society, 406(1), 60–72. https://academic.oup.com/mnras/article-abstract/406/1/60/986632
