Chapter 04

Radio Frequency Interference & Spectral Kurtosis Filtering

Detecting and excising human-made contamination with adaptive statistical tests, known-RFI catalogs, and ON/OFF cadence logic.

Author: Saman Tabatabaeian — Deep Field Labs MitraSETI Tutorial Series

You already read spectrograms as images of power versus time and frequency, and you know that de-Doppler search integrates along lines of constant drift to find narrowband carriers. This chapter addresses the obstacle that dominates every sensitive wideband SETI pipeline: radio frequency interference (RFI)—human-made emission that masquerades as science. We define RFI and its taxonomy, review classical mitigation tricks and their limits, then build spectral kurtosis (SK) from basic statistics. Finally we connect adaptive thresholds, known-RFI catalogs, and ON/OFF cadence into the layered strategy MitraSETI uses in practice.

1. What Is RFI (Radio Frequency Interference)?

Radio frequency interference is any human-made radio signal that contaminates astronomical data. It is not a calibration bug or a software glitch: it is real electromagnetic energy from transmitters, digital electronics, and power systems, captured by the same antennas and backends that record cosmic emission.

Where RFI comes from

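Modern civilization floods the spectrum. Representative sources include satellite downlinks, GPS, radar, broadcast and communication transmitters, Wi‑Fi and other digital links, aircraft and vehicles, power lines, and the digital electronics and local oscillators at the observatory itself.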

None of these need to be "pointed at" the telescope. Side lobes, scattering, cable pickup, and site electronics couple interference into the data stream.

Scale of the problem

RFI is often vastly stronger than the astrophysical or technosignature-like signals we hope to detect. Order-of-magnitude arguments are common in the literature: terrestrial leakage can exceed weak cosmic carriers by factors of 10⁶ or more in power, depending on band, site, and integration. That is why sensitivity alone is not enough—you must reject enormous numbers of false paths before human review is tractable.

Why RFI is the number-one enemy of SETI

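Narrowband SETI searches look for concentrated power in frequency, sometimes drifting in time because of Doppler. RFI often looks exactly the same: transmitters and leaking oscillators produce narrowband carriers, and satellites or aircraft add Doppler drift as their line-of-sight velocity changes.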

For these reasons, most threshold crossings are not E.T. They are RFI, instrumental effects, or rare astrophysical narrowband emitters. RFI is the default explanation for a candidate until observational design and layered filters say otherwise.

2. Types of RFI

RFI is diverse. Grouping it by morphology helps you choose mitigations.

Figure 4.1: RFI taxonomy. Broadband RFI spans many channels in short bursts; narrowband RFI occupies one channel and persists; pulsed/intermittent RFI appears as on-off bursts or radar sweeps; moving-source RFI (for example a satellite pass) shows Doppler drift. Persistent RFI can be any of these types, present for the whole observation. Each type has a distinct spectrogram signature that guides mitigation: broadband → time blanking, narrowband → frequency blanking, moving → geometry checks.

Broadband RFI

Broadband interference affects many frequency channels at once. Examples include arcing, some digital buses, and power-line events. On a spectrogram it often appears as broad horizontal features (elevated power across much of the band for a short time) or a raised noise floor over many channels.

Narrowband RFI

Narrowband RFI sits in one channel or a small cluster of channels—classic carrier-like emission from a radio transmitter, a GPS line, or a local oscillator leaking at a fixed offset. Visually it is a vertical or slowly wandering ridge at nearly constant frequency.

Pulsed and intermittent RFI

Intermittent RFI turns on and off: radar rotations, Wi‑Fi beacon and data frames, frequency-hopping links, and devices that transmit in bursts. The waterfall shows dashes, dots, or regular flashes rather than a solid line.

Persistent RFI

Persistent RFI is always present for the duration of the observation: local oscillator leakage, a continuous broadcast, or a nearby always-on transmitter. It may be narrowband or structured, but it does not "go away" when you wait a few seconds.

Moving-source RFI

Moving-source RFI comes from platforms whose line-of-sight velocity changes: satellite passes, aircraft, and sometimes automobiles. The observed frequency drifts in time for the same reason a Doppler-shifted siren changes pitch—so drift alone does not prove an extraterrestrial origin. This is one reason de-Doppler search must be paired with pointing logic, multi-beam tests, or catalog checks, not used in isolation.

3. Traditional RFI Mitigation

Before spectral kurtosis and machine-learning stacks, astronomers relied on simpler tools. Each has a place; each has clear failure modes.

Frequency blanking

Frequency blanking means zeroing or masking channels (or whole sub-bands) known to be bad—GPS bands, radar allocations, and so on. It is simple and fast, but crude: you lose sensitivity everywhere those frequencies might have carried science, and static masks do not track new services or drifting leaks.
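A minimal NumPy sketch of static frequency blanking follows. It assumes the spectrogram is a (time, frequency) array and that bad bands are listed as (low, high) edges in MHz; the single band below is an illustrative window around the GPS L1 carrier at 1575.42 MHz, not an entry from any MitraSETI mask. It uses a masked array rather than literal zeros so the blanked channels drop out of later averages.

```python
import numpy as np

def frequency_blank_mask(freqs_mhz, bad_bands_mhz):
    """Boolean mask over channels: True where a channel lies inside any bad band.

    bad_bands_mhz is a hypothetical list of (low, high) edges in MHz.
    """
    freqs = np.asarray(freqs_mhz, dtype=float)
    mask = np.zeros(freqs.shape, dtype=bool)
    for low, high in bad_bands_mhz:
        mask |= (freqs >= low) & (freqs <= high)
    return mask

# Toy spectrogram: 4 time steps x 8 channels at 1570, 1572, ..., 1584 MHz.
freqs = np.linspace(1570.0, 1584.0, 8)
spec = np.ones((4, 8))

# Illustrative 2 MHz window around GPS L1 (1575.42 MHz); the edges are assumptions.
chan_mask = frequency_blank_mask(freqs, [(1574.42, 1576.42)])

# Mask every time step in the flagged channels instead of writing zeros.
full_mask = np.repeat(chan_mask[np.newaxis, :], spec.shape[0], axis=0)
blanked = np.ma.masked_array(spec, mask=full_mask)
mean_spectrum = blanked.mean(axis=0)  # masked channels stay masked in the average
```

The static list is the weakness the text describes: a new service or a drifting leak outside the listed edges passes straight through.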

Time blanking

Time blanking discards time steps when total power or some statistic exceeds a threshold—useful during lightning, radar sweeps, or someone opening a microwave door behind the control room. Again, you lose data; aggressive blanking can chop long integrations needed for weak signals.
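Time blanking can be sketched the same way. This version sums power across the band at each time step and drops steps whose total exceeds a robust threshold (median plus n_sigma times a MAD-based σ). The statistic, the 5σ default, and the standard-deviation fallback are illustrative choices, not a documented MitraSETI setting.

```python
import numpy as np

def time_blank(spec, n_sigma=5.0):
    """Drop time steps whose band-summed power is a robust outlier.

    spec: 2-D array shaped (time, frequency).
    Returns (kept_rows, bad) where bad is a boolean mask over time steps.
    """
    total = spec.sum(axis=1)                         # total power per time step
    med = np.median(total)
    sigma = 1.4826 * np.median(np.abs(total - med))  # Gaussian-equivalent scale from the MAD
    if sigma == 0:
        sigma = total.std()                          # degenerate fallback (assumption)
    bad = total > med + n_sigma * sigma
    return spec[~bad], bad

rng = np.random.default_rng(0)
spec = rng.exponential(1.0, size=(100, 64))  # noise-like power, 100 steps x 64 channels
spec[42] += 20.0                             # simulated broadband burst, e.g. a radar sweep
kept, bad = time_blank(spec)
```

Every discarded row is lost integration time, which is why aggressive thresholds hurt weak-signal searches.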

Median filtering (along time or frequency)

Subtracting the median per channel (or per time slice) removes slowly varying DC offsets and some standing-wave structure. It is robust compared to the mean. It does not by itself remove pulsed RFI that departs from a smooth baseline, and it can distort real broad astronomical structure if applied blindly.
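The sketch below subtracts each channel's median over time from a simulated (time, frequency) array; the bandpass shape and noise level are synthetic. It also injects a one-sample pulse to show the limitation noted above: the median barely moves, so the pulse survives the subtraction.

```python
import numpy as np

def subtract_channel_median(spec):
    """Remove each channel's median over time (spec shaped (time, frequency))."""
    return spec - np.median(spec, axis=0, keepdims=True)

rng = np.random.default_rng(1)
bandpass = np.linspace(5.0, 15.0, 32)                   # slowly varying gain across the band
spec = bandpass + rng.normal(0.0, 1.0, size=(200, 32))  # noise on top of the bandpass
spec[10, 5] += 50.0                                     # a short pulse in one channel

flat = subtract_channel_median(spec)
# The bandpass is gone, but flat[10, 5] is still about 50: the pulse is untouched.
```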

ON/OFF cadence (reference sky)

A powerful observational strategy: point at the target (ON), then at a reference position on the sky (OFF) with the same equipment and duration. Astrophysical emission from the target direction should appear in ON but not in OFF (assuming the reference is empty at that resolution). Terrestrial RFI often appears in both, because it enters through the environment and side lobes, not from the distant source alone.

Fun Fact — ABACAD Cadence

Breakthrough Listen often uses extended cadences such as ABACAD, where A is the ON target and B, C, and D are three distinct OFF pointings rather than one repeated reference position. The six-scan pattern ON → OFF → ON → OFF → ON → OFF accumulates three ON realizations while sampling several reference fields, which improves robustness against structured sky emission in any single OFF.

4. What Is Kurtosis?

Spectral kurtosis is built on moments of a distribution—quantities that summarize shape beyond the average.

Moments: mean through kurtosis

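Draw a sample x₁, x₂, …, xₙ (for example, power measurements in one frequency channel). The first four moments summarize its shape:

  1. Mean: μ = (1/n) Σ xᵢ
  2. Variance: σ² = (1/n) Σ (xᵢ − μ)²
  3. Skewness: (1/n) Σ (xᵢ − μ)³ / σ³, a measure of asymmetry
  4. Kurtosis: κ = (1/n) Σ (xᵢ − μ)⁴ / σ⁴, a measure of tailedness

Textbooks disagree on normalization constants (for example 1/n versus 1/(n − 1)); the important idea is the comparison to a Gaussian.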

Key Concept — What Is Kurtosis?

Kurtosis is the 4th-moment statistic that measures the tailedness of a distribution. For a Gaussian, raw kurtosis equals 3 (excess kurtosis = 0). Values above 3 indicate heavy tails (frequent outliers); values below 3 indicate light tails (fewer extreme values than a Gaussian, as in flat-topped or clamped data). In SETI, RFI forces one of these departures channel by channel.

Kurtosis and the Gaussian

For a normal (Gaussian) distribution, one common convention gives excess kurtosis = 0 (and raw kurtosis = 3). Elsewhere you may see only "kurtosis" without the word "excess"—always check which definition is used.
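To make the definitions concrete, this NumPy sketch computes the four moments with the 1/n normalization and reports both raw and excess kurtosis. It checks them on large synthetic samples whose theoretical raw kurtosis is known: 3 for a Gaussian, 1.8 for a uniform distribution (light tails), and 6 for a Laplace distribution (heavy tails).

```python
import numpy as np

def moments(x):
    """Return mean, variance, skewness, raw kurtosis, excess kurtosis (1/n normalization)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    d = x - mu
    var = np.mean(d**2)
    skew = np.mean(d**3) / var**1.5
    raw_kurt = np.mean(d**4) / var**2               # 3 for a Gaussian
    return mu, var, skew, raw_kurt, raw_kurt - 3.0  # excess: 0 for a Gaussian

rng = np.random.default_rng(42)
n = 200_000
samples = {
    "gaussian": rng.normal(size=n),        # theory: raw kurtosis 3
    "uniform": rng.uniform(-1.0, 1.0, n),  # theory: 1.8 (light tails)
    "laplace": rng.laplace(size=n),        # theory: 6 (heavy tails)
}
raw_kurtosis = {name: moments(s)[3] for name, s in samples.items()}
for name, k in raw_kurtosis.items():
    print(f"{name:8s} raw kurtosis = {k:.2f}")
```

Subtract 3 from each value to get the excess convention, which is what many statistics libraries report by default.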

Figure 4.2: Kurtosis and tail behavior compared to a Gaussian (probability versus value). The Gaussian (blue, κ = 3) is the reference. Heavy-tailed distributions (yellow, κ > 3) produce rare extreme outliers, the signature of bursty RFI spikes. Light-tailed distributions (green, κ < 3) indicate clamped or saturated signals.

Intuition

Imagine a bell curve. Kurtosis asks: "Compared to this bell, how often do extreme values show up?" If very large or very small values occur more often than Gaussian statistics predict, kurtosis is high. If the distribution is flat-topped with short tails, as when values are clamped at a limit, kurtosis is low. Kurtosis does not depend on overall width: a narrow Gaussian and a wide Gaussian both have κ = 3. RFI often forces one of these departures in time, channel by channel.

5. Spectral Kurtosis (SK) — Nita & Gary (2010)

Spectral kurtosis applies that idea per frequency channel: for each channel, you have a time series of power estimates S₁, S₂, …, Sₘ from successive integrations. You ask whether those Sᵢ behave like the power of stationary Gaussian noise, which is the reference the estimator is calibrated against.

Per-channel test

There are N_f channels; SK is computed independently for each—no mixing across frequency in the basic formulation. Channels dominated by stationary Gaussian-like noise should yield SK near a reference value (in the radio-SK literature, calibrated estimators map Gaussian noise to SK ≈ 1 for the normalized form used in pipelines).

Interpreting deviations

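SK near the Gaussian reference value of 1 means the channel looks like stationary noise. Values well below 1 point to constant or saturated power, and values well above 1 point to intermittent or impulsive RFI. SK is therefore a statistical anomaly detector in the time direction, applied separately to each spectral bin.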

Estimator

Let M be the number of time steps and Sᵢ the power in the channel at step i. The ratio form used in teaching and in many derivations is:

Key Concept — SK Estimator Formula

The basic spectral kurtosis estimator:

SK = (M × Σ Sᵢ²) / (Σ Sᵢ)² − 1

This compares the second moment of the powers to the square of the mean—a building block for detecting non-stationarity in the time series Sᵢ.

MitraSETI follows Nita & Gary (2010) and applies a finite-sample correction:

SK = ((M + 1) / (M − 1)) × ( (M × Σ Sᵢ²) / (Σ Sᵢ)² − 1 )

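The prefactor (M + 1)/(M − 1) corrects the small-sample bias and approaches 1 as M grows. When each Sᵢ is a single, unaveraged power spectrum, Gaussian noise gives SK ≈ 1, constant-power (saturated) channels give SK ≈ 0, and impulsive RFI gives SK ≫ 1.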
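The two estimators above translate directly into NumPy. This sketch computes SK for every channel of a (time, frequency) array at once, following the formulas in this section with one unaveraged power spectrum per time step; it is an illustration, not MitraSETI's compute_spectral_kurtosis() itself. The synthetic channels are exponentially distributed noise (the power of complex Gaussian noise), a constant-power channel, and a channel with periodic bursts.

```python
import numpy as np

def spectral_kurtosis(power, corrected=True):
    """Per-channel SK for a (time, frequency) array of power samples.

    Basic form: M * sum(S^2) / sum(S)^2 - 1.
    corrected=True multiplies by (M + 1)/(M - 1), the finite-sample factor.
    """
    s = np.asarray(power, dtype=float)
    m = s.shape[0]              # number of time steps M
    s1 = s.sum(axis=0)          # sum of S_i in each channel
    s2 = (s**2).sum(axis=0)     # sum of S_i^2 in each channel
    sk = m * s2 / s1**2 - 1.0
    if corrected:
        sk *= (m + 1) / (m - 1)
    return sk

rng = np.random.default_rng(7)
M = 512
power = rng.exponential(1.0, size=(M, 3))  # channel 0: Gaussian-noise power
power[:, 1] = 2.0                          # channel 1: constant (saturated) power
power[::64, 2] += 40.0                     # channel 2: bursts every 64 steps

sk = spectral_kurtosis(power)
```

Channel 0 lands near 1, channel 1 is exactly 0, and channel 2 comes out far above 1, matching the three cases listed above.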

6. The Problem with Fixed Thresholds

A naive policy: flag any channel with SK < 0.8 or SK > 1.2 (numbers chosen for illustration).

Warning — Fixed Thresholds Fail

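Fixed SK cuts fail on real observatories because the SK distribution of clean channels changes from file to file. Its center shifts when each time step averages several raw spectra (with the estimator above, clean noise then sits near 1/N for N averaged spectra, not near 1), its spread depends on the number of time steps M, and each backend and band has its own noise statistics.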

One size does not fit all. Thresholds must adapt to the empirical distribution of SK in each chunk or file.
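A small simulation shows the failure. It assumes each time step is the average of n_avg raw power spectra of pure noise, applies the same SK formula as above, and counts how many clean channels the illustrative fixed cut [0.8, 1.2] would flag.

```python
import numpy as np

def sk(power):
    """SK with the (M + 1)/(M - 1) factor; power is shaped (time, frequency)."""
    m = power.shape[0]
    ratio = m * (power**2).sum(axis=0) / power.sum(axis=0) ** 2
    return (m + 1) / (m - 1) * (ratio - 1.0)

rng = np.random.default_rng(3)
M, n_chan = 256, 500
results = {}
for n_avg in (1, 16):
    # Assumption: each time step is the average of n_avg raw noise power spectra.
    clean = rng.exponential(1.0, size=(M, n_avg, n_chan)).mean(axis=1)
    values = sk(clean)
    flagged = np.mean((values < 0.8) | (values > 1.2))  # illustrative fixed cut
    results[n_avg] = (float(np.median(values)), float(flagged))
    print(f"n_avg={n_avg:2d}  median SK={results[n_avg][0]:.3f}  flagged={results[n_avg][1]:.1%}")
```

With 16 averaged spectra per step, clean channels cluster near 1/16, so the fixed band flags every one of them. Even with unaveraged spectra and M = 256, random scatter pushes a noticeable fraction of clean channels outside [0.8, 1.2].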

7. MitraSETI's Adaptive Thresholds

MitraSETI (MitraSETIPipeline.compute_spectral_kurtosis() in pipeline.py) uses robust location and scale on the vector of per-channel SK values.

Figure 4.3: Adaptive SK thresholds, median(SK) ± N·σ, plotted as SK value versus frequency channel index. The median and the MAD-derived σ shift with each file's noise statistics, so the acceptance band catches outlier channels (pink, flagged as RFI) while keeping clean channels (green) inside the band.

Median and MAD

  1. Compute SK for every channel.
  2. Let median(SK) be the median of those values—more robust than the mean when many channels are contaminated.
  3. Compute MAD = median( |SKᵢ − median(SK)| ) — the median absolute deviation.
  4. Convert to a Gaussian-equivalent scale: σ = 1.4826 × MAD (the constant maps MAD to the standard deviation of a normal distribution).

Thresholds

Lower = median(SK) − N × σ,   Upper = median(SK) + N × σ,   with default N = 3 (_SK_N_SIGMA = 3.0).

Channels outside the band are flagged. If MAD is zero (degenerate case), the implementation falls back to the standard deviation of SK as a scale estimate.
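The four steps and the fallback can be sketched as one function. This is a minimal illustration of the procedure described above, not MitraSETI's code; SK_N_SIGMA is a stand-in for the _SK_N_SIGMA constant, and the SK values in the example are invented.

```python
import numpy as np

SK_N_SIGMA = 3.0  # stand-in for the default N = 3

def adaptive_sk_flags(sk_values, n_sigma=SK_N_SIGMA):
    """Flag channels whose SK lies outside median(SK) +/- n_sigma * sigma.

    sigma = 1.4826 * MAD; if MAD is zero, fall back to the standard deviation.
    Returns (flags, lower, upper).
    """
    sk_values = np.asarray(sk_values, dtype=float)
    med = np.median(sk_values)
    mad = np.median(np.abs(sk_values - med))
    sigma = 1.4826 * mad if mad > 0 else float(np.std(sk_values))
    lower, upper = med - n_sigma * sigma, med + n_sigma * sigma
    flags = (sk_values < lower) | (sk_values > upper)
    return flags, lower, upper

# Seven well-behaved channels, one bursty channel (8.0), one saturated channel (0.1).
sk_values = np.array([1.00, 0.98, 1.03, 0.97, 1.02, 1.01, 0.99, 8.0, 0.1])
flags, lower, upper = adaptive_sk_flags(sk_values)
```

Because the median and MAD ignore the two extreme channels, the band stays tight (about 0.91 to 1.09 here) and both outliers are flagged; a mean and standard deviation would have been inflated by the 8.0.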

Per-channel normalization (bandpass removal)

Before SK, each channel's power is divided by its median over time. That removes slow bandpass shape and gain differences between channels so SK reflects temporal statistics rather than absolute calibration. Small powers are floored to avoid division by zero.
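A minimal sketch of the normalization step, assuming a (time, frequency) array. The text does not spell out exactly where the floor is applied; this version floors the per-channel divisor, and the value 1e-12 is illustrative.

```python
import numpy as np

def normalize_bandpass(spec, floor=1e-12):
    """Divide each channel of a (time, frequency) array by its median over time.

    Assumption: the divisor is floored at `floor` so dead channels do not cause
    division by zero.
    """
    med = np.median(spec, axis=0)
    return spec / np.maximum(med, floor)[np.newaxis, :]

def sk(power):
    """SK with the (M + 1)/(M - 1) factor, used here only to compare before and after."""
    m = power.shape[0]
    return (m + 1) / (m - 1) * (m * (power**2).sum(axis=0) / power.sum(axis=0) ** 2 - 1.0)

rng = np.random.default_rng(5)
gains = np.array([1.0, 10.0, 100.0, 0.0])  # last channel is dead (all zeros)
spec = rng.exponential(1.0, size=(128, 4)) * gains
norm = normalize_bandpass(spec)

print(np.median(norm, axis=0))                        # about 1 for live channels, 0 for the dead one
print(np.allclose(sk(spec[:, :3]), sk(norm[:, :3])))  # constant gains cancel in the SK ratio
```

The last line confirms a property worth knowing: a gain that is constant in time cancels inside the SK ratio, so the normalization changes each channel's level but leaves its SK value untouched.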

8. What Happens After Flagging?

Flagged channels are not zeroed. MitraSETI replaces them with the per-channel median along time (the same column median concept: for each flagged frequency bin, substitute the median power in that bin across time).

Key Concept — Why Median-Fill, Not Zero-Fill

Hard zeroing creates sharp spectral notches. Downstream FFT-based processing and de-Doppler integration can ring and scatter energy into neighboring bins, producing artifacts that look like narrowband structure—false detections near band edges and notches.

Median replacement keeps the noise floor statistically more consistent while excising the corrupted samples. If more than 50% of channels would be flagged, the mask is skipped and a warning is logged—better to retain data than to obliterate the band when SK fails catastrophically.
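The replacement rule and its safety valve fit in a few lines. This sketch assumes a (time, frequency) array plus one boolean flag per channel, and replaces every sample in a flagged channel with that channel's median over time. The function name, logger name, and max_fraction parameter are hypothetical stand-ins for the behavior described above.

```python
import logging

import numpy as np

logger = logging.getLogger("sk_fill_sketch")  # hypothetical logger name

def median_fill(spec, flags, max_fraction=0.5):
    """Replace flagged channels with their own median over time.

    If more than max_fraction of channels are flagged, log a warning and
    return an unmodified copy instead of applying the mask.
    """
    flags = np.asarray(flags, dtype=bool)
    if flags.mean() > max_fraction:
        logger.warning("SK flagged %.0f%% of channels; skipping mask", 100 * flags.mean())
        return spec.copy()
    out = spec.copy()
    col_median = np.median(spec, axis=0)  # per-channel median along time
    out[:, flags] = col_median[flags]     # broadcast each median down its column
    return out

spec = np.array([[1.0, 2.0, 3.0],
                 [1.0, 50.0, 3.0],   # burst in channel 1
                 [1.0, 2.0, 3.0]])
filled = median_fill(spec, [False, True, False])
```

The burst channel becomes a flat column at its median (2.0) instead of a zero-valued notch, so neighbouring bins see no sharp edge.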

9. Known RFI Database (MitraSETI v0.2.0)

Statistics cannot encode allocated services and common false positives by name. MitraSETI v0.2.0 ships a catalog (catalog/rfi_database.py) of 27 terrestrial RFI entries with center frequency, bandwidth, and metadata.

Fun Fact — 27-Entry RFI Database

MitraSETI v0.2.0 ships a curated catalog of 27 known terrestrial RFI sources. When a candidate's frequency falls inside a catalog entry, the pipeline labels it immediately as a known terrestrial service—complexity is O(N_candidates × 27), trivial at scale. This catches the most common false positives fast, but unknown or mis-cataloged emitters still require other layers.
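The lookup itself is a simple interval test. In this sketch each entry stores a center frequency and bandwidth, as the catalog description says, and every candidate is checked against every entry, giving the O(N_candidates × catalog size) cost mentioned above. The two entries (GPS L1 at 1575.42 MHz and L2 at 1227.60 MHz) use illustrative 2 MHz widths and are not the contents of catalog/rfi_database.py.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RFIEntry:
    name: str
    center_mhz: float
    bandwidth_mhz: float

    def contains(self, freq_mhz: float) -> bool:
        half = self.bandwidth_mhz / 2.0
        return self.center_mhz - half <= freq_mhz <= self.center_mhz + half

# Illustrative entries only; widths are assumptions.
CATALOG = [
    RFIEntry("GPS L1 (illustrative width)", 1575.42, 2.0),
    RFIEntry("GPS L2 (illustrative width)", 1227.60, 2.0),
]

def label_candidates(freqs_mhz, catalog=CATALOG):
    """Return the first matching entry name for each candidate, or None."""
    return [next((e.name for e in catalog if e.contains(f)), None) for f in freqs_mhz]

labels = label_candidates([1575.0, 1420.4, 1227.9])
```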

Examples you will see cross-matched against candidates include:

10. ON/OFF Cadence Filtering (Detailed)

Breakthrough Listen observing pattern

A typical six-scan cadence is:

A → B → A → C → A → D, that is ON → OFF → ON → OFF → ON → OFF: three ON-target (A) integrations alternate with three OFF integrations on reference fields B, C, and D.

Scan roles are often parsed from standardized filenames (GUPPI-style names or the target_ON_#.fil / target_OFF_#.fil convention).
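For the simple target_ON_#.fil / target_OFF_#.fil convention, grouping and ordering scans takes one regular expression. The pattern, the function names, and the target name HIP1234 are hypothetical; GUPPI-style names carry more metadata and would need a different parser.

```python
import re
from collections import defaultdict

# Hypothetical pattern for "<target>_ON_<n>.fil" / "<target>_OFF_<n>.fil".
SCAN_NAME = re.compile(r"^(?P<target>.+)_(?P<role>ON|OFF)_(?P<index>\d+)\.fil$")

def group_scans(filenames):
    """Group filenames by target into index-sorted ON and OFF lists."""
    groups = defaultdict(lambda: {"ON": [], "OFF": []})
    for name in filenames:
        match = SCAN_NAME.match(name)
        if match is None:
            continue  # not part of a cadence
        groups[match["target"]][match["role"]].append((int(match["index"]), name))
    return {
        target: {role: [name for _, name in sorted(scans)] for role, scans in roles.items()}
        for target, roles in groups.items()
    }

def interleave(on_scans, off_scans):
    """Order scans as ON, OFF, ON, OFF, ... (the A B A C A D pattern)."""
    order = []
    for on, off in zip(on_scans, off_scans):
        order += [on, off]
    return order

files = ["HIP1234_OFF_2.fil", "HIP1234_ON_1.fil", "HIP1234_ON_3.fil",
         "HIP1234_OFF_1.fil", "HIP1234_ON_2.fil", "HIP1234_OFF_3.fil", "notes.txt"]
cadence = group_scans(files)["HIP1234"]
print(interleave(cadence["ON"], cadence["OFF"]))
```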

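Physical discrimination

A signal from the target direction should appear in the ON scans and vanish in the OFF scans, because pointing away removes the source from the beam. Terrestrial RFI enters through the local environment and side lobes, so it typically persists in both ON and OFF regardless of where the telescope points.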

MitraSETI implementation

The scripts/cadence_filter.py tool:

  1. Parses Breakthrough Listen–style filenames and groups scans by target.
  2. Identifies ON and OFF sequences.
  3. Runs de-Doppler (or consumes candidates) per file.
  4. Cross-matches hits by frequency and drift rate within tunable tolerances.
  5. Enforces multi-ON consensus: by default a signal must appear in at least two of the three ON scans with matching frequency and drift (CLI: --min-on, default 2).
  6. Compares against OFF: signals that also match OFF detections are treated as RFI (direction-independent in practice).

This stage uses how the telescope was pointed—information no in-file SK statistic can replace.
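Steps 4 to 6 can be sketched in plain Python. This is not scripts/cadence_filter.py: the Hit record, the tolerance values, and the function names are hypothetical, and the sketch compares frequencies directly, ignoring the offset a drifting signal accumulates between scans, which a real implementation has to account for. min_on mirrors the --min-on default of 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    freq_mhz: float    # start frequency of the detected signal
    drift_hz_s: float  # fitted drift rate

def matches(a, b, freq_tol_mhz=1e-3, drift_tol_hz_s=0.1):
    """True if two hits agree within the (illustrative) tolerances."""
    return (abs(a.freq_mhz - b.freq_mhz) <= freq_tol_mhz
            and abs(a.drift_hz_s - b.drift_hz_s) <= drift_tol_hz_s)

def cadence_filter(on_scans, off_scans, min_on=2):
    """Keep signals seen in at least min_on ON scans and in no OFF scan.

    on_scans / off_scans: one list of Hit objects per scan.
    Returns one representative Hit per surviving signal.
    """
    survivors = []
    for i, scan in enumerate(on_scans):
        for hit in scan:
            if any(matches(hit, kept) for kept in survivors):
                continue  # already accepted from an earlier ON scan
            n_on = 1 + sum(
                any(matches(hit, other) for other in on_scans[j])
                for j in range(len(on_scans)) if j != i
            )
            in_off = any(matches(hit, h) for off in off_scans for h in off)
            if n_on >= min_on and not in_off:
                survivors.append(hit)
    return survivors

on = [
    [Hit(1420.0, 0.5), Hit(1500.0, 0.0)],      # ON 1
    [Hit(1420.0002, 0.52), Hit(1500.0, 0.0)],  # ON 2
    [Hit(1600.0, 1.0)],                        # ON 3
]
off = [[Hit(1500.0, 0.0)], [], []]             # 1500 MHz also appears in OFF
print(cadence_filter(on, off))
```

The 1500 MHz signal is rejected because it also shows up in an OFF scan, the 1600 MHz signal fails the two-of-three ON consensus, and only the 1420 MHz signal survives.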

11. Layered Defense — Why One Method Is Not Enough

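No single test closes the book on RFI. Spectral kurtosis sees only the temporal statistics inside one file, the known-RFI database recognizes only emitters it already lists, and cadence filtering relies on the pointing pattern rather than on any in-file statistic.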

Figure 4.4: MitraSETI's layered RFI defense. Raw spectrogram data pass through four layers: (1) spectral kurtosis with adaptive median ± N·σ thresholds rejects bursts; (2) the 27-entry known-RFI database labels known services; (3) ON/OFF cadence (≥2/3 ON consensus, OFF rejection) uses pointing; (4) clustering, ML classifiers, and OOD scoring deduplicate and rank. Roughly 0.004% of detections survive as vetted candidates. Each layer catches failure modes the others miss.

Together, these layers (plus additional pipeline stages such as clustering and learned classifiers in the full MitraSETI stack) complement one another: each layer catches failure modes the others miss. In internal benchmarks over 100 Breakthrough Listen files, the combined RFI rejection fraction reached on the order of 99.996%—illustrating why defense in depth is standard practice, not an optional refinement.
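The arithmetic behind that figure is worth spelling out. In this sketch each layer is described by a conditional pass fraction (the share of the previous layer's survivors it keeps), so the overall fraction is simply the product. The per-layer numbers are invented for illustration and are not MitraSETI benchmarks.

```python
def surviving(n_hits, pass_fractions):
    """Hits left after sequential layers.

    Each entry is the fraction of the previous layer's survivors that the next
    layer lets through, so the overall pass fraction is their product.
    """
    remaining = float(n_hits)
    for fraction in pass_fractions:
        remaining *= fraction
    return remaining

# 99.996% rejected means 0.004% pass: 40 of every million initial hits.
overall = surviving(1_000_000, [1.0 - 0.99996])
print(round(overall))  # 40

# Hypothetical per-layer pass fractions (NOT MitraSETI measurements): four
# modest layers compound to the same 4e-5 overall pass fraction.
layered = surviving(1_000_000, [0.05, 0.2, 0.1, 0.04])
```

No single layer here rejects more than 96% of what it sees, yet together they reach the 99.996% level, which is the practical argument for defense in depth.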

Summary

References

Try it in the Cloud

RFI filtering runs automatically on every upload. See how your observation file is cleaned before classification.
