A divide-and-conquer structure that computes de-Doppler sums in O(N log N) work instead of O(N²) — the engine inside MitraSETI.
Prerequisite: Doppler Effect and De-Doppler Search (spectrograms, drift, brute-force integration along diagonals)
You already know how brute-force de-Doppler works: for each trial drift rate, walk through time and add up the normalised power along the predicted frequency trajectory. That is correct, easy to reason about, and far too slow when the number of drift trials and time steps grow. This chapter explains the Taylor tree—the divide-and-conquer structure MitraSETI uses (by default) to compute essentially the same family of integrated sums in O(N log N) work per frequency channel instead of O(N²).
Call Nt the number of time rows, Nf the number of frequency channels, and Nd the number of discrete drift rates you try (typically comparable to the number of channel offsets you can accumulate over the full observation). Brute force loops over every drift and every starting channel, and for each combination touches about Nt spectrogram cells. When Nd is on the order of Nt (one channel per full observation is a natural resolution), the scaling is effectively O(Nt² × Nf). Double the observation length (more time steps and a proportionally finer drift grid) and you pay four times the work in the worst case.
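To make the cost concrete, here is a minimal NumPy sketch of the brute-force search under the staircase model (one rounded channel offset per time step). The function name and layout are illustrative, not MitraSETI's API:

```python
import numpy as np

def brute_force_dedoppler(spec, max_drift):
    """O(Nd x Nt x Nf): for every integer total drift d (channels over the
    whole observation) and every start channel f, walk down the rounded
    staircase trajectory and sum the power. Out-of-band cells contribute 0."""
    n_t, n_f = spec.shape
    out = np.zeros((max_drift + 1, n_f))
    for d in range(max_drift + 1):
        for f in range(n_f):
            total = 0.0
            for t in range(n_t):
                # offset after t steps for a total drift of d channels
                # spread over the n_t - 1 internal steps
                off = round(t * d / (n_t - 1)) if n_t > 1 else 0
                if f + off < n_f:
                    total += spec[t, f + off]
            out[d, f] = total
    return out
```

The triple loop is the quadratic term made visible: the two outer loops are Nd × Nf trajectories, and every trajectory re-walks all Nt rows from scratch.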
Adjacent discrete drifts are not independent random walks through the array. In the staircase model (one channel offset per time step, rounded), two drifts that differ by the smallest step agree on every time row except possibly the last: they follow the same path until the final increment. More generally, many long trajectories share long prefixes. Brute force recomputes those prefixes from scratch for every drift trial.
So the asymptotic bottleneck is not "physics is hard"—it is duplicated summation. The Taylor tree removes that duplication.
Imagine ranking teams. A round-robin league plays every pair of teams against each other, so the number of matches grows quadratically. A knockout tournament instead reuses results: winners of early rounds meet in later ones, and the number of rounds grows only logarithmically. The Taylor tree is the second idea applied to sums along time, not sports. Instead of recomputing every full-length diagonal independently, you combine smaller chunks whose answers you already know.
Computing running totals 1 + 2 + 3 + … is faster if you remember the previous total: each new value is previous + next. The tree generalises that idea to two-dimensional structure: at each level you store not one running total but a small bundle of totals corresponding to how much the trajectory "bent" inside each sub-interval.
Partition the time axis into halves, then quarters, then eighths. At the finest level you combine pairs of consecutive time rows. At the next level you combine pairs of pair-results, and so on, until one block spans the entire observation.
At each merge, a binary choice encodes whether the second half's contribution is frequency-shifted relative to the first by a fixed number of channels. Those choices line up with bits of the total drift index—exactly the FFT butterfly story, but with addition and index shifts instead of complex twiddles.
An integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree.
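As a quick sanity check on that indexing, the bits can be unpacked and the per-depth offsets re-summed. These are hypothetical helpers for illustration, not part of the MitraSETI code:

```python
def drift_bits(d, n_layers):
    """Bit k of the drift index says whether the 2**k-channel offset
    was applied to the right-hand sub-tree at merge depth k."""
    return [(d >> k) & 1 for k in range(n_layers)]

def total_offset(bits):
    """Reassemble the total channel drift from its per-depth offsets."""
    return sum(b << k for k, b in enumerate(bits))

bits = drift_bits(5, 3)  # d = 5 = 1 + 4 -> offsets applied at depths 0 and 2
```

Every drift index round-trips: the offsets applied at each depth sum back to d, which is why row d of the final tree labels a total drift of d channels.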
Throughout, think of Layer 0 as the raw spectrogram: Nt rows indexed by time, Nf columns indexed by frequency. MitraSETI pads Nt up to the next power of two (Npadded) with zeros when needed; the tree is clearest when Nt is already a power of two.
Sign convention: The implementation runs the tree twice—sign = +1 (energy moves to higher channel index over time) and sign = −1 (lower index). Layer-1 shift is one channel in that direction; deeper layers use 2, 4, 8, … channels.
You have Nt rows t0 … tNt−1 and Nf channels f0 … fNf−1.
As a concrete layout, take Nt = 8 and Nf = 8.
Take pairs (t0,t1), (t2,t3), (t4,t5), (t6,t7).
For each pair and each frequency column f, emit two rows:
- data[t0,f] + data[t1,f] (with padding: missing rows count as 0).
- data[t0,f] + data[t1, f + sign] (out-of-band indices contribute 0).

You now have four groups (one per pair), two partial-drift rows each → 8 rows total, still Nf columns. Those rows are not "final drifts" yet; they are local choices inside each 2-step window.
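One pair merge can be sketched in NumPy as follows; `layer1_pair` is a hypothetical helper name, not the MitraSETI API:

```python
import numpy as np

def layer1_pair(row_t0, row_t1, sign=+1):
    """Merge one time-row pair into its two Layer-1 partial-drift rows:
    d=0 adds the rows directly; d=1 first shifts the later row by one
    channel in the drift direction (out-of-band reads count as zero)."""
    n_f = len(row_t0)
    shifted = np.zeros(n_f)
    if sign > 0:
        shifted[:n_f - 1] = row_t1[1:]   # read row_t1 at f + 1
    else:
        shifted[1:] = row_t1[:n_f - 1]   # read row_t1 at f - 1
    return row_t0 + row_t1, row_t0 + shifted  # (d=0 row, d=1 row)
```

For a signal drifting up one channel per step, the d=1 row is the one that captures both samples at the starting channel.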
Inside each merged pair, d=0 / d=1 means "zero or one channel shift inside that pair," not the full-observation drift yet.
Merge the block (t0–t1) with (t2–t3), and separately (t4–t5) with (t6–t7).
Each side brings two rows (local drift 0 or 1). When you merge two halves of length 2, the combined drift index needs two bits → four rows for that merged block:
You can read the merge like a two-digit binary counter. Suppose the left half contributed integer drift_A ∈ {0,1} from its own pair merges and the right half contributed drift_B ∈ {0,1}. The new four outcomes are all pairs (drift_A, drift_B) together with a third binary choice: whether the right block's spectrum is shifted by 2ᵏ before adding, where k is the layer index inside the implementation's loop (here shift = 2 channels). That third choice is the next bit of the global drift index. After the full tree, an integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree. This is exactly the sense in which total drift is assembled from half-sized subproblems—like counting in binary, except each 1 bit carries a physical frequency slip instead of arithmetic weight alone.
After this layer: 2 groups, 4 drift rows each.
Merge the two remaining mega-blocks. The cross-boundary shift is 2² = 4 channels. You obtain 8 rows, indexed d = 0 … 7. Row d, column f, holds the integrated sum along the staircase trajectory of total drift d (in channel units per observation, in the chosen sign direction) that starts at channel f at t0—subject to zeroing when trajectories walk out of the band.
Take Nt = 4, Npadded = 4, n_layers = 2, and sign = +1. Use 8 channels so shifts stay in-band for the columns we care about.
Suppose a signal drifts up one channel per time step, so power sits on a diagonal:
| | f0 | f1 | f2 | f3 | f4 | f5 | f6 | f7 |
|---|---|---|---|---|---|---|---|---|
| t0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| t1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| t2 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
| t3 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 |
The ideal diagonal starting at f0 picks (t0,f0), (t1,f1), (t2,f2), (t3,f3) → sum 1 + 2 + 3 + 4 = 10.
Group (t0,t1) — rows g = 0 → output rows 0 and 1:

- row 0 (local d = 0): data[t0,f] + data[t1,f] → 1 at f0, 2 at f1.
- row 1 (local d = 1): data[t0,f] + data[t1, f+1] → 1 + 2 = 3 at f0.

Group (t2,t3) — rows g = 1 → output rows 2 and 3:

- row 2 (local d = 0): data[t2,f] + data[t3,f] → 3 at f2, 4 at f3.
- row 3 (local d = 1): data[t2,f] + data[t3, f+1] → 3 + 4 = 7 at f2.

So the Layer 1 buffer (only non-zero entries sketched) behaves like:

| row | f0 | f1 | f2 | f3 |
|---|---|---|---|---|
| 0 | 1 | 2 | 0 | 0 |
| 1 | 3 | 0 | 0 | 0 |
| 2 | 0 | 0 | 3 | 4 |
| 3 | 0 | 0 | 7 | 0 |

At Layer 2, n_drifts_prev = 2 and shift_amount = 2. The code combines row blocks {0,1} with {2,3} into 4 final rows d = 0…3.
The interesting cell is starting frequency f0, total drift d = 3 (binary 11): low bit selects row 1 from the first half, high bit applies shift +2 when reading from the second half—so you add row 1 at f0 to row 3 at f0+2 = f2: (1 + 2) + (3 + 4) = 3 + 7 = 10, recovering the ideal diagonal sum.
Rows d = 0,1,2 at f0 give smaller totals; d = 3 is the one that "lines up" with +1 channel per step under this staircase parameterisation. That is exactly what you want from a search grid: the row index is the discrete drift label; peaks appear where the label matches the physical drift (within quantisation).
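Putting the layers together, the whole tree fits in a short NumPy sketch. It follows the merge rule described above (lower bits shared between the two halves, bit k controlling the 2ᵏ-channel shift), assumes Nt is already padded to a power of two, and is an illustration rather than the Rust implementation:

```python
import numpy as np

def taylor_tree(spec, sign=+1):
    """Taylor-tree sketch: returns an (n_t, n_f) array whose row d is the
    integrated sum for discrete drift index d at each start channel.
    Assumes n_t is a power of two (pad with zero rows beforehand)."""
    n_t, n_f = spec.shape
    buf = spec.astype(np.float64)     # astype copies; input left untouched
    shift = 1                         # 2**k channel offset at merge depth k
    while shift < n_t:
        out = np.zeros_like(buf)
        block = 2 * shift             # rows per merged block at this depth
        for start in range(0, n_t, block):
            for j in range(shift):    # lower bits: local drift index j
                a = buf[start + j]            # left half, local drift j
                b = buf[start + shift + j]    # right half, local drift j
                out[start + j] = a + b        # bit k = 0: no shift
                bs = np.zeros(n_f)            # bit k = 1: shifted copy of b,
                if sign > 0:                  # out-of-band reads are zero
                    bs[:n_f - shift] = b[shift:]
                else:
                    bs[shift:] = b[:n_f - shift]
                out[start + shift + j] = a + bs
        buf = out
        shift *= 2
    return buf
```

Running it on the 4×8 example above reproduces row d = 3 peaking at 10 in column f0, while the unshifted row d = 0 at f0 keeps only the t0 power.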
This is the same radix-2 butterfly idea as the FFT:

- pad to a power of two, then run log₂(Npadded) merge stages;
- each stage combines half-sized subproblems in a butterfly pattern, assembling the output index bit by bit;
- the FFT's complex twiddle multiplications become plain additions with integer channel shifts.
Takeaway: shifts double each time you move up a layer: 1, 2, 4, 8, … — the binary decomposition of the total drift index.
For readers who prefer four leaves instead of eight, the same topology is just shorter. The rule is identical at every scale: two children feed four drift combinations at the next rung, then eight, and so on, until the leaf count matches Npadded.
For each layer, the implementation touches on the order of Npadded × Nf array entries (each output cell is one addition and a couple of index checks). Constants depend on how many partial-drift rows are live at that stage, but they are O(1) per cell per layer.
There are L = log₂(Npadded) merge stages in total (after padding to a power of two).
Compare to brute force: O(Nt² × Nf). The Taylor tree replaces the quadratic Nt² term with Nt × log₂(Nt) — an idealised speedup factor of Nt / log₂(Nt).
Brute force: O(Nd × Nt × Nf) with Nd ~ Nt → O(Nt² × Nf).
Taylor tree: O(Nt × log₂(Nt) × Nf) (using Nt ≈ Npadded for big-O intuition).
Idealised speedup factor on the time axis:
| Nt | log₂(Nt) | Nt / log₂(Nt) |
|---|---|---|
| 8 | 3 | 2.7× |
| 16 | 4 | 4.0× |
| 32 | 5 | 6.4× |
| 64 | 6 | 10.7× |
| 128 | 7 | 18.3× |
| 256 | 8 | 32× |
| 1024 | 10 | 102.4× |
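The table values follow directly from the two scalings (per-channel operation counts, constants ignored):

```python
from math import log2

def idealised_speedup(n_t):
    """Brute-force Nt**2 term divided by the tree's Nt * log2(Nt) term,
    per frequency channel. Ignores constants and non-tree pipeline work."""
    return n_t / log2(n_t)
```

For example, `idealised_speedup(64)` is about 10.7 and `idealised_speedup(1024)` is 102.4, matching the table.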
Real pipelines also pay for normalisation, candidate extraction, clustering, and I/O; measured speedups track theory best when the tree dominates the timer.
The production code lives in core/src/dedoppler.rs.
- ndarray::Array2<f32> — rows are time (after normalisation), columns are frequency channels.
- build_taylor_tree builds one tree for a fixed sign (+1 or −1). It returns Array2<f32> with shape (n_padded, n_chans): row d is the integrated sum for discrete drift index d at each start channel.
- taylor_tree_search orchestrates padding, both signs, thresholding against min_snr, and emits SignalCandidate structs (then clustering / optional RFI filtering run as in the full DedopplerEngine::search pipeline).
- rayon — each group at a given layer is independent, so the inner combine uses into_par_iter() over groups and writes results back.
- SearchParams.use_taylor_tree = true (default in typical configs). The heavy work stays in Rust; Python supplies the spectrogram slice and parameters.
- The Python bindings expose DedopplerEngine, headers, and result types directly. NumPy data can be handed off without a slow serial copy in the hot-path design (mitraseti_core is the compiled module).

From a notebook or script you typically call:
```python
import mitraseti_core

engine = mitraseti_core.DedopplerEngine(params)
result = engine.search(flat_data, n_times, n_chans, header)
```
You do not need to read the Rust to use the search—but understanding the tree explains what is being approximated and why it is fast.
On one reference setup (65 536 channels, 16 time steps):
| Mode | Time | Throughput |
|---|---|---|
| Taylor tree | 38.9 ms | ~25 M points/s |
| Brute force | 163.1 ms | ~6.4 M points/s |
Speedup ≈ 4.2×, consistent with 16 / log₂(16) = 4 as a first-order expectation on the algorithmic term. As Nt grows, the gap widens because the quadratic brute-force term wakes up while the tree stays near-linear in Nt times log Nt.
These are common pitfalls when implementing or configuring a Taylor tree search. Miss any one of them and you may get silent wrong answers or panics at runtime.
Nt = 1: There is no merge stage. The tree returns a single row—the first (and only) spectrum row—shape (1, n_chans).
Drift limit vs tree width: Physical max_drift_rate maps to an integer max_drift_channels. The tree only defines rows 0 … n_padded − 1. If max_drift_channels > n_padded − 1 (common with very fine frequency resolution and few time steps), you must clamp the loop that reads rows; otherwise you would index past the built tree. MitraSETI uses max_drift_channels.min(n_padded - 1).
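The clamp itself is one line; here it is in Python terms (variable names mirror the prose above, while the real code is Rust):

```python
def clamp_drift_rows(max_drift_channels, n_padded):
    """The tree only builds drift rows 0 ... n_padded - 1, so the extractor
    must never read a higher drift index, however large the physical limit
    derived from max_drift_rate happens to be."""
    return min(max_drift_channels, n_padded - 1)
```

With a 2²⁰-channel filterbank and only 16 time rows, a physically reasonable drift limit can translate to a huge channel count, yet only rows 0…15 exist; the clamp keeps the read loop inside the built array.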
Zero drift double counting: The negative-sign pass would repeat d = 0 identically to the positive pass. The code skips d = 0 when sign = −1.
The Voyager 1 regression (lesson learned): Ultra-wide FFT-style filterbanks (e.g. 2²⁰ channels) with only ~16 time steps made max_drift_channels enormous compared to n_padded − 1. Without clamping, the extractor walked off the end of the Array2. The fix is conceptually simple—respect the actual number of drift rows built—but easy to miss because small test files never triggered it.
Staircase vs continuous drift: The tree searches a discrete family of paths tied to channel quantisation and observation length—the same pragmatic approximation used across many SETI drift pipelines. Extremely steep drifts or mis-estimated tsamp / foff still matter for physical interpretation even when the math is fast.
In short:

- Built in Rust (Array2<f32>, rayon), run twice for ± drift, with power-of-two padding.
- Drift rows are clamped to n_padded − 1; the duplicate d = 0 is skipped on the negative pass.

When you next look at a spectrogram with a slanted narrowband trace, you can picture both the single diagonal a human eye tracks and the whole family of diagonals the Taylor tree updates together—that is the bridge between the brute-force story in the previous chapter and the interactive searches MitraSETI runs in production.
Next suggested reading: pipeline chapters on candidate clustering, RFI rejection, and end-to-end search() — now that the O(N log N) core is in place.