Chapter 03

The Taylor Tree Algorithm

A divide-and-conquer structure that computes de-Doppler sums in O(N log N) work instead of O(N²) — the engine inside MitraSETI.

Saman Tabatabaeian — Deep Field Labs MitraSETI Tutorial Series

Prerequisite: Doppler Effect and De-Doppler Search (spectrograms, drift, brute-force integration along diagonals)

You already know how brute-force de-Doppler works: for each trial drift rate, walk through time and add up the normalised power along the predicted frequency trajectory. That is correct, easy to reason about, and far too slow when the number of drift trials and time steps grow. This chapter explains the Taylor tree—the divide-and-conquer structure MitraSETI uses (by default) to compute essentially the same family of integrated sums in O(N log N) work per frequency channel instead of O(N²).

1. The Problem with Brute Force (Recap)

Cost

Call Nt the number of time rows, Nf the number of frequency channels, and Nd the number of discrete drift rates you try (typically comparable to the number of channel offsets you can accumulate over the full observation). Brute force is:

O(Nd × Nt × Nf)

For each drift, each starting channel, you touch about Nt spectrogram cells. When Nd is on the order of Nt (one channel per full observation is a natural resolution), the scaling feels like O(Nt² × Nf). Double the observation length (more time steps and a proportionally finer drift grid) and you pay four times the work in the worst case.
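To make that cost concrete, here is a minimal brute-force sketch in Python (the production core is Rust; `brute_force_dedoppler` and its staircase rounding convention are illustrative assumptions, not MitraSETI's API):

```python
# A minimal brute-force de-Doppler loop: one full pass over the spectrogram
# per drift trial, i.e. O(Nd * Nt * Nf). Names and the rounding convention
# are illustrative, not MitraSETI's actual API.
def brute_force_dedoppler(spec, n_drifts):
    n_t, n_f = len(spec), len(spec[0])
    denom = max(n_t - 1, 1)
    out = [[0.0] * n_f for _ in range(n_drifts)]
    for d in range(n_drifts):            # trial drift, in channels per observation
        for f in range(n_f):             # starting channel at t0
            total = 0.0
            for t in range(n_t):
                offset = round(d * t / denom)   # staircase: rounded channel offset
                if f + offset < n_f:            # out-of-band cells contribute zero
                    total += spec[t][f + offset]
            out[d][f] = total
    return out

# A tone drifting +1 channel per time step:
spec = [[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0, 0, 0, 0],
        [0, 0, 0, 4, 0, 0, 0, 0]]
print(brute_force_dedoppler(spec, 4)[3][0])  # drift 3 starting at f0 -> 10.0
```

The triple loop is the whole story: every drift trial re-walks all Nt rows, even when its path agrees with a neighbouring trial on most of them.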

Why so much work is redundant

Adjacent discrete drifts are not independent random walks through the array. In the staircase model (one channel offset per time step, rounded), two drifts that differ by the smallest step agree on every time row except possibly the last: they follow the same path until the final increment. More generally, many long trajectories share long prefixes. Brute force recomputes those prefixes from scratch for every drift trial.

So the asymptotic bottleneck is not "physics is hard"—it is duplicated summation. The Taylor tree removes that duplication.

2. The Key Insight: Reusing Partial Sums

Tournament bracket vs round-robin

[Figure] Round-robin (brute force): O(N²) games, every pair re-computed, ~28 games for 8 teams. Elimination bracket (Taylor tree): O(N log N) games, results feed forward, ~8 games (N) × 3 rounds (log N). Same answer, far fewer operations.

Figure 3.1 — Round-robin (brute force) requires every-pair computation. A single-elimination bracket (Taylor tree) reuses results from each round, achieving O(N log N).

Imagine ranking teams:

  1. Round-robin: every team plays every other team. For N teams that is about N²/2 games, and no result reduces the number of games still to play.
  2. Single-elimination bracket: each round halves the field, so N teams need about N games spread over log₂(N) rounds, because each round's results feed forward.

The Taylor tree is the second idea applied to sums along time, not sports. Instead of recomputing every full-length diagonal independently, you combine smaller chunks whose answers you already know.

Key Concept — The Partial Sums Insight

Whenever many long sums share common sub-sums, compute each shared piece once and reuse it everywhere it appears. In the de-Doppler setting, drift trajectories that agree on a block of time rows can share that block's partial sum across every drift trial that contains it.

Prefix sums

Computing running totals 1 + 2 + 3 + … is faster if you remember the previous total: each new value is previous + next. The tree generalises that idea to two-dimensional structure: at each level you store not one running total but a small bundle of totals corresponding to how much the trajectory "bent" inside each sub-interval.
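In code, the running-total idea is two lines; a sketch:

```python
# Running totals reuse the previous total: O(N) additions, instead of the
# O(N^2) you would pay by summing every prefix from scratch.
def prefix_sums(values):
    totals, running = [], 0
    for v in values:
        running += v          # previous total + next value
        totals.append(running)
    return totals

print(prefix_sums([1, 2, 3, 4]))  # [1, 3, 6, 10]
```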

Divide and conquer over time

Partition the time axis into halves, then quarters, then eighths. At the finest level you combine pairs of consecutive time rows. At the next level you combine pairs of pair-results, and so on, until one block spans the entire observation.

Key Concept — Binary Decomposition of Drift

At each merge, a binary choice encodes whether the second half's contribution is frequency-shifted relative to the first by a fixed number of channels. Those choices line up with bits of the total drift index—exactly the FFT butterfly story, but with addition and index shifts instead of complex twiddles.

An integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree.
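A tiny sketch of that bookkeeping (`shifts_from_drift` is an illustrative helper, not a MitraSETI function):

```python
# The drift index's bits record which merge depths applied their shift.
# Bit k set -> a 2**k-channel offset at depth k; the offsets sum back to d.
def shifts_from_drift(d, n_layers):
    return [(d >> k) & 1 for k in range(n_layers)]

bits = shifts_from_drift(5, 3)   # depth-0 and depth-2 shifts applied
print(bits, sum(b << k for k, b in enumerate(bits)))  # [1, 0, 1] 5
```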

3. Step-by-Step Construction

Throughout, think of Layer 0 as the raw spectrogram: Nt rows indexed by time, Nf columns indexed by frequency. MitraSETI pads Nt up to the next power of two (Npadded) with zeros when needed; the tree is clearest when Nt is already a power of two.

Sign convention: The implementation runs the tree twice—sign = +1 (energy moves to higher channel index over time) and sign = −1 (lower index). Layer-1 shift is one channel in that direction; deeper layers use 2, 4, 8, … channels.

Layer 0: Raw spectrogram

You have Nt rows t0 … tNt−1 and Nf channels f0 … fNf−1.

Example layout (Nt = 8, Nf = 8):

     f0  f1  f2  f3  f4  f5  f6  f7
t0 | ... ... ... ... ... ... ... ...
t1 | ... ... ... ... ... ... ... ...
t2 | ... ... ... ... ... ... ... ...
t3 | ... ... ... ... ... ... ... ...
t4 | ... ... ... ... ... ... ... ...
t5 | ... ... ... ... ... ... ... ...
t6 | ... ... ... ... ... ... ... ...
t7 | ... ... ... ... ... ... ... ...

Layer 1: Combine adjacent time pairs

Take pairs (t0,t1), (t2,t3), (t4,t5), (t6,t7).

For each pair and each frequency column f, emit two rows:

  1. a d = 0 row: the unshifted sum, e.g. t0[f] + t1[f] for the first pair;
  2. a d = 1 row: the sum with a one-channel shift in the drift direction, e.g. t0[f] + t1[f+1] for sign = +1 (reads past the band edge contribute zero).

You now have four groups (one per pair), two partial-drift rows each → 8 rows total, still Nf columns. Those rows are not "final drifts" yet; they are local choices inside each 2-step window.

ASCII merge view:

Layer 0:   t0  t1    t2  t3    t4  t5    t6  t7
            \  /      \  /      \  /      \  /
Layer 1:   [d=0]     [d=0]     [d=0]     [d=0]
           [d=1]     [d=1]     [d=1]     [d=1]

Here d=0 / d=1 means "zero or one channel shift inside that pair," not the full-observation drift yet.
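One Layer-1 merge fits in a few lines of Python (`merge_pair` is an illustrative name; zeroing of out-of-band reads follows the convention above):

```python
# One Layer-1 merge: each time pair (a, b) yields two partial-drift rows,
# d=0 (no shift) and d=1 (b read one channel up, sign = +1). Reads past
# the band edge contribute zero, matching the tree's behaviour.
def merge_pair(a, b):
    n_f = len(a)
    row0 = [a[f] + b[f] for f in range(n_f)]
    row1 = [a[f] + (b[f + 1] if f + 1 < n_f else 0.0) for f in range(n_f)]
    return row0, row1

r0, r1 = merge_pair([1, 0, 0, 0], [0, 2, 0, 0])
print(r0, r1)  # [1, 2, 0, 0] [3, 0, 0, 0]
```

These are exactly the Layer-1 rows of the worked example in the next section: the drifting tone's first two samples land in the same cell of the d = 1 row.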

Layer 2: Combine pairs of pairs

Merge the block (t0–t1) with (t2–t3), and separately (t4–t5) with (t6–t7).

Each side brings two rows (local drift 0 or 1). When you merge two halves of length 2, the combined drift index needs two bits, giving four rows for that merged block.

You can read the merge like a two-digit binary counter. The low bit selects which local-drift row (0 or 1 channel inside its own pair) a half contributed; the high bit is the new binary choice: whether the right block's spectrum is shifted by 2ᵏ channels before adding, where k is the layer index inside the implementation's loop (here shift = 2 channels). That new choice is the next bit of the global drift index. After the full tree, an integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree. This is exactly the sense in which total drift is assembled from half-sized subproblems: like counting in binary, except each 1 bit carries a physical frequency slip instead of arithmetic weight alone.

After this layer: 2 groups, 4 drift rows each.

Layer 3 (final when Nt = 8)

Merge the two remaining mega-blocks. The cross-boundary shift is 2² = 4 channels. You obtain 8 rows, indexed d = 0 … 7. Row d, column f, holds the integrated sum along the staircase trajectory of total drift d (in channel units per observation, in the chosen sign direction) that starts at channel f at t0—subject to zeroing when trajectories walk out of the band.

4. Worked Example: Four Time Steps, Actual Numbers

Take Nt = 4, Npadded = 4, n_layers = 2, and sign = +1. Use 8 channels so shifts stay in-band for the columns we care about.

Suppose a signal drifts up one channel per time step, so power sits on a diagonal:

      f0   f1   f2   f3   f4   f5   f6   f7
t0     1    0    0    0    0    0    0    0
t1     0    2    0    0    0    0    0    0
t2     0    0    3    0    0    0    0    0
t3     0    0    0    4    0    0    0    0

The ideal diagonal starting at f0 picks (t0,f0), (t1,f1), (t2,f2), (t3,f3) → sum 1 + 2 + 3 + 4 = 10.

Layer 0 — raw spectrogram (channels f0–f3 shown): t0: 1 0 0 0 | t1: 0 2 0 0 | t2: 0 0 3 0 | t3: 0 0 0 4.

Layer 1 — pair sums (shift 0 or 1):
  Row 0 (d=0): f0:1 f1:2        [t0[f] + t1[f]]
  Row 1 (d=1): f0:3             [t0[f] + t1[f+1]]
  Row 2 (d=0): f2:3 f3:4        [t2[f] + t3[f]]
  Row 3 (d=1): f2:7             [t2[f] + t3[f+1]]

Layer 2 — final (4 drift rows; merge halves with shift 0 or 2), values at f0:
  d=0 (binary 00): 1    (Row0[f0] + Row2[f0])
  d=1 (binary 01): 3    (Row1[f0] + Row2[f0])
  d=2 (binary 10): 4    (Row0[f0] + Row2[f0+2])
  d=3 (binary 11): 10   (Row1[f0] + Row3[f0+2] = 3 + 7) ✓

Figure 3.2 — Worked example with Nt = 4: from raw spectrogram (Layer 0) through pair sums (Layer 1) to the final merged output (Layer 2). Drift index d=3 at f0 recovers the diagonal sum 1+2+3+4 = 10.

Layer 0 → Layer 1 (pair sums)

Group (t0,t1) — rows g = 0 → output rows 0 and 1:

  Row 0 (d = 0): t0[f] + t1[f] → non-zero at f0:1 and f1:2.
  Row 1 (d = 1): t0[f] + t1[f+1] → non-zero at f0: 1 + 2 = 3.

Group (t2,t3) — rows g = 1 → output rows 2 and 3:

  Row 2 (d = 0): t2[f] + t3[f] → non-zero at f2:3 and f3:4.
  Row 3 (d = 1): t2[f] + t3[f+1] → non-zero at f2: 3 + 4 = 7.

So the Layer 1 buffer (only non-zero entries sketched) behaves like:

        f0   f1   f2   f3
Row 0    1    2    0    0    (d=0, first pair)
Row 1    3    0    0    0    (d=1, first pair)
Row 2    0    0    3    4    (d=0, second pair)
Row 3    0    0    7    0    (d=1, second pair)

Layer 2 (merge the two halves)

Here n_drifts_prev = 2, shift_amount = 2. The code combines row blocks {0,1} with {2,3} into 4 final rows d = 0…3.

The interesting cell is starting frequency f0, total drift d = 3 (binary 11): low bit selects row 1 from the first half, high bit applies shift +2 when reading from the second half—so you add row 1 at f0 to row 3 at f0+2 = f2: 3 + 7 = 10.

Rows d = 0,1,2 at f0 give smaller totals; d = 3 is the one that "lines up" with +1 channel per step under this staircase parameterisation. That is exactly what you want from a search grid: the row index is the discrete drift label; peaks appear where the label matches the physical drift (within quantisation).
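You can check the whole worked example with plain arithmetic. The sketch below (illustrative Python, not the Rust implementation) builds the Layer-1 rows and then the four Layer-2 combinations exactly as listed in Figure 3.2:

```python
# Worked example by hand: Layer-1 pair rows, then the four Layer-2
# combinations from Figure 3.2 (cross-boundary shift = 2 channels).
def pair_rows(a, b):
    n = len(a)
    d0 = [a[f] + b[f] for f in range(n)]                              # no shift
    d1 = [a[f] + (b[f + 1] if f + 1 < n else 0.0) for f in range(n)]  # shift +1
    return d0, d1

def shifted(row, s):
    return [row[f + s] if f + s < len(row) else 0.0 for f in range(len(row))]

t = [[1, 0, 0, 0, 0, 0, 0, 0],
     [0, 2, 0, 0, 0, 0, 0, 0],
     [0, 0, 3, 0, 0, 0, 0, 0],
     [0, 0, 0, 4, 0, 0, 0, 0]]
row0, row1 = pair_rows(t[0], t[1])   # first half  (t0, t1)
row2, row3 = pair_rows(t[2], t[3])   # second half (t2, t3)

final_at_f0 = [
    row0[0] + shifted(row2, 0)[0],   # d=0 (binary 00)
    row1[0] + shifted(row2, 0)[0],   # d=1 (binary 01)
    row0[0] + shifted(row2, 2)[0],   # d=2 (binary 10)
    row1[0] + shifted(row3, 2)[0],   # d=3 (binary 11): 3 + 7
]
print(final_at_f0)  # [1, 3, 4, 10]
```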

5. The Butterfly Pattern

This is the same radix-2 butterfly idea as the FFT:

Figure 3.3 — Butterfly dataflow for Nt = 8: inputs t0 … t7 pass through three merge stages (shift ±1, then ±2, then ±4) to produce 8 drift output rows, d = 0 (000) through d = 7 (111). Shifts double each stage: 1 → 2 → 4 → 8 → … (the binary decomposition of total drift); each row index d encodes the total drift in binary.

Takeaway: shifts double each time you move up a layer: 1, 2, 4, 8, … — the binary decomposition of the total drift index.

Tiny butterfly (Nt = 4)

For readers who prefer four leaves instead of eight, the same topology is just shorter:

t0 ──┬──  }
t1 ──┘    }  shift 1
t2 ──┬──  }
t3 ──┘    }  shift 1
      ╲  ╱
       ╳     shift 2 between halves
      ╱  ╲
out: 4 rows (drifts 0..3) × Nf channels

Same rule at every scale: two children feed four drift combinations at the next rung, then eight, and so on, until the leaf count matches Npadded.

6. Complexity Analysis (Detailed)

Per layer

For each layer, the implementation touches on the order of Npadded × Nf array entries (each output cell is one addition and a couple of index checks). Constants depend on how many partial-drift rows are live at that stage, but they are O(1) per cell per layer.

Number of layers

L = log₂(Npadded) merge stages (after padding to a power of two).

Total serial work

Key Concept — O(N log N) Complexity

O(Npadded × Nf × log₂(Npadded))

Compare to brute force: O(Nt² × Nf). The Taylor tree replaces the quadratic Nt² term with Nt × log₂(Nt) — an idealised speedup factor of Nt / log₂(Nt).

Compare to brute force

Brute force: O(Nd × Nt × Nf); with Nd ~ Nt this becomes O(Nt² × Nf).

Taylor tree: O(Nt × log₂(Nt) × Nf) (using Nt ≈ Npadded for big-O intuition).

Idealised speedup factor on the time axis:

Nt / log₂(Nt)
Figure 3.4 — Idealised speedup of Taylor tree over brute force grows as Nt / log₂(Nt). At 1024 time steps, the tree is ~102× faster.
Nt      log₂(Nt)   Nt / log₂(Nt)
8       3          2.7×
16      4          4.0×
32      5          6.4×
64      6          10.7×
128     7          18.3×
256     8          32×
1024    10         102.4×
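The tabulated values follow directly from the formula; a two-line check:

```python
import math

# Idealised algorithmic speedup of the tree over brute force on the time axis.
def speedup(n_t):
    return n_t / math.log2(n_t)

for n_t in (8, 64, 1024):
    print(n_t, round(speedup(n_t), 1))  # 2.7, 10.7, 102.4
```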

Real pipelines also pay for normalisation, candidate extraction, clustering, and I/O; measured speedups track theory best when the tree dominates the timer.

7. Implementation in MitraSETI (Rust)

The production code lives in core/src/dedoppler.rs.
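This chapter does not reproduce core/src/dedoppler.rs, but the whole tree fits in a short Python sketch under the chapter's simplified binary-decomposition scheme (single sign pass, Nt already a power of two, no padding or normalisation). `taylor_tree` and its internals are illustrative names, not MitraSETI's API, and the Rust core may differ in indexing details:

```python
# Simplified Taylor-tree sketch (single sign pass, Nt a power of two).
# Illustrative only: the production Rust code adds power-of-two padding,
# both sign passes, normalisation, and rayon parallelism.
def taylor_tree(spec, sign=1):
    n_t, n_f = len(spec), len(spec[0])
    rows = [list(r) for r in spec]       # working buffer: one drift row per leaf
    block = 1                            # current block size == rows per block
    while block < n_t:
        merged = []
        for start in range(0, n_t, 2 * block):
            left = rows[start:start + block]
            right = rows[start + block:start + 2 * block]
            for d in range(2 * block):
                lo, hi = d % block, d // block   # low bits pick child rows; top bit the shift
                s = sign * block * hi            # cross-boundary shift: 1, 2, 4, ... channels
                merged.append([
                    left[lo][f] + (right[lo][f + s] if 0 <= f + s < n_f else 0.0)
                    for f in range(n_f)
                ])
        rows = merged
        block *= 2
    return rows                          # row d ~ integrated sum for drift index d

spec = [[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0, 0, 0, 0],
        [0, 0, 0, 4, 0, 0, 0, 0]]
print([row[0] for row in taylor_tree(spec)])  # [1, 3, 4, 10]
```

Each `while` iteration is one layer of the tree: the shift doubles, the per-block row count doubles, and the total work stays O(Nt × Nf) per layer — the O(Nt log Nt × Nf) total from section 6.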

8. Why Rust? Why Not Python?

From a notebook or script you typically call:

import mitraseti_core

engine = mitraseti_core.DedopplerEngine(params)
result = engine.search(flat_data, n_times, n_chans, header)

You do not need to read the Rust to use the search—but understanding the tree explains what is being approximated and why it is fast.

9. MitraSETI Benchmark Results (Representative)

On one reference setup (65 536 channels, 16 time steps):

Mode          Time       Throughput
Taylor tree   38.9 ms    ~25 M points/s
Brute force   163.1 ms   ~6.4 M points/s

Speedup ≈ 4.2×, consistent with 16 / log₂(16) = 4 as a first-order expectation on the algorithmic term. As Nt grows, the gap widens because the quadratic brute-force term wakes up while the tree stays near-linear in Nt times log Nt.

10. Edge Cases and Gotchas

Warning — Edge Cases

These are common pitfalls when implementing or configuring a Taylor tree search. Miss any one of them and you may get silent wrong answers or panics at runtime.

  1. Nt = 1: There is no merge stage. The tree returns a single row—the first (and only) spectrum row—shape (1, n_chans).

  2. Drift limit vs tree width: Physical max_drift_rate maps to an integer max_drift_channels. The tree only defines rows 0 … n_padded − 1. If max_drift_channels > n_padded − 1 (common with very fine frequency resolution and few time steps), you must clamp the loop that reads rows; otherwise you would index past the built tree. MitraSETI uses max_drift_channels.min(n_padded - 1).

  3. Zero drift double counting: The negative-sign pass would repeat d = 0 identically to the positive pass. The code skips d = 0 when sign = −1.

  4. The Voyager 1 regression (lesson learned): Ultra-wide FFT-style filterbanks (e.g. 2²⁰ channels) with only ~16 time steps made max_drift_channels enormous compared to n_padded − 1. Without clamping, the extractor walked off the end of the Array2. The fix is conceptually simple—respect the actual number of drift rows built—but easy to miss because small test files never triggered it.

  5. Staircase vs continuous drift: The tree searches a discrete family of paths tied to channel quantisation and observation length—the same pragmatic approximation used across many SETI drift pipelines. Extremely steep drifts or mis-estimated tsamp / foff still matter for physical interpretation even when the math is fast.
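Gotchas 2 and 3 come down to a few lines of index arithmetic. A sketch with illustrative names (the source states only that MitraSETI's Rust core uses max_drift_channels.min(n_padded - 1); the helpers here are hypothetical):

```python
# Gotchas 2 and 3 in code. Helper names are illustrative, not MitraSETI's API.
def usable_drift_rows(max_drift_channels, n_padded):
    # The tree only built rows 0 .. n_padded - 1; never read past them.
    return min(max_drift_channels, n_padded - 1)

def drift_indices(max_drift_channels, n_padded, sign):
    top = usable_drift_rows(max_drift_channels, n_padded)
    start = 0 if sign > 0 else 1     # skip d = 0 on the negative pass
    return [sign * d for d in range(start, top + 1)]

print(usable_drift_rows(5000, 16))   # 15: a huge physical drift limit gets clamped
print(drift_indices(2, 16, -1))      # [-1, -2]: no duplicate zero-drift row
```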

11. History and Context

12. Mental Checklist (What You Should Remember)

  1. Brute force is correct but quadratic in Nt when Nd ~ Nt.
  2. The Taylor tree reuses partial sums in a log₂(Nt)-deep hierarchy.
  3. Each layer's shift size doubles: the butterfly / FFT analogy.
  4. MitraSETI implements this in Rust (Array2<f32>, rayon), twice for ± drift, with power-of-two padding.
  5. Clamp drift indices to n_padded − 1; skip duplicate d = 0 on the negative pass.

When you next look at a spectrogram with a slanted narrowband trace, you can picture both the single diagonal a human eye tracks and the whole family of diagonals the Taylor tree updates together—that is the bridge between the brute-force story in the previous chapter and the interactive searches MitraSETI runs in production.


Next Steps

Next suggested reading: pipeline chapters on candidate clustering, RFI rejection, and end-to-end search() — now that the O(N log N) core is in place.

Try it in the Cloud

The Rust-powered Taylor Tree runs at 45x speed in the cloud. Upload your own file and see it in action.

Open MitraSETI Cloud →