Chapter 03

The Taylor Tree Algorithm

A divide-and-conquer structure that computes de-Doppler sums in O(N log N) work instead of O(N²) — the engine inside MitraSETI.

Saman Tabatabaeian — Deep Field Labs MitraSETI Tutorial Series

Prerequisite: Doppler Effect and De-Doppler Search (spectrograms, drift, brute-force integration along diagonals)

You already know how brute-force de-Doppler works: for each trial drift rate, walk through time and add up the normalised power along the predicted frequency trajectory. That is correct, easy to reason about, and far too slow when the number of drift trials and time steps grow. This chapter explains the Taylor tree—the divide-and-conquer structure MitraSETI uses (by default) to compute essentially the same family of integrated sums in O(N log N) work per frequency channel instead of O(N²).

1. The Problem with Brute Force (Recap)

Cost

Call Nt the number of time rows, Nf the number of frequency channels, and Nd the number of discrete drift rates you try (typically comparable to the number of channel offsets you can accumulate over the full observation). Brute force is:

O(Nd × Nt × Nf)

For each drift, each starting channel, you touch about Nt spectrogram cells. When Nd is on the order of Nt (one channel per full observation is a natural resolution), the scaling feels like O(Nt² × Nf). Double the observation length (more time steps and a proportionally finer drift grid) and you pay four times the work in the worst case.
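To make that cost concrete, here is a minimal brute-force sketch in Python (the production core is Rust; `brute_force_dedoppler` and its staircase rounding convention are illustrative assumptions, not MitraSETI's API):

```python
# A minimal brute-force de-Doppler loop: one full pass over the spectrogram
# per drift trial, i.e. O(Nd * Nt * Nf). Names and the rounding convention
# are illustrative, not MitraSETI's actual API.
def brute_force_dedoppler(spec, n_drifts):
    n_t, n_f = len(spec), len(spec[0])
    denom = max(n_t - 1, 1)
    out = [[0.0] * n_f for _ in range(n_drifts)]
    for d in range(n_drifts):            # trial drift, in channels per observation
        for f in range(n_f):             # starting channel at t0
            total = 0.0
            for t in range(n_t):
                offset = round(d * t / denom)   # staircase: rounded channel offset
                if f + offset < n_f:            # out-of-band cells contribute zero
                    total += spec[t][f + offset]
            out[d][f] = total
    return out

# A tone drifting +1 channel per time step:
spec = [[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0, 0, 0, 0],
        [0, 0, 0, 4, 0, 0, 0, 0]]
print(brute_force_dedoppler(spec, 4)[3][0])  # drift 3 starting at f0 -> 10.0
```

The triple loop is the whole story: every drift trial re-walks all Nt rows, even when its path agrees with a neighbouring trial on most of them.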

Why so much work is redundant

Adjacent discrete drifts are not independent random walks through the array. In the staircase model (one channel offset per time step, rounded), two drifts that differ by the smallest step agree on every time row except possibly the last: they follow the same path until the final increment. More generally, many long trajectories share long prefixes. Brute force recomputes those prefixes from scratch for every drift trial.

So the asymptotic bottleneck is not "physics is hard"—it is duplicated summation. The Taylor tree removes that duplication.

2. The Key Insight: Reusing Partial Sums

Tournament bracket vs round-robin

[Figure] Round-robin (brute force): O(N²) games, every pair re-computed, ~28 games for 8 teams. Elimination bracket (Taylor tree): O(N log N) games, results feed forward, ~8 games (N) × 3 rounds (log N). Same answer, far fewer operations.

Figure 3.1 — Round-robin (brute force) requires every-pair computation. A single-elimination bracket (Taylor tree) reuses results from each round, achieving O(N log N).

Imagine ranking teams:

  1. Round-robin: every team plays every other team. For N teams that is about N²/2 games, and no result reduces the number of games still to play.
  2. Single-elimination bracket: each round halves the field, so N teams need about N games spread over log₂(N) rounds, because each round's results feed forward.

The Taylor tree is the second idea applied to sums along time, not sports. Instead of recomputing every full-length diagonal independently, you combine smaller chunks whose answers you already know.

Key Concept — The Partial Sums Insight

Whenever many long sums share common sub-sums, compute each shared piece once and reuse it everywhere it appears. In the de-Doppler setting, drift trajectories that agree on a block of time rows can share that block's partial sum across every drift trial that contains it.

Prefix sums

Computing running totals 1 + 2 + 3 + … is faster if you remember the previous total: each new value is previous + next. The tree generalises that idea to two-dimensional structure: at each level you store not one running total but a small bundle of totals corresponding to how much the trajectory "bent" inside each sub-interval.
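In code, the running-total idea is two lines; a sketch:

```python
# Running totals reuse the previous total: O(N) additions, instead of the
# O(N^2) you would pay by summing every prefix from scratch.
def prefix_sums(values):
    totals, running = [], 0
    for v in values:
        running += v          # previous total + next value
        totals.append(running)
    return totals

print(prefix_sums([1, 2, 3, 4]))  # [1, 3, 6, 10]
```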

Divide and conquer over time

Partition the time axis into halves, then quarters, then eighths. At the finest level you combine pairs of consecutive time rows. At the next level you combine pairs of pair-results, and so on, until one block spans the entire observation.

Key Concept — Binary Decomposition of Drift

At each merge, a binary choice encodes whether the second half's contribution is frequency-shifted relative to the first by a fixed number of channels. Those choices line up with bits of the total drift index—exactly the FFT butterfly story, but with addition and index shifts instead of complex twiddles.

An integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree.
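A tiny sketch of that bookkeeping (`shifts_from_drift` is an illustrative helper, not a MitraSETI function):

```python
# The drift index's bits record which merge depths applied their shift.
# Bit k set -> a 2**k-channel offset at depth k; the offsets sum back to d.
def shifts_from_drift(d, n_layers):
    return [(d >> k) & 1 for k in range(n_layers)]

bits = shifts_from_drift(5, 3)   # depth-0 and depth-2 shifts applied
print(bits, sum(b << k for k, b in enumerate(bits)))  # [1, 0, 1] 5
```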

3. Step-by-Step Construction

Throughout, think of Layer 0 as the raw spectrogram: Nt rows indexed by time, Nf columns indexed by frequency. MitraSETI pads Nt up to the next power of two (Npadded) with zeros when needed; the tree is clearest when Nt is already a power of two.

Sign convention: The implementation runs the tree twice—sign = +1 (energy moves to higher channel index over time) and sign = −1 (lower index). Layer-1 shift is one channel in that direction; deeper layers use 2, 4, 8, … channels.

Layer 0: Raw spectrogram

You have Nt rows t0 … tNt−1 and Nf channels f0 … fNf−1.

Example layout (Nt = 8, Nf = 8):

     f0  f1  f2  f3  f4  f5  f6  f7
t0 | ... ... ... ... ... ... ... ...
t1 | ... ... ... ... ... ... ... ...
t2 | ... ... ... ... ... ... ... ...
t3 | ... ... ... ... ... ... ... ...
t4 | ... ... ... ... ... ... ... ...
t5 | ... ... ... ... ... ... ... ...
t6 | ... ... ... ... ... ... ... ...
t7 | ... ... ... ... ... ... ... ...

Layer 1: Combine adjacent time pairs

Take pairs (t0,t1), (t2,t3), (t4,t5), (t6,t7).

For each pair and each frequency column f, emit two rows:

  1. a d = 0 row: the unshifted sum, e.g. t0[f] + t1[f] for the first pair;
  2. a d = 1 row: the sum with a one-channel shift in the drift direction, e.g. t0[f] + t1[f+1] for sign = +1 (reads past the band edge contribute zero).

You now have four groups (one per pair), two partial-drift rows each → 8 rows total, still Nf columns. Those rows are not "final drifts" yet; they are local choices inside each 2-step window.

ASCII merge view:

Layer 0:   t0  t1    t2  t3    t4  t5    t6  t7
            \  /      \  /      \  /      \  /
Layer 1:   [d=0]     [d=0]     [d=0]     [d=0]
           [d=1]     [d=1]     [d=1]     [d=1]

Here d=0 / d=1 means "zero or one channel shift inside that pair," not the full-observation drift yet.
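One Layer-1 merge fits in a few lines of Python (`merge_pair` is an illustrative name; zeroing of out-of-band reads follows the convention above):

```python
# One Layer-1 merge: each time pair (a, b) yields two partial-drift rows,
# d=0 (no shift) and d=1 (b read one channel up, sign = +1). Reads past
# the band edge contribute zero, matching the tree's behaviour.
def merge_pair(a, b):
    n_f = len(a)
    row0 = [a[f] + b[f] for f in range(n_f)]
    row1 = [a[f] + (b[f + 1] if f + 1 < n_f else 0.0) for f in range(n_f)]
    return row0, row1

r0, r1 = merge_pair([1, 0, 0, 0], [0, 2, 0, 0])
print(r0, r1)  # [1, 2, 0, 0] [3, 0, 0, 0]
```

These are exactly the Layer-1 rows of the worked example in the next section: the drifting tone's first two samples land in the same cell of the d = 1 row.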

Layer 2: Combine pairs of pairs

Merge the block (t0–t1) with (t2–t3), and separately (t4–t5) with (t6–t7).

Each side brings two rows (local drift 0 or 1). When you merge two halves of length 2, the combined drift index needs two bits, giving four rows for that merged block.

You can read the merge like a two-digit binary counter. The low bit selects which local-drift row (0 or 1 channel inside its own pair) a half contributed; the high bit is the new binary choice: whether the right block's spectrum is shifted by 2ᵏ channels before adding, where k is the layer index inside the implementation's loop (here shift = 2 channels). That new choice is the next bit of the global drift index. After the full tree, an integer d in 0 … Npadded−1 has binary expansion d = b₀ + 2b₁ + 4b₂ + …; bit bₖ says whether, at merge depth k, you applied the 2ᵏ-channel offset to the right-hand sub-tree. This is exactly the sense in which total drift is assembled from half-sized subproblems: like counting in binary, except each 1 bit carries a physical frequency slip instead of arithmetic weight alone.

After this layer: 2 groups, 4 drift rows each.

Layer 3 (final when Nt = 8)

Merge the two remaining mega-blocks. The cross-boundary shift is 2² = 4 channels. You obtain 8 rows, indexed d = 0 … 7. Row d, column f, holds the integrated sum along the staircase trajectory of total drift d (in channel units per observation, in the chosen sign direction) that starts at channel f at t0—subject to zeroing when trajectories walk out of the band.

4. Worked Example: Four Time Steps, Actual Numbers

Take Nt = 4, Npadded = 4, n_layers = 2, and sign = +1. Use 8 channels so shifts stay in-band for the columns we care about.

Suppose a signal drifts up one channel per time step, so power sits on a diagonal:

      f0   f1   f2   f3   f4   f5   f6   f7
t0     1    0    0    0    0    0    0    0
t1     0    2    0    0    0    0    0    0
t2     0    0    3    0    0    0    0    0
t3     0    0    0    4    0    0    0    0

The ideal diagonal starting at f0 picks (t0,f0), (t1,f1), (t2,f2), (t3,f3) → sum 1 + 2 + 3 + 4 = 10.

Layer 0 — raw spectrogram (channels f0–f3 shown): t0: 1 0 0 0 | t1: 0 2 0 0 | t2: 0 0 3 0 | t3: 0 0 0 4.

Layer 1 — pair sums (shift 0 or 1):
  Row 0 (d=0): f0:1 f1:2        [t0[f] + t1[f]]
  Row 1 (d=1): f0:3             [t0[f] + t1[f+1]]
  Row 2 (d=0): f2:3 f3:4        [t2[f] + t3[f]]
  Row 3 (d=1): f2:7             [t2[f] + t3[f+1]]

Layer 2 — final (4 drift rows; merge halves with shift 0 or 2), values at f0:
  d=0 (binary 00): 1    (Row0[f0] + Row2[f0])
  d=1 (binary 01): 3    (Row1[f0] + Row2[f0])
  d=2 (binary 10): 4    (Row0[f0] + Row2[f0+2])
  d=3 (binary 11): 10   (Row1[f0] + Row3[f0+2] = 3 + 7) ✓

Figure 3.2 — Worked example with Nt = 4: from raw spectrogram (Layer 0) through pair sums (Layer 1) to the final merged output (Layer 2). Drift index d=3 at f0 recovers the diagonal sum 1+2+3+4 = 10.

Layer 0 → Layer 1 (pair sums)

Group (t0,t1) — rows g = 0 → output rows 0 and 1:

  Row 0 (d = 0): t0[f] + t1[f] → non-zero at f0:1 and f1:2.
  Row 1 (d = 1): t0[f] + t1[f+1] → non-zero at f0: 1 + 2 = 3.

Group (t2,t3) — rows g = 1 → output rows 2 and 3:

  Row 2 (d = 0): t2[f] + t3[f] → non-zero at f2:3 and f3:4.
  Row 3 (d = 1): t2[f] + t3[f+1] → non-zero at f2: 3 + 4 = 7.

So the Layer 1 buffer (only non-zero entries sketched) behaves like:

        f0   f1   f2   f3
Row 0    1    2    0    0    (d=0, first pair)
Row 1    3    0    0    0    (d=1, first pair)
Row 2    0    0    3    4    (d=0, second pair)
Row 3    0    0    7    0    (d=1, second pair)

Layer 2 (merge the two halves)

Here n_drifts_prev = 2, shift_amount = 2. The code combines row blocks {0,1} with {2,3} into 4 final rows d = 0…3.

The interesting cell is starting frequency f0, total drift d = 3 (binary 11): low bit selects row 1 from the first half, high bit applies shift +2 when reading from the second half—so you add row 1 at f0 to row 3 at f0+2 = f2: 3 + 7 = 10.

Rows d = 0,1,2 at f0 give smaller totals; d = 3 is the one that "lines up" with +1 channel per step under this staircase parameterisation. That is exactly what you want from a search grid: the row index is the discrete drift label; peaks appear where the label matches the physical drift (within quantisation).
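You can check the whole worked example with plain arithmetic. The sketch below (illustrative Python, not the Rust implementation) builds the Layer-1 rows and then the four Layer-2 combinations exactly as listed in Figure 3.2:

```python
# Worked example by hand: Layer-1 pair rows, then the four Layer-2
# combinations from Figure 3.2 (cross-boundary shift = 2 channels).
def pair_rows(a, b):
    n = len(a)
    d0 = [a[f] + b[f] for f in range(n)]                              # no shift
    d1 = [a[f] + (b[f + 1] if f + 1 < n else 0.0) for f in range(n)]  # shift +1
    return d0, d1

def shifted(row, s):
    return [row[f + s] if f + s < len(row) else 0.0 for f in range(len(row))]

t = [[1, 0, 0, 0, 0, 0, 0, 0],
     [0, 2, 0, 0, 0, 0, 0, 0],
     [0, 0, 3, 0, 0, 0, 0, 0],
     [0, 0, 0, 4, 0, 0, 0, 0]]
row0, row1 = pair_rows(t[0], t[1])   # first half  (t0, t1)
row2, row3 = pair_rows(t[2], t[3])   # second half (t2, t3)

final_at_f0 = [
    row0[0] + shifted(row2, 0)[0],   # d=0 (binary 00)
    row1[0] + shifted(row2, 0)[0],   # d=1 (binary 01)
    row0[0] + shifted(row2, 2)[0],   # d=2 (binary 10)
    row1[0] + shifted(row3, 2)[0],   # d=3 (binary 11): 3 + 7
]
print(final_at_f0)  # [1, 3, 4, 10]
```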

5. The Butterfly Pattern

This is the same radix-2 butterfly idea as the FFT:

Figure 3.3 — Butterfly dataflow for Nt = 8: inputs t0 … t7 pass through three merge stages (shift ±1, then ±2, then ±4) to produce 8 drift output rows, d = 0 (000) through d = 7 (111). Shifts double each stage: 1 → 2 → 4 → 8 → … (the binary decomposition of total drift); each row index d encodes the total drift in binary.

Takeaway: shifts double each time you move up a layer: 1, 2, 4, 8, … — the binary decomposition of the total drift index.

Tiny butterfly (Nt = 4)

For readers who prefer four leaves instead of eight, the same topology is just shorter:

t0 ──┬──  }
t1 ──┘    }  shift 1
t2 ──┬──  }
t3 ──┘    }  shift 1
      ╲  ╱
       ╳     shift 2 between halves
      ╱  ╲
out: 4 rows (drifts 0..3) × Nf channels

Same rule at every scale: two children feed four drift combinations at the next rung, then eight, and so on, until the leaf count matches Npadded.

6. Complexity Analysis (Detailed)

Per layer

For each layer, the implementation touches on the order of Npadded × Nf array entries (each output cell is one addition and a couple of index checks). Constants depend on how many partial-drift rows are live at that stage, but they are O(1) per cell per layer.

Number of layers

L = log₂(Npadded) merge stages (after padding to a power of two).

Total serial work

Key Concept — O(N log N) Complexity

O(Npadded × Nf × log₂(Npadded))

Compare to brute force: O(Nt² × Nf). The Taylor tree replaces the quadratic Nt² term with Nt × log₂(Nt) — an idealised speedup factor of Nt / log₂(Nt).

Compare to brute force

Brute force: O(Nd × Nt × Nf); with Nd ~ Nt this becomes O(Nt² × Nf).

Taylor tree: O(Nt × log₂(Nt) × Nf) (using Nt ≈ Npadded for big-O intuition).

Idealised speedup factor on the time axis:

Nt / log₂(Nt)
Figure 3.4 — Idealised speedup of Taylor tree over brute force grows as Nt / log₂(Nt). At 1024 time steps, the tree is ~102× faster.
Nt      log₂(Nt)   Nt / log₂(Nt)
8       3          2.7×
16      4          4.0×
32      5          6.4×
64      6          10.7×
128     7          18.3×
256     8          32×
1024    10         102.4×
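The tabulated values follow directly from the formula; a two-line check:

```python
import math

# Idealised algorithmic speedup of the tree over brute force on the time axis.
def speedup(n_t):
    return n_t / math.log2(n_t)

for n_t in (8, 64, 1024):
    print(n_t, round(speedup(n_t), 1))  # 2.7, 10.7, 102.4
```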

Real pipelines also pay for normalisation, candidate extraction, clustering, and I/O; measured speedups track theory best when the tree dominates the timer.

7. Implementation in MitraSETI (Rust)

The production code lives in core/src/dedoppler.rs.
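This chapter does not reproduce core/src/dedoppler.rs, but the whole tree fits in a short Python sketch under the chapter's simplified binary-decomposition scheme (single sign pass, Nt already a power of two, no padding or normalisation). `taylor_tree` and its internals are illustrative names, not MitraSETI's API, and the Rust core may differ in indexing details:

```python
# Simplified Taylor-tree sketch (single sign pass, Nt a power of two).
# Illustrative only: the production Rust code adds power-of-two padding,
# both sign passes, normalisation, and rayon parallelism.
def taylor_tree(spec, sign=1):
    n_t, n_f = len(spec), len(spec[0])
    rows = [list(r) for r in spec]       # working buffer: one drift row per leaf
    block = 1                            # current block size == rows per block
    while block < n_t:
        merged = []
        for start in range(0, n_t, 2 * block):
            left = rows[start:start + block]
            right = rows[start + block:start + 2 * block]
            for d in range(2 * block):
                lo, hi = d % block, d // block   # low bits pick child rows; top bit the shift
                s = sign * block * hi            # cross-boundary shift: 1, 2, 4, ... channels
                merged.append([
                    left[lo][f] + (right[lo][f + s] if 0 <= f + s < n_f else 0.0)
                    for f in range(n_f)
                ])
        rows = merged
        block *= 2
    return rows                          # row d ~ integrated sum for drift index d

spec = [[1, 0, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0, 0, 0, 0],
        [0, 0, 0, 4, 0, 0, 0, 0]]
print([row[0] for row in taylor_tree(spec)])  # [1, 3, 4, 10]
```

Each `while` iteration is one layer of the tree: the shift doubles, the per-block row count doubles, and the total work stays O(Nt × Nf) per layer — the O(Nt log Nt × Nf) total from section 6.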

8. Why Rust? Why Not Python?

From a notebook or script you typically call:

import mitraseti_core

engine = mitraseti_core.DedopplerEngine(params)
result = engine.search(flat_data, n_times, n_chans, header)

You do not need to read the Rust to use the search—but understanding the tree explains what is being approximated and why it is fast.

9. MitraSETI Benchmark Results (Representative)

On one reference setup (65 536 channels, 16 time steps):

Mode          Time       Throughput
Taylor tree   38.9 ms    ~25 M points/s
Brute force   163.1 ms   ~6.4 M points/s

Speedup ≈ 4.2×, consistent with 16 / log₂(16) = 4 as a first-order expectation on the algorithmic term. As Nt grows, the gap widens because the quadratic brute-force term wakes up while the tree stays near-linear in Nt times log Nt.

10. Edge Cases and Gotchas

Warning — Edge Cases

These are common pitfalls when implementing or configuring a Taylor tree search. Miss any one of them and you may get silent wrong answers or panics at runtime.

  1. Nt = 1: There is no merge stage. The tree returns a single row—the first (and only) spectrum row—shape (1, n_chans).

  2. Drift limit vs tree width: Physical max_drift_rate maps to an integer max_drift_channels. The tree only defines rows 0 … n_padded − 1. If max_drift_channels > n_padded − 1 (common with very fine frequency resolution and few time steps), you must clamp the loop that reads rows; otherwise you would index past the built tree. MitraSETI uses max_drift_channels.min(n_padded - 1).

  3. Zero drift double counting: The negative-sign pass would repeat d = 0 identically to the positive pass. The code skips d = 0 when sign = −1.

  4. The Voyager 1 regression (lesson learned): Ultra-wide FFT-style filterbanks (e.g. 2²⁰ channels) with only ~16 time steps made max_drift_channels enormous compared to n_padded − 1. Without clamping, the extractor walked off the end of the Array2. The fix is conceptually simple—respect the actual number of drift rows built—but easy to miss because small test files never triggered it.

  5. Staircase vs continuous drift: The tree searches a discrete family of paths tied to channel quantisation and observation length—the same pragmatic approximation used across many SETI drift pipelines. Extremely steep drifts or mis-estimated tsamp / foff still matter for physical interpretation even when the math is fast.
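Gotchas 2 and 3 come down to a few lines of index arithmetic. A sketch with illustrative names (the source states only that MitraSETI's Rust core uses max_drift_channels.min(n_padded - 1); the helpers here are hypothetical):

```python
# Gotchas 2 and 3 in code. Helper names are illustrative, not MitraSETI's API.
def usable_drift_rows(max_drift_channels, n_padded):
    # The tree only built rows 0 .. n_padded - 1; never read past them.
    return min(max_drift_channels, n_padded - 1)

def drift_indices(max_drift_channels, n_padded, sign):
    top = usable_drift_rows(max_drift_channels, n_padded)
    start = 0 if sign > 0 else 1     # skip d = 0 on the negative pass
    return [sign * d for d in range(start, top + 1)]

print(usable_drift_rows(5000, 16))   # 15: a huge physical drift limit gets clamped
print(drift_indices(2, 16, -1))      # [-1, -2]: no duplicate zero-drift row
```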

11. History and Context

12. Mental Checklist (What You Should Remember)

  1. Brute force is correct but quadratic in Nt when Nd ~ Nt.
  2. The Taylor tree reuses partial sums in a log₂(Nt)-deep hierarchy.
  3. Each layer's shift size doubles: the butterfly / FFT analogy.
  4. MitraSETI implements this in Rust (Array2<f32>, rayon), twice for ± drift, with power-of-two padding.
  5. Clamp drift indices to n_padded − 1; skip duplicate d = 0 on the negative pass.

When you next look at a spectrogram with a slanted narrowband trace, you can picture both the single diagonal a human eye tracks and the whole family of diagonals the Taylor tree updates together—that is the bridge between the brute-force story in the previous chapter and the interactive searches MitraSETI runs in production.


Next Steps

Next suggested reading: pipeline chapters on candidate clustering, RFI rejection, and end-to-end search() — now that the O(N log N) core is in place.

Try it in the Cloud

The Rust-powered Taylor Tree runs at 45x speed in the cloud. Upload your own file and see it in action.

Open MitraSETI Cloud →