Complexity classes, divide-and-conquer, the FFT, parallelism, and why Rust sits next to Python — the theoretical CS behind SETI at scale.
This chapter is for software engineers who want the theoretical computer science behind MitraSETI: complexity classes, divide-and-conquer, the FFT, parallelism, data structures, clustering cost, neural nets, robust statistics, memory hierarchy, and why Rust sits next to Python in the stack. Where earlier tutorials emphasized what the pipeline does, here we emphasize why certain designs are inevitable at SETI scale.
An algorithm is a finite, well-defined sequence of steps that takes input and produces output. In practice, "well-defined" means each step is unambiguous to a machine (or a careful human). The same abstract algorithm can be implemented in Rust, Python, or hardware; complexity describes how that procedure's cost grows as the input size grows — not the constant factors hidden in any one language.
Time complexity answers: as the problem size N grows, how does the number of elementary operations (or the dominant term in runtime) grow? We care about asymptotic behavior: behavior for large N, where lower-order terms and implementation details matter less than the scaling exponent.
We write f(N) = O(g(N)) if, for sufficiently large N, f(N) is bounded above by a constant multiple of g(N). Informally: "g captures the worst-case growth rate, up to constants."
Common classes, from best to worst scaling for large N:
| Class | Name (informal) | Typical meaning |
|---|---|---|
| O(1) | Constant | Work does not grow with N |
| O(log N) | Logarithmic | Halving/doubling the search space each step |
| O(N) | Linear | One pass (or constant passes) over data |
| O(N log N) | Linearithmic | Divide-and-conquer with linear combine |
| O(N²) | Quadratic | Pairs, nested loops over N |
| O(N³) | Cubic | Triple nested structure, naive matrix multiply |
| O(2ᴺ) | Exponential | Enumerate subsets or brute-force binary choices |
Array indexing is the canonical O(1) example: a[i] is one address calculation plus one load; the cost does not depend on how long the array is.

Space complexity measures extra memory (beyond storing the input) as a function of N. An in-place sort may use O(1) extra space; merge sort typically uses O(N) auxiliary arrays; recursion depth can add O(log N) stack frames for balanced divide-and-conquer.
Some structures have occasional expensive operations but cheap operations most of the time. Amortized analysis averages cost over a sequence of operations. Classic example: dynamic arrays that double capacity when full — individual appends are usually O(1), but rare reallocations are O(N); over many appends, the average is still O(1) per append.
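The doubling argument is easy to verify numerically. Here is a minimal sketch of a doubling dynamic array (an assumed growth policy; CPython's real list uses a milder growth factor with the same asymptotics) that counts element copies caused by reallocations:

```python
def append_costs(n):
    """Return (total element copies, final capacity) after n appends to a
    doubling dynamic array that starts with capacity 1."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # reallocate: copy every existing element
            capacity *= 2
        size += 1
    return copies, capacity

copies, cap = append_costs(1024)
# Total copies (1 + 2 + 4 + ... + 512 = 1023) stay below 2n,
# so the amortized cost per append is O(1).
assert copies < 2 * 1024
```

The geometric series is the whole trick: each reallocation copies as many elements as all previous reallocations combined, so the total never exceeds twice the number of appends.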
SETI pipelines see enormous effective N: millions of frequency channels, long time series, thousands of files. The gap between O(N²) and O(N log N) is not a constant tweak — it is often the difference between finishing in seconds versus never finishing on the same machine. MitraSETI's move from naive de-Doppler to a Taylor-tree-style construction is fundamentally a complexity-class story.
Divide and conquer means: split the problem into smaller subproblems, solve each recursively, and combine the partial results into an answer for the whole.
When the combine step is O(N) and you split into two problems of size N/2, you get recurrences of the form:

T(N) = 2·T(N/2) + c·N

Unrolling the recurrence makes the log N factor visible. Suppose for simplicity N is a power of two and T(1) = O(1). Expand once:

T(N) = 2·(2·T(N/4) + c·N/2) + c·N = 4·T(N/4) + 2·c·N

After j expansions:

T(N) = 2ʲ·T(N/2ʲ) + j·c·N

Stop when N/2ʲ = 1, i.e. j = log₂ N. Then 2ʲ·T(1) = O(N) for constant base case T(1), and the second term is c·N·log₂ N. Hence:

T(N) = O(N) + c·N·log₂ N = O(N log N)
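The unrolled form can be checked directly: evaluating the recurrence T(N) = 2·T(N/2) + N with T(1) = 1 matches the closed form N·log₂ N + N exactly for power-of-two sizes.

```python
import math

def T(n):
    # T(1) = 1; T(n) = 2*T(n/2) + n for power-of-two n
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == n * math.log2(n) + n   # closed form: N*log2(N) + N
```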
At each of about log₂ N levels, you do O(N) work across the whole array (merging, butterfly combines, etc.) — same total.
This is the same skeleton behind merge sort, Cooley–Tukey FFT, and the Taylor tree.
For recurrences of the form T(N) = a · T(N/b) + O(Nᶜ), where a ≥ 1, b > 1, define d = log_b a (the "leaf exponent"). Three qualitative regimes:

- c > d: the top-level combine work dominates → T(N) = O(Nᶜ).
- c = d: work balances across all levels → T(N) = O(Nᶜ · log N).
- c < d: the recursion's leaves dominate → T(N) = O(Nᵈ).
You do not need to memorize cases for MitraSETI; you need the intuition: logarithmic depth of recursion × linear work per level → O(N log N) whenever you are in the balanced regime.
The Taylor tree splits Nt time steps across log₂ Nt layers (for padded power-of-two sizes). At each layer, work is proportional to something like Npadded × Nf (padded time extent × frequency bins), with independent groups that can be parallelized. So total work scales as O(Npadded · Nf · log Nt) — the log factor is the tree depth, the linear factors are the per-layer data volume. Implementation lives in spirit alongside dedoppler.rs in the MitraSETI codebase.
The Discrete Fourier Transform (DFT) expresses a length-N sequence {xₙ} as coefficients of complex exponentials at frequencies k = 0, …, N−1:

Xₖ = Σₙ₌₀ᴺ⁻¹ xₙ · e^(−2πi·kn/N)
This is the time domain → frequency domain map that underlies spectral analysis.
Computing each Xk is a dot product of length N; there are N values of k, hence O(N²) operations.
The FFT exploits algebraic structure: symmetries and periodicity of roots of unity. By recursively splitting into smaller DFTs (even/odd decimation) and recombining, the cost drops to O(N log N).
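Both algorithms fit in a few lines, which makes the even/odd decimation concrete. A minimal sketch (pure Python, no library FFT; N must be a power of two for the radix-2 version):

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: each X_k is a length-N dot product."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + t                # top of the butterfly
        out[k + N // 2] = even[k] - t       # bottom of the butterfly
    return out

x = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5, -0.5, 2.5]
assert all(abs(a - b) < 1e-9 for a, b in zip(dft(x), fft(x)))
```

The butterfly loop is the staged pairing the next paragraph describes: each recursion level does O(N) combine work, and there are log₂ N levels.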
FFT implementations are often drawn as butterfly graphs: two inputs, twiddle factors, two outputs. Each stage mixes pairs that are N/2ᵏ apart at stage k, which is exactly the "which indices get combined this round?" pattern you see in radix-2 FFT flowcharts. The Taylor tree uses a similar staged pairing of data with phase/frequency shifts that implement drift compensation in the time–frequency plane — not identical twiddle factors to every FFT variant, but the same computational architecture: logarithmic depth, local combine operations, massive potential parallelism.
When you read FFT code and Taylor-tree code side by side, look for: bit-reversal or stride doubling, in-place vs out-of-place buffers, and independent butterflies that rayon can fan out across.
A spectrogram is built by windowing the time series into segments and applying a DFT/FFT per window — the Short-Time Fourier Transform (STFT). So FFT cost appears repeatedly across time windows; efficient FFT libraries are central to real-time-ish processing.
MitraSETI uses FFT-based ideas for periodicity and spectral structure; the Taylor tree is the de-Doppler workhorse whose structure mirrors FFT closely enough that understanding one deeply helps you understand the other.
Single-thread clock scaling has slowed; chips expose many cores and wide vectors. For throughput-bound pipelines (SETI often is), using cores is not optional — it is how you turn a week into a day.
If fraction s of work is strictly sequential and fraction (1−s) is perfectly parallel across P processors, ideal speedup is:

Speedup(P) = 1 / (s + (1−s)/P)
Numeric anchor: with s=0.1 and P=8, the denominator is 0.1 + 0.9/8 ≈ 0.2125, so speedup ≲ 4.7× — far below 8×. Let P→∞: speedup → 1/s = 10×. Bottlenecks matter enormously.
If the problem size grows with available parallelism (more cores → process more data), the effective sequential fraction can shrink in practice — throughput scales better than a fixed tiny problem would suggest. SETI often has more data to process when you add machines; Gustafson's perspective matches weak scaling intuition.
Contrast in one sentence: Amdahl warns that fixed workloads hit a sequential wall; Gustafson notes that scaled workloads can keep parallel chunks large so utilization stays high. Good pipeline design reduces s (parallel readers, batched Rust kernels) and grows work per run when cores appear.
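The two perspectives are one-liners, which makes the contrast easy to sanity-check with the chapter's own numbers (s = 0.1, P = 8):

```python
def amdahl(s, P):
    """Fixed-workload speedup: 1 / (s + (1 - s) / P)."""
    return 1.0 / (s + (1.0 - s) / P)

def gustafson(s, P):
    """Scaled-workload speedup: s + (1 - s) * P."""
    return s + (1.0 - s) * P

# Same sequential fraction, very different outlooks:
assert abs(amdahl(0.1, 8) - 4.7058823529) < 1e-9     # fixed work: ~4.7x
assert abs(gustafson(0.1, 8) - 7.3) < 1e-9           # scaled work: 7.3x
```

Same machine, same sequential fraction: Amdahl caps a fixed job near 4.7×, while Gustafson credits a scaled job with 7.3× worth of useful throughput.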
rayon is a Rust crate for data-parallel iterators. Calling into_par_iter() turns a sequential iterator into one that work-steals across a thread pool rayon manages for you.
In the Taylor tree, many groups at a given layer are independent → embarrassingly parallel structure → near-linear speedup on ~8 cores for tree construction is plausible when the parallel fraction is high and memory bandwidth does not saturate.
A hash table maps a key to a bucket via a hash function h(k) → {0, …, M−1}. Average-case lookup/insert is O(1) when buckets stay balanced.
Real hashes collide. Chaining stores linked lists in buckets; open addressing probes alternative slots. Worst-case degrades toward O(N) if everything lands in one bucket.
Conceptually, a known RFI store wants O(1) average per-frequency checks: "is this bin on a bad list?" A hash map achieves that when load factor is controlled and hash quality spreads keys. In MitraSETI, when the list is tiny (on the order of tens of entries — e.g. linear scan over ~27), constant-factor overhead of hashing (hash computation, pointer chasing, cache misses on bucket chains) can exceed a tight loop over a contiguous small table. So O(N) with very small N beats O(1) with large constants — a reminder that asymptotics and engineering must agree.
Rule of thumb: below a few dozen to a few hundred keys, sorted vector + binary search or even linear scan often wins on real CPUs; above thousands, hashing or trees pull ahead. SETI's per-channel hot path should stay branch-predictable and cache-hot.
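To make the constant-factor argument concrete, here is a minimal sketch (the bin numbers are made up) of the two membership representations for a tiny bad-frequency list. Which one is faster at this size is exactly what timeit-style measurement on your own hardware should decide; asymptotics alone cannot.

```python
bad_bins = [3, 17, 42, 99, 512, 777, 1023]   # tiny RFI list, order of tens
bad_set = set(bad_bins)

def scan_hit(b):
    """O(N) linear scan over a small contiguous list: branch-predictable,
    cache-hot, no hashing overhead."""
    return b in bad_bins

def hash_hit(b):
    """O(1) average hash lookup: wins for large N, but pays hash
    computation and cache-miss constants on every query."""
    return b in bad_set

# The two representations must always agree; only their constants differ.
assert all(scan_hit(b) == hash_hit(b) for b in range(1100))
```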
k-means cost is often quoted as O(N · k · iterations) for N points and k centers: assign each point to its nearest center, update the centers, repeat. Fast, but it assumes roughly spherical clusters of similar size — often wrong for structured RFI trails and elongated features.
With spatial indexing (e.g., ball tree / k-d tree), neighborhood queries can be O(log N) amortized, yielding roughly O(N log N) overall in favorable regimes. DBSCAN finds arbitrary-shaped clusters and labels noise — useful when "cluster" ≠ "Gaussian blob".
HDBSCAN extends DBSCAN's sensitivity to ε by constructing a full hierarchy of clusterings at different density levels, then extracting stable clusters. Internally, it leans on mutual reachability distances and minimum spanning trees on a weighted core-distance graph. Worst-case O(N²) stories arise when naive all-pairs work or dense graphs dominate; optimized code paths use spatial indices and Prim-style MST construction on k-nearest-neighbor graphs to approach O(N log N) in many practical point clouds. It is heavier than k-means but more faithful to varying density and noise — exactly what you want when RFI and astrophysical structure do not look like spherical Gaussian blobs.
Some clustering pipelines use MST ideas. Kruskal: sort edges, O(E log E); Prim with naive structures O(V²) (better with heaps on sparse graphs). HDBSCAN's internal story connects to density and spanning structures on graphs derived from distances.
HDBSCAN runs after de-Doppler on a reduced set of detections (often hundreds, not millions). The heavy lifting is upstream; clustering cost is manageable because N is small at that stage — see pipeline.py integration in the project.
A layer is mostly affine maps (matrix multiply + bias) and nonlinear activations. Deep networks stack these; depth multiplies sequential dependency (harder to parallelize across layers, easy within large matmuls).
A 2D conv layer with kernel K×K, Cin input channels, Cout output channels, on an H×W feature map, has very roughly:

H · W · K² · Cin · Cout multiply-accumulate operations per forward pass.
Im2col and Winograd change constants, not the scaling intuition.
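The MAC count is a direct product of the dimensions, so a small helper makes layer budgets auditable. A sketch assuming stride-1 "same" padding (so the output map matches the input map); the example numbers are hypothetical:

```python
def conv2d_macs(H, W, K, c_in, c_out):
    """Rough multiply-accumulate count for one 2D conv layer, assuming
    stride 1 and 'same' padding so the output map stays H x W."""
    return H * W * K * K * c_in * c_out

# Hypothetical layer: 128x128 map, 3x3 kernel, 32 -> 64 channels.
macs = conv2d_macs(128, 128, 3, 32, 64)
assert macs == 128 * 128 * 9 * 32 * 64   # ~302 million MACs
```

Doubling the kernel side quadruples cost; doubling either channel count doubles it — useful intuition when sizing a classifier for CPU-only inference.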
For sequence length N and model dimension d, attention involves pairwise interactions across tokens — often O(N² · d) for standard attention (optimizations exist for long sequences). That quadratic in N is why long contexts are expensive.
Training uses reverse-mode automatic differentiation (backprop): one forward pass stores activations, then a backward pass applies the chain rule to propagate gradients from loss to parameters. Reverse mode is efficient when many inputs feed one scalar loss — typical for neural nets — because cost is roughly proportional to one forward + one backward, not input dimension times forward passes (as forward-mode autodiff would be). Inference-only paths (MitraSETI's classifier at runtime) omit backward graphs and optimizer state — lower memory and lower latency than training.
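Reverse mode fits in a toy class: the forward pass records each operation's local derivatives, then one backward sweep applies the chain rule from the loss outward. A minimal scalar sketch (adequate for tree-shaped expressions; real frameworks handle shared subgraphs and tensors):

```python
class Var:
    """Minimal scalar reverse-mode autodiff node: forward pass records
    local derivatives, backward pass applies the chain rule."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # d(xy)/dx = y, d(xy)/dy = x
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        self.grad += seed                      # accumulate along each path
        for parent, local in self.parents:
            parent.backward(seed * local)

# loss = x*y + x  ->  dloss/dx = y + 1, dloss/dy = x
x, y = Var(3.0), Var(4.0)
loss = x * y + x
loss.backward()
assert (x.grad, y.grad) == (5.0, 3.0)
```

One forward plus one backward computes gradients with respect to every input — the property that makes training many-parameter networks against a single scalar loss tractable.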
GPUs expose thousands of simple cores suited to large matmuls and convolutions. MitraSETI's note for practitioners: inference is CPU-only (no GPU required), improving portability and deployment on modest hardware — at the cost of throughput versus a large GPU.
Sort-based median: O(N log N). Quickselect (randomized pivoting) achieves expected O(N) for a single order statistic.
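Quickselect is short enough to show whole. A sketch with randomized pivoting (the expected-O(N) variant; worst case O(N²) is vanishingly rare with random pivots):

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element (0-indexed) in expected O(N)."""
    xs = list(xs)
    while True:
        pivot = random.choice(xs)
        lo = [x for x in xs if x < pivot]
        eq = [x for x in xs if x == pivot]
        if k < len(lo):
            xs = lo                       # answer lies below the pivot
        elif k < len(lo) + len(eq):
            return pivot                  # pivot is the k-th smallest
        else:
            k -= len(lo) + len(eq)        # answer lies above the pivot
            xs = [x for x in xs if x > pivot]

data = [7, 1, 5, 3, 9, 2, 8]
assert quickselect(data, len(data) // 2) == sorted(data)[len(data) // 2]
```

Each round discards a constant expected fraction of the remaining elements, so the expected total work is a geometric series summing to O(N).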
MAD needs two medians (the median of the data, then the median of absolute deviations from it) → often O(N log N) with sorting, or O(N) expected with selection tricks.
A single pass (or two passes for numerically stable moments) can yield kurtosis in O(N) per channel/window.
Per-channel signal-to-noise estimates from running statistics are typically O(N) over the samples contributing to each estimate.
Mean and standard deviation are sensitive to outliers: a few strong RFI spikes pull the mean and inflate variance, widening thresholds and masking faint carriers — or conversely tightening thresholds incorrectly after bad preprocessing. Median and MAD (often scaled to mimic Gaussian σ) resist a bounded fraction of arbitrary contamination under formal breakdown-point analyses. They underpin robust thresholds in spectral kurtosis-style reasoning and related guards — see the SK filter discussion elsewhere in the series.
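A short sketch makes the breakdown behavior visible: one spike of synthetic "RFI" drags the mean far off while the median barely moves (values here are made up; 1.4826 is the standard consistency constant for Gaussian data):

```python
from statistics import mean, median

def mad(xs, scale=1.4826):
    """Median absolute deviation, scaled to mimic Gaussian sigma."""
    m = median(xs)
    return scale * median(abs(x - m) for x in xs)

clean = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1]
spiked = clean + [500.0]          # one strong synthetic RFI spike

# The mean jumps from ~10 to ~71; the median moves by 0.05.
assert mean(spiked) > 70
assert abs(median(spiked) - median(clean)) < 0.1
```

A mean-plus-standard-deviation threshold computed on the spiked data would be uselessly wide; a median-plus-MAD threshold is nearly unchanged.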
Complexity note: robustness costs more than a single mean pass (median/MAD), but on per-channel windows that cost is still linear or linearithmic in window length — negligible next to FFTs and tree construction at survey scale.
| Level | Size (order) | Latency (order) |
|---|---|---|
| L1 | ~32 KB | ~1 ns |
| L2 | ~256 KB | ~4 ns |
| L3 | ~8 MB | ~10 ns |
| RAM | GB scale | ~100 ns |
| SSD / disk | TB scale | ~µs–ms (10,000 ns+) |
An O(N log N) algorithm with random memory access can lose to an O(N²) algorithm with sequential, cache-friendly access on medium N: constants from cache misses dominate.
The tree's staged, structured traversal tends toward predictable access over contiguous buffers — good for prefetch and cache lines. Random probing of a huge hash-heavy graph (not MitraSETI's main inner loop, but a contrast) hurts.
A rough footprint for the tree array, on the order of Npadded × Nchans × 4 bytes (per precision assumptions), should aim to fit in L3 for typical configurations; when it spills to RAM, bandwidth-bound behavior shows up. Engineering then balances padding, channels per batch, and parallelism to stay bandwidth-efficient.
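The arithmetic is worth scripting when tuning batch sizes. A sketch with hypothetical configuration numbers, assuming f32 samples and the ~8 MB L3 figure from the table above:

```python
def tree_footprint_mib(n_padded, n_chans, bytes_per_sample=4):
    """Rough Taylor-tree buffer footprint in MiB (assumes f32 samples)."""
    return n_padded * n_chans * bytes_per_sample / 2**20

# Hypothetical config: 1024 padded time steps x 2048 channels x f32
footprint = tree_footprint_mib(1024, 2048)
assert footprint == 8.0          # right at the ~8 MiB L3 budget
```

Doubling either dimension pushes the buffer to RAM; halving channels per batch is the usual lever to pull it back into cache.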
PyO3 exposes Rust functions/types to Python. maturin builds Rust into a Python wheel (.whl) so pip install brings native extensions. MitraSETI's split stack: Python orchestrates, Rust computes, PyO3 bridges.
Typical speedups for numeric inner loops are often cited in the 10×–100× range versus pure Python — exact factors depend on allocation, vectorization, and whether NumPy already calls optimized BLAS.
MitraSETI architecture (one paragraph): Python parses configs, schedules work, calls into PyTorch (or similar) for model inference, and handles plots; Rust (via PyO3 / wheels built with maturin) owns de-Doppler and other numeric hotspots. That split keeps research velocity where dynamic languages shine and deterministic throughput where the borrow checker helps. When you profile, expect time to concentrate in Rust for tree-heavy runs and in Python + ML for classification-heavy runs — optimize where the flame graph points.
The pipeline is a composition of tools chosen for scaling, robustness, and deployability:
| Concept | Where it shows up |
|---|---|
| Divide and conquer | Taylor tree construction (dedoppler.rs) |
| O(N log N) | Taylor tree; FFT-based periodicity analyses |
| Parallel computing | rayon during independent tree groups / data-parallel stages |
| Density clustering | HDBSCAN in pipeline.py on post–de-Doppler detections |
| FFT / STFT | Spectral views, periodicity-related processing |
| Robust statistics | SK filter: median, MAD-style reasoning for RFI |
| Neural networks | CNN + Transformer classifier scoring candidates |
| Cache-friendly access | Sequential tree buffers vs random scatter-gather patterns |
| Rust + Python | Native extensions for compute; Python for glue and ML |
If you take one lesson into code review: measure both asymptotics and constants. SETI-scale data makes exponents non-negotiable; cache behavior and parallel fractions decide whether a theoretically optimal algorithm wins on your machine. The MitraSETI stack reflects that discipline — log-linear core search, parallel tree stages, robust statistics where RFI lurks, and pragmatic language boundaries where they matter most.
Next in the series: 08 — MitraSETI Pipeline Walkthrough (end-to-end path from raw data to ranked candidates).
All these algorithms run at scale in MitraSETI Cloud. Process files up to 1 GB with zero infrastructure setup.