Complexity classes, divide-and-conquer, the FFT, parallelism, and why Rust sits next to Python — the theoretical CS behind SETI at scale.
This chapter is for software engineers who want the theoretical computer science behind MitraSETI: complexity classes, divide-and-conquer, the FFT, parallelism, data structures, clustering cost, neural nets, robust statistics, memory hierarchy, and why Rust sits next to Python in the stack. Where earlier tutorials emphasized what the pipeline does, here we emphasize why certain designs are inevitable at SETI scale.
An algorithm is a finite, well-defined sequence of steps that takes input and produces output. In practice, "well-defined" means each step is unambiguous to a machine (or a careful human). The same abstract algorithm can be implemented in Rust, Python, or hardware; complexity describes how that procedure's cost grows as the input size grows — not the constant factors hidden in any one language.
Time complexity answers: as the problem size N grows, how does the number of elementary operations (or the dominant term in runtime) grow? We care about asymptotic behavior: behavior for large N, where lower-order terms and implementation details matter less than the scaling exponent.
We write f(N) = O(g(N)) if, for sufficiently large N, f(N) is bounded above by a constant multiple of g(N). Informally: "g captures the worst-case growth rate, up to constants."
Common classes, from best to worst scaling for large N:
| Class | Name (informal) | Typical meaning |
|---|---|---|
| O(1) | Constant | Work does not grow with N |
| O(log N) | Logarithmic | Halving/doubling the search space each step |
| O(N) | Linear | One pass (or constant passes) over data |
| O(N log N) | Linearithmic | Divide-and-conquer with linear combine |
| O(N²) | Quadratic | Pairs, nested loops over N |
| O(N³) | Cubic | Triple nested structure, naive matrix multiply |
| O(2ᴺ) | Exponential | Enumerate subsets or brute-force binary choices |
Array indexing is the canonical O(1) example: a[i] is one address calculation plus one load; the cost does not depend on how long the array is.

Space complexity measures extra memory (beyond storing the input) as a function of N. An in-place sort may use O(1) extra space; merge sort typically uses O(N) auxiliary arrays; recursion depth can add O(log N) stack frames for balanced divide-and-conquer.
Some structures have occasional expensive operations but cheap operations most of the time. Amortized analysis averages cost over a sequence of operations. Classic example: dynamic arrays that double capacity when full — individual appends are usually O(1), but rare reallocations are O(N); over many appends, the average is still O(1) per append.
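The doubling argument is easy to verify numerically. Here is a minimal sketch of a doubling dynamic array (an assumed growth policy; CPython's real list uses a milder growth factor with the same asymptotics) that counts element copies caused by reallocations:

```python
def append_costs(n):
    """Return (total element copies, final capacity) after n appends to a
    doubling dynamic array that starts with capacity 1."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # reallocate: copy every existing element
            capacity *= 2
        size += 1
    return copies, capacity

copies, cap = append_costs(1024)
# Total copies (1 + 2 + 4 + ... + 512 = 1023) stay below 2n,
# so the amortized cost per append is O(1).
assert copies < 2 * 1024
```

The geometric series is the whole trick: each reallocation copies as many elements as all previous reallocations combined, so the total never exceeds twice the number of appends.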
SETI pipelines see enormous effective N: millions of frequency channels, long time series, thousands of files. The gap between O(N²) and O(N log N) is not a constant tweak — it is often the difference between finishing in seconds versus never finishing on the same machine. MitraSETI's move from naive de-Doppler to a Taylor-tree-style construction is fundamentally a complexity-class story.
Divide and conquer means: split the problem into smaller subproblems, solve each recursively, and combine the partial results into an answer for the whole.
When the combine step is O(N) and you split into two problems of size N/2, you get recurrences of the form:

T(N) = 2·T(N/2) + c·N

Unrolling the recurrence makes the log N factor visible. Suppose for simplicity N is a power of two and T(1) = O(1). Expand once:

T(N) = 2·(2·T(N/4) + c·N/2) + c·N = 4·T(N/4) + 2·c·N

After j expansions:

T(N) = 2ʲ·T(N/2ʲ) + j·c·N

Stop when N/2ʲ = 1, i.e. j = log₂ N. Then 2ʲ·T(1) = O(N) for constant base case T(1), and the second term is c·N·log₂ N. Hence:

T(N) = O(N) + c·N·log₂ N = O(N log N)
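The unrolled form can be checked directly: evaluating the recurrence T(N) = 2·T(N/2) + N with T(1) = 1 matches the closed form N·log₂ N + N exactly for power-of-two sizes.

```python
import math

def T(n):
    # T(1) = 1; T(n) = 2*T(n/2) + n for power-of-two n
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == n * math.log2(n) + n   # closed form: N*log2(N) + N
```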
At each of about log₂ N levels, you do O(N) work across the whole array (merging, butterfly combines, etc.) — same total.
This is the same skeleton behind merge sort, Cooley–Tukey FFT, and the Taylor tree.
For recurrences of the form T(N) = a · T(N/b) + O(Nᶜ), where a ≥ 1, b > 1, define d = log_b a (the "leaf exponent"). Three qualitative regimes:

- c > d: the top-level combine work dominates → T(N) = O(Nᶜ).
- c = d: work balances across all levels → T(N) = O(Nᶜ · log N).
- c < d: the recursion's leaves dominate → T(N) = O(Nᵈ).
You do not need to memorize cases for MitraSETI; you need the intuition: logarithmic depth of recursion × linear work per level → O(N log N) whenever you are in the balanced regime.
The Taylor tree splits Nt time steps across log₂ Nt layers (for padded power-of-two sizes). At each layer, work is proportional to something like Npadded × Nf (padded time extent × frequency bins), with independent groups that can be parallelized. So total work scales as O(Npadded · Nf · log Nt) — the log factor is the tree depth, the linear factors are the per-layer data volume. Implementation lives in spirit alongside dedoppler.rs in the MitraSETI codebase.
The Discrete Fourier Transform (DFT) expresses a length-N sequence {xₙ} as coefficients of complex exponentials at frequencies k = 0, …, N−1:

Xₖ = Σₙ₌₀ᴺ⁻¹ xₙ · e^(−2πi·kn/N)
This is the time domain → frequency domain map that underlies spectral analysis.
Computing each Xk is a dot product of length N; there are N values of k, hence O(N²) operations.
The FFT exploits algebraic structure: symmetries and periodicity of roots of unity. By recursively splitting into smaller DFTs (even/odd decimation) and recombining, the cost drops to O(N log N).
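Both algorithms fit in a few lines, which makes the even/odd decimation concrete. A minimal sketch (pure Python, no library FFT; N must be a power of two for the radix-2 version):

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: each X_k is a length-N dot product."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + t                # top of the butterfly
        out[k + N // 2] = even[k] - t       # bottom of the butterfly
    return out

x = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5, -0.5, 2.5]
assert all(abs(a - b) < 1e-9 for a, b in zip(dft(x), fft(x)))
```

The butterfly loop is the staged pairing the next paragraph describes: each recursion level does O(N) combine work, and there are log₂ N levels.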
FFT implementations are often drawn as butterfly graphs: two inputs, twiddle factors, two outputs. Each stage mixes pairs that are N/2ᵏ apart at stage k, which is exactly the "which indices get combined this round?" pattern you see in radix-2 FFT flowcharts. The Taylor tree uses a similar staged pairing of data with phase/frequency shifts that implement drift compensation in the time–frequency plane — not identical twiddle factors to every FFT variant, but the same computational architecture: logarithmic depth, local combine operations, massive potential parallelism.
When you read FFT code and Taylor-tree code side by side, look for: bit-reversal or stride doubling, in-place vs out-of-place buffers, and independent butterflies that rayon can fan out across.
A spectrogram is built by windowing the time series into segments and applying a DFT/FFT per window — the Short-Time Fourier Transform (STFT). So FFT cost appears repeatedly across time windows; efficient FFT libraries are central to real-time-ish processing.
MitraSETI uses FFT-based ideas for periodicity and spectral structure; the Taylor tree is the de-Doppler workhorse whose structure mirrors FFT closely enough that understanding one deeply helps you understand the other.
Single-thread clock scaling has slowed; chips expose many cores and wide vectors. For throughput-bound pipelines (SETI often is), using cores is not optional — it is how you turn a week into a day.
If fraction s of work is strictly sequential and fraction (1−s) is perfectly parallel across P processors, ideal speedup is:

Speedup(P) = 1 / (s + (1−s)/P)
Numeric anchor: with s=0.1 and P=8, the denominator is 0.1 + 0.9/8 ≈ 0.2125, so speedup ≲ 4.7× — far below 8×. Let P→∞: speedup → 1/s = 10×. Bottlenecks matter enormously.
If the problem size grows with available parallelism (more cores → process more data), the effective sequential fraction can shrink in practice — throughput scales better than a fixed tiny problem would suggest. SETI often has more data to process when you add machines; Gustafson's perspective matches weak scaling intuition.
Contrast in one sentence: Amdahl warns that fixed workloads hit a sequential wall; Gustafson notes that scaled workloads can keep parallel chunks large so utilization stays high. Good pipeline design reduces s (parallel readers, batched Rust kernels) and grows work per run when cores appear.
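The two perspectives are one-liners, which makes the contrast easy to sanity-check with the chapter's own numbers (s = 0.1, P = 8):

```python
def amdahl(s, P):
    """Fixed-workload speedup: 1 / (s + (1 - s) / P)."""
    return 1.0 / (s + (1.0 - s) / P)

def gustafson(s, P):
    """Scaled-workload speedup: s + (1 - s) * P."""
    return s + (1.0 - s) * P

# Same sequential fraction, very different outlooks:
assert abs(amdahl(0.1, 8) - 4.7058823529) < 1e-9     # fixed work: ~4.7x
assert abs(gustafson(0.1, 8) - 7.3) < 1e-9           # scaled work: 7.3x
```

Same machine, same sequential fraction: Amdahl caps a fixed job near 4.7×, while Gustafson credits a scaled job with 7.3× worth of useful throughput.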
rayon is a Rust crate for data-parallel iterators. Calling into_par_iter() turns a sequential iterator into one that work-steals across a thread pool rayon manages for you.
In the Taylor tree, many groups at a given layer are independent → embarrassingly parallel structure → near-linear speedup on ~8 cores for tree construction is plausible when the parallel fraction is high and memory bandwidth does not saturate.
A hash table maps a key to a bucket via a hash function h(k) → {0, …, M−1}. Average-case lookup/insert is O(1) when buckets stay balanced.
Real hashes collide. Chaining stores linked lists in buckets; open addressing probes alternative slots. Worst-case degrades toward O(N) if everything lands in one bucket.
Conceptually, a known RFI store wants O(1) average per-frequency checks: "is this bin on a bad list?" A hash map achieves that when load factor is controlled and hash quality spreads keys. In MitraSETI, when the list is tiny (on the order of tens of entries — e.g. linear scan over ~27), constant-factor overhead of hashing (hash computation, pointer chasing, cache misses on bucket chains) can exceed a tight loop over a contiguous small table. So O(N) with very small N beats O(1) with large constants — a reminder that asymptotics and engineering must agree.
Rule of thumb: below a few dozen to a few hundred keys, sorted vector + binary search or even linear scan often wins on real CPUs; above thousands, hashing or trees pull ahead. SETI's per-channel hot path should stay branch-predictable and cache-hot.
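To make the constant-factor argument concrete, here is a minimal sketch (the bin numbers are made up) of the two membership representations for a tiny bad-frequency list. Which one is faster at this size is exactly what timeit-style measurement on your own hardware should decide; asymptotics alone cannot.

```python
bad_bins = [3, 17, 42, 99, 512, 777, 1023]   # tiny RFI list, order of tens
bad_set = set(bad_bins)

def scan_hit(b):
    """O(N) linear scan over a small contiguous list: branch-predictable,
    cache-hot, no hashing overhead."""
    return b in bad_bins

def hash_hit(b):
    """O(1) average hash lookup: wins for large N, but pays hash
    computation and cache-miss constants on every query."""
    return b in bad_set

# The two representations must always agree; only their constants differ.
assert all(scan_hit(b) == hash_hit(b) for b in range(1100))
```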
k-means cost is often quoted as O(N · k · iterations) for N points and k centers: assign each point to its nearest center, update the centers, repeat. Fast, but it assumes roughly spherical clusters of similar size — often wrong for structured RFI trails and elongated features.
With spatial indexing (e.g., ball tree / k-d tree), neighborhood queries can be O(log N) amortized, yielding roughly O(N log N) overall in favorable regimes. DBSCAN finds arbitrary-shaped clusters and labels noise — useful when "cluster" ≠ "Gaussian blob".
HDBSCAN extends DBSCAN's sensitivity to ε by constructing a full hierarchy of clusterings at different density levels, then extracting stable clusters. Internally, it leans on mutual reachability distances and minimum spanning trees on a weighted core-distance graph. Worst-case O(N²) stories arise when naive all-pairs work or dense graphs dominate; optimized code paths use spatial indices and Prim-style MST construction on k-nearest-neighbor graphs to approach O(N log N) in many practical point clouds. It is heavier than k-means but more faithful to varying density and noise — exactly what you want when RFI and astrophysical structure do not look like spherical Gaussian blobs.
Some clustering pipelines use MST ideas. Kruskal: sort edges, O(E log E); Prim with naive structures O(V²) (better with heaps on sparse graphs). HDBSCAN's internal story connects to density and spanning structures on graphs derived from distances.
HDBSCAN runs after de-Doppler on a reduced set of detections (often hundreds, not millions). The heavy lifting is upstream; clustering cost is manageable because N is small at that stage — see pipeline.py integration in the project.
A layer is mostly affine maps (matrix multiply + bias) and nonlinear activations. Deep networks stack these; depth multiplies sequential dependency (harder to parallelize across layers, easy within large matmuls).
A 2D conv layer with kernel K×K, Cin input channels, Cout output channels, on an H×W feature map, has very roughly:

H · W · K² · Cin · Cout multiply-accumulate operations per forward pass.
Im2col and Winograd change constants, not the scaling intuition.
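The MAC count is a direct product of the dimensions, so a small helper makes layer budgets auditable. A sketch assuming stride-1 "same" padding (so the output map matches the input map); the example numbers are hypothetical:

```python
def conv2d_macs(H, W, K, c_in, c_out):
    """Rough multiply-accumulate count for one 2D conv layer, assuming
    stride 1 and 'same' padding so the output map stays H x W."""
    return H * W * K * K * c_in * c_out

# Hypothetical layer: 128x128 map, 3x3 kernel, 32 -> 64 channels.
macs = conv2d_macs(128, 128, 3, 32, 64)
assert macs == 128 * 128 * 9 * 32 * 64   # ~302 million MACs
```

Doubling the kernel side quadruples cost; doubling either channel count doubles it — useful intuition when sizing a classifier for CPU-only inference.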
For sequence length N and model dimension d, attention involves pairwise interactions across tokens — often O(N² · d) for standard attention (optimizations exist for long sequences). That quadratic in N is why long contexts are expensive.
Training uses reverse-mode automatic differentiation (backprop): one forward pass stores activations, then a backward pass applies the chain rule to propagate gradients from loss to parameters. Reverse mode is efficient when many inputs feed one scalar loss — typical for neural nets — because cost is roughly proportional to one forward + one backward, not input dimension times forward passes (as forward-mode autodiff would be). Inference-only paths (MitraSETI's classifier at runtime) omit backward graphs and optimizer state — lower memory and lower latency than training.
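Reverse mode fits in a toy class: the forward pass records each operation's local derivatives, then one backward sweep applies the chain rule from the loss outward. A minimal scalar sketch (adequate for tree-shaped expressions; real frameworks handle shared subgraphs and tensors):

```python
class Var:
    """Minimal scalar reverse-mode autodiff node: forward pass records
    local derivatives, backward pass applies the chain rule."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # d(xy)/dx = y, d(xy)/dy = x
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        self.grad += seed                      # accumulate along each path
        for parent, local in self.parents:
            parent.backward(seed * local)

# loss = x*y + x  ->  dloss/dx = y + 1, dloss/dy = x
x, y = Var(3.0), Var(4.0)
loss = x * y + x
loss.backward()
assert (x.grad, y.grad) == (5.0, 3.0)
```

One forward plus one backward computes gradients with respect to every input — the property that makes training many-parameter networks against a single scalar loss tractable.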
GPUs expose thousands of simple cores suited to large matmuls and convolutions. MitraSETI's note for practitioners: inference is CPU-only (no GPU required), improving portability and deployment on modest hardware — at the cost of throughput versus a large GPU.
Sort-based median: O(N log N). Quickselect (randomized pivoting) achieves expected O(N) for a single order statistic.
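Quickselect is short enough to show whole. A sketch with randomized pivoting (the expected-O(N) variant; worst case O(N²) is vanishingly rare with random pivots):

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element (0-indexed) in expected O(N)."""
    xs = list(xs)
    while True:
        pivot = random.choice(xs)
        lo = [x for x in xs if x < pivot]
        eq = [x for x in xs if x == pivot]
        if k < len(lo):
            xs = lo                       # answer lies below the pivot
        elif k < len(lo) + len(eq):
            return pivot                  # pivot is the k-th smallest
        else:
            k -= len(lo) + len(eq)        # answer lies above the pivot
            xs = [x for x in xs if x > pivot]

data = [7, 1, 5, 3, 9, 2, 8]
assert quickselect(data, len(data) // 2) == sorted(data)[len(data) // 2]
```

Each round discards a constant expected fraction of the remaining elements, so the expected total work is a geometric series summing to O(N).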
MAD needs two medians (the median of the data, then the median of absolute deviations from it) → often O(N log N) with sorting, or O(N) expected with selection tricks.
A single pass (or two passes for numerically stable moments) can yield kurtosis in O(N) per channel/window.
Per-channel signal-to-noise estimates from running statistics are typically O(N) over the samples contributing to each estimate.
Mean and standard deviation are sensitive to outliers: a few strong RFI spikes pull the mean and inflate variance, widening thresholds and masking faint carriers — or conversely tightening thresholds incorrectly after bad preprocessing. Median and MAD (often scaled to mimic Gaussian σ) resist a bounded fraction of arbitrary contamination under formal breakdown-point analyses. They underpin robust thresholds in spectral kurtosis-style reasoning and related guards — see the SK filter discussion elsewhere in the series.
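A short sketch makes the breakdown behavior visible: one spike of synthetic "RFI" drags the mean far off while the median barely moves (values here are made up; 1.4826 is the standard consistency constant for Gaussian data):

```python
from statistics import mean, median

def mad(xs, scale=1.4826):
    """Median absolute deviation, scaled to mimic Gaussian sigma."""
    m = median(xs)
    return scale * median(abs(x - m) for x in xs)

clean = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1]
spiked = clean + [500.0]          # one strong synthetic RFI spike

# The mean jumps from ~10 to ~71; the median moves by 0.05.
assert mean(spiked) > 70
assert abs(median(spiked) - median(clean)) < 0.1
```

A mean-plus-standard-deviation threshold computed on the spiked data would be uselessly wide; a median-plus-MAD threshold is nearly unchanged.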
Complexity note: robustness costs more than a single mean pass (median/MAD), but on per-channel windows that cost is still linear or linearithmic in window length — negligible next to FFTs and tree construction at survey scale.
| Level | Size (order) | Latency (order) |
|---|---|---|
| L1 | ~32 KB | ~1 ns |
| L2 | ~256 KB | ~4 ns |
| L3 | ~8 MB | ~10 ns |
| RAM | GB scale | ~100 ns |
| SSD / disk | TB scale | ~µs–ms (10,000 ns+) |
An O(N log N) algorithm with random memory access can lose to an O(N²) algorithm with sequential, cache-friendly access on medium N: constants from cache misses dominate.
The tree's staged, structured traversal tends toward predictable access over contiguous buffers — good for prefetch and cache lines. Random probing of a huge hash-heavy graph (not MitraSETI's main inner loop, but a contrast) hurts.
A rough footprint for the tree array, on the order of Npadded × Nchans × 4 bytes (per precision assumptions), should aim to fit in L3 for typical configurations; when it spills to RAM, bandwidth-bound behavior shows up. Engineering then balances padding, channels per batch, and parallelism to stay bandwidth-efficient.
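The arithmetic is worth scripting when tuning batch sizes. A sketch with hypothetical configuration numbers, assuming f32 samples and the ~8 MB L3 figure from the table above:

```python
def tree_footprint_mib(n_padded, n_chans, bytes_per_sample=4):
    """Rough Taylor-tree buffer footprint in MiB (assumes f32 samples)."""
    return n_padded * n_chans * bytes_per_sample / 2**20

# Hypothetical config: 1024 padded time steps x 2048 channels x f32
footprint = tree_footprint_mib(1024, 2048)
assert footprint == 8.0          # right at the ~8 MiB L3 budget
```

Doubling either dimension pushes the buffer to RAM; halving channels per batch is the usual lever to pull it back into cache.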
PyO3 exposes Rust functions/types to Python. maturin builds Rust into a Python wheel (.whl) so pip install brings native extensions. MitraSETI's split stack: Python orchestrates, Rust computes, PyO3 bridges.
Typical speedups for numeric inner loops are often cited in the 10×–100× range versus pure Python — exact factors depend on allocation, vectorization, and whether NumPy already calls optimized BLAS.
MitraSETI architecture (one paragraph): Python parses configs, schedules work, calls into PyTorch (or similar) for model inference, and handles plots; Rust (via PyO3 / wheels built with maturin) owns de-Doppler and other numeric hotspots. That split keeps research velocity where dynamic languages shine and deterministic throughput where the borrow checker helps. When you profile, expect time to concentrate in Rust for tree-heavy runs and in Python + ML for classification-heavy runs — optimize where the flame graph points.
The pipeline is a composition of tools chosen for scaling, robustness, and deployability:
| Concept | Where it shows up |
|---|---|
| Divide and conquer | Taylor tree construction (dedoppler.rs) |
| O(N log N) | Taylor tree; FFT-based periodicity analyses |
| Parallel computing | rayon during independent tree groups / data-parallel stages |
| Density clustering | HDBSCAN in pipeline.py on post–de-Doppler detections |
| FFT / STFT | Spectral views, periodicity-related processing |
| Robust statistics | SK filter: median, MAD-style reasoning for RFI |
| Neural networks | CNN + Transformer classifier scoring candidates |
| Cache-friendly access | Sequential tree buffers vs random scatter-gather patterns |
| Rust + Python | Native extensions for compute; Python for glue and ML |
If you take one lesson into code review: measure both asymptotics and constants. SETI-scale data makes exponents non-negotiable; cache behavior and parallel fractions decide whether a theoretically optimal algorithm wins on your machine. The MitraSETI stack reflects that discipline — log-linear core search, parallel tree stages, robust statistics where RFI lurks, and pragmatic language boundaries where they matter most.
Next in the series: 08 — MitraSETI Pipeline Walkthrough (end-to-end path from raw data to ranked candidates).
All these algorithms run at scale in MitraSETI Cloud. Process files up to 1 GB with zero infrastructure setup.