Architecture
The inner workings of instant-grep: from regex to result in under a millisecond. Every technique honed to perfection.
Rasengan Pipeline
The complete query pipeline: six stages of pure concentrated energy.
╔════════════════════════════════════════════════════╗
║   RASENGAN PIPELINE · FORBIDDEN SCROLL OF SEARCH   ║
╚════════════════════════════════════════════════════╝
              User Input
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 1 · regex-syntax Extractor        │  query/extract.rs
│ "async fn" → literal fragments          │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 2 · Sparse N-gram Generation      │  index/ngram.rs
│ fragments → variable-length n-grams     │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 3 · Shadow Clone Selection        │  index/ngram.rs
│ all n-grams → minimal covering set      │  build_covering_ngrams()
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 4 · Hash Table Lookup             │  index/reader.rs
│ n-grams → posting list offsets (mmap)   │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 5 · Posting List Intersection     │  index/reader.rs
│ sorted file ID lists → candidates       │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Stage 6 · Parallel Regex Verification   │  Rayon threadpool
│ candidates → verified matches           │
└─────────────────┬───────────────────────┘
                  │
                  ▼
         Results (p50: 0.9ms)
Module Structure
src/index/ngram.rs · Core Algorithm
Sparse n-gram generation and covering set selection
src/index/writer.rs · Index Build Pipeline
Walks the filesystem, then builds and serializes the index
src/index/reader.rs · Index Query
mmap-backed hash table + posting list intersection
src/query/extract.rs · Regex → N-gram Conversion
Parses the regex AST to extract searchable literal fragments
src/daemon.rs · Sage Mode Daemon
Unix socket server for sub-ms repeated queries
src/main.rs · CLI Entry Point
Command dispatch via clap, routing to the index, search, status, watch, daemon, and query subcommands
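Stage 5 (in src/index/reader.rs) combines the posting lists of the selected n-grams by intersecting sorted file-ID lists. The sketch below shows that step. It assumes the lists are already decoded into ascending, duplicate-free `u32` slices. `intersect` and `intersect_all` are hypothetical names, and the real reader may use a different strategy, such as galloping search when list lengths differ widely.

```rust
use std::cmp::Ordering;

/// Intersect two ascending, duplicate-free file-ID lists with a linear merge.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len().min(b.len()));
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Intersect the posting lists of every selected n-gram (hypothetical helper).
/// Returns None when there are no lists, i.e. the query imposes no n-gram constraint.
fn intersect_all(mut lists: Vec<&[u32]>) -> Option<Vec<u32>> {
    // Shortest list first: no intermediate result can be longer than it.
    lists.sort_by_key(|l| l.len());
    let (first, rest) = lists.split_first()?;
    let mut candidates = first.to_vec();
    for list in rest {
        if candidates.is_empty() {
            break; // no file can survive further intersections
        }
        candidates = intersect(&candidates, list);
    }
    Some(candidates)
}
```

Sorting by length first bounds every intermediate result by the shortest posting list. An empty intermediate result ends the work early.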
On-disk Format (.ig/ directory)
Three binary files, all memory-mapped for zero-copy access.
INDEX_VERSION must be bumped when the format changes.
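The lexicon.bin and postings.bin entries listed below could be decoded along the lines of this sketch. It rests on assumptions that the format description leaves open: lexicon records are 16 bytes and little-endian, the posting offset is counted in bytes from the start of postings.bin, and deltas are stored as plain little-endian `u32` values. `LexiconEntry`, `decode_entry`, `delta_encode`, and `decode_postings` are hypothetical names.

```rust
use std::convert::TryInto;

/// One lexicon.bin record: n-gram hash, posting offset, posting count.
/// Assumed layout: 16 bytes, little-endian, offset in bytes into postings.bin.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LexiconEntry {
    hash: u64,
    offset: u32,
    count: u32,
}

const ENTRY_SIZE: usize = 16;

/// Decode one record from the start of `rec`; None if fewer than 16 bytes remain.
fn decode_entry(rec: &[u8]) -> Option<LexiconEntry> {
    let rec = rec.get(..ENTRY_SIZE)?;
    Some(LexiconEntry {
        hash: u64::from_le_bytes(rec[0..8].try_into().ok()?),
        offset: u32::from_le_bytes(rec[8..12].try_into().ok()?),
        count: u32::from_le_bytes(rec[12..16].try_into().ok()?),
    })
}

/// Delta-encode ascending file IDs: the first ID as-is, then each gap to the previous ID.
fn delta_encode(ids: &[u32]) -> Vec<u32> {
    let mut prev = 0;
    ids.iter()
        .map(|&id| {
            let gap = id.checked_sub(prev).expect("file IDs must be ascending");
            prev = id;
            gap
        })
        .collect()
}

/// Read `count` u32 deltas at `offset` in the postings buffer and rebuild absolute file IDs.
fn decode_postings(postings: &[u8], e: LexiconEntry) -> Option<Vec<u32>> {
    let start = e.offset as usize;
    let end = start.checked_add((e.count as usize).checked_mul(4)?)?;
    let bytes = postings.get(start..end)?;
    let mut ids = Vec::with_capacity(e.count as usize);
    let mut id = 0u32;
    for chunk in bytes.chunks_exact(4) {
        id = id.checked_add(u32::from_le_bytes(chunk.try_into().ok()?))?;
        ids.push(id);
    }
    Some(ids)
}
```

Checked arithmetic means a truncated or corrupt file yields `None`. It cannot cause a panic or a wrapped-around file ID.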
.ig/
├── lexicon.bin    Hash table: u64(n-gram hash) → u32(posting offset) + u32(count)
├── postings.bin   Sorted u32[] file IDs per n-gram (delta-encoded)
└── metadata.bin   u32 version + u64[] mtime + []string file paths (null-separated)
Rasengan Algorithm: Sparse N-grams
Traditional trigram search indexes every contiguous 3-character substring, which is expensive and produces many false positives. The Rasengan Algorithm instead uses variable-length n-grams that adapt to the string being indexed, concentrating energy where it matters.
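The paragraph above leaves the boundary rule implicit. One well-known way to let n-gram boundaries adapt to content is the "sparse grams" approach GitHub has described for its code search. Every bigram gets a hash-derived weight, and a substring is kept when both of its edge bigrams outweigh every bigram strictly inside it. The sketch below assumes that rule. `bigram_weight` and `sparse_ngrams` are hypothetical, and instant-grep's hash_bigram() and selection details may differ.

```rust
/// Hypothetical bigram weight: 32-bit FNV-1a over the two bytes.
/// Any well-mixed hash would do; this is not instant-grep's hash_bigram().
fn bigram_weight(a: u8, b: u8) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for byte in [a, b] {
        h ^= u32::from(byte);
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Every sparse n-gram of `s` as a byte range (start, end_exclusive).
/// Bigram k covers bytes k and k+1. A gram spanning bigrams i..=j qualifies when
/// every bigram strictly inside weighs less than both edge bigrams i and j.
fn sparse_ngrams(s: &[u8]) -> Vec<(usize, usize)> {
    let w: Vec<u32> = s.windows(2).map(|p| bigram_weight(p[0], p[1])).collect();
    let mut grams = Vec::new();
    for i in 0..w.len() {
        let mut max_inner = 0u32;
        let mut has_inner = false;
        for j in (i + 1)..w.len() {
            if j > i + 1 {
                // Bigram j - 1 has just become an interior bigram.
                max_inner = max_inner.max(w[j - 1]);
                has_inner = true;
            }
            if has_inner && max_inner >= w[i] {
                break; // interior outweighs the left edge, so no longer gram from i qualifies
            }
            if !has_inner || max_inner < w[j] {
                grams.push((i, j + 2)); // bigrams i..=j span bytes i..j+2
            }
        }
    }
    grams
}
```

Under this rule every trigram qualifies, because it has no interior bigram. The index therefore holds trigrams plus longer grams. The rule only inspects bigrams inside a gram, so every sparse gram of a query literal is also a sparse gram of any text that contains that literal. That property is what allows query-time grams to be looked up in the index. The payoff comes at query time, when covering selection can use a few long, rare grams instead of many short ones.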
How hash_bigram() works:
Each n-gram is hashed by combining its characters using a polynomial rolling hash with a prime base.
The hash fits in a u64, so a lookup in the open-addressing hash table is usually a single memory access (one probe when there is no collision) once the index is warm in the OS page cache or held in the daemon's mmap.
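The sketch below shows the two pieces described above: a polynomial hash with a prime base over an n-gram's bytes, and an open-addressing lookup with linear probing. Several details are assumptions made for illustration: the base 1,000,003, the power-of-two capacity, the `Entry` and `Lexicon` types, and `None` as the empty-slot marker. The real lexicon lives in an mmap'd file, not in a `Vec`.

```rust
/// Hypothetical prime base; instant-grep's actual constant is not stated.
const BASE: u64 = 1_000_003;

/// Polynomial hash ((b0 * BASE + b1) * BASE + b2) ... with wrapping u64 arithmetic.
fn hash_ngram(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |h, &b| h.wrapping_mul(BASE).wrapping_add(u64::from(b)))
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    hash: u64,
    offset: u32,
    count: u32,
}

/// Open-addressing table with linear probing over a power-of-two number of slots.
/// In-memory stand-in for the mmap'd lexicon.
struct Lexicon {
    slots: Vec<Option<Entry>>,
}

impl Lexicon {
    fn with_capacity(cap: usize) -> Self {
        assert!(cap.is_power_of_two(), "capacity must be a power of two");
        Lexicon { slots: vec![None; cap] }
    }

    fn insert(&mut self, e: Entry) {
        let mask = self.slots.len() - 1;
        let mut i = (e.hash as usize) & mask;
        for _ in 0..self.slots.len() {
            let slot = self.slots[i]; // Option<Entry> is Copy
            match slot {
                Some(old) if old.hash != e.hash => i = (i + 1) & mask, // occupied: probe next
                _ => {
                    self.slots[i] = Some(e); // empty slot or same hash: store here
                    return;
                }
            }
        }
        panic!("lexicon is full");
    }

    /// Start at the home slot; stop at a matching hash or the first empty slot.
    fn lookup(&self, hash: u64) -> Option<Entry> {
        let mask = self.slots.len() - 1;
        let mut i = (hash as usize) & mask;
        for _ in 0..self.slots.len() {
            match self.slots[i] {
                None => return None,
                Some(e) if e.hash == hash => return Some(e),
                Some(_) => i = (i + 1) & mask,
            }
        }
        None
    }
}
```

Lookups stop at the first empty slot, so at a moderate load factor most hits and misses take a single probe. Because this table is keyed by hash alone, two distinct n-grams with the same u64 hash would land on one entry. If their postings were merged, the collision would only add candidates, and regex verification would filter those out.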
Shadow Clone Selection: Covering Algorithm
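From all possible n-grams for a query, we need only the minimal covering set: the smallest collection of n-grams that together span the whole query literal. Every n-gram taken from the literal must appear in any file that matches, so no match can be lost. Spanning the full literal lets every part of it narrow the candidate list while keeping hash lookups to a minimum. Like selecting the perfect shadow clones: enough to cover every angle, no redundancy.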
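struct Ngram { pos: usize, len: usize } // assumed shape: byte offset + length within the query literal
fn build_covering_ngrams(mut ngrams: Vec<Ngram>) -> Vec<Ngram> {
    ngrams.sort_by(|a, b| a.pos.cmp(&b.pos).then(b.len.cmp(&a.len))); // (position, length desc)
    ngrams.into_iter().fold(Vec::new(), |mut out: Vec<Ngram>, g| {
        if out.last().map_or(false, |l| g.pos + g.len <= l.pos + l.len) { return out; } // covers nothing new
        while out.len() >= 2 && out[out.len() - 2].pos + out[out.len() - 2].len >= g.pos { out.pop(); } // g makes the last pick redundant
        out.push(g); // greedily keep g: it reaches further than anything selected so far
        out
    })
}
Sage Mode: Daemon Architecture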
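In Sage Mode, the ninja draws power from nature without moving. The daemon does the same: it keeps the index mmap'd inside one long-lived process. A query therefore no longer pays to open and map the index files and fault their pages in again, as a fresh CLI invocation does.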
Without the daemon (one-shot CLI):
- Cold start: open files, read mmap headers
- OS page faults on first access
- ~5-15ms for a cold index
- ~0.9ms when the OS cache is warm
With the daemon (Sage Mode):
- Index already in process memory
- Unix socket: no TCP overhead
- Consistent sub-0.5ms latency
- Well suited to the repeated queries AI agents issue
# Start Sage Mode
ig daemon . # runs in background, socket at .ig/daemon.sock
# Query via socket (0.3ms)
ig query "async fn" .
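The daemon's advantage comes from loading state once and then answering many queries over a Unix socket. Below is a minimal std-only sketch of that shape. Several pieces are hypothetical stand-ins: the newline-delimited request/reply format, the `Index` type, `serve`, `send_query`, and the substring-based `query` (which uses no n-grams and no regex). None of them reflect instant-grep's actual protocol or the behavior of `ig query`.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::sync::Arc;
use std::thread;

/// Hypothetical loaded index: (path, contents) pairs, built once at daemon start.
struct Index {
    files: Vec<(String, String)>,
}

impl Index {
    /// Toy query: paths whose contents contain `needle` (plain substring match).
    fn query(&self, needle: &str) -> Vec<&str> {
        self.files
            .iter()
            .filter(|(_, body)| body.contains(needle))
            .map(|(path, _)| path.as_str())
            .collect()
    }
}

/// Serve newline-delimited queries; each reply is one line of comma-separated paths.
/// The shared Arc<Index> is reused by every connection, so nothing is reloaded per query.
fn serve(listener: UnixListener, index: Arc<Index>) {
    for stream in listener.incoming() {
        let Ok(stream) = stream else { continue };
        let index = Arc::clone(&index);
        thread::spawn(move || {
            let mut writer = stream.try_clone().expect("clone socket");
            for line in BufReader::new(stream).lines() {
                let Ok(query) = line else { break };
                let reply = index.query(&query).join(",");
                if writeln!(writer, "{reply}").is_err() {
                    break;
                }
            }
        });
    }
}

/// Client side: connect, send one query line, read one reply line.
fn send_query(socket: &Path, query: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    writeln!(stream, "{query}")?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

A real daemon would bind its socket at .ig/daemon.sock and hold the mmap'd lexicon and postings in place of the toy `Index`.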