OpenAI Embeddings — opt-in POC
A pedagogical playground for dense semantic search over your codebase, using OpenAI's text-embedding-3-small model.
Disabled by default at two independent layers: compile-time (cargo feature flag) and runtime (ig emb on/off, since v1.14.2). The published binary contains zero OpenAI client code unless you build with --features embed-poc; even then, no network call fires until you flip ig emb on.
Why opt-in?
- Cost. Indexing 3 k files ≈ $0.05; a runaway re-index could rack up real money.
- Network. Each search is one OpenAI round-trip (~200–800 ms). The trigram daemon answers in < 1 ms.
- Recall is similar at this scale. A well-tuned ig --semantic --top 10 (PMI) catches most of the queries dense embeddings would catch on a 3 k-file repo. Embeddings start to dominate at 50 k+ files / multi-language polyglot repos.
The fallback for users without an API key is the regular trigram path — sub-ms, no network, no cost.
Two-layer gating
| Layer | Mechanism | Controls | Default |
|---|---|---|---|
| Compile-time | cargo build --features embed-poc | Whether the subcommand is in the binary at all | absent |
| Runtime (v1.14.2+) | ig emb on / off | Whether it executes when present | disabled |
Both layers are independent. The runtime toggle is always available — even on a binary built without --features embed-poc, you can call ig emb on and the flag is persisted; the subcommand just isn't there until you rebuild.
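For illustration, a minimal sketch of what the compile-time gate can look like (assumed module and function names, not the actual main.rs):

// Illustrative sketch of the compile-time gate. With the feature off, the
// module and the match arm simply don't exist, so the subcommand is absent.
#[cfg(feature = "embed-poc")]
mod embed_poc {
    pub fn run(args: &[String]) {
        println!("embed-poc invoked with {} args", args.len());
    }
}

fn dispatch(subcommand: &str, args: &[String]) {
    match subcommand {
        // This arm only exists when built with `--features embed-poc`.
        #[cfg(feature = "embed-poc")]
        "embed-poc" => embed_poc::run(args),
        // Without the feature, "embed-poc" falls through here as if the
        // subcommand had never existed.
        _ => {
            let _ = args;
            eprintln!("unknown subcommand: {subcommand}");
        }
    }
}

fn main() {
    let argv: Vec<String> = std::env::args().skip(1).collect();
    let sub = argv.first().map(String::as_str).unwrap_or("");
    dispatch(sub, argv.get(1..).unwrap_or(&[]));
}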
Runtime toggle — ig emb
# Inspect the current state (default: disabled)
ig emb status
# Enable — accepts: on, true, 1, yes, y, enable, enabled
ig emb on
# Disable — accepts: off, false, 0, no, n, disable, disabled
ig emb off
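All of these aliases collapse to a single boolean. A sketch of that normalisation (the helper name is an assumption, not the real parser):

// Sketch: normalise the documented on/off aliases to a bool.
// Anything unrecognised is rejected rather than guessed at.
fn parse_toggle(word: &str) -> Option<bool> {
    match word.to_ascii_lowercase().as_str() {
        "on" | "true" | "1" | "yes" | "y" | "enable" | "enabled" => Some(true),
        "off" | "false" | "0" | "no" | "n" | "disable" | "disabled" => Some(false),
        _ => None,
    }
}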
State persisted at ~/.config/ig/embed.toml:
# Runtime toggle for ig emb — overridable with ig emb on/off
enabled = false
When the runtime toggle is off and someone calls ig embed-poc <op>:
$ ig embed-poc hello "test"
Error: embeddings are disabled.
Enable with: ig emb on
(Or build a binary without the embed-poc feature to remove the subcommand entirely.) The toggle fails closed: if the config file is unreadable or malformed, embeddings stay off.
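A sketch of what the fail-closed read of embed.toml can look like (illustrative only; a real implementation would use a proper TOML parser):

use std::fs;

// Sketch: read ~/.config/ig/embed.toml and fail closed.
// A missing file, unreadable file, or parse problem all mean "disabled".
fn embeddings_enabled() -> bool {
    let Some(home) = std::env::var_os("HOME") else { return false };
    let path = std::path::Path::new(&home).join(".config/ig/embed.toml");
    let Ok(text) = fs::read_to_string(path) else { return false };
    // Minimal parse: look for an `enabled = true` line.
    text.lines()
        .filter_map(|l| l.split_once('='))
        .any(|(k, v)| k.trim() == "enabled" && v.trim() == "true")
}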
Quickstart
# 1. Build with the feature on (compile-time opt-in)
cargo build --release --features embed-poc
cp target/release/ig ~/.local/bin/ig
# 2. Drop your OpenAI key in either ~/.config/ig/config.toml OR a project .env
mkdir -p ~/.config/ig && cat > ~/.config/ig/config.toml <<EOF
[providers.openai]
api_key = "sk-proj-XXXX"
default_model = "text-embedding-3-small"
EOF
chmod 600 ~/.config/ig/config.toml
# 3. Flip the runtime toggle ON (default is OFF, even when feature is built in)
ig emb on
# 4. Smoke-test — Phase 1: 1 string → 1 vector → console
ig embed-poc hello "function cancelSubscription(userId)"
# 5. Index a directory — Phase 2: chunk + embed + JSON store
ig embed-poc index ./src
# 6. Search — Phase 2: top-N cosine
ig embed-poc search "function that cancels a Stripe subscription"
# 7. (optional) — Phase 3: tiny_http JSON server + React SPA
ig embed-poc serve --port 7877 --ui ui/dist
# 8. When you're done — flip it back off
ig emb off

The pipeline
ig embed-poc index ./src
 │
 ├─▶ walk files (respects .ig/ excludes)
 ├─▶ chunk each file: 40 lines, 5-line overlap
 ├─▶ batch-embed via OpenAI /v1/embeddings (size 100, ureq sync)
 │      model = text-embedding-3-small
 │      dim   = 1536, L2-normalised → cosine = dot product
 ├─▶ persist as JSON ──▶ .ig/poc-embeddings.json
 │      (~30 MB on a 3 k-file repo, deliberately readable: cat | jq)
 └─▶ console: "Embedded 768 chunks · $0.0046 · 11.3 s"

ig embed-poc search "<query>"
 │
 ├─▶ embed the query (1 OpenAI call, ~$0.0000002)
 ├─▶ rayon par_iter cosine over the JSON store
 │      ~3 ms on 247 chunks · ~600 ms on 50 k chunks
 ├─▶ sort descending, take top-N
 └─▶ stdout: file:lines · score · 5-line preview
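The chunking step is plain fixed-size line windows. A sketch of a 40-line / 5-line-overlap chunker (struct and function names here are assumptions, not the actual chunk.rs):

// Sketch: fixed-size line windows with overlap, as described above.
// With chunk_size = 40 and overlap = 5, each new chunk starts 35 lines
// after the previous one.
struct RawChunk {
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
    text: String,
}

fn chunk_lines(source: &str, chunk_size: usize, overlap: usize) -> Vec<RawChunk> {
    let lines: Vec<&str> = source.lines().collect();
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut out = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + chunk_size).min(lines.len());
        out.push(RawChunk {
            start_line: start + 1,
            end_line: end,
            text: lines[start..end].join("\n"),
        });
        if end == lines.len() { break; }
        start += step;
    }
    out
}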
What an embedding actually is
Phase 1 (ig embed-poc hello <text>) exists for one reason: to make the abstract concept tangible. Run it once and you'll see exactly what comes back from OpenAI.
$ ig embed-poc hello "function cancelSubscription(userId) { ... }"
Provider : openai
Model : text-embedding-3-small
Input tokens : 12
Cost : $0.00000024
Vector dim : 1536
First 10 : [-0.0123, 0.0456, -0.0211, 0.0089, ..., 0.0334]
L2 norm : 1.0000 ← OpenAI vectors are L2-normalized
→ cosine(a, b) = dot(a, b)
→ no division step needed

A 1 536-dimensional unit vector. Two semantically similar strings produce vectors with a high dot product (typically 0.4–0.7 for code/code); unrelated ones cluster around 0.1–0.2. That's the whole magic.
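Under the hood, the hello command is a single POST to the embeddings endpoint. A minimal sketch, assuming ureq 2.x with its json feature and serde_json (not the actual openai.rs):

use serde_json::{json, Value};

// Sketch: embed one string via OpenAI's /v1/embeddings endpoint.
// Returns the raw 1536-dim vector; real code would also read
// usage.prompt_tokens to print the token count and cost.
fn embed_one(api_key: &str, text: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let resp: Value = ureq::post("https://api.openai.com/v1/embeddings")
        .set("Authorization", &format!("Bearer {api_key}"))
        .send_json(json!({
            "model": "text-embedding-3-small",
            "input": text,
        }))?
        .into_json()?;
    let vec = resp["data"][0]["embedding"]
        .as_array()
        .ok_or("missing embedding in response")?
        .iter()
        .filter_map(|v| v.as_f64().map(|f| f as f32))
        .collect();
    Ok(vec)
}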
The store format — deliberately readable
The Phase-2 store is plain JSON, not bincode. You can cat .ig/poc-embeddings.json | jq '.chunks[0]' and inspect a single chunk + its full vector. This is intentional: when we later switch to bincode + HNSW, you'll already know exactly what got replaced.
{
"version": "poc-1",
"model": "text-embedding-3-small",
"dim": 1536,
"provider": "openai",
"total_tokens": 156783,
"total_cost_usd": 0.00313566,
"chunks": [
{
"id": 0,
"file": "src/embed_poc/config.rs",
"start_line": 1,
"end_line": 40,
"tokens": 312,
"embedding": [0.0123, -0.0456, /* … 1534 more … */, 0.0089]
},
/* … */
]
}

Search math — brute-force cosine
No HNSW, no FAISS, no PCA. The whole search is a par_iter dot product across the chunk array:
use rayon::prelude::*;

// Dot product of two equal-length vectors; the embeddings are
// L2-normalised, so this is exactly the cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn rank<'a>(query: &[f32], store: &'a Store, top_n: usize)
    -> Vec<(f32, &'a Chunk)>
{
    let mut scored: Vec<_> = store.chunks
        .par_iter()                                 // rayon
        .map(|c| (dot(query, &c.embedding), c))     // L2-normalised → cosine = dot
        .collect();
    scored.par_sort_unstable_by(|a, b|
        b.0.partial_cmp(&a.0).unwrap()              // descending by score
    );
    scored.truncate(top_n);
    scored
}

Latency on a Mac M4 Max:
- 247 chunks × 1 536 dim: ~3 ms cosine + ~250 ms OpenAI = ~250 ms total.
- 50 k chunks × 1 536 dim: ~600 ms cosine + ~250 ms OpenAI = ~850 ms total.
HNSW becomes worth it past ~100 k chunks. Below that, brute force is simpler, more debuggable, and the OpenAI round-trip dominates anyway.
Cost guard
Pricing as of April 2026 (always re-confirm in the OpenAI console):
| Model | Dim | $/M tokens | Index 3k files |
|---|---|---|---|
| text-embedding-3-small | 1 536 | ~$0.02 | ~$0.05 |
| text-embedding-3-large | 3 072 | ~$0.13 | ~$0.40 |
Set hard limits before generating your key.
OpenAI Settings → Billing → Usage limits. For this POC: Soft $2/mo + Hard $5/mo. The API will return errors past the hard limit. $5 covers ~100 full-repo re-indexes on a 3 k-file project.
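The cost printed at index time is plain arithmetic: tokens times price per token. A worked sketch using the numbers from the store example above (price hard-coded as an assumption; always re-check the current rate):

// Sketch: the index-time cost estimate is tokens / 1M * price.
// text-embedding-3-small: ~$0.02 per million tokens at the time of writing.
fn estimate_cost_usd(total_tokens: u64, usd_per_million: f64) -> f64 {
    total_tokens as f64 / 1_000_000.0 * usd_per_million
}

fn main() {
    // The store example above reports 156_783 tokens.
    println!("{:.8}", estimate_cost_usd(156_783, 0.02)); // ≈ 0.00313566
}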
Security — how the key never reaches the repo
- .env is gitignored. Verified in .gitignore at the repo root.
- Pre-commit hook (.githooks/pre-commit) blocks any staged content matching sk-[A-Za-z0-9]{20,} or a non-placeholder OPENAI_API_KEY=.
- Project key only — the recommended OpenAI key is a Project key (sk-proj-…) with permissions restricted to Models: Read + Model capabilities: Write. It cannot generate text and cannot access your ChatGPT history.
- If in doubt, revoke. A leaked key costs $0 to revoke + regenerate from the OpenAI dashboard.
Phase 3 — the React SPA
ig embed-poc serve starts a 3-route tiny_http JSON server (sync, blocking, ~200 LoC) plus an optional static SPA. Bound to 127.0.0.1 only — no auth, no TLS, single-user local POC.
# Backend (with the feature on)
ig embed-poc serve --port 7877 --ui ui/dist
# Routes
GET /api/status → { ready, model, dim, chunks, total_tokens, total_cost_usd }
POST /api/search { query } → { hits, openai_ms, cosine_ms, query_cost_usd }
GET /api/chunks?limit=N → { total, returned, dim, chunks: [...] }
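As a quick sanity check of the search route, a minimal Rust client sketch (assuming ureq 2.x with its json feature; response fields as documented above):

use serde_json::{json, Value};

// Sketch: exercise the local POC server's /api/search route from Rust.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp: Value = ureq::post("http://127.0.0.1:7877/api/search")
        .send_json(json!({ "query": "function that cancels a Stripe subscription" }))?
        .into_json()?;
    println!("openai_ms = {}, cosine_ms = {}", resp["openai_ms"], resp["cosine_ms"]);
    for hit in resp["hits"].as_array().into_iter().flatten() {
        println!("{hit}");
    }
    Ok(())
}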
The SPA (under ui/ in the repo, generated with shadcn@latest + Vite) ships three routes:
- / — Home. Status cards (chunks, dim, tokens, cost), provider/model, store path.
- /search — Semantic search. NL input + top-N spinbutton + ranked results with scores. Latency breakdown (OpenAI ms vs cosine ms vs query cost).
- /inspect — Embedding heatmap. Browse the indexed chunks; click any chunk to render its 1 536-D vector as a 32×48 heatmap (red = positive, blue = negative, black = zero, scaled symmetrically around the max abs value). This is what makes embeddings tangible: two semantically similar functions have visibly similar heatmaps.
Phase 0 — getting an API key safely
- Create an OpenAI account at platform.openai.com (a payment card is required; the API is not included in the ChatGPT subscription).
- Set usage limits FIRST (before generating a key): Settings → Billing → Usage limits → Soft $2 + Hard $5.
- Generate a Project key (not a User key) at Dashboard → API keys. Permissions: Restricted → Models: Read + Model capabilities: Write only.
- Copy the sk-proj-… key immediately (it is never reshown).
- Persist locally with restrictive perms:

mkdir -p ~/.config/ig && chmod 700 ~/.config/ig
cat > ~/.config/ig/config.toml <<EOF
[providers.openai]
api_key = "sk-proj-XXXX"
default_model = "text-embedding-3-small"
EOF
chmod 600 ~/.config/ig/config.toml

- Smoke-test:

curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $(grep api_key ~/.config/ig/config.toml | cut -d'"' -f2)" \
  | head -5
What this POC does not do
- ❌ No HNSW or vector index optimisation — brute-force cosine only.
- ❌ No multi-provider — OpenAI only (no Voyage, no Cohere, no local model).
- ❌ No hybrid search — no lexical + vector RRF blending. Use ig --semantic --top N for hybrid.
- ❌ No auth, no TLS — 127.0.0.1 only.
- ❌ No automatic cost guard — just an estimate printed at index time.
- ❌ No async runtime — sync ureq + tiny_http blocking I/O.
This is by design. It's a teaching artefact: read the source, see the cost, decide if industrial-grade embedding search is worth building.
When to use embeddings vs --semantic vs trigram
Trigram (ig "pat")
- Sub-ms, no network, no cost
- Exact / regex matching
- Best for: known-token searches, refactor queries, structural patterns
- Default.
--semantic (PMI)
- ~5 ms, no network, no cost
- Synonyms learned from your repo at index time
- Best for: queries where you know the concept but not the project's word for it
- Combine with --top N for BM25 rerank.
Embeddings (POC)
- ~250–800 ms, OpenAI round-trip, $0.0000002 per query
- True natural-language understanding (cross-token, cross-language)
- Best for: NL queries with no shared token ("function that cancels a Stripe subscription")
- Opt-in only.
Source layout
src/embed_poc/
├── mod.rs # entry: run_hello, run_index, run_inspect, run_search
├── config.rs # parse ~/.config/ig/config.toml + .env
├── openai.rs # POST /v1/embeddings via ureq, batched
├── chunk.rs # 40-line chunker, 5-line overlap
├── store.rs # JSON (de)serialisation of .ig/poc-embeddings.json
├── search.rs # rayon par_iter cosine
└── server.rs # tiny_http: /api/status, /api/search, /api/chunks
ui/ # generated via `bunx shadcn@latest init`
├── src/routes/Home.tsx # status + nav
├── src/routes/SearchPage.tsx # NL search + ranked hits
├── src/routes/Inspect.tsx # 32×48 embedding heatmap
└── src/lib/api.ts # typed fetch helpers