OpenAI Embeddings — opt-in POC

A pedagogical playground for dense semantic search over your codebase, using OpenAI's text-embedding-3-small model. Disabled by default at two independent layers: compile-time (cargo feature flag) and runtime (ig emb on/off, since v1.14.2). The published binary contains zero OpenAI client code unless you build with --features embed-poc; even then, no network call fires until you flip ig emb on.

Why opt-in?

  • Cost. Indexing 3 k files ≈ $0.05; a runaway re-index could rack up real money.
  • Network. Each search is one OpenAI round-trip (~200–800 ms). The trigram daemon answers in < 1 ms.
  • Recall is similar at this scale. A well-tuned ig --semantic --top 10 (PMI) catches most of the queries that dense embeddings catch on a 3 k-file repo. Embeddings only start to pull clearly ahead on 50 k+-file, multi-language polyglot repos.

The fallback for users without an API key is the regular trigram path — sub-ms, no network, no cost.

Two-layer gating

Layer                 Mechanism                           Controls                                          Default
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Compile-time          cargo build --features embed-poc    Whether the subcommand is in the binary at all    absent
Runtime (v1.14.2+)    ig emb on / off                     Whether it executes when present                  disabled

Both layers are independent. The runtime toggle is always available — even on a binary built without --features embed-poc, ig emb on works and the flag is persisted; the embed-poc subcommand itself just isn't there until you rebuild.
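
A minimal sketch of how the two layers might compose (illustrative only; the module items and function names below are assumptions, not the actual source):

// Layer 1 (compile-time): the module, and with it the subcommand, only
// exists when the crate is built with --features embed-poc.
#[cfg(feature = "embed-poc")]
mod embed_poc;

// Layer 2 (runtime): even when compiled in, every operation first consults
// the persisted toggle and fails closed.
#[cfg(feature = "embed-poc")]
fn dispatch_embed_poc(op: embed_poc::Op) -> Result<(), String> {
    if !embed_poc::runtime_enabled() {        // reads ~/.config/ig/embed.toml
        return Err("embeddings are disabled. Enable with:  ig emb on".into());
    }
    embed_poc::run(op)
}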

Runtime toggle — ig emb

# Inspect the current state (default: disabled)
ig emb status

# Enable — accepts: on, true, 1, yes, y, enable, enabled
ig emb on

# Disable — accepts: off, false, 0, no, n, disable, disabled
ig emb off

State persisted at ~/.config/ig/embed.toml:

# Runtime toggle for ig emb — overridable with ig emb on/off
enabled = false

When the runtime toggle is off and someone calls ig embed-poc <op>:

$ ig embed-poc hello "test"
Error: embeddings are disabled.
Enable with:  ig emb on
(or build a binary without the embed-poc feature to remove the subcommand entirely.)

Fail-closed: if the config file is unreadable or malformed, embeddings stay off.
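
Concretely, the fail-closed read can be as small as this (a sketch; the struct and function names are assumptions, and it presumes the serde and toml crates):

use serde::Deserialize;

#[derive(Deserialize)]
struct EmbedToggle {
    #[serde(default)]              // a missing key counts as false
    enabled: bool,
}

/// Returns true only if ~/.config/ig/embed.toml exists, parses, and says so.
/// Every failure mode keeps embeddings off.
fn runtime_enabled() -> bool {
    let Ok(home) = std::env::var("HOME") else { return false };
    let path = std::path::Path::new(&home).join(".config/ig/embed.toml");
    match std::fs::read_to_string(path) {
        Ok(text) => toml::from_str::<EmbedToggle>(&text)
            .map(|t| t.enabled)
            .unwrap_or(false),     // malformed file → off
        Err(_) => false,           // missing or unreadable → off
    }
}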

Quickstart

# 1. Build with the feature on (compile-time opt-in)
cargo build --release --features embed-poc
cp target/release/ig ~/.local/bin/ig

# 2. Drop your OpenAI key in either ~/.config/ig/config.toml OR a project .env
mkdir -p ~/.config/ig && cat > ~/.config/ig/config.toml <<EOF
[providers.openai]
api_key = "sk-proj-XXXX"
default_model = "text-embedding-3-small"
EOF
chmod 600 ~/.config/ig/config.toml

# 3. Flip the runtime toggle ON (default is OFF, even when feature is built in)
ig emb on

# 4. Smoke-test — Phase 1: 1 string → 1 vector → console
ig embed-poc hello "function cancelSubscription(userId)"

# 5. Index a directory — Phase 2: chunk + embed + JSON store
ig embed-poc index ./src

# 6. Search — Phase 2: top-N cosine
ig embed-poc search "function that cancels a Stripe subscription"

# 7. (optional) — Phase 3: tiny_http JSON server + React SPA
ig embed-poc serve --port 7877 --ui ui/dist

# 8. When you're done — flip it back off
ig emb off

The pipeline

ig embed-poc index ./src
   │
   ├─▶ walk files (respects .ig/ excludes)
   ├─▶ chunk each file: 40 lines, 5-line overlap
   ├─▶ batch-embed via OpenAI /v1/embeddings (size 100, ureq sync)
   │     model = text-embedding-3-small
   │     dim   = 1536, L2-normalised → cosine = dot product
   ├─▶ persist as JSON  ──▶  .ig/poc-embeddings.json
   │     (~30 MB on a 3 k-file repo, deliberately readable: cat | jq)
   └─▶ console:  "Embedded 768 chunks · $0.0046 · 11.3 s"

ig embed-poc search "<query>"
   │
   ├─▶ embed the query (1 OpenAI call, ~$0.0000002)
   ├─▶ rayon par_iter cosine over the JSON store
   │     ~3 ms on 247 chunks · ~600 ms on 50 k chunks
   ├─▶ sort descending, take top-N
   └─▶ stdout:  file:lines · score · 5-line preview
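
Two of those steps are worth seeing in code. First the chunker: a sliding window of 40 lines that steps forward 35 lines at a time, so consecutive chunks share 5 lines of context. A minimal sketch (the real chunk.rs may differ in details):

/// Split a file's lines into 40-line chunks with a 5-line overlap,
/// returning 1-based (start_line, end_line) plus the chunk text.
fn chunk_lines(lines: &[&str]) -> Vec<(usize, usize, String)> {
    const SIZE: usize = 40;
    const OVERLAP: usize = 5;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + SIZE).min(lines.len());
        chunks.push((start + 1, end, lines[start..end].join("\n")));
        if end == lines.len() { break; }
        start = end - OVERLAP;     // step forward 35 lines
    }
    chunks
}

Then the batch call to /v1/embeddings, roughly what openai.rs has to do per batch of 100 chunk texts. The sketch assumes ureq 2.x with its json feature plus serde/serde_json; the helper name is illustrative:

use serde::Deserialize;

#[derive(Deserialize)]
struct EmbeddingItem { embedding: Vec<f32> }

#[derive(Deserialize)]
struct Usage { total_tokens: u64 }

#[derive(Deserialize)]
struct EmbeddingsResponse { data: Vec<EmbeddingItem>, usage: Usage }

/// Embed up to 100 texts in one synchronous request; returns the vectors
/// in input order plus the token count OpenAI billed for the batch.
fn embed_batch(api_key: &str, inputs: &[String])
    -> Result<(Vec<Vec<f32>>, u64), Box<dyn std::error::Error>>
{
    let resp: EmbeddingsResponse = ureq::post("https://api.openai.com/v1/embeddings")
        .set("Authorization", &format!("Bearer {api_key}"))
        .send_json(serde_json::json!({
            "model": "text-embedding-3-small",
            "input": inputs,       // the endpoint accepts an array of strings
        }))?
        .into_json()?;

    let vectors = resp.data.into_iter().map(|d| d.embedding).collect();
    Ok((vectors, resp.usage.total_tokens))
}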

What an embedding actually is

Phase 1 (ig embed-poc hello <text>) exists for one reason: to make the abstract concept tangible. Run it once and you'll see exactly what comes back from OpenAI.

$ ig embed-poc hello "function cancelSubscription(userId) { ... }"

Provider     : openai
Model        : text-embedding-3-small
Input tokens : 12
Cost         : $0.00000024
Vector dim   : 1536
First 10     : [-0.0123, 0.0456, -0.0211, 0.0089, ..., 0.0334]
L2 norm      : 1.0000   ← OpenAI vectors are L2-normalized,
                          so cosine(a, b) = dot(a, b): no division step needed

A 1 536-dimensional unit vector. Two semantically similar strings produce vectors with a high dot product (typically 0.4–0.7 for code/code); unrelated strings cluster around 0.1–0.2. That's the whole magic.

The store format — deliberately readable

The Phase-2 store is plain JSON, not bincode. You can cat .ig/poc-embeddings.json | jq '.chunks[0]' and inspect a single chunk plus its full vector. This is intentional: if we later switch to bincode + HNSW, you'll already know exactly what got replaced.

{
  "version": "poc-1",
  "model": "text-embedding-3-small",
  "dim": 1536,
  "provider": "openai",
  "total_tokens": 156783,
  "total_cost_usd": 0.00313566,
  "chunks": [
    {
      "id": 0,
      "file": "src/embed_poc/config.rs",
      "start_line": 1,
      "end_line": 40,
      "tokens": 312,
      "embedding": [0.0123, -0.0456, /* … 1534 more … */, 0.0089]
    },
    /* … */
  ]
}
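
The matching Rust types in store.rs are presumably little more than two serde structs (field names read straight off the JSON above; the derives and integer widths here are assumptions):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Store {
    pub version: String,           // "poc-1"
    pub model: String,             // "text-embedding-3-small"
    pub dim: usize,                // 1536
    pub provider: String,          // "openai"
    pub total_tokens: u64,
    pub total_cost_usd: f64,
    pub chunks: Vec<Chunk>,
}

#[derive(Serialize, Deserialize)]
pub struct Chunk {
    pub id: u32,
    pub file: String,
    pub start_line: u32,
    pub end_line: u32,
    pub tokens: u32,
    pub embedding: Vec<f32>,       // 1 536 values, L2-normalised
}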

Search math — brute-force cosine

No HNSW, no FAISS, no PCA. The whole search is a par_iter dot product across the chunk array:

pub fn rank<'a>(query: &[f32], store: &'a Store, top_n: usize)
    -> Vec<(f32, &'a Chunk)>
{
    let mut scored: Vec<_> = store.chunks
        .par_iter()                              // rayon
        .map(|c| (dot(query, &c.embedding), c)) // L2-normalised → cosine = dot
        .collect();
    scored.par_sort_unstable_by(|a, b|
        b.0.partial_cmp(&a.0).unwrap()
    );
    scored.truncate(top_n);
    scored
}
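
rank leans on a dot helper that isn't shown; with unit-length vectors it is the cosine similarity. A plausible definition (not necessarily the exact source):

/// Dot product of two equal-length f32 slices. Because both inputs are
/// L2-normalised, this is already the cosine similarity: no norms, no division.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}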

Latency on a Mac M4 Max:

  • 247 chunks × 1 536 dim: ~3 ms cosine + ~250 ms OpenAI = ~250 ms total.
  • 50 k chunks × 1 536 dim: ~600 ms cosine + ~250 ms OpenAI = ~850 ms total.

HNSW becomes worth it past ~100 k chunks. Below that, brute force is simpler, more debuggable, and the OpenAI round-trip dominates anyway.

Cost guard

Pricing as of April 2026 (always re-confirm in the OpenAI console):

Model                     Dim      $/M tokens    Index 3 k files
──────────────────────────────────────────────────────────────────
text-embedding-3-small    1 536    ~$0.02        ~$0.05
text-embedding-3-large    3 072    ~$0.13        ~$0.40

Set hard limits before generating your key.

OpenAI Settings → Billing → Usage limits. For this POC: Soft $2/mo + Hard $5/mo. The API will return errors past the hard limit. $5 covers ~100 full-repo re-indexes on a 3 k-file project.
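
The cost figure the indexer prints is plain arithmetic on the usage counts the API returns, not a hard guard. A sketch of the estimate (the constant and function name are assumptions):

/// List price for text-embedding-3-small in USD per million tokens, per the
/// table above. Re-check the OpenAI console before trusting this constant.
const SMALL_USD_PER_MTOK: f64 = 0.02;

/// Index-time cost estimate: the usage.total_tokens the API reports, priced
/// linearly. 156_783 tokens gives ~$0.0031, matching the sample store above.
fn estimate_cost_usd(total_tokens: u64) -> f64 {
    total_tokens as f64 / 1_000_000.0 * SMALL_USD_PER_MTOK
}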

Security — how the key never reaches the repo

  • .env is gitignored. Verified in .gitignore at the repo root.
  • Pre-commit hook (.githooks/pre-commit) blocks any staged content matching sk-[A-Za-z0-9]{20,} or a non-placeholder OPENAI_API_KEY=.
  • Project key only — the recommended OpenAI key is a Project key (sk-proj-…) with permissions restricted to Models: Read + Model capabilities: Write. It cannot generate text, cannot access your ChatGPT history.
  • If in doubt, revoke. A leaked key costs $0 to revoke + regenerate from the OpenAI dashboard.

Phase 3 — the React SPA

ig embed-poc serve starts a 3-route tiny_http JSON server (sync, blocking, ~200 LoC) plus an optional static SPA. Bound to 127.0.0.1 only — no auth, no TLS, single-user local POC.

# Backend (with the feature on)
ig embed-poc serve --port 7877 --ui ui/dist

# Routes
GET  /api/status               →  { ready, model, dim, chunks, total_tokens, total_cost_usd }
POST /api/search   { query }   →  { hits, openai_ms, cosine_ms, query_cost_usd }
GET  /api/chunks?limit=N       →  { total, returned, dim, chunks: [...] }
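
For a feel of how small the server is, here is what the /api/status handler could look like with tiny_http's blocking loop, reusing the Store struct sketched earlier (a sketch; everything beyond the route table above is an assumption about server.rs):

use tiny_http::{Header, Method, Response, Server};

/// Minimal blocking loop serving only the status route; other routes 404.
fn serve(store: &Store, port: u16) {
    let server = Server::http(("127.0.0.1", port)).expect("bind loopback");
    for request in server.incoming_requests() {
        let (status, body): (u16, String) = match (request.method(), request.url()) {
            (Method::Get, "/api/status") => (200, serde_json::json!({
                "ready": true,
                "model": store.model,
                "dim": store.dim,
                "chunks": store.chunks.len(),
                "total_tokens": store.total_tokens,
                "total_cost_usd": store.total_cost_usd,
            }).to_string()),
            _ => (404, r#"{"error":"not found"}"#.to_string()),
        };
        let json = Header::from_bytes(&b"Content-Type"[..], &b"application/json"[..]).unwrap();
        let _ = request.respond(Response::from_string(body)
            .with_status_code(status)
            .with_header(json));
    }
}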

The SPA (under ui/ in the repo, generated with shadcn@latest + Vite) ships three routes:

  • / — Home. Status cards (chunks, dim, tokens, cost), provider/model, store path.
  • /search — Semantic search. NL input + top-N spinbutton + ranked results with scores. Latency breakdown (OpenAI ms vs cosine ms vs query cost).
  • /inspect — Embedding heatmap. Browse the indexed chunks; click any chunk to render its 1 536-D vector as a 32×48 heatmap (red = positive, blue = negative, black = zero, scaled symmetrically around the max absolute value; a sketch of the mapping follows this list). This is what makes embeddings tangible: two semantically similar functions have visibly similar heatmaps.
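
The colour mapping is simple enough to state in a few lines. The SPA does it in TypeScript inside Inspect.tsx; the same logic, sketched in Rust for concreteness (the function name is hypothetical, not in the source):

/// Map one embedding component to an RGB pixel, scaling symmetrically
/// around the vector's largest absolute value. Hypothetical helper; the
/// real mapping lives in the SPA's TypeScript.
fn component_to_rgb(v: f32, max_abs: f32) -> (u8, u8, u8) {
    let t = if max_abs > 0.0 { (v / max_abs).clamp(-1.0, 1.0) } else { 0.0 };
    let intensity = (t.abs() * 255.0).round() as u8;
    if t > 0.0 {
        (intensity, 0, 0)   // positive → red
    } else if t < 0.0 {
        (0, 0, intensity)   // negative → blue
    } else {
        (0, 0, 0)           // exactly zero → black
    }
}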

Phase 0 — getting an API key safely

  1. Create an OpenAI account at platform.openai.com (a payment card is required; the API is not covered by a ChatGPT subscription).
  2. Set usage limits FIRST (before generating a key): Settings → Billing → Usage limits → Soft $2 + Hard $5.
  3. Generate a Project key (not a User key) at Dashboard → API keys. Permissions: Restricted → Models: Read + Model capabilities: Write only.
  4. Copy sk-proj-… immediately (never reshown).
  5. Persist locally with restrictive perms:
    mkdir -p ~/.config/ig && chmod 700 ~/.config/ig
    cat > ~/.config/ig/config.toml <<EOF
    [providers.openai]
    api_key = "sk-proj-XXXX"
    default_model = "text-embedding-3-small"
    EOF
    chmod 600 ~/.config/ig/config.toml
  6. Smoke-test:
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $(grep api_key ~/.config/ig/config.toml | cut -d'"' -f2)" \
      | head -5
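
For reference, the key lookup in config.rs presumably boils down to something like this (a sketch; which source wins when both exist isn't specified above, and the helper name is an assumption):

/// Resolve the OpenAI API key: ~/.config/ig/config.toml first, then a
/// project-local .env with OPENAI_API_KEY=...
fn resolve_api_key() -> Option<String> {
    // 1. ~/.config/ig/config.toml → [providers.openai] api_key = "..."
    if let Ok(home) = std::env::var("HOME") {
        let path = std::path::Path::new(&home).join(".config/ig/config.toml");
        if let Ok(text) = std::fs::read_to_string(path) {
            if let Ok(value) = text.parse::<toml::Value>() {
                if let Some(key) = value
                    .get("providers")
                    .and_then(|p| p.get("openai"))
                    .and_then(|o| o.get("api_key"))
                    .and_then(|k| k.as_str())
                {
                    return Some(key.to_string());
                }
            }
        }
    }
    // 2. project .env → OPENAI_API_KEY=sk-proj-…
    if let Ok(env) = std::fs::read_to_string(".env") {
        for line in env.lines() {
            if let Some(rest) = line.strip_prefix("OPENAI_API_KEY=") {
                return Some(rest.trim().trim_matches('"').to_string());
            }
        }
    }
    None
}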

What this POC does not do

  • ❌ No HNSW or vector index optimisation — brute-force cosine only.
  • ❌ No multi-provider — OpenAI only (no Voyage, no Cohere, no local model).
  • ❌ No hybrid search — no lexical + vector RRF blending. Use ig --semantic --top N for hybrid.
  • ❌ No auth, no TLS — 127.0.0.1 only.
  • ❌ No automatic cost guard — just an estimate printed at index time.
  • ❌ No async runtime — sync ureq + tiny_http blocking I/O.

This is by design. It's a teaching artefact: read the source, see the cost, decide if industrial-grade embedding search is worth building.

When to use embeddings vs --semantic vs trigram

Trigram (ig "pat")

  • Sub-ms, no network, no cost
  • Exact / regex matching
  • Best for: known-token searches, refactor queries, structural patterns
  • Default.

--semantic (PMI)

  • ~5 ms, no network, no cost
  • Synonyms learned from your repo at index time
  • Best for: queries where you know the concept but not the project's word for it
  • Combine with --top N for BM25 rerank.

Embeddings (POC)

  • ~250–800 ms, OpenAI round-trip, $0.0000002 per query
  • True natural-language understanding (cross-token, cross-language)
  • Best for: NL queries with no shared token ("function that cancels a Stripe subscription")
  • Opt-in only.

Source layout

src/embed_poc/
├── mod.rs       # entry: run_hello, run_index, run_inspect, run_search
├── config.rs    # parse ~/.config/ig/config.toml + .env
├── openai.rs    # POST /v1/embeddings via ureq, batched
├── chunk.rs     # 40-line chunker, 5-line overlap
├── store.rs     # JSON (de)serialisation of .ig/poc-embeddings.json
├── search.rs    # rayon par_iter cosine
└── server.rs    # tiny_http: /api/status, /api/search, /api/chunks

ui/                                # generated via `bunx shadcn@latest init`
├── src/routes/Home.tsx             # status + nav
├── src/routes/SearchPage.tsx       # NL search + ranked hits
├── src/routes/Inspect.tsx          # 32×48 embedding heatmap
└── src/lib/api.ts                  # typed fetch helpers

Next steps