fix(vibe): warm-up runs a real text forward (absorb torch lazy-init)

Loading weights wasn't enough: the first forward pass still cost ~30s because of torch's one-time graph/thread init. The warm-up now runs a throwaway _embed_texts() call so the first user query takes ~1.5s instead of ~30s.

cec9dec3 · PLN (Algolia) · e3b1fecc · cec9dec3
Commit cec9dec3 authored Jun 07, 2026 by PLN (Algolia)

Showing 1 changed file with 3 additions and 2 deletions

serve.py armada/serve.py +3 -2

--- a/armada/serve.py
+++ b/armada/serve.py
@@ -179,8 +179,9 @@ if __name__ == "__main__":
        def _warm():
            try:
                print("  vibe  : warming CLAP…", flush=True)
-                _vibe_load()
-                print("  vibe  : ready (/vibe, /similar)", flush=True)
+                V = _vibe_load()
+                V["S"]._embed_texts(["warm up"])   # exercise the text tower (torch
+                print("  vibe  : ready (/vibe, /similar)", flush=True)  # lazy-inits once)
            except Exception as e:
                print(f"  vibe  : disabled — {e}", flush=True)
        threading.Thread(target=_warm, daemon=True).start()
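The effect of this change can be illustrated with a minimal, self-contained sketch. It does not use torch or CLAP: `LazyEncoder`, `warm`, and `timed_embed` are hypothetical names invented for illustration, and the `time.sleep` call merely simulates a one-time lazy-init cost on the first forward. Under those assumptions, it shows why a throwaway call in the background warm-up thread moves that cost away from the first user query.

```python
import threading
import time


class LazyEncoder:
    """Hypothetical stand-in for a model whose first forward pays a one-time init cost."""

    def __init__(self, init_cost_s=0.2):
        self._init_cost_s = init_cost_s
        self._initialized = False
        self._lock = threading.Lock()

    def embed_texts(self, texts):
        with self._lock:
            if not self._initialized:
                # Simulated lazy init: only the very first call pays this.
                time.sleep(self._init_cost_s)
                self._initialized = True
        # Toy "embedding": one float per text (its length).
        return [[float(len(t))] for t in texts]


def warm(encoder, ready):
    """Run one throwaway forward, then signal readiness."""
    try:
        encoder.embed_texts(["warm up"])  # result is discarded on purpose
        ready.set()
    except Exception as e:
        print(f"warm-up failed: {e}", flush=True)


def timed_embed(encoder, texts):
    """Return (embeddings, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    out = encoder.embed_texts(texts)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    encoder = LazyEncoder()
    ready = threading.Event()
    threading.Thread(target=warm, args=(encoder, ready), daemon=True).start()
    ready.wait(timeout=5)
    _, elapsed = timed_embed(encoder, ["hello"])
    print(f"first user query after warm-up: {elapsed:.4f}s")
```

As in the diff, the warm-up runs on a daemon thread so server startup is not blocked; a query that arrives before warm-up finishes would simply wait on the same one-time init rather than fail.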