feat(tfidf): index synths as sound sources; identify Take35 = Septième Armée

PLN: "samples and synths are the same class for fingerprinting." sample_tfidf now indexes synthdef names (SC synthdefs/ + SCLOrkSynths quark + SuperDirt builtins) alongside Dirt-Samples, tags each sound as sample|synth, and persists the kinds. Vocab: 871 known sound names, of which 293 samples and 63 synths are actually used in the corpus.

The septieme_armee signature now surfaces moogBass/FMRhodes1/bassWarsaw, but all three are common (no rare tell): the track's identity is its orbit arrangement (the SNA riff), not a signature sound. L3 needs both signals.

Take35 identified by blind ear-test as Septième Armée (Seven Nation Army cover, septieme_armee.tidal, 4:35), NOT the 38C3 "Pitbul Punk" that the ±3d date-join guessed. Corroborated by the orbit→sound map (d4 bassWarsaw = the riff bass, etc.). take_gig_map corrected; performance_notes logs the find plus a cover-license caveat.
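Why a widely shared synth like moogBass cannot act as a rare tell falls out of the idf term. Below is a minimal sketch over a toy corpus (hypothetical file names and counts). It assumes the textbook idf = ln(N/df), since the diff does not show how `build()` computes `idf`. The rare/mid/common cut-offs are the ones `report()` prints (df ≤ 3 rare, df ≥ 20 common); `df_idf`, `band` and `signature` are illustrative names, not the tool's API.

```python
import math
from collections import Counter


def df_idf(docs):
    """Document frequency per sound and idf = ln(N/df) (assumed textbook form)."""
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(set(counts))  # a sound counts once per document
    idf = {s: math.log(n / d) for s, d in df.items()}
    return df, idf


def band(d):
    """Same cut-offs report() prints: rare tell / mid / common."""
    return "rare" if d <= 3 else "common" if d >= 20 else "mid"


def signature(counts, idf, top=6):
    """Top-k sounds of one track, ranked by tf * idf."""
    scores = {s: tf * idf[s] for s, tf in counts.items()}
    return sorted(scores, key=lambda s: -scores[s])[:top]


# Toy corpus: 25 tracks share moogBass and bd; one extra track adds a rare sample.
docs = {f"t{i}.tidal": Counter({"moogBass": 2, "bd": 4}) for i in range(25)}
docs["odd.tidal"] = Counter({"moogBass": 3, "bd": 1, "glitch2": 1})
df, idf = df_idf(docs)
print(band(df["moogBass"]), band(df["glitch2"]))
print(signature(docs["odd.tidal"], idf))
```

A sound present in every document gets idf 0, so only the rare sample leads the signature. That is why septieme_armee, whose synths are all common, also needs the orbit-arrangement signal.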

ce494fda · PLN (Algolia) · 78f564ff
Commit ce494fda authored Jun 05, 2026 by PLN (Algolia)
5 changed files
--- a/armada/tide-table/models.py
+++ b/armada/tide-table/models.py
@@ -131,10 +131,11 @@ class SampleHit(BaseModel):
    sample: str
    score: float          # tf-idf
    df: int               # document frequency across corpus
+    kind: str = "?"       # 'sample' | 'synth' — same class for fingerprinting


 class TrackSignature(BaseModel):
-    n_samples: int
+    n_samples: int        # distinct sound sources (samples ∪ synths)
    tf: dict[str, int]
    top_tfidf: list[SampleHit] = Field(default_factory=list)

@@ -143,6 +144,9 @@ class TfidfReport(BaseModel):
    corpus: str
    n_docs: int
    vocab_size: int
+    n_samples: int = 0    # distinct samples used across corpus
+    n_synths: int = 0     # distinct synths used across corpus
    df: dict[str, int]
    idf: dict[str, float]
+    kinds: dict[str, str] = Field(default_factory=dict)  # sound -> sample|synth
    tracks: dict[str, TrackSignature]
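Every field this commit adds carries a default (`kind="?"`, `n_samples=0`, `n_synths=0`, `kinds={}`). A sample_tfidf.json written before the change should therefore still load. Below is a stdlib sketch of that backward-compatibility idea. `HitSketch` and `load_hits` are hypothetical stand-ins for the real Pydantic `SampleHit`, and the rows are toy values.

```python
from dataclasses import dataclass


@dataclass
class HitSketch:
    """Hypothetical stand-in for models.SampleHit; not the real Pydantic class."""
    sample: str
    score: float      # tf-idf
    df: int           # document frequency across corpus
    kind: str = "?"   # absent in pre-commit JSON, so it falls back to '?'


def load_hits(rows):
    """Build hits from JSON rows, whether or not they carry a 'kind' key."""
    return [HitSketch(**row) for row in rows]


# Toy rows: one shaped like pre-commit output, one like post-commit output.
old_row = {"sample": "bd", "score": 0.12, "df": 300}
new_row = {"sample": "supersaw", "score": 1.7, "df": 4, "kind": "synth"}
hits = load_hits([old_row, new_row])
print([h.kind for h in hits])
```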
--- a/armada/tide-table/performance_notes.md
+++ b/armada/tide-table/performance_notes.md
@@ -146,3 +146,28 @@ Boundaries (machine-readable, merged + conflict-resolved):
 - ⚠️ The old "Hamburg/Take87" PunkAChien was a **misidentification** (actually La Fin
  de l'Insouciance → Liquid Finale @ 39C3). Do NOT A/B against it; real second take =
  Take35 (38C3), plus Take36 (the 61:46 Toilet set) for an in-set version.
+
+---
+
+## Take35 — "Septième Armée" (Seven Nation Army cover), 2024-12-25
+
+**Blind-test identification (2026-06-05).** While hunting the 38C3 PunkAChien, PLN listened
+blind to the take the date-join had labeled "Pitbul Punk/38C3 (±3d)" and identified it as
+**Septième Armée** (`live/collab/raph/septieme_armee.tidal`), a cover built on the Seven Nation
+Army riff at 90 BPM. Objective corroboration: the take's active orbits (1,2,3,4,5,8,9;
+6/7/10/11/12 silent) match the `.tidal` orbit→sound map exactly, including **d4 `bassWarsaw`**,
+the SNA bass riff (Take35 orbit-04 is 89% sub, i.e. the bass), d5 `FMRhodes1`, and d9 `moogBass`.
+
+Lessons:
+- **Date-joins lie; the ear is the oracle** (3rd metadata miss caught by ear; see
+  [[feedback_locate_matrix_method]]). Take35 ≠ PunkAChien; take_gig_map corrected.
+- **Fingerprint must include SYNTHS, not just samples.** `bassWarsaw`/`moogBass`/
+  `FMRhodes1` define this track's sound, and the Dirt-Samples-only TF-IDF was blind to them.
+  Fixed: sample_tfidf now indexes all sound-context tokens (samples ∪ synths). All three turn
+  out to be common (no rare tell), so the ID still rests on the orbit arrangement.
+- **PLN reaction:** "great one — good single, or at least a SoundCloud ébauche to push."
+  (ébauche: rough draft.) ⚠️ It's a **cover** (White Stripes / Jack White), so it needs a
+  mechanical/cover license for a paid/DSP release; a SoundCloud draft is lower-risk. Treat it
+  like the rest of the covers bucket.
+
+**Open:** the real 38C3 PunkAChien ("Pitbul Punk") is still not found; Take35 is eliminated.
+Candidates left: Take36 / the 61:41 "House of Tea" set, Take37/38 (Chaos Music Club),
+or it was never recorded as a standalone take. Hunt deferred until the bleed-detector is built.
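The orbit→sound corroboration in the Take35 notes comes down to comparing two orbit sets: the orbits the code sends sound to and the orbits heard in the recording. Below is a minimal sketch. It assumes, as the notes do, that `dN` plays on orbit N, and that each pattern starts with a `dN $` line and sets its sound via `# s "..."` or `# sound "..."`. The real septieme_armee.tidal layout is not shown, so the toy excerpt is illustrative only, and `orbit_sound_map`/`orbits_match` are hypothetical names.

```python
import re
from collections import defaultdict

COMMENT = re.compile(r"--[^\n]*")                    # Haskell line comments
BLOCK = re.compile(r"^\s*d(\d+)\s*\$", re.M)         # start of a `dN $ ...` pattern
SOUND = re.compile(r'#\s*(?:s|sound)\s*"([^"]*)"')   # `# s "..."` / `# sound "..."`
NAME = re.compile(r"[A-Za-z]\w*")                    # sound names start with a letter


def orbit_sound_map(tidal_text):
    """Map orbit N -> sound names set inside the `dN $` block (assumes dN = orbit N)."""
    code = COMMENT.sub("", tidal_text)               # commented-out patterns stay silent
    starts = list(BLOCK.finditer(code))
    out = defaultdict(set)
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(code)
        for q in SOUND.findall(code[m.end():end]):
            out[int(m.group(1))].update(NAME.findall(q))
    return dict(out)


def orbits_match(tidal_text, active_orbits):
    """True iff the code's sounding orbits equal the orbits heard in the take."""
    return set(orbit_sound_map(tidal_text)) == set(active_orbits)


toy = '''
d4 $ n "0 0 3 0" # s "bassWarsaw"
d5 $ n "<0 3>"
  # s "FMRhodes1"
-- d6 $ s "bd*4"
'''
print(orbit_sound_map(toy))
print(orbits_match(toy, [4, 5]))
```

Silent orbits matter as much as active ones: a commented-out `d6` that is also silent in the take strengthens the match.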
--- a/armada/tide-table/sample_tfidf.json
+++ b/armada/tide-table/sample_tfidf.json
--- a/armada/tide-table/take_gig_map.md
+++ b/armada/tide-table/take_gig_map.md
@@ -25,7 +25,7 @@ _mtime≈gig date · duration: SET≥25m / track / sketch / empty(skip) · gig m
 | 2024-11-24 | Take32 | 3:51 | 13 | track |  |
 | 2024-12-01 | Take33 | 2:02 | 13 | sketch |  |
 | 2024-12-20 | Take34 | 2:27 | 13 | sketch | TOPLAP Solstice 2024 (±1d) |
-| 2024-12-25 | Take35 | 4:35 | 13 | track | [38C3] Secret Toilet Rave (±3d) |
+| 2024-12-25 | Take35 | 4:35 | 13 | track | **Septième Armée** (septieme_armee.tidal) — EAR-VERIFIED ✓ blind test; NOT Pitbul Punk/38C3 (±3d date-join was wrong) |
 | 2024-12-28 | Take36 | 61:46 | 12 | SET | [38C3] Secret Toilet Rave |
 | 2024-12-29 | Take37 | 11:40 | 12 | track | [38C3] Chaos Music Club |
 | 2024-12-29 | Take38 | 14:27 | 13 | track | [38C3] Chaos Music Club |
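The `(±Nd)` labels read as a nearest-gig join on the take's mtime date, where N is the day gap (no suffix when the dates coincide). Below is a minimal sketch of such a join. `date_join` is a hypothetical name, the real join code is not part of this commit, and the two gig dates are illustrative values inferred from the rows above. Calendar proximity is the only evidence it uses, so any take inside the window inherits the gig's label whether or not it was played there.

```python
from datetime import date

# Illustrative gig dates, inferred from the take_gig_map rows (assumption).
GIGS = [
    (date(2024, 12, 21), "TOPLAP Solstice 2024"),
    (date(2024, 12, 28), "[38C3] Secret Toilet Rave"),
]


def date_join(take_day, gigs, tol_days=3):
    """Label a take with the nearest gig within ±tol_days, or None.
    Nothing here checks what was actually played."""
    best = None
    for gig_day, name in gigs:
        gap = abs((gig_day - take_day).days)
        if gap <= tol_days and (best is None or gap < best[0]):
            best = (gap, name)
    if best is None:
        return None
    gap, name = best
    return name if gap == 0 else f"{name} (±{gap}d)"


print(date_join(date(2024, 12, 25), GIGS))
```

With these dates, Take35's 2024-12-25 mtime lands 3 days from the Dec 28 gig and reproduces the old "[38C3] Secret Toilet Rave (±3d)" label, the kind of guess the blind test overturned.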

--- a/tools/sample_tfidf.py
+++ b/tools/sample_tfidf.py
@@ -28,7 +28,14 @@ sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "armada" / "tide
 from models import TfidfReport  # noqa: E402

 CORPUS = Path("/home/pln/Work/Sound/Tidal")
-DIRT = Path("/home/pln/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples")
+SC = Path("/home/pln/.local/share/SuperCollider")
+DIRT = SC / "downloaded-quarks" / "Dirt-Samples"
+# synthdef sources — a synth is a sound source too (PLN: same class for fingerprinting)
+SYNTH_DIRS = [SC / "synthdefs", SC / "downloaded-quarks" / "SCLOrkSynths" / "SynthDefs"]
+SUPERDIRT_BUILTIN = {"superpiano", "supermandolin", "supergong", "superpwm",
+    "superhammond", "supersaw", "supersquare", "super808", "superchip",
+    "superhoover", "superzow", "supernoise", "superreese", "superfork",
+    "supercomparator", "supervibe", "soskick", "sossnare", "default"}
 OUT = CORPUS / "armada" / "tide-table" / "sample_tfidf.json"

 # split a quoted mininotation string into candidate tokens. KEEP '_' and digits
@@ -42,13 +49,24 @@ SPLIT = re.compile(r"[\s\[\](){}<>*/.~?!@|,;:%+\-]+")
 SOUND_CTX = re.compile(r'(?:\bsound\b|\bs\b|#)\s*"([^"]*)"')


-def vocab():
-    """Authoritative sample names = entries under local Dirt-Samples."""
-    return {p.name for p in DIRT.iterdir() if not p.name.startswith(".")}
+def sound_vocab():
+    """Authoritative sound-source names = Dirt-Samples folders ∪ synthdef names.
+    Returns (vocab_set, kind_map) where kind ∈ {'sample','synth'}."""
+    kind = {}
+    for p in DIRT.iterdir():
+        if not p.name.startswith("."):
+            kind[p.name] = "sample"
+    for d in SYNTH_DIRS:
+        if d.exists():
+            for p in d.rglob("*.scd"):
+                kind.setdefault(p.stem, "synth")   # don't override a sample name
+    for s in SUPERDIRT_BUILTIN:
+        kind.setdefault(s, "synth")
+    return set(kind), kind


 def samples_in(text, vocab):
-    """Multiset of sample tokens present in one .tidal, sound-context only."""
+    """Multiset of sound tokens (samples ∪ synths) in one .tidal, sound-context only."""
    counts = Counter()
    for q in SOUND_CTX.findall(text):
        for tok in SPLIT.split(q):
@@ -58,7 +76,7 @@ def samples_in(text, vocab):


 def build():
-    voc = vocab()
+    voc, kind = sound_vocab()
    files = sorted(CORPUS.rglob("*.tidal"))
    docs = {}           # rel_path -> Counter(sample -> tf)
    df = Counter()      # sample -> # docs containing it
@@ -82,12 +100,16 @@ def build():
        tfidf = {s: round(tf * idf[s], 3) for s, tf in c.items()}
        top = sorted(tfidf.items(), key=lambda kv: -kv[1])[:6]
        tracks[rel] = {"n_samples": len(c), "tf": dict(c),
-                       "top_tfidf": [{"sample": s, "score": v, "df": df[s]}
-                                     for s, v in top]}
+                       "top_tfidf": [{"sample": s, "score": v, "df": df[s],
+                                      "kind": kind.get(s, "?")} for s, v in top]}
+    used_kinds = {s: kind.get(s, "?") for s in df}
    return {
        "corpus": str(CORPUS), "n_docs": n, "vocab_size": len(voc),
+        "n_samples": sum(1 for k in used_kinds.values() if k == "sample"),
+        "n_synths": sum(1 for k in used_kinds.values() if k == "synth"),
        "df": dict(df.most_common()),
        "idf": dict(sorted(idf.items(), key=lambda kv: -kv[1])),
+        "kinds": used_kinds,
        "tracks": tracks,
    }

@@ -96,14 +118,16 @@ def report(data, args):
    n = data["n_docs"]
    df = data["df"]
    idf = data["idf"]
+    kinds = data.get("kinds", {})
+    K = lambda s: kinds.get(s, "?")
    if args.sample:
        s = args.sample
        if s not in df:
-            print(f"'{s}' not used in any .tidal (or not a Dirt-Samples name).")
+            print(f"'{s}' not used in any .tidal (or not a known sample/synth name).")
            return
        users = [(rel, t["tf"][s]) for rel, t in data["tracks"].items() if s in t["tf"]]
        users.sort(key=lambda x: -x[1])
-        print(f"\n■ '{s}'  df={df[s]}/{n} docs  idf={idf[s]}  "
+        print(f"\n■ '{s}' [{K(s)}]  df={df[s]}/{n} docs  idf={idf[s]}  "
              f"({'RARE tell' if df[s] <= 3 else 'common' if df[s] >= 20 else 'mid'})")
        print(f"  used in {len(users)} tracks:")
        for rel, tf in users[:25]:
@@ -112,24 +136,25 @@ def report(data, args):
    if args.track:
        hits = {rel: t for rel, t in data["tracks"].items() if args.track in rel}
        for rel, t in hits.items():
-            print(f"\n■ {rel}  ({t['n_samples']} distinct samples)")
+            print(f"\n■ {rel}  ({t['n_samples']} distinct sounds)")
            print("  signature (TF-IDF):")
            for h in t["top_tfidf"]:
                d = df[h["sample"]]
-                print(f"    {h['score']:>7}  {h['sample']:<20} (df={d}, "
+                print(f"    {h['score']:>7}  {h['sample']:<18} {h.get('kind','?'):<6} (df={d}, "
                      f"{'rare' if d <= 3 else 'common' if d >= 20 else 'mid'})")
        if not hits:
            print(f"no track matching '{args.track}'")
        return
-    print(f"\nCorpus: {n} .tidal docs · vocab {data['vocab_size']} sample names\n")
+    print(f"\nCorpus: {n} .tidal docs · vocab {data['vocab_size']} sound names "
+          f"({data.get('n_samples','?')} samples + {data.get('n_synths','?')} synths used)\n")
    rare = [s for s, d in df.items() if d == 1]
-    print(f"■ RARE TELLS (df=1, used in exactly one track) — {len(rare)} samples")
-    for s in list(df)[::-1][:25]:
+    print(f"■ RARE TELLS (df=1, one track only) — {len(rare)} sounds; sample of them:")
+    for s in list(df)[::-1][:22]:
        if df[s] <= 2:
-            print(f"    df={df[s]}  idf={idf[s]:<6}  {s}")
+            print(f"    df={df[s]}  idf={idf[s]:<6} [{K(s):<6}] {s}")
    print(f"\n■ COMMON / ubiquitous (high df, weak for ID):")
-    for s, d in list(df.items())[:18]:
-        print(f"    df={d:>4}  idf={idf[s]:<6}  {s}")
+    for s, d in list(df.items())[:16]:
+        print(f"    df={d:>4}  idf={idf[s]:<6} [{K(s):<6}] {s}")


 def main():
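`sound_vocab()` gives samples priority. Dirt-Samples folder names are written first, and synth names only fill gaps via `setdefault`, so a name that is both a sample folder and a synthdef stays `sample`. Below is a minimal sketch of that precedence, with in-memory name lists standing in for the `iterdir`/`rglob` scans; `build_kinds` is a hypothetical name.

One assumption is worth checking against the real directories. SuperCollider's `SynthDef.writeDefFile`/`store` write compiled `.scsyndef` files into the user `synthdefs/` directory, so a `*.scd` glob there may find nothing. For that reason the sketch accepts both suffixes.

```python
from pathlib import PurePath

# Assumption: compiled .scsyndef files also name a synth, alongside .scd sources.
SYNTH_SUFFIXES = {".scd", ".scsyndef"}


def build_kinds(sample_dirs, synth_files, builtin_synths):
    """Return (vocab, kind) where kind[name] is 'sample' or 'synth'.
    Samples are written first; synths only fill gaps (setdefault)."""
    kind = {}
    for name in sample_dirs:
        if not name.startswith("."):
            kind[name] = "sample"
    for f in synth_files:
        p = PurePath(f)
        if p.suffix in SYNTH_SUFFIXES:
            kind.setdefault(p.stem, "synth")  # never overrides a sample name
    for s in builtin_synths:
        kind.setdefault(s, "synth")
    return set(kind), kind


vocab, kind = build_kinds(
    ["bd", "superpiano", ".git"],  # toy: a sample folder shadowing a builtin synth
    ["bass/moogBass.scd", "FMRhodes1.scsyndef", "README.md"],
    {"superpiano", "supersaw"},
)
print(sorted(kind.items()))
```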
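Only strings in sound context are counted, so `n` or `note` patterns never pollute the vocabulary. Below is a minimal sketch of `samples_in` that reuses the `SPLIT` and `SOUND_CTX` regexes shown in the diff. The hunk cuts off after `for tok in SPLIT.split(q):`, so the vocabulary-membership filter in the loop body is an assumption.

```python
import re
from collections import Counter

# Both regexes as shown in the tools/sample_tfidf.py diff.
SPLIT = re.compile(r"[\s\[\](){}<>*/.~?!@|,;:%+\-]+")
SOUND_CTX = re.compile(r'(?:\bsound\b|\bs\b|#)\s*"([^"]*)"')


def samples_in(text, vocab):
    """Multiset of known sound tokens inside sound-context strings."""
    counts = Counter()
    for q in SOUND_CTX.findall(text):
        for tok in SPLIT.split(q):
            if tok in vocab:  # assumption: drop tokens that are not known sounds
                counts[tok] += 1
    return counts


code = 'd1 $ s "bd*2 [sn bd]" # n "0 3"\nd4 $ n "0 3" # s "bassWarsaw"'
print(samples_in(code, {"bd", "sn", "bassWarsaw"}))
```

The `# n "0 3"` strings are skipped because `n` is not a sound context, and tokens such as the `2` from `bd*2` fall out at the vocabulary check.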