Commit 3e14a623 by PLN (Algolia)

feat(sample-classify): CLAP zero-shot sample-family analyzer (katana)

Ground sample families by LISTENING, not by name. sample_classify.py runs
laion/clap-htsat-unfused (transformers, torch CPU) over Dirt-Samples one-shots,
scoring each against text prompts for the 12 fleet families; aggregates per folder
(dominant + homogeneity → kits show as mixed). ffmpeg audio I/O, no librosa.
validate/run/one commands; validate measures top-1 vs the name-confident folders.
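
The per-folder aggregation described above could look roughly like the sketch below. It assumes each one-shot has already been assigned its top-scoring family by CLAP. The function name `aggregate_folder`, the definition of homogeneity as the share of samples agreeing with the dominant family, and the `mixed_below` threshold of 0.6 are all hypothetical illustrations, not taken from sample_classify.py.

```python
from collections import Counter

def aggregate_folder(labels, mixed_below=0.6):
    """Summarize one folder from its per-sample top-1 family labels.

    Hypothetical sketch: homogeneity is the fraction of samples that
    agree with the most common family, and a folder counts as "mixed"
    when that fraction falls below `mixed_below` (an assumed cutoff).
    """
    if not labels:
        return {"dominant": None, "homogeneity": 0.0, "mixed": True}
    counts = Counter(labels)
    dominant, n_dominant = counts.most_common(1)[0]
    homogeneity = n_dominant / len(labels)
    return {
        "dominant": dominant,
        "homogeneity": homogeneity,
        "mixed": homogeneity < mixed_below,
    }
```

Under these assumptions, a folder whose samples all land in one family has homogeneity 1.0, while a drum kit spread across several families scores low and is flagged as mixed.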

Finding (validate): 58% top-1 agreement with the name-heuristic at fine 12-way.
KEY: the name 'ground truth' itself is wrong in many disagreements. CLAP correctly
calls 808hc/808mc congas (perc), which the name-classifier mislabeled as bass
because of the '808' in the name.
CLAP is near-perfect on vox/break/clear-bass/kick/keys; the genuinely fuzzy zone is
the melodic cluster (synth/lead/keys/pad). Prompt-tuning against noisy truth is
whack-a-mole. Conclusion: trust CLAP at coarse granularity; do not silently apply
its fine 12-way labels.
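
To make the fine-versus-coarse distinction concrete, here is a minimal sketch of a top-1 agreement measure that can be computed at either granularity. The names `coarse` and `top1_agreement` are hypothetical, and collapsing synth/lead/keys/pad into a single "melodic" group is an assumption drawn from the fuzzy cluster named above, not the grouping used by the validate command.

```python
# Assumed coarse grouping: only the melodic cluster is merged.
MELODIC = {"synth", "lead", "keys", "pad"}

def coarse(family):
    """Map a fine family to a coarse one (hypothetical grouping)."""
    return "melodic" if family in MELODIC else family

def top1_agreement(pairs, mapper=lambda family: family):
    """Fraction of (predicted, reference) pairs that match after mapping.

    With the default identity mapper this is fine-grained agreement;
    pass `coarse` to compare at the coarse level instead.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for pred, ref in pairs if mapper(pred) == mapper(ref))
    return hits / len(pairs)
```

Calling `top1_agreement(pairs)` and `top1_agreement(pairs, coarse)` on the same pairs shows how much of the disagreement sits inside the melodic cluster rather than between unrelated families.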
parent e9bba22c