feat(sample-classify): fine ontology + PANNs + ensemble methods · bb70f394
Per PLN: richer ontology + PANNs/AudioSet + ensembles for sample grounding.

- sample_ontology.py: 99 fine descriptors across the 12 families ('this is the sound of {a reese bass}'); scored per-descriptor, then marginalized to family. CLAP fine: 58% -> 68% top-1 (coarse super-family 76%) vs the noisy name truth.
- sample_panns.py: PANNs Cnn14 (AudioSet 527) -> conservative label->family map -> per-family prob vector. ffmpeg @32k, zero-pad short one-shots (Cnn14 needs >=1s of mel frames or conv5 collapses). Weak on electronic one-shots (AudioSet 'Clapping' = applause, not a drum-machine clap).
- sample_classify.py: --method clap|panns|ensemble, --fine|--coarse. clap_vector() exposes the family-prob vector; ensemble = mean of CLAP+PANNs vectors -> argmax.

Scoreboard (vs name-heuristic, itself noisy):
  clap-coarse 58% | clap-fine 68% | panns - | ensemble - (head-to-head primed, not yet run)

Stubborn residual = bass<->kick one-shot (spectral decay tiebreaker is the next lever).

PLN (Algolia) authored bb70f394
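Illustrative sketch of the three steps described in this commit (marginalize descriptor scores to family, zero-pad short one-shots, mean-then-argmax ensemble). This is not the code in the commit. The function names, the softmax-then-sum marginalization, and the exact 1.0 s pad length are assumptions chosen for illustration; only the 32 kHz rate, the >=1s requirement, and "mean of CLAP+PANNs vectors -> argmax" come from the message above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw similarity scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginalize_to_family(descriptor_scores, descriptor_family, families):
    """Turn per-descriptor scores into a per-family probability vector.

    Assumption: softmax over all descriptors, then sum the probability
    mass of the descriptors belonging to each family.
    """
    probs = softmax(descriptor_scores)
    mass = {family: 0.0 for family in families}
    for prob, family in zip(probs, descriptor_family):
        mass[family] += prob
    return [mass[family] for family in families]


def zero_pad(samples, sample_rate=32000, min_seconds=1.0):
    """Right-pad a short one-shot with silence up to min_seconds of audio."""
    min_len = int(sample_rate * min_seconds)
    padded = list(samples)
    if len(padded) < min_len:
        padded.extend([0.0] * (min_len - len(padded)))
    return padded


def ensemble(clap_vec, panns_vec, families):
    """Average the two family-probability vectors and take the argmax."""
    mean = [(c + p) / 2.0 for c, p in zip(clap_vec, panns_vec)]
    best = max(range(len(mean)), key=mean.__getitem__)
    return families[best], mean
```

Hypothetical usage: `ensemble(marginalize_to_family(scores, owners, families), panns_vec, families)` returns the winning family name plus the averaged vector, where `panns_vec` stands in for the per-family output of the label->family map.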