-
feat(feature_eda): MDA over 1485 samples → the superfeature axes · 01b88ed5
sample_features.py overfetched 36 L0/L1 features × 1485 corpus samples; feature_eda mines them three ways: - correlation: only 2 redundant pairs ≥0.9 (duration~temporal_centroid 0.97, bandwidth~rolloff 0.91) → the overfetch was lean, 34/36 independent. - PCA: intrinsic dim 19 (90%) / 24 (95%) — genuinely high-D. The 5 leading PCs are interpretable SUPERFEATURE AXES: PC1 brightness (rolloff/centroid), PC2 timbre (mfcc5-8), PC3 loudness (rms/peak/flux), PC4 envelope/time (temporal_centroid, decay_slope, attack — the kick
↔ bass axis), PC5 tonal-vs-noisy (kurtosis/chroma_entropy). - clustering: KMeans(12) vs resolver families ARI=0.25 NMI=0.40 (timbral clusters partly orthogonal to semantic family — consistent with 'folders are loose'). RF importance: spectral_centroid + temporal_centroid are the #1/#2 family discriminators → validates productizing the kick↔ bass tiebreaker (#80). TDD: 3 synthetic invariants (redundancy/dim/separation) + real-data load guard.PLN (Algolia) authored01b88ed5
×