feat(feature_eda): MDA over 1485 samples → the superfeature axes
sample_features.py overfetched 36 L0/L1 features × 1485 corpus samples; feature_eda mines them three ways:

- correlation: only 2 redundant pairs ≥0.9 (duration~temporal_centroid 0.97, bandwidth~rolloff 0.91) → the overfetch was lean, 34/36 independent.
- PCA: intrinsic dim 19 (90%) / 24 (95%), i.e. genuinely high-D. The 5 leading PCs are interpretable SUPERFEATURE AXES: PC1 brightness (rolloff/centroid), PC2 timbre (mfcc5-8), PC3 loudness (rms/peak/flux), PC4 envelope/time (temporal_centroid, decay_slope, attack: the kick↔bass axis), PC5 tonal-vs-noisy (kurtosis/chroma_entropy).
- clustering: KMeans(12) vs resolver families ARI=0.25 NMI=0.40 (timbral clusters partly orthogonal to semantic family, consistent with 'folders are loose').

RF importance: spectral_centroid + temporal_centroid are the #1/#2 family discriminators → validates productizing the kick↔bass tiebreaker (#80).

TDD: 3 synthetic invariants (redundancy/dim/separation) + real-data load guard.
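The redundancy and intrinsic-dimension checks described above could be sketched roughly as follows. This is a minimal illustration only, not the contents of feature_eda.py (its diff is not shown here): the helper names `redundant_pairs` and `intrinsic_dim`, the z-score standardisation, and the synthetic data are all assumptions, and only NumPy is used.

```python
import numpy as np


def redundant_pairs(X, names, threshold=0.9):
    """Return (name_a, name_b, |r|) for feature pairs with |Pearson r| >= threshold.

    Hypothetical helper: X is an (n_samples, n_features) array.
    """
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = abs(corr[i, j])
            if r >= threshold:
                pairs.append((names[i], names[j], float(r)))
    return pairs


def intrinsic_dim(X, variance=0.90):
    """Smallest number of principal components that explain `variance` of the total.

    Hypothetical helper: features are z-scored first (assumes no constant
    columns), then PCA is done through the singular values of the data matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)  # cumulative explained-variance ratio
    return int(np.searchsorted(explained, variance) + 1)


# Synthetic check in the spirit of the redundancy/dim invariants:
# four independent features plus a near-duplicate of the first one.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 4))
near_copy = base[:, 0] + 0.05 * rng.normal(size=500)
X = np.column_stack([base, near_copy])
names = ["a", "b", "c", "d", "a_copy"]

pairs = redundant_pairs(X, names)
dim90 = intrinsic_dim(X, variance=0.90)
print(pairs, dim90)
```

On this synthetic input the near-duplicate column should be the only pair flagged, and the 90% dimension should equal the number of independent features, which is the kind of invariant a test can pin down before the functions are run on real data.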
armada/tide-table/feature_eda.json
0 → 100644
armada/tide-table/feature_eda.py
0 → 100644
armada/tide-table/sample_features.json
0 → 100644
armada/tide-table/tests/test_feature_eda.py
0 → 100644