🎙️ Orthogonal Subspace Demo

Demonstration for the paper Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces. This demo reproduces Figure 10: cosine similarity between frame-level S3M representations and position-dependent phonological vectors over time, illustrating how each relative phone position occupies a distinct orthogonal subspace.
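The quantity plotted in the demo can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `cosine_similarity_over_time`, the array shapes, and the random stand-in data are all assumptions. It takes frame-level representations of shape (T, D) and a set of phonological vectors of shape (K, D), and returns one similarity curve per vector.

```python
import numpy as np

def cosine_similarity_over_time(frames, vectors, eps=1e-8):
    """Return a (K, T) array of cosine similarities.

    frames:  (T, D) frame-level representations (hypothetical input).
    vectors: (K, D) phonological vectors (hypothetical input).
    """
    frames = np.asarray(frames, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    # Normalize each row to unit length; eps guards against zero vectors.
    frames_unit = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + eps)
    vectors_unit = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + eps)
    # Dot products of unit vectors are cosine similarities.
    return vectors_unit @ frames_unit.T

# Random stand-in data: 50 frames, 3 vectors, 768 dimensions (assumed sizes).
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 768))
vectors = rng.standard_normal((3, 768))
sim = cosine_similarity_over_time(frames, vectors)  # one row per vector
```

Each row of `sim` corresponds to one curve over time in the plot. Vectors lying in mutually orthogonal subspaces would give values near zero against frames dominated by a different position.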

Upload, record, or use the example audio, configure the parameters, and click Run.

Parameters

  • Vector extraction method: How phonological vectors are estimated from S3M representations. The options differ in the training dataset or in the procedure used to compute the vectors.
  • Phonological features: Which phonological features to include in the plot. Deselect features to reduce clutter or isolate a single dimension of contrast.
  • Context size: How many relative phone positions to include on each side of the current phone. 0 = vectors from the current phone only; k = vectors from relative positions −k through +k. Larger values reveal how far phonological features extend beyond the current (or immediately adjacent) phones.
  • Cosine similarity range: Bound of the cosine similarity axis, applied symmetrically (default ±0.4). Lower it to zoom in on fine-grained differences or to accommodate low-similarity outputs.
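
As a small sketch of how the context size maps to relative phone positions, assuming the convention described above (the helper name `relative_positions` is hypothetical and not part of the demo's code):

```python
def relative_positions(context_size):
    """List the relative phone positions covered by a given context size."""
    if context_size < 0:
        raise ValueError("context_size must be non-negative")
    # Positions run from -k through +k, so there are 2k + 1 of them.
    return list(range(-context_size, context_size + 1))

print(relative_positions(0))  # [0]
print(relative_positions(2))  # [-2, -1, 0, 1, 2]
```

With context size k, each selected phonological feature would therefore contribute 2k + 1 position-dependent vectors to the plot.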