
Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

Authors: Ioannis Prokopiou¹, Pantelis Vikatos², Maximos Kaliakatsos-Papakostas³, Theodoros Giannakopoulos², Themos Stafylakis⁴

Affiliations:
¹ Athens University of Economics and Business
² Orfium
³ Hellenic Mediterranean University
⁴ NCSR “Demokritos” & Archimedes/Athena R.C.



Abstract

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet fine-grained, interpretable control over discrete signal attributes remains an open gap. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and bridges this gap with inference-time activation steering, a framework for deterministic attribute modulation that requires no retraining.

Using the Difference-in-Means (DiffMean) method, we isolate latent directions in the residual stream for two signal attributes, Pitch and Duration. We validate the Linear Representation Hypothesis in this domain and introduce a Dual Steering framework that uses Gram-Schmidt orthogonalization to address feature entanglement.


Approach & Methodology

Our framework enables inference-time steerability without expensive fine-tuning.

Key Components

\[\mathbf{h}_{\text{steer}}^{(l)} \leftarrow \mathbf{h}^{(l)} + \alpha \, \mathbf{v}^{(l)}\]
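Reading the update above with h^(l) as the residual-stream activation at layer l, v^(l) as that layer's steering vector, and α as the steering strength, the intervention can be sketched in a few lines of NumPy. This is an illustration only, not the repository's implementation: the function name `apply_steering`, the array shapes, and the choice to add the vector at every sequence position are assumptions.

```python
import numpy as np

def apply_steering(hidden, v, alpha):
    """Return hidden + alpha * v, broadcasting v over all sequence positions.

    hidden: array of shape (seq_len, d_model), one layer's activations (assumed layout).
    v:      array of shape (d_model,), the steering vector for that layer.
    alpha:  scalar strength; the sign selects the steering direction.
    """
    hidden = np.asarray(hidden, dtype=float)
    v = np.asarray(v, dtype=float)
    if hidden.shape[-1] != v.shape[-1]:
        raise ValueError("steering vector must match the hidden size")
    return hidden + alpha * v

# Toy example: 3 token positions, hidden size 4 (sizes are illustrative only).
h = np.zeros((3, 4))
v = np.array([1.0, 0.0, -1.0, 0.0])
h_steer = apply_steering(h, v, alpha=-0.5)
```

Setting `alpha=0` leaves the activations unchanged, which corresponds to the unsteered baseline.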

Pipeline Overview

  1. Dataset Curation: Classify songs by musical attributes (pitch, duration) using calibrated thresholds
  2. Activation Extraction: Extract residual stream activations from all 12 transformer layers
  3. Steering Vector Computation: Calculate difference vectors: v = mean(high) - mean(low)
  4. Vector Composition: Apply Gram-Schmidt orthogonalization for dual steering
  5. Generation: Inject steering vectors at inference time to control musical attributes
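Step 3 of the pipeline can be illustrated with a small NumPy sketch. It assumes that each song contributes one pooled activation per layer, so the inputs have shape (n_examples, n_layers, d_model); this pooling, the function name `diff_mean_vectors`, and the synthetic data are all assumptions made for illustration and are not taken from the repository.

```python
import numpy as np

def diff_mean_vectors(high_acts, low_acts):
    """Compute one steering vector per layer: v = mean(high) - mean(low).

    high_acts, low_acts: arrays of shape (n_examples, n_layers, d_model).
    The two groups may contain different numbers of examples.
    Returns an array of shape (n_layers, d_model).
    """
    high_acts = np.asarray(high_acts, dtype=float)
    low_acts = np.asarray(low_acts, dtype=float)
    return high_acts.mean(axis=0) - low_acts.mean(axis=0)

# Synthetic stand-in for extracted activations (not real model data).
rng = np.random.default_rng(0)
n_layers, d_model = 12, 8  # 12 layers as in the pipeline; d_model is a toy value
direction = np.zeros(d_model)
direction[0] = 1.0  # the attribute is planted along the first coordinate
high = rng.normal(size=(50, n_layers, d_model)) + 2.0 * direction
low = rng.normal(size=(40, n_layers, d_model)) - 2.0 * direction

vectors = diff_mean_vectors(high, low)  # shape (12, 8)
```

In this toy setup the recovered vectors point mainly along the planted coordinate, which is the behavior the difference of class means is meant to capture.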

Results Summary

Single Steering

Dual Steering

Conditioned Dual Steering

Testing ability to override strong conditioning context:

Scenario                 | Success Rate | Avg Degradation
Low+Short → High+Long    | 96.1%        | 3.03
Low+Long → High+Short    | 90.6%        | 1.33
High+Long → Low+Short    | 85.6%        | 2.59
High+Short → Low+Long    | 82.2%        | 5.80

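Key Finding: Steering pitch upward (low→high) is easier than steering it downward (high→low): success rates are 90.6-96.1% for the upward scenarios versus 82.2-85.6% for the downward ones. All four scenarios succeed in 82-96% of cases even when fighting a 16-beat conditioning context.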


Audio Examples

All examples use the same conditioning context (first 16 beats) to demonstrate steering effectiveness. Listen to the dramatic changes between baseline (no intervention) and steered generations.

1. Pitch Control

Demonstrating the ability to override pitch characteristics in the conditioning context.

Scenario 1: High Context → Low Continuation

Baseline (No Steering) Steered (α=-0.5)

Scenario 2: Low Context → High Continuation

Baseline (No Steering) Steered (α=+1.0)

2. Duration Control

Demonstrating control over rhythmic density and note lengths.

Scenario 1: Long Context → Short Continuation

Baseline (No Steering) Steered (α=-0.5)

Scenario 2: Short Context → Long Continuation

Baseline (No Steering) Steered (α=+0.5)

3. Dual Steering (Disentangled Control)

Simultaneous control of Pitch AND Duration using Gram-Schmidt Orthogonalization to prevent feature interference.
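A minimal sketch of the orthogonalization idea follows. It assumes the pitch vector is kept as the primary direction and the duration vector has its pitch component projected out before both are added to the hidden state; which vector the actual code treats as primary is not stated here, and the names `orthogonalize` and `dual_steer` are hypothetical.

```python
import numpy as np

def orthogonalize(primary, secondary, eps=1e-12):
    """One Gram-Schmidt step: remove from `secondary` its component along `primary`."""
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    denom = primary @ primary
    if denom < eps:  # degenerate primary vector: nothing to project out
        return secondary.copy()
    return secondary - (secondary @ primary) / denom * primary

def dual_steer(hidden, v_pitch, v_duration, alpha_pitch, alpha_duration):
    """Add the pitch vector and the pitch-orthogonalized duration vector."""
    v_dur_orth = orthogonalize(v_pitch, v_duration)
    return hidden + alpha_pitch * v_pitch + alpha_duration * v_dur_orth

# Toy vectors: the duration direction partially overlaps with pitch.
v_pitch = np.array([1.0, 0.0, 0.0])
v_duration = np.array([0.6, 0.8, 0.0])
v_dur_orth = orthogonalize(v_pitch, v_duration)  # overlap along pitch removed

h = np.zeros((2, 3))
h_steer = dual_steer(h, v_pitch, v_duration, alpha_pitch=-0.5, alpha_duration=1.0)
```

Because the orthogonalized duration vector has zero dot product with the pitch vector, changing `alpha_duration` no longer shifts the activation along the pitch direction, which is the interference the orthogonalization is meant to prevent.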

Scenario 1: High Pitch, Short Duration → Low Pitch, Long Duration

Baseline (α_pitch=0, α_duration=0) Steered (α_pitch=-0.5, α_duration=+1.0)
Result: 82.2% success rate | Avg Degradation: 5.80

Scenario 2: Low Pitch, Short Duration → High Pitch, Long Duration

Baseline (α_pitch=0, α_duration=0) Steered (α_pitch=+1.5, α_duration=+0.5)
Result: 96.1% success rate | Avg Degradation: 3.03

Scenario 3: Low Pitch, Long Duration → High Pitch, Short Duration

Baseline (α_pitch=0, α_duration=0) Steered (α_pitch=+1.0, α_duration=-0.5)
Result: 90.6% success rate | Avg Degradation: 1.33 (best quality)

Scenario 4: High Pitch, Long Duration → Low Pitch, Short Duration

Baseline (α_pitch=0, α_duration=0) Steered (α_pitch=-1.5, α_duration=-1.2)
Result: 85.6% success rate | Avg Degradation: 2.59

Code & Reproducibility

The complete codebase is available on GitHub:

Repository: https://github.com/GiannisProkopiouOrfium/music-transformer-sae

Key Scripts

Quick Start

# Clone repository
git clone https://github.com/GiannisProkopiouOrfium/music-transformer-sae.git
cd music-transformer-sae

# Install dependencies
pip install -r requirements.txt

# Run dual steering experiment
python steering_interventions/dual_steering/test_multi_conditioned.py \
    --pitch_vectors outputs/steering_vectors/average_pitch_steering_vectors.pt \
    --duration_vectors outputs/steering_vectors/average_duration_steering_vectors.pt \
    --output_dir outputs/phase4_conditioned \
    --strategies gram_schmidt_pitch

Citation

If you use this work in your research, please cite:

@inproceedings{prokopiou2026latent,
  title={Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation},
  author={Prokopiou, Ioannis and Vikatos, Pantelis and Kaliakatsos-Papakostas, Maximos and Giannakopoulos, Theodoros and Stafylakis, Themos},
  booktitle={EUSIPCO 2026},
  year={2026}
}

Acknowledgments

This work was supported by Athens University of Economics and Business, Orfium, Hellenic Mediterranean University, NCSR “Demokritos”, and Archimedes/Athena R.C.

Base model: Multitrack Music Transformer (MMT) by Hao-Wen Dong and colleagues.


© 2026 | Ioannis Prokopiou