How many parameters does SAME-L have?

SAME-L has approximately 852 million parameters (about 0.9 billion).

SAME-L was published by Stability AI on Hugging Face.

SAME-L

Name: SAME-L
Author: Stability AI


Stability AI's 852M-parameter audio autoencoder for high-quality music and sound effect encoding and reconstruction.

Model Description

Please note: For commercial use, please refer to https://stability.ai/license


Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically Aligned Music autoEncoder), a transformer-based autoencoder for stereo music and general audio that reaches a 4096x temporal compression ratio (roughly twice the current standard) while maintaining excellent reconstruction quality and strong downstream generative performance. We achieve this by combining a set of semantic regularisation approaches with phase-aware reconstruction losses. The architecture also delivers substantial computational cost benefits, through both its high compression ratio and its reliance on well-optimised transformer primitives. Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
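To make the 4096x figure concrete, the sketch below converts a clip length into a latent sequence length. The 44.1 kHz sample rate is an assumption for illustration only (the description above does not state the model's rate), and `latent_frames` is a hypothetical helper that assumes the input is padded up to a whole number of latent frames.

```python
import math

COMPRESSION_RATIO = 4096  # temporal downsampling factor stated for SAME


def latent_frames(num_samples: int, ratio: int = COMPRESSION_RATIO) -> int:
    """Latent frames for a clip, assuming padding up to a whole frame."""
    return math.ceil(num_samples / ratio)


sample_rate = 44_100  # assumed for illustration only
clip_seconds = 60

frames = latent_frames(sample_rate * clip_seconds)
latent_rate_hz = sample_rate / COMPRESSION_RATIO

print(f"{frames} latent frames, about {latent_rate_hz:.2f} frames per second")
```

Under these assumptions, one minute of audio maps to a few hundred latent frames, which is where the computational savings for downstream generative models come from.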

Usage

This model can be used with:

the stable-audio-3 inference and fine-tuning library
the stable-audio-tools research library

Using with `stable-audio-3`

import torchaudio
from stable_audio_3 import AutoencoderModel

ae = AutoencoderModel.from_pretrained("same-l")
waveform, sr = torchaudio.load("audio.wav")
latents = ae.encode(waveform, sr)
audio_out = ae.decode(latents)

Using with `stable-audio-tools`

import torch
import torchaudio
from stable_audio_tools import get_pretrained_model

device = "cuda" if torch.cuda.is_available() else "cpu"
# Use half precision only on GPU; defined on every path so the checks below never fail
model_half = device == "cuda"

# Download model
model, model_config = get_pretrained_model("stabilityai/SAME-L")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)
if model_half:
    model = model.to(torch.float16)

audio, sr = torchaudio.load("/path/to/audiofile")  # [channels, samples]
if sr != sample_rate:
    # Resample to the rate the model expects
    audio = torchaudio.functional.resample(audio, sr, sample_rate)
if audio.shape[0] == 1:
    # Duplicate a mono channel to get stereo input
    audio = audio.repeat(2, 1)

audio = audio.unsqueeze(0).to(device)  # [1, channels, samples]
if model_half:
    audio = audio.half()
with torch.no_grad():
    latents = model.encode_audio(audio)
    reconstructed = model.decode_audio(latents)
reconstructed = reconstructed.squeeze(0).cpu()
# Convert to 16-bit PCM
reconstructed = reconstructed.to(torch.float32).clamp(-1, 1).mul(32767).to(torch.int16)
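The last step of the snippet converts floating-point samples to 16-bit PCM. A minimal pure-Python sketch of the same arithmetic on a few scalar values follows; `float_to_int16` is a hypothetical helper for illustration, mirroring the clamp to [-1, 1], the scale by 32767, and the truncation toward zero that a float-to-integer cast performs.

```python
def float_to_int16(x: float) -> int:
    """Clamp to [-1, 1], scale to the int16 range, truncate toward zero."""
    clamped = max(-1.0, min(1.0, x))
    return int(clamped * 32767)  # int() truncates toward zero


samples = [0.0, 0.5, -0.5, 1.0, -1.0, 1.7]
pcm = [float_to_int16(s) for s in samples]
print(pcm)
```

Clamping first matters: without it, an out-of-range value such as 1.7 would overflow the 16-bit range instead of saturating at full scale.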

Model Details

Model type: SAME is a continuous autoencoder model based on a transformer architecture.
Language(s): English
License: Stability AI Community License.
Research Paper: https://arxiv.org/abs/2605.18613

Training dataset

Datasets Used

Our dataset consists of ~19,500 hours of licensed production audio from AudioSparx, split roughly 66/25/9% between music, sound effects, and instrument stems.
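For a sense of scale, the sketch below turns the stated mix into approximate hours per category. It assumes the percentages are shares of total duration, which the text does not state explicitly, and the variable names are illustrative only.

```python
TOTAL_HOURS = 19_500  # approximate total from the dataset description

# Stated mix, assumed here to be shares of total duration
MIX_PERCENT = {"music": 66, "sound effects": 25, "instrument stems": 9}

hours = {name: TOTAL_HOURS * pct / 100 for name, pct in MIX_PERCENT.items()}

for name, h in hours.items():
    print(f"{name}: ~{h:,.0f} hours")
```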

Author

Stability AI (organization: stabilityai)

Details

Downloads: 6.3K
Likes: 20
Access: Open Source
Parameters: 852M
License: other
Library: stable-audio-3
Created: May 17, 2026
Updated: Jun 24, 2026

