Gaussian/Gamma Audio Splatting


Cite this Work

    
        @misc{vinyard2024audio,
            author = {Vinyard, John},
            title = {Gaussian/Gamma Audio Splatting},
            url = {https://JohnVinyard.github.io/machine-learning/2024/6/24/gamma-audio-splat.html},
            year = 2024
        }

Abstract

In this work, we apply a Gaussian Splatting-like approach to audio to produce a lossy, sparse, interpretable, and manipulable representation. Each audio "atom" uses a source-excitation model, implemented by convolving a burst of band-limited noise with a variable-length "resonance" built from a number of exponentially decaying harmonics, meant to mimic the resonance of physical objects. Envelopes are built in both the time and frequency domains using gamma and/or Gaussian distributions. Sixty-four atoms are randomly initialized and then fitted, over 3000 iterations, to a short segment of audio using a loss computed at multiple STFT resolutions. A second, weighted loss term encourages a sparse solution with few active atoms. Complete code for the experiment can be found on GitHub. The training segments come from the MusicNet dataset.
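To make the objective concrete, here is a minimal NumPy sketch of a multi-resolution STFT loss combined with a weighted sparsity term. It is not the code from the experiment: the window sizes, the use of a mean absolute difference between magnitude spectrograms, the L1 penalty on per-atom gains, and the `sparsity_weight` value are all assumptions chosen for illustration. NumPy only evaluates the loss here; actual fitting would need a framework with automatic differentiation.

```python
import numpy as np


def stft_magnitude(signal, window_size):
    """Magnitude spectrogram using a Hann window and 50% overlap."""
    hop = window_size // 2
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop: i * hop + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_loss(target, estimate, window_sizes=(64, 256, 1024)):
    """Sum of mean absolute spectrogram differences over several window sizes."""
    # The window sizes are illustrative, not the ones used in the experiment.
    return sum(
        np.mean(np.abs(stft_magnitude(target, w) - stft_magnitude(estimate, w)))
        for w in window_sizes
    )


def sparsity_penalty(amplitudes):
    """L1 norm of the per-atom gains (assumed form of the sparsity term)."""
    return np.sum(np.abs(amplitudes))


def total_loss(target, atoms, amplitudes, sparsity_weight=0.01):
    """Reconstruction loss on the summed atoms plus a weighted sparsity term."""
    estimate = np.sum(amplitudes[:, None] * atoms, axis=0)
    return (multi_resolution_loss(target, estimate)
            + sparsity_weight * sparsity_penalty(amplitudes))
```

Here `atoms` is assumed to be an array of shape (number of atoms, samples) holding each rendered atom, and `amplitudes` holds one gain per atom. Short windows resolve transients while long windows resolve pitch, which is the usual motivation for comparing spectrograms at several resolutions.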

Reconstruction # 1

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters

Each atom consists of the following parameters:

Mean and variance for the Gaussian gain envelope applied to the entire "atom"
A scalar, unit value for the position in time of the atom
A scalar, unit value representing the mix between the noise impulse and resonance
A scalar, unit decay value which is used to produce a cumulative product, representing the decay of the resonance
A scalar, unit value that describes how we cross-fade from starting filter to ending filter
A scalar, unit value representing the fundamental frequency (f0) of the resonance
A scalar, unit value which represents the decay of the resonance
A scalar, unit value which represents the spacing between harmonics (multiples of f0)
Mean and variance in the frequency domain for the filter applied to the noise impulse
A scalar value representing the overall amplitude/gain of the atom
A scalar value representing the choice of reverb impulse responses
A scalar value representing the dry/wet mix between the atom and the reverb impulse response
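
As a rough illustration of how several of these parameters could combine into a waveform, here is a simplified NumPy sketch of a single atom. It is a sketch under stated assumptions, not the experiment's implementation: it omits the time shift, the filter cross-fade, and the reverb; it takes f0 directly in Hz rather than as a unit value; and the names `burst_samples`, `n_harmonics`, and `sample_rate`, along with the partial frequencies f0 * (1 + k * spacing), are hypothetical choices.

```python
import numpy as np


def gaussian_curve(n_points, mean, std):
    """Gaussian bump sampled on [0, 1], with a peak value of 1."""
    x = np.linspace(0.0, 1.0, n_points)
    return np.exp(-0.5 * ((x - mean) / std) ** 2)


def noise_burst(n_samples, burst_samples, filter_mean, filter_std, rng):
    """Short burst of white noise, band-limited by a Gaussian spectral filter."""
    noise = rng.uniform(-1.0, 1.0, burst_samples)
    spectrum = np.fft.rfft(noise)
    spectrum = spectrum * gaussian_curve(len(spectrum), filter_mean, filter_std)
    burst = np.fft.irfft(spectrum, n=burst_samples)
    # Zero-pad the burst to the full segment length.
    return np.pad(burst, (0, n_samples - burst_samples))


def resonance(n_samples, sample_rate, f0_hz, spacing, decay, n_harmonics=8):
    """Sum of exponentially decaying sinusoids at f0 * (1 + k * spacing)."""
    t = np.arange(n_samples) / sample_rate
    # Cumulative product of a constant in (0, 1) gives an exponential decay.
    envelope = np.cumprod(np.full(n_samples, decay))
    out = np.zeros(n_samples)
    for k in range(n_harmonics):
        freq = f0_hz * (1.0 + k * spacing)
        if freq < sample_rate / 2:  # skip partials above Nyquist
            out += np.sin(2.0 * np.pi * freq * t) * envelope
    return out


def render_atom(n_samples, sample_rate, rng, *, env_mean, env_std, mix, decay,
                f0_hz, spacing, filter_mean, filter_std, gain,
                burst_samples=256):
    """Render one simplified atom (no time shift, filter cross-fade, or reverb)."""
    burst = noise_burst(n_samples, burst_samples, filter_mean, filter_std, rng)
    res = resonance(n_samples, sample_rate, f0_hz, spacing, decay)
    # Linear convolution via zero-padded FFT, truncated to the segment length.
    n_fft = 2 * n_samples
    product = np.fft.rfft(burst, n_fft) * np.fft.rfft(res, n_fft)
    resonant = np.fft.irfft(product, n_fft)[:n_samples]
    # Blend the dry noise burst with the resonated version.
    atom = mix * burst + (1.0 - mix) * resonant
    # Gaussian gain envelope applied to the entire atom.
    atom = atom * gaussian_curve(n_samples, env_mean, env_std)
    peak = np.max(np.abs(atom))
    return gain * atom / peak if peak > 0 else atom
```

A call such as `render_atom(4096, 22050, np.random.default_rng(0), env_mean=0.1, env_std=0.2, mix=0.2, decay=0.999, f0_hz=220.0, spacing=1.0, filter_mean=0.1, filter_std=0.1, gain=0.5)` returns one atom as a 4096-sample array, and a full reconstruction would be the sum of many such atoms. The peak normalization before applying `gain` is a convenience of this sketch so that the gain parameter directly sets the output level.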

Reconstruction # 2

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters


Reconstruction # 3

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters


Reconstruction # 4

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters
