Gaussian/Gamma Audio Splatting

Cite this Work

    
        @misc{vinyard2024audio,
            author = {Vinyard, John},
            title = {Gaussian/Gamma Audio Splatting},
            url = {https://JohnVinyard.github.io/machine-learning/2024/6/24/gamma-audio-splat.html},
            year = 2024
        }
    

Abstract

In this work, we apply a Gaussian Splatting-like approach to audio to produce a lossy, sparse, interpretable, and manipulable representation of audio. Each audio "atom" uses a source-excitation model, implemented by convolving a burst of band-limited noise with a variable-length "resonance". The resonance is built from a number of exponentially decaying harmonics, meant to mimic the resonance of physical objects. Envelopes are built in both the time and frequency domains using gamma and/or Gaussian distributions. Sixty-four atoms are randomly initialized and then fitted (3000 iterations) to a short segment of audio via a loss computed at multiple STFT resolutions. A second, weighted loss term encourages a sparse solution with few active atoms. Complete code for the experiment can be found on GitHub. Training segments come from the MusicNet dataset.
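
To make the two loss terms described above more concrete, here is a minimal NumPy sketch. It is not the code from the experiment: the window sizes, the Hann window with 50% overlap, the mean absolute difference between magnitude spectrograms, the L1 penalty on per-atom amplitudes, and the `sparsity_weight` value are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def stft_magnitude(signal, window_size):
    """Magnitude spectrogram using a Hann window and 50% overlap (assumed)."""
    hop = window_size // 2
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop: i * hop + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multires_stft_loss(target, estimate, window_sizes=(512, 1024, 2048)):
    """Mean absolute spectrogram difference, summed over several resolutions."""
    return sum(
        np.mean(np.abs(stft_magnitude(target, w) - stft_magnitude(estimate, w)))
        for w in window_sizes
    )


def total_loss(target, atoms, amplitudes, sparsity_weight=0.01):
    """Reconstruction loss plus a weighted L1 penalty on atom amplitudes.

    `atoms` has shape (n_atoms, n_samples) and holds unit-gain atoms;
    `amplitudes` has shape (n_atoms,). The L1 penalty is an assumed stand-in
    for the sparsity term, pushing most amplitudes toward zero.
    """
    estimate = np.sum(amplitudes[:, None] * atoms, axis=0)
    sparsity = np.sum(np.abs(amplitudes))
    return multires_stft_loss(target, estimate) + sparsity_weight * sparsity
```

In an actual fitting loop, the same quantities would be computed with an automatic differentiation library so that gradients can flow back to the atom parameters. The sketch only shows what is being measured.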

Reconstruction # 1

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters

Each atom consists of the following parameters:

  • Mean and variance for the Gaussian gain envelope applied to the entire "atom"
  • A scalar, unit value for the position in time of the atom
  • A scalar, unit value representing the mix between the noise impulse and resonance
  • A scalar, unit decay value which is used to produce a cumulative product, representing the decay of the resonance
  • A scalar, unit value that describes how we cross-fade from starting filter to ending filter
  • A scalar, unit value representing the fundamental frequency (f0) of the resonance
  • A scalar, unit value which represents the decay of the resonance
  • A scalar, unit value which represents the spacing between harmonics (multiples of f0)
  • Mean and variance in the frequency domain for the filter applied to the noise impulse
  • A scalar value representing the overall amplitude/gain of the atom
  • A scalar value representing the choice of reverb impulse responses
  • A scalar value representing the dry/wet mix between the atom and the reverb impulse response
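
As a rough illustration of how several of these parameters could combine into a single atom, here is a minimal NumPy sketch. It is not the implementation used in the experiment. It leaves out the filter cross-fade and the reverb. It takes f0 in Hz rather than as a unit value, and `burst_len`, `n_harmonics`, the peak normalization, the order of operations, and every default value are assumptions. All function and argument names are hypothetical.

```python
import numpy as np


def gaussian_curve(n, mean, variance):
    """Unnormalized Gaussian sampled at n points on the unit interval."""
    x = np.linspace(0.0, 1.0, n)
    return np.exp(-0.5 * (x - mean) ** 2 / variance)


def fft_convolve(a, b):
    """Linear convolution via FFT, truncated to the length of `a`."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)[: len(a)]


def synth_atom(
    n_samples=16384, sample_rate=22050, burst_len=256,
    env_mean=0.1, env_variance=0.02,      # gain envelope over the whole atom
    position=0.25,                        # onset, as a fraction of the segment
    mix=0.8,                              # 0 = noise burst only, 1 = resonance only
    decay=0.9995,                         # per-sample decay factor
    f0_hz=220.0, harmonic_spacing=1.0, n_harmonics=8,
    filt_mean=0.05, filt_variance=0.001,  # Gaussian filter over frequency bins
    amplitude=1.0, seed=0,
):
    rng = np.random.default_rng(seed)

    # Excitation: a short noise burst, band-limited in the frequency domain.
    burst = np.zeros(n_samples)
    burst[:burst_len] = rng.uniform(-1.0, 1.0, burst_len)
    spectrum = np.fft.rfft(burst)
    spectrum *= gaussian_curve(len(spectrum), filt_mean, filt_variance)
    burst = np.fft.irfft(spectrum, n_samples)

    # Resonance: harmonics of f0 under an exponential decay, where the decay
    # curve is the cumulative product of a constant per-sample factor.
    t = np.arange(n_samples) / sample_rate
    decay_curve = np.cumprod(np.full(n_samples, decay))
    resonance = np.zeros(n_samples)
    for k in range(n_harmonics):
        freq = f0_hz * (1.0 + k * harmonic_spacing)
        if freq < sample_rate / 2:  # skip harmonics above Nyquist
            resonance += np.sin(2.0 * np.pi * freq * t)
    resonance *= decay_curve / n_harmonics

    # Convolve the burst with the resonance, then blend dry and resonant parts.
    resonant = fft_convolve(burst, resonance)
    atom = (1.0 - mix) * burst + mix * resonant

    # Shape the whole atom, normalize, scale, and shift it to its onset time.
    atom *= gaussian_curve(n_samples, env_mean, env_variance)
    peak = np.max(np.abs(atom))
    if peak > 0:
        atom /= peak
    atom *= amplitude
    shift = int(position * n_samples)
    out = np.zeros(n_samples)
    out[shift:] = atom[: n_samples - shift]
    return out
```

Calling `synth_atom(position=0.5, f0_hz=440.0)` would give a segment-length array that is silent for the first half and then contains a single decaying, pitched event. A full reconstruction is the sum of many such arrays.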


Reconstruction # 2

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters


Reconstruction # 3

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters


Reconstruction # 4

Original

Full Reconstruction (Sum of Segments) after 3000 iterations

Independent Atoms

2D Projection of 16-Dimensional Atom Parameters using t-SNE

16-Dimensional Atom Parameters
