@misc{vinyard2024audio,
author = {Vinyard, John},
title = {Gaussian/Gamma Audio Splatting},
url = {https://JohnVinyard.github.io/machine-learning/2024/6/24/gamma-audio-splat.html},
year = 2024
}
In this work, we apply a Gaussian Splatting-like approach to audio to produce a lossy, sparse, interpretable, and manipulable representation of audio. We use a source-excitation model for each audio "atom", implemented by convolving a burst of band-limited noise with a variable-length "resonance", which is built from a number of exponentially decaying harmonics meant to mimic the resonance of physical objects. Envelopes are built in both the time and frequency domains using gamma and/or gaussian distributions. Sixty-four atoms are randomly initialized and then fitted (3000 iterations) to a short segment of audio via a multi-resolution STFT loss. A sparse solution, with few active atoms, is encouraged by a second, weighted loss term. Complete code for the experiment can be found on GitHub. Trained segments come from the MusicNet dataset.
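The source-excitation synthesis described above can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the parameter names, the 1/k harmonic rolloff, and the fixed (rather than learned) values are all assumptions, and the real model additionally applies gamma/gaussian envelopes in both time and frequency and optimizes everything by gradient descent.

```python
import numpy as np
from scipy.signal import fftconvolve

def atom(sr=22050, dur=0.5, f0=220.0, n_harmonics=8,
         decay=6.0, burst_len=0.01, band=(100.0, 2000.0), seed=0):
    """One source-excitation 'atom': a band-limited noise burst
    convolved with a resonance built from exponentially decaying
    harmonics. All parameter names/values are illustrative."""
    rng = np.random.default_rng(seed)

    # Excitation: a short noise burst, band-limited by zeroing
    # FFT bins outside the passband (a simple stand-in for a
    # learned band-limiting filter).
    n_burst = int(sr * burst_len)
    noise = rng.standard_normal(n_burst)
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_burst, 1.0 / sr)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
    burst = np.fft.irfft(spec, n=n_burst)

    # Resonance: a sum of harmonics of f0, each decaying
    # exponentially (higher harmonics decay faster here,
    # mimicking a struck physical object).
    t = np.arange(int(sr * dur)) / sr
    res = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        amp = 1.0 / k  # assumed 1/k harmonic rolloff
        res += amp * np.sin(2 * np.pi * k * f0 * t) * np.exp(-decay * k * t)

    # Source-excitation: convolve the burst with the resonance.
    return fftconvolve(burst, res)[: len(t)]
```

In the actual experiment these per-atom parameters are learned jointly for all sixty-four atoms against the multi-resolution STFT loss, rather than fixed as they are in this sketch.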
Each atom consists of the following parameters: