The Cracked Bassoon


Ripple sounds

Filed under hearing, python.

Ripple sounds are broadband stimuli with sinusoidal spectral envelopes. They were first proposed by Kowalski et al. (1996) to measure the response properties of neurons within the auditory system (reviewed by Shamma, 2001). They are frequently described as the auditory equivalent of visual gratings (cf. Shapley & Lennie, 1985). Indeed, as we shall see shortly, plotted ripple envelopes look exactly like gratings. Ripple sounds have also been used to study auditory short-term memory (Visscher et al., 2007).

Mathematical definition

Ripple sounds are essentially a kind of broadband noise, but in practice it is difficult to shape noise with filters to a desired spectral envelope, because doing so would require a very elaborate filter design. Instead, ripple sounds are created by summing many amplitude-modulated sinusoids with different frequencies. The amplitude-modulation function controls the ripple envelope and can be used to create ripples with different depths, densities, velocities, and starting phases.

The symbolic definition is a bit cumbersome. First, we define a function that returns the instantaneous pressure of a ripple sound at a given time point as

\[\begin{equation} \mathrm{ripple}\left( t \right) = \sum_{i=0}^{n-1}{ s\left( i, t \right) \cdot a\left( i, t \right) \cdot q\left( i \right) } \end{equation}\]

where \(t\) is time in s, \(i\) indexes the sinusoid, \(n\) is the number of sinusoids (typically in the hundreds or thousands), and the three terms in the summation are all functions defined below.

The first term, \(s\), returns the waveform of an unmodulated sinusoid,

\[\begin{equation} s\left( i, t \right) = \sin\left[2\pi\cdot f\left(i\right)\cdot t+\varphi\left(i\right)\right] \end{equation}\]

where \(f\) is the ordinary frequency of the tone in Hz and \(\varphi\left(i\right)\) is its starting phase in radians. I discussed sinusoids in detail in a previous post. Frequencies are usually spaced logarithmically over a wide range. For example, tones can be spaced evenly along a \(\log_2\) (i.e., musical) scale using

\[\begin{equation} f\left( i \right) = f_0\left(\frac{f_{n-1}}{f_0}\right)^{\frac{i}{n - 1}} \end{equation}\]

where \(f_0\) is the frequency of the lowest tone in Hz and \(f_{n-1}\) is the frequency of the highest tone in Hz. Another option is to space tones evenly in terms of equivalent rectangular bandwidths (Glasberg & Moore, 1990). I’ll implement this in a future post. Starting phases should be random, so we draw each one independently from a circular uniform distribution,

\[\begin{equation} \varphi\left(i\right) \sim \mathrm{Uniform}\left( 0, 2\pi \right) \end{equation}\]
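As a quick illustration, here is a minimal NumPy sketch of the log-spaced frequencies and random starting phases (separate from the full script below; the variable names are just for illustration):

import numpy as np

n = 1000              # number of sinusoids
f0, fm1 = 250, 8000   # lowest and highest frequencies in Hz

i = np.arange(n)
f = f0 * (fm1 / f0) ** (i / (n - 1))           # log-spaced frequencies
phi = np.random.uniform(0, 2 * np.pi, size=n)  # random starting phases

# adjacent tones are a constant ratio apart, i.e., evenly spaced on a log2 scale
assert np.allclose(np.diff(np.log2(f)), np.log2(fm1 / f0) / (n - 1))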

The next term, \(a\), is the amplitude-modulation function,

\[\begin{equation} a\left( i, t \right) = 1 + \Delta\left(t\right) \cdot \sin \left\{ 2\pi \left[ w^\prime\left(t\right) + \Omega\left(t\right) \cdot x\left(i\right) \right] + \phi \right\} \end{equation}\]

where

\[\begin{equation} x\left( i \right) = \log_2\left(\frac{f\left(i\right)}{f_0}\right) \end{equation}\]

and

\[\begin{equation} w^\prime\left( t \right) = \int_0^t w \left( \tau \right)\: \mathrm{d}\tau \end{equation}\]

and \(\Delta\), \(\Omega\), and \(w\) are potentially time-varying functions that control ripple depth, density, and velocity, respectively, and \(\phi\) is the ripple starting phase.
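In discrete time, the integral defining \(w^\prime\) is easy to approximate with a cumulative sum of the sampled velocity function divided by the sample rate; this is what the script below does when \(w\) is time-varying. A minimal sketch (variable names are illustrative):

import numpy as np

sr = 44100                      # sample rate in Hz
t = np.arange(sr) / sr          # 1 s of sample times
w = np.linspace(-8, 8, t.size)  # example time-varying ripple velocity in Hz

wprime = np.cumsum(w) / sr      # approximates the integral of w from 0 to t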

The last term, \(q\), scales the amplitude-modulated sinusoids so that the resulting sound has a desired long-term average spectrum. For example, to ensure equal energy per octave, as in pink noise, we can use

\[\begin{equation} q\left(i\right)=\frac{1}{\sqrt{f\left(i\right)}} \end{equation}\]

Alternatively, we could use \(q\left(i\right)=1\) to ensure a flat long-term average spectrum, like white noise.

(Nitpicker’s note: The above definition produces ripple sounds at arbitrary overall sound pressure levels (SPLs). To produce sounds at a desired overall SPL, the simplest solution is to re-scale the entire waveform.)
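A minimal sketch of that re-scaling, matching the peak-normalization approach used in the script below (which the script itself flags as approximate; the reference amplitude and exact calibration are assumptions that depend on your playback system):

import numpy as np

a0 = 1e-5                    # assumed reference amplitude (peak-1 pure tone = 0 dB SPL)
l = 70                       # desired overall level in dB SPL

y = np.random.randn(44100)   # stand-in for a synthesized ripple waveform
y = y / np.abs(y).max()      # normalize to unit peak amplitude
y = y * a0 * 10 ** (l / 20)  # re-scale towards the target level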

Examples

When \(\Delta\left( t \right) = 0\), there is no sinusoidal variation in the envelope and the result is just noise.

When \(\Delta\left( t \right)\) and \(\Omega\left( t \right)\) are constant, non-zero values and \(w\left( t \right) = 0\), “stationary” ripple sounds are created. These sounds have sinusoidal spectral envelopes, but their envelopes do not change over time.

Envelopes of three stationary ripple sounds. Brighter areas have more energy. The middle sound has a higher ripple density (\(\Omega\)) than the leftmost sound, and the rightmost sound has a shallower ripple depth (\(\Delta\)).

Setting \(\Delta\left( t \right)\), \(\Omega\left( t \right)\), and \(w\left( t \right)\) to constant, non-zero values creates “moving” ripple sounds. These are the most common kind of ripple sound found in the literature (Shamma, 2001).

Envelopes of moving ripple sounds. The middle sound has a greater ripple velocity (\(w\)) than the leftmost sound, while the rightmost sound has a negative velocity.
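For example, the leftmost moving ripple above can be reproduced with the ripple_sound function from the script at the end of this post (this assumes the script has been run or imported, so that ripple_sound, sd, and sr are available):

y, a = ripple_sound(dur=1, n=1000, omega=1, w=4, delta=0.9, phi=0.0, f0=250, fm1=8000)
sd.play(y, sr, blocking=True)  # play the 1-s moving ripple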

Finally, we can create “dynamic” moving ripples by defining \(\Delta\left( t \right)\), \(\Omega\left( t \right)\), and \(w\left( t \right)\) as time-varying functions (Escabí & Schreiner, 2002).

Envelopes of dynamic moving ripple sounds. In each example, one of \(\Delta\left( t \right)\) (leftmost), \(\Omega\left( t \right)\) (middle), and \(w\left( t \right)\) (rightmost) was defined as a spline between multiple different values, creating a smooth walk.

Python code

Below is some Python code that creates and plays all of the examples from above. It can also produce the envelope figures, but this takes a really long time, so you might want to comment out those lines or reduce the value of \(n\) (in the script, this is n inside main()).

"""Synthesize, plot, and play ripple sounds.

"""
import numpy as np
import sounddevice as sd
import matplotlib.pyplot as plt

from scipy.interpolate import interp1d


a0 = 1e-5  # reference amplitude
sr = 44100  # sample rate


def ripple_sound(dur, n, omega, w, delta, phi, f0, fm1, l=70):
    """Synthesizes a ripple sound.

    Args:
        dur (float): Duration of sound in s.
        n (int): Number of sinusoids.
        omega (:obj:`float` or `array`-like): Ripple density in cycles per octave.
            Must be a single value or an array with length `dur * sr`.
        w (:obj:`float` or `array`-like): Ripple velocity (drift) in Hz. Must be a
            single value or an array with length `dur * sr`.
        delta (:obj:`float` or `array`-like): Normalized ripple depth. Must be a single
            value or an array with length `dur * sr`. Value(s) must be in
            the range [0, 1].
        phi (float): Ripple starting phase in radians.
        f0 (float): Frequency of the lowest sinusoid in Hz.
        fm1 (float): Frequency of the highest sinusoid in Hz.
        l (:obj:`float`, optional): Level in dB of the sound, assuming a pure tone with
            peak amplitude `a0` is 0 dB SPL. TODO: Implement this correctly!

    Returns:
        y (np.array): The waveform.
        a (np.array): The envelope (useful for plotting).

    """
    # create sinusoids
    m = int(dur * sr)  # total number of samples
    shapea = (1, m)
    shapeb = (n, 1)
    t = (np.arange(m) / sr).reshape(shapea)  # sample times, spaced exactly 1 / sr apart
    i = np.arange(n).reshape(shapeb)
    f = f0 * (fm1 / f0) ** (i / (n - 1))
    sphi = 2 * np.pi * np.random.random(shapeb)
    s = np.sin(2 * np.pi * f * t + sphi)

    # create envelope
    x = np.log2(f / f0)
    if hasattr(w, "__iter__"):
        wprime = np.cumsum(w) / sr
    else:
        wprime = w * t
    a = 1 + delta * np.sin(2 * np.pi * (wprime + omega * x) + phi)

    # create the waveform
    y = (a * s / np.sqrt(f)).sum(axis=0)

    # scale to a given SPL
    # TODO: This is likely wrong; I haven't checked it
    y /= np.abs(y).max()
    y *= a0 * 10 ** (l / 20)

    return y, a


def smooth_walk(points, dur):
    """Return a smooth walk.

    Args:
        points (:obj:`array`-like): Points to visit. These are spaced evenly and a
            spline is used to interpolate them.
        dur (float): Duration of sound in s.

    Returns:
        y (numpy.array): Values of the smooth walk.

    """
    f = interp1d(np.linspace(0, dur, len(points)), points, "cubic")
    return f(np.linspace(0, dur, int(dur * sr)))


def plot_env(a, ax, labels=False):
    """Plots an envelope onto an axis.

    Args:
        a (np.array): An array with shape (n, m), where n is the number of sinusoids
            and m is the total number of samples, with values representing
            instantaneous amplitudes.
        ax (matplotlib.axes._subplots.AxesSubplot): Axis.
        labels (:obj:`bool`, optional): Include labels or not.

    """
    ax.pcolormesh(a, rasterized=True, vmin=0, vmax=2)
    ax.set_xticks([])
    ax.set_yticks([])
    if labels is True:
        ax.set_xlabel("Time")
        ax.set_ylabel("Frequency\n($\\log_2$ scale)")


def main():
    """Create and plot some ripple sounds.

    """
    from matplotlib import rcParams

    figsize = rcParams["figure.figsize"]
    rcParams["figure.figsize"] = [figsize[0], int(figsize[1] / 2)]
    rcParams["font.size"] = 14

    # default parameter values
    np.random.seed(0)
    dur = 1
    n = 1000  # reduce this if you want to make figures; otherwise it takes forever!
    omega = 1
    w = 8
    delta = 0.9
    f0 = 250
    fm1 = 8000
    phi = 0.0
    args = (phi, f0, fm1)

    # filenames of figures
    fn = "../../assets/images/%s-ripples.svg"

    # static ripple sounds
    print("making static ripple sounds")
    _, axes = plt.subplots(1, 3, constrained_layout=True, sharex="all", sharey="all")
    for i, ax in enumerate(axes):
        _omega = 1.5 if i == 1 else omega
        _w = 0
        _delta = 0.5 if i == 2 else delta
        print(f"sound with omega={_omega:.2f}, w={_w:.2f}, and delta={_delta:.2f}")
        y, a = ripple_sound(dur, n, _omega, _w, _delta, *args)
        print("playing sound")
        sd.play(y, sr, blocking=True)
        print("plotting")
        plot_env(a, ax, ax == axes[0])
    print("saving a figure")
    plt.savefig(fn % "static", bbox_inches=0, transparent=True)

    # moving ripple sounds
    print("making moving ripple sounds")
    _, axes = plt.subplots(1, 3, constrained_layout=True, sharex="all", sharey="all")
    _ws = [4, 8, -4]
    for i, ax in enumerate(axes):
        _omega = omega
        _w = _ws[i]
        _delta = delta
        print(f"sound with omega={_omega:.2f}, w={_w:.2f}, and delta={_delta:.2f}")
        y, a = ripple_sound(dur, n, _omega, _w, _delta, *args)
        print("playing sound")
        sd.play(y, sr, blocking=True)
        print("plotting")
        plot_env(a, ax, ax == axes[0])
    print("making a figure")
    plt.savefig(fn % "moving", bbox_inches=0, transparent=True)

    # dynamic moving ripple sounds
    print("making dynamic static ripple sounds")
    _, axes = plt.subplots(1, 3, constrained_layout=True, sharex="all", sharey="all")
    for i, ax in enumerate(axes):
        _delta = smooth_walk(np.random.random(10), dur) if i == 0 else delta
        _omega = smooth_walk([1] * 5 + [1.5] * 5, dur) if i == 1 else omega
        _w = smooth_walk([-8, 0, 4, 8], dur) if i == 2 else w
        print(f"{[_delta, _omega, _w][i].shape}")
        y, a = ripple_sound(dur, n, _omega, _w, _delta, *args)
        print("playing sound")
        sd.play(y, sr, blocking=True)
        print("plotting")
        plot_env(a, ax, ax == axes[0])
    print("making a figure")
    plt.savefig(fn % "dynamic", bbox_inches=0, transparent=True)


if __name__ == "__main__":
    main()

References

Escabı́, M. A., & Schreiner, C. E. (2002). Nonlinear Spectrotemporal Sound Analysis by Neurons in the Auditory Midbrain. The Journal of Neuroscience, 22(10), 4114–4131. 10.1523/JNEUROSCI.22-10-04114.2002

Glasberg, B. R., & Moore, B. C. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1-2), 103–138. 10.1016/0378-5955(90)90170-T

Kowalski, N., Depireux, D. A., & Shamma, S. A. (1996). Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. Journal of Neurophysiology, 76(5), 3503–3523. 10.1152/jn.1996.76.5.3503

Shamma, S. (2001). On the role of space and time in auditory processing. Trends in Cognitive Sciences, 5(8), 340–348. 10.1016/S1364-6613(00)01704-6

Shapley, R., & Lennie, P. (1985). Spatial frequency analysis in the visual system. Annual Review of Neuroscience, 8(1), 547–581. 10.1146/annurev.ne.08.030185.002555

Visscher, K. M., Kaplan, E., Kahana, M. J., & Sekuler, R. (2007). Auditory short-term memory behaves like visual short-term memory. PLoS Biology, 5(3), e56. 10.1371/journal.pbio.0050056
