Creating Music—Part 3


Getting a Deeper Understanding

Now that I’ve had my fun playing around with high-level audio environments (namely Sonic Pi and Schismtracker), I want to actually learn how those sounds are made. This brings us to the topic of audio synthesis. There is an incredible series of articles by the seemingly superhuman Gordon Reid titled “Synth Secrets,” published in Sound on Sound between 1999 and 2004. Though it focuses on analog synths, it helped me lay the foundational knowledge for my capstone project: music through code.

What is Sound?

Note: I have never taken a formal music theory class, so take this section with a grain of salt.

Any sound can be described by three properties: amplitude (how loud it is, usually expressed in decibels, dB), frequency (the pitch, measured in hertz, Hz) and timbre. The last one is the most interesting. Sounds that are “pure,” never changing over time, are realllly booooring. A plain sine wave (∿) is an example of a pure sound, and it is not at all interesting to listen to.

Timbre is another way to say the “feel” of a sound. Sounds are composed of a main frequency, the fundamental (usually the lowest in pitch), and multiple other frequencies at higher pitches (the overtones). These overtones create the timbre. If the overtones are whole-number multiples of the fundamental (2×, 3×, etc.), then it sounds “good.” These whole-number multiples are called harmonics.

Sounds have two different, yet interchangeable, representations.[1] The first is a waveform, or function. This can be any continuous curve where the \(x\) value is time and the \(y\) value is the voltage going to the speaker.[2] This function can be as complicated as needed to produce the desired sound.

The second representation is a series of plain sine waves. They are usually visualized as a bar graph with \(x\) being frequency and \(y\) being amplitude. Converting a complex waveform into this collection of amplitudes and frequencies is done with the Fourier transform, named after the French mathematician Joseph Fourier, where \(f(x)\) is the waveform over time \(x\) and \(\hat{f}(\xi)\) tells us how much of frequency \(\xi\) it contains: \[\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\ e^{-2\pi i x \xi}\,dx\]

The main takeaway is that a waveform and its series of sine waves are equivalent ways to describe the exact same sound.

From Math → 0b01010101

This math stuff is nice to look at and all, but how can I make a sick beat on my computer? For this, I turned to OpenBSD’s sndio library and server to create the sounds. The handy manual pages were incredibly useful, as was Boulanger and Lazzarini’s The Audio Programming Book (MIT Press, 2011).

After creating a handle (an opaque pointer type) with sio_open(), parameters for the sound can be set using sio_initpar() and sio_setpar(). For my test project, I set the parameters to 16 bits per sample, two channels (stereo audio) and a sample rate of 44100 Hz. Woah woah woah! Slow down! What the heck does any of that mean?!? First off, this journal entry is not a C tutorial, but it will go in depth on digital audio.

So let’s do that, shall we? First up are samples. A sample is a really small sliver of digital audio: a single number giving the height of the waveform at one instant. Samples are needed in the first place (instead of just functions) because computers cannot deal with continuous analog values or fancy math things like infinite integrals directly. So instead we approximate, measuring the waveform at evenly spaced moments in time to create discrete samples of audio. Also, we may not know what sound we want to make in the future (say, if someone presses a key on a keyboard), so samples let us create audio on demand. The actual format of samples is discussed later on.

When I set the sample rate to 44100 Hz, that means that many samples are played each second. Well, actually, twice that many are played, because I set up 2 channels of audio and each channel gets 44100 samples per second. Digital audio samples are interleaved, meaning if there are two channels L and R, the samples are arranged like LRLRLRLRLR…. One group holding a single sample for every channel (in our case just LR) is called a frame, so the sample rate is really the number of frames per second.
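Interleaving and frame arithmetic are easy to get wrong, so here is a small sketch of both. The helpers interleave_stereo and buffer_bytes are hypothetical names, and the sketch assumes 16-bit samples stored in C’s short:

```c
#include <stddef.h>

/*
 * Interleave two mono channels into one stereo buffer: L R L R ...
 * out must have room for 2 * nframes samples, since one frame is one
 * left sample plus one right sample.
 */
void
interleave_stereo(const short *left, const short *right, short *out,
    size_t nframes)
{
	size_t i;

	for (i = 0; i < nframes; i++) {
		out[2 * i] = left[i];		/* left channel */
		out[2 * i + 1] = right[i];	/* right channel */
	}
}

/* Bytes needed for `seconds` of audio: frames * channels * bytes per sample. */
size_t
buffer_bytes(double seconds, unsigned int rate, unsigned int channels,
    unsigned int bytes_per_sample)
{
	size_t nframes = (size_t)(seconds * rate);

	return nframes * channels * bytes_per_sample;
}
```

For the settings above, one second of audio is 44100 frames × 2 channels × 2 bytes = 176,400 bytes.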

Let’s Create a Sine Wave

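#include <err.h>
#include <sndio.h>

static const unsigned int SAMPLE_RATE = 44100;

int
main(int argc, char *argv[])
{
	struct sio_hdl *hdl;
	struct sio_par par;

	/* Open the default playback device (or the sndiod server). */
	if ((hdl = sio_open(SIO_DEVANY, SIO_PLAY, 0)) == NULL)
		errx(1, "cannot open audio device");

	sio_initpar(&par);
	par.bits = 16;			/* 16 bits per sample (C's short) */
	par.sig = 1;			/* signed samples */
	par.le = SIO_LE_NATIVE;		/* samples in this machine's byte order */
	par.pchan = 2;			/* two playback channels (stereo) */
	par.rate = SAMPLE_RATE;

	/* sio_setpar() returns 0 on failure; sio_getpar() reports what we actually got. */
	if (!sio_setpar(hdl, &par) || !sio_getpar(hdl, &par))
		errx(1, "cannot set audio parameters");
	if (par.bits != 16 || !par.sig || par.pchan != 2 || par.rate != SAMPLE_RATE)
		errx(1, "unsupported audio format");

	if (!sio_start(hdl))
		errx(1, "cannot start playback");

	/* Play your samples here */

	sio_close(hdl);
	return 0;
}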

This is just the initialization routine: we get our handle hdl, set up our parameters (16 bits, 2 channels, yes to signed samples and the 44100 Hz sample rate), tell sndio that we are going to start playing with sio_start(), generate and play samples (see next) and finally close our handle.

Generating samples is more interesting:

#include <err.h>
#include <math.h>
#include <stdlib.h>
#include <sndio.h>

void
play_sine(struct sio_hdl *hdl, double seconds)
{
	double samp;
	double freq = 440;			/* pitch in Hz */
	double tau = 2.0 * M_PI;
	size_t nframes = seconds * SAMPLE_RATE;	/* one frame = one L + one R sample */
	size_t nsamples = nframes * 2;
	short *samples;
	size_t i;

	/* A few seconds of stereo audio can be too big for the stack, so use the heap. */
	if ((samples = calloc(nsamples, sizeof(*samples))) == NULL)
		err(1, NULL);

	for (i = 0; i < nframes; i++) {
		/* i counts frames, so i / SAMPLE_RATE is the time in seconds */
		samp = sin(tau * freq * i / SAMPLE_RATE);
		samples[2 * i] = samp * 32767.0;	/* left */
		samples[2 * i + 1] = samp * 32767.0;	/* right */
	}

	sio_write(hdl, samples, nsamples * sizeof(*samples));
	free(samples);
}

Sixteen-bit samples seemed to be the most common choice during my research, and they map neatly onto C’s short type. Sndio gives us programmers a really nice API for creating sound: sio_write(). After we allocate our buffer (called samples above) and fill it with samples, we just pass a pointer to it (along with its size in bytes) and, once enough audio is in sndio’s internal buffer, we get sound!

The loop counts frames, not individual samples: i is the frame number, and each frame gets one sample for the left channel, samples[2 * i], and one for the right, samples[2 * i + 1]. This matters. If i counted individual samples instead, time would advance twice per frame and the tone would come out an octave too high (880 Hz).

The goodies are in the first line of the loop, so let’s break it down: samp = sin(tau * freq * i / SAMPLE_RATE);. We can create a sine wave with math.h’s sin function. Its argument is expected in radians, not degrees (throwback to trigonometry class!). Dividing the frame number by the sample rate, i / SAMPLE_RATE,[3] gives the time in seconds. Multiplying that by the frequency freq in hertz (in this case 440 Hz) gives how many full turns, or cycles, the wave has completed by then, and multiplying by \(2\pi\), also known as \(\tau\) (tau), converts turns into radians.

Okay, great, but there is a problem. samples is an array of short, but our samp is a double! What transform do we need to do? It’s actually pretty simple: sin() returns values between -1 and 1, so just multiply by the largest value a short can hold, \(32767\) (one less than \(2^{15}\)). Then all we have to do is assign the result to both the left and the right channels of the frame! Once we have our buffer, we write it to the soundcard or audio server[4] and hear a sweet, sweet sine wave.
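Multiplying by 32767 works for a single sine wave because its values never leave [-1, 1]. As soon as waves are added together, though, the sum can overshoot, and converting an out-of-range double to short is undefined behavior in C. A defensive sketch, assuming we want hard clipping rather than garbage; the name to_s16 is hypothetical:

```c
/*
 * Convert a sample in [-1.0, 1.0] to a signed 16-bit value.
 * Out-of-range input (easy to produce when adding waves together)
 * is clamped first, so the conversion to short is always in range
 * and loud peaks clip cleanly. The final cast truncates toward zero.
 */
short
to_s16(double x)
{
	if (x > 1.0)
		x = 1.0;
	else if (x < -1.0)
		x = -1.0;
	return (short)(x * 32767.0);
}
```

Clipping still sounds harsh, so the nicer fix is to scale the mix down before converting; the clamp is just a safety net.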

Wrapping Up

In this journal entry we learned a little about sound and how it can be represented mathematically. After that came an introduction to digital audio, including samples, sample rates and multi-channel audio. Finally, a little bit of C, along with OpenBSD’s sndio, was used to play a simple sine wave created entirely in code! Next entry I want to focus on creating more complex sounds. This is where the Fourier transform will come into play, letting us surgically filter sounds and build something more interesting. Or maybe I’ll end up randomly trying different things out and choosing whatever sounds coolest. See ya later!


Footnotes