Build Your Own Google Neural Synthesizer

A “piano-flute” is just one of the crazy instruments you can build with the NSynth open-source deep-learning project

My teenage son has become interested in making music. In my generation, that would have meant picking up an electric guitar and forming a garage band. Instead, he’s installed a digital-audio workstation on his laptop, studied up on music theory, and started composing “EDM,” or electronic dance music. Frankly, I don’t understand what he’s doing.

I would much prefer that he spend some of his free hours honing his programming skills, and I keep suggesting that he explore one of the machine-learning frameworks now available. Although he’s expressed interest and has started to explore Torch, he’s not found anything that would make him really dive in.

So, while I’m not musical myself, my eyes lit up when I stumbled on Google’s new neural music-synthesis project NSynth (Neural Synthesizer). This, I thought, might be just the ticket to get my music-giddy son hooked on the amazing things possible with machine learning.

NSynth uses a deep neural network to distill musical notes from various instruments down to their essentials. Google’s developers first created a digital archive of some 300,000 notes, including up to 88 examples from about 1,000 different instruments, all sampled at 16 kilohertz. They then fed those data into a deep-learning model that can represent all those wildly different sounds far more compactly using what they call “embeddings.” That exercise reportedly took about 10 days running on 32 K40 graphics processing units.

Why do that? Well, with those results, you can now answer a question like “What do you get when you cross a piano with a flute?” (Musicians: Insert joke here.)

It would, of course, be easy enough to add together the very distinct sounds of the two instruments, each playing, say, middle C. But that would just sound like the two instruments playing the same note at once. NSynth instead allows you to combine the two sets of embeddings and create a virtual piano-flute, the sound of which can be synthesized using NSynth’s neural decoder.
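To make that distinction concrete, here is a minimal Python sketch. It is not NSynth code: the toy `tone` function, the four-number “embeddings,” and the helper names are all made up for illustration, and a plain linear blend is assumed. Real NSynth embeddings come from its trained encoder and are far larger. The sketch only shows where the combining happens: on the audio samples in one case, on the compact representation in the other.

```python
import math

SAMPLE_RATE = 16_000  # Hz, the sampling rate cited for the NSynth archive
MIDDLE_C = 261.63     # Hz


def tone(freq_hz, harmonics, n_samples=160):
    """Toy 'instrument': a sum of harmonics with the given amplitudes."""
    return [
        sum(
            amp * math.sin(2 * math.pi * freq_hz * (k + 1) * t / SAMPLE_RATE)
            for k, amp in enumerate(harmonics)
        )
        for t in range(n_samples)
    ]


def mix_audio(a, b):
    """Plain mixing: average two waveforms sample by sample (a duet)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def blend_embeddings(za, zb, weight=0.5):
    """Linear interpolation between two embedding vectors (assumed blend)."""
    return [(1 - weight) * x + weight * y for x, y in zip(za, zb)]


# Mixing in the audio domain: both instruments are still audible.
piano_like = tone(MIDDLE_C, [1.0, 0.5, 0.25])
flute_like = tone(MIDDLE_C, [1.0, 0.1])
duet = mix_audio(piano_like, flute_like)

# Blending in embedding space: one new vector that a decoder would
# turn into a single hybrid sound. These numbers are placeholders,
# not real NSynth embeddings.
piano_z = [0.8, -0.2, 0.1, 0.5]
flute_z = [0.2, 0.6, -0.3, 0.1]
piano_flute_z = blend_embeddings(piano_z, flute_z)
```

The `duet` list is still just two sounds laid on top of each other. The `piano_flute_z` vector describes a single sound that neither instrument makes, which only becomes audible after a decoder such as NSynth’s turns it back into samples.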

What’s more, the Google team designed a piece of open-source hardware called NSynth Super, which allows you to combine as many as four instruments at once. I figured that building the synthesizer and experimenting with it would be a perfect father-son project.
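One plausible way to blend four instruments from a single control is bilinear weighting: put one instrument at each corner of a square and weight each by how close a chosen point is to that corner. The Python sketch below assumes that scheme. The function names and the tiny three-number embeddings are hypothetical, and the NSynth Super’s own software may well combine its sources differently.

```python
def corner_weights(x, y):
    """Bilinear weights for four corners, for a point (x, y) in the unit square.

    Order: (0, 0), (1, 0), (0, 1), (1, 1). The weights always sum to 1.
    """
    return [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]


def blend_four(embeddings, x, y):
    """Weighted sum of four equal-length embedding vectors."""
    weights = corner_weights(x, y)
    dim = len(embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]


# Placeholder embeddings for four instruments (not real NSynth data).
corners = [
    [1.0, 0.0, 0.0],  # instrument A at (0, 0)
    [0.0, 1.0, 0.0],  # instrument B at (1, 0)
    [0.0, 0.0, 1.0],  # instrument C at (0, 1)
    [1.0, 1.0, 1.0],  # instrument D at (1, 1)
]

center_blend = blend_four(corners, 0.5, 0.5)  # equal parts of all four
only_a = blend_four(corners, 0.0, 0.0)        # exactly instrument A
```

At the center of the square all four instruments contribute equally, and at a corner the blend collapses to that corner’s instrument alone.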

Alas, my son isn’t particularly adept with hardware, so construction fell mostly on me. Google posted a good set of instructions, so putting it together was fairly straightforward.

I ordered a premade printed circuit board (PCB) for the project, which cost US $20 on Tindie, making it considerably less expensive than it would have been had I tried to have this rather large board fabricated myself. The same vendor sells a $60 version fully populated with its many surface-mount components, but it was out of stock.

So I ordered the bare board and components separately. On Hackaday, I found a complete bill of materials with links to suggested suppliers, which was handy. Still, a few parts were hard to procure. In particular, the rotary encoders used to assign instruments were unavailable, but I couldn’t see any harm in replacing the 12-detent-per-revolution versions the design specified with 18-detent versions of the same part.