What birdsong and backends can teach us about magic

Have you ever had a magical experience with software? I have.

The magic of Merlin

Merlin is an app for bird identification, from the Cornell Lab of Ornithology. I first picked it up because the default action at the time was five simple questions. It turns out that these five questions (location, time of year, size, color, and what the bird was doing) are enough to narrow down candidate birds to just a few options. Matching the right bird from the shortlist was easy. And I didn’t need to fumble to snap a blurry photo or struggle through unreliable AI-powered shenanigans.

A search results page, with the top result Steller's Jay. It is a bird with a large black crest on its head, and a bright blue body.

The Steller’s Jay is one of the flashiest birds in the Bay Area, and it’s the one that got me into identifying birds.

From their blog post:

Merlin is not the first to use deep convolutional neural networks to identify birds by their sounds.

[...] Previous bird sound ID models have typically been trained using data with a coarser level of temporal resolution. For instance, a model might hear a 30 second recording of a White-breasted Nuthatch, but not be told when the nuthatch is singing in the recording. This can lead to problems: if other species are singing in the same recording, the model will erroneously call all species in the recording a White-breasted Nuthatch, leading to false predictions. 

Merlin’s Sound ID tool is trained using audio data which includes the precise moments in time when each bird is vocalizing. The process of generating this data is labor intensive, because it requires sound ID experts to listen to each audio file carefully. As a result of these efforts, the model has the opportunity to learn a more accurate representation of which sounds correspond to which species (and which sounds are ambient noises).

A screenshot of the MerlinVision app. It shows a spectrogram of audio data, with different colored boxes drawn around sections to identify birds like the Marsh Wren, Black Tern, and more.

We built a custom annotation tool that allows sound ID experts to listen to Macaulay Library recordings and annotate the precise moments when different bird species are vocalizing.

Benjamin Hoffman and Grant Van Horn for the Macaulay Library: Behind the Scenes of Sound ID in Merlin

My emphasis added in bold above - the magic wasn’t created by hardware or ML architecture improvements. It was created by expert birders spending hours listening and drawing boxes on top of spectrograms. 

What an unreasonable amount of work! And what a beautiful outcome! 
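To make the difference concrete, here's a toy sketch in Python. It's my own illustration of the idea in the quote, not Merlin's actual pipeline: the `Annotation` class, the function names, and the one-second frame size are all assumptions I made up for the example. The first function labels a whole clip at once, the way the coarser datasets do; the second only labels the moments an expert drew a box around.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One expert-drawn box: a species and when it vocalizes (hypothetical shape)."""
    species: str
    start_s: float
    end_s: float


def weak_frame_labels(clip_species, clip_len_s, frame_s=1.0):
    """Clip-level labels: every frame inherits every species named for the clip."""
    n_frames = int(clip_len_s / frame_s)
    return [set(clip_species) for _ in range(n_frames)]


def strong_frame_labels(annotations, clip_len_s, frame_s=1.0):
    """Time-stamped labels: a frame gets a species only if a box overlaps it."""
    n_frames = int(clip_len_s / frame_s)
    labels = [set() for _ in range(n_frames)]
    for i in range(n_frames):
        frame_start, frame_end = i * frame_s, (i + 1) * frame_s
        for a in annotations:
            # Standard interval-overlap check between the frame and the box.
            if a.start_s < frame_end and a.end_s > frame_start:
                labels[i].add(a.species)
    return labels


# A 30-second clip filed under "White-breasted Nuthatch",
# with a Marsh Wren also singing later in the recording.
boxes = [
    Annotation("White-breasted Nuthatch", 2.0, 5.0),
    Annotation("Marsh Wren", 20.0, 24.0),
]
weak = weak_frame_labels(["White-breasted Nuthatch"], 30.0)
strong = strong_frame_labels(boxes, 30.0)
```

With the clip-level labels, second 21 (where the wren is singing) is still tagged as a nuthatch. With the boxes, that same second is tagged as a Marsh Wren, and the quiet stretch around second 10 carries no species at all, so it can be treated as ambient noise.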

Teller? I hardly know her!

It reminds me of a story I saw about Penn and Teller, the famous magician duo. Allen Pike tells the story better than I could:

Years ago, Teller performed a magic trick.

First, he’d have you pick a card. He would attempt to produce the card, but fail, indicating the card may have travelled elsewhere. He’d then lead you on a short walk to a nearby park, and then be inspired to dig a hole. Buried there, beneath undisturbed grass, was a box. When opened, the box would, somehow, contain the card you’d chosen. An impossible trick.

To create this magical moment, he had to do something you wouldn’t expect: he’d gone out into the park and buried a number of boxes, corresponding to potential cards one might choose. Then, he waited months – until the grass had grown over. Only then could he perform the trick.

Deducing what card you’ve picked is a well-known sleight. But performing a trick where your card is seamlessly buried requires so much advance preparation that it seems impossible.

Allen Pike: An Unreasonable Amount of Time

The beauty is that anyone could have done this. No individual step is insane - a bit of memorization, a bit of digging and burying. But we’ve all got other responsibilities, priorities, and other what-have-yous. No reasonable person would plan so many months ahead with this tedium. But regardless, one person did.

(Side note: his full name is just Teller.)

Teller describes the underlying principle like so:

“Sometimes magic is just someone spending more time on something than anyone else might reasonably expect.”

Allen Pike: An Unreasonable Amount of Time

And if you look at it from the other direction, that means that you - yes, you personally 🫵 - have the opportunity to produce magical experiences without any “secret sauce” beyond your willingness to put in the work. But it might not come easily.

Progress

Everyone who writes code goes through this emotional journey. It’s an uphill battle figuring out the basics. Finally, you get the hang of it. You’re capable of doing anything you want, and that feeling is the highest of highs.

Then you hit the lows: when you realize all the interesting parts are farmed out to tech companies doing the real heavy lifting. You started to build your perfect life management app, but your personal contribution is 100% glue code, between Google and Plaid and OpenAI and Twilio and Home Assistant and a dozen other services. When you want to do something and get stuck because there’s no off-the-shelf API to deal with it, that’s the worst feeling of all: realizing that you were never that powerful to begin with. 

Everyone who writes code goes through this. Everyone who creates anything goes through this. Having learned to code before LLMs, I can only imagine how hard it is now - easier to get a taste of the good life, harder still to learn the skills needed to make it great. It’s disillusioning to realize you’ve come so far from the start, but you’re still so far from making an impact. Even those cool algorithms fade away as you write your hundredth boring business logic if-statement. 

Is this all there is?

It’s easy to get jaded. But as you keep going, you find that you can make a difference. You pick up domain experience and life experience, novel insights, and the ability to contribute. Maybe you find yourself spending a nonsensical amount of time chipping away at some problem (and writing a lot of if-statements to do so); after all, as George Bernard Shaw put it, all progress depends on the unreasonable man.

Back to the workshop

The funny thing about software is that the magic does become invisible. Teller can only dig up so many boxes in a week, but a $5 cloud server can easily handle millions of requests. Just about every application is built on top of other people’s abstractions, wrapped up so neatly that you never have to think about the insides.

The founder of Twilio talked about the before-times so much that they became familiar, like a bedtime story: it used to be that if you wanted to connect to a telecom company, they’d quote you five years and millions of dollars to get hooked up. Then he’d live-code the Twilio way in less than five minutes and one dollar. Like magic: you never saw the true amount of investment and preparation and effort and time behind it. 

At my current job at Stytch (note: my writing is always my own here, never my employer’s), I often show a demo where we detect a malicious bot and block it. But it’s rare that we peel back the curtain to show all the infrastructure we built to understand what real users and browsers look like, and what tools bad actors are using to try to avoid detection, and the subtle warning flags that can be picked up if you know what to look for. 

It’s unbelievable how much software is like that: built on hours, weeks, years of running every version of countless browsers, peeking into private forums to learn about the latest anti-detect tools, and god-knows-whatever’s-needed to send a text message. It’s rare to make progress through a genuinely new technical advancement. But every day, someone is putting in unreasonable effort and making things better. Someone listened to all those recordings of birdsong so that I can identify the white-crowned sparrows whistling up in the trees.

Back in high school band camp (really), someone gifted me a quote that has stuck as a starred thought in my mind ever since:

There are three phases in life:

  1. First, you believe in Santa.

  2. Then, you don’t believe in Santa.

  3. Finally, you become Santa.

Are you shaking your head at the naivete? Or are you ready to deliver some presents?
