Just Hum a Few Bars

Aug 1, 2003 12:00 PM, By Oliver Masciarotte


By now, you've probably come across some appliance or service that recognizes human speech: your cell phone, perhaps, or the customer service call-in line at your credit card company. What you may not have realized is that a related technology is at work, instigated by “The Man,” and put in place to listen in on radio and TV transmissions solely to recognize songs and performances. Why would anyone set up these little music spies? What's going on with this technology?

There are several different machines that recognize audio, whether it is speech or music. By and large, they all share one thing in common: These machines “listen to” and process a sample of any material that they later recognize or match. This is an application of heuristics, learning from practical experience. Several specialized audio recognizers of human speech are available from IBM, MacSpeech and ScanSoft, and I can tell you from seemingly endless hours of “practical experience” that machine recognition of continuous or natural speech is one of the toughest problems in computing.

In contrast, music recognition is a good bit easier, as any particular performance, once it's recorded, is “etched in stone,” so to speak. The spectral makeup, timing and amplitude variations are fixed; only global gain changes, noise and distortion are added when the performance is reproduced. That fact has spurred several vendors to sell recognition tools and services. One of these is Comparisonics Corporation, maker of the findsounds.com service. Findsounds.com lets you type in descriptors, and its engine returns the URLs of sites that host sounds matching your needs. This can be useful for multimedia producers and musicians who are hunting for that perfect effect or sample. Another heuristic audio search product is SoundFisher, a cross-platform database-management system featuring content-based recognition, matching and retrieval.
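To see why a fixed recording is an easy target, here is a toy sketch in Python. It is not how any of the vendors named here actually do it; it simply assumes a "fingerprint" made of the strongest frequency bin in each short frame. The frame size, the function names and the four-tone test "melody" are all invented for illustration. Because a global gain change scales every bin equally, and low-level noise can't overtake the strongest bin, the fingerprint of the degraded playback matches the reference.

```python
import cmath
import math
import random

FRAME = 64  # samples per analysis frame (hypothetical choice for this sketch)


def dominant_bins(samples, frame=FRAME):
    """Toy fingerprint: index of the strongest DFT bin in each frame."""
    prints = []
    # Non-overlapping frames; any leftover tail shorter than a frame is ignored.
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        # Naive DFT magnitudes for bins 1 .. frame/2 - 1 (skips DC and Nyquist).
        mags = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                    for n in range(frame)))
            for k in range(1, frame // 2)
        ]
        # +1 because the list starts at bin 1, not bin 0.
        prints.append(1 + max(range(len(mags)), key=mags.__getitem__))
    return prints


def melody(bins, frame=FRAME):
    """Build a test signal: one frame-long sine per requested bin."""
    out = []
    for b in bins:
        out.extend(math.sin(2 * math.pi * b * n / frame) for n in range(frame))
    return out


# The "etched in stone" reference recording.
reference = melody([5, 9, 12, 7])

# The same recording played back quieter, with a little added noise.
rng = random.Random(0)
playback = [0.25 * s + rng.uniform(-0.01, 0.01) for s in reference]

ref_print = dominant_bins(reference)
play_print = dominant_bins(playback)
print(ref_print, play_print, ref_print == play_print)
```

Real systems use far richer features than one bin per frame, but the principle carries over: pick features that survive gain changes, noise and mild distortion, then look them up in a database.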

A more interesting and difficult application of music-recognition technology deals with digital-rights management and performance metrics. This is where those machine spies come in. Two companies, Audible Magic and Relatable, are using their audio feature-identification smarts to monitor network traffic (especially P2P activity), recordings on optical and magnetic media and radio broadcasts. Audible Magic, in particular, has acquired quite a few companies, including SoundFisher's developer, in an effort to be the one-stop shop to control content in modern media's chaotic world.

Both Relatable and Audible Magic have products that “sniff” IP packets and “listen” to the audio being carried within file transfers. They've tried to go beyond mere identification to actually block illegal files, but so far, it hasn't worked as planned. The computational and network resources to recognize, validate and block illegal music-carrying packets in real time are still some ways away.

A third company, the solution provider formerly known as Cantametrix, is now part of Gracenote, those CDDB guys. For those of you who don't get out much, CDDB is the largest commercial database of CD metadata, which many MP3 player applications rely on to provide disc and song titles. According to Gracenote, its “information services are used by leading media players including AOL's Winamp, Apple's iTunes and RealNetworks' RealOne Player.” Leading CE manufacturers, including Pioneer, Philips and Sony, incorporate Gracenote's CDDB technology into their latest generation of home, mobile and portable music products.

In addition to the commercial products I've already mentioned, there are several Open Source or freely downloadable software whatsitz that also do the heuristics dance. One is MusicBrainz's Tagger, a Win application that “allows you to automatically look up the tracks in your music collection and then write clean metadata tags [ID3 tags or Vorbis comment fields] to your files. As you tag the files in your collection that MusicBrainz didn't recognize, you submit the acoustic fingerprints [TRM IDs] of your files back to the server. Submitting acoustic fingerprints will allow MusicBrainz to automatically identify these tracks in the future so that other people using the Tagger can benefit.” TRM IDs are profiles typically generated by Relatable's TRM audio fingerprinting technology. A version of TRM's audio feature extraction client was used by the MusicBrainz project.

Another no-cost machine is SWMUMDIS, a “universal tool to develop and explore audio representations that process the ridges” of a preprocessed spectrogram. SWMUMDIS is a demonstration of research principles and not a product, even by Open Source standards, but it does serve as a point of reference for further development by pointy-headed programmers.

Other music-recognition uses include automatic quality assessment and visualization of parameters such as spectral content, which makes rapid identification of sections easier for editing. Another utility application is quality control. The International Telecommunication Union (ITU) created the PEAQ (Perceptual Evaluation of Audio Quality) standard for objective machine evaluation of perceptually coded audio, of which the MP3 codec is a widespread example. Basically, PEAQ software “listens” to incoming audio, makes an evaluation based on a model of human hearing and that subjective factor we refer to as “quality,” and then rates the audio in real time. This is invaluable for broadcasters, replicators and anyone who needs a way to monitor their “product” while never tiring or growing bored with the program material. PEAQ's quality assessment is based on a group of trained human listeners whose talents were baked into software. PEAQ-based products are available as software-only and hardware implementations.

These days, the audio data-sniffing field is crowded enough that participants are vying for mind share by claiming the fastest recognition time — “I can name that tune in a dozen notes!” “Hah! I laugh at your algorithms! I can name that tune in half as many!” — and so it goes until, at some point, the programs will be able to name that tune with just one note, and then we can all retire and let the computers do our work. The world of machine intelligence and audio recognition may someday provide a truly useful product to, say, automatically assemble a soundtrack for your life. But until then, audio recognition remains a useful tool primarily for bean counters and intellectual-property cops. Just remember that, even in space, something can hear your Stratocaster scream.

OMas' computer auto-assembled this column while he was preparing a delicately toasted cheese sandwich. All that time, he and his PowerBook were under the influence of Morcheeba's latest, Charango, and the wide-ranging styles of new Brit-pop kids, Delays.

Pedant in a Box: Spectrogram

A spectrogram is a visualization technique for acoustic events or audio material. Spectrograms plot frequency against time, with amplitude shown as brightness or color, and can be rendered in real time or offline. Nowadays, most spectrograms map amplitude to a predefined color table to visually clarify the plot. Forensic investigators, audio restorers and speech pathologists routinely employ spectrograms in their work.

The following two spectrograms are from SoundHack and Frequency, the poor man's Retouch. The color plot from SoundHack shows a stereo folk-rock .AIFF file. Notice the tempo appears as almost a grid of vertical beats, while the monochrome Frequency screenshot displays my voice. The selected utterance is the word “SCSI.” For both, the X axis is time from left to right, while the Y axis is frequency.
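For the curious, the data behind such a plot can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and a slow, naive DFT. It is not how SoundHack or Frequency work internally, and the frame size, hop size and 8 kHz sample rate are assumptions picked to keep the numbers round. Each row of the result is one slice of time; each column is one frequency bin; the values are the amplitudes that a plotting program would map to color.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of time frames, each a list of bin magnitudes (0 .. Nyquist)."""
    # Hann window to soften the edges of each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of one bin; fine for a demo, too slow for real audio.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Demo signal: a 1 kHz sine at an assumed 8 kHz sample rate.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(tone)
bin_width_hz = SAMPLE_RATE / 64  # 125 Hz per bin with 64-sample frames
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), "frames;", "peak bins:", peak_bins)
```

With these settings, every frame peaks at bin 8, and 8 x 125 Hz is the 1 kHz tone. Plotted with time on the X axis and bin number on the Y axis, that would show up as a single horizontal stripe.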
