State of


This is a guest post by David Pfahler and Stephan Bönnemann of keeplook.in.

Why you’d like to use HTML5 audio in iOS 5

It’s all about apps these days. The question is whether you want to commit to learning a new language and program natively for one platform, or leverage the web development skills you already have. HTML5 is the way to go if you prefer the latter.

Let’s face it: the main problem for web developers who want to make apps is delivering a great user experience. Hence, you’d like to use as much of today’s web technology as possible to enhance the UX. One effective way of doing this is audio feedback. For example, many Twitter apps play one sound when new tweets come in and another when no new tweets could be found. This kind of response is intuitive to the user, and it is what you should aim for.

Also, imagine a game without sounds. Would be kind of boring, wouldn’t it?

Naive Assumptions: What you’d expect

Fortunately in iOS we have rich HTML5 support built into mobile Safari. Initially, Steve Jobs wanted all apps to be web apps! So what you’d assume regarding audio is the presence of the HTML5 audio tag. And guess what, there is an implementation of the audio tag in iOS. Hurray!

So what do you expect from an audio tag implementation:

  • call .load(), .play(), .pause() and other methods via JavaScript
  • use remote files, dataURIs, cached files and other sources
  • play several audio files / tags at once
  • mix audio from several sources with different volumes, etc.

Reality checks: What should work but doesn’t

It becomes clear pretty fast that it’s not that simple. With iOS it’s a disaster. The bad news first:

It is not possible to load or play audio without user interaction.

Sad, but true. Apple deliberately decided that a touch event is mandatory to load and play audio. There is no workaround for this.

If we accept the fact that we need a touch event to play the audio, is it possible to use the audio tag as expected? Hmm, no, not really. So what’s broken as well?

Latency: Loading audio files on demand (after the touch event has fired) is unusably slow. The desired user experience cannot be achieved using this technique.

One audio tag at a time: You can never play more than one audio tag at once. This makes it impossible to mix sounds on the fly, which would be necessary for games that have background music and must also play sounds dynamically (e.g. when the character jumps).

Audio files only: You cannot use anything as the source besides audio files (uncompressed WAV and AIF audio, MP3 audio, and AAC-LC or HE-AAC audio). If you could use dataURIs, it would be possible to preload the audio.

Workarounds: How to play anyways (at least somehow)

We invested a lot of time researching the audio tag on iOS 5. If you want to know how we play audio, read on.

So what do you do with this crippled audio tag? Well, you trick it a little bit and use it as much as possible. Here are several ideas that work best for different scenarios:

Bind touchstart event to body: As you have to play on a touch event, we bind a touchstart handler on the body. So wherever the user touches first, we can use this event to load the audio file. Even better, from now on, we can load and play different sources from this audio tag.

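// Assumes bodyEl is document.body and audioNode is an <audio> element with its src set.
bodyEl.addEventListener("touchstart", function unlockAudio() {
  bodyEl.removeEventListener("touchstart", unlockAudio, false); // only the first touch is needed
  audioNode.load();
  audioNode.play();
}, false);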

Hot swapping sources: We can now change the “activated” audio tag’s source on the fly. So it is now possible – without user interaction (!) – to load and play different sources. But you can only play one source at a time, of course.

// Once the current source has finished, swap in the next one; no new touch is needed.
audioNode.addEventListener("ended", function() {
  audioNode.src = "newSource.mp3";
  audioNode.load();
  audioNode.play();
}, false);
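
The hot swapping described above extends naturally to a list of sources played back to back. The following is only a sketch under assumptions: the helper name playQueue and the file names in the usage comment are hypothetical, and the audio tag must already have been activated by a touch event (or playQueue must be called from inside the touch handler).

```javascript
// Sketch: play several sources one after another by hot swapping
// the src of a single, already activated audio element.
// playQueue is a hypothetical helper, not part of any library.
function playQueue(audio, sources) {
  var index = 0;

  // Point the audio element at the current source and start it.
  function playCurrent() {
    audio.src = sources[index];
    audio.load();
    audio.play();
  }

  // When one source ends, move on to the next until the list is exhausted.
  audio.addEventListener("ended", function () {
    index += 1;
    if (index < sources.length) {
      playCurrent();
    }
  }, false);

  if (sources.length > 0) {
    playCurrent();
  }
}

// Usage (inside the touchstart handler, file names are placeholders):
//   playQueue(audioNode, ["intro.mp3", "loop.mp3"]);
```

Only one source plays at a time, so this does not help with mixing; it merely avoids asking for a new touch between tracks.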

Use audio sprites: The guys over at Zynga use audio sprites in their Jukebox. This means that you have a huge audio file containing one sound that you want to play, then one second of silence, then another sound and so on.

Example for a sound sprite structure:
1 second silence
First sound
1 second silence
Second sound
1 second silence
Third sound

In this file you include all the audio that you could possibly want to play in your app. You load the source only once (as described above) and then jump to the relevant parts of the file. The file is always playing, but most of the time it just plays silence. When you need a sound, you jump to the part of the file that contains it. This solution is quick (much faster than hot swapping!) but requires a long preloading phase before you can play any sound, because the file is bigger.
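
A minimal sketch of the sprite idea, assuming the audio tag has already been activated and is playing (see the touchstart handler above). The name createSpritePlayer, the sprite map, and all timings are hypothetical, and this is not a description of how Zynga's Jukebox is implemented. Between sounds the sketch keeps looping a stretch of silence so the file never stops.

```javascript
// Sketch: jump around inside one long, continuously playing sprite file.
// sprites maps a name to { start, duration } in seconds (hypothetical values).
// silence is the { start, duration } of a silent stretch to loop in between.
function createSpritePlayer(audio, sprites, silence) {
  var current = silence; // region of the file that is playing right now

  // timeupdate fires only a few times per second, so region boundaries are
  // approximate; the silence between sounds absorbs the overshoot.
  audio.addEventListener("timeupdate", function () {
    if (audio.currentTime >= current.start + current.duration) {
      current = silence;
      audio.currentTime = silence.start; // fall back to looping silence
    }
  }, false);

  return {
    // Jump to the named sound; returns false if the name is unknown.
    play: function (name) {
      var sprite = sprites[name];
      if (!sprite) {
        return false;
      }
      current = sprite;
      audio.currentTime = sprite.start;
      return true;
    }
  };
}

// Usage (timings are placeholders for your own sprite file):
//   var player = createSpritePlayer(audioNode,
//     { jump: { start: 1, duration: 0.5 } },
//     { start: 0, duration: 1 });
//   player.play("jump");
```

Seeking only works once the browser knows the duration of the file, so in practice you would wait until the sprite has loaded before calling play.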

tl;dr

So audio in iOS is pretty much broken. If you are using a native wrapper like PhoneGap, you could use native APIs via plugins like SoundPlug. We’ll see what is possible in upcoming versions of iOS. Please let us know your ideas and feedback in the comments.

Update:

Regarding Philip’s comment, this is how we think it could be.

[Image: SettingsAlert]

  • http://pilif.github.com Philip Hofstetter

    With my 250MB monthly data cap, I’m quite happy that playing audio and video requires user interaction as this gives me a chance to decide whether the audio/video content is important enough for me to warrant the download (hint: it probably isn’t).

    Arguably Apple could change this behavior depending on whether you are on WiFi or Mobile, but I guess that wouldn’t be very trivial to implement and could also be confusing to users as it wouldn’t be consistent UI.

  • http://github.com/martensms Christoph

    Well, I experienced pretty much the same as you.
    But I solved it the “most acceptable” way that you can have a low-delay playback on iOS. The concept uses background sounds in your sprite file … and automatic stream correction running within a sound loop.

    You could take a look at it, solves cross-device problems pretty well, runs on any Desktop, Android and iOS device supporting audio. (except IE9, because it’s buggy and crashes with memcopy :D)

    http://github.com/zynga/jukebox

    Greets from Frankfurt,
    Chris

  • http://www.haykranen.com Hay

    I’ve written two articles about the iOS video/audio tags and their quirks.

    http://weblogs.vpro.nl/digitaal/2011/11/04/why-html5-audiovideo-on-ios-is-virtually-unusable/
    http://weblogs.vpro.nl/digitaal/2011/10/24/advanced-html5-video-and-audio-use-on-ios-bugs-and-quirks/

    I’ve mostly come to the same conclusions. Another problem is that even *with* user interaction you can’t do asynchronous callbacks, so you’ll have problems doing Ajax calls for extra metadata for your audio.

  • http://twitter.com/rwaldron Rick Waldron

    This issue is equally unfortunate for video, you must click the video to play and then it will open up full screen. This is a deal breaker for projects such as Popcorn.js that are attempting to add interactive programmability to html5 video.

    Anyway, great post, thanks for bringing more attention to this issue.

  • Caleb

    I’m doing development for a product that will run strictly on Wifi enabled iPad’s… no phone data or cell-towers will be involved in any way.

    The autoplay function within HTML5 is crucial, as this is a translation application that involves multiple devices.

    I think Apple should simply allow for audio/video autoplay when you are on wifi. Problem solved!

    I literally am going to go and get a Droid tablet today due to this issue… this project also includes purchasing hundreds of these devices in the next 2 months, so Apple lost good revenue over this issue.