This article was first published on 8th January 2009, on Mark Pilgrim’s website. That website no longer exists so this article serves as an historical record. I have preserved all emphasis and links as per the original article.
I had lunch with my father the other day, and I explained this series as well as I could to someone who didn’t start programming when he was 11. His immediate reaction was, “Why are there so many different formats? Why can’t everybody just agree on a single format? Is it political, or technical, or both?” The short answer is, it’s both. The history of video in any medium — and especially since the explosion of amateur digital video — has been marred by a string of companies who wanted to use container formats and video codecs as tools to lock content producers and content consumers into their little fiefdoms. Own the format, own the future. And when I say “history” — well, it’s still going on. Tried to play a Windows Media Video on Mac OS X lately? The codec and container support is out there, but it’s not baked in. Want to watch movie trailers on Apple.com? Please install QuickTime. And so forth and so on. The only thing that was pre-installed on both platforms was Flash, so when a few startups dipped their toes into the Internet video waters, the ones that used Flash Video won despite it being an objectively inferior codec. (Some revision of Flash 9 added support for H.264 video, AAC audio, and the MP4 container, which is what YouTube HD uses.)
So that’s the politics. But there are also technical barriers. As with all engineering, video encoding is primarily about constraints. I can think of 10 just off the top of my head:
- CPU capacity for decoding and playing in real time. This is one of the most important constraints, since video is meant to be watched in real time. That sounds simple, but it’s incredibly complex. Every video you’ve ever watched in your entire life had to be decoded and played in real time. Otherwise it stutters and the viewing experience sucks. And we’re talking about video here; if the viewing experience sucks, there’s nothing left. Some codecs are just more complex than others, and that translates into higher system requirements to decode videos in real time. As I’ve mentioned before, some codecs are now decoded by specialized hardware. iPhones have a little chip inside them that understands H.264 Baseline Profile; without that, the iPhone would need a Core 2 Duo processor to play movies, and it would have a battery life of 10 minutes.
- Codec compatibility. Normal people won’t download codecs or plug-ins just to watch a dog on a skateboard, or even to watch a trailer for a $100 million blockbuster. (Sadly, they will download plug-ins for porn, but those are invariably trojan horses. Or so I’ve read. Moving on…) The phone in your pocket can probably play AMR ringtones, maybe MP3 ringtones, but probably not Vorbis ringtones (unless you have an Android phone) — and you probably couldn’t download new codecs even if you wanted to (which, I must reiterate, nobody wants to). Apple and Real Networks tried for years to corner the web video market, but 99% of schmucks with a browser have Flash, so Flash video won on the web. Meanwhile, Firefox 3.1 will ship with support for the <video> element but will only support Theora and Vorbis in an Ogg container — even if your underlying operating system ships with other codecs.
- CPU capacity for encoding. Encoding takes a long time. Taking my home movie from iMovie to a DVD used to take 8 hours on a PowerBook G4 laptop. These days you can rip a DVD movie with Xvid in 30 minutes, or you can rip it with a more complex codec with all optional features turned on, and maybe it’ll still take 8 hours. It’ll look better, but will it look 16 times better? If you’re only doing it once, maybe you don’t care. If you’re running YouTube and people are uploading 13 hours of video every minute, maybe you do. CPU cycles aren’t free; at that scale, they’re not even cheap. (That’s a real statistic, by the way; I got it from the page on the Google intranet entitled “What can we tell non-Googlers?” and it’s accurate as of September 2008.)
- Acceptable delay between recording and delivery. In my own experience, videos I’ve uploaded on YouTube are available within minutes, which is just mind-boggling when you consider the volume. If you’re re-encoding a live stream, even a few minutes delay is probably unacceptable. That means you’ll need a faster encoder, a less complex codec, or lower quality settings.
- Audience size. It’s not a big secret that lots of video on the Internet looks like crap. Partly that’s because the video uploader uploaded crappy video, but it’s also because most Internet videos are only watched by a few people, and it’s just not a worthwhile tradeoff to spend 8 hours re-encoding it. On the other hand, if you’re mastering a DVD that’ll get sold to 10 million people, you’ll probably use higher quality settings.
- Screen dimensions. DVDs can’t store high-def 1920 x 1080 video because the standard doesn’t allow for it, which makes perfect sense because it was designed around the screen resolution of standard-def TVs. Blu-Ray ups the limit, but there’s still a limit. Screen sizes vary more for PC video, but there will always be practical upper limits depending on your audience.
- My bandwidth. If you’re streaming or downloading video, some percentage of your audience is probably living in a third-world country like the United States, with limited broadband access, slow speeds, and monthly bandwidth caps. Larger file size = longer wait to play = fewer videos watched overall.
- Your bandwidth. Obviously every bit I download is a bit that you upload, and bandwidth ain’t free either. “When I get a little money I buy bandwidth; and if any is left I buy food and clothes.” Or something like that.
- Hard limits on storage size. As I mentioned before, physical media has upper limits on total size. Commercial DVDs can hold upwards of 9 GB, which seems like a lot but really isn’t. Blu-Ray maxes out at 50 GB, which seems like a lot but really isn’t.
- Patents / licensing costs. Did I mention that most popular video codecs are patent-encumbered? This is why Wikimedia uses Theora exclusively, and why Firefox can ship a native Theora decoder but won’t ever ship H.264.
…and that’s the short list.
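To put rough numbers on two of those constraints (storage size and encoding capacity), here is a back-of-the-envelope sketch in Python. The 9 GB, 50 GB, and 13-hours-per-minute figures come from the list above; the two-hour runtime, the encoder speeds, and the function names are assumptions made purely for illustration, and the disc calculation ignores filesystem overhead and any bitrate cap the disc format itself imposes.

```python
# Back-of-the-envelope arithmetic for the storage and encoding constraints.
# Figures taken from the article: 9 GB DVD, 50 GB Blu-ray, 13 hours of
# video uploaded per minute. Everything else is an illustrative assumption.


def max_average_bitrate_mbps(capacity_gb: float, runtime_minutes: float) -> float:
    """Highest average bitrate (video + audio, in megabits/s) that fits.

    Assumes 1 GB = 10**9 bytes and ignores filesystem and menu overhead.
    """
    total_bits = capacity_gb * 1e9 * 8
    total_seconds = runtime_minutes * 60
    return total_bits / total_seconds / 1e6


def encoders_needed(upload_hours_per_minute: float, encode_speed: float) -> float:
    """Encoders that must run in parallel just to keep pace with uploads.

    encode_speed is hours of video encoded per hour of wall-clock time,
    so 1.0 means the encoder runs exactly in real time.
    """
    video_minutes_per_minute = upload_hours_per_minute * 60
    return video_minutes_per_minute / encode_speed


RUNTIME_MINUTES = 120  # assumption: a two-hour film

print(round(max_average_bitrate_mbps(9, RUNTIME_MINUTES), 1))   # 10.0  (DVD)
print(round(max_average_bitrate_mbps(50, RUNTIME_MINUTES), 1))  # 55.6  (Blu-ray)
print(encoders_needed(13, 1.0))   # 780.0  (real-time encoder)
print(encoders_needed(13, 0.25))  # 3120.0 (encoder at quarter speed)
```

Under these assumptions, a slower, better-looking encoder multiplies the hardware bill directly: dropping from real-time to quarter-speed encoding quadruples the number of machines needed just to avoid falling behind.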
All of which leads me to the Zen of video encoding, which is this:
There is no right or wrong. There is only what works and what doesn’t.
If you can find even one combination of tools, delivery devices, and target platforms that satisfies your constraints and still accomplishes your goals, congratulations. You’re ahead of 99% of the people who’ve tried.