Mark Pilgrim – A Gentle Introduction to Video Encoding: Lossy Video Codecs

by Simon. Average Reading Time: about 7 minutes.

This article was first published on 19th December 2008, on Mark Pilgrim’s website. That website no longer exists so this article serves as an historical record. I have preserved all emphasis and links as per the original article.

The most important consideration in video encoding is choosing a video codec. A future article will talk about how to pick the one that’s right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)

When you talk about “watching a video,” you’re probably talking about a combination of one video stream, one audio stream, and possibly some subtitles or captions. But you probably don’t have two different files; you just have “the video.” Maybe it’s an AVI file, or an MP4 file. These are just container formats, like a ZIP file that contains multiple kinds of files within it. The container format defines how to store the video and audio streams in a single file (and subtitles too, if any).

When you “watch a video,” your video player is doing several things at once:

  1. Interpreting the container format to find out which video and audio tracks are available, and how they are stored within the file so that it can find the data it needs to decode next
  2. Decoding the video stream and displaying a series of images on the screen
  3. Decoding the audio stream and sending the sound to your speakers
  4. Possibly decoding the subtitle stream as well, and showing and hiding phrases at the appropriate times while playing the video
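Step 1 above amounts to parsing a small amount of header data before any decoding begins. As a minimal illustrative sketch (not a real demuxer), here is how the first twelve bytes of an AVI file's RIFF container header could be read in Python; the header bytes are fabricated for the example:

```python
import struct

def parse_riff_header(data: bytes):
    """Parse the 12-byte RIFF header that opens every AVI file."""
    # 4-byte chunk tag, little-endian 32-bit payload size, 4-byte form type.
    tag, size, form = struct.unpack_from("<4sI4s", data, 0)
    if tag != b"RIFF":
        raise ValueError("not a RIFF container")
    return form.decode("ascii").strip(), size

# A fabricated header, just to exercise the parser:
header = b"RIFF" + struct.pack("<I", 1234) + b"AVI "
form_type, payload_size = parse_riff_header(header)
print(form_type, payload_size)  # AVI 1234
```

A real player would go on to walk the chunks inside the container to locate the video and audio tracks, but the principle is the same: the container is just structured bookkeeping around the streams.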

A video codec is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2 above. Your video player decodes the video stream according to the video codec, then displays a series of images, or “frames,” on the screen. Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. For example, instead of storing each individual frame (like a screenshot), they will only store the differences between frames. Most videos don’t actually change all that much from one frame to the next, so this allows for high compression rates, which results in smaller file sizes. (There are many, many other complicated tricks too, which I’ll dive into in a future article.)
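The difference-between-frames idea can be shown with a toy model. In this sketch (my own illustration, not any real codec), a "frame" is just a flat list of brightness values, and the encoder stores the first frame whole plus lossless per-pixel deltas for the rest:

```python
def encode(frames):
    # Keep the first frame whole (a "key frame"), then store only
    # the per-pixel difference from the previous frame.
    key = frames[0]
    deltas = []
    prev = key
    for frame in frames[1:]:
        deltas.append([cur - old for cur, old in zip(frame, prev)])
        prev = frame
    return key, deltas

def decode(key, deltas):
    # Rebuild each frame by adding its delta to the previous frame.
    frames = [key]
    for delta in deltas:
        frames.append([old + d for old, d in zip(frames[-1], delta)])
    return frames

# Three tiny 4-"pixel" frames that barely change from one to the next:
frames = [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 12, 10]]
key, deltas = encode(frames)
print(deltas)  # [[0, 1, 0, 0], [0, 0, 2, 0]]
```

Because most of the deltas are zero, they compress far better than the raw frames would; real codecs build on this with motion estimation, transforms, and much more.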

There are lossy and lossless video codecs; today’s article will only deal with lossy codecs. A lossy video codec means that information is being irretrievably lost during encoding. Like copying an audio cassette tape, you’re losing information about the source video, and degrading the quality, every time you encode. Instead of the “hiss” of an audio cassette, a re-re-re-encoded video may look blocky, especially during scenes with a lot of motion. (Actually, this can happen even if you encode straight from the original source, if you choose a poor video codec or pass it the wrong set of parameters.) On the bright side, lossy video codecs can offer amazing compression rates, and many offer ways to “cheat” and smooth over that blockiness during playback, to make the loss less noticeable to the human eye.
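The "irretrievably lost" part can be illustrated with the simplest lossy mechanism there is, quantization. This toy sketch (again, my own illustration; the step sizes are arbitrary) rounds every sample to a coarse grid, and re-encoding with different settings loses still more detail, the digital analogue of re-copying a cassette:

```python
def lossy_encode(samples, step):
    # Quantize each value to the nearest multiple of `step`;
    # whatever falls between the steps is discarded for good.
    return [round(s / step) * step for s in samples]

source = [3, 7, 12, 14, 20]
once = lossy_encode(source, 5)   # [5, 5, 10, 15, 20] -- original detail is gone
twice = lossy_encode(once, 8)    # a second generation degrades it further
print(once, twice)
```

There is no function that recovers `source` from `once`: distinct inputs (3 and 7) collapsed to the same output (5), which is exactly what "lossy" means.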

There are tons of video codecs. Today I’ll discuss five modern lossy video codecs: MPEG-4 ASP, H.264, VC-1, Theora, and Dirac.


MPEG-4 ASP

a.k.a. “MPEG-4 Advanced Simple Profile.” MPEG-4 ASP was developed by the MPEG group and standardized in 2001. You may have heard of DivX, Xvid, or 3ivx; these are all competing implementations of the MPEG-4 ASP standard. Xvid is open source; DivX and 3ivx are closed source. The company behind DivX has had some mainstream success in branding “DivX” as synonymous with “MPEG-4 ASP.” For example, this “DivX-certified” DVD player can actually play most MPEG-4 ASP videos in an AVI container, even if they were created with a competing encoder. (To confuse things even further, the company behind DivX has now created their own container format.)

MPEG-4 ASP is patent-encumbered; licensing is brokered through the MPEG LA group. MPEG-4 ASP video can be embedded in most popular container formats, including AVI, MP4, and MKV.


H.264

a.k.a. “MPEG-4 part 10,” a.k.a. “MPEG-4 AVC,” a.k.a. “MPEG-4 Advanced Video Coding.” H.264 was also developed by the MPEG group and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. To accomplish this, the H.264 standard is split into “profiles,” each of which defines a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real-time.

To give you a rough idea of the range of profiles, Apple’s iPhone supports Baseline profile, the AppleTV set-top box supports Baseline and Main profiles, and Adobe Flash on a desktop PC supports Baseline, Main, and High profiles. YouTube (owned by Google, my employer) now uses H.264 to encode high-definition videos, playable through Adobe Flash; YouTube also provides H.264-encoded video to mobile devices, including Apple’s iPhone and phones running Google’s Android mobile operating system. Also, H.264 is one of the video codecs mandated by the Blu-Ray specification; Blu-Ray discs that use it generally use the High profile.

Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players) actually do the decoding on a dedicated chip, since their main CPUs are nowhere near powerful enough to decode the video in real-time. Recent high-end desktop graphics cards also support decoding H.264 in hardware. There are a number of competing H.264 encoders, including the open source x264 library. The H.264 standard is patent-encumbered; licensing is brokered through the MPEG LA group. H.264 video can be embedded in most popular container formats, including MP4 (used primarily by Apple’s iTunes Store) and MKV (used primarily by video pirates).


VC-1

VC-1 evolved from Microsoft’s WMV9 codec and was standardized in 2006. It is primarily used and promoted by Microsoft for high-definition video, although, like H.264, it has a range of profiles to trade complexity for file size. Also like H.264, it is mandated by the Blu-Ray specification, and all Blu-Ray players are required to be able to decode it. The VC-1 codec is patent-encumbered, with licensing brokered through the MPEG LA group.

Wikipedia has a brief technical comparison of VC-1 and H.264; Microsoft has their own comparison; Multimedia.cx has a pretty Venn diagram outlining the similarities and differences. Multimedia.cx also discusses the technical features of VC-1. I also found this history of VC-1 and H.264 to be interesting (as well as this rebuttal).

VC-1 is designed to be container-independent, although it is most often embedded in an ASF container. An open source decoder for VC-1 video was a 2006 Google Summer of Code project, and the resulting code was added to the multi-faceted ffmpeg library.


Theora

Theora evolved from the VP3 codec and has subsequently been developed by the Xiph.org Foundation. Theora is a royalty-free codec and is not encumbered by any known patents other than the original VP3 patents, which have been irrevocably licensed royalty-free. Although the standard has been “frozen” since 2004, the Theora project (which includes an open source reference encoder and decoder) only hit 1.0 in November 2008.

Theora video can be embedded in any container format, although it is most often seen in an Ogg container. All major Linux distributions support Theora out-of-the-box, and Mozilla Firefox 3.1 will include native support for Theora video in an Ogg container. And by “native”, I mean “available on all platforms without platform-specific plugins.” You can also play Theora video on Windows or on Mac OS X after installing Xiph.org’s open source decoder software.

The reference encoder included in Theora 1.0 is widely criticized as slow and as producing poor-quality output, but Theora 1.1 will include a new encoder that takes better advantage of Theora’s features, while staying backward-compatible with current decoders. (Info: 1, 2, 3, 4, 5, source code.)


Dirac

Dirac was developed by the BBC to provide a royalty-free alternative to H.264 and VC-1 that the BBC could use to stream high-definition television content in Great Britain. Like H.264, Dirac aims to provide a single codec for the full spectrum of very low- and very high-bandwidth streaming. Dirac is not encumbered by any known patents, and there are two open source implementations, dirac-research (the BBC’s reference implementation) and Schroedinger (optimized for speed).

The Dirac standard was only finalized in 2008, so there is very little mainstream use yet, although the BBC did use it internally during the 2008 Olympics. Dirac-encoded video tracks can be embedded in several popular container formats, including MP4, Ogg, MKV, and AVI. VLC 0.9.2 (released in September 2008) can play Dirac-encoded video within an Ogg or MP4 container.

And on and on…
Of course, this is only scratching the surface of all the available video codecs. Video encoding goes way back, but my focus in this series is on the present and near-future, not the past. If you like, you can read about MPEG-2 (used in DVDs), MPEG-1 (used in Video CDs), older versions of Microsoft’s WMV family, Sorenson, Indeo, and Cinepak.


Other articles I recommend

Mark Pilgrim – A Gentle Introduction to Video Encoding: Container Formats

You may think of video files as “AVI files” or “MP4 files.” In reality, “AVI” and “MP4” are just container formats. Just like a ZIP file can contain any sort of file within it, video container formats only define how to store things within them, not what kinds of data are stored. (It’s a little more complicated than that, because not all video streams are compatible with all container formats, but never mind that for now.) A video file usually contains multiple tracks — a video track (without audio), one or more audio tracks (without video), one or more subtitle/caption tracks, and so forth. Tracks are usually interrelated; an audio track contains markers within it to help synchronize the audio with the video, and a subtitle track contains time codes marking when each phrase should be displayed. Individual tracks can have metadata, such as the aspect ratio of a video track, or the language of an audio or subtitle track. Containers can also have metadata, such as the title of the video itself, cover art for the video, episode numbers (for television shows), and so on.

Mark Pilgrim – A Gentle Introduction to Video Encoding: Lossy Audio Codecs

Unless you’re going to stick to films made before 1927 or so, you’re going to want an audio track. A future article will talk about how to pick the audio codec that’s right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)

Mark Pilgrim – A Gentle Introduction to Video Encoding: Constraints

I had lunch with my father the other day, and I explained this series as well as I could to someone who didn’t start programming when he was 11. His immediate reaction was, “Why are there so many different formats? Why can’t everybody just agree on a single format? It is political, or technical, or both?” The short answer is, it’s both. The history of video in any medium — and especially since the explosion of amateur digital video — has been marred by a string of companies who wanted to use container formats and video codecs as tools to lock content producers and content consumers into their little fiefdoms. Own the format, own the future. And when I say “history” — well, it’s still going on.