Video Compression

Once a video signal is digital, it requires a large amount of storage space and transmission bandwidth. To reduce the amount of data, several strategies are employed that compress the information without noticeably degrading the quality of the image. Some methods are lossless, meaning that no data is lost, but most are lossy, meaning that information is thrown away that can’t be retrieved.

Some simple methods of data compression are:

Lossless Codecs

Once these basic methods have been employed, much more intensive algorithms can be used to further reduce the amount of transmitted and stored image data. Mathematical algorithms can be used to encode and decode each video frame. These algorithms are called codecs (a contraction of enCOde and DECode), and they must be installed in the VTR or software you are using to play back your video. For example, QuickTime supports many different video codecs for video export and playback.

The simplest encoding algorithm, called run-length encoding, represents strings of redundant values as a single value and a multiplier. For example, consider the following bit values:

0000000000000000000000001111111111111111000000000000000000000000

Using run-length encoding on the bit values above can reduce the amount of information to:

0 x 24, 1 x 16, 0 x 24

Or in binary:

0 [11000], 1 [10000], 0 [11000]

In the example above, each of the three runs is stored as 1 bit for the value plus 5 bits for the run length, so the original 64 bits can be transmitted using only 18 bits.

Run-length encoding is lossless, because all the information is retained after decoding. This technique is particularly useful for computer graphics applications, because there are often large fields of identical colors.
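
The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used by any particular codec. The function names and the fixed 5-bit run-length field are assumptions chosen to match the example, which also means a single run could not be longer than 31 values.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * count for value, count in runs)

def encoded_size_bits(runs: list[tuple[str, int]], count_bits: int = 5) -> int:
    """Size if every run is stored as 1 value bit plus a fixed-width count (assumed layout)."""
    return len(runs) * (1 + count_bits)

original = "0" * 24 + "1" * 16 + "0" * 24   # the 64 bit values from the example
runs = rle_encode(original)

print(runs)                          # [('0', 24), ('1', 16), ('0', 24)]
print(encoded_size_bits(runs))       # 18
print(rle_decode(runs) == original)  # True: nothing was lost
```

Decoding returns exactly the original 64 values, which is what makes the method lossless.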

Note: If each bit in the original image were to alternate between 0 and 1, run-length encoding would not only be ineffective, it could actually make the overall data rate higher. Each codec is designed to anticipate and compress different kinds of data patterns. For example, a codec designed for audio compression is not useful for video compression, which has very different data patterns.
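
The worst case described in the note can be checked with a short, self-contained sketch. As before, the 1-bit value plus 5-bit run-length layout is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import groupby

def rle_size_bits(bits: str, count_bits: int = 5) -> int:
    """Encoded size when each run costs 1 value bit plus a fixed-width count (assumed layout)."""
    run_count = sum(1 for _ in groupby(bits))
    return run_count * (1 + count_bits)

alternating = "01" * 32              # 64 bits that change on every position
print(len(alternating))              # 64
print(rle_size_bits(alternating))    # 384
```

Because every run has a length of one, each original bit is replaced by a 6-bit record, and the encoded data is six times larger than the input.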

Lossy Codecs

Most video codecs are necessarily lossy, because it is usually impractical to store and transmit uncompressed video signals. Even though most codecs lose some information in the video signal, the goal is to make this information loss visually imperceptible. When codec algorithms are developed, they are fine-tuned based on analyses of human vision and perception. For example, if the human eye cannot distinguish subtle variations in the red channel, a codec may throw away some of that information and viewers may never notice.
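
The idea of discarding subtle variation can be illustrated with a very simple form of quantization. This sketch is not how DCT-based or wavelet codecs work internally; it only shows, under the assumption of 8-bit samples, what it means for information to be thrown away. The function name and sample values are made up for the example.

```python
def quantize(sample: int, keep_bits: int) -> int:
    """Keep the top keep_bits of an 8-bit sample and zero the remaining low bits."""
    shift = 8 - keep_bits
    return (sample >> shift) << shift

red = [200, 203, 201, 206, 204, 202]     # subtle variation in one channel
coarse = [quantize(s, 5) for s in red]   # keep only 5 of the 8 bits

print(coarse)  # [200, 200, 200, 200, 200, 200]
```

After quantization the six samples are identical, so a lossless method such as run-length encoding can store them compactly, but the original differences between them cannot be recovered.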

Many formats, including JPEG and all varieties of DV, use a fairly complicated algorithm called discrete cosine transform (DCT) encoding. Another method, called wavelet compression, is starting to be used for popular codecs, such as the Apple Pixlet video codec. DVDs, modern digital television, and formats such as HDV use MPEG-2 compression, which not only encodes single frames (intraframe, or spatial compression) but also encodes multiple frames at once (interframe, or temporal compression) by throwing away data that is visually redundant over time.

About Uncompressed Video

Video that has no compression applied can be unwieldy, so it is only used for the highest-quality video work, such as special effects and color correction at the last stage of a project. Most professional projects have an offline phase that uses compressed video and then an online, finishing phase that uses uncompressed video recaptured at full resolution. Uncompressed video requires expensive VTRs and large, high-speed hard disks.
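
To get a sense of why uncompressed video needs large, fast disks, the data rate can be estimated as width × height × bits per pixel × frame rate. The figures below are assumed example values, not specifications taken from this section: a 720 x 486 frame at 29.97 fps with 8-bit 4:2:2 sampling, which averages 16 bits per pixel.

```python
def uncompressed_rate_bps(width: int, height: int, bits_per_pixel: float, fps: float) -> float:
    """Bits per second for video stored with no compression at all."""
    return width * height * bits_per_pixel * fps

# Assumed example values (see the paragraph above); substitute your own format.
rate = uncompressed_rate_bps(720, 486, 16, 29.97)

print(f"{rate / 1e6:.1f} Mbps")                    # 167.8 Mbps
print(f"{rate / 8 / 1e6:.1f} MB per second")       # 21.0 MB per second
print(f"{rate / 8 * 3600 / 1e9:.1f} GB per hour")  # 75.5 GB per hour
```

The estimate covers picture data only and ignores audio and file overhead.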

About MPEG Compression

MPEG encoding is based on eliminating redundant video information, not only within a frame but over a period of time. In a shot where there is little motion, such as an interview, most of the video content does not change from frame to frame, and MPEG encoding can compress the video by a huge ratio with little or no perceptible quality loss.

MPEG compression reduces video data rates in two ways:

  • Spatial (intraframe) compression: Compresses individual frames.
  • Temporal (interframe) compression: Compresses groups of frames together by eliminating redundant visual data across multiple frames.

Intraframe Compression

Within a single frame, areas of similar color and texture can be coded with fewer bits than the original, thus reducing the data rate with minimal loss in noticeable visual quality. JPEG compression works in a similar way to compress still images. Intraframe compression is used to create standalone video frames called I-frames (short for intraframe).

Interframe Compression

Instead of storing complete frames, temporal compression stores only what has changed from one frame to the next, which dramatically reduces the amount of data that needs to be stored while still achieving high-quality images.
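
A minimal sketch of the idea follows, treating a frame as a flat list of pixel values. This is a deliberate simplification: actual MPEG encoders predict blocks of pixels using motion compensation rather than recording individual pixel changes, and the function names here are hypothetical.

```python
Frame = list[int]

def diff_frames(prev: Frame, curr: Frame) -> dict[int, int]:
    """Return {pixel_index: new_value} for only the pixels that changed."""
    return {i: c for i, (p, c) in enumerate(zip(prev, curr)) if p != c}

def apply_diff(prev: Frame, changes: dict[int, int]) -> Frame:
    """Rebuild the current frame from the previous frame plus the stored changes."""
    rebuilt = list(prev)
    for index, value in changes.items():
        rebuilt[index] = value
    return rebuilt

frame1 = [10, 10, 10, 10, 50, 50, 10, 10]
frame2 = [10, 10, 10, 10, 10, 50, 50, 10]   # the bright area moved one pixel right

changes = diff_frames(frame1, frame2)
print(changes)                                # {4: 10, 6: 50}
print(apply_diff(frame1, changes) == frame2)  # True
```

Only two of the eight pixels need to be stored for the second frame, but that frame can no longer be decoded without the first one.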

Groups of Pictures

MPEG formats use three types of compressed frames, organized in a group of pictures, or GOP, to achieve interframe compression:

  • I-frames: Intra (I) frames, also known as reference or key frames, contain all the necessary data to re-create a complete image. An I-frame stands by itself without requiring data from other frames in the GOP. Every GOP contains one I-frame, although it does not have to be the first frame of the GOP. I-frames are the largest type of MPEG frame, but they are faster to decompress than other kinds of MPEG frames.
  • P-frames: Predicted (P) frames are encoded from a “predicted” picture based on the closest preceding I- or P-frame. P-frames are also known as reference frames, because neighboring B- and P-frames can refer to them. P-frames are typically much smaller than I-frames.
  • B-frames: Bi-directional (B) frames are encoded based on an interpolation from I- and P-frames that come before and after them. B-frames require very little space, but they can take longer to decompress because they are reliant on frames that may be reliant on other frames. A GOP can begin with a B-frame, but it cannot end with one.
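
The dependencies between the three frame types can be sketched as follows. The function is a hypothetical illustration that assumes a closed GOP written in display order, beginning with an I-frame and ending with an I- or P-frame. It does not model the order in which frames are actually stored or decoded.

```python
def references(gop: str) -> list[tuple[int, ...]]:
    """For each frame (display order), list the indexes of the frames it is predicted from."""
    anchors = [i for i, frame_type in enumerate(gop) if frame_type in "IP"]
    result = []
    for i, frame_type in enumerate(gop):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if frame_type == "I":
            result.append(())                       # stands by itself
        elif frame_type == "P":
            result.append((before[-1],))            # closest preceding I- or P-frame
        else:
            result.append((before[-1], after[0]))   # B: nearest anchor on each side
    return result

print(references("IBBPBBP"))
# [(), (0, 3), (0, 3), (0,), (3, 6), (3, 6), (3,)]
```

In this output, frame 4 (a B-frame) depends on frames 3 and 6, which in turn depend on frame 0, which is why B-frames can take longer to decompress.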

GOPs are defined by three factors: their pattern of I-, P-, and B-frames, their length, and whether the GOP is “open” or “closed.”

GOP Pattern

A GOP pattern is defined by the ratio of P- to B-frames within a GOP. Common patterns used for DVD are IBP and IBBP. Not all three frame types have to be used in a pattern. For example, an IP pattern can be used. IBP and IBBP GOP patterns, in conjunction with longer GOP lengths, encode video very efficiently. Smaller GOP patterns with shorter GOP lengths work better with video that has quick movements, but they don’t compress the data rate as much.

Some encoders can force I-frames to be added sporadically throughout a stream’s GOPs. These I-frames can be placed manually during editing or automatically by an encoder detecting abrupt visual changes such as cuts, transitions, and fast camera movements.

GOP Length

Longer GOP lengths encode video more efficiently by reducing the number of I-frames but are less desirable during short-duration effects such as fast transitions or quick camera pans. MPEG video may be classified as long-GOP or short-GOP. The term long-GOP refers to the fact that several P- and B-frames are used between I-frame intervals. At the other end of the spectrum, short-GOP MPEG is synonymous with I-frame–only MPEG. Formats such as IMX use I-frame–only MPEG-2, which reduces temporal artifacts and improves editing performance. However, I-frame–only formats have a significantly higher data rate because each frame must store enough data to be completely self-contained. Therefore, although the decoding demands on your computer are decreased, there is a greater demand for scratch disk speed and capacity.

Maximum GOP length depends on the specifications of the playback device. The minimum GOP length depends on the GOP pattern. For example, an IP pattern can have a length as short as two frames.
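
The trade-off between GOP length and data rate can be illustrated with rough arithmetic. The relative frame sizes below are hypothetical values chosen only for this sketch (an I-frame counts as 1.0). Real ratios vary widely with the footage and the encoder.

```python
# Hypothetical relative sizes, for illustration only.
RELATIVE_SIZE = {"I": 1.0, "P": 0.4, "B": 0.2}

def average_frame_size(gop: str) -> float:
    """Average relative size per frame for a GOP given as a string of frame types."""
    return sum(RELATIVE_SIZE[frame_type] for frame_type in gop) / len(gop)

for gop in ("I", "IP", "IBBPBBP", "IBBPBBPBBPBBPBP"):
    print(f"{gop:>15}  length {len(gop):2d}  average size {average_frame_size(gop):.2f}")
```

With these assumed sizes, the I-frame–only GOP has the highest average size per frame, and the average falls as more P- and B-frames separate the I-frames.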

Here are several examples of GOP length used in common MPEG formats:

  • MPEG-2 for DVD: Maximum GOP length is 18 frames for NTSC or 15 frames for PAL. These GOP lengths can be doubled for progressive footage.
  • 1080-line HDV: Uses a long-GOP structure that is 15 frames in length.
  • 720-line HDV: Uses a six-frame GOP structure.
  • IMX: Uses only I-frames.

Open and Closed GOPs

An open GOP allows the B-frames from one GOP to refer to an I- or P-frame in an adjacent GOP. Open GOPs are very efficient but cannot be used for features such as multiplexed multi-angle DVD video. A closed GOP format uses only self-contained GOPs that do not rely on frames outside the GOP.

The same GOP pattern can produce different results when used with an open or closed GOP. For example, a closed GOP would start an IBBP pattern with an I-frame, whereas an open GOP with the same pattern might start with a B-frame. In this example, starting with a B-frame is a little more efficient because starting with an I-frame means that an extra P-frame must be added to the end (a GOP cannot end with a B-frame).

Figure. Diagram showing the relationship of I-, P-, and B-frames in an open GOP and a closed GOP.
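
The difference between the two cases can be sketched with a small generator that lists frame types in display order. The function and its parameters are hypothetical. It assumes the GOP length is greater than the number of B-frames between reference frames, and it simply converts a trailing B-frame into a P-frame to honor the rule that a GOP cannot end with a B-frame.

```python
def gop_pattern(length: int, b_between: int, closed: bool) -> str:
    """Return the frame types of one GOP in display order.

    b_between is the number of B-frames between reference frames
    (2 for IBBP, 1 for IBP, 0 for IP).
    """
    first_i = 0 if closed else b_between   # an open GOP starts with B-frames
    frames = []
    for i in range(length):
        if i == first_i:
            frames.append("I")
        elif i > first_i and (i - first_i) % (b_between + 1) == 0:
            frames.append("P")
        else:
            frames.append("B")
    if frames[-1] == "B":                  # a GOP cannot end with a B-frame
        frames[-1] = "P"
    return "".join(frames)

print(gop_pattern(15, 2, closed=True))    # IBBPBBPBBPBBPBP
print(gop_pattern(15, 2, closed=False))   # BBIBBPBBPBBPBBP
```

For a 15-frame IBBP pattern, the closed GOP contains five P-frames and the open GOP contains four, which reflects the extra P-frame described above.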

MPEG Containers and Streams

MPEG video and audio data are packaged into discrete data containers known as streams. Keeping video and audio streams discrete makes it possible for playback applications to easily switch between streams on the fly. For example, DVDs that use MPEG-2 video can switch between multiple audio tracks and video angles as the DVD plays.

Each MPEG standard has variations, but in general, MPEG formats support two basic kinds of streams:

  • Elementary streams: These are individual video and audio data streams.
  • System streams: These streams combine, or multiplex, video and audio elementary streams together. They are also known as multiplexed streams. To play back these streams, applications must be able to demultiplex the streams back into their elementary streams. Some applications only have the ability to play elementary streams.
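
The relationship between the two kinds of streams can be sketched with a toy multiplexer. This is only a conceptual illustration: the packet layout and function names are assumptions, and the sketch omits the headers and timing information that actual MPEG system streams carry.

```python
from itertools import zip_longest

Packet = tuple[str, bytes]   # (stream identifier, payload)

def multiplex(streams: dict[str, list[bytes]]) -> list[Packet]:
    """Interleave the packets of several elementary streams, round-robin."""
    muxed = []
    for group in zip_longest(*streams.values()):
        for stream_id, payload in zip(streams, group):
            if payload is not None:        # a shorter stream has run out
                muxed.append((stream_id, payload))
    return muxed

def demultiplex(muxed: list[Packet]) -> dict[str, list[bytes]]:
    """Split a multiplexed stream back into its elementary streams."""
    streams: dict[str, list[bytes]] = {}
    for stream_id, payload in muxed:
        streams.setdefault(stream_id, []).append(payload)
    return streams

elementary = {"video": [b"V0", b"V1", b"V2"], "audio": [b"A0", b"A1"]}
system_stream = multiplex(elementary)

print(system_stream)
# [('video', b'V0'), ('audio', b'A0'), ('video', b'V1'), ('audio', b'A1'), ('video', b'V2')]
print(demultiplex(system_stream) == elementary)   # True
```

Because every packet is tagged with its stream identifier, a player can follow one stream and ignore or switch to another at any point.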

MPEG-1

MPEG-1 is the earliest format specification in the family of MPEG formats. Because of its low bit rate, MPEG-1 has been popular for online distribution and in formats such as Video CD (VCD). DVDs can also store MPEG-1 video, though MPEG-2 is more commonly used. Although the MPEG-1 standard actually allows high resolutions, almost all applications use NTSC- or PAL-compatible image dimensions at quarter resolution or lower.

Common MPEG-1 formats include 320 x 240, 352 x 240 at 29.97 fps (NTSC), and 352 x 288 at 25 fps (PAL). Maximum data rates are often limited to around 1.5 Mbps. MPEG-1 only supports progressive-scan video.

MPEG-1 supports three layers of audio compression, called MPEG-1 Layers 1, 2, and 3. MPEG-1 Layer 2 audio is used in some formats such as HDV and DVD, but MPEG-1 Layer 3 (also known as MP3) is by far the most common. In fact, MP3 audio compression has become so popular that it is usually used independently of video.

MPEG-1 elementary stream files often have extensions such as .m1v and .m1a, for video and audio, respectively.

MPEG-2

The MPEG-2 standard made many improvements to the MPEG-1 standard, including:

  • Support for interlaced video
  • Higher data rates and larger frame sizes, including internationally accepted standard definition and high definition profiles
  • Two kinds of multiplexed system streams: Transport Streams (TS) for unreliable network transmission such as broadcast digital television, and Program Streams (PS) for local, reliable media access (such as DVD playback)

MPEG-2 categorizes video standards into MPEG-2 Profiles and MPEG-2 Levels. Profiles define the type of MPEG encoding supported (I-, P-, and B-frames) and the color sampling method used (4:2:0 or 4:2:2 Y′CbCr). For example, the MPEG-2 Simple Profile (SP) supports only I and P progressive frames using 4:2:0 color sampling, whereas the High Profile (HP) supports I, P, and B interlaced frames with 4:2:2 color sampling.

Levels define the resolution, frame rate, and bit rate of MPEG-2 video. For example, MPEG-2 Low Level (LL) is limited to MPEG-1 resolution, whereas High Level (HL) supports 1920 x 1080 HD video.

MPEG-2 formats are often described as a combination of Profiles and Levels. For example, DVD video uses Main Profile at Main Level (MP @ ML), which defines SD NTSC and PAL video at a maximum bit rate of 15 Mbps (though DVD limits this to 9.8 Mbps).
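
The way Profiles and Levels constrain a stream can be sketched as a small lookup. The structure and names below are hypothetical, and the table contains only the values mentioned in this section. The MPEG-2 standard itself defines more profiles, levels, and constraints than are shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    name: str
    frame_types: str   # frame types the profile allows
    chroma: str        # color sampling method

# Values as described in the text above; not a complete list.
SIMPLE = Profile("Simple Profile (SP)", "IP", "4:2:0")
HIGH = Profile("High Profile (HP)", "IPB", "4:2:2")

MP_ML_MAX_MBPS = 15.0   # Main Profile at Main Level
DVD_MAX_MBPS = 9.8      # stricter limit imposed by DVD

def gop_allowed(profile: Profile, gop: str) -> bool:
    """True if every frame type in the GOP is permitted by the profile."""
    return set(gop) <= set(profile.frame_types)

def fits(bit_rate_mbps: float) -> dict[str, bool]:
    """Check a bit rate against the MP @ ML and DVD limits."""
    return {"MP@ML": bit_rate_mbps <= MP_ML_MAX_MBPS,
            "DVD": bit_rate_mbps <= DVD_MAX_MBPS}

print(gop_allowed(SIMPLE, "IBBP"))   # False: Simple Profile has no B-frames
print(gop_allowed(HIGH, "IBBP"))     # True
print(fits(12.0))                    # {'MP@ML': True, 'DVD': False}
```

A 12 Mbps stream is within the MP @ ML limit but exceeds what DVD allows.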

MPEG-2 supports the same audio layers as MPEG-1 but also includes support for multichannel audio. MPEG-2 Part 7 also supports a more efficient audio compression algorithm called Advanced Audio Coding, or AAC.

MPEG-2 elementary stream files often have extensions such as .m2v and .m2a, for video and audio, respectively.

MPEG-4

MPEG-4 inherited many of the features in MPEG-1 and MPEG-2 and then added a rich set of multimedia features such as discrete object encoding, scene description, rich metadata, and digital rights management (DRM). Most applications support only a subset of all the features available in MPEG-4.

Compared to MPEG-1 and MPEG-2, MPEG-4 video compression (known as MPEG-4 Part 2) provides superior quality at low bit rates. However, MPEG-4 supports high-resolution video as well. For example, Sony HDCAM SR uses a form of MPEG-4 compression.

MPEG-4 Part 3 defines and enhances AAC audio originally defined in MPEG-2 Part 7. Most applications today use the terms AAC audio and MPEG-4 audio interchangeably.

MPEG-4 Part 10, or H.264

MPEG-4 Part 10 defines a high-quality video compression algorithm called Advanced Video Coding (AVC). This is more commonly referred to as H.264. H.264 video compression works similarly to MPEG-1 and MPEG-2 encoding but adds many features to decrease data rate while maintaining quality. Compared to MPEG-1 and MPEG-2, H.264 compression and decompression require significant processing overhead, so this format may tax older computer systems.