I want to take you on a small journey into color spaces and the place they hold in our everyday lives.
YCbCr and RGB are the two most widely used color spaces. While RGB is everywhere when we talk about computer graphics or screens, YCbCr dominates nearly every type of media we see every day.
Our experiment is quite simple: we take a single RGB image, convert it to YCbCr, and track what happens along the way.
The commands for every step are given below the images or alongside the actions performed, so you can try it yourself.
As the source for the experiment I will use a frame from Peru 8K HDR 60FPS (FUHD). It was extracted from the 8K source and resized to 1920x1080. `ffmpeg -i 8k_extract.png -pix_fmt rgb24 -vf scale=-2:1080 1080p.png`
All images in this post are 8-bit PNGs at their original size. Feel free to right-click and open an image in a new tab.
YCbCr and RGB structure
Both YCbCr and RGB consist of 3 planes, which together form the complete image.
Both color spaces can be stored at different bit depths. Bit depth refers to the number of bits used to represent a value in each plane. For example, 8 bits give a range of 0-255 possible values, while 10 bits give 0-1023.
In the case of RGB, those planes are Red, Green, and Blue.
Each plane has the same dimensions, equal to the image resolution. RGB is really convenient, as each channel holds the intensity of one primary light, and added together they can form any color the display can reproduce.
The most popular bit depths are 8, 16, and 32 bits per channel.
Below are 2 images of each channel: the first in grayscale, where each pixel's value is displayed as its brightness, and the second showing the channel in its respective color.
Without any compression, each plane is exactly
Width x Height x Bit depth
1920 x 1080 x 8 bits = 16588800 bits = 2073600 bytes
The whole image is 3 x 2073600 =
6220800 bytes, or 5.93 MiB.
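The same arithmetic as a small Python sketch. `plane_bytes` is just an illustrative helper, not part of any tool used in this post. Note that the formula yields bits, so we divide by 8 to get bytes:

```python
def plane_bytes(width: int, height: int, bit_depth: int) -> int:
    """Uncompressed size of one plane: width x height x bit depth, converted to bytes."""
    bits = width * height * bit_depth
    return bits // 8  # 8 bits per byte


W, H, DEPTH = 1920, 1080, 8
plane = plane_bytes(W, H, DEPTH)
rgb_total = 3 * plane  # three full-resolution planes: R, G, B

print(plane)                                       # 2073600
print(rgb_total, f"{rgb_total / 2**20:.2f} MiB")   # 6220800 5.93 MiB
```

For bit depths that are not a multiple of 8 this gives the tightly packed size; in practice, ffmpeg's 10-bit formats such as yuv420p10le store each sample in 16 bits, so real files are larger.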
Red`ffmpeg -i 1080p.png -filter_complex "extractplanes=r" r.png`
Green`ffmpeg -i 1080p.png -filter_complex "extractplanes=g" g.png`
Blue`ffmpeg -i 1080p.png -filter_complex "extractplanes=b" b.png`
YCbCr separates luminance (brightness) from chrominance (color). It is designed to store visual information efficiently, but it requires conversion back to RGB before being displayed.
YCbCr is derived from RGB in such a way that most of the brightness information is stored in the Y plane (luma) and most of the color information is stored in the Cb and Cr planes.
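To make the transform concrete, here is a per-pixel sketch in Python. It assumes the BT.601 limited-range equations (Y in 16-235, Cb and Cr in 16-240); which matrix and range ffmpeg actually applies depends on its settings and the input's tags, so treat the exact numbers as an illustration rather than what the commands in this post produce. `rgb_to_ycbcr` is a hypothetical helper name.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to 8-bit YCbCr.

    Assumes BT.601 coefficients and limited ("TV") range:
    Y spans 16-235, Cb and Cr span 16-240, with 128 meaning "no color".
    """
    rn, gn, bn = r / 255, g / 255, b / 255  # normalize to 0.0-1.0
    y = 16 + 65.481 * rn + 128.553 * gn + 24.966 * bn
    cb = 128 - 37.797 * rn - 74.203 * gn + 112.0 * bn
    cr = 128 + 112.0 * rn - 93.786 * gn - 18.214 * bn
    return round(y), round(cb), round(cr)


print(rgb_to_ycbcr(255, 255, 255))  # (235, 128, 128): white is all luma, no chroma
print(rgb_to_ycbcr(255, 0, 0))      # (81, 90, 240): red pushes Cr up
```

Any gray pixel lands on Cb = Cr = 128, which is why neutral areas of a photo end up as flat regions in the chroma planes.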
Most of the detail, contrast, and brightness end up in the Y plane, which leaves the Cb and Cr planes rather flat and smooth. This will be apparent in the plane previews below.
Chroma subsampling takes advantage of this by reducing the resolution of the Cb and Cr planes without a significant visual difference in the reconstructed image. It is widely used in image and video compression, and the vast majority of media you will encounter is subsampled. We will return to the benefits of this step later.
4:2:0 is the most widely used scheme, and it means that our Cb and Cr planes are halved in both width and height.
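A minimal sketch of 4:2:0 subsampling in Python, assuming a simple 2x2 box average (each output chroma sample is the rounded mean of four input samples). ffmpeg's scaler uses its own resampling filter and chroma siting, so its values will differ slightly; `subsample_420` is a hypothetical helper for illustration.

```python
def subsample_420(plane: list[list[int]]) -> list[list[int]]:
    """Halve a chroma plane in width and height by averaging 2x2 blocks.

    Assumes the plane has even width and height (true for 1920x1080).
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block_sum = (plane[y][x] + plane[y][x + 1]
                         + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((block_sum + 2) // 4)  # mean of 4 samples, rounded half up
        out.append(row)
    return out


cb = [[10, 20, 30, 40],
      [30, 40, 50, 60]]
print(subsample_420(cb))  # [[25, 45]]
```

Applied to a 1920x1080 Cb plane, this yields 960x540 samples, a quarter of the original count.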
We can convert the PNG image into YUV 4:2:0 with the following command:
ffmpeg -i 1080p.png -pix_fmt yuv420p 1080p.yuv
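* A .yuv file contains only the raw pixel data, without any information about resolution or subsampling, so .y4m is usually used instead, since its header carries that information. The commands below read 1080p.y4m, which can be produced the same way: `ffmpeg -i 1080p.png -pix_fmt yuv420p 1080p.y4m`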
This gives the following plane resolutions in our example:
Y - 1920x1080
Cb - 960x540
Cr - 960x540
Below is a grayscale image of each of the Y, Cb, and Cr channels, with Cb and Cr subsampled. Each image is extracted from the 4:2:0 YUV file, followed by a collage of all planes together.
Notice that almost all of the detail is concentrated in the Y plane.
Most of the blue ends up in Cb (the stripes on the clothing), and most of the red in Cr (the clothing and the face).
The Cb and Cr planes are quite flat and low in contrast.
Y`ffmpeg -i 1080p.y4m -filter_complex "extractplanes=y" y.png`
Cb`ffmpeg -i 1080p.y4m -filter_complex "extractplanes=u" u.png`
Cr`ffmpeg -i 1080p.y4m -filter_complex "extractplanes=v" v.png`
Y + Cb + Cr
This is a collage of all planes at their sizes after subsampling.`ffmpeg -i 1080p.y4m -filter_complex "[0:v]extractplanes=planes=y[y];[0:v]extractplanes=planes=u[u];[0:v]extractplanes=planes=v[v];[u][v]hstack[uv];[y][uv]vstack" i.png`
Without any compression, it is
2073600 bytes for the Y plane, and
518400 bytes each for the Cb and Cr planes:
3110400 bytes, or
2.97 MiB, for the whole image,
which is exactly half the size of the RGB image.
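The 4:2:0 size can be checked the same way. This is a sketch assuming 8-bit samples and even dimensions; `yuv420_bytes` is an illustrative helper name:

```python
def yuv420_bytes(width: int, height: int) -> int:
    """Uncompressed 8-bit 4:2:0 frame: one full-size Y plane plus two quarter-size chroma planes."""
    y = width * height
    chroma = (width // 2) * (height // 2)  # Cb and Cr are halved in both dimensions
    return y + 2 * chroma


rgb = 3 * 1920 * 1080  # 8-bit RGB: three full-resolution planes
yuv = yuv420_bytes(1920, 1080)
print(yuv, f"{yuv / 2**20:.2f} MiB", rgb / yuv)  # 3110400 2.97 MiB 2.0
```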
Now that we have done those steps, let's put the planes back together and compare the images
to see how much visual difference the RGB -> YCbCr conversion and subsampling make.
The first is the original and the second is reconstructed from YCbCr.
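Reconstruction runs the transform in reverse, after the chroma planes are scaled back up to full size. Here is a minimal per-pixel sketch, assuming BT.601 limited range; `ycbcr_to_rgb` is a hypothetical helper. Because values are rounded and clamped at each step, a round trip is not always exact, which adds a small source of loss on top of subsampling.

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Convert one 8-bit YCbCr pixel back to 8-bit RGB.

    Assumes BT.601 coefficients and limited range (Y 16-235, Cb/Cr 16-240).
    """
    yn = (y - 16) * 255 / 219     # stretch luma 16..235 to 0..255
    cbn = (cb - 128) * 255 / 224  # stretch chroma to -127.5..127.5
    crn = (cr - 128) * 255 / 224
    r = yn + 1.402 * crn
    g = yn - 0.344136 * cbn - 0.714136 * crn
    b = yn + 1.772 * cbn

    def clamp(v: float) -> int:
        return max(0, min(255, round(v)))  # round, then keep within 0..255

    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(235, 128, 128))  # (255, 255, 255)
print(ycbcr_to_rgb(81, 90, 240))    # (254, 0, 0): pure red comes back off by one
```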
Without zooming in and inspecting the images side by side, there is little to
nothing lost visually.
A closer comparison can reveal some loss of detail, best seen in the colorful parts of the image.
Without a reference, it's hard to tell that any loss of quality happened at all, while the total amount of data used for the image is guaranteed to be halved.
Furthermore, we can employ tools to measure and visualize the difference between the two images. One of them is
butteraugli, which gives us a heatmap and a score that measures how
much the images deviate.
The score for this particular image is:
This is roughly the quality difference you would see when encoding the image as JPEG at quality 90-95.
Heatmap of differences
More saturated, redder colors indicate higher distortion.
As you might have noticed from the planes, each RGB plane is detailed, while
the Cb and Cr planes are quite flat and don't contain many unique features.
That can be exploited by lossless compression.
Lossless compression eliminates redundant information, which lets us reduce the amount of data even further.
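A quick way to see why flat planes compress better is to feed a smooth synthetic plane and a noisy one to the same lossless compressor. This sketch uses Python's built-in zlib (DEFLATE, the algorithm used inside PNG) as a stand-in for zstd, since zlib ships with every Python install. The planes are made-up test data, not planes from the Peru frame.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

# Smooth plane: every row is the same gentle gradient, loosely like a flat Cb/Cr plane
smooth = bytes(x // 2 for _ in range(HEIGHT) for x in range(WIDTH))

# Detailed plane: seeded random noise, the worst case with no redundancy at all
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

for name, plane in (("smooth", smooth), ("noisy", noisy)):
    packed = zlib.compress(plane, 9)
    assert zlib.decompress(packed) == plane  # lossless: every byte comes back
    ratio = len(plane) / len(packed)
    print(f"{name}: {len(plane)} -> {len(packed)} (x{ratio:.3f})")
```

Real Cb and Cr planes are nowhere near this extreme, but the same effect is behind their higher ratios in the results below.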
Let's try to losslessly compress each of the RGB and YCbCr planes.
Each plane contains only raw data, so each RGB plane is
2073600 bytes, and the Y, Cb, and Cr planes are
2073600, 518400, and 518400 bytes respectively.
I will use 2 methods:
- Compression of the raw plane using zstd.
- Compression of a grayscale image of the plane using PNG and
optipng, which removes redundant information using an image compression standard.
ffmpeg -i 1080p.png -filter_complex "extractplanes=r" r.rgb
ffmpeg -i 1080p.png -filter_complex "extractplanes=g" g.rgb
ffmpeg -i 1080p.png -filter_complex "extractplanes=b" b.rgb
ffmpeg -i 1080p.y4m -vf "extractplanes=y" -pix_fmt gray y.yuv
ffmpeg -i 1080p.y4m -vf "extractplanes=u" -pix_fmt gray u.yuv
ffmpeg -i 1080p.y4m -vf "extractplanes=v" -pix_fmt gray v.yuv
zstd -b22 file
optipng -o5 file
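If you don't have zstd or optipng at hand, here is a rough stand-in in Python that reads the raw plane files produced above and prints results in the same format. It uses zlib instead, so its numbers will not match the zstd and optipng results below; the script name `ratios.py` and the helper `ratio_line` are hypothetical.

```python
import sys
import zlib
from pathlib import Path


def ratio_line(name: str, data: bytes, level: int = 9) -> str:
    """Compress data losslessly with zlib and format the result like the listings below."""
    packed = zlib.compress(data, level)
    return f"{name} : {len(data)} -> {len(packed)} (x{len(data) / len(packed):.3f})"


if __name__ == "__main__":
    # Usage: python ratios.py r.rgb g.rgb b.rgb y.yuv u.yuv v.yuv
    for arg in sys.argv[1:]:
        path = Path(arg)
        print(ratio_line(path.name, path.read_bytes()))
```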
RGB - zstd:
r.rgb : 2073600 -> 942755 (x2.200)
g.rgb : 2073600 -> 734736 (x2.822)
b.rgb : 2073600 -> 916753 (x2.262)
RGB - png:
r.png : 2073600 -> 724876 (x2.861)
g.png : 2073600 -> 577642 (x3.590)
b.png : 2073600 -> 700227 (x2.961)
YCbCr - zstd:
y.yuv : 2073600 -> 789617 (x2.626)
u.yuv : 518400 -> 136283 (x3.804)
v.yuv : 518400 -> 139165 (x3.725)
YCbCr - png:
y.png : 2073600 -> 620876 (x3.340)
u.png : 518400 -> 117888 (x4.397)
v.png : 518400 -> 119941 (x4.322)
What is quite interesting is that even though we had already reduced the size of Cb and Cr,
they compress even better than the Y plane, and the Y plane's compressibility
falls somewhere between that of the RGB planes for both methods.
With PNG, our small experiment ends up with
2002745 bytes for RGB and
858705 bytes for YCbCr,
which is a (x3.106) and (x7.244) reduction in size respectively, compared to the 6220800-byte uncompressed RGB image.
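The totals and reduction factors can be recomputed from the PNG numbers listed above with a few lines of Python. The plane sizes are copied from the results, and both ratios are relative to the uncompressed 8-bit RGB frame:

```python
# Sizes after PNG + optipng, copied from the results above
rgb_png = {"r": 724876, "g": 577642, "b": 700227}
ycbcr_png = {"y": 620876, "u": 117888, "v": 119941}
RAW_RGB = 3 * 1920 * 1080  # uncompressed 8-bit RGB frame: 6220800 bytes

for name, planes in (("RGB", rgb_png), ("YCbCr", ycbcr_png)):
    total = sum(planes.values())
    print(f"{name}: {total} bytes (x{RAW_RGB / total:.3f})")
# RGB: 2002745 bytes (x3.106)
# YCbCr: 858705 bytes (x7.244)
```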
The YCbCr color space is designed with human perception in mind, in a way that lets us represent and store visually important data more efficiently. As this experiment shows, it also tends to offer greater compression potential than RGB.