Direct DCT scaling to a fourth and to a half

This is an implementation of a completely new approach to a reduced-size DCT output in jidctred.c, based on ideas from Rakesh Dugad and Narendra Ahuja: "A Fast Scheme for Image Size Change in the Compressed Domain" (2001). Related ideas discussing the DFT and Hartley transform are given by Andrew B. Watson in "Ideal Shrinking and Expansion of Discrete Sequences" (1986). Further ideas are presented by A. N. Skodras and C. A. Christopoulos in "Down-Sampling of Compressed Images in the DCT Domain" (1998) and "On the Down-Scaling of Compressed Pictures" (1998). Skodras and Christopoulos present yet another approach and call the here presented (Dugad & Ahuja) approach "Frequency Masking", the old approach "Traditional".

The new jidctred.c code

The old jidctred.c code

Advantages:

There are no known disadvantages!
Since the method is equivalent to the method of Dugad and Ahuja, the visual and measured quality of the downsampled output is at least as good as common bilinear (averaging) downsampling.

Derivation of the fast 2x2 IDCT algorithm

The 2x2 point DCT and IDCT (same due to symmetry) is given by
  /  cos(pi/4)   cos(pi/4)  \     /  1/sqrt(2)   1/sqrt(2)  \                /  1   1  \
  |                         |  =  |                         |  =  1/sqrt(2)  |         |
  \  cos(pi/4)  -cos(pi/4)  /     \  1/sqrt(2)  -1/sqrt(2)  /                \  1  -1  /
This leads to a simple add/sub (sum/diff) calculation scheme. The scalar multiplication is deferred to the descaling operation at the end of the calculation. After column and row application the factor is 1/sqrt(2) * 1/sqrt(2) = 1/2. We also have to account for the different size DCT with a factor of 1/4, so the final descaling factor is 1/2 * 1/4 = 1/8 (right shift 3).

Derivation of the fast 4x4 IDCT algorithm

The 4x4 point DCT is given by
             /  C4   C4   C4   C4  \          /   1    1    1    1  \
             |                     |          |                     |
             |  C2   C6  -C6  -C2  |          |  c2   c6  -c6  -c2  |
  1/sqrt(2)  |                     |  =  1/2  |                     |
             |  C4  -C4  -C4   C4  |          |   1   -1   -1    1  |
             |                     |          |                     |
             \  C6  -C2   C2  -C6  /          \  c6  -c2   c2  -c6  /


  where  Ck = cos(k*pi/16),    ck = sqrt(2) * Ck    (note C4 = 1/sqrt(2))
We use the indexing from the 8x8 point DCT for comparison. The IDCT is the transpose of the DCT, hence
       /  1   c2   1   c6  \
       |                   |
       |  1   c6  -1  -c2  |
  1/2  |                   |
       |  1  -c6  -1   c2  |
       |                   |
       \  1  -c2   1  -c6  /
The scalar multiplication is deferred to the descaling operation at the end of the calculation. After column and row application the factor is 1/2 * 1/2 = 1/4. We also have to account for the different size DCT with a factor of 1/2, so the final descaling factor is 1/4 * 1/2 = 1/8 (right shift 3).

Now let (x0 x1 x2 x3) be our input vector, (y0 y1 y2 y3) our output. Then we have

  / y0 \     /  1   c2   1   c6  \     / x0 \     /  x0 + c2 * x1 + x2 + c6 * x3  \
  |    |     |                   |     |    |     |                               |
  | y1 |     |  1   c6  -1  -c2  |     | x1 |     |  x0 + c6 * x1 - x2 - c2 * x3  |
  |    |  =  |                   |  *  |    |  =  |                               |
  | y2 |     |  1  -c6  -1   c2  |     | x2 |     |  x0 - c6 * x1 - x2 + c2 * x3  |
  |    |     |                   |     |    |     |                               |
  \ y3 /     \  1  -c2   1  -c6  /     \ x3 /     \  x0 - c2 * x1 + x2 - c6 * x3  /
You can see that the even part (x0 x2) does not contain any multiplication.
Now let
  t10 = x0 + x2
  t12 = x0 - x2            (even part)

  t0 = c2 * x1 + c6 * x3
  t2 = c6 * x1 - c2 * x3   (odd part)
Then we have
  y0 = t10 + t0
  y3 = t10 - t0
  y1 = t12 + t2
  y2 = t12 - t2
This would require 4 multiplications. We can save 1 multiplication by simple algebraic transformation:
  t0 = (c2 - c6) * x1 + c6 * (x1 + x3)
  t2 = c6 * (x1 + x3) - (c2 + c6) * x3
This requires 3 multiplications with
  c6 = sqrt(2) * cos(6*pi/16) = 0.541196100
 [c2 = sqrt(2) * cos(2*pi/16) not needed]
  c2 + c6 = 1.847759065
  c2 - c6 = 0.765366865
This is the same 'rotation' as in the even part of the 8x8 point LL&M IDCT algorithm.

More scalings

Direct 2:3 (8:12) DCT upscaling with fast 12x12 IDCT