Consider three cases just to suggest the spectrum
of possibilities:

a) linear upsample: each output pixel is a weighted sum
of 4 input pixels

b) cubic upsample: each output pixel is a weighted sum
of 16 input pixels

c) downsample by N with box filter: each output pixel
is a weighted sum of NxN input pixels, N can be very large

Now, suppose you want to handle 8-bit input, 16-bit
input, and float input, and you want to do sRGB correction
or not.

Suppose you create a temporary buffer of float pixels, say
one scanline tall. Actually two temp buffers, one for the
input and one for the output. You decode a scanline of the
input into the temp buffer which is always linear floats. This
isolates the handling of 8/16/float and sRGB to one place
(and still allows you to make optimized 8-bit-sRGB-to-float
lookup tables). This also allows you to put wrap logic here,
explicitly wrapping, reflecting, or replicating-from-edge
pixels that would come from off-edge.

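A minimal sketch of that decode step, in Python for brevity. It covers
only the 8-bit case, and every name in it (SRGB8_TO_LINEAR, edge_index,
decode_scanline, the mode strings) is hypothetical, made up for
illustration. The transfer function is the standard sRGB decoding
formula; "clamp" stands in for replicating-from-edge.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding for a value c in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

# 256-entry lookup table for the 8-bit sRGB case (hypothetical name).
SRGB8_TO_LINEAR = [srgb_to_linear(i / 255.0) for i in range(256)]

def edge_index(i, n, mode):
    """Map a possibly off-edge index i into the valid range [0, n)."""
    if mode == "wrap":
        return i % n
    if mode == "reflect":
        period = 2 * n
        i %= period
        return i if i < n else period - 1 - i
    if mode == "clamp":  # replicate the edge pixel
        return min(max(i, 0), n - 1)
    raise ValueError(mode)

def decode_scanline(row, x0, x1, mode="clamp", srgb=True):
    """Decode 8-bit samples for x in [x0, x1) into linear floats.

    x0 and x1 may lie off-edge; the edge mode decides what is read.
    """
    n = len(row)
    out = []
    for x in range(x0, x1):
        v = row[edge_index(x, n, mode)]
        out.append(SRGB8_TO_LINEAR[v] if srgb else v / 255.0)
    return out
```

Everything after this point sees only linear floats, so the filter
code never has to know about bit depth, sRGB, or edge handling.
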
You then do whatever the appropriate weighted sums are
into the output buffer, and you move on to the next
scanline of the input.

The algorithm just described works directly for case (c).
Suppose you're downsampling by 2.5; then output scanline 0
sums from input scanlines 0, 1, and 2; output scanline 1
sums from 2,3,4; output 2 from 5,6,7; output 3 from 7,8,9.
Note how 2 & 7 get reused, but we don't have to recompute
them because we can do things in a single linear pass
through the input and output at the same time.

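The scanline bookkeeping for the 2.5x example can be sketched as
follows. The function name box_contributions is hypothetical, and the
weights assume a plain box filter where each input scanline counts in
proportion to how much of it falls inside the output scanline's
footprint.

```python
import math

def box_contributions(out_y, scale):
    """Input scanlines (and weights) summed into output scanline out_y.

    Output scanline out_y covers the input interval
    [out_y * scale, (out_y + 1) * scale).
    """
    start = out_y * scale
    end = start + scale
    first = math.floor(start)
    last = math.ceil(end) - 1
    contribs = []
    for row in range(first, last + 1):
        # Length of the part of [row, row + 1) inside [start, end).
        overlap = min(end, row + 1) - max(start, row)
        contribs.append((row, overlap / scale))
    return contribs

for y in range(4):
    print(y, [row for row, _ in box_contributions(y, 2.5)])
```

Scanlines 2 and 7 each straddle a boundary, so they show up in two
consecutive outputs with weights that split between them. Because both
outputs are adjacent, a decoded scanline only has to be held until the
next output scanline has consumed it.
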
Now, consider case (a). When upsampling, the same two input
scanlines will get sampled-from for multiple output scanlines.
So, to avoid recomputing the input scanlines, we need either
multiple input or multiple output temp buffer lines. Since
the number of output lines a given pair of input scanlines
might touch scales with the upsample amount, it makes more
sense to use two input scanline buffers. For cubic, you'll
need four scanline buffers, and in general the number of
buffers will be limited by the max filter width, which is
presumably hardcoded.

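A rough sketch of the two-input-buffer idea for the linear case. The
names linear_taps and upsample_rows are hypothetical, the pixel-center
alignment is an assumption (other conventions are possible), and
off-edge scanlines are simply replicated from the edge. The decode
argument stands for whatever converts a raw scanline to linear floats.

```python
import math

def linear_taps(out_y, scale):
    """Two input scanlines and the blend factor for output scanline out_y.

    Assumes pixel centers sit at half-integer positions.
    """
    src = (out_y + 0.5) / scale - 0.5
    y0 = math.floor(src)
    return y0, y0 + 1, src - y0

def upsample_rows(rows, scale, decode):
    """Vertical linear upsample holding at most two decoded scanlines."""
    n = len(rows)
    held = {}  # input scanline index -> decoded scanline
    out = []
    for out_y in range(int(n * scale)):
        y0, y1, t = linear_taps(out_y, scale)
        y0 = min(max(y0, 0), n - 1)  # replicate edge scanlines
        y1 = min(max(y1, 0), n - 1)
        # Evict scanlines no longer needed; y0 never moves backward.
        for y in list(held):
            if y != y0 and y != y1:
                del held[y]
        # Decode only scanlines that are not already held.
        for y in (y0, y1):
            if y not in held:
                held[y] = decode(rows[y])
        a, b = held[y0], held[y1]
        out.append([(1 - t) * p + t * q for p, q in zip(a, b)])
    return out
```

With this arrangement each input scanline is decoded once no matter
how large the upsample factor is. A cubic filter would hold four
scanlines in the same way.
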
You want to avoid memory allocations (since you're passing
in the target buffer already), so instead of using a scanline-width
temp buffer, use some fixed-width temp buffer that's W pixels,
and scale the image in vertical stripes that are that wide.
Suppose you make the temp buffers 256 wide; then an upsample
by 8 computes 256-pixel-width strips (from ~32-pixel-wide input
strips), but a downsample by 8 computes ~32-pixel-width
strips (from a 256-pixel width strip). Note this limits
the max down/upsampling to be ballpark 256x along the
horizontal axis.

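The strip arithmetic can be sketched like this. The function name
strips and its parameters are hypothetical, and the sketch ignores the
few extra input pixels a filter footprint needs on each side of a
strip, which is why the text says "~32" rather than exactly 32.

```python
import math

def strips(in_width, out_width, temp_width=256):
    """Yield (in_x0, in_x1, out_x0, out_x1) for each vertical strip.

    The larger side of each strip is temp_width pixels wide, so both
    the input and output temp buffers fit in temp_width pixels.
    """
    scale = out_width / in_width  # > 1 upsamples, < 1 downsamples
    if scale >= 1:
        out_step = temp_width
    else:
        out_step = max(1, int(temp_width * scale))
    for out_x0 in range(0, out_width, out_step):
        out_x1 = min(out_x0 + out_step, out_width)
        in_x0 = math.floor(out_x0 / scale)
        in_x1 = min(math.ceil(out_x1 / scale), in_width)
        yield in_x0, in_x1, out_x0, out_x1

# Upsample by 8: 256-wide output strips from 32-wide input strips.
print(list(strips(64, 512)))
# Downsample by 8: 32-wide output strips from 256-wide input strips.
print(list(strips(2048, 256))[0])
```

Once the downsample factor passes temp_width, a single output pixel
already needs more input pixels than the temp buffer holds, which is
where the ballpark 256x limit comes from.
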