Merge branch 'master' of https://github.com/nothings/stb
commit 43fb9942de

@@ -1,201 +0,0 @@

1.
Consider just porting this C++ public domain
library back to C:
https://code.google.com/p/imageresampler/source/browse/#svn%2Ftrunk
(recommended by @castano)

2.

Consider three cases just to suggest the spectrum
of possibilities:

a) linear upsample: each output pixel is a weighted sum
   of 4 input pixels

b) cubic upsample: each output pixel is a weighted sum
   of 16 input pixels

c) downsample by N with box filter: each output pixel
   is a weighted sum of NxN input pixels; N can be very large

Now, suppose you want to handle 8-bit input, 16-bit
input, and float input, and you want to do sRGB correction
or not.

Suppose you create a temporary buffer of float pixels, say
one scanline tall. Actually two temp buffers, one for the
input and one for the output. You decode a scanline of the
input into the temp buffer, which is always linear floats. This
isolates the handling of 8/16/float and sRGB to one place
(and still allows you to make optimized 8-bit-sRGB-to-float
lookup tables). This also allows you to put wrap logic here,
explicitly wrapping, reflecting, or replicating-from-edge
pixels that would come from off-edge.

You then do whatever the appropriate weighted sums are
into the output buffer, and you move on to the next
scanline of the input.

The algorithm just described works directly for case (c).
Suppose you're downsampling by 2.5; then output scanline 0
sums from input scanlines 0, 1, and 2; output scanline 1
sums from 2,3,4; output 2 from 5,6,7; output 3 from 7,8,9.
Note how 2 & 7 get reused, but we don't have to recompute
them because we can do things in a single linear pass
through the input and output at the same time.

Now, consider case (a). When upsampling, the same two input
scanlines will be sampled by multiple output scanlines.
So, to avoid recomputing the input scanlines, we need either
multiple input or multiple output temp buffer lines. Since
the number of output lines a given pair of input scanlines
might touch scales with the upsample amount, it makes more
sense to use two input scanline buffers. For cubic, you'll
need four scanline buffers, and in general the number of
buffers will be limited by the max filter width, which is
presumably hardcoded.

It turns out to be slightly different for two reasons:

1. When using an arbitrary filter and downsampling,
   you actually need N output buffers and 1 input buffer
   (vs. 1 output buffer and N input buffers when upsampling).

2. This approach will be very inefficient as written.
   You want to use separable filters and actually do
   separable computation: first decode an input scanline
   into a 'decode' buffer, then horizontally resample it
   into the "input" buffer (kind of a misnomer, but
   they're the inputs to the vertical resampler).

(The above approach isn't optimal for non-uniform resampling;
optimal is to do whichever axis is smaller first, but I don't
think we have to care about doing that right.)

Now, you can either:

1. malloc the temp memory
2. alloca it
3. allocate a fixed amount on the stack
4. let the user pass it in

I forbid #2 in stb libraries for portability.

If you're not allocating the output image, but rather requiring
the user to pass it in, it's probably worth trying to avoid #1
because people always want to use stb libs without any memory
allocations for various reasons. (Note that most stb libs go
crazy with memory allocations--you shouldn't use stb_image
in a console game--but I've tried to avoid it more in newer
libs.)

The way #3 would work: instead of using a scanline-width
temp buffer, use some fixed-width temp buffer that's W pixels,
and scale the image in vertical stripes that are that wide.
Suppose you make the temp buffers 256 wide; then an upsample
by 8 computes 256-pixel-width strips (from ~32-pixel-wide input
strips), but a downsample by 8 computes ~32-pixel-width
strips (from a 256-pixel-width strip). Note this limits
the max down/upsampling to ballpark 256x along the
horizontal axis.

In the following, I do #3 and allow #4 for cases where #3 is
too small, but it's not the only possibility:

Function prototypes:

the highest-level one could be:

   stb_resample_8bit(uint8_t *dest, int dest_width, int dest_height,
                     uint8_t const *src, int src_width, int src_height,
                     int channels,
                     stbr_filter filter);

the lowest-level one could be:

   stb_resample_arbitrary(void *dst, stbr_type dst_type, int dst_width, int dst_height, int dst_stride_in_bytes,
                          void const *src, stbr_type src_type, int src_width, int src_height, int src_stride_in_bytes,
                          float s0, float t0, float s1, float t1, // range of source to use, 0..1 in GPU texture-coordinate style
                          int channels,
                          int nonpremul_alpha_channel_index,
                          stbr_wrapmode wrap, // clamp, wrap, mirror
                          stbr_filter filter,
                          void *tempmem, size_t tempmem_size_in_bytes);

And there would be a bunch of convenience functions in-between those two levels.

Some notes:

s0,t0,s1,t1:
   this allows fine subpixel-positioning and subpixel-resizing in an explicit way without
   things having to be exact pixel multiples. it allows people to pseudo-stream
   images by computing "tiles" of images a bit at a time without forcing those
   tiles to quantize their source data.

nonpremul_alpha_channel_index:
   if this is negative, no channels are processed specially.
   if this is non-negative, then it's the index of the alpha channel,
   and the image should be treated as non-premultiplied alpha that
   needs to be resampled accounting for this (weight the sampling
   by the alpha channel, i.e. premultiply, filter, unpremultiply).
   this mechanism only allows one alpha channel and ALL channels
   are scaled by it; an alternative would be to find some way to
   pass in which channels serve as alpha channels for which other
   channels, but eh.

tempmem, tempmem_size:
   all functions will need tempmem, but they can allocate a fixed tempmem buffer
   on the stack. providing an API that allows overriding the amount of tempmem
   available allows people to process arbitrarily large images. the return
   value for the function could be 0 on success, or on failure a non-0 value
   giving the size of tempmem needed.

src_stride, dest_stride:
   the stride variables are signed to allow you to describe both traditional
   top-to-bottom images (pass in a pointer to the top-left pixel and
   a positive stride) and bottom-to-top images (pass in a pointer to
   the bottom-left pixel and a negative stride)

ordering of src & dest:
   put these in whatever order you like; I just chose one arbitrarily

width & height:
   these are ints, not unsigned ints or size_ts, because I personally forbid
   unsigned variables for almost everything to avoid signed/unsigned comparison
   issues, but this is a matter of personal taste and you can do differently

Intermediate-level functions should be provided for each source type, paired with
the same dest type, so that the code is typesafe; only when people fall back to
stb_resample_arbitrary should they be at risk of type unsafety. (One way to avoid
an explosion of functions covering every possible *combination* of types in a
type-safe way would be to define one function for each input type that accepts
three separate output pointers, one for each type, only one of which can be
non-NULL. Nine per-combination functions isn't that bad, but if you want to have
three or four intermediate-level functions with fewer parameters, 9*4 gets silly.
You could also use the same trick for stb_resample_arbitrary, replacing it with
three typesafe functions.)

Reference:

Cubic sampling function for separable cubic:

   f(x) = (a+2)*|x|^3 - (a+3)*|x|^2 + 1          for 0 <= |x| <= 1
   f(x) = a*|x|^3 - 5*a*|x|^2 + 8*a*|x| - 4*a    for 1 < |x| <= 2
   f(x) = 0                                      otherwise

"a" is configurable; try -1/2 (from http://pixinsight.com/forum/index.php?topic=556.0 )

Wish list:
   s0, t0, s1, t1 vs scale_x, scale_y, offset_x, offset_y - What's the best interface?
   Separate wrap modes and filter modes per axis
   Alpha test coverage respecting resize (FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage: https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp)
   Installable filter kernels

@@ -31,13 +31,9 @@
 ADDITIONAL DOCUMENTATION
 
 SRGB & FLOATING POINT REPRESENTATION
-Some srgb-related code in this library relies on floats being 32-bit
-IEEE floating point, and relies on a specific bitpacking order of C
-bitfields. If you are on a system that uses non-IEEE floats or packs
-C bitfields in the opposite order, then you can use a slower fallback
-codepath by defining STBIR_NON_IEEE_FLOAT. (We didn't make this choice
-idly; using mostly-but-not-100%-portable-code for this is a massive
-speedup, especially upsampling where colorspace conversion dominates.)
+The sRGB functions presume IEEE floating point. If you do not have
+IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
+a slower implementation.
 
 MEMORY ALLOCATION
 The resize functions here perform a single memory allocation using

@@ -655,12 +651,6 @@ typedef union
 {
     stbir_uint32 u;
     float f;
-    struct
-    {
-        stbir_uint32 Mantissa : 23;
-        stbir_uint32 Exponent : 8;
-        stbir_uint32 Sign : 1;
-    };
 } stbir__FP32;
 
 static const stbir_uint32 fp32_to_srgb8_tab4[104] = {
