Merge branch 'master' of https://github.com/nothings/stb

2014-09-23 17:22:52 -07:00
parent 8252a94f02 891f6d7720
commit 43fb9942de
2 changed files with 3 additions and 214 deletions
--- a/docs/stb_resample_ideas.txt
+++ b/docs/stb_resample_ideas.txt
@@ -1,201 +0,0 @@
 1.
 Consider just porting this C++ public domain
 library back to C:
    https://code.google.com/p/imageresampler/source/browse/#svn%2Ftrunk
 (recommended by @castano)
 2.
 Consider three cases just to suggest the spectrum
 of possibilities:
 a) linear upsample: each output pixel is a weighted sum
 of 4 input pixels
 b) cubic upsample: each output pixel is a weighted sum
 of 16 input pixels
 c) downsample by N with box filter: each output pixel
 is a weighted sum of NxN input pixels, N can be very large
 Now, suppose you want to handle 8-bit input, 16-bit
 input, and float input, and you want to do sRGB correction
 or not.
 Suppose you create a temporary buffer of float pixels, say
 one scanline tall. Actually two temp buffers, one for the
 input and one for the output. You decode a scanline of the
 input into the temp buffer which is always linear floats. This
 isolates the handling of 8/16/float and sRGB to one place
 (and still allows you to make optimized 8-bit-sRGB-to-float
 lookup tables). This also allows you to put wrap logic here,
 explicitly wrapping, reflecting, or replicating-from-edge
 pixels that would come from off-edge.
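The decode step described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `edge_mode`, `edge_index`, and `decode_scanline_u8` are hypothetical names, the scanline is assumed to be single-channel 8-bit, and the conversion is a plain divide by 255 (an sRGB lookup table would slot in at the marked line).

```c
#include <assert.h>

/* Hypothetical edge modes, mirroring "replicate-from-edge, wrap, reflect". */
typedef enum { EDGE_CLAMP, EDGE_WRAP, EDGE_REFLECT } edge_mode;

/* Map a possibly off-edge pixel index i onto a valid index in [0, n). */
static int edge_index(int i, int n, edge_mode mode)
{
    int m;
    assert(n > 0);
    switch (mode) {
    case EDGE_WRAP:
        m = i % n;
        return m < 0 ? m + n : m;
    case EDGE_REFLECT:
        /* Mirror with the edge pixel repeated: -1 -> 0, -2 -> 1, n -> n-1. */
        m = i % (2 * n);
        if (m < 0) m += 2 * n;
        return m < n ? m : 2 * n - 1 - m;
    case EDGE_CLAMP:
    default:
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    }
}

/* Decode one 8-bit scanline into floats, with `margin` extra pixels on each
   side filled according to the edge mode. dst must hold width + 2*margin
   floats. Assumption: linear data, so no sRGB correction is applied. */
static void decode_scanline_u8(float *dst, const unsigned char *src,
                               int width, int margin, edge_mode mode)
{
    int x;
    for (x = -margin; x < width + margin; ++x)
        dst[x + margin] = src[edge_index(x, width, mode)] / 255.0f; /* LUT would go here */
}
```

Because the margins are filled during decode, the filtering loops that follow never need to test for off-edge reads.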
 You then do whatever the appropriate weighted sums are
 into the output buffer, and you move on to the next
 scanline of the input.
 The algorithm just described works directly for case (c).
 Suppose you're downsampling by 2.5; then output scanline 0
 sums from input scanlines 0, 1, and 2; output scanline 1
 sums from 2,3,4; output 2 from 5,6,7; output 3 from 7,8,9.
 Note how 2 & 7 get reused, but we don't have to recompute
 them because we can do things in a single linear pass
 through the input and output at the same time.
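The scanline bookkeeping in the downsample-by-2.5 example can be sketched as below. This assumes a box filter whose footprint for output row i is [i*scale, (i+1)*scale) in input rows; `row_range` and `input_rows_for_output` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical result type: inclusive range of input scanlines. */
typedef struct { int first, last; } row_range;

/* Which input scanlines contribute to a given output scanline when
   downsampling by `scale` with a box filter. */
static row_range input_rows_for_output(int out_row, double scale)
{
    double start, end;
    int end_floor;
    row_range r;

    assert(out_row >= 0 && scale >= 1.0);

    start = out_row * scale;          /* footprint start, in input rows */
    end   = (out_row + 1) * scale;    /* footprint end (exclusive)      */

    end_floor = (int) end;
    r.first = (int) start;            /* floor(start), since start >= 0 */
    /* ceil(end) - 1: the last row that overlaps the footprint */
    r.last  = (end_floor < end) ? end_floor : end_floor - 1;
    return r;
}
```

For scale 2.5 this yields rows 0..2, 2..4, 5..7, and 7..9 for output rows 0 through 3, matching the ranges listed above. Both `first` and `last` never decrease as the output row advances, which is what makes the single linear pass possible.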
 Now, consider case (a). When upsampling, the same two input
 scanlines will get sampled-from for multiple output scanlines.
 So, to avoid recomputing the input scanlines, we need either
 multiple input or multiple output temp buffer lines. Since
 the number of output lines a given pair of input scanlines
 might touch scales with the upsample amount, it makes more
 sense to use two input scanline buffers. For cubic, you'll
 need four scanline buffers, and in general the number of
 buffers will be limited by the max filter width, which is
 presumably hardcoded.
 It turns out to be slightly different for two reasons:
   1. when using an arbitrary filter and downsampling,
      you actually need N output buffers and 1 input buffer
      (vs 1 output buffer and N input buffers upsampling)
   2. this approach will be very inefficient as written.
      you want to use separable filters and actually do
      separable computation: first decode an input scanline
      into a 'decode' buffer, then horizontally resample it
      into the "input" buffer (kind of a misnomer, but
      they're the inputs to the vertical resampler)
 (The above approach isn't optimal for non-uniform resampling;
 optimal is to do whichever axis is smaller first, but I don't
 think we have to care about doing that right.)
 Now, you can either:
    1. malloc the temp memory
    2. alloca it
    3. allocate a fixed amount on the stack
    4. let the user pass it in
 I forbid #2 in stb libraries for portability.
 If you're not allocating the output image, but rather requiring
 the user to pass it in, it's probably worth trying to avoid #1
 because people always want to use stb libs without any memory
 allocations for various reasons. (Note that most stb libs go
 crazy with memory allocations--you shouldn't use stb_image
 in a console game--but I've tried to avoid it more in newer
 libs.)
 The way #3 would work is instead of using a scanline-width
 temp buffer, use some fixed-width temp buffer that's W pixels,
 and scale the image in vertical stripes that are that wide.
 Suppose you make the temp buffers 256 wide; then an upsample
 by 8 computes 256-pixel-width strips (from ~32-pixel-wide input
 strips), but a downsample by 8 computes ~32-pixel-width
 strips (from a 256-pixel width strip). Note this limits
 the max down/upsampling to be ballpark 256x along the
 horizontal axis.
 In the following, I do #3 and allow #4 for cases where #3 is
 too small, but it's not the only possibility:
 Function prototypes:
 the highest-level one could be:
   stb_resample_8bit(uint8_t       *dest, int dest_width, int dest_height,
                     uint8_t const *src , int  src_width, int  src_height,
                     int channels,
                     stbr_filter filter);
 the lowest-level one could be:
   stb_resample_arbitrary(void       *dst, stbr_type dst_type, int dst_width, int dst_height, int dst_stride_in_bytes,
                          void const *src, stbr_type src_type, int src_width, int src_height, int src_stride_in_bytes,
                          float s0, float t0, float s1, float t1, // range of source to use, 0..1 in GPU texture-coordinate style
                          int channels,
                          int nonpremul_alpha_channel_index,
                          stbr_wrapmode wrap,                     // clamp, wrap, mirror
                          stbr_filter filter,
                          void  *tempmem, size_t tempmem_size_in_bytes);
 And there would be a bunch of convenience functions in-between those two levels.
 Some notes:
   s0,t0,s1,t1:
       this allows fine subpixel-positioning and subpixel-resizing in an explicit way without
           things having to be exact pixel multiples. it allows people to pseudo-stream
           images by computing "tiles" of images a bit at a time without forcing those
           tiles to quantize their source data.
   nonpremul_alpha_channel_index:
       if this is negative, no channels are processed specially
       if this is non-negative, then it's the index of the alpha channel,
           and the image should be treated as non-premultiplied alpha that
           needs to be resampled accounting for this (weight the sampling
           by the alpha channel, i.e. premultiply, filter, unpremultiply).
           this mechanism only allows one alpha channel and ALL channels 
           are scaled by it; an alternative would be to find some way to
           pass in which channels serve as alpha channels for which other
           channels, but eh.
   tempmem, tempmem_size:
       all functions will need tempmem, but they can allocate a fixed tempmem buffer
           on the stack. providing an API that allows overriding the amount of tempmem
           available allows people to process arbitrarily large images. the return
           value for the function could be 0 on success or non-0 being the size of
           tempmem needed.
   src_stride, dest_stride:
       the stride variables are signed to allow you to describe both traditional
           top-to-bottom images (pass in a pointer to the top-left pixel and
           a positive stride) and bottom-to-top images (pass in a pointer to
           the bottom-left pixel and a negative stride)
   ordering of src & dest:
       put these in whatever order you like, i just chose one arbitrarily
   width & height
       these are ints not unsigned ints or size_ts because i personally forbid
           unsigned variables for almost everything to avoid signed/unsigned comparison
           issues, but this is a matter of personal taste and you can do differently
   Intermediate-level functions should be provided for each source type & same dest type
   so that the code is typesafe; only when people fall back to stb_resample_arbitrary should
   they be at risk for type unsafety. (One way to avoid an explosion of functions for
   every possible *combination* of types in a type-safe way would be to define one function
   for each input type, and accept three separate output pointers, one for each type, only
   one of which can be non-NULL. 9 functions isn't that bad, but if you want to have three
   or four intermediate-level functions with fewer parameters, 9*4 gets silly. Could also
   use the same trick for stb_resample_arbitrary, replacing it with three typesafe functions.)
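The "premultiply, filter, unpremultiply" handling described under nonpremul_alpha_channel_index can be sketched as follows. This is only an illustration under stated assumptions: the filter is a plain box average, pixels are already decoded to floats, and `average_nonpremul` is a hypothetical helper, not a proposed API.

```c
#include <assert.h>

/* Average `count` interleaved float pixels into `out`.
   If alpha_index >= 0, every other channel is weighted by that alpha
   channel (premultiply, filter, unpremultiply); if it is negative, all
   channels are averaged independently. */
static void average_nonpremul(float *out, const float *pixels,
                              int count, int channels, int alpha_index)
{
    int i, c;
    assert(count > 0 && channels > 0 && alpha_index < channels);

    for (c = 0; c < channels; ++c)
        out[c] = 0.0f;

    for (i = 0; i < count; ++i) {
        const float *p = pixels + i * channels;
        float a = (alpha_index >= 0) ? p[alpha_index] : 1.0f;
        for (c = 0; c < channels; ++c)
            out[c] += (c == alpha_index) ? p[c] : p[c] * a;  /* premultiply */
    }

    for (c = 0; c < channels; ++c)
        out[c] /= (float) count;                             /* box filter */

    if (alpha_index >= 0 && out[alpha_index] > 0.0f) {
        float inv = 1.0f / out[alpha_index];
        for (c = 0; c < channels; ++c)
            if (c != alpha_index)
                out[c] *= inv;                               /* unpremultiply */
    }
}
```

Averaging an opaque red pixel with a fully transparent green one gives pure red at half alpha, so the colour of invisible pixels does not bleed into the result. With a negative index the same input averages to a muddy half-red, half-green.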
 Reference:
 Cubic sampling function for separable cubic:
   f(x) = (a+2)*x^3 - (a+3)*x^2 + 1       for 0 <= x <= 1
   f(x) = a*x^3 - 5*a*x^2 + 8*a*x - 4*a   for 1 < x <= 2
   f(x) = 0                               otherwise
   "a" is configurable, try -1/2 (from http://pixinsight.com/forum/index.php?topic=556.0 )
 Wish list:
   s0, t0, s1, t1 vs scale_x, scale_y, offset_x, offset_y - What's the best interface?
   Separate wrap modes and filter modes per axis
   Alpha test coverage respecting resize (FloatImage::alphaTestCoverage and FloatImage::scaleAlphaToCoverage: https://code.google.com/p/nvidia-texture-tools/source/browse/trunk/src/nvimage/FloatImage.cpp)
   Installable filter kernels
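The cubic sampling function from the Reference section translates to C fairly directly. The sketch below assumes the kernel is symmetric (so it is evaluated on |x|) and adds a hypothetical `cubic_taps` helper that evaluates the four weights for one axis; neither name comes from the notes.

```c
#include <assert.h>

/* f(x) from the Reference section, evaluated on |x|. */
static float cubic_weight(float x, float a)
{
    if (x < 0.0f) x = -x;   /* assumption: kernel is symmetric about 0 */
    if (x <= 1.0f)
        return (a + 2.0f) * x * x * x - (a + 3.0f) * x * x + 1.0f;
    if (x <= 2.0f)
        return a * x * x * x - 5.0f * a * x * x + 8.0f * a * x - 4.0f * a;
    return 0.0f;
}

/* Weights of the four neighbouring samples along one axis, for a sample
   point lying a fraction t in [0,1) past the second of the four. */
static void cubic_taps(float t, float a, float w[4])
{
    assert(t >= 0.0f && t < 1.0f);
    w[0] = cubic_weight(1.0f + t, a);
    w[1] = cubic_weight(t, a);
    w[2] = cubic_weight(1.0f - t, a);
    w[3] = cubic_weight(2.0f - t, a);
}
```

With a = -1/2 and t = 0.5 the four weights are -0.0625, 0.5625, 0.5625, -0.0625, which sum to 1. Applying the taps once per axis gives the 16-pixel weighted sum of case (b).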
--- a/stb_image_resize.h
+++ b/stb_image_resize.h
@@ -31,13 +31,9 @@
   ADDITIONAL DOCUMENTATION
      SRGB & FLOATING POINT REPRESENTATION
-         Some srgb-related code in this library relies on floats being 32-bit
-         IEEE floating point, and relies on a specific bitpacking order of C
-         bitfields. If you are on a system that uses non-IEEE floats or packs
-         C bitfields in the opposite order, then you can use a slower fallback
-         codepath by defining STBIR_NON_IEEE_FLOAT. (We didn't make this choice
-         idly; using mostly-but-not-100%-portable-code for this is a massive
-         speedup, especially upsampling where colorspace conversion dominates.)
+         The sRGB functions presume IEEE floating point. If you do not have
+         IEEE floating point, define STBIR_NON_IEEE_FLOAT. This will use
+         a slower implementation.
      MEMORY ALLOCATION
         The resize functions here perform a single memory allocation using
@@ -655,12 +651,6 @@ typedef union
 {
    stbir_uint32 u;
    float f;
-   struct
-   {
-       stbir_uint32 Mantissa : 23;
-       stbir_uint32 Exponent : 8;
-       stbir_uint32 Sign : 1;
-   };
 } stbir__FP32;
 static const stbir_uint32 fp32_to_srgb8_tab4[104] = {