The original AC decoding logic handled ZRL (runs of 16 zeros)
incorrectly.
The problem is that the original flow set r=16 and skipped the
final coeff write when s=0. This is not actually correct. The
problem is the intervening "refinement" bits.
With the original logic, even once we decrement r to 0, we keep
reading more refinement bits for subsequent coefficients until
we find the next currently-unsent AC in the current scan. That is,
it works as if it was trying to place 17 new AC values, and only
bails at the last minute from actually setting that 17th value.
This is wrong. Once we've found the 16th previously-unsent AC, we
need to stop reading refinement bits, otherwise we get out of sync
with the bit stream (which expects us to read a huffman code next).
The easiest fix is to just do what the JPEG standard implicitly
assumes anyway: treat ZRL as a run of 15 zeros followed by an
explicit magnitude-zero AC coeff. (That is, leave s=0 and actually
write s). So this is what this fix does.
GCC 4.7 gave the warning "signed and unsigned type in conditional
expression" because the ternary operator mixes signed and unsigned
integers. Fixed by casting to unsigned inside the "if" branch instead
of casting the result of the entire conditional.
This fixes two things. First, the logic to disable SSE2 on
GCC unless "-msse2" was not specific enough, and ended up
disabling SIMD support on NEON targets entirely. Shuffle
the detection logic around to make that bit x86-specific.
Second, 32-bit MinGW assumes 16-byte aligned stacks, but this is
not in the Windows ABI and hence DLLs and callbacks don't
necessarily provide it. This caused a crash.
This can be fixed by providing the right command-line option,
which we have no control over. As a compromise, disable the SSE2
path on MinGW unless a specific #define explained in the comments
is set. That way, we default to safe (never-crashing) behavior
unless the user explicitly signals they know what they're doing.
add STBI__ prefixes to internal SCAN_ enum;
strip unused function arguments for progressive funcs;
tweak release notes;
forget to git commit frequently so these would all be in their own commits;
support using inline asm for cpuid
YCbCr:
switch SSE code to constants that match old C;
create C version that is same as SSE;
tiny optimization(?) of SSE