1/* stb_image - v2.30 - public domain image loader - http://nothings.org/stb
   2                                  no warranty implied; use at your own risk
   3
   4   Do this:
   5      #define STB_IMAGE_IMPLEMENTATION
   6   before you include this file in *one* C or C++ file to create the implementation.
   7
   8   // i.e. it should look like this:
   9   #include ...
  10   #include ...
  11   #include ...
  12   #define STB_IMAGE_IMPLEMENTATION
  13   #include "stb_image.h"
  14
  15   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
  16   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
  17
  18
  19   QUICK NOTES:
  20      Primarily of interest to game developers and other people who can
  21          avoid problematic images and only need the trivial interface
  22
  23      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
  24      PNG 1/2/4/8/16-bit-per-channel
  25
  26      TGA (not sure what subset, if a subset)
  27      BMP non-1bpp, non-RLE
  28      PSD (composited view only, no extra channels, 8/16 bit-per-channel)
  29
  30      GIF (*comp always reports as 4-channel)
  31      HDR (radiance rgbE format)
  32      PIC (Softimage PIC)
  33      PNM (PPM and PGM binary only)
  34
  35      Animated GIF still needs a proper API, but here's one way to do it:
  36          http://gist.github.com/urraka/685d9a6340b26b830d49
  37
  38      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
  39      - decode from arbitrary I/O callbacks
  40      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
  41
  42   Full documentation under "DOCUMENTATION" below.
  43
  44
  45LICENSE
  46
  47  See end of file for license information.
  48
  49RECENT REVISION HISTORY:
  50
  51      2.30  (2024-05-31) avoid erroneous gcc warning
  52      2.29  (2023-05-xx) optimizations
  53      2.28  (2023-01-29) many error fixes, security errors, just tons of stuff
  54      2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
  55      2.26  (2020-07-13) many minor fixes
  56      2.25  (2020-02-02) fix warnings
  57      2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
  58      2.23  (2019-08-11) fix clang static analysis warning
  59      2.22  (2019-03-04) gif fixes, fix warnings
  60      2.21  (2019-02-25) fix typo in comment
  61      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
  62      2.19  (2018-02-11) fix warning
  63      2.18  (2018-01-30) fix warnings
  64      2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
  65      2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
  66      2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
  67      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
  68      2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
  69      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
  70      2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
  71                         RGB-format JPEG; remove white matting in PSD;
  72                         allocate large structures on the stack;
  73                         correct channel count for PNG & BMP
  74      2.10  (2016-01-22) avoid warning introduced in 2.09
  75      2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
  76
  77   See end of file for full revision history.
  78
  79
  80 ============================    Contributors    =========================
  81
  82 Image formats                          Extensions, features
  83    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
  84    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
  85    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
  86    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
  87    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
  88    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
  89    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
  90    github:urraka (animated gif)           Junggon Kim (PNM comments)
  91    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
  92                                           socks-the-fox (16-bit PNG)
  93                                           Jeremy Sawicki (handle all ImageNet JPGs)
  94 Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
  95    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
  96    Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
  97    John-Mark Allen
  98    Carmelo J Fdez-Aguera
  99
 100 Bug & warning fixes
 101    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
 102    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
 103    Phil Jordan                                Dave Moore           Roy Eltham
 104    Hayaki Saito            Nathan Reed        Won Chun
 105    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
 106    Thomas Ruf              Ronny Chevalier                         github:rlyeh
 107    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
 108    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
 109    Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
 110    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
 111    Cass Everitt            Ryamond Barbiero                        github:grim210
 112    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
 113    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
 114    Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
 115    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
 116    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
 117                            Brad Weinberger    Matvey Cherevko      github:mosra
 118    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
 119    Ryan C. Gordon          [reserved]                              [reserved]
 120                     DO NOT ADD YOUR NAME HERE
 121
 122                     Jacko Dirks
 123
 124  To add your name to the credits, pick a random blank space in the middle and fill it.
 125  80% of merge conflicts on stb PRs are due to people adding their name at the end
 126  of the credits.
 127*/
 128
 129#ifndef STBI_INCLUDE_STB_IMAGE_H
 130#define STBI_INCLUDE_STB_IMAGE_H
 131
 132// DOCUMENTATION
 133//
 134// Limitations:
 135//    - no 12-bit-per-channel JPEG
 136//    - no JPEGs with arithmetic coding
 137//    - GIF always returns *comp=4
 138//
 139// Basic usage (see HDR discussion below for HDR usage):
 140//    int x,y,n;
 141//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
 142//    // ... process data if not NULL ...
 143//    // ... x = width, y = height, n = # 8-bit components per pixel ...
 144//    // ... replace '0' with '1'..'4' to force that many components per pixel
 145//    // ... but 'n' will always be the number that it would have been if you said 0
 146//    stbi_image_free(data);
 147//
 148// Standard parameters:
 149//    int *x                 -- outputs image width in pixels
 150//    int *y                 -- outputs image height in pixels
 151//    int *channels_in_file  -- outputs # of image components in image file
 152//    int desired_channels   -- if non-zero, # of image components requested in result
 153//
 154// The return value from an image loader is an 'unsigned char *' which points
 155// to the pixel data, or NULL on an allocation failure or if the image is
 156// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
 157// with each pixel consisting of N interleaved 8-bit components; the first
 158// pixel pointed to is top-left-most in the image. There is no padding between
 159// image scanlines or between pixels, regardless of format. The number of
 160// components N is 'desired_channels' if desired_channels is non-zero, or
 161// *channels_in_file otherwise. If desired_channels is non-zero,
 162// *channels_in_file has the number of components that _would_ have been
 163// output otherwise. E.g. if you set desired_channels to 4, you will always
 164// get RGBA output, but you can check *channels_in_file to see if it's trivially
 165// opaque because e.g. there were only 3 channels in the source image.
 166//
 167// An output image with N components has the following components interleaved
 168// in this order in each pixel:
 169//
 170//     N=#comp     components
 171//       1           grey
 172//       2           grey, alpha
 173//       3           red, green, blue
 174//       4           red, green, blue, alpha
 175//
 176// If image loading fails for any reason, the return value will be NULL,
 177// and *x, *y, *channels_in_file will be unchanged. The function
 178// stbi_failure_reason() can be queried for an extremely brief, end-user
 179// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
 180// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
 181// more user-friendly ones.
 182//
 183// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
 184//
 185// To query the width, height and component count of an image without having to
 186// decode the full file, you can use the stbi_info family of functions:
 187//
 188//   int x,y,n,ok;
 189//   ok = stbi_info(filename, &x, &y, &n);
 190//   // returns ok=1 and sets x, y, n if image is a supported format,
 191//   // 0 otherwise.
 192//
 193// Note that stb_image pervasively uses ints in its public API for sizes,
 194// including sizes of memory buffers. This is now part of the API and thus
 195// hard to change without causing breakage. As a result, the various image
 196// loaders all have certain limits on image size; these differ somewhat
 197// by format but generally boil down to either just under 2GB or just under
 198// 1GB. When the decoded image would be larger than this, stb_image decoding
 199// will fail.
 200//
 201// Additionally, stb_image will reject image files that have any of their
 202// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
 203// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
 204// the only way to have an image with such dimensions load correctly
 205// is for it to have a rather extreme aspect ratio. Either way, the
 206// assumption here is that such larger images are likely to be malformed
 207// or malicious. If you do need to load an image with individual dimensions
 208// larger than that, and it still fits in the overall size limit, you can
 209// #define STBI_MAX_DIMENSIONS on your own to be something larger.
 210//
 211// ===========================================================================
 212//
 213// UNICODE:
 214//
 215//   If compiling for Windows and you wish to use Unicode filenames, compile
 216//   with
 217//       #define STBI_WINDOWS_UTF8
 218//   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
 219//   Windows wchar_t filenames to utf8.
 220//
 221// ===========================================================================
 222//
 223// Philosophy
 224//
 225// stb libraries are designed with the following priorities:
 226//
 227//    1. easy to use
 228//    2. easy to maintain
 229//    3. good performance
 230//
 231// Sometimes I let "good performance" creep up in priority over "easy to maintain",
 232// and for best performance I may provide less-easy-to-use APIs that give higher
 233// performance, in addition to the easy-to-use ones. Nevertheless, it's important
 234// to keep in mind that from the standpoint of you, a client of this library,
 235// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
 236//
 237// Some secondary priorities arise directly from the first two, some of which
 238// provide more explicit reasons why performance can't be emphasized.
 239//
 240//    - Portable ("ease of use")
 241//    - Small source code footprint ("easy to maintain")
 242//    - No dependencies ("ease of use")
 243//
 244// ===========================================================================
 245//
 246// I/O callbacks
 247//
 248// I/O callbacks allow you to read from arbitrary sources, like packaged
 249// files or some other source. Data read from callbacks are processed
 250// through a small internal buffer (currently 128 bytes) to try to reduce
 251// overhead.
 252//
 253// The three functions you must define are "read" (reads some bytes of data),
 254// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
 255//
 256// ===========================================================================
 257//
 258// SIMD support
 259//
 260// The JPEG decoder will try to automatically use SIMD kernels on x86 when
 261// supported by the compiler. For ARM Neon support, you must explicitly
 262// request it.
 263//
 264// (The old do-it-yourself SIMD API is no longer supported in the current
 265// code.)
 266//
 267// On x86, SSE2 will automatically be used when available based on a run-time
 268// test; if not, the generic C versions are used as a fall-back. On ARM targets,
 269// the typical path is to have separate builds for NEON and non-NEON devices
 270// (at least this is true for iOS and Android). Therefore, the NEON support is
 271// toggled by a build flag: define STBI_NEON to get NEON loops.
 272//
 273// If for some reason you do not want to use any of SIMD code, or if
 274// you have issues compiling it, you can disable it entirely by
 275// defining STBI_NO_SIMD.
 276//
 277// ===========================================================================
 278//
 279// HDR image support   (disable by defining STBI_NO_HDR)
 280//
 281// stb_image supports loading HDR images in general, and currently the Radiance
 282// .HDR file format specifically. You can still load any file through the existing
 283// interface; if you attempt to load an HDR file, it will be automatically remapped
 284// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
 285// both of these constants can be reconfigured through this interface:
 286//
 287//     stbi_hdr_to_ldr_gamma(2.2f);
 288//     stbi_hdr_to_ldr_scale(1.0f);
 289//
 290// (note, do not use _inverse_ constants; stbi_image will invert them
 291// appropriately).
 292//
 293// Additionally, there is a new, parallel interface for loading files as
 294// (linear) floats to preserve the full dynamic range:
 295//
 296//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
 297//
 298// If you load LDR images through this interface, those images will
 299// be promoted to floating point values, run through the inverse of
 300// constants corresponding to the above:
 301//
 302//     stbi_ldr_to_hdr_scale(1.0f);
 303//     stbi_ldr_to_hdr_gamma(2.2f);
 304//
 305// Finally, given a filename (or an open file or memory block--see header
 306// file for details) containing image data, you can query for the "most
 307// appropriate" interface to use (that is, whether the image is HDR or
 308// not), using:
 309//
 310//     stbi_is_hdr(char *filename);
 311//
 312// ===========================================================================
 313//
 314// iPhone PNG support:
 315//
 316// We optionally support converting iPhone-formatted PNGs (which store
 317// premultiplied BGRA) back to RGB, even though they're internally encoded
 318// differently. To enable this conversion, call
 319// stbi_convert_iphone_png_to_rgb(1).
 320//
 321// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
 322// pixel to remove any premultiplied alpha *only* if the image file explicitly
 323// says there's premultiplied data (currently only happens in iPhone images,
 324// and only if iPhone convert-to-rgb processing is on).
 325//
 326// ===========================================================================
 327//
 328// ADDITIONAL CONFIGURATION
 329//
 330//  - You can suppress implementation of any of the decoders to reduce
 331//    your code footprint by #defining one or more of the following
 332//    symbols before creating the implementation.
 333//
 334//        STBI_NO_JPEG
 335//        STBI_NO_PNG
 336//        STBI_NO_BMP
 337//        STBI_NO_PSD
 338//        STBI_NO_TGA
 339//        STBI_NO_GIF
 340//        STBI_NO_HDR
 341//        STBI_NO_PIC
 342//        STBI_NO_PNM   (.ppm and .pgm)
 343//
 344//  - You can request *only* certain decoders and suppress all other ones
 345//    (this will be more forward-compatible, as addition of new decoders
 346//    doesn't require you to disable them explicitly):
 347//
 348//        STBI_ONLY_JPEG
 349//        STBI_ONLY_PNG
 350//        STBI_ONLY_BMP
 351//        STBI_ONLY_PSD
 352//        STBI_ONLY_TGA
 353//        STBI_ONLY_GIF
 354//        STBI_ONLY_HDR
 355//        STBI_ONLY_PIC
 356//        STBI_ONLY_PNM   (.ppm and .pgm)
 357//
 358//   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
 359//     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
 360//
 361//  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
 362//    than that size (in either width or height) without further processing.
 363//    This is to let programs in the wild set an upper bound to prevent
 364//    denial-of-service attacks on untrusted data, as one could generate a
 365//    valid image of gigantic dimensions and force stb_image to allocate a
 366//    huge block of memory and spend disproportionate time decoding it. By
 367//    default this is set to (1 << 24), which is 16777216, but that's still
 368//    very big.
 369
 370#ifndef STBI_NO_STDIO
 371#include <stdio.h>
 372#endif // STBI_NO_STDIO
 373
 374#define STBI_VERSION 1
 375
 376enum
 377{
 378   STBI_default = 0, // only used for desired_channels
 379
 380   STBI_grey       = 1,
 381   STBI_grey_alpha = 2,
 382   STBI_rgb        = 3,
 383   STBI_rgb_alpha  = 4
 384};
 385
 386#include <stdlib.h>
 387typedef unsigned char stbi_uc;
 388typedef unsigned short stbi_us;
 389
 390#ifdef __cplusplus
 391extern "C" {
 392#endif
 393
 394#ifndef STBIDEF
 395#ifdef STB_IMAGE_STATIC
 396#define STBIDEF static
 397#else
 398#define STBIDEF extern
 399#endif
 400#endif
 401
 402//////////////////////////////////////////////////////////////////////////////
 403//
 404// PRIMARY API - works on images of any type
 405//
 406
 407//
 408// load image by filename, open file, or memory buffer
 409//
 410
 411typedef struct
 412{
 413   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
 414   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
 415   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
 416} stbi_io_callbacks;
 417
 418////////////////////////////////////
 419//
 420// 8-bits-per-channel interface
 421//
 422
 423STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
 424STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 425
 426#ifndef STBI_NO_STDIO
 427STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 428STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 429// for stbi_load_from_file, file pointer is left pointing immediately after image
 430#endif
 431
 432#ifndef STBI_NO_GIF
 433STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 434#endif
 435
 436#ifdef STBI_WINDOWS_UTF8
 437STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
 438#endif
 439
 440////////////////////////////////////
 441//
 442// 16-bits-per-channel interface
 443//
 444
 445STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 446STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 447
 448#ifndef STBI_NO_STDIO
 449STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 450STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 451#endif
 452
 453////////////////////////////////////
 454//
 455// float-per-channel interface
 456//
 457#ifndef STBI_NO_LINEAR
 458   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 459   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
 460
 461   #ifndef STBI_NO_STDIO
 462   STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 463   STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 464   #endif
 465#endif
 466
 467#ifndef STBI_NO_HDR
 468   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
 469   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
 470#endif // STBI_NO_HDR
 471
 472#ifndef STBI_NO_LINEAR
 473   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
 474   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
 475#endif // STBI_NO_LINEAR
 476
 477// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
 478STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 479STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
 480#ifndef STBI_NO_STDIO
 481STBIDEF int      stbi_is_hdr          (char const *filename);
 482STBIDEF int      stbi_is_hdr_from_file(FILE *f);
 483#endif // STBI_NO_STDIO
 484
 485
 486// get a VERY brief reason for failure
 487// on most compilers (and ALL modern mainstream compilers) this is threadsafe
 488STBIDEF const char *stbi_failure_reason  (void);
 489
 490// free the loaded image -- this is just free()
 491STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
 492
 493// get image dimensions & components without fully decoding
 494STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
 495STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
 496STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
 497STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 498
 499#ifndef STBI_NO_STDIO
 500STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
 501STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
 502STBIDEF int      stbi_is_16_bit          (char const *filename);
 503STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
 504#endif
 505
 506
 507
 508// for image formats that explicitly notate that they have premultiplied alpha,
 509// we just return the colors as stored in the file. set this flag to force
 510// unpremultiplication. results are undefined if the unpremultiply overflow.
 511STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
 512
 513// indicate whether we should process iphone images back to canonical format,
 514// or just pass them through "as-is"
 515STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
 516
 517// flip the image vertically, so the first pixel in the output array is the bottom left
 518STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
 519
 520// as above, but only applies to images loaded on the thread that calls the function
 521// this function is only available if your compiler supports thread-local variables;
 522// calling it will fail to link if your compiler doesn't
 523STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
 524STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
 525STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
 526
 527// ZLIB client - used by PNG, available for other purposes
 528
 529STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
 530STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
 531STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
 532STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 533
 534STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
 535STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 536
 537
 538#ifdef __cplusplus
 539}
 540#endif
 541
 542//
 543//
 544////   end header file   /////////////////////////////////////////////////////
 545#endif // STBI_INCLUDE_STB_IMAGE_H
 546
 547#ifdef STB_IMAGE_IMPLEMENTATION
 548
 549#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
 550  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
 551  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
 552  || defined(STBI_ONLY_ZLIB)
 553   #ifndef STBI_ONLY_JPEG
 554   #define STBI_NO_JPEG
 555   #endif
 556   #ifndef STBI_ONLY_PNG
 557   #define STBI_NO_PNG
 558   #endif
 559   #ifndef STBI_ONLY_BMP
 560   #define STBI_NO_BMP
 561   #endif
 562   #ifndef STBI_ONLY_PSD
 563   #define STBI_NO_PSD
 564   #endif
 565   #ifndef STBI_ONLY_TGA
 566   #define STBI_NO_TGA
 567   #endif
 568   #ifndef STBI_ONLY_GIF
 569   #define STBI_NO_GIF
 570   #endif
 571   #ifndef STBI_ONLY_HDR
 572   #define STBI_NO_HDR
 573   #endif
 574   #ifndef STBI_ONLY_PIC
 575   #define STBI_NO_PIC
 576   #endif
 577   #ifndef STBI_ONLY_PNM
 578   #define STBI_NO_PNM
 579   #endif
 580#endif
 581
 582#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
 583#define STBI_NO_ZLIB
 584#endif
 585
 586
 587#include <stdarg.h>
 588#include <stddef.h> // ptrdiff_t on osx
 589#include <stdlib.h>
 590#include <string.h>
 591#include <limits.h>
 592
 593#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
 594#include <math.h>  // ldexp, pow
 595#endif
 596
 597#ifndef STBI_NO_STDIO
 598#include <stdio.h>
 599#endif
 600
 601#ifndef STBI_ASSERT
 602#include <assert.h>
 603#define STBI_ASSERT(x) assert(x)
 604#endif
 605
 606#ifdef __cplusplus
 607#define STBI_EXTERN extern "C"
 608#else
 609#define STBI_EXTERN extern
 610#endif
 611
 612
 613#ifndef _MSC_VER
 614   #ifdef __cplusplus
 615   #define stbi_inline inline
 616   #else
 617   #define stbi_inline
 618   #endif
 619#else
 620   #define stbi_inline __forceinline
 621#endif
 622
 623#ifndef STBI_NO_THREAD_LOCALS
 624   #if defined(__cplusplus) &&  __cplusplus >= 201103L
 625      #define STBI_THREAD_LOCAL       thread_local
 626   #elif defined(__GNUC__) && __GNUC__ < 5
 627      #define STBI_THREAD_LOCAL       __thread
 628   #elif defined(_MSC_VER)
 629      #define STBI_THREAD_LOCAL       __declspec(thread)
 630   #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
 631      #define STBI_THREAD_LOCAL       _Thread_local
 632   #endif
 633
 634   #ifndef STBI_THREAD_LOCAL
 635      #if defined(__GNUC__)
 636        #define STBI_THREAD_LOCAL       __thread
 637      #endif
 638   #endif
 639#endif
 640
 641#if defined(_MSC_VER) || defined(__SYMBIAN32__)
 642typedef unsigned short stbi__uint16;
 643typedef   signed short stbi__int16;
 644typedef unsigned int   stbi__uint32;
 645typedef   signed int   stbi__int32;
 646#else
 647#include <stdint.h>
 648typedef uint16_t stbi__uint16;
 649typedef int16_t  stbi__int16;
 650typedef uint32_t stbi__uint32;
 651typedef int32_t  stbi__int32;
 652#endif
 653
 654// should produce compiler error if size is wrong
 655typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
 656
 657#ifdef _MSC_VER
 658#define STBI_NOTUSED(v)  (void)(v)
 659#else
 660#define STBI_NOTUSED(v)  (void)sizeof(v)
 661#endif
 662
 663#ifdef _MSC_VER
 664#define STBI_HAS_LROTL
 665#endif
 666
 667#ifdef STBI_HAS_LROTL
 668   #define stbi_lrot(x,y)  _lrotl(x,y)
 669#else
 670   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
 671#endif
 672
 673#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
 674// ok
 675#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
 676// ok
 677#else
 678#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
 679#endif
 680
 681#ifndef STBI_MALLOC
 682#define STBI_MALLOC(sz)           malloc(sz)
 683#define STBI_REALLOC(p,newsz)     realloc(p,newsz)
 684#define STBI_FREE(p)              free(p)
 685#endif
 686
 687#ifndef STBI_REALLOC_SIZED
 688#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
 689#endif
 690
 691// x86/x64 detection
 692#if defined(__x86_64__) || defined(_M_X64)
 693#define STBI__X64_TARGET
 694#elif defined(__i386) || defined(_M_IX86)
 695#define STBI__X86_TARGET
 696#endif
 697
 698#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
 699// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
 700// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
 701// but previous attempts to provide the SSE2 functions with runtime
 702// detection caused numerous issues. The way architecture extensions are
 703// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
 704// New behavior: if compiled with -msse2, we use SSE2 without any
 705// detection; if not, we don't use it at all.
 706#define STBI_NO_SIMD
 707#endif
 708
 709#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
 710// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
 711//
 712// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
 713// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
 714// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
 715// simultaneously enabling "-mstackrealign".
 716//
 717// See https://github.com/nothings/stb/issues/81 for more information.
 718//
 719// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
 720// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
 721#define STBI_NO_SIMD
 722#endif
 723
 724#if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
 725#define STBI_SSE2
 726#include <emmintrin.h>
 727
 728#ifdef _MSC_VER
 729
 730#if _MSC_VER >= 1400  // not VC6
 731#include <intrin.h> // __cpuid
 732static int stbi__cpuid3(void)
 733{
 734   int info[4];
 735   __cpuid(info,1);
 736   return info[3];
 737}
 738#else
 739static int stbi__cpuid3(void)
 740{
 741   int res;
 742   __asm {
 743      mov  eax,1
 744      cpuid
 745      mov  res,edx
 746   }
 747   return res;
 748}
 749#endif
 750
 751#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 752
 753#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 754static int stbi__sse2_available(void)
 755{
 756   int info3 = stbi__cpuid3();
 757   return ((info3 >> 26) & 1) != 0;
 758}
 759#endif
 760
 761#else // assume GCC-style if not VC++
 762#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 763
 764#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 765static int stbi__sse2_available(void)
 766{
 767   // If we're even attempting to compile this on GCC/Clang, that means
 768   // -msse2 is on, which means the compiler is allowed to use SSE2
 769   // instructions at will, and so are we.
 770   return 1;
 771}
 772#endif
 773
 774#endif
 775#endif
 776
 777// ARM NEON
 778#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
 779#undef STBI_NEON
 780#endif
 781
 782#ifdef STBI_NEON
 783#include <arm_neon.h>
 784#ifdef _MSC_VER
 785#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 786#else
 787#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 788#endif
 789#endif
 790
 791#ifndef STBI_SIMD_ALIGN
 792#define STBI_SIMD_ALIGN(type, name) type name
 793#endif
 794
 795#ifndef STBI_MAX_DIMENSIONS
 796#define STBI_MAX_DIMENSIONS (1 << 24)
 797#endif
 798
 799///////////////////////////////////////////////
 800//
 801//  stbi__context struct and start_xxx functions
 802
 803// stbi__context structure is our basic context used by all images, so it
 804// contains all the IO context, plus some basic image information
 805typedef struct
 806{
 807   stbi__uint32 img_x, img_y;
 808   int img_n, img_out_n;
 809
 810   stbi_io_callbacks io;
 811   void *io_user_data;
 812
 813   int read_from_callbacks;
 814   int buflen;
 815   stbi_uc buffer_start[128];
 816   int callback_already_read;
 817
 818   stbi_uc *img_buffer, *img_buffer_end;
 819   stbi_uc *img_buffer_original, *img_buffer_original_end;
 820} stbi__context;
 821
 822
 823static void stbi__refill_buffer(stbi__context *s);
 824
 825// initialize a memory-decode context
 826static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
 827{
 828   s->io.read = NULL;
 829   s->read_from_callbacks = 0;
 830   s->callback_already_read = 0;
 831   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
 832   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
 833}
 834
 835// initialize a callback-based context
 836static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
 837{
 838   s->io = *c;
 839   s->io_user_data = user;
 840   s->buflen = sizeof(s->buffer_start);
 841   s->read_from_callbacks = 1;
 842   s->callback_already_read = 0;
 843   s->img_buffer = s->img_buffer_original = s->buffer_start;
 844   stbi__refill_buffer(s);
 845   s->img_buffer_original_end = s->img_buffer_end;
 846}
 847
 848#ifndef STBI_NO_STDIO
 849
 850static int stbi__stdio_read(void *user, char *data, int size)
 851{
 852   return (int) fread(data,1,size,(FILE*) user);
 853}
 854
 855static void stbi__stdio_skip(void *user, int n)
 856{
 857   int ch;
 858   fseek((FILE*) user, n, SEEK_CUR);
 859   ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
 860   if (ch != EOF) {
 861      ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
 862   }
 863}
 864
 865static int stbi__stdio_eof(void *user)
 866{
 867   return feof((FILE*) user) || ferror((FILE *) user);
 868}
 869
 870static stbi_io_callbacks stbi__stdio_callbacks =
 871{
 872   stbi__stdio_read,
 873   stbi__stdio_skip,
 874   stbi__stdio_eof,
 875};
 876
 877static void stbi__start_file(stbi__context *s, FILE *f)
 878{
 879   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
 880}
 881
 882//static void stop_file(stbi__context *s) { }
 883
 884#endif // !STBI_NO_STDIO
 885
 886static void stbi__rewind(stbi__context *s)
 887{
 888   // conceptually rewind SHOULD rewind to the beginning of the stream,
 889   // but we just rewind to the beginning of the initial buffer, because
 890   // we only use it after doing 'test', which only ever looks at at most 92 bytes
 891   s->img_buffer = s->img_buffer_original;
 892   s->img_buffer_end = s->img_buffer_original_end;
 893}
 894
 895enum
 896{
 897   STBI_ORDER_RGB,
 898   STBI_ORDER_BGR
 899};
 900
 901typedef struct
 902{
 903   int bits_per_channel;
 904   int num_channels;
 905   int channel_order;
 906} stbi__result_info;
 907
 908#ifndef STBI_NO_JPEG
 909static int      stbi__jpeg_test(stbi__context *s);
 910static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 911static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
 912#endif
 913
 914#ifndef STBI_NO_PNG
 915static int      stbi__png_test(stbi__context *s);
 916static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 917static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
 918static int      stbi__png_is16(stbi__context *s);
 919#endif
 920
 921#ifndef STBI_NO_BMP
 922static int      stbi__bmp_test(stbi__context *s);
 923static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 924static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
 925#endif
 926
 927#ifndef STBI_NO_TGA
 928static int      stbi__tga_test(stbi__context *s);
 929static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 930static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
 931#endif
 932
 933#ifndef STBI_NO_PSD
 934static int      stbi__psd_test(stbi__context *s);
 935static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
 936static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
 937static int      stbi__psd_is16(stbi__context *s);
 938#endif
 939
 940#ifndef STBI_NO_HDR
 941static int      stbi__hdr_test(stbi__context *s);
 942static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 943static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
 944#endif
 945
 946#ifndef STBI_NO_PIC
 947static int      stbi__pic_test(stbi__context *s);
 948static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 949static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
 950#endif
 951
 952#ifndef STBI_NO_GIF
 953static int      stbi__gif_test(stbi__context *s);
 954static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 955static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 956static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
 957#endif
 958
 959#ifndef STBI_NO_PNM
 960static int      stbi__pnm_test(stbi__context *s);
 961static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 962static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
 963static int      stbi__pnm_is16(stbi__context *s);
 964#endif
 965
 966static
 967#ifdef STBI_THREAD_LOCAL
 968STBI_THREAD_LOCAL
 969#endif
 970const char *stbi__g_failure_reason;
 971
 972STBIDEF const char *stbi_failure_reason(void)
 973{
 974   return stbi__g_failure_reason;
 975}
 976
 977#ifndef STBI_NO_FAILURE_STRINGS
 978static int stbi__err(const char *str)
 979{
 980   stbi__g_failure_reason = str;
 981   return 0;
 982}
 983#endif
 984
 985static void *stbi__malloc(size_t size)
 986{
 987    return STBI_MALLOC(size);
 988}
 989
 990// stb_image uses ints pervasively, including for offset calculations.
 991// therefore the largest decoded image size we can support with the
 992// current code, even on 64-bit targets, is INT_MAX. this is not a
 993// significant limitation for the intended use case.
 994//
 995// we do, however, need to make sure our size calculations don't
 996// overflow. hence a few helper functions for size calculations that
 997// multiply integers together, making sure that they're non-negative
 998// and no overflow occurs.
 999
1000// return 1 if the sum is valid, 0 on overflow.
1001// negative terms are considered invalid.
1002static int stbi__addsizes_valid(int a, int b)
1003{
1004   if (b < 0) return 0;
1005   // now 0 <= b <= INT_MAX, hence also
1006   // 0 <= INT_MAX - b <= INTMAX.
1007   // And "a + b <= INT_MAX" (which might overflow) is the
1008   // same as a <= INT_MAX - b (no overflow)
1009   return a <= INT_MAX - b;
1010}
1011
1012// returns 1 if the product is valid, 0 on overflow.
1013// negative factors are considered invalid.
1014static int stbi__mul2sizes_valid(int a, int b)
1015{
1016   if (a < 0 || b < 0) return 0;
1017   if (b == 0) return 1; // mul-by-0 is always safe
1018   // portable way to check for no overflows in a*b
1019   return a <= INT_MAX/b;
1020}
1021
1022#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1023// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
1024static int stbi__mad2sizes_valid(int a, int b, int add)
1025{
1026   return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
1027}
1028#endif
1029
1030// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
1031static int stbi__mad3sizes_valid(int a, int b, int c, int add)
1032{
1033   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1034      stbi__addsizes_valid(a*b*c, add);
1035}
1036
1037// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
1038#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1039static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
1040{
1041   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1042      stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
1043}
1044#endif
1045
1046#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1047// mallocs with size overflow checking
1048static void *stbi__malloc_mad2(int a, int b, int add)
1049{
1050   if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
1051   return stbi__malloc(a*b + add);
1052}
1053#endif
1054
1055static void *stbi__malloc_mad3(int a, int b, int c, int add)
1056{
1057   if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
1058   return stbi__malloc(a*b*c + add);
1059}
1060
1061#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1062static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
1063{
1064   if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
1065   return stbi__malloc(a*b*c*d + add);
1066}
1067#endif
1068
1069// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
1070static int stbi__addints_valid(int a, int b)
1071{
1072   if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
1073   if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
1074   return a <= INT_MAX - b;
1075}
1076
1077// returns 1 if the product of two ints fits in a signed short, 0 on overflow.
1078static int stbi__mul2shorts_valid(int a, int b)
1079{
1080   if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
1081   if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
1082   if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
1083   return a >= SHRT_MIN / b;
1084}
1085
1086// stbi__err - error
1087// stbi__errpf - error returning pointer to float
1088// stbi__errpuc - error returning pointer to unsigned char
1089
1090#ifdef STBI_NO_FAILURE_STRINGS
1091   #define stbi__err(x,y)  0
1092#elif defined(STBI_FAILURE_USERMSG)
1093   #define stbi__err(x,y)  stbi__err(y)
1094#else
1095   #define stbi__err(x,y)  stbi__err(x)
1096#endif
1097
1098#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1099#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1100
1101STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1102{
1103   STBI_FREE(retval_from_stbi_load);
1104}
1105
1106#ifndef STBI_NO_LINEAR
1107static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1108#endif
1109
1110#ifndef STBI_NO_HDR
1111static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
1112#endif
1113
1114static int stbi__vertically_flip_on_load_global = 0;
1115
1116STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1117{
1118   stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
1119}
1120
1121#ifndef STBI_THREAD_LOCAL
1122#define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
1123#else
1124static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
1125
1126STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
1127{
1128   stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
1129   stbi__vertically_flip_on_load_set = 1;
1130}
1131
1132#define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
1133                                         ? stbi__vertically_flip_on_load_local  \
1134                                         : stbi__vertically_flip_on_load_global)
1135#endif // STBI_THREAD_LOCAL
1136
1137static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1138{
1139   memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1140   ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1141   ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1142   ri->num_channels = 0;
1143
1144   // test the formats with a very explicit header first (at least a FOURCC
1145   // or distinctive magic number first)
1146   #ifndef STBI_NO_PNG
1147   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
1148   #endif
1149   #ifndef STBI_NO_BMP
1150   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
1151   #endif
1152   #ifndef STBI_NO_GIF
1153   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
1154   #endif
1155   #ifndef STBI_NO_PSD
1156   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
1157   #else
1158   STBI_NOTUSED(bpc);
1159   #endif
1160   #ifndef STBI_NO_PIC
1161   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
1162   #endif
1163
1164   // then the formats that can end up attempting to load with just 1 or 2
1165   // bytes matching expectations; these are prone to false positives, so
1166   // try them later
1167   #ifndef STBI_NO_JPEG
1168   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
1169   #endif
1170   #ifndef STBI_NO_PNM
1171   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
1172   #endif
1173
1174   #ifndef STBI_NO_HDR
1175   if (stbi__hdr_test(s)) {
1176      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
1177      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1178   }
1179   #endif
1180
1181   #ifndef STBI_NO_TGA
1182   // test tga last because it's a crappy test!
1183   if (stbi__tga_test(s))
1184      return stbi__tga_load(s,x,y,comp,req_comp, ri);
1185   #endif
1186
1187   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1188}
1189
1190static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1191{
1192   int i;
1193   int img_len = w * h * channels;
1194   stbi_uc *reduced;
1195
1196   reduced = (stbi_uc *) stbi__malloc(img_len);
1197   if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1198
1199   for (i = 0; i < img_len; ++i)
1200      reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
1201
1202   STBI_FREE(orig);
1203   return reduced;
1204}
1205
1206static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1207{
1208   int i;
1209   int img_len = w * h * channels;
1210   stbi__uint16 *enlarged;
1211
1212   enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
1213   if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1214
1215   for (i = 0; i < img_len; ++i)
1216      enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1217
1218   STBI_FREE(orig);
1219   return enlarged;
1220}
1221
1222static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
1223{
1224   int row;
1225   size_t bytes_per_row = (size_t)w * bytes_per_pixel;
1226   stbi_uc temp[2048];
1227   stbi_uc *bytes = (stbi_uc *)image;
1228
1229   for (row = 0; row < (h>>1); row++) {
1230      stbi_uc *row0 = bytes + row*bytes_per_row;
1231      stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
1232      // swap row0 with row1
1233      size_t bytes_left = bytes_per_row;
1234      while (bytes_left) {
1235         size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
1236         memcpy(temp, row0, bytes_copy);
1237         memcpy(row0, row1, bytes_copy);
1238         memcpy(row1, temp, bytes_copy);
1239         row0 += bytes_copy;
1240         row1 += bytes_copy;
1241         bytes_left -= bytes_copy;
1242      }
1243   }
1244}
1245
1246#ifndef STBI_NO_GIF
1247static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
1248{
1249   int slice;
1250   int slice_size = w * h * bytes_per_pixel;
1251
1252   stbi_uc *bytes = (stbi_uc *)image;
1253   for (slice = 0; slice < z; ++slice) {
1254      stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
1255      bytes += slice_size;
1256   }
1257}
1258#endif
1259
1260static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1261{
1262   stbi__result_info ri;
1263   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1264
1265   if (result == NULL)
1266      return NULL;
1267
1268   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1269   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1270
1271   if (ri.bits_per_channel != 8) {
1272      result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1273      ri.bits_per_channel = 8;
1274   }
1275
1276   // @TODO: move stbi__convert_format to here
1277
1278   if (stbi__vertically_flip_on_load) {
1279      int channels = req_comp ? req_comp : *comp;
1280      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
1281   }
1282
1283   return (unsigned char *) result;
1284}
1285
1286static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1287{
1288   stbi__result_info ri;
1289   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1290
1291   if (result == NULL)
1292      return NULL;
1293
1294   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1295   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1296
1297   if (ri.bits_per_channel != 16) {
1298      result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1299      ri.bits_per_channel = 16;
1300   }
1301
1302   // @TODO: move stbi__convert_format16 to here
1303   // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1304
1305   if (stbi__vertically_flip_on_load) {
1306      int channels = req_comp ? req_comp : *comp;
1307      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
1308   }
1309
1310   return (stbi__uint16 *) result;
1311}
1312
1313#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
1314static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1315{
1316   if (stbi__vertically_flip_on_load && result != NULL) {
1317      int channels = req_comp ? req_comp : *comp;
1318      stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
1319   }
1320}
1321#endif
1322
1323#ifndef STBI_NO_STDIO
1324
1325#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1326STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
1327STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
1328#endif
1329
1330#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1331STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
1332{
1333	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
1334}
1335#endif
1336
1337static FILE *stbi__fopen(char const *filename, char const *mode)
1338{
1339   FILE *f;
1340#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1341   wchar_t wMode[64];
1342   wchar_t wFilename[1024];
1343	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
1344      return 0;
1345
1346	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
1347      return 0;
1348
1349#if defined(_MSC_VER) && _MSC_VER >= 1400
1350	if (0 != _wfopen_s(&f, wFilename, wMode))
1351		f = 0;
1352#else
1353   f = _wfopen(wFilename, wMode);
1354#endif
1355
1356#elif defined(_MSC_VER) && _MSC_VER >= 1400
1357   if (0 != fopen_s(&f, filename, mode))
1358      f=0;
1359#else
1360   f = fopen(filename, mode);
1361#endif
1362   return f;
1363}
1364
1365
1366STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1367{
1368   FILE *f = stbi__fopen(filename, "rb");
1369   unsigned char *result;
1370   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1371   result = stbi_load_from_file(f,x,y,comp,req_comp);
1372   fclose(f);
1373   return result;
1374}
1375
1376STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1377{
1378   unsigned char *result;
1379   stbi__context s;
1380   stbi__start_file(&s,f);
1381   result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1382   if (result) {
1383      // need to 'unget' all the characters in the IO buffer
1384      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1385   }
1386   return result;
1387}
1388
1389STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1390{
1391   stbi__uint16 *result;
1392   stbi__context s;
1393   stbi__start_file(&s,f);
1394   result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
1395   if (result) {
1396      // need to 'unget' all the characters in the IO buffer
1397      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1398   }
1399   return result;
1400}
1401
1402STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1403{
1404   FILE *f = stbi__fopen(filename, "rb");
1405   stbi__uint16 *result;
1406   if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
1407   result = stbi_load_from_file_16(f,x,y,comp,req_comp);
1408   fclose(f);
1409   return result;
1410}
1411
1412
1413#endif //!STBI_NO_STDIO
1414
1415STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
1416{
1417   stbi__context s;
1418   stbi__start_mem(&s,buffer,len);
1419   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1420}
1421
1422STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
1423{
1424   stbi__context s;
1425   stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1426   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1427}
1428
1429STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1430{
1431   stbi__context s;
1432   stbi__start_mem(&s,buffer,len);
1433   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1434}
1435
1436STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1437{
1438   stbi__context s;
1439   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1440   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1441}
1442
1443#ifndef STBI_NO_GIF
1444STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
1445{
1446   unsigned char *result;
1447   stbi__context s;
1448   stbi__start_mem(&s,buffer,len);
1449
1450   result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
1451   if (stbi__vertically_flip_on_load) {
1452      stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
1453   }
1454
1455   return result;
1456}
1457#endif
1458
1459#ifndef STBI_NO_LINEAR
1460static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1461{
1462   unsigned char *data;
1463   #ifndef STBI_NO_HDR
1464   if (stbi__hdr_test(s)) {
1465      stbi__result_info ri;
1466      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
1467      if (hdr_data)
1468         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1469      return hdr_data;
1470   }
1471   #endif
1472   data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1473   if (data)
1474      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1475   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1476}
1477
1478STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1479{
1480   stbi__context s;
1481   stbi__start_mem(&s,buffer,len);
1482   return stbi__loadf_main(&s,x,y,comp,req_comp);
1483}
1484
1485STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1486{
1487   stbi__context s;
1488   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1489   return stbi__loadf_main(&s,x,y,comp,req_comp);
1490}
1491
1492#ifndef STBI_NO_STDIO
1493STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1494{
1495   float *result;
1496   FILE *f = stbi__fopen(filename, "rb");
1497   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1498   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1499   fclose(f);
1500   return result;
1501}
1502
1503STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1504{
1505   stbi__context s;
1506   stbi__start_file(&s,f);
1507   return stbi__loadf_main(&s,x,y,comp,req_comp);
1508}
1509#endif // !STBI_NO_STDIO
1510
1511#endif // !STBI_NO_LINEAR
1512
1513// these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1514// defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1515// reports false!
1516
1517STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1518{
1519   #ifndef STBI_NO_HDR
1520   stbi__context s;
1521   stbi__start_mem(&s,buffer,len);
1522   return stbi__hdr_test(&s);
1523   #else
1524   STBI_NOTUSED(buffer);
1525   STBI_NOTUSED(len);
1526   return 0;
1527   #endif
1528}
1529
1530#ifndef STBI_NO_STDIO
1531STBIDEF int      stbi_is_hdr          (char const *filename)
1532{
1533   FILE *f = stbi__fopen(filename, "rb");
1534   int result=0;
1535   if (f) {
1536      result = stbi_is_hdr_from_file(f);
1537      fclose(f);
1538   }
1539   return result;
1540}
1541
1542STBIDEF int stbi_is_hdr_from_file(FILE *f)
1543{
1544   #ifndef STBI_NO_HDR
1545   long pos = ftell(f);
1546   int res;
1547   stbi__context s;
1548   stbi__start_file(&s,f);
1549   res = stbi__hdr_test(&s);
1550   fseek(f, pos, SEEK_SET);
1551   return res;
1552   #else
1553   STBI_NOTUSED(f);
1554   return 0;
1555   #endif
1556}
1557#endif // !STBI_NO_STDIO
1558
1559STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1560{
1561   #ifndef STBI_NO_HDR
1562   stbi__context s;
1563   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1564   return stbi__hdr_test(&s);
1565   #else
1566   STBI_NOTUSED(clbk);
1567   STBI_NOTUSED(user);
1568   return 0;
1569   #endif
1570}
1571
1572#ifndef STBI_NO_LINEAR
1573static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1574
1575STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1576STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1577#endif
1578
1579static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1580
1581STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1582STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1583
1584
1585//////////////////////////////////////////////////////////////////////////////
1586//
1587// Common code used by all image loaders
1588//
1589
1590enum
1591{
1592   STBI__SCAN_load=0,
1593   STBI__SCAN_type,
1594   STBI__SCAN_header
1595};
1596
1597static void stbi__refill_buffer(stbi__context *s)
1598{
1599   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1600   s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
1601   if (n == 0) {
1602      // at end of file, treat same as if from memory, but need to handle case
1603      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1604      s->read_from_callbacks = 0;
1605      s->img_buffer = s->buffer_start;
1606      s->img_buffer_end = s->buffer_start+1;
1607      *s->img_buffer = 0;
1608   } else {
1609      s->img_buffer = s->buffer_start;
1610      s->img_buffer_end = s->buffer_start + n;
1611   }
1612}
1613
1614stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1615{
1616   if (s->img_buffer < s->img_buffer_end)
1617      return *s->img_buffer++;
1618   if (s->read_from_callbacks) {
1619      stbi__refill_buffer(s);
1620      return *s->img_buffer++;
1621   }
1622   return 0;
1623}
1624
1625#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1626// nothing
1627#else
1628stbi_inline static int stbi__at_eof(stbi__context *s)
1629{
1630   if (s->io.read) {
1631      if (!(s->io.eof)(s->io_user_data)) return 0;
1632      // if feof() is true, check if buffer = end
1633      // special case: we've only got the special 0 character at the end
1634      if (s->read_from_callbacks == 0) return 1;
1635   }
1636
1637   return s->img_buffer >= s->img_buffer_end;
1638}
1639#endif
1640
1641#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
1642// nothing
1643#else
1644static void stbi__skip(stbi__context *s, int n)
1645{
1646   if (n == 0) return;  // already there!
1647   if (n < 0) {
1648      s->img_buffer = s->img_buffer_end;
1649      return;
1650   }
1651   if (s->io.read) {
1652      int blen = (int) (s->img_buffer_end - s->img_buffer);
1653      if (blen < n) {
1654         s->img_buffer = s->img_buffer_end;
1655         (s->io.skip)(s->io_user_data, n - blen);
1656         return;
1657      }
1658   }
1659   s->img_buffer += n;
1660}
1661#endif
1662
1663#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
1664// nothing
1665#else
1666static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1667{
1668   if (s->io.read) {
1669      int blen = (int) (s->img_buffer_end - s->img_buffer);
1670      if (blen < n) {
1671         int res, count;
1672
1673         memcpy(buffer, s->img_buffer, blen);
1674
1675         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1676         res = (count == (n-blen));
1677         s->img_buffer = s->img_buffer_end;
1678         return res;
1679      }
1680   }
1681
1682   if (s->img_buffer+n <= s->img_buffer_end) {
1683      memcpy(buffer, s->img_buffer, n);
1684      s->img_buffer += n;
1685      return 1;
1686   } else
1687      return 0;
1688}
1689#endif
1690
1691#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1692// nothing
1693#else
1694static int stbi__get16be(stbi__context *s)
1695{
1696   int z = stbi__get8(s);
1697   return (z << 8) + stbi__get8(s);
1698}
1699#endif
1700
1701#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1702// nothing
1703#else
1704static stbi__uint32 stbi__get32be(stbi__context *s)
1705{
1706   stbi__uint32 z = stbi__get16be(s);
1707   return (z << 16) + stbi__get16be(s);
1708}
1709#endif
1710
1711#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1712// nothing
1713#else
1714static int stbi__get16le(stbi__context *s)
1715{
1716   int z = stbi__get8(s);
1717   return z + (stbi__get8(s) << 8);
1718}
1719#endif
1720
1721#ifndef STBI_NO_BMP
1722static stbi__uint32 stbi__get32le(stbi__context *s)
1723{
1724   stbi__uint32 z = stbi__get16le(s);
1725   z += (stbi__uint32)stbi__get16le(s) << 16;
1726   return z;
1727}
1728#endif
1729
1730#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1731
1732#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1733// nothing
1734#else
1735//////////////////////////////////////////////////////////////////////////////
1736//
1737//  generic converter from built-in img_n to req_comp
1738//    individual types do this automatically as much as possible (e.g. jpeg
1739//    does all cases internally since it needs to colorspace convert anyway,
1740//    and it never has alpha, so very few cases ). png can automatically
1741//    interleave an alpha=255 channel, but falls back to this for other cases
1742//
1743//  assume data buffer is malloced, so malloc a new one and free that one
1744//  only failure mode is malloc failing
1745
1746static stbi_uc stbi__compute_y(int r, int g, int b)
1747{
1748   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1749}
1750#endif
1751
1752#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1753// nothing
1754#else
1755static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1756{
1757   int i,j;
1758   unsigned char *good;
1759
1760   if (req_comp == img_n) return data;
1761   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1762
1763   good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
1764   if (good == NULL) {
1765      STBI_FREE(data);
1766      return stbi__errpuc("outofmem", "Out of memory");
1767   }
1768
1769   for (j=0; j < (int) y; ++j) {
1770      unsigned char *src  = data + j * x * img_n   ;
1771      unsigned char *dest = good + j * x * req_comp;
1772
1773      #define STBI__COMBO(a,b)  ((a)*8+(b))
1774      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1775      // convert source image with img_n components to one with req_comp components;
1776      // avoid switch per pixel, so use switch per scanline and massive macros
1777      switch (STBI__COMBO(img_n, req_comp)) {
1778         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
1779         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1780         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
1781         STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
1782         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1783         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
1784         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
1785         STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1786         STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
1787         STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1788         STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1789         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
1790         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
1791      }
1792      #undef STBI__CASE
1793   }
1794
1795   STBI_FREE(data);
1796   return good;
1797}
1798#endif
1799
1800#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1801// nothing
1802#else
1803static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1804{
1805   return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
1806}
1807#endif
1808
1809#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1810// nothing
1811#else
1812static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1813{
1814   int i,j;
1815   stbi__uint16 *good;
1816
1817   if (req_comp == img_n) return data;
1818   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1819
1820   good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
1821   if (good == NULL) {
1822      STBI_FREE(data);
1823      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1824   }
1825
1826   for (j=0; j < (int) y; ++j) {
1827      stbi__uint16 *src  = data + j * x * img_n   ;
1828      stbi__uint16 *dest = good + j * x * req_comp;
1829
1830      #define STBI__COMBO(a,b)  ((a)*8+(b))
1831      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1832      // convert source image with img_n components to one with req_comp components;
1833      // avoid switch per pixel, so use switch per scanline and massive macros
1834      switch (STBI__COMBO(img_n, req_comp)) {
1835         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
1836         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1837         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
1838         STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
1839         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1840         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
1841         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
1842         STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1843         STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
1844         STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1845         STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1846         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
1847         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
1848      }
1849      #undef STBI__CASE
1850   }
1851
1852   STBI_FREE(data);
1853   return good;
1854}
1855#endif
1856
1857#ifndef STBI_NO_LINEAR
1858static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1859{
1860   int i,k,n;
1861   float *output;
1862   if (!data) return NULL;
1863   output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1864   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1865   // compute number of non-alpha components
1866   if (comp & 1) n = comp; else n = comp-1;
1867   for (i=0; i < x*y; ++i) {
1868      for (k=0; k < n; ++k) {
1869         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1870      }
1871   }
1872   if (n < comp) {
1873      for (i=0; i < x*y; ++i) {
1874         output[i*comp + n] = data[i*comp + n]/255.0f;
1875      }
1876   }
1877   STBI_FREE(data);
1878   return output;
1879}
1880#endif
1881
1882#ifndef STBI_NO_HDR
1883#define stbi__float2int(x)   ((int) (x))
1884static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1885{
1886   int i,k,n;
1887   stbi_uc *output;
1888   if (!data) return NULL;
1889   output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
1890   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1891   // compute number of non-alpha components
1892   if (comp & 1) n = comp; else n = comp-1;
1893   for (i=0; i < x*y; ++i) {
1894      for (k=0; k < n; ++k) {
1895         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1896         if (z < 0) z = 0;
1897         if (z > 255) z = 255;
1898         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1899      }
1900      if (k < comp) {
1901         float z = data[i*comp+k] * 255 + 0.5f;
1902         if (z < 0) z = 0;
1903         if (z > 255) z = 255;
1904         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1905      }
1906   }
1907   STBI_FREE(data);
1908   return output;
1909}
1910#endif
1911
1912//////////////////////////////////////////////////////////////////////////////
1913//
1914//  "baseline" JPEG/JFIF decoder
1915//
1916//    simple implementation
1917//      - doesn't support delayed output of y-dimension
1918//      - simple interface (only one output format: 8-bit interleaved RGB)
1919//      - doesn't try to recover corrupt jpegs
1920//      - doesn't allow partial loading, loading multiple at once
1921//      - still fast on x86 (copying globals into locals doesn't help x86)
1922//      - allocates lots of intermediate memory (full size of all components)
1923//        - non-interleaved case requires this anyway
1924//        - allows good upsampling (see next)
1925//    high-quality
1926//      - upsampled channels are bilinearly interpolated, even across blocks
1927//      - quality integer IDCT derived from IJG's 'slow'
1928//    performance
1929//      - fast huffman; reasonable integer IDCT
1930//      - some SIMD kernels for common paths on targets with SSE2/NEON
1931//      - uses a lot of intermediate memory, could cache poorly
1932
1933#ifndef STBI_NO_JPEG
1934
1935// huffman decoding acceleration
1936#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1937
1938typedef struct
1939{
1940   stbi_uc  fast[1 << FAST_BITS];
1941   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1942   stbi__uint16 code[256];
1943   stbi_uc  values[256];
1944   stbi_uc  size[257];
1945   unsigned int maxcode[18];
1946   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1947} stbi__huffman;
1948
1949typedef struct
1950{
1951   stbi__context *s;
1952   stbi__huffman huff_dc[4];
1953   stbi__huffman huff_ac[4];
1954   stbi__uint16 dequant[4][64];
1955   stbi__int16 fast_ac[4][1 << FAST_BITS];
1956
1957// sizes for components, interleaved MCUs
1958   int img_h_max, img_v_max;
1959   int img_mcu_x, img_mcu_y;
1960   int img_mcu_w, img_mcu_h;
1961
1962// definition of jpeg image component
1963   struct
1964   {
1965      int id;
1966      int h,v;
1967      int tq;
1968      int hd,ha;
1969      int dc_pred;
1970
1971      int x,y,w2,h2;
1972      stbi_uc *data;
1973      void *raw_data, *raw_coeff;
1974      stbi_uc *linebuf;
1975      short   *coeff;   // progressive only
1976      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1977   } img_comp[4];
1978
1979   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1980   int            code_bits;   // number of valid bits
1981   unsigned char  marker;      // marker seen while filling entropy buffer
1982   int            nomore;      // flag if we saw a marker so must stop
1983
1984   int            progressive;
1985   int            spec_start;
1986   int            spec_end;
1987   int            succ_high;
1988   int            succ_low;
1989   int            eob_run;
1990   int            jfif;
1991   int            app14_color_transform; // Adobe APP14 tag
1992   int            rgb;
1993
1994   int scan_n, order[4];
1995   int restart_interval, todo;
1996
1997// kernels
1998   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1999   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
2000   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
2001} stbi__jpeg;
2002
2003static int stbi__build_huffman(stbi__huffman *h, int *count)
2004{
2005   int i,j,k=0;
2006   unsigned int code;
2007   // build size list for each symbol (from JPEG spec)
2008   for (i=0; i < 16; ++i) {
2009      for (j=0; j < count[i]; ++j) {
2010         h->size[k++] = (stbi_uc) (i+1);
2011         if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
2012      }
2013   }
2014   h->size[k] = 0;
2015
2016   // compute actual symbols (from jpeg spec)
2017   code = 0;
2018   k = 0;
2019   for(j=1; j <= 16; ++j) {
2020      // compute delta to add to code to compute symbol id
2021      h->delta[j] = k - code;
2022      if (h->size[k] == j) {
2023         while (h->size[k] == j)
2024            h->code[k++] = (stbi__uint16) (code++);
2025         if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
2026      }
2027      // compute largest code + 1 for this size, preshifted as needed later
2028      h->maxcode[j] = code << (16-j);
2029      code <<= 1;
2030   }
2031   h->maxcode[j] = 0xffffffff;
2032
2033   // build non-spec acceleration table; 255 is flag for not-accelerated
2034   memset(h->fast, 255, 1 << FAST_BITS);
2035   for (i=0; i < k; ++i) {
2036      int s = h->size[i];
2037      if (s <= FAST_BITS) {
2038         int c = h->code[i] << (FAST_BITS-s);
2039         int m = 1 << (FAST_BITS-s);
2040         for (j=0; j < m; ++j) {
2041            h->fast[c+j] = (stbi_uc) i;
2042         }
2043      }
2044   }
2045   return 1;
2046}
2047
2048// build a table that decodes both magnitude and value of small ACs in
2049// one go.
2050static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
2051{
2052   int i;
2053   for (i=0; i < (1 << FAST_BITS); ++i) {
2054      stbi_uc fast = h->fast[i];
2055      fast_ac[i] = 0;
2056      if (fast < 255) {
2057         int rs = h->values[fast];
2058         int run = (rs >> 4) & 15;
2059         int magbits = rs & 15;
2060         int len = h->size[fast];
2061
2062         if (magbits && len + magbits <= FAST_BITS) {
2063            // magnitude code followed by receive_extend code
2064            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
2065            int m = 1 << (magbits - 1);
2066            if (k < m) k += (~0U << magbits) + 1;
2067            // if the result is small enough, we can fit it in fast_ac table
2068            if (k >= -128 && k <= 127)
2069               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
2070         }
2071      }
2072   }
2073}
2074
2075static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
2076{
2077   do {
2078      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
2079      if (b == 0xff) {
2080         int c = stbi__get8(j->s);
2081         while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
2082         if (c != 0) {
2083            j->marker = (unsigned char) c;
2084            j->nomore = 1;
2085            return;
2086         }
2087      }
2088      j->code_buffer |= b << (24 - j->code_bits);
2089      j->code_bits += 8;
2090   } while (j->code_bits <= 24);
2091}
2092
2093// (1 << n) - 1
2094static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
2095
2096// decode a jpeg huffman value from the bitstream
2097stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
2098{
2099   unsigned int temp;
2100   int c,k;
2101
2102   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2103
2104   // look at the top FAST_BITS and determine what symbol ID it is,
2105   // if the code is <= FAST_BITS
2106   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2107   k = h->fast[c];
2108   if (k < 255) {
2109      int s = h->size[k];
2110      if (s > j->code_bits)
2111         return -1;
2112      j->code_buffer <<= s;
2113      j->code_bits -= s;
2114      return h->values[k];
2115   }
2116
2117   // naive test is to shift the code_buffer down so k bits are
2118   // valid, then test against maxcode. To speed this up, we've
2119   // preshifted maxcode left so that it has (16-k) 0s at the
2120   // end; in other words, regardless of the number of bits, it
2121   // wants to be compared against something shifted to have 16;
2122   // that way we don't need to shift inside the loop.
2123   temp = j->code_buffer >> 16;
2124   for (k=FAST_BITS+1 ; ; ++k)
2125      if (temp < h->maxcode[k])
2126         break;
2127   if (k == 17) {
2128      // error! code not found
2129      j->code_bits -= 16;
2130      return -1;
2131   }
2132
2133   if (k > j->code_bits)
2134      return -1;
2135
2136   // convert the huffman code to the symbol id
2137   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
2138   if(c < 0 || c >= 256) // symbol id out of bounds!
2139       return -1;
2140   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
2141
2142   // convert the id to a symbol
2143   j->code_bits -= k;
2144   j->code_buffer <<= k;
2145   return h->values[c];
2146}
2147
2148// bias[n] = (-1<<n) + 1
2149static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
2150
2151// combined JPEG 'receive' and JPEG 'extend', since baseline
2152// always extends everything it receives.
2153stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
2154{
2155   unsigned int k;
2156   int sgn;
2157   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2158   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
2159
2160   sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
2161   k = stbi_lrot(j->code_buffer, n);
2162   j->code_buffer = k & ~stbi__bmask[n];
2163   k &= stbi__bmask[n];
2164   j->code_bits -= n;
2165   return k + (stbi__jbias[n] & (sgn - 1));
2166}
2167
2168// get some unsigned bits
2169stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
2170{
2171   unsigned int k;
2172   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2173   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
2174   k = stbi_lrot(j->code_buffer, n);
2175   j->code_buffer = k & ~stbi__bmask[n];
2176   k &= stbi__bmask[n];
2177   j->code_bits -= n;
2178   return k;
2179}
2180
2181stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
2182{
2183   unsigned int k;
2184   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
2185   if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s intead of continuing
2186   k = j->code_buffer;
2187   j->code_buffer <<= 1;
2188   --j->code_bits;
2189   return k & 0x80000000;
2190}
2191
2192// given a value that's at position X in the zigzag stream,
2193// where does it appear in the 8x8 matrix coded as row-major?
2194static const stbi_uc stbi__jpeg_dezigzag[64+15] =
2195{
2196    0,  1,  8, 16,  9,  2,  3, 10,
2197   17, 24, 32, 25, 18, 11,  4,  5,
2198   12, 19, 26, 33, 40, 48, 41, 34,
2199   27, 20, 13,  6,  7, 14, 21, 28,
2200   35, 42, 49, 56, 57, 50, 43, 36,
2201   29, 22, 15, 23, 30, 37, 44, 51,
2202   58, 59, 52, 45, 38, 31, 39, 46,
2203   53, 60, 61, 54, 47, 55, 62, 63,
2204   // let corrupt input sample past end
2205   63, 63, 63, 63, 63, 63, 63, 63,
2206   63, 63, 63, 63, 63, 63, 63
2207};
2208
2209// decode one 64-entry block--
2210static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
2211{
2212   int diff,dc,k;
2213   int t;
2214
2215   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2216   t = stbi__jpeg_huff_decode(j, hdc);
2217   if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
2218
2219   // 0 all the ac values now so we can do it 32-bits at a time
2220   memset(data,0,64*sizeof(data[0]));
2221
2222   diff = t ? stbi__extend_receive(j, t) : 0;
2223   if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
2224   dc = j->img_comp[b].dc_pred + diff;
2225   j->img_comp[b].dc_pred = dc;
2226   if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2227   data[0] = (short) (dc * dequant[0]);
2228
2229   // decode AC components, see JPEG spec
2230   k = 1;
2231   do {
2232      unsigned int zig;
2233      int c,r,s;
2234      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2235      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2236      r = fac[c];
2237      if (r) { // fast-AC path
2238         k += (r >> 4) & 15; // run
2239         s = r & 15; // combined length
2240         if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2241         j->code_buffer <<= s;
2242         j->code_bits -= s;
2243         // decode into unzigzag'd location
2244         zig = stbi__jpeg_dezigzag[k++];
2245         data[zig] = (short) ((r >> 8) * dequant[zig]);
2246      } else {
2247         int rs = stbi__jpeg_huff_decode(j, hac);
2248         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2249         s = rs & 15;
2250         r = rs >> 4;
2251         if (s == 0) {
2252            if (rs != 0xf0) break; // end block
2253            k += 16;
2254         } else {
2255            k += r;
2256            // decode into unzigzag'd location
2257            zig = stbi__jpeg_dezigzag[k++];
2258            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
2259         }
2260      }
2261   } while (k < 64);
2262   return 1;
2263}
2264
2265static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2266{
2267   int diff,dc;
2268   int t;
2269   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2270
2271   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2272
2273   if (j->succ_high == 0) {
2274      // first scan for DC coefficient, must be first
2275      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
2276      t = stbi__jpeg_huff_decode(j, hdc);
2277      if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2278      diff = t ? stbi__extend_receive(j, t) : 0;
2279
2280      if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
2281      dc = j->img_comp[b].dc_pred + diff;
2282      j->img_comp[b].dc_pred = dc;
2283      if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2284      data[0] = (short) (dc * (1 << j->succ_low));
2285   } else {
2286      // refinement scan for DC coefficient
2287      if (stbi__jpeg_get_bit(j))
2288         data[0] += (short) (1 << j->succ_low);
2289   }
2290   return 1;
2291}
2292
2293// @OPTIMIZE: store non-zigzagged during the decode passes,
2294// and only de-zigzag when dequantizing
2295static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2296{
2297   int k;
2298   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2299
2300   if (j->succ_high == 0) {
2301      int shift = j->succ_low;
2302
2303      if (j->eob_run) {
2304         --j->eob_run;
2305         return 1;
2306      }
2307
2308      k = j->spec_start;
2309      do {
2310         unsigned int zig;
2311         int c,r,s;
2312         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2313         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2314         r = fac[c];
2315         if (r) { // fast-AC path
2316            k += (r >> 4) & 15; // run
2317            s = r & 15; // combined length
2318            if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2319            j->code_buffer <<= s;
2320            j->code_bits -= s;
2321            zig = stbi__jpeg_dezigzag[k++];
2322            data[zig] = (short) ((r >> 8) * (1 << shift));
2323         } else {
2324            int rs = stbi__jpeg_huff_decode(j, hac);
2325            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2326            s = rs & 15;
2327            r = rs >> 4;
2328            if (s == 0) {
2329               if (r < 15) {
2330                  j->eob_run = (1 << r);
2331                  if (r)
2332                     j->eob_run += stbi__jpeg_get_bits(j, r);
2333                  --j->eob_run;
2334                  break;
2335               }
2336               k += 16;
2337            } else {
2338               k += r;
2339               zig = stbi__jpeg_dezigzag[k++];
2340               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
2341            }
2342         }
2343      } while (k <= j->spec_end);
2344   } else {
2345      // refinement scan for these AC coefficients
2346
2347      short bit = (short) (1 << j->succ_low);
2348
2349      if (j->eob_run) {
2350         --j->eob_run;
2351         for (k = j->spec_start; k <= j->spec_end; ++k) {
2352            short *p = &data[stbi__jpeg_dezigzag[k]];
2353            if (*p != 0)
2354               if (stbi__jpeg_get_bit(j))
2355                  if ((*p & bit)==0) {
2356                     if (*p > 0)
2357                        *p += bit;
2358                     else
2359                        *p -= bit;
2360                  }
2361         }
2362      } else {
2363         k = j->spec_start;
2364         do {
2365            int r,s;
2366            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2367            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2368            s = rs & 15;
2369            r = rs >> 4;
2370            if (s == 0) {
2371               if (r < 15) {
2372                  j->eob_run = (1 << r) - 1;
2373                  if (r)
2374                     j->eob_run += stbi__jpeg_get_bits(j, r);
2375                  r = 64; // force end of block
2376               } else {
2377                  // r=15 s=0 should write 16 0s, so we just do
2378                  // a run of 15 0s and then write s (which is 0),
2379                  // so we don't have to do anything special here
2380               }
2381            } else {
2382               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2383               // sign bit
2384               if (stbi__jpeg_get_bit(j))
2385                  s = bit;
2386               else
2387                  s = -bit;
2388            }
2389
2390            // advance by r
2391            while (k <= j->spec_end) {
2392               short *p = &data[stbi__jpeg_dezigzag[k++]];
2393               if (*p != 0) {
2394                  if (stbi__jpeg_get_bit(j))
2395                     if ((*p & bit)==0) {
2396                        if (*p > 0)
2397                           *p += bit;
2398                        else
2399                           *p -= bit;
2400                     }
2401               } else {
2402                  if (r == 0) {
2403                     *p = (short) s;
2404                     break;
2405                  }
2406                  --r;
2407               }
2408            }
2409         } while (k <= j->spec_end);
2410      }
2411   }
2412   return 1;
2413}
2414
2415// take a -128..127 value and stbi__clamp it and convert to 0..255
2416stbi_inline static stbi_uc stbi__clamp(int x)
2417{
2418   // trick to use a single test to catch both cases
2419   if ((unsigned int) x > 255) {
2420      if (x < 0) return 0;
2421      if (x > 255) return 255;
2422   }
2423   return (stbi_uc) x;
2424}
2425
2426#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
2427#define stbi__fsh(x)  ((x) * 4096)
2428
2429// derived from jidctint -- DCT_ISLOW
2430#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2431   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2432   p2 = s2;                                    \
2433   p3 = s6;                                    \
2434   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
2435   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
2436   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
2437   p2 = s0;                                    \
2438   p3 = s4;                                    \
2439   t0 = stbi__fsh(p2+p3);                      \
2440   t1 = stbi__fsh(p2-p3);                      \
2441   x0 = t0+t3;                                 \
2442   x3 = t0-t3;                                 \
2443   x1 = t1+t2;                                 \
2444   x2 = t1-t2;                                 \
2445   t0 = s7;                                    \
2446   t1 = s5;                                    \
2447   t2 = s3;                                    \
2448   t3 = s1;                                    \
2449   p3 = t0+t2;                                 \
2450   p4 = t1+t3;                                 \
2451   p1 = t0+t3;                                 \
2452   p2 = t1+t2;                                 \
2453   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
2454   t0 = t0*stbi__f2f( 0.298631336f);           \
2455   t1 = t1*stbi__f2f( 2.053119869f);           \
2456   t2 = t2*stbi__f2f( 3.072711026f);           \
2457   t3 = t3*stbi__f2f( 1.501321110f);           \
2458   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
2459   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
2460   p3 = p3*stbi__f2f(-1.961570560f);           \
2461   p4 = p4*stbi__f2f(-0.390180644f);           \
2462   t3 += p1+p4;                                \
2463   t2 += p2+p3;                                \
2464   t1 += p2+p4;                                \
2465   t0 += p1+p3;
2466
2467static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
2468{
2469   int i,val[64],*v=val;
2470   stbi_uc *o;
2471   short *d = data;
2472
2473   // columns
2474   for (i=0; i < 8; ++i,++d, ++v) {
2475      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2476      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
2477           && d[40]==0 && d[48]==0 && d[56]==0) {
2478         //    no shortcut                 0     seconds
2479         //    (1|2|3|4|5|6|7)==0          0     seconds
2480         //    all separate               -0.047 seconds
2481         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
2482         int dcterm = d[0]*4;
2483         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2484      } else {
2485         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
2486         // constants scaled things up by 1<<12; let's bring them back
2487         // down, but keep 2 extra bits of precision
2488         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2489         v[ 0] = (x0+t3) >> 10;
2490         v[56] = (x0-t3) >> 10;
2491         v[ 8] = (x1+t2) >> 10;
2492         v[48] = (x1-t2) >> 10;
2493         v[16] = (x2+t1) >> 10;
2494         v[40] = (x2-t1) >> 10;
2495         v[24] = (x3+t0) >> 10;
2496         v[32] = (x3-t0) >> 10;
2497      }
2498   }
2499
2500   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2501      // no fast case since the first 1D IDCT spread components out
2502      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2503      // constants scaled things up by 1<<12, plus we had 1<<2 from first
2504      // loop, plus horizontal and vertical each scale by sqrt(8) so together
2505      // we've got an extra 1<<3, so 1<<17 total we need to remove.
2506      // so we want to round that, which means adding 0.5 * 1<<17,
2507      // aka 65536. Also, we'll end up with -128 to 127 that we want
2508      // to encode as 0..255 by adding 128, so we'll add that before the shift
2509      x0 += 65536 + (128<<17);
2510      x1 += 65536 + (128<<17);
2511      x2 += 65536 + (128<<17);
2512      x3 += 65536 + (128<<17);
2513      // tried computing the shifts into temps, or'ing the temps to see
2514      // if any were out of range, but that was slower
2515      o[0] = stbi__clamp((x0+t3) >> 17);
2516      o[7] = stbi__clamp((x0-t3) >> 17);
2517      o[1] = stbi__clamp((x1+t2) >> 17);
2518      o[6] = stbi__clamp((x1-t2) >> 17);
2519      o[2] = stbi__clamp((x2+t1) >> 17);
2520      o[5] = stbi__clamp((x2-t1) >> 17);
2521      o[3] = stbi__clamp((x3+t0) >> 17);
2522      o[4] = stbi__clamp((x3-t0) >> 17);
2523   }
2524}
2525
2526#ifdef STBI_SSE2
2527// sse2 integer IDCT. not the fastest possible implementation but it
2528// produces bit-identical results to the generic C version so it's
2529// fully "transparent".
2530static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2531{
2532   // This is constructed to match our regular (generic) integer IDCT exactly.
2533   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2534   __m128i tmp;
2535
2536   // dot product constant: even elems=x, odd elems=y
2537   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2538
2539   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2540   // out(1) = c1[even]*x + c1[odd]*y
2541   #define dct_rot(out0,out1, x,y,c0,c1) \
2542      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2543      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2544      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2545      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2546      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2547      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2548
2549   // out = in << 12  (in 16-bit, out 32-bit)
2550   #define dct_widen(out, in) \
2551      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2552      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2553
2554   // wide add
2555   #define dct_wadd(out, a, b) \
2556      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2557      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2558
2559   // wide sub
2560   #define dct_wsub(out, a, b) \
2561      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2562      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2563
2564   // butterfly a/b, add bias, then shift by "s" and pack
2565   #define dct_bfly32o(out0, out1, a,b,bias,s) \
2566      { \
2567         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2568         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2569         dct_wadd(sum, abiased, b); \
2570         dct_wsub(dif, abiased, b); \
2571         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2572         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2573      }
2574
2575   // 8-bit interleave step (for transposes)
2576   #define dct_interleave8(a, b) \
2577      tmp = a; \
2578      a = _mm_unpacklo_epi8(a, b); \
2579      b = _mm_unpackhi_epi8(tmp, b)
2580
2581   // 16-bit interleave step (for transposes)
2582   #define dct_interleave16(a, b) \
2583      tmp = a; \
2584      a = _mm_unpacklo_epi16(a, b); \
2585      b = _mm_unpackhi_epi16(tmp, b)
2586
2587   #define dct_pass(bias,shift) \
2588      { \
2589         /* even part */ \
2590         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2591         __m128i sum04 = _mm_add_epi16(row0, row4); \
2592         __m128i dif04 = _mm_sub_epi16(row0, row4); \
2593         dct_widen(t0e, sum04); \
2594         dct_widen(t1e, dif04); \
2595         dct_wadd(x0, t0e, t3e); \
2596         dct_wsub(x3, t0e, t3e); \
2597         dct_wadd(x1, t1e, t2e); \
2598         dct_wsub(x2, t1e, t2e); \
2599         /* odd part */ \
2600         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2601         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2602         __m128i sum17 = _mm_add_epi16(row1, row7); \
2603         __m128i sum35 = _mm_add_epi16(row3, row5); \
2604         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2605         dct_wadd(x4, y0o, y4o); \
2606         dct_wadd(x5, y1o, y5o); \
2607         dct_wadd(x6, y2o, y5o); \
2608         dct_wadd(x7, y3o, y4o); \
2609         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2610         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2611         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2612         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2613      }
2614
2615   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2616   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2617   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2618   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2619   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2620   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2621   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2622   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2623
2624   // rounding biases in column/row passes, see stbi__idct_block for explanation.
2625   __m128i bias_0 = _mm_set1_epi32(512);
2626   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2627
2628   // load
2629   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2630   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2631   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2632   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2633   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2634   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2635   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2636   row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2637
2638   // column pass
2639   dct_pass(bias_0, 10);
2640
2641   {
2642      // 16bit 8x8 transpose pass 1
2643      dct_interleave16(row0, row4);
2644      dct_interleave16(row1, row5);
2645      dct_interleave16(row2, row6);
2646      dct_interleave16(row3, row7);
2647
2648      // transpose pass 2
2649      dct_interleave16(row0, row2);
2650      dct_interleave16(row1, row3);
2651      dct_interleave16(row4, row6);
2652      dct_interleave16(row5, row7);
2653
2654      // transpose pass 3
2655      dct_interleave16(row0, row1);
2656      dct_interleave16(row2, row3);
2657      dct_interleave16(row4, row5);
2658      dct_interleave16(row6, row7);
2659   }
2660
2661   // row pass
2662   dct_pass(bias_1, 17);
2663
2664   {
2665      // pack
2666      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2667      __m128i p1 = _mm_packus_epi16(row2, row3);
2668      __m128i p2 = _mm_packus_epi16(row4, row5);
2669      __m128i p3 = _mm_packus_epi16(row6, row7);
2670
2671      // 8bit 8x8 transpose pass 1
2672      dct_interleave8(p0, p2); // a0e0a1e1...
2673      dct_interleave8(p1, p3); // c0g0c1g1...
2674
2675      // transpose pass 2
2676      dct_interleave8(p0, p1); // a0c0e0g0...
2677      dct_interleave8(p2, p3); // b0d0f0h0...
2678
2679      // transpose pass 3
2680      dct_interleave8(p0, p2); // a0b0c0d0...
2681      dct_interleave8(p1, p3); // a4b4c4d4...
2682
2683      // store
2684      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2685      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2686      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2687      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2688      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2689      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2690      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2691      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2692   }
2693
2694#undef dct_const
2695#undef dct_rot
2696#undef dct_widen
2697#undef dct_wadd
2698#undef dct_wsub
2699#undef dct_bfly32o
2700#undef dct_interleave8
2701#undef dct_interleave16
2702#undef dct_pass
2703}
2704
2705#endif // STBI_SSE2
2706
2707#ifdef STBI_NEON
2708
2709// NEON integer IDCT. should produce bit-identical
2710// results to the generic C version.
2711static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2712{
2713   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2714
2715   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2716   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2717   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2718   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2719   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2720   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2721   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2722   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2723   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2724   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2725   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2726   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2727
2728#define dct_long_mul(out, inq, coeff) \
2729   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2730   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2731
2732#define dct_long_mac(out, acc, inq, coeff) \
2733   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2734   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2735
2736#define dct_widen(out, inq) \
2737   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2738   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2739
2740// wide add
2741#define dct_wadd(out, a, b) \
2742   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2743   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2744
2745// wide sub
2746#define dct_wsub(out, a, b) \
2747   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2748   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2749
2750// butterfly a/b, then shift using "shiftop" by "s" and pack
2751#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2752   { \
2753      dct_wadd(sum, a, b); \
2754      dct_wsub(dif, a, b); \
2755      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2756      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2757   }
2758
2759#define dct_pass(shiftop, shift) \
2760   { \
2761      /* even part */ \
2762      int16x8_t sum26 = vaddq_s16(row2, row6); \
2763      dct_long_mul(p1e, sum26, rot0_0); \
2764      dct_long_mac(t2e, p1e, row6, rot0_1); \
2765      dct_long_mac(t3e, p1e, row2, rot0_2); \
2766      int16x8_t sum04 = vaddq_s16(row0, row4); \
2767      int16x8_t dif04 = vsubq_s16(row0, row4); \
2768      dct_widen(t0e, sum04); \
2769      dct_widen(t1e, dif04); \
2770      dct_wadd(x0, t0e, t3e); \
2771      dct_wsub(x3, t0e, t3e); \
2772      dct_wadd(x1, t1e, t2e); \
2773      dct_wsub(x2, t1e, t2e); \
2774      /* odd part */ \
2775      int16x8_t sum15 = vaddq_s16(row1, row5); \
2776      int16x8_t sum17 = vaddq_s16(row1, row7); \
2777      int16x8_t sum35 = vaddq_s16(row3, row5); \
2778      int16x8_t sum37 = vaddq_s16(row3, row7); \
2779      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2780      dct_long_mul(p5o, sumodd, rot1_0); \
2781      dct_long_mac(p1o, p5o, sum17, rot1_1); \
2782      dct_long_mac(p2o, p5o, sum35, rot1_2); \
2783      dct_long_mul(p3o, sum37, rot2_0); \
2784      dct_long_mul(p4o, sum15, rot2_1); \
2785      dct_wadd(sump13o, p1o, p3o); \
2786      dct_wadd(sump24o, p2o, p4o); \
2787      dct_wadd(sump23o, p2o, p3o); \
2788      dct_wadd(sump14o, p1o, p4o); \
2789      dct_long_mac(x4, sump13o, row7, rot3_0); \
2790      dct_long_mac(x5, sump24o, row5, rot3_1); \
2791      dct_long_mac(x6, sump23o, row3, rot3_2); \
2792      dct_long_mac(x7, sump14o, row1, rot3_3); \
2793      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2794      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2795      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2796      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2797   }
2798
2799   // load
2800   row0 = vld1q_s16(data + 0*8);
2801   row1 = vld1q_s16(data + 1*8);
2802   row2 = vld1q_s16(data + 2*8);
2803   row3 = vld1q_s16(data + 3*8);
2804   row4 = vld1q_s16(data + 4*8);
2805   row5 = vld1q_s16(data + 5*8);
2806   row6 = vld1q_s16(data + 6*8);
2807   row7 = vld1q_s16(data + 7*8);
2808
2809   // add DC bias
2810   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2811
2812   // column pass
2813   dct_pass(vrshrn_n_s32, 10);
2814
2815   // 16bit 8x8 transpose
2816   {
2817// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2818// whether compilers actually get this is another story, sadly.
2819#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2820#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2821#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2822
2823      // pass 1
2824      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2825      dct_trn16(row2, row3);
2826      dct_trn16(row4, row5);
2827      dct_trn16(row6, row7);
2828
2829      // pass 2
2830      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2831      dct_trn32(row1, row3);
2832      dct_trn32(row4, row6);
2833      dct_trn32(row5, row7);
2834
2835      // pass 3
2836      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2837      dct_trn64(row1, row5);
2838      dct_trn64(row2, row6);
2839      dct_trn64(row3, row7);
2840
2841#undef dct_trn16
2842#undef dct_trn32
2843#undef dct_trn64
2844   }
2845
2846   // row pass
2847   // vrshrn_n_s32 only supports shifts up to 16, we need
2848   // 17. so do a non-rounding shift of 16 first then follow
2849   // up with a rounding shift by 1.
2850   dct_pass(vshrn_n_s32, 16);
2851
2852   {
2853      // pack and round
2854      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2855      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2856      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2857      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2858      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2859      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2860      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2861      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2862
2863      // again, these can translate into one instruction, but often don't.
2864#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2865#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2866#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2867
2868      // sadly can't use interleaved stores here since we only write
2869      // 8 bytes to each scan line!
2870
2871      // 8x8 8-bit transpose pass 1
2872      dct_trn8_8(p0, p1);
2873      dct_trn8_8(p2, p3);
2874      dct_trn8_8(p4, p5);
2875      dct_trn8_8(p6, p7);
2876
2877      // pass 2
2878      dct_trn8_16(p0, p2);
2879      dct_trn8_16(p1, p3);
2880      dct_trn8_16(p4, p6);
2881      dct_trn8_16(p5, p7);
2882
2883      // pass 3
2884      dct_trn8_32(p0, p4);
2885      dct_trn8_32(p1, p5);
2886      dct_trn8_32(p2, p6);
2887      dct_trn8_32(p3, p7);
2888
2889      // store
2890      vst1_u8(out, p0); out += out_stride;
2891      vst1_u8(out, p1); out += out_stride;
2892      vst1_u8(out, p2); out += out_stride;
2893      vst1_u8(out, p3); out += out_stride;
2894      vst1_u8(out, p4); out += out_stride;
2895      vst1_u8(out, p5); out += out_stride;
2896      vst1_u8(out, p6); out += out_stride;
2897      vst1_u8(out, p7);
2898
2899#undef dct_trn8_8
2900#undef dct_trn8_16
2901#undef dct_trn8_32
2902   }
2903
2904#undef dct_long_mul
2905#undef dct_long_mac
2906#undef dct_widen
2907#undef dct_wadd
2908#undef dct_wsub
2909#undef dct_bfly32o
2910#undef dct_pass
2911}
2912
2913#endif // STBI_NEON
2914
2915#define STBI__MARKER_none  0xff
2916// if there's a pending marker from the entropy stream, return that
2917// otherwise, fetch from the stream and get a marker. if there's no
2918// marker, return 0xff, which is never a valid marker value
2919static stbi_uc stbi__get_marker(stbi__jpeg *j)
2920{
2921   stbi_uc x;
2922   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2923   x = stbi__get8(j->s);
2924   if (x != 0xff) return STBI__MARKER_none;
2925   while (x == 0xff)
2926      x = stbi__get8(j->s); // consume repeated 0xff fill bytes
2927   return x;
2928}
2929
2930// in each scan, we'll have scan_n components, and the order
2931// of the components is specified by order[]
2932#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2933
2934// after a restart interval, stbi__jpeg_reset the entropy decoder and
2935// the dc prediction
2936static void stbi__jpeg_reset(stbi__jpeg *j)
2937{
2938   j->code_bits = 0;
2939   j->code_buffer = 0;
2940   j->nomore = 0;
2941   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
2942   j->marker = STBI__MARKER_none;
2943   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2944   j->eob_run = 0;
2945   // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2946   // since we don't even allow 1<<30 pixels
2947}
2948
2949static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2950{
2951   stbi__jpeg_reset(z);
2952   if (!z->progressive) {
2953      if (z->scan_n == 1) {
2954         int i,j;
2955         STBI_SIMD_ALIGN(short, data[64]);
2956         int n = z->order[0];
2957         // non-interleaved data, we just need to process one block at a time,
2958         // in trivial scanline order
2959         // number of blocks to do just depends on how many actual "pixels" this
2960         // component has, independent of interleaved MCU blocking and such
2961         int w = (z->img_comp[n].x+7) >> 3;
2962         int h = (z->img_comp[n].y+7) >> 3;
2963         for (j=0; j < h; ++j) {
2964            for (i=0; i < w; ++i) {
2965               int ha = z->img_comp[n].ha;
2966               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2967               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2968               // every data block is an MCU, so countdown the restart interval
2969               if (--z->todo <= 0) {
2970                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2971                  // if it's NOT a restart, then just bail, so we get corrupt data
2972                  // rather than no data
2973                  if (!STBI__RESTART(z->marker)) return 1;
2974                  stbi__jpeg_reset(z);
2975               }
2976            }
2977         }
2978         return 1;
2979      } else { // interleaved
2980         int i,j,k,x,y;
2981         STBI_SIMD_ALIGN(short, data[64]);
2982         for (j=0; j < z->img_mcu_y; ++j) {
2983            for (i=0; i < z->img_mcu_x; ++i) {
2984               // scan an interleaved mcu... process scan_n components in order
2985               for (k=0; k < z->scan_n; ++k) {
2986                  int n = z->order[k];
2987                  // scan out an mcu's worth of this component; that's just determined
2988                  // by the basic H and V specified for the component
2989                  for (y=0; y < z->img_comp[n].v; ++y) {
2990                     for (x=0; x < z->img_comp[n].h; ++x) {
2991                        int x2 = (i*z->img_comp[n].h + x)*8;
2992                        int y2 = (j*z->img_comp[n].v + y)*8;
2993                        int ha = z->img_comp[n].ha;
2994                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2995                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2996                     }
2997                  }
2998               }
2999               // after all interleaved components, that's an interleaved MCU,
3000               // so now count down the restart interval
3001               if (--z->todo <= 0) {
3002                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3003                  if (!STBI__RESTART(z->marker)) return 1;
3004                  stbi__jpeg_reset(z);
3005               }
3006            }
3007         }
3008         return 1;
3009      }
3010   } else {
3011      if (z->scan_n == 1) {
3012         int i,j;
3013         int n = z->order[0];
3014         // non-interleaved data, we just need to process one block at a time,
3015         // in trivial scanline order
3016         // number of blocks to do just depends on how many actual "pixels" this
3017         // component has, independent of interleaved MCU blocking and such
3018         int w = (z->img_comp[n].x+7) >> 3;
3019         int h = (z->img_comp[n].y+7) >> 3;
3020         for (j=0; j < h; ++j) {
3021            for (i=0; i < w; ++i) {
3022               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
3023               if (z->spec_start == 0) {
3024                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
3025                     return 0;
3026               } else {
3027                  int ha = z->img_comp[n].ha;
3028                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
3029                     return 0;
3030               }
3031               // every data block is an MCU, so countdown the restart interval
3032               if (--z->todo <= 0) {
3033                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3034                  if (!STBI__RESTART(z->marker)) return 1;
3035                  stbi__jpeg_reset(z);
3036               }
3037            }
3038         }
3039         return 1;
3040      } else { // interleaved
3041         int i,j,k,x,y;
3042         for (j=0; j < z->img_mcu_y; ++j) {
3043            for (i=0; i < z->img_mcu_x; ++i) {
3044               // scan an interleaved mcu... process scan_n components in order
3045               for (k=0; k < z->scan_n; ++k) {
3046                  int n = z->order[k];
3047                  // scan out an mcu's worth of this component; that's just determined
3048                  // by the basic H and V specified for the component
3049                  for (y=0; y < z->img_comp[n].v; ++y) {
3050                     for (x=0; x < z->img_comp[n].h; ++x) {
3051                        int x2 = (i*z->img_comp[n].h + x);
3052                        int y2 = (j*z->img_comp[n].v + y);
3053                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
3054                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
3055                           return 0;
3056                     }
3057                  }
3058               }
3059               // after all interleaved components, that's an interleaved MCU,
3060               // so now count down the restart interval
3061               if (--z->todo <= 0) {
3062                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3063                  if (!STBI__RESTART(z->marker)) return 1;
3064                  stbi__jpeg_reset(z);
3065               }
3066            }
3067         }
3068         return 1;
3069      }
3070   }
3071}
3072
3073static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
3074{
3075   int i;
3076   for (i=0; i < 64; ++i)
3077      data[i] *= dequant[i];
3078}
3079
3080static void stbi__jpeg_finish(stbi__jpeg *z)
3081{
3082   if (z->progressive) {
3083      // dequantize and idct the data
3084      int i,j,n;
3085      for (n=0; n < z->s->img_n; ++n) {
3086         int w = (z->img_comp[n].x+7) >> 3;
3087         int h = (z->img_comp[n].y+7) >> 3;
3088         for (j=0; j < h; ++j) {
3089            for (i=0; i < w; ++i) {
3090               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
3091               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
3092               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
3093            }
3094         }
3095      }
3096   }
3097}
3098
3099static int stbi__process_marker(stbi__jpeg *z, int m)
3100{
3101   int L;
3102   switch (m) {
3103      case STBI__MARKER_none: // no marker found
3104         return stbi__err("expected marker","Corrupt JPEG");
3105
3106      case 0xDD: // DRI - specify restart interval
3107         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
3108         z->restart_interval = stbi__get16be(z->s);
3109         return 1;
3110
3111      case 0xDB: // DQT - define quantization table
3112         L = stbi__get16be(z->s)-2;
3113         while (L > 0) {
3114            int q = stbi__get8(z->s);
3115            int p = q >> 4, sixteen = (p != 0);
3116            int t = q & 15,i;
3117            if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
3118            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
3119
3120            for (i=0; i < 64; ++i)
3121               z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
3122            L -= (sixteen ? 129 : 65);
3123         }
3124         return L==0;
3125
3126      case 0xC4: // DHT - define huffman table
3127         L = stbi__get16be(z->s)-2;
3128         while (L > 0) {
3129            stbi_uc *v;
3130            int sizes[16],i,n=0;
3131            int q = stbi__get8(z->s);
3132            int tc = q >> 4;
3133            int th = q & 15;
3134            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
3135            for (i=0; i < 16; ++i) {
3136               sizes[i] = stbi__get8(z->s);
3137               n += sizes[i];
3138            }
3139            if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
3140            L -= 17;
3141            if (tc == 0) {
3142               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
3143               v = z->huff_dc[th].values;
3144            } else {
3145               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
3146               v = z->huff_ac[th].values;
3147            }
3148            for (i=0; i < n; ++i)
3149               v[i] = stbi__get8(z->s);
3150            if (tc != 0)
3151               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
3152            L -= n;
3153         }
3154         return L==0;
3155   }
3156
3157   // check for comment block or APP blocks
3158   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
3159      L = stbi__get16be(z->s);
3160      if (L < 2) {
3161         if (m == 0xFE)
3162            return stbi__err("bad COM len","Corrupt JPEG");
3163         else
3164            return stbi__err("bad APP len","Corrupt JPEG");
3165      }
3166      L -= 2;
3167
3168      if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
3169         static const unsigned char tag[5] = {'J','F','I','F','\0'};
3170         int ok = 1;
3171         int i;
3172         for (i=0; i < 5; ++i)
3173            if (stbi__get8(z->s) != tag[i])
3174               ok = 0;
3175         L -= 5;
3176         if (ok)
3177            z->jfif = 1;
3178      } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
3179         static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
3180         int ok = 1;
3181         int i;
3182         for (i=0; i < 6; ++i)
3183            if (stbi__get8(z->s) != tag[i])
3184               ok = 0;
3185         L -= 6;
3186         if (ok) {
3187            stbi__get8(z->s); // version
3188            stbi__get16be(z->s); // flags0
3189            stbi__get16be(z->s); // flags1
3190            z->app14_color_transform = stbi__get8(z->s); // color transform
3191            L -= 6;
3192         }
3193      }
3194
3195      stbi__skip(z->s, L);
3196      return 1;
3197   }
3198
3199   return stbi__err("unknown marker","Corrupt JPEG");
3200}
3201
3202// after we see SOS
3203static int stbi__process_scan_header(stbi__jpeg *z)
3204{
3205   int i;
3206   int Ls = stbi__get16be(z->s);
3207   z->scan_n = stbi__get8(z->s);
3208   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
3209   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
3210   for (i=0; i < z->scan_n; ++i) {
3211      int id = stbi__get8(z->s), which;
3212      int q = stbi__get8(z->s);
3213      for (which = 0; which < z->s->img_n; ++which)
3214         if (z->img_comp[which].id == id)
3215            break;
3216      if (which == z->s->img_n) return 0; // no match
3217      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
3218      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
3219      z->order[i] = which;
3220   }
3221
3222   {
3223      int aa;
3224      z->spec_start = stbi__get8(z->s);
3225      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
3226      aa = stbi__get8(z->s);
3227      z->succ_high = (aa >> 4);
3228      z->succ_low  = (aa & 15);
3229      if (z->progressive) {
3230         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
3231            return stbi__err("bad SOS", "Corrupt JPEG");
3232      } else {
3233         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
3234         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
3235         z->spec_end = 63;
3236      }
3237   }
3238
3239   return 1;
3240}
3241
3242static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
3243{
3244   int i;
3245   for (i=0; i < ncomp; ++i) {
3246      if (z->img_comp[i].raw_data) {
3247         STBI_FREE(z->img_comp[i].raw_data);
3248         z->img_comp[i].raw_data = NULL;
3249         z->img_comp[i].data = NULL;
3250      }
3251      if (z->img_comp[i].raw_coeff) {
3252         STBI_FREE(z->img_comp[i].raw_coeff);
3253         z->img_comp[i].raw_coeff = 0;
3254         z->img_comp[i].coeff = 0;
3255      }
3256      if (z->img_comp[i].linebuf) {
3257         STBI_FREE(z->img_comp[i].linebuf);
3258         z->img_comp[i].linebuf = NULL;
3259      }
3260   }
3261   return why;
3262}
3263
3264static int stbi__process_frame_header(stbi__jpeg *z, int scan)
3265{
3266   stbi__context *s = z->s;
3267   int Lf,p,i,q, h_max=1,v_max=1,c;
3268   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
3269   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
3270   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
3271   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
3272   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3273   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3274   c = stbi__get8(s);
3275   if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
3276   s->img_n = c;
3277   for (i=0; i < c; ++i) {
3278      z->img_comp[i].data = NULL;
3279      z->img_comp[i].linebuf = NULL;
3280   }
3281
3282   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
3283
3284   z->rgb = 0;
3285   for (i=0; i < s->img_n; ++i) {
3286      static const unsigned char rgb[3] = { 'R', 'G', 'B' };
3287      z->img_comp[i].id = stbi__get8(s);
3288      if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
3289         ++z->rgb;
3290      q = stbi__get8(s);
3291      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
3292      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
3293      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
3294   }
3295
3296   if (scan != STBI__SCAN_load) return 1;
3297
3298   if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3299
3300   for (i=0; i < s->img_n; ++i) {
3301      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3302      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3303   }
3304
3305   // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
3306   // and I've never seen a non-corrupted JPEG file actually use them
3307   for (i=0; i < s->img_n; ++i) {
3308      if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
3309      if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
3310   }
3311
3312   // compute interleaved mcu info
3313   z->img_h_max = h_max;
3314   z->img_v_max = v_max;
3315   z->img_mcu_w = h_max * 8;
3316   z->img_mcu_h = v_max * 8;
3317   // these sizes can't be more than 17 bits
3318   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
3319   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
3320
3321   for (i=0; i < s->img_n; ++i) {
3322      // number of effective pixels (e.g. for non-interleaved MCU)
3323      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
3324      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
3325      // to simplify generation, we'll allocate enough memory to decode
3326      // the bogus oversized data from using interleaved MCUs and their
3327      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3328      // discard the extra data until colorspace conversion
3329      //
3330      // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3331      // so these muls can't overflow with 32-bit ints (which we require)
3332      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3333      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3334      z->img_comp[i].coeff = 0;
3335      z->img_comp[i].raw_coeff = 0;
3336      z->img_comp[i].linebuf = NULL;
3337      z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3338      if (z->img_comp[i].raw_data == NULL)
3339         return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3340      // align blocks for idct using mmx/sse
3341      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
3342      if (z->progressive) {
3343         // w2, h2 are multiples of 8 (see above)
3344         z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3345         z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3346         z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3347         if (z->img_comp[i].raw_coeff == NULL)
3348            return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3349         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
3350      }
3351   }
3352
3353   return 1;
3354}
3355
3356// use comparisons since in some cases we handle more than one case (e.g. SOF)
3357#define stbi__DNL(x)         ((x) == 0xdc)
3358#define stbi__SOI(x)         ((x) == 0xd8)
3359#define stbi__EOI(x)         ((x) == 0xd9)
3360#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3361#define stbi__SOS(x)         ((x) == 0xda)
3362
3363#define stbi__SOF_progressive(x)   ((x) == 0xc2)
3364
3365static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
3366{
3367   int m;
3368   z->jfif = 0;
3369   z->app14_color_transform = -1; // valid values are 0,1,2
3370   z->marker = STBI__MARKER_none; // initialize cached marker to empty
3371   m = stbi__get_marker(z);
3372   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
3373   if (scan == STBI__SCAN_type) return 1;
3374   m = stbi__get_marker(z);
3375   while (!stbi__SOF(m)) {
3376      if (!stbi__process_marker(z,m)) return 0;
3377      m = stbi__get_marker(z);
3378      while (m == STBI__MARKER_none) {
3379         // some files have extra padding after their blocks, so ok, we'll scan
3380         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
3381         m = stbi__get_marker(z);
3382      }
3383   }
3384   z->progressive = stbi__SOF_progressive(m);
3385   if (!stbi__process_frame_header(z, scan)) return 0;
3386   return 1;
3387}
3388
3389static stbi_uc stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
3390{
3391   // some JPEGs have junk at end, skip over it but if we find what looks
3392   // like a valid marker, resume there
3393   while (!stbi__at_eof(j->s)) {
3394      stbi_uc x = stbi__get8(j->s);
3395      while (x == 0xff) { // might be a marker
3396         if (stbi__at_eof(j->s)) return STBI__MARKER_none;
3397         x = stbi__get8(j->s);
3398         if (x != 0x00 && x != 0xff) {
3399            // not a stuffed zero or lead-in to another marker, looks
3400            // like an actual marker, return it
3401            return x;
3402         }
3403         // stuffed zero has x=0 now which ends the loop, meaning we go
3404         // back to regular scan loop.
3405         // repeated 0xff keeps trying to read the next byte of the marker.
3406      }
3407   }
3408   return STBI__MARKER_none;
3409}
3410
3411// decode image to YCbCr format
3412static int stbi__decode_jpeg_image(stbi__jpeg *j)
3413{
3414   int m;
3415   for (m = 0; m < 4; m++) {
3416      j->img_comp[m].raw_data = NULL;
3417      j->img_comp[m].raw_coeff = NULL;
3418   }
3419   j->restart_interval = 0;
3420   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
3421   m = stbi__get_marker(j);
3422   while (!stbi__EOI(m)) {
3423      if (stbi__SOS(m)) {
3424         if (!stbi__process_scan_header(j)) return 0;
3425         if (!stbi__parse_entropy_coded_data(j)) return 0;
3426         if (j->marker == STBI__MARKER_none ) {
3427         j->marker = stbi__skip_jpeg_junk_at_end(j);
3428            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
3429         }
3430         m = stbi__get_marker(j);
3431         if (STBI__RESTART(m))
3432            m = stbi__get_marker(j);
3433      } else if (stbi__DNL(m)) {
3434         int Ld = stbi__get16be(j->s);
3435         stbi__uint32 NL = stbi__get16be(j->s);
3436         if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
3437         if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
3438         m = stbi__get_marker(j);
3439      } else {
3440         if (!stbi__process_marker(j, m)) return 1;
3441         m = stbi__get_marker(j);
3442      }
3443   }
3444   if (j->progressive)
3445      stbi__jpeg_finish(j);
3446   return 1;
3447}
3448
3449// static jfif-centered resampling (across block boundaries)
3450
3451typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
3452                                    int w, int hs);
3453
3454#define stbi__div4(x) ((stbi_uc) ((x) >> 2))
3455
3456static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3457{
3458   STBI_NOTUSED(out);
3459   STBI_NOTUSED(in_far);
3460   STBI_NOTUSED(w);
3461   STBI_NOTUSED(hs);
3462   return in_near;
3463}
3464
3465static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3466{
3467   // need to generate two samples vertically for every one in input
3468   int i;
3469   STBI_NOTUSED(hs);
3470   for (i=0; i < w; ++i)
3471      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
3472   return out;
3473}
3474
3475static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3476{
3477   // need to generate two samples horizontally for every one in input
3478   int i;
3479   stbi_uc *input = in_near;
3480
3481   if (w == 1) {
3482      // if only one sample, can't do any interpolation
3483      out[0] = out[1] = input[0];
3484      return out;
3485   }
3486
3487   out[0] = input[0];
3488   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
3489   for (i=1; i < w-1; ++i) {
3490      int n = 3*input[i]+2;
3491      out[i*2+0] = stbi__div4(n+input[i-1]);
3492      out[i*2+1] = stbi__div4(n+input[i+1]);
3493   }
3494   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
3495   out[i*2+1] = input[w-1];
3496
3497   STBI_NOTUSED(in_far);
3498   STBI_NOTUSED(hs);
3499
3500   return out;
3501}
3502
3503#define stbi__div16(x) ((stbi_uc) ((x) >> 4))
3504
3505static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3506{
3507   // need to generate 2x2 samples for every one in input
3508   int i,t0,t1;
3509   if (w == 1) {
3510      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3511      return out;
3512   }
3513
3514   t1 = 3*in_near[0] + in_far[0];
3515   out[0] = stbi__div4(t1+2);
3516   for (i=1; i < w; ++i) {
3517      t0 = t1;
3518      t1 = 3*in_near[i]+in_far[i];
3519      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3520      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3521   }
3522   out[w*2-1] = stbi__div4(t1+2);
3523
3524   STBI_NOTUSED(hs);
3525
3526   return out;
3527}
3528
3529#if defined(STBI_SSE2) || defined(STBI_NEON)
3530static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3531{
3532   // need to generate 2x2 samples for every one in input
3533   int i=0,t0,t1;
3534
3535   if (w == 1) {
3536      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3537      return out;
3538   }
3539
3540   t1 = 3*in_near[0] + in_far[0];
3541   // process groups of 8 pixels for as long as we can.
3542   // note we can't handle the last pixel in a row in this loop
3543   // because we need to handle the filter boundary conditions.
3544   for (; i < ((w-1) & ~7); i += 8) {
3545#if defined(STBI_SSE2)
3546      // load and perform the vertical filtering pass
3547      // this uses 3*x + y = 4*x + (y - x)
3548      __m128i zero  = _mm_setzero_si128();
3549      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
3550      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
3551      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
3552      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3553      __m128i diff  = _mm_sub_epi16(farw, nearw);
3554      __m128i nears = _mm_slli_epi16(nearw, 2);
3555      __m128i curr  = _mm_add_epi16(nears, diff); // current row
3556
3557      // horizontal filter works the same based on shifted vers of current
3558      // row. "prev" is current row shifted right by 1 pixel; we need to
3559      // insert the previous pixel value (from t1).
3560      // "next" is current row shifted left by 1 pixel, with first pixel
3561      // of next block of 8 pixels added in.
3562      __m128i prv0 = _mm_slli_si128(curr, 2);
3563      __m128i nxt0 = _mm_srli_si128(curr, 2);
3564      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3565      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
3566
3567      // horizontal filter, polyphase implementation since it's convenient:
3568      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3569      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3570      // note the shared term.
3571      __m128i bias  = _mm_set1_epi16(8);
3572      __m128i curs = _mm_slli_epi16(curr, 2);
3573      __m128i prvd = _mm_sub_epi16(prev, curr);
3574      __m128i nxtd = _mm_sub_epi16(next, curr);
3575      __m128i curb = _mm_add_epi16(curs, bias);
3576      __m128i even = _mm_add_epi16(prvd, curb);
3577      __m128i odd  = _mm_add_epi16(nxtd, curb);
3578
3579      // interleave even and odd pixels, then undo scaling.
3580      __m128i int0 = _mm_unpacklo_epi16(even, odd);
3581      __m128i int1 = _mm_unpackhi_epi16(even, odd);
3582      __m128i de0  = _mm_srli_epi16(int0, 4);
3583      __m128i de1  = _mm_srli_epi16(int1, 4);
3584
3585      // pack and write output
3586      __m128i outv = _mm_packus_epi16(de0, de1);
3587      _mm_storeu_si128((__m128i *) (out + i*2), outv);
3588#elif defined(STBI_NEON)
3589      // load and perform the vertical filtering pass
3590      // this uses 3*x + y = 4*x + (y - x)
3591      uint8x8_t farb  = vld1_u8(in_far + i);
3592      uint8x8_t nearb = vld1_u8(in_near + i);
3593      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3594      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3595      int16x8_t curr  = vaddq_s16(nears, diff); // current row
3596
3597      // horizontal filter works the same based on shifted vers of current
3598      // row. "prev" is current row shifted right by 1 pixel; we need to
3599      // insert the previous pixel value (from t1).
3600      // "next" is current row shifted left by 1 pixel, with first pixel
3601      // of next block of 8 pixels added in.
3602      int16x8_t prv0 = vextq_s16(curr, curr, 7);
3603      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3604      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3605      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3606
3607      // horizontal filter, polyphase implementation since it's convenient:
3608      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3609      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3610      // note the shared term.
3611      int16x8_t curs = vshlq_n_s16(curr, 2);
3612      int16x8_t prvd = vsubq_s16(prev, curr);
3613      int16x8_t nxtd = vsubq_s16(next, curr);
3614      int16x8_t even = vaddq_s16(curs, prvd);
3615      int16x8_t odd  = vaddq_s16(curs, nxtd);
3616
3617      // undo scaling and round, then store with even/odd phases interleaved
3618      uint8x8x2_t o;
3619      o.val[0] = vqrshrun_n_s16(even, 4);
3620      o.val[1] = vqrshrun_n_s16(odd,  4);
3621      vst2_u8(out + i*2, o);
3622#endif
3623
3624      // "previous" value for next iter
3625      t1 = 3*in_near[i+7] + in_far[i+7];
3626   }
3627
3628   t0 = t1;
3629   t1 = 3*in_near[i] + in_far[i];
3630   out[i*2] = stbi__div16(3*t1 + t0 + 8);
3631
3632   for (++i; i < w; ++i) {
3633      t0 = t1;
3634      t1 = 3*in_near[i]+in_far[i];
3635      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3636      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3637   }
3638   out[w*2-1] = stbi__div4(t1+2);
3639
3640   STBI_NOTUSED(hs);
3641
3642   return out;
3643}
3644#endif
3645
3646static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3647{
3648   // resample with nearest-neighbor
3649   int i,j;
3650   STBI_NOTUSED(in_far);
3651   for (i=0; i < w; ++i)
3652      for (j=0; j < hs; ++j)
3653         out[i*hs+j] = in_near[i];
3654   return out;
3655}
3656
3657// this is a reduced-precision calculation of YCbCr-to-RGB introduced
3658// to make sure the code produces the same results in both SIMD and scalar
3659#define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
3660static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3661{
3662   int i;
3663   for (i=0; i < count; ++i) {
3664      int y_fixed = (y[i] << 20) + (1<<19); // rounding
3665      int r,g,b;
3666      int cr = pcr[i] - 128;
3667      int cb = pcb[i] - 128;
3668      r = y_fixed +  cr* stbi__float2fixed(1.40200f);
3669      g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3670      b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
3671      r >>= 20;
3672      g >>= 20;
3673      b >>= 20;
3674      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3675      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3676      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3677      out[0] = (stbi_uc)r;
3678      out[1] = (stbi_uc)g;
3679      out[2] = (stbi_uc)b;
3680      out[3] = 255;
3681      out += step;
3682   }
3683}
3684
3685#if defined(STBI_SSE2) || defined(STBI_NEON)
3686static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3687{
3688   int i = 0;
3689
3690#ifdef STBI_SSE2
3691   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3692   // it's useful in practice (you wouldn't use it for textures, for example).
3693   // so just accelerate step == 4 case.
3694   if (step == 4) {
3695      // this is a fairly straightforward implementation and not super-optimized.
3696      __m128i signflip  = _mm_set1_epi8(-0x80);
3697      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3698      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3699      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3700      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3701      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3702      __m128i xw = _mm_set1_epi16(255); // alpha channel
3703
3704      for (; i+7 < count; i += 8) {
3705         // load
3706         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3707         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3708         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3709         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3710         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3711
3712         // unpack to short (and left-shift cr, cb by 8)
3713         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3714         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3715         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3716
3717         // color transform
3718         __m128i yws = _mm_srli_epi16(yw, 4);
3719         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3720         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3721         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3722         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3723         __m128i rws = _mm_add_epi16(cr0, yws);
3724         __m128i gwt = _mm_add_epi16(cb0, yws);
3725         __m128i bws = _mm_add_epi16(yws, cb1);
3726         __m128i gws = _mm_add_epi16(gwt, cr1);
3727
3728         // descale
3729         __m128i rw = _mm_srai_epi16(rws, 4);
3730         __m128i bw = _mm_srai_epi16(bws, 4);
3731         __m128i gw = _mm_srai_epi16(gws, 4);
3732
3733         // back to byte, set up for transpose
3734         __m128i brb = _mm_packus_epi16(rw, bw);
3735         __m128i gxb = _mm_packus_epi16(gw, xw);
3736
3737         // transpose to interleave channels
3738         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3739         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3740         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3741         __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3742
3743         // store
3744         _mm_storeu_si128((__m128i *) (out + 0), o0);
3745         _mm_storeu_si128((__m128i *) (out + 16), o1);
3746         out += 32;
3747      }
3748   }
3749#endif
3750
3751#ifdef STBI_NEON
3752   // in this version, step=3 support would be easy to add. but is there demand?
3753   if (step == 4) {
3754      // this is a fairly straightforward implementation and not super-optimized.
3755      uint8x8_t signflip = vdup_n_u8(0x80);
3756      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3757      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3758      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3759      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3760
3761      for (; i+7 < count; i += 8) {
3762         // load
3763         uint8x8_t y_bytes  = vld1_u8(y + i);
3764         uint8x8_t cr_bytes = vld1_u8(pcr + i);
3765         uint8x8_t cb_bytes = vld1_u8(pcb + i);
3766         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3767         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3768
3769         // expand to s16
3770         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3771         int16x8_t crw = vshll_n_s8(cr_biased, 7);
3772         int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3773
3774         // color transform
3775         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3776         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3777         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3778         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3779         int16x8_t rws = vaddq_s16(yws, cr0);
3780         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3781         int16x8_t bws = vaddq_s16(yws, cb1);
3782
3783         // undo scaling, round, convert to byte
3784         uint8x8x4_t o;
3785         o.val[0] = vqrshrun_n_s16(rws, 4);
3786         o.val[1] = vqrshrun_n_s16(gws, 4);
3787         o.val[2] = vqrshrun_n_s16(bws, 4);
3788         o.val[3] = vdup_n_u8(255);
3789
3790         // store, interleaving r/g/b/a
3791         vst4_u8(out, o);
3792         out += 8*4;
3793      }
3794   }
3795#endif
3796
3797   for (; i < count; ++i) {
3798      int y_fixed = (y[i] << 20) + (1<<19); // rounding
3799      int r,g,b;
3800      int cr = pcr[i] - 128;
3801      int cb = pcb[i] - 128;
3802      r = y_fixed + cr* stbi__float2fixed(1.40200f);
3803      g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3804      b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
3805      r >>= 20;
3806      g >>= 20;
3807      b >>= 20;
3808      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3809      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3810      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3811      out[0] = (stbi_uc)r;
3812      out[1] = (stbi_uc)g;
3813      out[2] = (stbi_uc)b;
3814      out[3] = 255;
3815      out += step;
3816   }
3817}
3818#endif
3819
3820// set up the kernels
3821static void stbi__setup_jpeg(stbi__jpeg *j)
3822{
3823   j->idct_block_kernel = stbi__idct_block;
3824   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3825   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3826
3827#ifdef STBI_SSE2
3828   if (stbi__sse2_available()) {
3829      j->idct_block_kernel = stbi__idct_simd;
3830      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3831      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3832   }
3833#endif
3834
3835#ifdef STBI_NEON
3836   j->idct_block_kernel = stbi__idct_simd;
3837   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3838   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3839#endif
3840}
3841
3842// clean up the temporary component buffers
3843static void stbi__cleanup_jpeg(stbi__jpeg *j)
3844{
3845   stbi__free_jpeg_components(j, j->s->img_n, 0);
3846}
3847
3848typedef struct
3849{
3850   resample_row_func resample;
3851   stbi_uc *line0,*line1;
3852   int hs,vs;   // expansion factor in each axis
3853   int w_lores; // horizontal pixels pre-expansion
3854   int ystep;   // how far through vertical expansion we are
3855   int ypos;    // which pre-expansion row we're on
3856} stbi__resample;
3857
3858// fast 0..255 * 0..255 => 0..255 rounded multiplication
3859static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
3860{
3861   unsigned int t = x*y + 128;
3862   return (stbi_uc) ((t + (t >>8)) >> 8);
3863}
3864
3865static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3866{
3867   int n, decode_n, is_rgb;
3868   z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3869
3870   // validate req_comp
3871   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3872
3873   // load a jpeg image from whichever source, but leave in YCbCr format
3874   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3875
3876   // determine actual number of components to generate
3877   n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
3878
3879   is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
3880
3881   if (z->s->img_n == 3 && n < 3 && !is_rgb)
3882      decode_n = 1;
3883   else
3884      decode_n = z->s->img_n;
3885
3886   // nothing to do if no components requested; check this now to avoid
3887   // accessing uninitialized coutput[0] later
3888   if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
3889
3890   // resample and color-convert
3891   {
3892      int k;
3893      unsigned int i,j;
3894      stbi_uc *output;
3895      stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
3896
3897      stbi__resample res_comp[4];
3898
3899      for (k=0; k < decode_n; ++k) {
3900         stbi__resample *r = &res_comp[k];
3901
3902         // allocate line buffer big enough for upsampling off the edges
3903         // with upsample factor of 4
3904         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3905         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3906
3907         r->hs      = z->img_h_max / z->img_comp[k].h;
3908         r->vs      = z->img_v_max / z->img_comp[k].v;
3909         r->ystep   = r->vs >> 1;
3910         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3911         r->ypos    = 0;
3912         r->line0   = r->line1 = z->img_comp[k].data;
3913
3914         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3915         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3916         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3917         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3918         else                               r->resample = stbi__resample_row_generic;
3919      }
3920
3921      // can't error after this so, this is safe
3922      output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3923      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3924
3925      // now go ahead and resample
3926      for (j=0; j < z->s->img_y; ++j) {
3927         stbi_uc *out = output + n * z->s->img_x * j;
3928         for (k=0; k < decode_n; ++k) {
3929            stbi__resample *r = &res_comp[k];
3930            int y_bot = r->ystep >= (r->vs >> 1);
3931            coutput[k] = r->resample(z->img_comp[k].linebuf,
3932                                     y_bot ? r->line1 : r->line0,
3933                                     y_bot ? r->line0 : r->line1,
3934                                     r->w_lores, r->hs);
3935            if (++r->ystep >= r->vs) {
3936               r->ystep = 0;
3937               r->line0 = r->line1;
3938               if (++r->ypos < z->img_comp[k].y)
3939                  r->line1 += z->img_comp[k].w2;
3940            }
3941         }
3942         if (n >= 3) {
3943            stbi_uc *y = coutput[0];
3944            if (z->s->img_n == 3) {
3945               if (is_rgb) {
3946                  for (i=0; i < z->s->img_x; ++i) {
3947                     out[0] = y[i];
3948                     out[1] = coutput[1][i];
3949                     out[2] = coutput[2][i];
3950                     out[3] = 255;
3951                     out += n;
3952                  }
3953               } else {
3954                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3955               }
3956            } else if (z->s->img_n == 4) {
3957               if (z->app14_color_transform == 0) { // CMYK
3958                  for (i=0; i < z->s->img_x; ++i) {
3959                     stbi_uc m = coutput[3][i];
3960                     out[0] = stbi__blinn_8x8(coutput[0][i], m);
3961                     out[1] = stbi__blinn_8x8(coutput[1][i], m);
3962                     out[2] = stbi__blinn_8x8(coutput[2][i], m);
3963                     out[3] = 255;
3964                     out += n;
3965                  }
3966               } else if (z->app14_color_transform == 2) { // YCCK
3967                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3968                  for (i=0; i < z->s->img_x; ++i) {
3969                     stbi_uc m = coutput[3][i];
3970                     out[0] = stbi__blinn_8x8(255 - out[0], m);
3971                     out[1] = stbi__blinn_8x8(255 - out[1], m);
3972                     out[2] = stbi__blinn_8x8(255 - out[2], m);
3973                     out += n;
3974                  }
3975               } else { // YCbCr + alpha?  Ignore the fourth channel for now
3976                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3977               }
3978            } else
3979               for (i=0; i < z->s->img_x; ++i) {
3980                  out[0] = out[1] = out[2] = y[i];
3981                  out[3] = 255; // not used if n==3
3982                  out += n;
3983               }
3984         } else {
3985            if (is_rgb) {
3986               if (n == 1)
3987                  for (i=0; i < z->s->img_x; ++i)
3988                     *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3989               else {
3990                  for (i=0; i < z->s->img_x; ++i, out += 2) {
3991                     out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3992                     out[1] = 255;
3993                  }
3994               }
3995            } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
3996               for (i=0; i < z->s->img_x; ++i) {
3997                  stbi_uc m = coutput[3][i];
3998                  stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
3999                  stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
4000                  stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
4001                  out[0] = stbi__compute_y(r, g, b);
4002                  out[1] = 255;
4003                  out += n;
4004               }
4005            } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
4006               for (i=0; i < z->s->img_x; ++i) {
4007                  out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
4008                  out[1] = 255;
4009                  out += n;
4010               }
4011            } else {
4012               stbi_uc *y = coutput[0];
4013               if (n == 1)
4014                  for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
4015               else
4016                  for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
4017            }
4018         }
4019      }
4020      stbi__cleanup_jpeg(z);
4021      *out_x = z->s->img_x;
4022      *out_y = z->s->img_y;
4023      if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
4024      return output;
4025   }
4026}
4027
4028static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
4029{
4030   unsigned char* result;
4031   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
4032   if (!j) return stbi__errpuc("outofmem", "Out of memory");
4033   memset(j, 0, sizeof(stbi__jpeg));
4034   STBI_NOTUSED(ri);
4035   j->s = s;
4036   stbi__setup_jpeg(j);
4037   result = load_jpeg_image(j, x,y,comp,req_comp);
4038   STBI_FREE(j);
4039   return result;
4040}
4041
4042static int stbi__jpeg_test(stbi__context *s)
4043{
4044   int r;
4045   stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
4046   if (!j) return stbi__err("outofmem", "Out of memory");
4047   memset(j, 0, sizeof(stbi__jpeg));
4048   j->s = s;
4049   stbi__setup_jpeg(j);
4050   r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
4051   stbi__rewind(s);
4052   STBI_FREE(j);
4053   return r;
4054}
4055
4056static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
4057{
4058   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
4059      stbi__rewind( j->s );
4060      return 0;
4061   }
4062   if (x) *x = j->s->img_x;
4063   if (y) *y = j->s->img_y;
4064   if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
4065   return 1;
4066}
4067
4068static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
4069{
4070   int result;
4071   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
4072   if (!j) return stbi__err("outofmem", "Out of memory");
4073   memset(j, 0, sizeof(stbi__jpeg));
4074   j->s = s;
4075   result = stbi__jpeg_info_raw(j, x, y, comp);
4076   STBI_FREE(j);
4077   return result;
4078}
4079#endif
4080
4081// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
4082//    simple implementation
4083//      - all input must be provided in an upfront buffer
4084//      - all output is written to a single output buffer (can malloc/realloc)
4085//    performance
4086//      - fast huffman
4087
4088#ifndef STBI_NO_ZLIB
4089
4090// fast-way is faster to check than jpeg huffman, but slow way is slower
4091#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
4092#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
4093#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
4094
4095// zlib-style huffman encoding
4096// (jpegs packs from left, zlib from right, so can't share code)
4097typedef struct
4098{
4099   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
4100   stbi__uint16 firstcode[16];
4101   int maxcode[17];
4102   stbi__uint16 firstsymbol[16];
4103   stbi_uc  size[STBI__ZNSYMS];
4104   stbi__uint16 value[STBI__ZNSYMS];
4105} stbi__zhuffman;
4106
4107stbi_inline static int stbi__bitreverse16(int n)
4108{
4109  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
4110  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
4111  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
4112  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
4113  return n;
4114}
4115
4116stbi_inline static int stbi__bit_reverse(int v, int bits)
4117{
4118   STBI_ASSERT(bits <= 16);
4119   // to bit reverse n bits, reverse 16 and shift
4120   // e.g. 11 bits, bit reverse and shift away 5
4121   return stbi__bitreverse16(v) >> (16-bits);
4122}
4123
4124static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
4125{
4126   int i,k=0;
4127   int code, next_code[16], sizes[17];
4128
4129   // DEFLATE spec for generating codes
4130   memset(sizes, 0, sizeof(sizes));
4131   memset(z->fast, 0, sizeof(z->fast));
4132   for (i=0; i < num; ++i)
4133      ++sizes[sizelist[i]];
4134   sizes[0] = 0;
4135   for (i=1; i < 16; ++i)
4136      if (sizes[i] > (1 << i))
4137         return stbi__err("bad sizes", "Corrupt PNG");
4138   code = 0;
4139   for (i=1; i < 16; ++i) {
4140      next_code[i] = code;
4141      z->firstcode[i] = (stbi__uint16) code;
4142      z->firstsymbol[i] = (stbi__uint16) k;
4143      code = (code + sizes[i]);
4144      if (sizes[i])
4145         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
4146      z->maxcode[i] = code << (16-i); // preshift for inner loop
4147      code <<= 1;
4148      k += sizes[i];
4149   }
4150   z->maxcode[16] = 0x10000; // sentinel
4151   for (i=0; i < num; ++i) {
4152      int s = sizelist[i];
4153      if (s) {
4154         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
4155         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
4156         z->size [c] = (stbi_uc     ) s;
4157         z->value[c] = (stbi__uint16) i;
4158         if (s <= STBI__ZFAST_BITS) {
4159            int j = stbi__bit_reverse(next_code[s],s);
4160            while (j < (1 << STBI__ZFAST_BITS)) {
4161               z->fast[j] = fastv;
4162               j += (1 << s);
4163            }
4164         }
4165         ++next_code[s];
4166      }
4167   }
4168   return 1;
4169}
4170
4171// zlib-from-memory implementation for PNG reading
4172//    because PNG allows splitting the zlib stream arbitrarily,
4173//    and it's annoying structurally to have PNG call ZLIB call PNG,
4174//    we require PNG read all the IDATs and combine them into a single
4175//    memory buffer
4176
4177typedef struct
4178{
4179   stbi_uc *zbuffer, *zbuffer_end;
4180   int num_bits;
4181   int hit_zeof_once;
4182   stbi__uint32 code_buffer;
4183
4184   char *zout;
4185   char *zout_start;
4186   char *zout_end;
4187   int   z_expandable;
4188
4189   stbi__zhuffman z_length, z_distance;
4190} stbi__zbuf;
4191
4192stbi_inline static int stbi__zeof(stbi__zbuf *z)
4193{
4194   return (z->zbuffer >= z->zbuffer_end);
4195}
4196
4197stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
4198{
4199   return stbi__zeof(z) ? 0 : *z->zbuffer++;
4200}
4201
4202static void stbi__fill_bits(stbi__zbuf *z)
4203{
4204   do {
4205      if (z->code_buffer >= (1U << z->num_bits)) {
4206        z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
4207        return;
4208      }
4209      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
4210      z->num_bits += 8;
4211   } while (z->num_bits <= 24);
4212}
4213
4214stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
4215{
4216   unsigned int k;
4217   if (z->num_bits < n) stbi__fill_bits(z);
4218   k = z->code_buffer & ((1 << n) - 1);
4219   z->code_buffer >>= n;
4220   z->num_bits -= n;
4221   return k;
4222}
4223
4224static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
4225{
4226   int b,s,k;
4227   // not resolved by fast table, so compute it the slow way
4228   // use jpeg approach, which requires MSbits at top
4229   k = stbi__bit_reverse(a->code_buffer, 16);
4230   for (s=STBI__ZFAST_BITS+1; ; ++s)
4231      if (k < z->maxcode[s])
4232         break;
4233   if (s >= 16) return -1; // invalid code!
4234   // code size is s, so:
4235   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
4236   if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
4237   if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
4238   a->code_buffer >>= s;
4239   a->num_bits -= s;
4240   return z->value[b];
4241}
4242
4243stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
4244{
4245   int b,s;
4246   if (a->num_bits < 16) {
4247      if (stbi__zeof(a)) {
4248         if (!a->hit_zeof_once) {
4249            // This is the first time we hit eof, insert 16 extra padding btis
4250            // to allow us to keep going; if we actually consume any of them
4251            // though, that is invalid data. This is caught later.
4252            a->hit_zeof_once = 1;
4253            a->num_bits += 16; // add 16 implicit zero bits
4254         } else {
4255            // We already inserted our extra 16 padding bits and are again
4256            // out, this stream is actually prematurely terminated.
4257            return -1;
4258         }
4259      } else {
4260         stbi__fill_bits(a);
4261      }
4262   }
4263   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
4264   if (b) {
4265      s = b >> 9;
4266      a->code_buffer >>= s;
4267      a->num_bits -= s;
4268      return b & 511;
4269   }
4270   return stbi__zhuffman_decode_slowpath(a, z);
4271}
4272
4273static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
4274{
4275   char *q;
4276   unsigned int cur, limit, old_limit;
4277   z->zout = zout;
4278   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
4279   cur   = (unsigned int) (z->zout - z->zout_start);
4280   limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
4281   if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
4282   while (cur + n > limit) {
4283      if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
4284      limit *= 2;
4285   }
4286   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
4287   STBI_NOTUSED(old_limit);
4288   if (q == NULL) return stbi__err("outofmem", "Out of memory");
4289   z->zout_start = q;
4290   z->zout       = q + cur;
4291   z->zout_end   = q + limit;
4292   return 1;
4293}
4294
4295static const int stbi__zlength_base[31] = {
4296   3,4,5,6,7,8,9,10,11,13,
4297   15,17,19,23,27,31,35,43,51,59,
4298   67,83,99,115,131,163,195,227,258,0,0 };
4299
4300static const int stbi__zlength_extra[31]=
4301{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
4302
4303static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
4304257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
4305
4306static const int stbi__zdist_extra[32] =
4307{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
4308
4309static int stbi__parse_huffman_block(stbi__zbuf *a)
4310{
4311   char *zout = a->zout;
4312   for(;;) {
4313      int z = stbi__zhuffman_decode(a, &a->z_length);
4314      if (z < 256) {
4315         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
4316         if (zout >= a->zout_end) {
4317            if (!stbi__zexpand(a, zout, 1)) return 0;
4318            zout = a->zout;
4319         }
4320         *zout++ = (char) z;
4321      } else {
4322         stbi_uc *p;
4323         int len,dist;
4324         if (z == 256) {
4325            a->zout = zout;
4326            if (a->hit_zeof_once && a->num_bits < 16) {
4327               // The first time we hit zeof, we inserted 16 extra zero bits into our bit
4328               // buffer so the decoder can just do its speculative decoding. But if we
4329               // actually consumed any of those bits (which is the case when num_bits < 16),
4330               // the stream actually read past the end so it is malformed.
4331               return stbi__err("unexpected end","Corrupt PNG");
4332            }
4333            return 1;
4334         }
4335         if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
4336         z -= 257;
4337         len = stbi__zlength_base[z];
4338         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
4339         z = stbi__zhuffman_decode(a, &a->z_distance);
4340         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
4341         dist = stbi__zdist_base[z];
4342         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
4343         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
4344         if (len > a->zout_end - zout) {
4345            if (!stbi__zexpand(a, zout, len)) return 0;
4346            zout = a->zout;
4347         }
4348         p = (stbi_uc *) (zout - dist);
4349         if (dist == 1) { // run of one byte; common in images.
4350            stbi_uc v = *p;
4351            if (len) { do *zout++ = v; while (--len); }
4352         } else {
4353            if (len) { do *zout++ = *p++; while (--len); }
4354         }
4355      }
4356   }
4357}
4358
4359static int stbi__compute_huffman_codes(stbi__zbuf *a)
4360{
4361   static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
4362   stbi__zhuffman z_codelength;
4363   stbi_uc lencodes[286+32+137];//padding for maximum single op
4364   stbi_uc codelength_sizes[19];
4365   int i,n;
4366
4367   int hlit  = stbi__zreceive(a,5) + 257;
4368   int hdist = stbi__zreceive(a,5) + 1;
4369   int hclen = stbi__zreceive(a,4) + 4;
4370   int ntot  = hlit + hdist;
4371
4372   memset(codelength_sizes, 0, sizeof(codelength_sizes));
4373   for (i=0; i < hclen; ++i) {
4374      int s = stbi__zreceive(a,3);
4375      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
4376   }
4377   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
4378
4379   n = 0;
4380   while (n < ntot) {
4381      int c = stbi__zhuffman_decode(a, &z_codelength);
4382      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4383      if (c < 16)
4384         lencodes[n++] = (stbi_uc) c;
4385      else {
4386         stbi_uc fill = 0;
4387         if (c == 16) {
4388            c = stbi__zreceive(a,2)+3;
4389            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4390            fill = lencodes[n-1];
4391         } else if (c == 17) {
4392            c = stbi__zreceive(a,3)+3;
4393         } else if (c == 18) {
4394            c = stbi__zreceive(a,7)+11;
4395         } else {
4396            return stbi__err("bad codelengths", "Corrupt PNG");
4397         }
4398         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4399         memset(lencodes+n, fill, c);
4400         n += c;
4401      }
4402   }
4403   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
4404   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4405   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
4406   return 1;
4407}
4408
4409static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4410{
4411   stbi_uc header[4];
4412   int len,nlen,k;
4413   if (a->num_bits & 7)
4414      stbi__zreceive(a, a->num_bits & 7); // discard
4415   // drain the bit-packed data into header
4416   k = 0;
4417   while (a->num_bits > 0) {
4418      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
4419      a->code_buffer >>= 8;
4420      a->num_bits -= 8;
4421   }
4422   if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
4423   // now fill header the normal way
4424   while (k < 4)
4425      header[k++] = stbi__zget8(a);
4426   len  = header[1] * 256 + header[0];
4427   nlen = header[3] * 256 + header[2];
4428   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
4429   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
4430   if (a->zout + len > a->zout_end)
4431      if (!stbi__zexpand(a, a->zout, len)) return 0;
4432   memcpy(a->zout, a->zbuffer, len);
4433   a->zbuffer += len;
4434   a->zout += len;
4435   return 1;
4436}
4437
4438static int stbi__parse_zlib_header(stbi__zbuf *a)
4439{
4440   int cmf   = stbi__zget8(a);
4441   int cm    = cmf & 15;
4442   /* int cinfo = cmf >> 4; */
4443   int flg   = stbi__zget8(a);
4444   if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4445   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4446   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
4447   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
4448   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4449   return 1;
4450}
4451
4452static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
4453{
4454   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4455   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4456   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4457   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4458   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4459   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4460   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4461   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4462   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
4463};
4464static const stbi_uc stbi__zdefault_distance[32] =
4465{
4466   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
4467};
4468/*
4469Init algorithm:
4470{
4471   int i;   // use <= to match clearly with spec
4472   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
4473   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
4474   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
4475   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
4476
4477   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
4478}
4479*/
4480
4481static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4482{
4483   int final, type;
4484   if (parse_header)
4485      if (!stbi__parse_zlib_header(a)) return 0;
4486   a->num_bits = 0;
4487   a->code_buffer = 0;
4488   a->hit_zeof_once = 0;
4489   do {
4490      final = stbi__zreceive(a,1);
4491      type = stbi__zreceive(a,2);
4492      if (type == 0) {
4493         if (!stbi__parse_uncompressed_block(a)) return 0;
4494      } else if (type == 3) {
4495         return 0;
4496      } else {
4497         if (type == 1) {
4498            // use fixed code lengths
4499            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
4500            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
4501         } else {
4502            if (!stbi__compute_huffman_codes(a)) return 0;
4503         }
4504         if (!stbi__parse_huffman_block(a)) return 0;
4505      }
4506   } while (!final);
4507   return 1;
4508}
4509
4510static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4511{
4512   a->zout_start = obuf;
4513   a->zout       = obuf;
4514   a->zout_end   = obuf + olen;
4515   a->z_expandable = exp;
4516
4517   return stbi__parse_zlib(a, parse_header);
4518}
4519
4520STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4521{
4522   stbi__zbuf a;
4523   char *p = (char *) stbi__malloc(initial_size);
4524   if (p == NULL) return NULL;
4525   a.zbuffer = (stbi_uc *) buffer;
4526   a.zbuffer_end = (stbi_uc *) buffer + len;
4527   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4528      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4529      return a.zout_start;
4530   } else {
4531      STBI_FREE(a.zout_start);
4532      return NULL;
4533   }
4534}
4535
4536STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4537{
4538   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4539}
4540
4541STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4542{
4543   stbi__zbuf a;
4544   char *p = (char *) stbi__malloc(initial_size);
4545   if (p == NULL) return NULL;
4546   a.zbuffer = (stbi_uc *) buffer;
4547   a.zbuffer_end = (stbi_uc *) buffer + len;
4548   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4549      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4550      return a.zout_start;
4551   } else {
4552      STBI_FREE(a.zout_start);
4553      return NULL;
4554   }
4555}
4556
4557STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4558{
4559   stbi__zbuf a;
4560   a.zbuffer = (stbi_uc *) ibuffer;
4561   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4562   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4563      return (int) (a.zout - a.zout_start);
4564   else
4565      return -1;
4566}
4567
4568STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4569{
4570   stbi__zbuf a;
4571   char *p = (char *) stbi__malloc(16384);
4572   if (p == NULL) return NULL;
4573   a.zbuffer = (stbi_uc *) buffer;
4574   a.zbuffer_end = (stbi_uc *) buffer+len;
4575   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4576      if (outlen) *outlen = (int) (a.zout - a.zout_start);
4577      return a.zout_start;
4578   } else {
4579      STBI_FREE(a.zout_start);
4580      return NULL;
4581   }
4582}
4583
4584STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4585{
4586   stbi__zbuf a;
4587   a.zbuffer = (stbi_uc *) ibuffer;
4588   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4589   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4590      return (int) (a.zout - a.zout_start);
4591   else
4592      return -1;
4593}
4594#endif
4595
4596// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
4597//    simple implementation
4598//      - only 8-bit samples
4599//      - no CRC checking
4600//      - allocates lots of intermediate memory
4601//        - avoids problem of streaming data between subsystems
4602//        - avoids explicit window management
4603//    performance
4604//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4605
4606#ifndef STBI_NO_PNG
4607typedef struct
4608{
4609   stbi__uint32 length;
4610   stbi__uint32 type;
4611} stbi__pngchunk;
4612
4613static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4614{
4615   stbi__pngchunk c;
4616   c.length = stbi__get32be(s);
4617   c.type   = stbi__get32be(s);
4618   return c;
4619}
4620
4621static int stbi__check_png_header(stbi__context *s)
4622{
4623   static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4624   int i;
4625   for (i=0; i < 8; ++i)
4626      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
4627   return 1;
4628}
4629
4630typedef struct
4631{
4632   stbi__context *s;
4633   stbi_uc *idata, *expanded, *out;
4634   int depth;
4635} stbi__png;
4636
4637
4638enum {
4639   STBI__F_none=0,
4640   STBI__F_sub=1,
4641   STBI__F_up=2,
4642   STBI__F_avg=3,
4643   STBI__F_paeth=4,
4644   // synthetic filter used for first scanline to avoid needing a dummy row of 0s
4645   STBI__F_avg_first
4646};
4647
4648static stbi_uc first_row_filter[5] =
4649{
4650   STBI__F_none,
4651   STBI__F_sub,
4652   STBI__F_none,
4653   STBI__F_avg_first,
4654   STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub
4655};
4656
4657static int stbi__paeth(int a, int b, int c)
4658{
4659   // This formulation looks very different from the reference in the PNG spec, but is
4660   // actually equivalent and has favorable data dependencies and admits straightforward
4661   // generation of branch-free code, which helps performance significantly.
4662   int thresh = c*3 - (a + b);
4663   int lo = a < b ? a : b;
4664   int hi = a < b ? b : a;
4665   int t0 = (hi <= thresh) ? lo : c;
4666   int t1 = (thresh <= lo) ? hi : t0;
4667   return t1;
4668}
4669
4670static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4671
4672// adds an extra all-255 alpha channel
4673// dest == src is legal
4674// img_n must be 1 or 3
4675static void stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x, int img_n)
4676{
4677   int i;
4678   // must process data backwards since we allow dest==src
4679   if (img_n == 1) {
4680      for (i=x-1; i >= 0; --i) {
4681         dest[i*2+1] = 255;
4682         dest[i*2+0] = src[i];
4683      }
4684   } else {
4685      STBI_ASSERT(img_n == 3);
4686      for (i=x-1; i >= 0; --i) {
4687         dest[i*4+3] = 255;
4688         dest[i*4+2] = src[i*3+2];
4689         dest[i*4+1] = src[i*3+1];
4690         dest[i*4+0] = src[i*3+0];
4691      }
4692   }
4693}
4694
4695// create the png data from post-deflated data
4696static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4697{
4698   int bytes = (depth == 16 ? 2 : 1);
4699   stbi__context *s = a->s;
4700   stbi__uint32 i,j,stride = x*out_n*bytes;
4701   stbi__uint32 img_len, img_width_bytes;
4702   stbi_uc *filter_buf;
4703   int all_ok = 1;
4704   int k;
4705   int img_n = s->img_n; // copy it into a local for later
4706
4707   int output_bytes = out_n*bytes;
4708   int filter_bytes = img_n*bytes;
4709   int width = x;
4710
4711   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4712   a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4713   if (!a->out) return stbi__err("outofmem", "Out of memory");
4714
4715   // note: error exits here don't need to clean up a->out individually,
4716   // stbi__do_png always does on error.
4717   if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
4718   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4719   if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) return stbi__err("too large", "Corrupt PNG");
4720   img_len = (img_width_bytes + 1) * y;
4721
4722   // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
4723   // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
4724   // so just check for raw_len < img_len always.
4725   if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4726
4727   // Allocate two scan lines worth of filter workspace buffer.
4728   filter_buf = (stbi_uc *) stbi__malloc_mad2(img_width_bytes, 2, 0);
4729   if (!filter_buf) return stbi__err("outofmem", "Out of memory");
4730
4731   // Filtering for low-bit-depth images
4732   if (depth < 8) {
4733      filter_bytes = 1;
4734      width = img_width_bytes;
4735   }
4736
4737   for (j=0; j < y; ++j) {
4738      // cur/prior filter buffers alternate
4739      stbi_uc *cur = filter_buf + (j & 1)*img_width_bytes;
4740      stbi_uc *prior = filter_buf + (~j & 1)*img_width_bytes;
4741      stbi_uc *dest = a->out + stride*j;
4742      int nk = width * filter_bytes;
4743      int filter = *raw++;
4744
4745      // check filter type
4746      if (filter > 4) {
4747         all_ok = stbi__err("invalid filter","Corrupt PNG");
4748         break;
4749      }
4750
4751      // if first row, use special filter that doesn't sample previous row
4752      if (j == 0) filter = first_row_filter[filter];
4753
4754      // perform actual filtering
4755      switch (filter) {
4756      case STBI__F_none:
4757         memcpy(cur, raw, nk);
4758         break;
4759      case STBI__F_sub:
4760         memcpy(cur, raw, filter_bytes);
4761         for (k = filter_bytes; k < nk; ++k)
4762            cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]);
4763         break;
4764      case STBI__F_up:
4765         for (k = 0; k < nk; ++k)
4766            cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4767         break;
4768      case STBI__F_avg:
4769         for (k = 0; k < filter_bytes; ++k)
4770            cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1));
4771         for (k = filter_bytes; k < nk; ++k)
4772            cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1));
4773         break;
4774      case STBI__F_paeth:
4775         for (k = 0; k < filter_bytes; ++k)
4776            cur[k] = STBI__BYTECAST(raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0)
4777         for (k = filter_bytes; k < nk; ++k)
4778            cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes], prior[k], prior[k-filter_bytes]));
4779         break;
4780      case STBI__F_avg_first:
4781         memcpy(cur, raw, filter_bytes);
4782         for (k = filter_bytes; k < nk; ++k)
4783            cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1));
4784         break;
4785      }
4786
4787      raw += nk;
4788
4789      // expand decoded bits in cur to dest, also adding an extra alpha channel if desired
4790      if (depth < 8) {
4791         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4792         stbi_uc *in = cur;
4793         stbi_uc *out = dest;
4794         stbi_uc inb = 0;
4795         stbi__uint32 nsmp = x*img_n;
4796
4797         // expand bits to bytes first
4798         if (depth == 4) {
4799            for (i=0; i < nsmp; ++i) {
4800               if ((i & 1) == 0) inb = *in++;
4801               *out++ = scale * (inb >> 4);
4802               inb <<= 4;
4803            }
4804         } else if (depth == 2) {
4805            for (i=0; i < nsmp; ++i) {
4806               if ((i & 3) == 0) inb = *in++;
4807               *out++ = scale * (inb >> 6);
4808               inb <<= 2;
4809            }
4810         } else {
4811            STBI_ASSERT(depth == 1);
4812            for (i=0; i < nsmp; ++i) {
4813               if ((i & 7) == 0) inb = *in++;
4814               *out++ = scale * (inb >> 7);
4815               inb <<= 1;
4816            }
4817         }
4818
4819         // insert alpha=255 values if desired
4820         if (img_n != out_n)
4821            stbi__create_png_alpha_expand8(dest, dest, x, img_n);
4822      } else if (depth == 8) {
4823         if (img_n == out_n)
4824            memcpy(dest, cur, x*img_n);
4825         else
4826            stbi__create_png_alpha_expand8(dest, cur, x, img_n);
4827      } else if (depth == 16) {
4828         // convert the image data from big-endian to platform-native
4829         stbi__uint16 *dest16 = (stbi__uint16*)dest;
4830         stbi__uint32 nsmp = x*img_n;
4831
4832         if (img_n == out_n) {
4833            for (i = 0; i < nsmp; ++i, ++dest16, cur += 2)
4834               *dest16 = (cur[0] << 8) | cur[1];
4835         } else {
4836            STBI_ASSERT(img_n+1 == out_n);
4837            if (img_n == 1) {
4838               for (i = 0; i < x; ++i, dest16 += 2, cur += 2) {
4839                  dest16[0] = (cur[0] << 8) | cur[1];
4840                  dest16[1] = 0xffff;
4841               }
4842            } else {
4843               STBI_ASSERT(img_n == 3);
4844               for (i = 0; i < x; ++i, dest16 += 4, cur += 6) {
4845                  dest16[0] = (cur[0] << 8) | cur[1];
4846                  dest16[1] = (cur[2] << 8) | cur[3];
4847                  dest16[2] = (cur[4] << 8) | cur[5];
4848                  dest16[3] = 0xffff;
4849               }
4850            }
4851         }
4852      }
4853   }
4854
4855   STBI_FREE(filter_buf);
4856   if (!all_ok) return 0;
4857
4858   return 1;
4859}
4860
4861static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4862{
4863   int bytes = (depth == 16 ? 2 : 1);
4864   int out_bytes = out_n * bytes;
4865   stbi_uc *final;
4866   int p;
4867   if (!interlaced)
4868      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4869
4870   // de-interlacing
4871   final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4872   if (!final) return stbi__err("outofmem", "Out of memory");
4873   for (p=0; p < 7; ++p) {
4874      int xorig[] = { 0,4,0,2,0,1,0 };
4875      int yorig[] = { 0,0,4,0,2,0,1 };
4876      int xspc[]  = { 8,8,4,4,2,2,1 };
4877      int yspc[]  = { 8,8,8,4,4,2,2 };
4878      int i,j,x,y;
4879      // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4880      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4881      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4882      if (x && y) {
4883         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4884         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4885            STBI_FREE(final);
4886            return 0;
4887         }
4888         for (j=0; j < y; ++j) {
4889            for (i=0; i < x; ++i) {
4890               int out_y = j*yspc[p]+yorig[p];
4891               int out_x = i*xspc[p]+xorig[p];
4892               memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4893                      a->out + (j*x+i)*out_bytes, out_bytes);
4894            }
4895         }
4896         STBI_FREE(a->out);
4897         image_data += img_len;
4898         image_data_len -= img_len;
4899      }
4900   }
4901   a->out = final;
4902
4903   return 1;
4904}
4905
4906static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4907{
4908   stbi__context *s = z->s;
4909   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4910   stbi_uc *p = z->out;
4911
4912   // compute color-based transparency, assuming we've
4913   // already got 255 as the alpha value in the output
4914   STBI_ASSERT(out_n == 2 || out_n == 4);
4915
4916   if (out_n == 2) {
4917      for (i=0; i < pixel_count; ++i) {
4918         p[1] = (p[0] == tc[0] ? 0 : 255);
4919         p += 2;
4920      }
4921   } else {
4922      for (i=0; i < pixel_count; ++i) {
4923         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4924            p[3] = 0;
4925         p += 4;
4926      }
4927   }
4928   return 1;
4929}
4930
4931static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4932{
4933   stbi__context *s = z->s;
4934   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4935   stbi__uint16 *p = (stbi__uint16*) z->out;
4936
4937   // compute color-based transparency, assuming we've
4938   // already got 65535 as the alpha value in the output
4939   STBI_ASSERT(out_n == 2 || out_n == 4);
4940
4941   if (out_n == 2) {
4942      for (i = 0; i < pixel_count; ++i) {
4943         p[1] = (p[0] == tc[0] ? 0 : 65535);
4944         p += 2;
4945      }
4946   } else {
4947      for (i = 0; i < pixel_count; ++i) {
4948         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4949            p[3] = 0;
4950         p += 4;
4951      }
4952   }
4953   return 1;
4954}
4955
4956static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4957{
4958   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4959   stbi_uc *p, *temp_out, *orig = a->out;
4960
4961   p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4962   if (p == NULL) return stbi__err("outofmem", "Out of memory");
4963
4964   // between here and free(out) below, exitting would leak
4965   temp_out = p;
4966
4967   if (pal_img_n == 3) {
4968      for (i=0; i < pixel_count; ++i) {
4969         int n = orig[i]*4;
4970         p[0] = palette[n  ];
4971         p[1] = palette[n+1];
4972         p[2] = palette[n+2];
4973         p += 3;
4974      }
4975   } else {
4976      for (i=0; i < pixel_count; ++i) {
4977         int n = orig[i]*4;
4978         p[0] = palette[n  ];
4979         p[1] = palette[n+1];
4980         p[2] = palette[n+2];
4981         p[3] = palette[n+3];
4982         p += 4;
4983      }
4984   }
4985   STBI_FREE(a->out);
4986   a->out = temp_out;
4987
4988   STBI_NOTUSED(len);
4989
4990   return 1;
4991}
4992
4993static int stbi__unpremultiply_on_load_global = 0;
4994static int stbi__de_iphone_flag_global = 0;
4995
4996STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4997{
4998   stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
4999}
5000
5001STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
5002{
5003   stbi__de_iphone_flag_global = flag_true_if_should_convert;
5004}
5005
5006#ifndef STBI_THREAD_LOCAL
5007#define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
5008#define stbi__de_iphone_flag  stbi__de_iphone_flag_global
5009#else
5010static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
5011static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
5012
5013STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
5014{
5015   stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
5016   stbi__unpremultiply_on_load_set = 1;
5017}
5018
5019STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
5020{
5021   stbi__de_iphone_flag_local = flag_true_if_should_convert;
5022   stbi__de_iphone_flag_set = 1;
5023}
5024
5025#define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
5026                                       ? stbi__unpremultiply_on_load_local      \
5027                                       : stbi__unpremultiply_on_load_global)
5028#define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
5029                                ? stbi__de_iphone_flag_local                    \
5030                                : stbi__de_iphone_flag_global)
5031#endif // STBI_THREAD_LOCAL
5032
5033static void stbi__de_iphone(stbi__png *z)
5034{
5035   stbi__context *s = z->s;
5036   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
5037   stbi_uc *p = z->out;
5038
5039   if (s->img_out_n == 3) {  // convert bgr to rgb
5040      for (i=0; i < pixel_count; ++i) {
5041         stbi_uc t = p[0];
5042         p[0] = p[2];
5043         p[2] = t;
5044         p += 3;
5045      }
5046   } else {
5047      STBI_ASSERT(s->img_out_n == 4);
5048      if (stbi__unpremultiply_on_load) {
5049         // convert bgr to rgb and unpremultiply
5050         for (i=0; i < pixel_count; ++i) {
5051            stbi_uc a = p[3];
5052            stbi_uc t = p[0];
5053            if (a) {
5054               stbi_uc half = a / 2;
5055               p[0] = (p[2] * 255 + half) / a;
5056               p[1] = (p[1] * 255 + half) / a;
5057               p[2] = ( t   * 255 + half) / a;
5058            } else {
5059               p[0] = p[2];
5060               p[2] = t;
5061            }
5062            p += 4;
5063         }
5064      } else {
5065         // convert bgr to rgb
5066         for (i=0; i < pixel_count; ++i) {
5067            stbi_uc t = p[0];
5068            p[0] = p[2];
5069            p[2] = t;
5070            p += 4;
5071         }
5072      }
5073   }
5074}
5075
5076#define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
5077
5078static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
5079{
5080   stbi_uc palette[1024], pal_img_n=0;
5081   stbi_uc has_trans=0, tc[3]={0};
5082   stbi__uint16 tc16[3];
5083   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
5084   int first=1,k,interlace=0, color=0, is_iphone=0;
5085   stbi__context *s = z->s;
5086
5087   z->expanded = NULL;
5088   z->idata = NULL;
5089   z->out = NULL;
5090
5091   if (!stbi__check_png_header(s)) return 0;
5092
5093   if (scan == STBI__SCAN_type) return 1;
5094
5095   for (;;) {
5096      stbi__pngchunk c = stbi__get_chunk_header(s);
5097      switch (c.type) {
5098         case STBI__PNG_TYPE('C','g','B','I'):
5099            is_iphone = 1;
5100            stbi__skip(s, c.length);
5101            break;
5102         case STBI__PNG_TYPE('I','H','D','R'): {
5103            int comp,filter;
5104            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
5105            first = 0;
5106            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
5107            s->img_x = stbi__get32be(s);
5108            s->img_y = stbi__get32be(s);
5109            if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5110            if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5111            z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
5112            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
5113            if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
5114            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
5115            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
5116            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
5117            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
5118            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
5119            if (!pal_img_n) {
5120               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
5121               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
5122            } else {
5123               // if paletted, then pal_n is our final components, and
5124               // img_n is # components to decompress/filter.
5125               s->img_n = 1;
5126               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
5127            }
5128            // even with SCAN_header, have to scan to see if we have a tRNS
5129            break;
5130         }
5131
5132         case STBI__PNG_TYPE('P','L','T','E'):  {
5133            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5134            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
5135            pal_len = c.length / 3;
5136            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
5137            for (i=0; i < pal_len; ++i) {
5138               palette[i*4+0] = stbi__get8(s);
5139               palette[i*4+1] = stbi__get8(s);
5140               palette[i*4+2] = stbi__get8(s);
5141               palette[i*4+3] = 255;
5142            }
5143            break;
5144         }
5145
5146         case STBI__PNG_TYPE('t','R','N','S'): {
5147            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5148            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
5149            if (pal_img_n) {
5150               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
5151               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
5152               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
5153               pal_img_n = 4;
5154               for (i=0; i < c.length; ++i)
5155                  palette[i*4+3] = stbi__get8(s);
5156            } else {
5157               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
5158               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
5159               has_trans = 1;
5160               // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
5161               if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
5162               if (z->depth == 16) {
5163                  for (k = 0; k < s->img_n && k < 3; ++k) // extra loop test to suppress false GCC warning
5164                     tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
5165               } else {
5166                  for (k = 0; k < s->img_n && k < 3; ++k)
5167                     tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
5168               }
5169            }
5170            break;
5171         }
5172
5173         case STBI__PNG_TYPE('I','D','A','T'): {
5174            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5175            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
5176            if (scan == STBI__SCAN_header) {
5177               // header scan definitely stops at first IDAT
5178               if (pal_img_n)
5179                  s->img_n = pal_img_n;
5180               return 1;
5181            }
5182            if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
5183            if ((int)(ioff + c.length) < (int)ioff) return 0;
5184            if (ioff + c.length > idata_limit) {
5185               stbi__uint32 idata_limit_old = idata_limit;
5186               stbi_uc *p;
5187               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
5188               while (ioff + c.length > idata_limit)
5189                  idata_limit *= 2;
5190               STBI_NOTUSED(idata_limit_old);
5191               p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
5192               z->idata = p;
5193            }
5194            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
5195            ioff += c.length;
5196            break;
5197         }
5198
5199         case STBI__PNG_TYPE('I','E','N','D'): {
5200            stbi__uint32 raw_len, bpl;
5201            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5202            if (scan != STBI__SCAN_load) return 1;
5203            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
5204            // initial guess for decoded data size to avoid unnecessary reallocs
5205            bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5206            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5207            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
5208            if (z->expanded == NULL) return 0; // zlib should set error
5209            STBI_FREE(z->idata); z->idata = NULL;
5210            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
5211               s->img_out_n = s->img_n+1;
5212            else
5213               s->img_out_n = s->img_n;
5214            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
5215            if (has_trans) {
5216               if (z->depth == 16) {
5217                  if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
5218               } else {
5219                  if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
5220               }
5221            }
5222            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5223               stbi__de_iphone(z);
5224            if (pal_img_n) {
5225               // pal_img_n == 3 or 4
5226               s->img_n = pal_img_n; // record the actual colors we had
5227               s->img_out_n = pal_img_n;
5228               if (req_comp >= 3) s->img_out_n = req_comp;
5229               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5230                  return 0;
5231            } else if (has_trans) {
5232               // non-paletted image with tRNS -> source image has (constant) alpha
5233               ++s->img_n;
5234            }
5235            STBI_FREE(z->expanded); z->expanded = NULL;
5236            // end of PNG chunk, read and skip CRC
5237            stbi__get32be(s);
5238            return 1;
5239         }
5240
5241         default:
5242            // if critical, fail
5243            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5244            if ((c.type & (1 << 29)) == 0) {
5245               #ifndef STBI_NO_FAILURE_STRINGS
5246               // not threadsafe
5247               static char invalid_chunk[] = "XXXX PNG chunk not known";
5248               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5249               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5250               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
5251               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
5252               #endif
5253               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5254            }
5255            stbi__skip(s, c.length);
5256            break;
5257      }
5258      // end of PNG chunk, read and skip CRC
5259      stbi__get32be(s);
5260   }
5261}
5262
5263static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
5264{
5265   void *result=NULL;
5266   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
5267   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5268      if (p->depth <= 8)
5269         ri->bits_per_channel = 8;
5270      else if (p->depth == 16)
5271         ri->bits_per_channel = 16;
5272      else
5273         return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
5274      result = p->out;
5275      p->out = NULL;
5276      if (req_comp && req_comp != p->s->img_out_n) {
5277         if (ri->bits_per_channel == 8)
5278            result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5279         else
5280            result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5281         p->s->img_out_n = req_comp;
5282         if (result == NULL) return result;
5283      }
5284      *x = p->s->img_x;
5285      *y = p->s->img_y;
5286      if (n) *n = p->s->img_n;
5287   }
5288   STBI_FREE(p->out);      p->out      = NULL;
5289   STBI_FREE(p->expanded); p->expanded = NULL;
5290   STBI_FREE(p->idata);    p->idata    = NULL;
5291
5292   return result;
5293}
5294
5295static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5296{
5297   stbi__png p;
5298   p.s = s;
5299   return stbi__do_png(&p, x,y,comp,req_comp, ri);
5300}
5301
5302static int stbi__png_test(stbi__context *s)
5303{
5304   int r;
5305   r = stbi__check_png_header(s);
5306   stbi__rewind(s);
5307   return r;
5308}
5309
5310static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
5311{
5312   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5313      stbi__rewind( p->s );
5314      return 0;
5315   }
5316   if (x) *x = p->s->img_x;
5317   if (y) *y = p->s->img_y;
5318   if (comp) *comp = p->s->img_n;
5319   return 1;
5320}
5321
5322static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
5323{
5324   stbi__png p;
5325   p.s = s;
5326   return stbi__png_info_raw(&p, x, y, comp);
5327}
5328
5329static int stbi__png_is16(stbi__context *s)
5330{
5331   stbi__png p;
5332   p.s = s;
5333   if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
5334	   return 0;
5335   if (p.depth != 16) {
5336      stbi__rewind(p.s);
5337      return 0;
5338   }
5339   return 1;
5340}
5341#endif
5342
5343// Microsoft/Windows BMP image
5344
5345#ifndef STBI_NO_BMP
5346static int stbi__bmp_test_raw(stbi__context *s)
5347{
5348   int r;
5349   int sz;
5350   if (stbi__get8(s) != 'B') return 0;
5351   if (stbi__get8(s) != 'M') return 0;
5352   stbi__get32le(s); // discard filesize
5353   stbi__get16le(s); // discard reserved
5354   stbi__get16le(s); // discard reserved
5355   stbi__get32le(s); // discard data offset
5356   sz = stbi__get32le(s);
5357   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5358   return r;
5359}
5360
5361static int stbi__bmp_test(stbi__context *s)
5362{
5363   int r = stbi__bmp_test_raw(s);
5364   stbi__rewind(s);
5365   return r;
5366}
5367
5368
5369// returns 0..31 for the highest set bit
5370static int stbi__high_bit(unsigned int z)
5371{
5372   int n=0;
5373   if (z == 0) return -1;
5374   if (z >= 0x10000) { n += 16; z >>= 16; }
5375   if (z >= 0x00100) { n +=  8; z >>=  8; }
5376   if (z >= 0x00010) { n +=  4; z >>=  4; }
5377   if (z >= 0x00004) { n +=  2; z >>=  2; }
5378   if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
5379   return n;
5380}
5381
5382static int stbi__bitcount(unsigned int a)
5383{
5384   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
5385   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
5386   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5387   a = (a + (a >> 8)); // max 16 per 8 bits
5388   a = (a + (a >> 16)); // max 32 per 8 bits
5389   return a & 0xff;
5390}
5391
5392// extract an arbitrarily-aligned N-bit value (N=bits)
5393// from v, and then make it 8-bits long and fractionally
5394// extend it to full full range.
5395static int stbi__shiftsigned(unsigned int v, int shift, int bits)
5396{
5397   static unsigned int mul_table[9] = {
5398      0,
5399      0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
5400      0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
5401   };
5402   static unsigned int shift_table[9] = {
5403      0, 0,0,1,0,2,4,6,0,
5404   };
5405   if (shift < 0)
5406      v <<= -shift;
5407   else
5408      v >>= shift;
5409   STBI_ASSERT(v < 256);
5410   v >>= (8-bits);
5411   STBI_ASSERT(bits >= 0 && bits <= 8);
5412   return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
5413}
5414
5415typedef struct
5416{
5417   int bpp, offset, hsz;
5418   unsigned int mr,mg,mb,ma, all_a;
5419   int extra_read;
5420} stbi__bmp_data;
5421
5422static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
5423{
5424   // BI_BITFIELDS specifies masks explicitly, don't override
5425   if (compress == 3)
5426      return 1;
5427
5428   if (compress == 0) {
5429      if (info->bpp == 16) {
5430         info->mr = 31u << 10;
5431         info->mg = 31u <<  5;
5432         info->mb = 31u <<  0;
5433      } else if (info->bpp == 32) {
5434         info->mr = 0xffu << 16;
5435         info->mg = 0xffu <<  8;
5436         info->mb = 0xffu <<  0;
5437         info->ma = 0xffu << 24;
5438         info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5439      } else {
5440         // otherwise, use defaults, which is all-0
5441         info->mr = info->mg = info->mb = info->ma = 0;
5442      }
5443      return 1;
5444   }
5445   return 0; // error
5446}
5447
5448static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5449{
5450   int hsz;
5451   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5452   stbi__get32le(s); // discard filesize
5453   stbi__get16le(s); // discard reserved
5454   stbi__get16le(s); // discard reserved
5455   info->offset = stbi__get32le(s);
5456   info->hsz = hsz = stbi__get32le(s);
5457   info->mr = info->mg = info->mb = info->ma = 0;
5458   info->extra_read = 14;
5459
5460   if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
5461
5462   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5463   if (hsz == 12) {
5464      s->img_x = stbi__get16le(s);
5465      s->img_y = stbi__get16le(s);
5466   } else {
5467      s->img_x = stbi__get32le(s);
5468      s->img_y = stbi__get32le(s);
5469   }
5470   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5471   info->bpp = stbi__get16le(s);
5472   if (hsz != 12) {
5473      int compress = stbi__get32le(s);
5474      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5475      if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
5476      if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
5477      stbi__get32le(s); // discard sizeof
5478      stbi__get32le(s); // discard hres
5479      stbi__get32le(s); // discard vres
5480      stbi__get32le(s); // discard colorsused
5481      stbi__get32le(s); // discard max important
5482      if (hsz == 40 || hsz == 56) {
5483         if (hsz == 56) {
5484            stbi__get32le(s);
5485            stbi__get32le(s);
5486            stbi__get32le(s);
5487            stbi__get32le(s);
5488         }
5489         if (info->bpp == 16 || info->bpp == 32) {
5490            if (compress == 0) {
5491               stbi__bmp_set_mask_defaults(info, compress);
5492            } else if (compress == 3) {
5493               info->mr = stbi__get32le(s);
5494               info->mg = stbi__get32le(s);
5495               info->mb = stbi__get32le(s);
5496               info->extra_read += 12;
5497               // not documented, but generated by photoshop and handled by mspaint
5498               if (info->mr == info->mg && info->mg == info->mb) {
5499                  // ?!?!?
5500                  return stbi__errpuc("bad BMP", "bad BMP");
5501               }
5502            } else
5503               return stbi__errpuc("bad BMP", "bad BMP");
5504         }
5505      } else {
5506         // V4/V5 header
5507         int i;
5508         if (hsz != 108 && hsz != 124)
5509            return stbi__errpuc("bad BMP", "bad BMP");
5510         info->mr = stbi__get32le(s);
5511         info->mg = stbi__get32le(s);
5512         info->mb = stbi__get32le(s);
5513         info->ma = stbi__get32le(s);
5514         if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
5515            stbi__bmp_set_mask_defaults(info, compress);
5516         stbi__get32le(s); // discard color space
5517         for (i=0; i < 12; ++i)
5518            stbi__get32le(s); // discard color space parameters
5519         if (hsz == 124) {
5520            stbi__get32le(s); // discard rendering intent
5521            stbi__get32le(s); // discard offset of profile data
5522            stbi__get32le(s); // discard size of profile data
5523            stbi__get32le(s); // discard reserved
5524         }
5525      }
5526   }
5527   return (void *) 1;
5528}
5529
5530
5531static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5532{
5533   stbi_uc *out;
5534   unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5535   stbi_uc pal[256][4];
5536   int psize=0,i,j,width;
5537   int flip_vertically, pad, target;
5538   stbi__bmp_data info;
5539   STBI_NOTUSED(ri);
5540
5541   info.all_a = 255;
5542   if (stbi__bmp_parse_header(s, &info) == NULL)
5543      return NULL; // error code already set
5544
5545   flip_vertically = ((int) s->img_y) > 0;
5546   s->img_y = abs((int) s->img_y);
5547
5548   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5549   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5550
5551   mr = info.mr;
5552   mg = info.mg;
5553   mb = info.mb;
5554   ma = info.ma;
5555   all_a = info.all_a;
5556
5557   if (info.hsz == 12) {
5558      if (info.bpp < 24)
5559         psize = (info.offset - info.extra_read - 24) / 3;
5560   } else {
5561      if (info.bpp < 16)
5562         psize = (info.offset - info.extra_read - info.hsz) >> 2;
5563   }
5564   if (psize == 0) {
5565      // accept some number of extra bytes after the header, but if the offset points either to before
5566      // the header ends or implies a large amount of extra data, reject the file as malformed
5567      int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
5568      int header_limit = 1024; // max we actually read is below 256 bytes currently.
5569      int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
5570      if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
5571         return stbi__errpuc("bad header", "Corrupt BMP");
5572      }
5573      // we established that bytes_read_so_far is positive and sensible.
5574      // the first half of this test rejects offsets that are either too small positives, or
5575      // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
5576      // ensures the number computed in the second half of the test can't overflow.
5577      if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
5578         return stbi__errpuc("bad offset", "Corrupt BMP");
5579      } else {
5580         stbi__skip(s, info.offset - bytes_read_so_far);
5581      }
5582   }
5583
5584   if (info.bpp == 24 && ma == 0xff000000)
5585      s->img_n = 3;
5586   else
5587      s->img_n = ma ? 4 : 3;
5588   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5589      target = req_comp;
5590   else
5591      target = s->img_n; // if they want monochrome, we'll post-convert
5592
5593   // sanity-check size
5594   if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5595      return stbi__errpuc("too large", "Corrupt BMP");
5596
5597   out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5598   if (!out) return stbi__errpuc("outofmem", "Out of memory");
5599   if (info.bpp < 16) {
5600      int z=0;
5601      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5602      for (i=0; i < psize; ++i) {
5603         pal[i][2] = stbi__get8(s);
5604         pal[i][1] = stbi__get8(s);
5605         pal[i][0] = stbi__get8(s);
5606         if (info.hsz != 12) stbi__get8(s);
5607         pal[i][3] = 255;
5608      }
5609      stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5610      if (info.bpp == 1) width = (s->img_x + 7) >> 3;
5611      else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5612      else if (info.bpp == 8) width = s->img_x;
5613      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5614      pad = (-width)&3;
5615      if (info.bpp == 1) {
5616         for (j=0; j < (int) s->img_y; ++j) {
5617            int bit_offset = 7, v = stbi__get8(s);
5618            for (i=0; i < (int) s->img_x; ++i) {
5619               int color = (v>>bit_offset)&0x1;
5620               out[z++] = pal[color][0];
5621               out[z++] = pal[color][1];
5622               out[z++] = pal[color][2];
5623               if (target == 4) out[z++] = 255;
5624               if (i+1 == (int) s->img_x) break;
5625               if((--bit_offset) < 0) {
5626                  bit_offset = 7;
5627                  v = stbi__get8(s);
5628               }
5629            }
5630            stbi__skip(s, pad);
5631         }
5632      } else {
5633         for (j=0; j < (int) s->img_y; ++j) {
5634            for (i=0; i < (int) s->img_x; i += 2) {
5635               int v=stbi__get8(s),v2=0;
5636               if (info.bpp == 4) {
5637                  v2 = v & 15;
5638                  v >>= 4;
5639               }
5640               out[z++] = pal[v][0];
5641               out[z++] = pal[v][1];
5642               out[z++] = pal[v][2];
5643               if (target == 4) out[z++] = 255;
5644               if (i+1 == (int) s->img_x) break;
5645               v = (info.bpp == 8) ? stbi__get8(s) : v2;
5646               out[z++] = pal[v][0];
5647               out[z++] = pal[v][1];
5648               out[z++] = pal[v][2];
5649               if (target == 4) out[z++] = 255;
5650            }
5651            stbi__skip(s, pad);
5652         }
5653      }
5654   } else {
5655      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
5656      int z = 0;
5657      int easy=0;
5658      stbi__skip(s, info.offset - info.extra_read - info.hsz);
5659      if (info.bpp == 24) width = 3 * s->img_x;
5660      else if (info.bpp == 16) width = 2*s->img_x;
5661      else /* bpp = 32 and pad = 0 */ width=0;
5662      pad = (-width) & 3;
5663      if (info.bpp == 24) {
5664         easy = 1;
5665      } else if (info.bpp == 32) {
5666         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5667            easy = 2;
5668      }
5669      if (!easy) {
5670         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5671         // right shift amt to put high bit in position #7
5672         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
5673         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
5674         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
5675         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
5676         if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5677      }
5678      for (j=0; j < (int) s->img_y; ++j) {
5679         if (easy) {
5680            for (i=0; i < (int) s->img_x; ++i) {
5681               unsigned char a;
5682               out[z+2] = stbi__get8(s);
5683               out[z+1] = stbi__get8(s);
5684               out[z+0] = stbi__get8(s);
5685               z += 3;
5686               a = (easy == 2 ? stbi__get8(s) : 255);
5687               all_a |= a;
5688               if (target == 4) out[z++] = a;
5689            }
5690         } else {
5691            int bpp = info.bpp;
5692            for (i=0; i < (int) s->img_x; ++i) {
5693               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
5694               unsigned int a;
5695               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5696               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5697               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5698               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5699               all_a |= a;
5700               if (target == 4) out[z++] = STBI__BYTECAST(a);
5701            }
5702         }
5703         stbi__skip(s, pad);
5704      }
5705   }
5706
5707   // if alpha channel is all 0s, replace with all 255s
5708   if (target == 4 && all_a == 0)
5709      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
5710         out[i] = 255;
5711
5712   if (flip_vertically) {
5713      stbi_uc t;
5714      for (j=0; j < (int) s->img_y>>1; ++j) {
5715         stbi_uc *p1 = out +      j     *s->img_x*target;
5716         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
5717         for (i=0; i < (int) s->img_x*target; ++i) {
5718            t = p1[i]; p1[i] = p2[i]; p2[i] = t;
5719         }
5720      }
5721   }
5722
5723   if (req_comp && req_comp != target) {
5724      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5725      if (out == NULL) return out; // stbi__convert_format frees input on failure
5726   }
5727
5728   *x = s->img_x;
5729   *y = s->img_y;
5730   if (comp) *comp = s->img_n;
5731   return out;
5732}
5733#endif
5734
5735// Targa Truevision - TGA
5736// by Jonathan Dummer
5737#ifndef STBI_NO_TGA
5738// returns STBI_rgb or whatever, 0 on error
5739static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
5740{
5741   // only RGB or RGBA (incl. 16bit) or grey allowed
5742   if (is_rgb16) *is_rgb16 = 0;
5743   switch(bits_per_pixel) {
5744      case 8:  return STBI_grey;
5745      case 16: if(is_grey) return STBI_grey_alpha;
5746               // fallthrough
5747      case 15: if(is_rgb16) *is_rgb16 = 1;
5748               return STBI_rgb;
5749      case 24: // fallthrough
5750      case 32: return bits_per_pixel/8;
5751      default: return 0;
5752   }
5753}
5754
5755static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
5756{
5757    int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5758    int sz, tga_colormap_type;
5759    stbi__get8(s);                   // discard Offset
5760    tga_colormap_type = stbi__get8(s); // colormap type
5761    if( tga_colormap_type > 1 ) {
5762        stbi__rewind(s);
5763        return 0;      // only RGB or indexed allowed
5764    }
5765    tga_image_type = stbi__get8(s); // image type
5766    if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
5767        if (tga_image_type != 1 && tga_image_type != 9) {
5768            stbi__rewind(s);
5769            return 0;
5770        }
5771        stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5772        sz = stbi__get8(s);    //   check bits per palette color entry
5773        if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
5774            stbi__rewind(s);
5775            return 0;
5776        }
5777        stbi__skip(s,4);       // skip image x and y origin
5778        tga_colormap_bpp = sz;
5779    } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5780        if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
5781            stbi__rewind(s);
5782            return 0; // only RGB or grey allowed, +/- RLE
5783        }
5784        stbi__skip(s,9); // skip colormap specification and image x/y origin
5785        tga_colormap_bpp = 0;
5786    }
5787    tga_w = stbi__get16le(s);
5788    if( tga_w < 1 ) {
5789        stbi__rewind(s);
5790        return 0;   // test width
5791    }
5792    tga_h = stbi__get16le(s);
5793    if( tga_h < 1 ) {
5794        stbi__rewind(s);
5795        return 0;   // test height
5796    }
5797    tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5798    stbi__get8(s); // ignore alpha bits
5799    if (tga_colormap_bpp != 0) {
5800        if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5801            // when using a colormap, tga_bits_per_pixel is the size of the indexes
5802            // I don't think anything but 8 or 16bit indexes makes sense
5803            stbi__rewind(s);
5804            return 0;
5805        }
5806        tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5807    } else {
5808        tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5809    }
5810    if(!tga_comp) {
5811      stbi__rewind(s);
5812      return 0;
5813    }
5814    if (x) *x = tga_w;
5815    if (y) *y = tga_h;
5816    if (comp) *comp = tga_comp;
5817    return 1;                   // seems to have passed everything
5818}
5819
5820static int stbi__tga_test(stbi__context *s)
5821{
5822   int res = 0;
5823   int sz, tga_color_type;
5824   stbi__get8(s);      //   discard Offset
5825   tga_color_type = stbi__get8(s);   //   color type
5826   if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
5827   sz = stbi__get8(s);   //   image type
5828   if ( tga_color_type == 1 ) { // colormapped (paletted) image
5829      if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5830      stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5831      sz = stbi__get8(s);    //   check bits per palette color entry
5832      if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5833      stbi__skip(s,4);       // skip image x and y origin
5834   } else { // "normal" image w/o colormap
5835      if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
5836      stbi__skip(s,9); // skip colormap specification and image x/y origin
5837   }
5838   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
5839   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
5840   sz = stbi__get8(s);   //   bits per pixel
5841   if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
5842   if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5843
5844   res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5845
5846errorEnd:
5847   stbi__rewind(s);
5848   return res;
5849}
5850
5851// read 16bit value and convert to 24bit RGB
5852static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5853{
5854   stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
5855   stbi__uint16 fiveBitMask = 31;
5856   // we have 3 channels with 5bits each
5857   int r = (px >> 10) & fiveBitMask;
5858   int g = (px >> 5) & fiveBitMask;
5859   int b = px & fiveBitMask;
5860   // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5861   out[0] = (stbi_uc)((r * 255)/31);
5862   out[1] = (stbi_uc)((g * 255)/31);
5863   out[2] = (stbi_uc)((b * 255)/31);
5864
5865   // some people claim that the most significant bit might be used for alpha
5866   // (possibly if an alpha-bit is set in the "image descriptor byte")
5867   // but that only made 16bit test images completely translucent..
5868   // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5869}
5870
5871static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5872{
5873   //   read in the TGA header stuff
5874   int tga_offset = stbi__get8(s);
5875   int tga_indexed = stbi__get8(s);
5876   int tga_image_type = stbi__get8(s);
5877   int tga_is_RLE = 0;
5878   int tga_palette_start = stbi__get16le(s);
5879   int tga_palette_len = stbi__get16le(s);
5880   int tga_palette_bits = stbi__get8(s);
5881   int tga_x_origin = stbi__get16le(s);
5882   int tga_y_origin = stbi__get16le(s);
5883   int tga_width = stbi__get16le(s);
5884   int tga_height = stbi__get16le(s);
5885   int tga_bits_per_pixel = stbi__get8(s);
5886   int tga_comp, tga_rgb16=0;
5887   int tga_inverted = stbi__get8(s);
5888   // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5889   //   image data
5890   unsigned char *tga_data;
5891   unsigned char *tga_palette = NULL;
5892   int i, j;
5893   unsigned char raw_data[4] = {0};
5894   int RLE_count = 0;
5895   int RLE_repeating = 0;
5896   int read_next_pixel = 1;
5897   STBI_NOTUSED(ri);
5898   STBI_NOTUSED(tga_x_origin); // @TODO
5899   STBI_NOTUSED(tga_y_origin); // @TODO
5900
5901   if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5902   if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5903
5904   //   do a tiny bit of precessing
5905   if ( tga_image_type >= 8 )
5906   {
5907      tga_image_type -= 8;
5908      tga_is_RLE = 1;
5909   }
5910   tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5911
5912   //   If I'm paletted, then I'll use the number of bits from the palette
5913   if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5914   else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5915
5916   if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5917      return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5918
5919   //   tga info
5920   *x = tga_width;
5921   *y = tga_height;
5922   if (comp) *comp = tga_comp;
5923
5924   if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
5925      return stbi__errpuc("too large", "Corrupt TGA");
5926
5927   tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
5928   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5929
5930   // skip to the data's starting position (offset usually = 0)
5931   stbi__skip(s, tga_offset );
5932
5933   if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
5934      for (i=0; i < tga_height; ++i) {
5935         int row = tga_inverted ? tga_height -i - 1 : i;
5936         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5937         stbi__getn(s, tga_row, tga_width * tga_comp);
5938      }
5939   } else  {
5940      //   do I need to load a palette?
5941      if ( tga_indexed)
5942      {
5943         if (tga_palette_len == 0) {  /* you have to have at least one entry! */
5944            STBI_FREE(tga_data);
5945            return stbi__errpuc("bad palette", "Corrupt TGA");
5946         }
5947
5948         //   any data to skip? (offset usually = 0)
5949         stbi__skip(s, tga_palette_start );
5950         //   load the palette
5951         tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
5952         if (!tga_palette) {
5953            STBI_FREE(tga_data);
5954            return stbi__errpuc("outofmem", "Out of memory");
5955         }
5956         if (tga_rgb16) {
5957            stbi_uc *pal_entry = tga_palette;
5958            STBI_ASSERT(tga_comp == STBI_rgb);
5959            for (i=0; i < tga_palette_len; ++i) {
5960               stbi__tga_read_rgb16(s, pal_entry);
5961               pal_entry += tga_comp;
5962            }
5963         } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5964               STBI_FREE(tga_data);
5965               STBI_FREE(tga_palette);
5966               return stbi__errpuc("bad palette", "Corrupt TGA");
5967         }
5968      }
5969      //   load the data
5970      for (i=0; i < tga_width * tga_height; ++i)
5971      {
5972         //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5973         if ( tga_is_RLE )
5974         {
5975            if ( RLE_count == 0 )
5976            {
5977               //   yep, get the next byte as a RLE command
5978               int RLE_cmd = stbi__get8(s);
5979               RLE_count = 1 + (RLE_cmd & 127);
5980               RLE_repeating = RLE_cmd >> 7;
5981               read_next_pixel = 1;
5982            } else if ( !RLE_repeating )
5983            {
5984               read_next_pixel = 1;
5985            }
5986         } else
5987         {
5988            read_next_pixel = 1;
5989         }
5990         //   OK, if I need to read a pixel, do it now
5991         if ( read_next_pixel )
5992         {
5993            //   load however much data we did have
5994            if ( tga_indexed )
5995            {
5996               // read in index, then perform the lookup
5997               int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5998               if ( pal_idx >= tga_palette_len ) {
5999                  // invalid index
6000                  pal_idx = 0;
6001               }
6002               pal_idx *= tga_comp;
6003               for (j = 0; j < tga_comp; ++j) {
6004                  raw_data[j] = tga_palette[pal_idx+j];
6005               }
6006            } else if(tga_rgb16) {
6007               STBI_ASSERT(tga_comp == STBI_rgb);
6008               stbi__tga_read_rgb16(s, raw_data);
6009            } else {
6010               //   read in the data raw
6011               for (j = 0; j < tga_comp; ++j) {
6012                  raw_data[j] = stbi__get8(s);
6013               }
6014            }
6015            //   clear the reading flag for the next pixel
6016            read_next_pixel = 0;
6017         } // end of reading a pixel
6018
6019         // copy data
6020         for (j = 0; j < tga_comp; ++j)
6021           tga_data[i*tga_comp+j] = raw_data[j];
6022
6023         //   in case we're in RLE mode, keep counting down
6024         --RLE_count;
6025      }
6026      //   do I need to invert the image?
6027      if ( tga_inverted )
6028      {
6029         for (j = 0; j*2 < tga_height; ++j)
6030         {
6031            int index1 = j * tga_width * tga_comp;
6032            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
6033            for (i = tga_width * tga_comp; i > 0; --i)
6034            {
6035               unsigned char temp = tga_data[index1];
6036               tga_data[index1] = tga_data[index2];
6037               tga_data[index2] = temp;
6038               ++index1;
6039               ++index2;
6040            }
6041         }
6042      }
6043      //   clear my palette, if I had one
6044      if ( tga_palette != NULL )
6045      {
6046         STBI_FREE( tga_palette );
6047      }
6048   }
6049
6050   // swap RGB - if the source data was RGB16, it already is in the right order
6051   if (tga_comp >= 3 && !tga_rgb16)
6052   {
6053      unsigned char* tga_pixel = tga_data;
6054      for (i=0; i < tga_width * tga_height; ++i)
6055      {
6056         unsigned char temp = tga_pixel[0];
6057         tga_pixel[0] = tga_pixel[2];
6058         tga_pixel[2] = temp;
6059         tga_pixel += tga_comp;
6060      }
6061   }
6062
6063   // convert to target component count
6064   if (req_comp && req_comp != tga_comp)
6065      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
6066
6067   //   the things I do to get rid of an error message, and yet keep
6068   //   Microsoft's C compilers happy... [8^(
6069   tga_palette_start = tga_palette_len = tga_palette_bits =
6070         tga_x_origin = tga_y_origin = 0;
6071   STBI_NOTUSED(tga_palette_start);
6072   //   OK, done
6073   return tga_data;
6074}
6075#endif
6076
6077// *************************************************************************************************
6078// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
6079
6080#ifndef STBI_NO_PSD
6081static int stbi__psd_test(stbi__context *s)
6082{
6083   int r = (stbi__get32be(s) == 0x38425053);
6084   stbi__rewind(s);
6085   return r;
6086}
6087
6088static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
6089{
6090   int count, nleft, len;
6091
6092   count = 0;
6093   while ((nleft = pixelCount - count) > 0) {
6094      len = stbi__get8(s);
6095      if (len == 128) {
6096         // No-op.
6097      } else if (len < 128) {
6098         // Copy next len+1 bytes literally.
6099         len++;
6100         if (len > nleft) return 0; // corrupt data
6101         count += len;
6102         while (len) {
6103            *p = stbi__get8(s);
6104            p += 4;
6105            len--;
6106         }
6107      } else if (len > 128) {
6108         stbi_uc   val;
6109         // Next -len+1 bytes in the dest are replicated from next source byte.
6110         // (Interpret len as a negative 8-bit int.)
6111         len = 257 - len;
6112         if (len > nleft) return 0; // corrupt data
6113         val = stbi__get8(s);
6114         count += len;
6115         while (len) {
6116            *p = val;
6117            p += 4;
6118            len--;
6119         }
6120      }
6121   }
6122
6123   return 1;
6124}
6125
6126static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
6127{
6128   int pixelCount;
6129   int channelCount, compression;
6130   int channel, i;
6131   int bitdepth;
6132   int w,h;
6133   stbi_uc *out;
6134   STBI_NOTUSED(ri);
6135
6136   // Check identifier
6137   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
6138      return stbi__errpuc("not PSD", "Corrupt PSD image");
6139
6140   // Check file type version.
6141   if (stbi__get16be(s) != 1)
6142      return stbi__errpuc("wrong version", "Unsupported version of PSD image");
6143
6144   // Skip 6 reserved bytes.
6145   stbi__skip(s, 6 );
6146
6147   // Read the number of channels (R, G, B, A, etc).
6148   channelCount = stbi__get16be(s);
6149   if (channelCount < 0 || channelCount > 16)
6150      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
6151
6152   // Read the rows and columns of the image.
6153   h = stbi__get32be(s);
6154   w = stbi__get32be(s);
6155
6156   if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6157   if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6158
6159   // Make sure the depth is 8 bits.
6160   bitdepth = stbi__get16be(s);
6161   if (bitdepth != 8 && bitdepth != 16)
6162      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
6163
6164   // Make sure the color mode is RGB.
6165   // Valid options are:
6166   //   0: Bitmap
6167   //   1: Grayscale
6168   //   2: Indexed color
6169   //   3: RGB color
6170   //   4: CMYK color
6171   //   7: Multichannel
6172   //   8: Duotone
6173   //   9: Lab color
6174   if (stbi__get16be(s) != 3)
6175      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
6176
6177   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
6178   stbi__skip(s,stbi__get32be(s) );
6179
6180   // Skip the image resources.  (resolution, pen tool paths, etc)
6181   stbi__skip(s, stbi__get32be(s) );
6182
6183   // Skip the reserved data.
6184   stbi__skip(s, stbi__get32be(s) );
6185
6186   // Find out if the data is compressed.
6187   // Known values:
6188   //   0: no compression
6189   //   1: RLE compressed
6190   compression = stbi__get16be(s);
6191   if (compression > 1)
6192      return stbi__errpuc("bad compression", "PSD has an unknown compression format");
6193
6194   // Check size
6195   if (!stbi__mad3sizes_valid(4, w, h, 0))
6196      return stbi__errpuc("too large", "Corrupt PSD");
6197
6198   // Create the destination image.
6199
6200   if (!compression && bitdepth == 16 && bpc == 16) {
6201      out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
6202      ri->bits_per_channel = 16;
6203   } else
6204      out = (stbi_uc *) stbi__malloc(4 * w*h);
6205
6206   if (!out) return stbi__errpuc("outofmem", "Out of memory");
6207   pixelCount = w*h;
6208
6209   // Initialize the data to zero.
6210   //memset( out, 0, pixelCount * 4 );
6211
6212   // Finally, the image data.
6213   if (compression) {
6214      // RLE as used by .PSD and .TIFF
6215      // Loop until you get the number of unpacked bytes you are expecting:
6216      //     Read the next source byte into n.
6217      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
6218      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
6219      //     Else if n is 128, noop.
6220      // Endloop
6221
6222      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
6223      // which we're going to just skip.
6224      stbi__skip(s, h * channelCount * 2 );
6225
6226      // Read the RLE data by channel.
6227      for (channel = 0; channel < 4; channel++) {
6228         stbi_uc *p;
6229
6230         p = out+channel;
6231         if (channel >= channelCount) {
6232            // Fill this channel with default data.
6233            for (i = 0; i < pixelCount; i++, p += 4)
6234               *p = (channel == 3 ? 255 : 0);
6235         } else {
6236            // Read the RLE data.
6237            if (!stbi__psd_decode_rle(s, p, pixelCount)) {
6238               STBI_FREE(out);
6239               return stbi__errpuc("corrupt", "bad RLE data");
6240            }
6241         }
6242      }
6243
6244   } else {
6245      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
6246      // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
6247
6248      // Read the data by channel.
6249      for (channel = 0; channel < 4; channel++) {
6250         if (channel >= channelCount) {
6251            // Fill this channel with default data.
6252            if (bitdepth == 16 && bpc == 16) {
6253               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6254               stbi__uint16 val = channel == 3 ? 65535 : 0;
6255               for (i = 0; i < pixelCount; i++, q += 4)
6256                  *q = val;
6257            } else {
6258               stbi_uc *p = out+channel;
6259               stbi_uc val = channel == 3 ? 255 : 0;
6260               for (i = 0; i < pixelCount; i++, p += 4)
6261                  *p = val;
6262            }
6263         } else {
6264            if (ri->bits_per_channel == 16) {    // output bpc
6265               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6266               for (i = 0; i < pixelCount; i++, q += 4)
6267                  *q = (stbi__uint16) stbi__get16be(s);
6268            } else {
6269               stbi_uc *p = out+channel;
6270               if (bitdepth == 16) {  // input bpc
6271                  for (i = 0; i < pixelCount; i++, p += 4)
6272                     *p = (stbi_uc) (stbi__get16be(s) >> 8);
6273               } else {
6274                  for (i = 0; i < pixelCount; i++, p += 4)
6275                     *p = stbi__get8(s);
6276               }
6277            }
6278         }
6279      }
6280   }
6281
6282   // remove weird white matte from PSD
6283   if (channelCount >= 4) {
6284      if (ri->bits_per_channel == 16) {
6285         for (i=0; i < w*h; ++i) {
6286            stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
6287            if (pixel[3] != 0 && pixel[3] != 65535) {
6288               float a = pixel[3] / 65535.0f;
6289               float ra = 1.0f / a;
6290               float inv_a = 65535.0f * (1 - ra);
6291               pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
6292               pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
6293               pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
6294            }
6295         }
6296      } else {
6297         for (i=0; i < w*h; ++i) {
6298            unsigned char *pixel = out + 4*i;
6299            if (pixel[3] != 0 && pixel[3] != 255) {
6300               float a = pixel[3] / 255.0f;
6301               float ra = 1.0f / a;
6302               float inv_a = 255.0f * (1 - ra);
6303               pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
6304               pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
6305               pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
6306            }
6307         }
6308      }
6309   }
6310
6311   // convert to desired output format
6312   if (req_comp && req_comp != 4) {
6313      if (ri->bits_per_channel == 16)
6314         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
6315      else
6316         out = stbi__convert_format(out, 4, req_comp, w, h);
6317      if (out == NULL) return out; // stbi__convert_format frees input on failure
6318   }
6319
6320   if (comp) *comp = 4;
6321   *y = h;
6322   *x = w;
6323
6324   return out;
6325}
6326#endif
6327
6328// *************************************************************************************************
6329// Softimage PIC loader
6330// by Tom Seddon
6331//
6332// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
6333// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
6334
6335#ifndef STBI_NO_PIC
6336static int stbi__pic_is4(stbi__context *s,const char *str)
6337{
6338   int i;
6339   for (i=0; i<4; ++i)
6340      if (stbi__get8(s) != (stbi_uc)str[i])
6341         return 0;
6342
6343   return 1;
6344}
6345
6346static int stbi__pic_test_core(stbi__context *s)
6347{
6348   int i;
6349
6350   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
6351      return 0;
6352
6353   for(i=0;i<84;++i)
6354      stbi__get8(s);
6355
6356   if (!stbi__pic_is4(s,"PICT"))
6357      return 0;
6358
6359   return 1;
6360}
6361
6362typedef struct
6363{
6364   stbi_uc size,type,channel;
6365} stbi__pic_packet;
6366
6367static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
6368{
6369   int mask=0x80, i;
6370
6371   for (i=0; i<4; ++i, mask>>=1) {
6372      if (channel & mask) {
6373         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
6374         dest[i]=stbi__get8(s);
6375      }
6376   }
6377
6378   return dest;
6379}
6380
6381static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
6382{
6383   int mask=0x80,i;
6384
6385   for (i=0;i<4; ++i, mask>>=1)
6386      if (channel&mask)
6387         dest[i]=src[i];
6388}
6389
6390static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
6391{
6392   int act_comp=0,num_packets=0,y,chained;
6393   stbi__pic_packet packets[10];
6394
6395   // this will (should...) cater for even some bizarre stuff like having data
6396    // for the same channel in multiple packets.
6397   do {
6398      stbi__pic_packet *packet;
6399
6400      if (num_packets==sizeof(packets)/sizeof(packets[0]))
6401         return stbi__errpuc("bad format","too many packets");
6402
6403      packet = &packets[num_packets++];
6404
6405      chained = stbi__get8(s);
6406      packet->size    = stbi__get8(s);
6407      packet->type    = stbi__get8(s);
6408      packet->channel = stbi__get8(s);
6409
6410      act_comp |= packet->channel;
6411
6412      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
6413      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
6414   } while (chained);
6415
6416   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
6417
6418   for(y=0; y<height; ++y) {
6419      int packet_idx;
6420
6421      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
6422         stbi__pic_packet *packet = &packets[packet_idx];
6423         stbi_uc *dest = result+y*width*4;
6424
6425         switch (packet->type) {
6426            default:
6427               return stbi__errpuc("bad format","packet has bad compression type");
6428
6429            case 0: {//uncompressed
6430               int x;
6431
6432               for(x=0;x<width;++x, dest+=4)
6433                  if (!stbi__readval(s,packet->channel,dest))
6434                     return 0;
6435               break;
6436            }
6437
6438            case 1://Pure RLE
6439               {
6440                  int left=width, i;
6441
6442                  while (left>0) {
6443                     stbi_uc count,value[4];
6444
6445                     count=stbi__get8(s);
6446                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
6447
6448                     if (count > left)
6449                        count = (stbi_uc) left;
6450
6451                     if (!stbi__readval(s,packet->channel,value))  return 0;
6452
6453                     for(i=0; i<count; ++i,dest+=4)
6454                        stbi__copyval(packet->channel,dest,value);
6455                     left -= count;
6456                  }
6457               }
6458               break;
6459
6460            case 2: {//Mixed RLE
6461               int left=width;
6462               while (left>0) {
6463                  int count = stbi__get8(s), i;
6464                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
6465
6466                  if (count >= 128) { // Repeated
6467                     stbi_uc value[4];
6468
6469                     if (count==128)
6470                        count = stbi__get16be(s);
6471                     else
6472                        count -= 127;
6473                     if (count > left)
6474                        return stbi__errpuc("bad file","scanline overrun");
6475
6476                     if (!stbi__readval(s,packet->channel,value))
6477                        return 0;
6478
6479                     for(i=0;i<count;++i, dest += 4)
6480                        stbi__copyval(packet->channel,dest,value);
6481                  } else { // Raw
6482                     ++count;
6483                     if (count>left) return stbi__errpuc("bad file","scanline overrun");
6484
6485                     for(i=0;i<count;++i, dest+=4)
6486                        if (!stbi__readval(s,packet->channel,dest))
6487                           return 0;
6488                  }
6489                  left-=count;
6490               }
6491               break;
6492            }
6493         }
6494      }
6495   }
6496
6497   return result;
6498}
6499
6500static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
6501{
6502   stbi_uc *result;
6503   int i, x,y, internal_comp;
6504   STBI_NOTUSED(ri);
6505
6506   if (!comp) comp = &internal_comp;
6507
6508   for (i=0; i<92; ++i)
6509      stbi__get8(s);
6510
6511   x = stbi__get16be(s);
6512   y = stbi__get16be(s);
6513
6514   if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6515   if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6516
6517   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
6518   if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
6519
6520   stbi__get32be(s); //skip `ratio'
6521   stbi__get16be(s); //skip `fields'
6522   stbi__get16be(s); //skip `pad'
6523
6524   // intermediate buffer is RGBA
6525   result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
6526   if (!result) return stbi__errpuc("outofmem", "Out of memory");
6527   memset(result, 0xff, x*y*4);
6528
6529   if (!stbi__pic_load_core(s,x,y,comp, result)) {
6530      STBI_FREE(result);
6531      result=0;
6532   }
6533   *px = x;
6534   *py = y;
6535   if (req_comp == 0) req_comp = *comp;
6536   result=stbi__convert_format(result,4,req_comp,x,y);
6537
6538   return result;
6539}
6540
6541static int stbi__pic_test(stbi__context *s)
6542{
6543   int r = stbi__pic_test_core(s);
6544   stbi__rewind(s);
6545   return r;
6546}
6547#endif
6548
6549// *************************************************************************************************
6550// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6551
6552#ifndef STBI_NO_GIF
6553typedef struct
6554{
6555   stbi__int16 prefix;
6556   stbi_uc first;
6557   stbi_uc suffix;
6558} stbi__gif_lzw;
6559
6560typedef struct
6561{
6562   int w,h;
6563   stbi_uc *out;                 // output buffer (always 4 components)
6564   stbi_uc *background;          // The current "background" as far as a gif is concerned
6565   stbi_uc *history;
6566   int flags, bgindex, ratio, transparent, eflags;
6567   stbi_uc  pal[256][4];
6568   stbi_uc lpal[256][4];
6569   stbi__gif_lzw codes[8192];
6570   stbi_uc *color_table;
6571   int parse, step;
6572   int lflags;
6573   int start_x, start_y;
6574   int max_x, max_y;
6575   int cur_x, cur_y;
6576   int line_size;
6577   int delay;
6578} stbi__gif;
6579
6580static int stbi__gif_test_raw(stbi__context *s)
6581{
6582   int sz;
6583   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6584   sz = stbi__get8(s);
6585   if (sz != '9' && sz != '7') return 0;
6586   if (stbi__get8(s) != 'a') return 0;
6587   return 1;
6588}
6589
6590static int stbi__gif_test(stbi__context *s)
6591{
6592   int r = stbi__gif_test_raw(s);
6593   stbi__rewind(s);
6594   return r;
6595}
6596
6597static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6598{
6599   int i;
6600   for (i=0; i < num_entries; ++i) {
6601      pal[i][2] = stbi__get8(s);
6602      pal[i][1] = stbi__get8(s);
6603      pal[i][0] = stbi__get8(s);
6604      pal[i][3] = transp == i ? 0 : 255;
6605   }
6606}
6607
6608static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6609{
6610   stbi_uc version;
6611   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6612      return stbi__err("not GIF", "Corrupt GIF");
6613
6614   version = stbi__get8(s);
6615   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
6616   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
6617
6618   stbi__g_failure_reason = "";
6619   g->w = stbi__get16le(s);
6620   g->h = stbi__get16le(s);
6621   g->flags = stbi__get8(s);
6622   g->bgindex = stbi__get8(s);
6623   g->ratio = stbi__get8(s);
6624   g->transparent = -1;
6625
6626   if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6627   if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6628
6629   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
6630
6631   if (is_info) return 1;
6632
6633   if (g->flags & 0x80)
6634      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6635
6636   return 1;
6637}
6638
6639static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6640{
6641   stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6642   if (!g) return stbi__err("outofmem", "Out of memory");
6643   if (!stbi__gif_header(s, g, comp, 1)) {
6644      STBI_FREE(g);
6645      stbi__rewind( s );
6646      return 0;
6647   }
6648   if (x) *x = g->w;
6649   if (y) *y = g->h;
6650   STBI_FREE(g);
6651   return 1;
6652}
6653
6654static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6655{
6656   stbi_uc *p, *c;
6657   int idx;
6658
6659   // recurse to decode the prefixes, since the linked-list is backwards,
6660   // and working backwards through an interleaved image would be nasty
6661   if (g->codes[code].prefix >= 0)
6662      stbi__out_gif_code(g, g->codes[code].prefix);
6663
6664   if (g->cur_y >= g->max_y) return;
6665
6666   idx = g->cur_x + g->cur_y;
6667   p = &g->out[idx];
6668   g->history[idx / 4] = 1;
6669
6670   c = &g->color_table[g->codes[code].suffix * 4];
6671   if (c[3] > 128) { // don't render transparent pixels;
6672      p[0] = c[2];
6673      p[1] = c[1];
6674      p[2] = c[0];
6675      p[3] = c[3];
6676   }
6677   g->cur_x += 4;
6678
6679   if (g->cur_x >= g->max_x) {
6680      g->cur_x = g->start_x;
6681      g->cur_y += g->step;
6682
6683      while (g->cur_y >= g->max_y && g->parse > 0) {
6684         g->step = (1 << g->parse) * g->line_size;
6685         g->cur_y = g->start_y + (g->step >> 1);
6686         --g->parse;
6687      }
6688   }
6689}
6690
6691static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6692{
6693   stbi_uc lzw_cs;
6694   stbi__int32 len, init_code;
6695   stbi__uint32 first;
6696   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6697   stbi__gif_lzw *p;
6698
6699   lzw_cs = stbi__get8(s);
6700   if (lzw_cs > 12) return NULL;
6701   clear = 1 << lzw_cs;
6702   first = 1;
6703   codesize = lzw_cs + 1;
6704   codemask = (1 << codesize) - 1;
6705   bits = 0;
6706   valid_bits = 0;
6707   for (init_code = 0; init_code < clear; init_code++) {
6708      g->codes[init_code].prefix = -1;
6709      g->codes[init_code].first = (stbi_uc) init_code;
6710      g->codes[init_code].suffix = (stbi_uc) init_code;
6711   }
6712
6713   // support no starting clear code
6714   avail = clear+2;
6715   oldcode = -1;
6716
6717   len = 0;
6718   for(;;) {
6719      if (valid_bits < codesize) {
6720         if (len == 0) {
6721            len = stbi__get8(s); // start new block
6722            if (len == 0)
6723               return g->out;
6724         }
6725         --len;
6726         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6727         valid_bits += 8;
6728      } else {
6729         stbi__int32 code = bits & codemask;
6730         bits >>= codesize;
6731         valid_bits -= codesize;
6732         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6733         if (code == clear) {  // clear code
6734            codesize = lzw_cs + 1;
6735            codemask = (1 << codesize) - 1;
6736            avail = clear + 2;
6737            oldcode = -1;
6738            first = 0;
6739         } else if (code == clear + 1) { // end of stream code
6740            stbi__skip(s, len);
6741            while ((len = stbi__get8(s)) > 0)
6742               stbi__skip(s,len);
6743            return g->out;
6744         } else if (code <= avail) {
6745            if (first) {
6746               return stbi__errpuc("no clear code", "Corrupt GIF");
6747            }
6748
6749            if (oldcode >= 0) {
6750               p = &g->codes[avail++];
6751               if (avail > 8192) {
6752                  return stbi__errpuc("too many codes", "Corrupt GIF");
6753               }
6754
6755               p->prefix = (stbi__int16) oldcode;
6756               p->first = g->codes[oldcode].first;
6757               p->suffix = (code == avail) ? p->first : g->codes[code].first;
6758            } else if (code == avail)
6759               return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6760
6761            stbi__out_gif_code(g, (stbi__uint16) code);
6762
6763            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6764               codesize++;
6765               codemask = (1 << codesize) - 1;
6766            }
6767
6768            oldcode = code;
6769         } else {
6770            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6771         }
6772      }
6773   }
6774}
6775
6776// this function is designed to support animated gifs, although stb_image doesn't support it
6777// two back is the image from two frames ago, used for a very specific disposal format
6778static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
6779{
6780   int dispose;
6781   int first_frame;
6782   int pi;
6783   int pcount;
6784   STBI_NOTUSED(req_comp);
6785
6786   // on first frame, any non-written pixels get the background colour (non-transparent)
6787   first_frame = 0;
6788   if (g->out == 0) {
6789      if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
6790      if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
6791         return stbi__errpuc("too large", "GIF image is too large");
6792      pcount = g->w * g->h;
6793      g->out = (stbi_uc *) stbi__malloc(4 * pcount);
6794      g->background = (stbi_uc *) stbi__malloc(4 * pcount);
6795      g->history = (stbi_uc *) stbi__malloc(pcount);
6796      if (!g->out || !g->background || !g->history)
6797         return stbi__errpuc("outofmem", "Out of memory");
6798
6799      // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
6800      // background colour is only used for pixels that are not rendered first frame, after that "background"
6801      // color refers to the color that was there the previous frame.
6802      memset(g->out, 0x00, 4 * pcount);
6803      memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
6804      memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
6805      first_frame = 1;
6806   } else {
6807      // second frame - how do we dispose of the previous one?
6808      dispose = (g->eflags & 0x1C) >> 2;
6809      pcount = g->w * g->h;
6810
6811      if ((dispose == 3) && (two_back == 0)) {
6812         dispose = 2; // if I don't have an image to revert back to, default to the old background
6813      }
6814
6815      if (dispose == 3) { // use previous graphic
6816         for (pi = 0; pi < pcount; ++pi) {
6817            if (g->history[pi]) {
6818               memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
6819            }
6820         }
6821      } else if (dispose == 2) {
6822         // restore what was changed last frame to background before that frame;
6823         for (pi = 0; pi < pcount; ++pi) {
6824            if (g->history[pi]) {
6825               memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
6826            }
6827         }
6828      } else {
6829         // This is a non-disposal case eithe way, so just
6830         // leave the pixels as is, and they will become the new background
6831         // 1: do not dispose
6832         // 0:  not specified.
6833      }
6834
6835      // background is what out is after the undoing of the previou frame;
6836      memcpy( g->background, g->out, 4 * g->w * g->h );
6837   }
6838
6839   // clear my history;
6840   memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
6841
6842   for (;;) {
6843      int tag = stbi__get8(s);
6844      switch (tag) {
6845         case 0x2C: /* Image Descriptor */
6846         {
6847            stbi__int32 x, y, w, h;
6848            stbi_uc *o;
6849
6850            x = stbi__get16le(s);
6851            y = stbi__get16le(s);
6852            w = stbi__get16le(s);
6853            h = stbi__get16le(s);
6854            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6855               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6856
6857            g->line_size = g->w * 4;
6858            g->start_x = x * 4;
6859            g->start_y = y * g->line_size;
6860            g->max_x   = g->start_x + w * 4;
6861            g->max_y   = g->start_y + h * g->line_size;
6862            g->cur_x   = g->start_x;
6863            g->cur_y   = g->start_y;
6864
6865            // if the width of the specified rectangle is 0, that means
6866            // we may not see *any* pixels or the image is malformed;
6867            // to make sure this is caught, move the current y down to
6868            // max_y (which is what out_gif_code checks).
6869            if (w == 0)
6870               g->cur_y = g->max_y;
6871
6872            g->lflags = stbi__get8(s);
6873
6874            if (g->lflags & 0x40) {
6875               g->step = 8 * g->line_size; // first interlaced spacing
6876               g->parse = 3;
6877            } else {
6878               g->step = g->line_size;
6879               g->parse = 0;
6880            }
6881
6882            if (g->lflags & 0x80) {
6883               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6884               g->color_table = (stbi_uc *) g->lpal;
6885            } else if (g->flags & 0x80) {
6886               g->color_table = (stbi_uc *) g->pal;
6887            } else
6888               return stbi__errpuc("missing color table", "Corrupt GIF");
6889
6890            o = stbi__process_gif_raster(s, g);
6891            if (!o) return NULL;
6892
6893            // if this was the first frame,
6894            pcount = g->w * g->h;
6895            if (first_frame && (g->bgindex > 0)) {
6896               // if first frame, any pixel not drawn to gets the background color
6897               for (pi = 0; pi < pcount; ++pi) {
6898                  if (g->history[pi] == 0) {
6899                     g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
6900                     memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
6901                  }
6902               }
6903            }
6904
6905            return o;
6906         }
6907
6908         case 0x21: // Comment Extension.
6909         {
6910            int len;
6911            int ext = stbi__get8(s);
6912            if (ext == 0xF9) { // Graphic Control Extension.
6913               len = stbi__get8(s);
6914               if (len == 4) {
6915                  g->eflags = stbi__get8(s);
6916                  g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
6917
6918                  // unset old transparent
6919                  if (g->transparent >= 0) {
6920                     g->pal[g->transparent][3] = 255;
6921                  }
6922                  if (g->eflags & 0x01) {
6923                     g->transparent = stbi__get8(s);
6924                     if (g->transparent >= 0) {
6925                        g->pal[g->transparent][3] = 0;
6926                     }
6927                  } else {
6928                     // don't need transparent
6929                     stbi__skip(s, 1);
6930                     g->transparent = -1;
6931                  }
6932               } else {
6933                  stbi__skip(s, len);
6934                  break;
6935               }
6936            }
6937            while ((len = stbi__get8(s)) != 0) {
6938               stbi__skip(s, len);
6939            }
6940            break;
6941         }
6942
6943         case 0x3B: // gif stream termination code
6944            return (stbi_uc *) s; // using '1' causes warning on some compilers
6945
6946         default:
6947            return stbi__errpuc("unknown code", "Corrupt GIF");
6948      }
6949   }
6950}
6951
6952static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
6953{
6954   STBI_FREE(g->out);
6955   STBI_FREE(g->history);
6956   STBI_FREE(g->background);
6957
6958   if (out) STBI_FREE(out);
6959   if (delays && *delays) STBI_FREE(*delays);
6960   return stbi__errpuc("outofmem", "Out of memory");
6961}
6962
6963static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
6964{
6965   if (stbi__gif_test(s)) {
6966      int layers = 0;
6967      stbi_uc *u = 0;
6968      stbi_uc *out = 0;
6969      stbi_uc *two_back = 0;
6970      stbi__gif g;
6971      int stride;
6972      int out_size = 0;
6973      int delays_size = 0;
6974
6975      STBI_NOTUSED(out_size);
6976      STBI_NOTUSED(delays_size);
6977
6978      memset(&g, 0, sizeof(g));
6979      if (delays) {
6980         *delays = 0;
6981      }
6982
6983      do {
6984         u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
6985         if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6986
6987         if (u) {
6988            *x = g.w;
6989            *y = g.h;
6990            ++layers;
6991            stride = g.w * g.h * 4;
6992
6993            if (out) {
6994               void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
6995               if (!tmp)
6996                  return stbi__load_gif_main_outofmem(&g, out, delays);
6997               else {
6998                   out = (stbi_uc*) tmp;
6999                   out_size = layers * stride;
7000               }
7001
7002               if (delays) {
7003                  int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
7004                  if (!new_delays)
7005                     return stbi__load_gif_main_outofmem(&g, out, delays);
7006                  *delays = new_delays;
7007                  delays_size = layers * sizeof(int);
7008               }
7009            } else {
7010               out = (stbi_uc*)stbi__malloc( layers * stride );
7011               if (!out)
7012                  return stbi__load_gif_main_outofmem(&g, out, delays);
7013               out_size = layers * stride;
7014               if (delays) {
7015                  *delays = (int*) stbi__malloc( layers * sizeof(int) );
7016                  if (!*delays)
7017                     return stbi__load_gif_main_outofmem(&g, out, delays);
7018                  delays_size = layers * sizeof(int);
7019               }
7020            }
7021            memcpy( out + ((layers - 1) * stride), u, stride );
7022            if (layers >= 2) {
7023               two_back = out - 2 * stride;
7024            }
7025
7026            if (delays) {
7027               (*delays)[layers - 1U] = g.delay;
7028            }
7029         }
7030      } while (u != 0);
7031
7032      // free temp buffer;
7033      STBI_FREE(g.out);
7034      STBI_FREE(g.history);
7035      STBI_FREE(g.background);
7036
7037      // do the final conversion after loading everything;
7038      if (req_comp && req_comp != 4)
7039         out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
7040
7041      *z = layers;
7042      return out;
7043   } else {
7044      return stbi__errpuc("not GIF", "Image was not as a gif type.");
7045   }
7046}
7047
7048static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7049{
7050   stbi_uc *u = 0;
7051   stbi__gif g;
7052   memset(&g, 0, sizeof(g));
7053   STBI_NOTUSED(ri);
7054
7055   u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
7056   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
7057   if (u) {
7058      *x = g.w;
7059      *y = g.h;
7060
7061      // moved conversion to after successful load so that the same
7062      // can be done for multiple frames.
7063      if (req_comp && req_comp != 4)
7064         u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
7065   } else if (g.out) {
7066      // if there was an error and we allocated an image buffer, free it!
7067      STBI_FREE(g.out);
7068   }
7069
7070   // free buffers needed for multiple frame loading;
7071   STBI_FREE(g.history);
7072   STBI_FREE(g.background);
7073
7074   return u;
7075}
7076
7077static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
7078{
7079   return stbi__gif_info_raw(s,x,y,comp);
7080}
7081#endif
7082
7083// *************************************************************************************************
7084// Radiance RGBE HDR loader
7085// originally by Nicolas Schulz
7086#ifndef STBI_NO_HDR
7087static int stbi__hdr_test_core(stbi__context *s, const char *signature)
7088{
7089   int i;
7090   for (i=0; signature[i]; ++i)
7091      if (stbi__get8(s) != signature[i])
7092          return 0;
7093   stbi__rewind(s);
7094   return 1;
7095}
7096
7097static int stbi__hdr_test(stbi__context* s)
7098{
7099   int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
7100   stbi__rewind(s);
7101   if(!r) {
7102       r = stbi__hdr_test_core(s, "#?RGBE\n");
7103       stbi__rewind(s);
7104   }
7105   return r;
7106}
7107
7108#define STBI__HDR_BUFLEN  1024
7109static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
7110{
7111   int len=0;
7112   char c = '\0';
7113
7114   c = (char) stbi__get8(z);
7115
7116   while (!stbi__at_eof(z) && c != '\n') {
7117      buffer[len++] = c;
7118      if (len == STBI__HDR_BUFLEN-1) {
7119         // flush to end of line
7120         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
7121            ;
7122         break;
7123      }
7124      c = (char) stbi__get8(z);
7125   }
7126
7127   buffer[len] = 0;
7128   return buffer;
7129}
7130
7131static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
7132{
7133   if ( input[3] != 0 ) {
7134      float f1;
7135      // Exponent
7136      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
7137      if (req_comp <= 2)
7138         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
7139      else {
7140         output[0] = input[0] * f1;
7141         output[1] = input[1] * f1;
7142         output[2] = input[2] * f1;
7143      }
7144      if (req_comp == 2) output[1] = 1;
7145      if (req_comp == 4) output[3] = 1;
7146   } else {
7147      switch (req_comp) {
7148         case 4: output[3] = 1; /* fallthrough */
7149         case 3: output[0] = output[1] = output[2] = 0;
7150                 break;
7151         case 2: output[1] = 1; /* fallthrough */
7152         case 1: output[0] = 0;
7153                 break;
7154      }
7155   }
7156}
7157
7158static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7159{
7160   char buffer[STBI__HDR_BUFLEN];
7161   char *token;
7162   int valid = 0;
7163   int width, height;
7164   stbi_uc *scanline;
7165   float *hdr_data;
7166   int len;
7167   unsigned char count, value;
7168   int i, j, k, c1,c2, z;
7169   const char *headerToken;
7170   STBI_NOTUSED(ri);
7171
7172   // Check identifier
7173   headerToken = stbi__hdr_gettoken(s,buffer);
7174   if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
7175      return stbi__errpf("not HDR", "Corrupt HDR image");
7176
7177   // Parse header
7178   for(;;) {
7179      token = stbi__hdr_gettoken(s,buffer);
7180      if (token[0] == 0) break;
7181      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7182   }
7183
7184   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
7185
7186   // Parse width and height
7187   // can't use sscanf() if we're not using stdio!
7188   token = stbi__hdr_gettoken(s,buffer);
7189   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7190   token += 3;
7191   height = (int) strtol(token, &token, 10);
7192   while (*token == ' ') ++token;
7193   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7194   token += 3;
7195   width = (int) strtol(token, NULL, 10);
7196
7197   if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7198   if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7199
7200   *x = width;
7201   *y = height;
7202
7203   if (comp) *comp = 3;
7204   if (req_comp == 0) req_comp = 3;
7205
7206   if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
7207      return stbi__errpf("too large", "HDR image is too large");
7208
7209   // Read data
7210   hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
7211   if (!hdr_data)
7212      return stbi__errpf("outofmem", "Out of memory");
7213
7214   // Load image data
7215   // image data is stored as some number of sca
7216   if ( width < 8 || width >= 32768) {
7217      // Read flat data
7218      for (j=0; j < height; ++j) {
7219         for (i=0; i < width; ++i) {
7220            stbi_uc rgbe[4];
7221           main_decode_loop:
7222            stbi__getn(s, rgbe, 4);
7223            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
7224         }
7225      }
7226   } else {
7227      // Read RLE-encoded data
7228      scanline = NULL;
7229
7230      for (j = 0; j < height; ++j) {
7231         c1 = stbi__get8(s);
7232         c2 = stbi__get8(s);
7233         len = stbi__get8(s);
7234         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
7235            // not run-length encoded, so we have to actually use THIS data as a decoded
7236            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
7237            stbi_uc rgbe[4];
7238            rgbe[0] = (stbi_uc) c1;
7239            rgbe[1] = (stbi_uc) c2;
7240            rgbe[2] = (stbi_uc) len;
7241            rgbe[3] = (stbi_uc) stbi__get8(s);
7242            stbi__hdr_convert(hdr_data, rgbe, req_comp);
7243            i = 1;
7244            j = 0;
7245            STBI_FREE(scanline);
7246            goto main_decode_loop; // yes, this makes no sense
7247         }
7248         len <<= 8;
7249         len |= stbi__get8(s);
7250         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
7251         if (scanline == NULL) {
7252            scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
7253            if (!scanline) {
7254               STBI_FREE(hdr_data);
7255               return stbi__errpf("outofmem", "Out of memory");
7256            }
7257         }
7258
7259         for (k = 0; k < 4; ++k) {
7260            int nleft;
7261            i = 0;
7262            while ((nleft = width - i) > 0) {
7263               count = stbi__get8(s);
7264               if (count > 128) {
7265                  // Run
7266                  value = stbi__get8(s);
7267                  count -= 128;
7268                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7269                  for (z = 0; z < count; ++z)
7270                     scanline[i++ * 4 + k] = value;
7271               } else {
7272                  // Dump
7273                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7274                  for (z = 0; z < count; ++z)
7275                     scanline[i++ * 4 + k] = stbi__get8(s);
7276               }
7277            }
7278         }
7279         for (i=0; i < width; ++i)
7280            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
7281      }
7282      if (scanline)
7283         STBI_FREE(scanline);
7284   }
7285
7286   return hdr_data;
7287}
7288
7289static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
7290{
7291   char buffer[STBI__HDR_BUFLEN];
7292   char *token;
7293   int valid = 0;
7294   int dummy;
7295
7296   if (!x) x = &dummy;
7297   if (!y) y = &dummy;
7298   if (!comp) comp = &dummy;
7299
7300   if (stbi__hdr_test(s) == 0) {
7301       stbi__rewind( s );
7302       return 0;
7303   }
7304
7305   for(;;) {
7306      token = stbi__hdr_gettoken(s,buffer);
7307      if (token[0] == 0) break;
7308      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7309   }
7310
7311   if (!valid) {
7312       stbi__rewind( s );
7313       return 0;
7314   }
7315   token = stbi__hdr_gettoken(s,buffer);
7316   if (strncmp(token, "-Y ", 3)) {
7317       stbi__rewind( s );
7318       return 0;
7319   }
7320   token += 3;
7321   *y = (int) strtol(token, &token, 10);
7322   while (*token == ' ') ++token;
7323   if (strncmp(token, "+X ", 3)) {
7324       stbi__rewind( s );
7325       return 0;
7326   }
7327   token += 3;
7328   *x = (int) strtol(token, NULL, 10);
7329   *comp = 3;
7330   return 1;
7331}
7332#endif // STBI_NO_HDR
7333
7334#ifndef STBI_NO_BMP
7335static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
7336{
7337   void *p;
7338   stbi__bmp_data info;
7339
7340   info.all_a = 255;
7341   p = stbi__bmp_parse_header(s, &info);
7342   if (p == NULL) {
7343      stbi__rewind( s );
7344      return 0;
7345   }
7346   if (x) *x = s->img_x;
7347   if (y) *y = s->img_y;
7348   if (comp) {
7349      if (info.bpp == 24 && info.ma == 0xff000000)
7350         *comp = 3;
7351      else
7352         *comp = info.ma ? 4 : 3;
7353   }
7354   return 1;
7355}
7356#endif
7357
7358#ifndef STBI_NO_PSD
7359static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
7360{
7361   int channelCount, dummy, depth;
7362   if (!x) x = &dummy;
7363   if (!y) y = &dummy;
7364   if (!comp) comp = &dummy;
7365   if (stbi__get32be(s) != 0x38425053) {
7366       stbi__rewind( s );
7367       return 0;
7368   }
7369   if (stbi__get16be(s) != 1) {
7370       stbi__rewind( s );
7371       return 0;
7372   }
7373   stbi__skip(s, 6);
7374   channelCount = stbi__get16be(s);
7375   if (channelCount < 0 || channelCount > 16) {
7376       stbi__rewind( s );
7377       return 0;
7378   }
7379   *y = stbi__get32be(s);
7380   *x = stbi__get32be(s);
7381   depth = stbi__get16be(s);
7382   if (depth != 8 && depth != 16) {
7383       stbi__rewind( s );
7384       return 0;
7385   }
7386   if (stbi__get16be(s) != 3) {
7387       stbi__rewind( s );
7388       return 0;
7389   }
7390   *comp = 4;
7391   return 1;
7392}
7393
7394static int stbi__psd_is16(stbi__context *s)
7395{
7396   int channelCount, depth;
7397   if (stbi__get32be(s) != 0x38425053) {
7398       stbi__rewind( s );
7399       return 0;
7400   }
7401   if (stbi__get16be(s) != 1) {
7402       stbi__rewind( s );
7403       return 0;
7404   }
7405   stbi__skip(s, 6);
7406   channelCount = stbi__get16be(s);
7407   if (channelCount < 0 || channelCount > 16) {
7408       stbi__rewind( s );
7409       return 0;
7410   }
7411   STBI_NOTUSED(stbi__get32be(s));
7412   STBI_NOTUSED(stbi__get32be(s));
7413   depth = stbi__get16be(s);
7414   if (depth != 16) {
7415       stbi__rewind( s );
7416       return 0;
7417   }
7418   return 1;
7419}
7420#endif
7421
7422#ifndef STBI_NO_PIC
7423static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
7424{
7425   int act_comp=0,num_packets=0,chained,dummy;
7426   stbi__pic_packet packets[10];
7427
7428   if (!x) x = &dummy;
7429   if (!y) y = &dummy;
7430   if (!comp) comp = &dummy;
7431
7432   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
7433      stbi__rewind(s);
7434      return 0;
7435   }
7436
7437   stbi__skip(s, 88);
7438
7439   *x = stbi__get16be(s);
7440   *y = stbi__get16be(s);
7441   if (stbi__at_eof(s)) {
7442      stbi__rewind( s);
7443      return 0;
7444   }
7445   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
7446      stbi__rewind( s );
7447      return 0;
7448   }
7449
7450   stbi__skip(s, 8);
7451
7452   do {
7453      stbi__pic_packet *packet;
7454
7455      if (num_packets==sizeof(packets)/sizeof(packets[0]))
7456         return 0;
7457
7458      packet = &packets[num_packets++];
7459      chained = stbi__get8(s);
7460      packet->size    = stbi__get8(s);
7461      packet->type    = stbi__get8(s);
7462      packet->channel = stbi__get8(s);
7463      act_comp |= packet->channel;
7464
7465      if (stbi__at_eof(s)) {
7466          stbi__rewind( s );
7467          return 0;
7468      }
7469      if (packet->size != 8) {
7470          stbi__rewind( s );
7471          return 0;
7472      }
7473   } while (chained);
7474
7475   *comp = (act_comp & 0x10 ? 4 : 3);
7476
7477   return 1;
7478}
7479#endif
7480
7481// *************************************************************************************************
7482// Portable Gray Map and Portable Pixel Map loader
7483// by Ken Miller
7484//
7485// PGM: http://netpbm.sourceforge.net/doc/pgm.html
7486// PPM: http://netpbm.sourceforge.net/doc/ppm.html
7487//
7488// Known limitations:
7489//    Does not support comments in the header section
7490//    Does not support ASCII image data (formats P2 and P3)
7491
7492#ifndef STBI_NO_PNM
7493
7494static int      stbi__pnm_test(stbi__context *s)
7495{
7496   char p, t;
7497   p = (char) stbi__get8(s);
7498   t = (char) stbi__get8(s);
7499   if (p != 'P' || (t != '5' && t != '6')) {
7500       stbi__rewind( s );
7501       return 0;
7502   }
7503   return 1;
7504}
7505
7506static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7507{
7508   stbi_uc *out;
7509   STBI_NOTUSED(ri);
7510
7511   ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
7512   if (ri->bits_per_channel == 0)
7513      return 0;
7514
7515   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7516   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7517
7518   *x = s->img_x;
7519   *y = s->img_y;
7520   if (comp) *comp = s->img_n;
7521
7522   if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
7523      return stbi__errpuc("too large", "PNM too large");
7524
7525   out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
7526   if (!out) return stbi__errpuc("outofmem", "Out of memory");
7527   if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
7528      STBI_FREE(out);
7529      return stbi__errpuc("bad PNM", "PNM file truncated");
7530   }
7531
7532   if (req_comp && req_comp != s->img_n) {
7533      if (ri->bits_per_channel == 16) {
7534         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
7535      } else {
7536         out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
7537      }
7538      if (out == NULL) return out; // stbi__convert_format frees input on failure
7539   }
7540   return out;
7541}
7542
7543static int      stbi__pnm_isspace(char c)
7544{
7545   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
7546}
7547
7548static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
7549{
7550   for (;;) {
7551      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
7552         *c = (char) stbi__get8(s);
7553
7554      if (stbi__at_eof(s) || *c != '#')
7555         break;
7556
7557      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
7558         *c = (char) stbi__get8(s);
7559   }
7560}
7561
7562static int      stbi__pnm_isdigit(char c)
7563{
7564   return c >= '0' && c <= '9';
7565}
7566
7567static int      stbi__pnm_getinteger(stbi__context *s, char *c)
7568{
7569   int value = 0;
7570
7571   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
7572      value = value*10 + (*c - '0');
7573      *c = (char) stbi__get8(s);
7574      if((value > 214748364) || (value == 214748364 && *c > '7'))
7575          return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
7576   }
7577
7578   return value;
7579}
7580
7581static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
7582{
7583   int maxv, dummy;
7584   char c, p, t;
7585
7586   if (!x) x = &dummy;
7587   if (!y) y = &dummy;
7588   if (!comp) comp = &dummy;
7589
7590   stbi__rewind(s);
7591
7592   // Get identifier
7593   p = (char) stbi__get8(s);
7594   t = (char) stbi__get8(s);
7595   if (p != 'P' || (t != '5' && t != '6')) {
7596       stbi__rewind(s);
7597       return 0;
7598   }
7599
7600   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
7601
7602   c = (char) stbi__get8(s);
7603   stbi__pnm_skip_whitespace(s, &c);
7604
7605   *x = stbi__pnm_getinteger(s, &c); // read width
7606   if(*x == 0)
7607       return stbi__err("invalid width", "PPM image header had zero or overflowing width");
7608   stbi__pnm_skip_whitespace(s, &c);
7609
7610   *y = stbi__pnm_getinteger(s, &c); // read height
7611   if (*y == 0)
7612       return stbi__err("invalid width", "PPM image header had zero or overflowing width");
7613   stbi__pnm_skip_whitespace(s, &c);
7614
7615   maxv = stbi__pnm_getinteger(s, &c);  // read max value
7616   if (maxv > 65535)
7617      return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
7618   else if (maxv > 255)
7619      return 16;
7620   else
7621      return 8;
7622}
7623
7624static int stbi__pnm_is16(stbi__context *s)
7625{
7626   if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
7627	   return 1;
7628   return 0;
7629}
7630#endif
7631
7632static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
7633{
7634   #ifndef STBI_NO_JPEG
7635   if (stbi__jpeg_info(s, x, y, comp)) return 1;
7636   #endif
7637
7638   #ifndef STBI_NO_PNG
7639   if (stbi__png_info(s, x, y, comp))  return 1;
7640   #endif
7641
7642   #ifndef STBI_NO_GIF
7643   if (stbi__gif_info(s, x, y, comp))  return 1;
7644   #endif
7645
7646   #ifndef STBI_NO_BMP
7647   if (stbi__bmp_info(s, x, y, comp))  return 1;
7648   #endif
7649
7650   #ifndef STBI_NO_PSD
7651   if (stbi__psd_info(s, x, y, comp))  return 1;
7652   #endif
7653
7654   #ifndef STBI_NO_PIC
7655   if (stbi__pic_info(s, x, y, comp))  return 1;
7656   #endif
7657
7658   #ifndef STBI_NO_PNM
7659   if (stbi__pnm_info(s, x, y, comp))  return 1;
7660   #endif
7661
7662   #ifndef STBI_NO_HDR
7663   if (stbi__hdr_info(s, x, y, comp))  return 1;
7664   #endif
7665
7666   // test tga last because it's a crappy test!
7667   #ifndef STBI_NO_TGA
7668   if (stbi__tga_info(s, x, y, comp))
7669       return 1;
7670   #endif
7671   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
7672}
7673
7674static int stbi__is_16_main(stbi__context *s)
7675{
7676   #ifndef STBI_NO_PNG
7677   if (stbi__png_is16(s))  return 1;
7678   #endif
7679
7680   #ifndef STBI_NO_PSD
7681   if (stbi__psd_is16(s))  return 1;
7682   #endif
7683
7684   #ifndef STBI_NO_PNM
7685   if (stbi__pnm_is16(s))  return 1;
7686   #endif
7687   return 0;
7688}
7689
7690#ifndef STBI_NO_STDIO
7691STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
7692{
7693    FILE *f = stbi__fopen(filename, "rb");
7694    int result;
7695    if (!f) return stbi__err("can't fopen", "Unable to open file");
7696    result = stbi_info_from_file(f, x, y, comp);
7697    fclose(f);
7698    return result;
7699}
7700
7701STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
7702{
7703   int r;
7704   stbi__context s;
7705   long pos = ftell(f);
7706   stbi__start_file(&s, f);
7707   r = stbi__info_main(&s,x,y,comp);
7708   fseek(f,pos,SEEK_SET);
7709   return r;
7710}
7711
7712STBIDEF int stbi_is_16_bit(char const *filename)
7713{
7714    FILE *f = stbi__fopen(filename, "rb");
7715    int result;
7716    if (!f) return stbi__err("can't fopen", "Unable to open file");
7717    result = stbi_is_16_bit_from_file(f);
7718    fclose(f);
7719    return result;
7720}
7721
7722STBIDEF int stbi_is_16_bit_from_file(FILE *f)
7723{
7724   int r;
7725   stbi__context s;
7726   long pos = ftell(f);
7727   stbi__start_file(&s, f);
7728   r = stbi__is_16_main(&s);
7729   fseek(f,pos,SEEK_SET);
7730   return r;
7731}
7732#endif // !STBI_NO_STDIO
7733
7734STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
7735{
7736   stbi__context s;
7737   stbi__start_mem(&s,buffer,len);
7738   return stbi__info_main(&s,x,y,comp);
7739}
7740
7741STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
7742{
7743   stbi__context s;
7744   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7745   return stbi__info_main(&s,x,y,comp);
7746}
7747
7748STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
7749{
7750   stbi__context s;
7751   stbi__start_mem(&s,buffer,len);
7752   return stbi__is_16_main(&s);
7753}
7754
7755STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
7756{
7757   stbi__context s;
7758   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7759   return stbi__is_16_main(&s);
7760}
7761
7762#endif // STB_IMAGE_IMPLEMENTATION
7763
7764/*
7765   revision history:
7766      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
7767      2.19  (2018-02-11) fix warning
7768      2.18  (2018-01-30) fix warnings
7769      2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
7770                         1-bit BMP
7771                         *_is_16_bit api
7772                         avoid warnings
7773      2.16  (2017-07-23) all functions have 16-bit variants;
7774                         STBI_NO_STDIO works again;
7775                         compilation fixes;
7776                         fix rounding in unpremultiply;
7777                         optimize vertical flip;
7778                         disable raw_len validation;
7779                         documentation fixes
7780      2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
7781                         warning fixes; disable run-time SSE detection on gcc;
7782                         uniform handling of optional "return" values;
7783                         thread-safe initialization of zlib tables
7784      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
7785      2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
7786      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7787      2.11  (2016-04-02) allocate large structures on the stack
7788                         remove white matting for transparent PSD
7789                         fix reported channel count for PNG & BMP
7790                         re-enable SSE2 in non-gcc 64-bit
7791                         support RGB-formatted JPEG
7792                         read 16-bit PNGs (only as 8-bit)
7793      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7794      2.09  (2016-01-16) allow comments in PNM files
7795                         16-bit-per-pixel TGA (not bit-per-component)
7796                         info() for TGA could break due to .hdr handling
7797                         info() for BMP to shares code instead of sloppy parse
7798                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7799                         code cleanup
7800      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7801      2.07  (2015-09-13) fix compiler warnings
7802                         partial animated GIF support
7803                         limited 16-bpc PSD support
7804                         #ifdef unused functions
7805                         bug with < 92 byte PIC,PNM,HDR,TGA
7806      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
7807      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
7808      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7809      2.03  (2015-04-12) extra corruption checking (mmozeiko)
7810                         stbi_set_flip_vertically_on_load (nguillemot)
7811                         fix NEON support; fix mingw support
7812      2.02  (2015-01-19) fix incorrect assert, fix warning
7813      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7814      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7815      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7816                         progressive JPEG (stb)
7817                         PGM/PPM support (Ken Miller)
7818                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
7819                         GIF bugfix -- seemingly never worked
7820                         STBI_NO_*, STBI_ONLY_*
7821      1.48  (2014-12-14) fix incorrectly-named assert()
7822      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7823                         optimize PNG (ryg)
7824                         fix bug in interlaced PNG with user-specified channel count (stb)
7825      1.46  (2014-08-26)
7826              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7827      1.45  (2014-08-16)
7828              fix MSVC-ARM internal compiler error by wrapping malloc
7829      1.44  (2014-08-07)
7830              various warning fixes from Ronny Chevalier
7831      1.43  (2014-07-15)
7832              fix MSVC-only compiler problem in code changed in 1.42
7833      1.42  (2014-07-09)
7834              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7835              fixes to stbi__cleanup_jpeg path
7836              added STBI_ASSERT to avoid requiring assert.h
7837      1.41  (2014-06-25)
7838              fix search&replace from 1.36 that messed up comments/error messages
7839      1.40  (2014-06-22)
7840              fix gcc struct-initialization warning
7841      1.39  (2014-06-15)
7842              fix to TGA optimization when req_comp != number of components in TGA;
7843              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7844              add support for BMP version 5 (more ignored fields)
7845      1.38  (2014-06-06)
7846              suppress MSVC warnings on integer casts truncating values
7847              fix accidental rename of 'skip' field of I/O
7848      1.37  (2014-06-04)
7849              remove duplicate typedef
7850      1.36  (2014-06-03)
7851              convert to header file single-file library
7852              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7853      1.35  (2014-05-27)
7854              various warnings
7855              fix broken STBI_SIMD path
7856              fix bug where stbi_load_from_file no longer left file pointer in correct place
7857              fix broken non-easy path for 32-bit BMP (possibly never used)
7858              TGA optimization by Arseny Kapoulkine
7859      1.34  (unknown)
7860              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7861      1.33  (2011-07-14)
7862              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7863      1.32  (2011-07-13)
7864              support for "info" function for all supported filetypes (SpartanJ)
7865      1.31  (2011-06-20)
7866              a few more leak fixes, bug in PNG handling (SpartanJ)
7867      1.30  (2011-06-11)
7868              added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
7869              removed deprecated format-specific test/load functions
7870              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7871              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7872              fix inefficiency in decoding 32-bit BMP (David Woo)
7873      1.29  (2010-08-16)
7874              various warning fixes from Aurelien Pocheville
7875      1.28  (2010-08-01)
7876              fix bug in GIF palette transparency (SpartanJ)
7877      1.27  (2010-08-01)
7878              cast-to-stbi_uc to fix warnings
7879      1.26  (2010-07-24)
7880              fix bug in file buffering for PNG reported by SpartanJ
7881      1.25  (2010-07-17)
7882              refix trans_data warning (Won Chun)
7883      1.24  (2010-07-12)
7884              perf improvements reading from files on platforms with lock-heavy fgetc()
7885              minor perf improvements for jpeg
7886              deprecated type-specific functions so we'll get feedback if they're needed
7887              attempt to fix trans_data warning (Won Chun)
7888      1.23    fixed bug in iPhone support
7889      1.22  (2010-07-10)
7890              removed image *writing* support
7891              stbi_info support from Jetro Lauha
7892              GIF support from Jean-Marc Lienher
7893              iPhone PNG-extensions from James Brown
7894              warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7895      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
7896      1.20    added support for Softimage PIC, by Tom Seddon
7897      1.19    bug in interlaced PNG corruption check (found by ryg)
7898      1.18  (2008-08-02)
7899              fix a threading bug (local mutable static)
7900      1.17    support interlaced PNG
7901      1.16    major bugfix - stbi__convert_format converted one too many pixels
7902      1.15    initialize some fields for thread safety
7903      1.14    fix threadsafe conversion bug
7904              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7905      1.13    threadsafe
7906      1.12    const qualifiers in the API
7907      1.11    Support installable IDCT, colorspace conversion routines
7908      1.10    Fixes for 64-bit (don't use "unsigned long")
7909              optimized upsampling by Fabian "ryg" Giesen
7910      1.09    Fix format-conversion for PSD code (bad global variables!)
7911      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7912      1.07    attempt to fix C++ warning/errors again
7913      1.06    attempt to fix C++ warning/errors again
7914      1.05    fix TGA loading to return correct *comp and use good luminance calc
7915      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
7916      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7917      1.02    support for (subset of) HDR files, float interface for preferred access to them
7918      1.01    fix bug: possible bug in handling right-side up bmps... not sure
7919              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7920      1.00    interface to zlib that skips zlib header
7921      0.99    correct handling of alpha in palette
7922      0.98    TGA loader by lonesock; dynamically add loaders (untested)
7923      0.97    jpeg errors on too large a file; also catch another malloc failure
7924      0.96    fix detection of invalid v value - particleman@mollyrocket forum
7925      0.95    during header scan, seek to markers in case of padding
7926      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7927      0.93    handle jpegtran output; verbose errors
7928      0.92    read 4,8,16,24,32-bit BMP files of several formats
7929      0.91    output 24-bit Windows 3.0 BMP files
7930      0.90    fix a few more warnings; bump version number to approach 1.0
7931      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
7932      0.60    fix compiling as c++
7933      0.59    fix warnings: merge Dave Moore's -Wall fixes
7934      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
7935      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7936      0.56    fix bug: zlib uncompressed mode len vs. nlen
7937      0.55    fix bug: restart_interval not initialized to 0
7938      0.54    allow NULL for 'int *comp'
7939      0.53    fix bug in png 3->4; speedup png decoding
7940      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7941      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
7942              on 'test' only check type, not whether we support this variant
7943      0.50  (2006-11-19)
7944              first released version
7945*/
7946
7947
7948/*
7949------------------------------------------------------------------------------
7950This software is available under 2 licenses -- choose whichever you prefer.
7951------------------------------------------------------------------------------
7952ALTERNATIVE A - MIT License
7953Copyright (c) 2017 Sean Barrett
7954Permission is hereby granted, free of charge, to any person obtaining a copy of
7955this software and associated documentation files (the "Software"), to deal in
7956the Software without restriction, including without limitation the rights to
7957use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7958of the Software, and to permit persons to whom the Software is furnished to do
7959so, subject to the following conditions:
7960The above copyright notice and this permission notice shall be included in all
7961copies or substantial portions of the Software.
7962THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7963IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7964FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7965AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
7966LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
7967OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
7968SOFTWARE.
7969------------------------------------------------------------------------------
7970ALTERNATIVE B - Public Domain (www.unlicense.org)
7971This is free and unencumbered software released into the public domain.
7972Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
7973software, either in source code form or as a compiled binary, for any purpose,
7974commercial or non-commercial, and by any means.
7975In jurisdictions that recognize copyright laws, the author or authors of this
7976software dedicate any and all copyright interest in the software to the public
7977domain. We make this dedication for the benefit of the public at large and to
7978the detriment of our heirs and successors. We intend this dedication to be an
7979overt act of relinquishment in perpetuity of all present and future rights to
7980this software under copyright law.
7981THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7982IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7983FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7984AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
7985ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
7986WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
7987------------------------------------------------------------------------------
7988*/