type location content
TODO llama.cpp/.github/workflows/build.yml:1041
disabled for now, consider adding tests for all CPU variants instead
TODO llama.cpp/.github/workflows/build.yml:1079
Remove GGML_CUDA_CUB_3DOT2 flag once CCCL 3.2 is bundled within CTK and that CTK version is used in this project
TODO llama.cpp/.github/workflows/build.yml:1124
Remove GGML_CUDA_CUB_3DOT2 flag once CCCL 3.2 is bundled within CTK and that CTK version is used in this project
TODO llama.cpp/.github/workflows/build.yml:1168
add SSL support; we will also need to modify win-build-sycl.bat to accept user-specified args
FIXME llama.cpp/.github/workflows/build.yml:1392
test on devices
TODO llama.cpp/.github/workflows/build.yml:1461
simplify the following workflows using a matrix
TODO llama.cpp/.github/workflows/build.yml:1462
run lighter CI on PRs and the full CI only on master (if needed)
TODO llama.cpp/.github/workflows/release.yml:400
Remove GGML_CUDA_CUB_3DOT2 flag once CCCL 3.2 is bundled within CTK and that CTK version is used in this project
TODO llama.cpp/CMakeLists.txt:42
analyze performance impact, see https://spidermonkey.dev/blog/2025/01/15/is-memory64-actually-worth-using
TODO llama.cpp/CONTRIBUTING.md:149
abbreviations usage
TODO llama.cpp/CONTRIBUTING.md:153
add guidelines with examples and apply them to the codebase
TODO llama.cpp/ci/run.sh:55
Remove GGML_CUDA_CUB_3DOT2 flag once CCCL 3.2 is bundled within CTK and that CTK version is used in this project
TODO llama.cpp/ci/run.sh:334
this hangs for some reason ...
TODO llama.cpp/common/CMakeLists.txt:113
use list(APPEND LLAMA_COMMON_EXTRA_LIBS ...)
TODO llama.cpp/common/arg.cpp:178
detect this based on current console
TODO llama.cpp/common/arg.cpp:667
maybe convert enum llama_example to string
TODO llama.cpp/common/arg.cpp:867
support arg with 2 values
TODO llama.cpp/common/chat-parser-xml-toolcall.cpp:506
Delete this when json_partial adds top-level support for null/true/false
TODO llama.cpp/common/chat-parser-xml-toolcall.cpp:652
Note that form.allow_toolcall_in_think is not tested yet. If anyone confirms it works, this comment can be removed.
TODO llama.cpp/common/chat-parser.cpp:1401
Tool calling
TODO llama.cpp/common/chat-parser.h:22
rename to params
TODO llama.cpp/common/chat.cpp:133
these can become expensive for long messages - how to optimize?
TODO llama.cpp/common/chat.cpp:200
this is ugly, refactor it somehow
TODO llama.cpp/common/chat.cpp:812
do we need to merge, or replacing is fine?
TODO llama.cpp/common/chat.cpp:818
merge properly instead of overwriting (matching old behavior)
TODO llama.cpp/common/chat.cpp:836
improve this later
TODO llama.cpp/common/chat.cpp:2401
if (has_raw_python)
TODO llama.cpp/common/chat.cpp:3228
support that mix in handlers below.
TODO llama.cpp/common/chat.h:155
refactor this to "bool enable_thinking"
TODO llama.cpp/common/chat.h:179
refactor this to "bool parse_reasoning"
TODO llama.cpp/common/common.cpp:101
windows + arm64 + mingw64
TODO llama.cpp/common/common.cpp:381
windows + arm64 + mingw64
TODO llama.cpp/common/common.cpp:997
move to common/sampling
TODO llama.cpp/common/common.cpp:1117
fix naming
TODO llama.cpp/common/common.h:521
support threadpool
TODO llama.cpp/common/common.h:829
replace embd_norm with an enum
TODO llama.cpp/common/console.cpp:1013
maybe support multiline history entries?
TODO llama.cpp/common/download.cpp:360
maybe retry only on certain codes
TODO llama.cpp/common/download.cpp:436
use actual GET status?
TODO llama.cpp/common/download.cpp:747
cache the manifest response so that it appears in the model list
TODO llama.cpp/common/download.cpp:848
get GGUF size, not manifest size
TODO llama.cpp/common/jinja/lexer.cpp:213
handle lstrip/rstrip for comments? (not important for now)
FIXME llama.cpp/common/jinja/parser.cpp:424
tests can also be expressed like this: if x is eq 3
TODO llama.cpp/common/jinja/runtime.h:585
probably allow print value_none as "None" string? currently this breaks some templates
TODO llama.cpp/common/jinja/value.cpp:289
make sure this is the same behavior as Python's strftime
FIXME llama.cpp/common/jinja/value.cpp:575
Support non-specified delimiter (split on consecutive (no leading or trailing) whitespace)
FIXME llama.cpp/common/jinja/value.cpp:599
Support non-specified delimiter (split on consecutive (no leading or trailing) whitespace)
FIXME llama.cpp/common/jinja/value.cpp:916
sorting is currently always case sensitive
FIXME llama.cpp/common/jinja/value.cpp:1027
sorting is currently always case sensitive
TODO llama.cpp/common/jinja/value.cpp:1166
not sure if this is the right behavior
TODO llama.cpp/common/jinja/value.cpp:1220
avoid circular references
TODO llama.cpp/common/jinja/value.cpp:1307
avoid circular references
TODO llama.cpp/common/jinja/value.h:156
C++20 <=> operator
TODO llama.cpp/common/json-partial.cpp:311
handle more unclosed top-level primitives if the stack was empty but we got an error (e.g. "tru", "\"", etc...)
TODO llama.cpp/common/json-partial.h:3
use json_fwd.hpp when possible
TODO llama.cpp/common/json-schema-to-grammar.cpp:971
support minimum, maximum, exclusiveMinimum, exclusiveMaximum at least for zero
TODO llama.cpp/common/peg-parser.cpp:1357
Implement more comprehensive grammar generation for raw strings.
TODO llama.cpp/common/preset.cpp:85
maybe throw an error instead?
TODO llama.cpp/common/preset.h:31
maybe implement to_env() if needed
TODO llama.cpp/common/sampling.cpp:12
deduplicate with llama-impl.h
TODO llama.cpp/common/sampling.cpp:397
measure grammar performance
TODO llama.cpp/common/sampling.cpp:471
simplify
TODO llama.cpp/common/sampling.cpp:617
compute this from the vocab
TODO llama.cpp/common/sampling.h:32
measure grammar performance
TODO llama.cpp/common/speculative.cpp:125
track performance of most recent calls
TODO llama.cpp/common/speculative.cpp:171
optimize or pass from outside?
TODO llama.cpp/common/speculative.cpp:452
implement
TODO llama.cpp/common/speculative.cpp:735
noop
TODO llama.cpp/convert_hf_to_gguf.py:555
why do we squeeze here?
TODO llama.cpp/convert_hf_to_gguf.py:614
use Q4_K and Q6_K
TODO llama.cpp/convert_hf_to_gguf.py:854
Handle "sliding_attention" similarly when models start implementing it
TODO llama.cpp/convert_hf_to_gguf.py:966
should these be marked as UNUSED instead? (maybe not)
TODO llama.cpp/convert_hf_to_gguf.py:2357
how to determine special FIM tokens automatically?
TODO llama.cpp/convert_hf_to_gguf.py:2971
remove this once everyone has migrated to a newer version of llama.cpp
TODO llama.cpp/convert_hf_to_gguf.py:3184
multiply by the scale directly instead of inverting it twice
TODO llama.cpp/convert_hf_to_gguf.py:5481
this is a hack, should be fixed
TODO llama.cpp/convert_hf_to_gguf.py:6071
these special tokens should be exported only for the CodeGemma family
TODO llama.cpp/convert_hf_to_gguf.py:6575
implement self.prediction_coefs.weight.clamp_(...)
TODO llama.cpp/convert_hf_to_gguf.py:7073
does this really matter?
TODO llama.cpp/convert_hf_to_gguf.py:7997
mimo v2 does not indicate the number of next-token-prediction layers, therefore we cannot use the same approach as GLM4_MOE
TODO llama.cpp/convert_hf_to_gguf.py:9315
Extend this if the prefix(es) need to be configurable
TODO llama.cpp/convert_hf_to_gguf.py:9892
remove this once image support is implemented for Chameleon
TODO llama.cpp/convert_hf_to_gguf.py:10471
remove once MXFP4 is supported more generally
TODO llama.cpp/convert_hf_to_gguf.py:10941
remove this once everyone migrates to a newer version of llama.cpp
TODO llama.cpp/convert_hf_to_gguf.py:11526
uncomment U64, U32, and U16, ref: https://github.com/pytorch/pytorch/issues/58734
TODO llama.cpp/convert_hf_to_gguf_update.py:55
generate tokenizer tests for llama.cpp
TODO llama.cpp/convert_hf_to_gguf_update.py:81
this string has to exercise as much pre-tokenizer functionality as possible
TODO llama.cpp/convert_hf_to_gguf_update.py:85
add models here, base models preferred
TODO llama.cpp/convert_lora_to_gguf.py:64
add ellipsis in the type signature
TODO llama.cpp/convert_lora_to_gguf.py:99
make sure this is correct
TODO llama.cpp/convert_lora_to_gguf.py:167
support higher dimensional A shapes bigger than 1
TODO llama.cpp/convert_lora_to_gguf.py:173
compose the above two
TODO llama.cpp/examples/convert_legacy_llama.py:133
match this with `llama_ftype`
TODO llama.cpp/examples/convert_legacy_llama.py:134
rename to LLAMAFileType
TODO llama.cpp/examples/convert_legacy_llama.py:135
move to `gguf.py`
TODO llama.cpp/examples/convert_legacy_llama.py:209
verify this
TODO llama.cpp/examples/convert_legacy_llama.py:351
reuse (probably move to gguf.py?)
FIXME llama.cpp/examples/convert_legacy_llama.py:1266
Respect --vocab-dir?
TODO llama.cpp/examples/gguf-hash/deps/xxhash/xxhash.h:798
Update to the correct value when it's been specified.
TODO llama.cpp/examples/gguf-hash/deps/xxhash/xxhash.h:3920
IBM XL
FIXME llama.cpp/examples/gguf-hash/deps/xxhash/xxhash.h:4670
Clang's output is still _much_ faster -- On an AMD Ryzen 3600,
TODO llama.cpp/examples/json_schema_to_grammar.py:218
support "uri", "email" string formats
TODO llama.cpp/examples/json_schema_to_grammar.py:694
support minimum, maximum, exclusiveMinimum, exclusiveMaximum at least for zero
TODO llama.cpp/examples/parallel/parallel.cpp:507
print sampling/grammar timings for all clients
TODO llama.cpp/examples/pydantic_models_to_grammar.py:20
fix this
TODO llama.cpp/examples/retrieval/retrieval.cpp:8
remove me
TODO llama.cpp/examples/speculative-simple/speculative-simple.cpp:51
simplify this logic
TODO llama.cpp/examples/speculative/speculative.cpp:423
simplify
TODO llama.cpp/examples/speculative/speculative.cpp:629
print sampling/grammar timings for all drafts
TODO llama.cpp/ggml/CMakeLists.txt:90
mark all options as advanced when not GGML_STANDALONE
TODO llama.cpp/ggml/include/ggml-metal.h:42
remove in the future
TODO llama.cpp/ggml/include/ggml.h:190
support for clang
TODO llama.cpp/ggml/include/ggml.h:249
convert to enum https://github.com/ggml-org/llama.cpp/pull/16187#discussion_r2388538726
TODO llama.cpp/ggml/include/ggml.h:749
temporary until model loading of ggml examples is refactored
TODO llama.cpp/ggml/include/ggml.h:1550
when we start computing gradient, make a copy instead of view
TODO llama.cpp/ggml/include/ggml.h:1557
when we start computing gradient, make a copy instead of view
TODO llama.cpp/ggml/include/ggml.h:1570
when we start computing gradient, make a copy instead of view
TODO llama.cpp/ggml/include/ggml.h:1955
this is very likely wrong for some cases! - needs more testing
TODO llama.cpp/ggml/include/ggml.h:2346
needs to be adapted to ggml_flash_attn_ext
TODO llama.cpp/ggml/include/ggml.h:2459
currently only the lower, right, non-unitriangular variant is implemented
TODO llama.cpp/ggml/include/ggml.h:2723
currently, only a few functions are in the base ggml API, while the rest are in the CPU backend
TODO llama.cpp/ggml/src/CMakeLists.txt:78
should not be set globally
TODO llama.cpp/ggml/src/CMakeLists.txt:103
probably these flags need to be tweaked on some architectures
TODO llama.cpp/ggml/src/ggml-alloc.c:738
better way to add external dependencies
FIXME llama.cpp/ggml/src/ggml-backend-reg.cpp:163
backends cannot be safely unloaded without a function to destroy all the backend resources,
FIXME llama.cpp/ggml/src/ggml-backend.cpp:182
add a generic callback to the buffer interface
FIXME llama.cpp/ggml/src/ggml-backend.cpp:1199
count the number of inputs instead of only checking when full
TODO llama.cpp/ggml/src/ggml-backend.cpp:1567
add public function to facilitate this, since applications do not have direct access to the backend interface
TODO llama.cpp/ggml/src/ggml-backend.cpp:1609
pass backend to the callback, then the user can decide if they want to synchronize
FIXME llama.cpp/ggml/src/ggml-backend.cpp:1658
needs to be size*2 to account for leafs (do it in graph_split instead)
TODO llama.cpp/ggml/src/ggml-blas/ggml-blas.cpp:411
find the optimal value
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:1073
performance is low.
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:2264
check theta_scale_length and position_length.
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:2341
make acl_yarn_ramp_tensor use the rope cache.
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:2812
n_dims < ne0
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:2839
ne0 != n_dims in mode2
TODO llama.cpp/ggml/src/ggml-cann/aclnn_ops.h:883
If `ne12 > 1`, grouped multiplication and memory copying are used for efficiency.
TODO llama.cpp/ggml/src/ggml-cann/common.h:619
each stream should have a memory pool.
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:173
add more device info later.
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:1104
cann backend doesn't support quantized yet. Just leave the code
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:1208
need to handle tensors that have padding.
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:1229
refer to cann (#6017); it uses the thread's default stream.
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:1311
Support 310p P2P copy
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:1438
quantized type?
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2016
Support 310p P2P copy
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2040
this event is not effective with acl graph mode, change to use aclrtSynchronizeStream
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2096
support broadcast for ADD + RMS_NORM
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2205
Optimize here. Currently, we can only
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2354
support GGML_TYPE_BF16
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2369
Support rope_dim < ne00(dim)
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2441
add circular padding support for cann, see https://github.com/ggml-org/llama.cpp/pull/16985
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2474
support bias != 0.0f
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2476
support attention sinks [TAG_ATTN_SINKS]
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2498
support attention sinks [TAG_ATTN_SINKS]
TODO llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2507
padding to support
TODO llama.cpp/ggml/src/ggml-common.h:1087
fix name to kvalues_iq4_nl
TODO llama.cpp/ggml/src/ggml-cpu/CMakeLists.txt:503
Separation to determine activation of VX/VXE/VXE2
TODO llama.cpp/ggml/src/ggml-cpu/amx/amx.cpp:152
not sure if correct (https://github.com/ggml-org/llama.cpp/pull/16315)
TODO llama.cpp/ggml/src/ggml-cpu/amx/common.h:83
fix padding for vnni format
TODO llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp:510
this is reference impl!
TODO llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp:2426
performance improvement: merge quant A
TODO llama.cpp/ggml/src/ggml-cpu/arch/wasm/quants.c:382
check if unrolling this is better
TODO llama.cpp/ggml/src/ggml-cpu/arch/wasm/quants.c:475
check if unrolling this is better
FIXME llama.cpp/ggml/src/ggml-cpu/arch/x86/cpu-feats.cpp:264
this does not check for OS support
TODO llama.cpp/ggml/src/ggml-cpu/arch/x86/quants.c:1110
can _mm256_mulhi_epu16 be faster even if 16-bits?
TODO llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp:114
Use the 'traits' lookup table (for type conversion fns), instead of a mass of 'if' conditions with long templates
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h:170
double-check these work correctly
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h:521
move to ggml-threading
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:123
add support for explicit memory order
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:130
add support for explicit memory order
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:137
add support for explicit memory order
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:1206
this is a bit of a hack, we should probably have a better way to handle this
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:1263
extract to "extra_op"
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:1477
this is a bit of a hack, we should probably have a better way to handle this
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2159
Windows etc.
FIXME llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2280
get_rows can use additional threads, but the cost of launching additional threads
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2431
support > 64 CPUs
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2524
there seems to be no way to set lower prio on Apple platforms
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2547
this may not work on BSD, to be verified
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2893
this can become (n_tasks-1)
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2896
this can become (n_tasks-1)
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:2899
this can become (n_tasks-1)
FIXME llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp:136
deep copy
TODO llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp:665
move to ggml-base
FIXME llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp:303
this should check for __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1708
support for transposed / permuted tensors
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1712
maybe this is not optimal?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1752
support for transposed / permuted tensors
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1756
maybe this is not optimal?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1797
templateify the implementation and support for I64
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1832
support for transposed / permuted tensors
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1850
maybe this is not optimal?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1913
smarter multi-threading
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1956
smarter multi-threading
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:1999
smarter multi-threading
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:2042
smarter multi-threading
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:3729
optimize
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:3798
optimize
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:3970
optimize
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:4070
optimize
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:4410
add x parameter to ggml_vec_scale_f32 and remove this memcpy
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:5086
handle transposed/permuted matrices
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:5165
handle transposed/permuted matrices
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:5253
is this supposed to be ceil instead of floor?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:5378
handle transposed/permuted matrices
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:7713
optimize
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:8556
on ARM, native f16 should be faster
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:9214
transpose the output for smaller strides for big batches?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:9331
maybe unroll more?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:9419
what happens when (d_state % svcntw()) != 0?
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:9495
optimize / multi-thread
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:9562
optimize / multi-thread
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:10403
Write SVE code and RVV code
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:10658
handle transposed/permuted matrices
TODO llama.cpp/ggml/src/ggml-cpu/ops.cpp:10756
handle transposed/permuted matrices
TODO llama.cpp/ggml/src/ggml-cpu/quants.c:151
add WASM SIMD
TODO llama.cpp/ggml/src/ggml-cpu/repack.cpp:2379
this branch seems wrong
TODO llama.cpp/ggml/src/ggml-cpu/repack.cpp:2500
generalise.
TODO llama.cpp/ggml/src/ggml-cpu/repack.cpp:2541
needs to be revisited
TODO llama.cpp/ggml/src/ggml-cpu/repack.cpp:2805
General batched mul mat for 4D tensors
TODO llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:468
is this optimal?
TODO llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:568
is this optimal?
TODO llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:862
Does this work?
TODO llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:886
is this optimal?
TODO llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:978
is this optimal?
TODO llama.cpp/ggml/src/ggml-cpu/unary-ops.cpp:135
Use the 'traits' lookup table (for type conversion fns), instead of a mass of 'if' conditions with long templates
TODO llama.cpp/ggml/src/ggml-cpu/vec.cpp:475
optimize to process the remaining elements in groups using the smaller vector sizes from AVX2 and SSE
TODO llama.cpp/ggml/src/ggml-cpu/vec.h:609
Write SVE code
TODO llama.cpp/ggml/src/ggml-cpu/vec.h:672
Write SVE code
TODO llama.cpp/ggml/src/ggml-cpu/vec.h:950
optimize performance
TODO llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt:60
Remove once CCCL 3.2 has been released and bundled with CUDA Toolkit
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:190
might need to bail out if the HTP is stuck on something
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:205
handle errors
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:208
update profiling implementation, currently only works for opt_opsync mode
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:1825
support broadcast for ne[2 and 3]
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:1981
add support for non-contiguous tensors
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:2000
add support for non-contiguous tensors
FIXME llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:2047
add support for sinks
FIXME llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:2166
add support for GGML_TYPE_F16 for src0
TODO llama.cpp/ggml/src/ggml-hexagon/ggml-hexagon.cpp:2727
the current version might do incorrect reordering in cases where quantized src0
TODO llama.cpp/ggml/src/ggml-hexagon/htp/hex-dma.h:32
technically we don't need these and could use Q6_dmstart/wait/etc instead
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:930
might need to handle zero as a special case (see ggml-cpu code)
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:962
might need to handle zero as a special case (see ggml-cpu code)
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:1044
might need to handle zero as a special case (see ggml-cpu code)
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:1085
might need to handle zero as a special case (see ggml-cpu code)
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:1186
might need to handle zero as a special case (see ggml-cpu code)
FIXME llama.cpp/ggml/src/ggml-hexagon/htp/matmul-ops.c:1241
might need to handle zero as a special case (see ggml-cpu code)
TODO llama.cpp/ggml/src/ggml-hexagon/htp/rope-ops.c:334
use simd to speed up the remaining elements copy
TODO llama.cpp/ggml/src/ggml-hip/CMakeLists.txt:86
do not use CUDA definitions for HIP
TODO llama.cpp/ggml/src/ggml-impl.h:72
move to ggml.h? (won't be able to inline)
TODO llama.cpp/ggml/src/ggml-impl.h:603
Consider allowing GGML_OP_NONE nodes in between
FIXME llama.cpp/ggml/src/ggml-metal/CMakeLists.txt:103
only add to the ggml-metal target?
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-impl.h:9
for optimal performance, become function of the device and work size
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:56
this can be removed when the allocator starts filtering them earlier
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:632
make a simpler cpy_bytes kernel
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:1629
relax this constraint in the future
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:1816
helper function
TODO llama.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:1836
determine the optimal parameters based on grid utilization
TODO llama.cpp/ggml/src/ggml-musa/CMakeLists.txt:73
do not use CUDA definitions for MUSA
TODO llama.cpp/ggml/src/ggml-musa/CMakeLists.txt:107
mudnn has not provided static libraries yet
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:2918
initialize them for the non-SMALL_PATH path, or remove them.
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3268
add support
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3270
implement BF16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, IQ4_NL support (https://github.com/ggml-org/llama.cpp/pull/14661)
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3374
add circular padding support for opencl, see https://github.com/ggml-org/llama.cpp/pull/16985
FIXME llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3771
if any unexpected results are seen, double check the offset -
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3916
use preallocated images instead of sub-buffer then image
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:5146
find the optimal values for these
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:8515
remove duplicate definitions of image description + format -- move to top
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9052
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9091
add block_q4_0 variant.
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9110
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9147
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9209
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9245
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9282
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9319
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9358
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9396
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9428
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9466
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9499
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9655
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9699
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9735
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9879
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:9918
Unknown GPU
TODO llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:10258
Unknown GPU
TODO llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:481
currently the output_size is always known, do we need support for commands with variable output size?
TODO llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:789
cache the alloc responses to avoid extra RPC calls?
TODO llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:1932
obtain value from the server
TODO llama.cpp/ggml/src/ggml-rpc/ggml-rpc.cpp:1970
call the remote backend and cache the results
TODO llama.cpp/ggml/src/ggml-sycl/common.hpp:84
adapt to the hardware
TODO llama.cpp/ggml/src/ggml-sycl/common.hpp:87
currently, it's not used for XMX really.
TODO llama.cpp/ggml/src/ggml-sycl/convert.cpp:517
Downsample logic is separated from the kernel, a rewrite is desirable
TODO llama.cpp/ggml/src/ggml-sycl/getrows.cpp:180
Refactor and remove duplicates
TODO llama.cpp/ggml/src/ggml-sycl/getrows.cpp:211
k-quants
FIXME llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:863
do not crash if SYCL Buffer alloc fails
FIXME llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:1118
this is not thread safe
FIXME llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:1187
this is a hack to avoid having to implement a new buffer type
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:1202
return device.maxBufferLength
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:2561
check that src0->buffer->buft is a split buffer type, replace GGML_BACKEND_TYPE_GPU_SPLIT check
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:2965
see https://github.com/ggml-org/llama.cpp/pull/13155
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3211
accuracy issues in MMQ
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3536
Refactor and cleanup of mul mat dispatching.
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3913
more efficient implementation
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4459
update for the new
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4641
support GGML_TYPE_BF16
FIXME llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4642
keep a list of supported types to avoid breaking the backend when a new type is added
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4646
The configuration below needs more work to be supported with oneDNN
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4652
This specific configuration can fail with oneDNN and needs more debugging
TODO llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:4846
add circular padding support for sycl, see https://github.com/ggml-org/llama.cpp/pull/16985
TODO llama.cpp/ggml/src/ggml-sycl/softmax.cpp:67
non-contiguous inputs/outputs
TODO llama.cpp/ggml/src/ggml-sycl/sycl_hw.cpp:3
currently not used
TODO llama.cpp/ggml/src/ggml-sycl/sycl_hw.hpp:13
currently not used
TODO llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:3167
We're no longer benefitting from the async compiles (shaders are
TODO llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:5258
Use pointer or reference to avoid copy
TODO llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:6520
staging_offset is not used
TODO llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:14371
enable async and synchronize
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:486
error handling
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:723
handle multiple pipeline names
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2361
optional, needed?
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2365
optional, implement this
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2367
optional, think it coordinates with .init_tensor
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2453
for now, return maxBufferSize as both free and total memory
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2868
track need for these toggles: https://issues.chromium.org/issues/42251215
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:2965
Maybe WebGPU needs a "fast" mode where you can request compilers skip adding checks like these,
TODO llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:3150
support non-contiguous tensors, e.g. for MOE_EXPERT_REDUCE
TODO llama.cpp/ggml/src/ggml-zdnn/ggml-zdnn.cpp:22
implement support for quantized types
TODO llama.cpp/ggml/src/ggml-zdnn/ggml-zdnn.cpp:609
make thread-safe
TODO llama.cpp/ggml/src/ggml-zdnn/mmf.cpp:70
Remove in the future, as we currently convert DLF16 -> FP32 and then, in the next op, FP32 -> DLF16 again. Inefficient.
TODO llama.cpp/ggml/src/ggml-zdnn/utils.cpp:71
Consider adding a ggml check.
TODO llama.cpp/ggml/src/ggml-zdnn/utils.cpp:72
If tensor = 4D, use ZDNN_NCHW by default.
TODO llama.cpp/ggml/src/ggml-zdnn/utils.cpp:73
If tensor = 2D, use ZDNN_NHWC by default.
FIXME llama.cpp/ggml/src/ggml.c:10
required here for quantization functions
TODO llama.cpp/ggml/src/ggml.c:1723
this should not be needed as long as we don't rely on aligned SIMD loads
TODO llama.cpp/ggml/src/ggml.c:1992
support less-strict constraint
TODO llama.cpp/ggml/src/ggml.c:3788
implement non F32 return
TODO llama.cpp/ggml/src/ggml.c:3812
implement non F32 return
TODO llama.cpp/ggml/src/ggml.c:4320
when implementing backward, fix this:
TODO llama.cpp/ggml/src/ggml.c:4923
implement antialias for modes other than bilinear
TODO llama.cpp/ggml/src/ggml.c:5264
check if vT can be multiplied by (k*qT)
TODO llama.cpp/ggml/src/ggml.c:5341
adapt to ggml_flash_attn_ext() changes
TODO llama.cpp/ggml/src/ggml.c:5344
check if vT can be multiplied by (k*qT)
TODO llama.cpp/ggml/src/ggml.c:5417
maybe support other strides than 1?
TODO llama.cpp/ggml/src/ggml.c:6093
support other variants
TODO llama.cpp/ggml/src/ggml.c:6287
should probably be sum instead of mean
TODO llama.cpp/ggml/src/ggml.c:6799
this branch isn't accessible anymore, maybe move this to ggml_build_forward_expand
FIXME llama.cpp/ggml/src/ggml.c:7421
use ggml-backend to obtain the tensor data
TODO llama.cpp/gguf-py/gguf/constants.py:3666
add GGMLFileType from ggml_ftype in ggml.h
TODO llama.cpp/gguf-py/gguf/constants.py:3746
need help with 64-bit types in Python
FIXME llama.cpp/gguf-py/gguf/gguf_reader.py:73
When/if _get_field_parts() support multi-dimensional arrays, this must do so too
TODO llama.cpp/gguf-py/gguf/gguf_reader.py:205
add option to generate error on duplicate keys
FIXME llama.cpp/gguf-py/gguf/gguf_reader.py:243
Handle multi-dimensional arrays properly instead of flattening
TODO llama.cpp/gguf-py/gguf/gguf_writer.py:425
cleaner way to get the first key
TODO llama.cpp/gguf-py/gguf/lazy.py:49
make this even more comprehensive
TODO llama.cpp/gguf-py/gguf/lazy.py:101
dict and set
TODO llama.cpp/gguf-py/gguf/lazy.py:122
maybe handle tensors in kwargs too
TODO llama.cpp/gguf-py/gguf/lazy.py:228
__array_function__
TODO llama.cpp/gguf-py/gguf/metadata.py:72
load adapter_config.json when possible, it usually contains the base model of the LoRA adapter
TODO llama.cpp/gguf-py/gguf/metadata.py:325
should word-based size labels always be removed instead?
TODO llama.cpp/gguf-py/gguf/metadata.py:354
should the basename version always be excluded?
TODO llama.cpp/gguf-py/gguf/tensor_mapping.py:1210
these do not belong to block_mappings_cfg - move them to mappings_cfg
TODO llama.cpp/gguf-py/gguf/utility.py:87
handle request errors (maybe with limited retries?)
TODO llama.cpp/gguf-py/gguf/vocab.py:163
internally store as the new format instead of converting to old
FIXME llama.cpp/gguf-py/gguf/vocab.py:369
Verify that added tokens here _cannot_ overlap with the main vocab.
TODO llama.cpp/gguf-py/tests/test_metadata.py:110
hf suffix which could be ignored but isn't
TODO llama.cpp/gguf-py/tests/test_metadata.py:142
DPO in the name
TODO llama.cpp/gguf-py/tests/test_metadata.py:151
should "base" be a 'finetune' or 'size_label'?
TODO llama.cpp/gguf-py/tests/test_quants.py:107
is a column-wise sum of squares appropriate?
TODO llama.cpp/include/llama.h:57
show sample usage
TODO llama.cpp/include/llama.h:90
remove, required until per token attributes are available from GGUF file
TODO llama.cpp/include/llama.h:197
simplify (https://github.com/ggml-org/llama.cpp/pull/9294#pullrequestreview-2286561979)
TODO llama.cpp/include/llama.h:205
consider SoA
TODO llama.cpp/include/llama.h:239
rename this to "output"
TODO llama.cpp/include/llama.h:417
update API to start accepting pointers to params structs (https://github.com/ggml-org/llama.cpp/discussions/9172)
TODO llama.cpp/include/llama.h:532
rename to llama_get_pooling_type
TODO llama.cpp/include/llama.h:955
rename to avoid confusion with llama_get_embeddings()
TODO llama.cpp/include/llama.h:979
deprecate in favor of llama_get_logits_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522)
TODO llama.cpp/include/llama.h:994
deprecate in favor of llama_get_embeddings_ith() (ref: https://github.com/ggml-org/llama.cpp/pull/14853#issuecomment-3113143522)
TODO llama.cpp/include/llama.h:1469
extend in the future
TODO llama.cpp/scripts/check-requirements.sh:172
the check is failing for some reason:
TODO llama.cpp/src/llama-adapter.cpp:288
add support for norm vector
TODO llama.cpp/src/llama-adapter.cpp:297
a more general solution for non-CPU extra buft should be implemented in the future
TODO llama.cpp/src/llama-adapter.h:11
pimpl
TODO llama.cpp/src/llama-batch.h:32
whole_seqs for embeddings?
TODO llama.cpp/src/llama-batch.h:113
support embeddings if needed in the future
TODO llama.cpp/src/llama-batch.h:131
this is more of a temporary solution until we have a better way to handle multiple positions per token/embd
TODO llama.cpp/src/llama-context.cpp:105
start reading the actual value of mscale and handle the case where it is not 1.0f
TODO llama.cpp/src/llama-context.cpp:305
move these checks to ggml_backend_sched
TODO llama.cpp/src/llama-context.cpp:320
should we ignore ACCEL types too?
TODO llama.cpp/src/llama-context.cpp:436
instead of the tensor names, use a map to keep track of which (FA) tensors belong to which layer
FIXME llama.cpp/src/llama-context.cpp:444
fa_device_mismatch logic is wrong for --no-kv-offload, but this is broken anyways
TODO llama.cpp/src/llama-context.cpp:500
not sure if the following graph would be the worst case for multi-stream KV caches:
FIXME llama.cpp/src/llama-context.cpp:548
if multiple single tokens are evaluated without a synchronization,
TODO llama.cpp/src/llama-context.cpp:645
change the mctx->apply() to return information if a graph reserve is needed
TODO llama.cpp/src/llama-context.cpp:722
use output_resolve_row()
TODO llama.cpp/src/llama-context.cpp:773
use output_resolve_row()
TODO llama.cpp/src/llama-context.cpp:987
not sure yet if we want to reserve here
TODO llama.cpp/src/llama-context.cpp:1112
should we reserve?
TODO llama.cpp/src/llama-context.cpp:1203
add new split mode where we pad the input sequences so that ubatch.equal_seqs == true
TODO llama.cpp/src/llama-context.cpp:1213
this clear of the buffer can easily be forgotten - need something better
TODO llama.cpp/src/llama-context.cpp:1235
this is a tmp solution until we have a proper way to support enc-dec models
TODO llama.cpp/src/llama-context.cpp:1317
hacky solution
TODO llama.cpp/src/llama-context.cpp:1493
avoid this workaround in the future
TODO llama.cpp/src/llama-context.cpp:1543
this clear of the buffer can easily be forgotten - need something better
TODO llama.cpp/src/llama-context.cpp:1783
is there something more efficient which also minimizes swaps?
TODO llama.cpp/src/llama-context.cpp:1833
hacky enc-dec support
TODO llama.cpp/src/llama-context.cpp:1864
also consider shrinking the buffer
TODO llama.cpp/src/llama-context.cpp:1873
not needed?
TODO llama.cpp/src/llama-context.cpp:2039
not sure if needed, might simplify in the future by removing this
FIXME llama.cpp/src/llama-context.cpp:2144
fix in ggml_backend_sched
TODO llama.cpp/src/llama-context.cpp:2491
add more model-specific info which should prevent loading the session file if not identical
TODO llama.cpp/src/llama-context.cpp:2549
handle sampling buffers and samplers state?
TODO llama.cpp/src/llama-context.cpp:2574
add more info which needs to be identical but which is not verified otherwise
TODO llama.cpp/src/llama-context.cpp:2638
handle sampling buffers and samplers state?
TODO llama.cpp/src/llama-context.cpp:2831
handle this error
TODO llama.cpp/src/llama-context.cpp:2948
better default
TODO llama.cpp/src/llama-context.h:188
more flexible combinations of logical/physical batch size and context size
TODO llama.cpp/src/llama-context.h:251
read/write lora adapters and cvec
TODO llama.cpp/src/llama-context.h:268
tmp for handling cross-attention - need something better probably
TODO llama.cpp/src/llama-grammar.h:71
remove, needed for tests atm
TODO llama.cpp/src/llama-grammar.h:133
shared ptr
TODO llama.cpp/src/llama-grammar.h:178
move the API below as member functions of llama_grammar
TODO llama.cpp/src/llama-graph.cpp:99
use ubatch->n_seqs instead of failing
TODO llama.cpp/src/llama-graph.cpp:404
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:453
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:456
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:474
use ubatch->n_seqs instead of failing
TODO llama.cpp/src/llama-graph.cpp:522
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:538
Hybrid input classes are a bit redundant.
TODO llama.cpp/src/llama-graph.cpp:626
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:635
need to move this to the unified cache and check there
TODO llama.cpp/src/llama-graph.cpp:1266
Use scalar div instead when/if implemented
TODO llama.cpp/src/llama-graph.cpp:1376
move to hparams?
TODO llama.cpp/src/llama-graph.cpp:1392
add support for gated squared relu
TODO llama.cpp/src/llama-graph.cpp:1612
needs more work to be correct, for now just use the tensor shape
TODO llama.cpp/src/llama-graph.cpp:1856
if ubatch.equal_seqs() == true, we can split the three tensors below into ubatch.n_seqs_unq streams
TODO llama.cpp/src/llama-graph.cpp:1927
remove
TODO llama.cpp/src/llama-graph.cpp:2183
maybe separate the inner implementation into a separate function
TODO llama.cpp/src/llama-graph.cpp:2588
Call llama_sampler_accept_ggml after all samplers have been applied.
TODO llama.cpp/src/llama-graph.h:58
tmp - need something better to pass the data from the encoder to the decoder
TODO llama.cpp/src/llama-graph.h:61
this needs more work to be correct, for now copy the embeddings data to host memory
TODO llama.cpp/src/llama-graph.h:738
needed by build_attn_mha, figure out a way to remove?
TODO llama.cpp/src/llama-graph.h:897
remove
TODO llama.cpp/src/llama-graph.h:951
move this implementation to llama_memory_recurrent.
TODO llama.cpp/src/llama-graph.h:1020
better name
TODO llama.cpp/src/llama-hparams.cpp:149
maybe support other convolution strides than 1
TODO llama.cpp/src/llama-hparams.h:292
think of a better place for this function
TODO llama.cpp/src/llama-hparams.h:293
pack the SWA params in a struct?
TODO llama.cpp/src/llama-impl.h:64
rename to llama_format ?
TODO llama.cpp/src/llama-kv-cache-iswa.cpp:206
if we fail again, we should attempt different splitting strategies
TODO llama.cpp/src/llama-kv-cache.cpp:1086
add ggml helper function for this?
TODO llama.cpp/src/llama-kv-cache.cpp:1485
support multiple streams
TODO llama.cpp/src/llama-kv-cache.cpp:1489
use ubatch->n_seqs instead of failing
TODO llama.cpp/src/llama-kv-cache.cpp:1760
we also need to save llama_kv_cell_ext when apply_ubatch() support loading it
TODO llama.cpp/src/llama-kv-cache.cpp:1912
we cannot restore llama_kv_cell_ext yet, as apply_ubatch() does not support it
TODO llama.cpp/src/llama-kv-cells.h:31
add unit tests
TODO llama.cpp/src/llama-memory-hybrid-iswa.cpp:76
non-sequential equal split can be done if using unified KV cache
TODO llama.cpp/src/llama-memory-hybrid-iswa.cpp:95
will the recurrent cache be in an undefined context at this point?
TODO llama.cpp/src/llama-memory-hybrid.cpp:76
non-sequential equal split can be done if using unified KV cache
TODO llama.cpp/src/llama-memory-hybrid.cpp:95
will the recurrent cache be in an undefined context at this point?
TODO llama.cpp/src/llama-memory-recurrent.cpp:390
non-sequential equal split can be done if using unified KV cache
TODO llama.cpp/src/llama-memory-recurrent.cpp:430
optimize
TODO llama.cpp/src/llama-memory-recurrent.cpp:482
would it be possible to resize the cache instead?
TODO llama.cpp/src/llama-memory-recurrent.cpp:623
bake-in src refcounts in the cell metadata
TODO llama.cpp/src/llama-memory-recurrent.cpp:931
llama_memory_recurrent should have a notion of max sequences
TODO llama.cpp/src/llama-memory-recurrent.h:15
extract the cache state used for graph computation into llama_memory_recurrent_context_i
TODO llama.cpp/src/llama-memory-recurrent.h:78
optimize for recurrent state needs
TODO llama.cpp/src/llama-memory-recurrent.h:178
extract all the state like `head` and `n` here
TODO llama.cpp/src/llama-mmap.cpp:43
consider moving to llama-impl.h if needed in more places
TODO llama.cpp/src/llama-model-loader.cpp:496
this is not very clever - figure out something better
TODO llama.cpp/src/llama-model-loader.cpp:659
make optional
TODO llama.cpp/src/llama-model-saver.cpp:202
implement split file support
TODO llama.cpp/src/llama-model-saver.cpp:247
implement LoRA support
TODO llama.cpp/src/llama-model.cpp:593
Handle SWA metadata similarly when models start implementing it
TODO llama.cpp/src/llama-model.cpp:853
become GGUF KV parameter
TODO llama.cpp/src/llama-model.cpp:876
become GGUF KV parameter
TODO llama.cpp/src/llama-model.cpp:995
become GGUF KV parameter
TODO llama.cpp/src/llama-model.cpp:1200
fix conversion scripts to correctly populate `n_swa` and `n_swa_pattern`
TODO llama.cpp/src/llama-model.cpp:1515
Jamba layers are a bit heterogeneous, so naming this is hard.
TODO llama.cpp/src/llama-model.cpp:1815
when MTP is implemented, this should probably be updated if needed
TODO llama.cpp/src/llama-model.cpp:1883
add variants
TODO llama.cpp/src/llama-model.cpp:2157
when MTP is implemented, this should probably be updated if needed
TODO llama.cpp/src/llama-model.cpp:2488
maybe add n_attn_temp_floor_scale as a separate KV?
TODO llama.cpp/src/llama-model.cpp:2893
move to a separate function
FIXME llama.cpp/src/llama-model.cpp:7464
workaround for CPU backend buft having a NULL device
TODO llama.cpp/src/llama-model.cpp:8572
move reranking logic here and generalize
TODO llama.cpp/src/llama-model.h:546
move this to new llm_arch_model_i interface
TODO llama.cpp/src/llama-model.h:549
move this to new llm_arch_model_i interface
TODO llama.cpp/src/llama-model.h:562
remove
TODO llama.cpp/src/llama-quant.cpp:181
avoid hardcoded tensor names - use the TN_* constants
TODO llama.cpp/src/llama-quant.cpp:313
explore better strategies
TODO llama.cpp/src/llama-quant.cpp:320
explore better strategies
TODO llama.cpp/src/llama-quant.cpp:589
use LLM_KV
TODO llama.cpp/src/llama-quant.cpp:590
use LLM_KV
TODO llama.cpp/src/llama-quant.cpp:654
avoid hardcoded tensor names - use the TN_* constants
TODO llama.cpp/src/llama-quant.cpp:867
use a symmetric type instead
TODO llama.cpp/src/llama-quant.cpp:985
temporary sanity check that the F16 -> MXFP4 is lossless
TODO llama.cpp/src/llama-sampler.cpp:2548
remove trigger_words support.
TODO llama.cpp/src/llama-vocab.cpp:246
there are a lot of common parts between spm and bpe tokenizers, should be refactored and reused
TODO llama.cpp/src/llama-vocab.cpp:730
reduce string copies by using cpts_offs array
TODO llama.cpp/src/llama-vocab.cpp:1578
should we set all of these to LLAMA_TOKEN_NULL?
TODO llama.cpp/src/llama-vocab.cpp:2131
remove, required until per token attributes are available from GGUF file
TODO llama.cpp/src/llama-vocab.cpp:2230
convert scripts should provide these tokens through the KV metadata LLM_KV_TOKENIZER_...
TODO llama.cpp/src/llama-vocab.cpp:2497
workaround for o200k_harmony and solar-open tokenizer: the "<|end|>" token should not be EOG
TODO llama.cpp/src/llama-vocab.cpp:2574
Extract attributes from GGUF file.
TODO llama.cpp/src/llama-vocab.cpp:3271
where do these characters come from?
FIXME llama.cpp/src/models/bitnet.cpp:153
do not use model.tok_embd directly, duplicate as model.output
TODO llama.cpp/src/models/chameleon.cpp:161
this suppresses the output of image tokens, which is required to enable text-only outputs.
TODO llama.cpp/src/models/gemma3.cpp:19
is causal == true correct? might need some changes
TODO llama.cpp/src/models/gemma3n-iswa.cpp:22
is causal == true correct? might need some changes
TODO llama.cpp/src/models/gemma3n-iswa.cpp:209
move this to right after the last KV layer
TODO llama.cpp/src/models/gemma3n-iswa.cpp:261
verify if this is the correct behavior in transformers implementation
TODO llama.cpp/src/models/graph-context-mamba.cpp:131
skip computing output earlier for unused tokens
TODO llama.cpp/src/models/graph-context-mamba.cpp:244
use semistructured matrices to implement state-space duality
TODO llama.cpp/src/models/graph-context-mamba.cpp:260
skip computing output earlier for unused tokens
TODO llama.cpp/src/models/grovemoe.cpp:100
Only do the expert selection and weights once
TODO llama.cpp/src/models/kimi-linear.cpp:428
can this ever be false?
TODO llama.cpp/src/models/minicpm3.cpp:4
if the model varies, these parameters need to be read from the model
TODO llama.cpp/src/models/minicpm3.cpp:145
is this correct?
TODO llama.cpp/src/models/models.h:6
remove in follow-up PR - move to .cpp files
TODO llama.cpp/src/unicode.h:7
reimplement this structure in endian-independent way
TODO llama.cpp/tests/CMakeLists.txt:156
disabled on loongarch64 because the ggml-ci node lacks Python 3.8
TODO llama.cpp/tests/CMakeLists.txt:171
disabled due to slowness
TODO llama.cpp/tests/CMakeLists.txt:232
repair known memory leaks
TODO llama.cpp/tests/test-backend-ops.cpp:2289
Make a template or something
TODO llama.cpp/tests/test-backend-ops.cpp:3132
implement
TODO llama.cpp/tests/test-backend-ops.cpp:4621
add test with a non-contiguous view as input ; this case is needed for build_rope_2d in clip.cpp
TODO llama.cpp/tests/test-backend-ops.cpp:6145
this branch should become a separate test case parameter instead of hardcoding this for these head shapes
TODO llama.cpp/tests/test-backend-ops.cpp:6965
implement for all backends
TODO llama.cpp/tests/test-backend-ops.cpp:6977
or "other"
TODO llama.cpp/tests/test-backend-ops.cpp:6988
implement for all backends
TODO llama.cpp/tests/test-backend-ops.cpp:7486
add after WebGPU is fixed
TODO llama.cpp/tests/test-backend-ops.cpp:8908
better value for n_threads
TODO llama.cpp/tests/test-backend-sampler.cpp:734
biasing too much here makes the Vulkan sampling fail - should be investigated further
TODO llama.cpp/tests/test-chat-template.cpp:625
llama_chat_format_single will be deprecated, remove these tests later
TODO llama.cpp/tests/test-chat.cpp:121
extract to common helper (copied from test-grammar-integration.cpp)
TODO llama.cpp/tests/test-grammar-integration.cpp:1414
The following line should fail, but currently it passes. `exclusiveMinimum` is not supported, as it would likely be too difficult to implement.
TODO llama.cpp/tests/test-grammar-integration.cpp:1421
The following line should fail, but currently it passes. `uniqueItems` is not supported, as it would likely be too difficult to implement.
TODO llama.cpp/tests/test-grammar-llguidance.cpp:1083
The following line should fail, but currently it passes. `uniqueItems` is not supported, as it would likely be too difficult to implement.
TODO llama.cpp/tests/test-grammar-parser.cpp:7
should not include libllama sources
TODO llama.cpp/tests/test-json-partial.cpp:153
detect that the true/false/null literal was complete
FIXME llama.cpp/tests/test-quantize-fns.cpp:63
why is this done twice?
TODO llama.cpp/tests/test-regex-partial.cpp:265
((?:b)?a*+).* ??
TODO llama.cpp/tools/cli/cli.cpp:68
show progress
TODO llama.cpp/tools/cli/cli.cpp:75
reduce some copies here in the future
TODO llama.cpp/tools/cli/cli.cpp:152
support remote files in the future (http, https, etc)
TODO llama.cpp/tools/cli/cli.cpp:198
maybe support it later?
TODO llama.cpp/tools/cli/cli.cpp:212
avoid using atexit() here by making `console` a singleton
TODO llama.cpp/tools/completion/completion.cpp:916
one inconvenience of the current chat template implementation is that we can't distinguish between user input and special tokens (prefix/postfix)
TODO llama.cpp/tools/cvector-generator/cvector-generator.cpp:211
get rid of malloc if possible
TODO llama.cpp/tools/cvector-generator/cvector-generator.cpp:241
get rid of this malloc if possible
TODO llama.cpp/tools/cvector-generator/cvector-generator.cpp:287
customize padding token
TODO llama.cpp/tools/cvector-generator/pca.hpp:72
enable Metal support when support for GGML_OP_SQRT is added
TODO llama.cpp/tools/cvector-generator/pca.hpp:139
buf_size must be able to scale with params.n_batch
TODO llama.cpp/tools/export-lora/export-lora.cpp:193
remove this when we can support merging subset of adapters. Ref: https://github.com/ggml-org/llama.cpp/pull/8607#discussion_r1686027777
TODO llama.cpp/tools/export-lora/export-lora.cpp:303
add support for quantized lora
TODO llama.cpp/tools/gguf-split/gguf-split.cpp:350
detect OS and use copy_file_range() here for better performance
TODO llama.cpp/tools/imatrix/imatrix.cpp:678
extract into its own method; this is also used by the GGUF-based format
TODO llama.cpp/tools/imatrix/imatrix.cpp:814
extract into its own method; this is also used by the legacy format
TODO llama.cpp/tools/imatrix/imatrix.cpp:1006
only get outputs when (params.process_output || params.compute_ppl)
TODO llama.cpp/tools/mtmd/clip-graph.h:100
there was a more efficient implementation which relies on ggml_view and ggml_rope_ext_inplace, but the rope inplace does not work well with non-contiguous tensors; we should fix that and revert back to the original implementation in https://github.com/ggml-org/llama.cpp/pull/13065
TODO llama.cpp/tools/mtmd/clip-impl.h:204
improve this later
TODO llama.cpp/tools/mtmd/clip-model.h:99
support warmup size for custom token numbers
TODO llama.cpp/tools/mtmd/clip-model.h:239
rename it to fc (fully connected layer)
TODO llama.cpp/tools/mtmd/clip.cpp:345
q/k norm requires row size == n_embd, while here it's d_head
TODO llama.cpp/tools/mtmd/clip.cpp:646
there was a more efficient implementation which relies on ggml_view and ggml_rope_ext_inplace, but the rope inplace does not work well with non-contiguous tensors; we should fix that and revert back to the original implementation in https://github.com/ggml-org/llama.cpp/pull/13065
TODO llama.cpp/tools/mtmd/clip.cpp:1131
verify the image_min_tokens
TODO llama.cpp/tools/mtmd/clip.cpp:1142
check kimivl preprocessor for exact values
TODO llama.cpp/tools/mtmd/clip.cpp:1464
this is a hack to support Yi-type llava
TODO llama.cpp/tools/mtmd/clip.cpp:2143
we don't support audio for Gemma 3N, but GGUF contains audio tensors
TODO llama.cpp/tools/mtmd/clip.cpp:2269
define the behavior for add_padding = false
TODO llama.cpp/tools/mtmd/clip.cpp:2631
this is only used by minicpmv, maybe remove it
TODO llama.cpp/tools/mtmd/clip.cpp:3994
remove this function
TODO llama.cpp/tools/mtmd/clip.cpp:4002
remove this function
TODO llama.cpp/tools/mtmd/clip.h:61
should be enum, not string
TODO llama.cpp/tools/mtmd/mtmd-audio.cpp:381
Handle short audio differently or return error
TODO llama.cpp/tools/mtmd/mtmd-audio.cpp:400
probably unnecessary here? (or better doing it in g_cache?)
TODO llama.cpp/tools/mtmd/mtmd-audio.cpp:412
handle these checks better
TODO llama.cpp/tools/mtmd/mtmd-audio.cpp:520
maybe handle this better
TODO llama.cpp/tools/mtmd/mtmd-cli.cpp:84
support for --system-prompt with /clear command
TODO llama.cpp/tools/mtmd/mtmd.cpp:702
maybe support batching, but this may come with memory cost
TODO llama.cpp/tools/mtmd/mtmd.h:187
deprecate
TODO llama.cpp/tools/mtmd/mtmd.h:190
deprecate
TODO llama.cpp/tools/mtmd/mtmd.h:192
deprecate
TODO llama.cpp/tools/mtmd/mtmd.h:217
deprecate
TODO llama.cpp/tools/perplexity/perplexity.cpp:869
this could be made smaller; it's currently the worst-case size
TODO llama.cpp/tools/perplexity/perplexity.cpp:905
don't evaluate the last token of each sequence
TODO llama.cpp/tools/perplexity/perplexity.cpp:1145
the last token of each of the sequences doesn't need to be evaluated
TODO llama.cpp/tools/perplexity/perplexity.cpp:1167
this could be made smaller; it's currently the worst-case size
TODO llama.cpp/tools/perplexity/perplexity.cpp:1199
end before the last token, no need to predict past the end of the sequences
FIXME llama.cpp/tools/perplexity/perplexity.cpp:1244
this uses the wrong first logits when not skipping the choice word
TODO llama.cpp/tools/perplexity/perplexity.cpp:1575
don't evaluate the last token of each sequence
TODO llama.cpp/tools/quantize/quantize.cpp:75
share with imatrix.cpp
TODO llama.cpp/tools/quantize/quantize.cpp:587
list multiple datasets when there are more than one
TODO llama.cpp/tools/server/server-common.cpp:157
use the base64::decode from base64.hpp
TODO llama.cpp/tools/server/server-common.cpp:939
add audio_url support by reusing handle_media()
TODO llama.cpp/tools/server/server-common.cpp:1003
test this properly
TODO llama.cpp/tools/server/server-common.cpp:1049
The response format of this option is not yet OAI-compatible, but it seems like no one is really using it; we may need to fix it in the future
TODO llama.cpp/tools/server/server-common.cpp:1705
reuse llama_detokenize
TODO llama.cpp/tools/server/server-common.cpp:1847
optimize this block by reducing memory allocations and movement
TODO llama.cpp/tools/server/server-common.cpp:1868
make project name an input
TODO llama.cpp/tools/server/server-common.cpp:1897
current filename
TODO llama.cpp/tools/server/server-common.cpp:1904
configurable?
TODO llama.cpp/tools/server/server-common.h:152
server_tokens should be copyable - remove this:
TODO llama.cpp/tools/server/server-common.h:303
move it to server-task.cpp
TODO llama.cpp/tools/server/server-common.h:310
move it to server-task.cpp
TODO llama.cpp/tools/server/server-common.h:346
move these to server-task.cpp
TODO llama.cpp/tools/server/server-context.cpp:51
change to unique_ptrs for consistency:
TODO llama.cpp/tools/server/server-context.cpp:59
move members that belong to the task (such as `generated_text`, `has_new_line`) to task_results_state
TODO llama.cpp/tools/server/server-context.cpp:997
mtmd does not support prompt cache
TODO llama.cpp/tools/server/server-context.cpp:1021
improve logic
TODO llama.cpp/tools/server/server-context.cpp:1087
This will error out if a user requests two aloras, but only
TODO llama.cpp/tools/server/server-context.cpp:1154
speculative decoding requires multiple samples per batch - not supported yet
TODO llama.cpp/tools/server/server-context.cpp:1157
getting post/pre sampling logits is not yet supported with backend sampling
TODO llama.cpp/tools/server/server-context.cpp:1160
tmp until backend sampling is fully implemented
TODO llama.cpp/tools/server/server-context.cpp:1256
improve by not doing it more than once for each new line
TODO llama.cpp/tools/server/server-context.cpp:1339
optimize this with min-p optimization
TODO llama.cpp/tools/server/server-context.cpp:1957
simplify and improve
TODO llama.cpp/tools/server/server-context.cpp:2042
rework to have a single draft llama_context shared across all slots [TAG_SERVER_SPEC_REWORK]
TODO llama.cpp/tools/server/server-context.cpp:2127
maybe move branch to outside of this loop in the future
TODO llama.cpp/tools/server/server-context.cpp:2164
support memory-less logits computation
TODO llama.cpp/tools/server/server-context.cpp:2337
support can be added in the future when corresponding vision models get released
TODO llama.cpp/tools/server/server-context.cpp:2476
try to make this conditional on the context or the memory module, instead of the model type
TODO llama.cpp/tools/server/server-context.cpp:2627
try to terminate only the largest active slot/sequence and continue with the rest
TODO llama.cpp/tools/server/server-context.cpp:2637
update slot state based on llama_memory_seq_pos_min() and llama_memory_seq_pos_max()
TODO llama.cpp/tools/server/server-context.cpp:2641
handle ret == 2 (abort) when we start aborting
TODO llama.cpp/tools/server/server-context.cpp:2768
set it here instead of doing inside populate_token_probs
TODO llama.cpp/tools/server/server-context.cpp:2826
set result.probs
TODO llama.cpp/tools/server/server-context.cpp:2963
this log can become very long, put it behind a flag or think about a more compact format
TODO llama.cpp/tools/server/server-context.cpp:2977
this is inaccurate due to child tasks
TODO llama.cpp/tools/server/server-context.cpp:3223
get rid of this dynamic_cast
TODO llama.cpp/tools/server/server-context.cpp:3328
get rid of this dynamic_cast
TODO llama.cpp/tools/server/server-context.cpp:3531
this could maybe be multimodal.
TODO llama.cpp/tools/server/server-http.cpp:357
maybe handle sink.write unsuccessful? for now, we rely on is_connection_closed()
TODO llama.cpp/tools/server/server-http.h:23
move this to a virtual function once we have proper polymorphism support
TODO llama.cpp/tools/server/server-models.cpp:7
remove this once we use HTTP client from download.h
TODO llama.cpp/tools/server/server-models.cpp:153
maybe validate preset before rendering?
TODO llama.cpp/tools/server/server-models.cpp:196
allow refreshing cached model list
TODO llama.cpp/tools/server/server-models.cpp:800
add support for this on web UI
TODO llama.cpp/tools/server/server-models.cpp:886
add other fields, may require reading GGUF metadata
TODO llama.cpp/tools/server/server-models.h:24
also add downloading state when the logic is added
TODO llama.cpp/tools/server/server-task.cpp:65
deduplicate?
TODO llama.cpp/tools/server/server-task.cpp:123
deduplicate?
TODO llama.cpp/tools/server/server-task.cpp:213
implement
TODO llama.cpp/tools/server/server-task.cpp:279
add more sanity checks for the input parameters
TODO llama.cpp/tools/server/server-task.cpp:413
we may want to throw errors here, in case "el" is incorrect
TODO llama.cpp/tools/server/server-task.cpp:1902
for some reason we can't copy server_tokens, so we have to do this workaround
TODO llama.cpp/tools/server/server-task.h:11
prevent including the whole server-common.h as we only use server_tokens
TODO llama.cpp/tools/server/server-task.h:31
change this to more generic "response_format" to replace the "format_response_*" in server-common
TODO llama.cpp/tools/server/server-task.h:63
implement
TODO llama.cpp/tools/server/server-task.h:500
somehow reuse server_metrics in the future, instead of duplicating the fields
TODO llama.cpp/tools/server/server.cpp:268
refactor in common/console
TODO llama.cpp/tools/server/tests/unit/test_chat_completion.py:254
should not be a valid case
TODO llama.cpp/tools/server/tests/unit/test_completion.py:163
remove this once test_cache_vs_nocache_prompt is fixed
TODO llama.cpp/tools/server/tests/unit/test_completion.py:181
remove this once test_cache_vs_nocache_prompt is fixed
TODO llama.cpp/tools/server/tests/unit/test_completion.py:201
remove this once test_cache_vs_nocache_prompt is fixed
FIXME llama.cpp/tools/server/tests/unit/test_completion.py:369
the result is not deterministic when using other slot than slot 0
TODO llama.cpp/tools/server/tests/unit/test_lora.py:59
remove this once test_cache_vs_nocache_prompt is fixed
TODO llama.cpp/tools/server/tests/unit/test_lora.py:82
find & add other lora adapters for this model
TODO llama.cpp/tools/server/tests/unit/test_lora.py:108
remove this once test_cache_vs_nocache_prompt is fixed
TODO llama.cpp/tools/server/tests/unit/test_tool_call.py:422
fix these (wrong results, either didn't respect decimal instruction or got wrong value)
TODO llama.cpp/tools/server/webui/src/lib/stores/models.svelte.ts:458
Remove this polling once llama-server properly waits for the operation
TODO llama.cpp/tools/tokenize/tokenize.cpp:2
start using log.h
TODO llama.cpp/tools/tokenize/tokenize.cpp:10
remove me
TODO llama.cpp/tools/tokenize/tokenize.cpp:78
potential opportunity to roll common stuff into common/console.cpp
TODO llama.cpp/tools/tokenize/tokenize.cpp:180
reporting invalid_utf8 would be useful on non-Windows too.
TODO llama.cpp/tools/tts/convert_pt_to_hf.py:4
this script is LLM-generated and probably very inefficient and should be rewritten
TODO llama.cpp/tools/tts/tts-outetts.py:148
load from json
TODO llama.cpp/tools/tts/tts-outetts.py:181
tokenization is slow for some reason - here is pre-tokenized input
TODO llama.cpp/tools/tts/tts.cpp:200
not optimized at all
TODO llama.cpp/tools/tts/tts.cpp:273
can be done once
TODO llama.cpp/tools/tts/tts.cpp:1022
all logits?
TODO nonstd.h:76
%s\n", __FILE__, __LINE__, message); \
TODO termbox2.h:2416
Assert global.back.(width,height) == global.front.(width,height)
TODO termbox2.h:2540
iswprint ch?
TODO termbox2.h:2662
\r, \t, \v, \f, etc?
TODO termbox2.h:2948
Reorder TB_CAP_* so more critical caps come first.
TODO termbox2.h:3497
Harden against errors encountered mid-resize
TODO termbox2.h:4048
iswprint ch?