mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2025-01-13 13:52:22 +01:00
5b8023d935
This change uses a custom malloc() implementation to transactionally capture, in a file, the dynamic memory created during the loading process. That includes (1) the malloc() allocation for mem_buffer and (2) all the C++ STL objects. On my $1000 personal computer, this change lets me run ./main to generate a single token (-n 1) using the float16 7B model (~12 GB in size) in one second. To do that, there's a one-time cost where a 13 GB file needs to be generated. This change rocks, but it shouldn't be necessary to do something this heroic. We should instead change the file format so that tensors don't need reshaping and realignment in order to be loaded.
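As a rough illustration of the idea in the commit message, the sketch below shows a bump allocator whose memory is a shared file mapping, so everything allocated during a load ends up in a file, and a commit flag marks the load as complete. This is not the code from the commit: the names `struct arena`, `arena_open`, `arena_alloc`, `arena_commit`, and the 1 MiB `ARENA_SIZE` are hypothetical, and the sketch assumes a POSIX system. A real version would also have to map the file at a stable address so that pointers stored inside it stay valid across runs, which this sketch does not attempt.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical arena layout: a small header followed by bump-allocated bytes. */
struct arena {
    uint64_t committed;  /* nonzero once a load finished successfully */
    uint64_t used;       /* bytes handed out so far, header included */
};

#define ARENA_SIZE ((size_t)1 << 20)  /* 1 MiB, for illustration only */

/* Map `path` read/write and shared so every store lands in the file. */
static struct arena *arena_open(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)ARENA_SIZE) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;
    struct arena *a = p;
    if (!a->committed) a->used = sizeof *a;  /* fresh or aborted load: start over */
    return a;
}

/* Bump allocation with 16-byte alignment; returns NULL when the arena is full. */
static void *arena_alloc(struct arena *a, size_t n) {
    assert(a != NULL);
    uint64_t start = (a->used + 15) & ~(uint64_t)15;
    if (start > ARENA_SIZE || n > ARENA_SIZE - start) return NULL;
    a->used = start + n;
    return (char *)a + start;
}

/* Mark the load complete and flush it, so later runs can reuse the data as-is. */
static int arena_commit(struct arena *a) {
    a->committed = 1;
    return msync(a, ARENA_SIZE, MS_SYNC);
}
```

A loader built this way would call `arena_open("magic.dat")`, skip the expensive load when `committed` is already set, and otherwise allocate through `arena_alloc` and finish with `arena_commit`. Because the flag is written last, an interrupted load leaves it unset and the next run starts over.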
25 lines
232 B
Plaintext
*.o
*.a
.cache/
.vs/
.vscode/
.DS_Store

build/
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/

models/*

/main
/quantize
/magic.dat

arm_neon.h
compile_commands.json