Commit Graph

6 Commits

Author SHA1 Message Date
Radoslav Gerganov
bde7cd3cd9
llama : offload to RPC in addition to other backends (#7640)
* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-03 20:03:26 +03:00
Radoslav Gerganov
2b737caae1
rpc : resource management rework (#7562)
* rpc : resource management rework

* address review comments
2024-05-28 18:13:36 +03:00
Radoslav Gerganov
db10f01310
rpc : track allocated buffers (#7411)
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly
2024-05-20 16:36:55 +03:00
Radoslav Gerganov
f4bd8b3d26
rpc : set SO_REUSEADDR for the server socket (#7320)
ref: #7293
2024-05-17 17:25:44 +03:00
Radoslav Gerganov
3b3963c55c rpc : add command line arg for specifying backend memory
ref: #7293
2024-05-16 09:58:29 +03:00
Radoslav Gerganov
5e31828d3e
ggml : add RPC backend (#6829)
* ggml : add RPC backend

The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY

* add CI workflows

* Address review comments

* fix warning

* implement llama_max_devices() for RPC

* Address review comments

* Address review comments

* wrap sockfd into a struct

* implement get_alignment and get_max_size

* add get_device_memory

* fix warning

* win32 support

* add README

* readme : trim trailing whitespace

* Address review comments

* win32 fix

* Address review comments

* fix compile warnings on macos
2024-05-14 14:27:19 +03:00