Commit Graph

833 Commits

Author SHA1 Message Date
Georgi Gerganov
0492363137
mpi : fix after master merge 2023-07-09 22:23:04 +03:00
Georgi Gerganov
81c5ddd532
Merge branch 'mpi' into refactor-mpi 2023-07-09 22:20:14 +03:00
Evan Miller
03cc12be0d [mpi] continue-on-error: true 2023-07-09 15:10:43 -04:00
Evan Miller
4a9a4748e9 Add OpenMPI to GH action 2023-07-09 15:05:58 -04:00
Evan Miller
0f557c2ac4 Merge branch 'master' into mpi 2023-07-09 15:02:19 -04:00
Georgi Gerganov
9da9d26c70
mpi : minor 2023-07-09 18:38:32 +03:00
Georgi Gerganov
beadbf3380
mpi : fix inference 2023-07-09 18:26:20 +03:00
Georgi Gerganov
ef37dd14e7
mpi : fix output tensor after MPI compute (still not working) 2023-07-09 17:01:08 +03:00
Georgi Gerganov
c717c5185f
mpi : various fixes - communication now works but results are wrong 2023-07-09 16:40:16 +03:00
Georgi Gerganov
01abb3b3b9
mpi : move all MPI logic into ggml-mpi
Not tested yet
2023-07-09 16:04:27 +03:00
Georgi Gerganov
e339d35579
mpi : add names for layer inputs + prep ggml_mpi_graph_compute() 2023-07-09 14:42:36 +03:00
Georgi Gerganov
3232db628c
mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099) 2023-07-09 14:08:53 +03:00
oobabooga
1d16309969
llama : remove "first token must be BOS" restriction (#2153) 2023-07-09 11:59:53 +03:00
Nigel Bosch
db4047ad5c
main : escape prompt prefix/suffix (#2151) 2023-07-09 11:56:18 +03:00
JackJollimore
18780e0a5e
readme : update Termux instructions (#2147)
The file pathing is significant when running models inside of Termux on Android devices. llama.cpp performance is improved with loading a .bin from the $HOME directory.
2023-07-09 11:20:43 +03:00
clyang
3bbc1a11f0
ggml : fix buidling with Intel MKL but ask for "cblas.h" issue (#2104) (#2115)
* Fix buidling with Intel MKL but ask for "cblas.h" issue

* Use angle brackets to indicate the system library
2023-07-09 11:12:20 +03:00
rankaiyx
2492a53fd0
readme : add more docs indexes (#2127)
* Update README.md to add more docs indexes

* Update README.md to add more docs indexes
2023-07-09 10:38:42 +03:00
Johannes Gäßler
64639555ff
Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) 2023-07-08 20:01:44 +02:00
Johannes Gäßler
061f5f8d21
CUDA: add __restrict__ to mul mat vec kernels (#2140) 2023-07-08 00:25:15 +02:00
dylan
84525e7962
docker : add support for CUDA in docker (#1461)
Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-07 21:25:25 +03:00
Georgi Gerganov
a7e20edf22
ci : switch threads to 1 (#2138) 2023-07-07 21:23:57 +03:00
Qingyou Meng
1d656d6360
ggml : change ggml_graph_compute() API to not require context (#1999)
* ggml_graph_compute: deprecate using ggml_context, try resolve issue #287

* rewrite: no longer consider backward compitability; plan and make_plan

* minor: rename ctx as plan; const

* remove ggml_graph_compute from tests/test-grad0.c, but current change breaks backward

* add static ggml_graph_compute_sugar()

* minor: update comments

* reusable buffers

* ggml : more consistent naming + metal fixes

* ggml : fix docs

* tests : disable grad / opt + minor naming changes

* ggml : add ggml_graph_compute_with_ctx()

- backwards compatible API
- deduplicates a lot of copy-paste

* ci : enable test-grad0

* examples : factor out plan allocation into a helper function

* llama : factor out plan stuff into a helper function

* ci : fix env

* llama : fix duplicate symbols + refactor example benchmark

* ggml : remove obsolete assert + refactor n_tasks section

* ggml : fix indentation in switch

* llama : avoid unnecessary bool

* ggml : remove comments from source file and match order in header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-07 19:24:01 +03:00
Georgi Gerganov
7242140283 ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) 2023-07-07 18:37:10 +03:00
Aarni Koskela
3e08ae99ce
convert.py: add mapping for safetensors bf16 (#1598)
Fixes #1473
2023-07-07 09:12:49 -04:00
Evan Miller
ef61acfbf5 Add info to README 2023-07-07 09:02:23 -04:00
Howard Su
481f793acc
Fix opencl by wrap #if-else-endif with \n (#2086) 2023-07-07 05:34:18 +02:00
Evan Miller
55207ba2b8 Add GH workflow, fix test 2023-07-06 21:40:18 -04:00
Evan Miller
1f0a2cfeda Update CMakeLists.txt 2023-07-06 21:25:34 -04:00
Evan Miller
06a239343c PR comments 2023-07-06 20:18:41 -04:00
Evan Miller
32deabfdc8 Merge branch 'master' into mpi 2023-07-06 19:04:50 -04:00
Georgi Gerganov
dfd9fce6d6
ggml : fix restrict usage 2023-07-06 19:41:31 +03:00
Judd
36680f6e40
convert : update for baichuan (#2081)
1. guess n_layers;
2. relax warnings on context size;
3. add a note that its derivations are also supported.

Co-authored-by: Judd <foldl@boxvest.com>
2023-07-06 19:23:49 +03:00
tslmy
a17a2683d8
alpaca.sh : update model file name (#2074)
The original file name, `ggml-alpaca-7b-q4.bin`, implied the first-generation GGML. After the breaking changes (mentioned in https://github.com/ggerganov/llama.cpp/issues/382), `llama.cpp` requires GGML V3 now. Those model files are named `*ggmlv3*.bin`. We should change the example to an actually working model file, so that this thing is more likely to run out-of-the-box for more people, and less people would waste time downloading the old Alpaca model.
2023-07-06 19:17:50 +03:00
Tobias Lütke
31cfbb1013
Expose generation timings from server & update completions.js (#2116)
* use javascript generators as much cleaner API

Also add ways to access completion as promise and EventSource

* export llama_timings as struct and expose them in server

* update readme, update baked includes

* llama : uniform variable names + struct init

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-05 16:51:13 -04:00
Jesse Jojo Johnson
983b555e9d
Update Server Instructions (#2113)
* Update server instructions for web front end
* Update server README
* Remove duplicate OAI instructions
* Fix duplicate text

---------

Co-authored-by: Jesse Johnson <thatguy@jessejojojohnson.com>
2023-07-05 21:03:19 +03:00
Georgi Gerganov
ec326d350c
ggml : fix bug introduced in #1237 2023-07-05 20:44:11 +03:00
Georgi Gerganov
1b6efeab82
tests : fix test-grad0 2023-07-05 20:20:25 +03:00
Stephan Walter
1b107b8550
ggml : generalize quantize_fns for simpler FP16 handling (#1237)
* Generalize quantize_fns for simpler FP16 handling

* Remove call to ggml_cuda_mul_mat_get_wsize

* ci : disable FMA for mac os actions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-05 19:13:06 +03:00
Jesse Jojo Johnson
8567c76b53
Update server instructions for web front end (#2103)
Co-authored-by: Jesse Johnson <thatguy@jessejojojohnson.com>
2023-07-05 18:13:35 +03:00
Johannes Gäßler
924dd22fd3
Quantized dot products for CUDA mul mat vec (#2067) 2023-07-05 14:19:42 +02:00
Howard Su
051c70dcd5
llama: Don't double count the sampling time (#2107) 2023-07-05 18:31:23 +08:00
Johannes Gäßler
9e4475f5cf
Fixed OpenCL offloading prints (#2082) 2023-07-05 08:58:05 +02:00
Nigel Bosch
7f0e9a775e
embd-input: Fix input embedding example unsigned int seed (#2105) 2023-07-05 07:33:33 +08:00
Georgi Gerganov
b472f3fca5
readme : add link web chat PR 2023-07-04 22:25:22 +03:00
Georgi Gerganov
ed9a54e512
ggml : sync latest (new ops, macros, refactoring) (#2106)
- add ggml_argmax()
- add ggml_tanh()
- add ggml_elu()
- refactor ggml_conv_1d() and variants
- refactor ggml_conv_2d() and variants
- add helper macros to reduce code duplication in ggml.c
2023-07-04 21:54:11 +03:00
jwj7140
f257fd2550
Add an API example using server.cpp similar to OAI. (#2009)
* add api_like_OAI.py
* add evaluated token count to server
* add /v1/ endpoints binding
2023-07-04 21:06:12 +03:00
Tobias Lütke
7ee76e45af
Simple webchat for server (#1998)
* expose simple web interface on root domain

* embed index and add --path for choosing static dir

* allow server to multithread

because web browsers send a lot of garbage requests we want the server
to multithread when serving 404s for favicon's etc. To avoid blowing up
llama we just take a mutex when it's invoked.


* let's try this with the xxd tool instead and see if msvc is happier with that

* enable server in Makefiles

* add /completion.js file to make it easy to use the server from js

* slightly nicer css

* rework state management into session, expose historyTemplate to settings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04 16:05:27 +02:00
Henri Vasserman
acc111caf9
Allow old Make to build server. (#2098)
Also make server build by default.

Tested with Make 3.82
2023-07-04 15:38:04 +03:00
ZhouYuChen
23c7c6fc91
Update Makefile: clean simple (#2097) 2023-07-04 14:15:16 +02:00
Evan Miller
042c5b278f wrap includes 2023-07-04 00:13:20 -04:00