Georgi Gerganov e76d630df1
llama : grouped-query attention + LLaMAv2 70B support (#2276)
* CUDA: GQA implementation

* llama : support for GQA and LLaMAv2 70B

ggml-ci

* py : fix hparams parsing (if-else blocks)

ggml-ci

* py : oh boy ..

ggml-ci

* help : fix gqa value for 70B

ggml-ci

---------

Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23 15:09:47 +03:00
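
The grouped-query attention (GQA) this commit adds can be sketched roughly as follows: the model keeps `n_head` query heads but only `n_head_kv` key/value heads, and each group of `n_head // n_head_kv` query heads attends against the same shared KV head (for LLaMAv2 70B, 64 query heads share 8 KV heads). This is an illustrative NumPy sketch of the idea, not llama.cpp's actual CUDA/ggml implementation; all names and shapes here are assumptions for the example.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_head, n_head_kv, d_head):
    """Illustrative GQA sketch (not the llama.cpp implementation).

    q: (seq, n_head * d_head)     -- full set of query heads
    k: (seq, n_head_kv * d_head)  -- reduced set of key heads
    v: (seq, n_head_kv * d_head)  -- reduced set of value heads
    Each group of n_head // n_head_kv query heads shares one KV head.
    """
    seq = q.shape[0]
    group = n_head // n_head_kv  # query heads per shared KV head

    # split flat projections into per-head slices
    qh = q.reshape(seq, n_head, d_head)
    kh = k.reshape(seq, n_head_kv, d_head)
    vh = v.reshape(seq, n_head_kv, d_head)

    out = np.empty_like(qh)
    for h in range(n_head):
        kv = h // group  # index of the KV head shared by this query head's group
        scores = qh[:, h] @ kh[:, kv].T / np.sqrt(d_head)
        # row-wise softmax over key positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ vh[:, kv]
    return out.reshape(seq, n_head * d_head)
```

With `n_head_kv == n_head` this reduces to standard multi-head attention; the KV cache shrinks by the factor `n_head // n_head_kv`, which is the main practical win for 70B-class models.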