llama.cpp/.devops/llama-cli-cuda.Dockerfile

ARG UBUNTU_VERSION=22.04
# This should generally match the CUDA version supported by the container host's NVIDIA driver.
ARG CUDA_VERSION=12.6.0
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} AS build

# CUDA architecture to build for (defaults to all supported archs)
ARG CUDA_DOCKER_ARCH=default

RUN apt-get update && \
    apt-get install -y build-essential git cmake

WORKDIR /app

COPY . .

# Use the default CUDA archs if not specified
RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
        export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
    fi && \
    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
    cmake --build build --config Release --target llama-cli -j$(nproc) && \
    mkdir -p /app/lib && \
    find build -name "*.so" -exec cp {} /app/lib \;

FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime

RUN apt-get update && \
    apt-get install -y libgomp1

COPY --from=build /app/lib/ /
COPY --from=build /app/build/bin/llama-cli /

ENTRYPOINT [ "/llama-cli" ]
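
# Example usage (a sketch, not part of the build). The image tag and the model
# path below are placeholders; adjust them to your setup. Running with GPU
# access assumes the NVIDIA Container Toolkit is installed on the host.
#
# Build for the default CUDA architectures:
#   docker build -t local/llama.cpp:cli-cuda -f .devops/llama-cli-cuda.Dockerfile .
#
# Build for a single architecture (86 is only an illustrative value):
#   docker build -t local/llama.cpp:cli-cuda --build-arg CUDA_DOCKER_ARCH=86 \
#       -f .devops/llama-cli-cuda.Dockerfile .
#
# Run, passing llama-cli arguments after the image name:
#   docker run --gpus all -v /path/to/models:/models local/llama.cpp:cli-cuda \
#       -m /models/model.gguf -p "Hello" -n 64 --n-gpu-layers 99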