text-generation-webui

mirror of https://github.com/oobabooga/text-generation-webui.git synced 2025-02-05 16:30:44 +01:00

LLaVA pipeline

This module provides 2 pipelines:

llava-7b - for use with the LLaVA v0 7B model (fine-tuned LLaMA 7B)
llava-13b - for use with the LLaVA v0 13B model (fine-tuned LLaMA 13B)

LLaVA uses CLIP (openai/clip-vit-large-patch14) as the vision model, followed by a single linear layer (the projector) that maps the image features into the language model's embedding space. For 13B the projector weights are in liuhaotian/LLaVA-13b-delta-v0, and for 7B they are in liuhaotian/LLaVA-7b-delta-v0.
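The pairing of pipeline names, base models, and projector weights described above can be written down as a small lookup table. This is a minimal sketch, not the module's actual code: `PipelineSpec`, `PIPELINES`, and `get_pipeline_spec` are hypothetical names introduced for illustration. Only the pipeline names and repository IDs come from this README.

```python
from dataclasses import dataclass

# Vision model shared by both pipelines (from this README).
CLIP_REPO = "openai/clip-vit-large-patch14"


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical record tying a pipeline name to the weights it needs."""
    name: str
    base_model: str       # the LLaMA size that LLaVA v0 was fine-tuned from
    vision_model: str     # repository holding the CLIP vision model
    projector_repo: str   # repository holding the linear projector weights


# Hypothetical registry; the values are the ones listed in this README.
PIPELINES = {
    "llava-7b": PipelineSpec(
        name="llava-7b",
        base_model="LLaMA 7B",
        vision_model=CLIP_REPO,
        projector_repo="liuhaotian/LLaVA-7b-delta-v0",
    ),
    "llava-13b": PipelineSpec(
        name="llava-13b",
        base_model="LLaMA 13B",
        vision_model=CLIP_REPO,
        projector_repo="liuhaotian/LLaVA-13b-delta-v0",
    ),
}


def get_pipeline_spec(name: str) -> PipelineSpec:
    """Return the spec for a pipeline name, or raise ValueError if unknown."""
    try:
        return PIPELINES[name]
    except KeyError:
        known = ", ".join(sorted(PIPELINES))
        raise ValueError(f"Unknown pipeline {name!r}; expected one of: {known}") from None
```

For example, `get_pipeline_spec("llava-13b").projector_repo` gives the repository to fetch the 13B projector weights from, while both pipelines share the same CLIP vision model.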

The supported device/precision combinations, for both the vision model and the projector, are: CUDA/32bit, CUDA/16bit, CPU/32bit.
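A configuration can be checked against the combinations listed above before any weights are loaded. The following is a sketch under stated assumptions: `SUPPORTED_COMBINATIONS` and `validate_combination` are hypothetical helpers, not part of this module, and the device strings (`"cuda"`, `"cpu"`, optionally with an index such as `"cuda:0"`) are assumed to follow the usual PyTorch naming.

```python
# Combinations listed in this README, as (device type, bits) pairs.
SUPPORTED_COMBINATIONS = frozenset({("cuda", 32), ("cuda", 16), ("cpu", 32)})


def validate_combination(device: str, bits: int) -> None:
    """Raise ValueError unless (device, bits) is a supported combination.

    Hypothetical helper: `device` may carry an index ("cuda:0"), which is
    ignored; only the device type is compared.
    """
    device_type = device.split(":")[0].lower()
    if (device_type, bits) not in SUPPORTED_COMBINATIONS:
        supported = ", ".join(
            f"{d.upper()}/{b}bit"
            for d, b in sorted(SUPPORTED_COMBINATIONS, key=lambda c: (c[0], -c[1]))
        )
        raise ValueError(
            f"Unsupported combination {device_type.upper()}/{bits}bit; "
            f"supported: {supported}"
        )
```

Because the list applies to the vision model and the projector alike, the check would be run once per component, for example `validate_combination("cuda:0", 16)` for the vision model and `validate_combination("cpu", 32)` for the projector. CPU/16bit is the one combination that is rejected.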