ExLlamaV2: Chat with an EXL2 model repo
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs. It supports paged attention via Flash Attention.
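As a rough sketch of what chatting with an EXL2-quantized model looks like, the snippet below loads a model and generates a reply. The load/generate flow follows the patterns in ExLlamaV2's own examples; the model path and the plain `User:`/`Assistant:` chat template are placeholder assumptions (real models expect their own template).

```python
def build_prompt(history, user_msg):
    """Flatten a chat history into a single prompt string.
    The "User:/Assistant:" template is a placeholder assumption;
    substitute the template your model was trained with."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_once(model_dir, user_msg):
    # Imports are local so the helper above stays usable without a GPU.
    from exllamav2 import (
        ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer,
    )
    from exllamav2.generator import ExLlamaV2DynamicGenerator

    config = ExLlamaV2Config(model_dir)
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # KV cache, sized lazily
    model.load_autosplit(cache)                # split weights across GPUs
    tokenizer = ExLlamaV2Tokenizer(config)

    generator = ExLlamaV2DynamicGenerator(
        model=model, cache=cache, tokenizer=tokenizer,
    )
    prompt = build_prompt([], user_msg)
    return generator.generate(prompt=prompt, max_new_tokens=200)


# Requires a GPU and a local EXL2 model directory, e.g.:
# print(chat_once("/path/to/exl2-model", "What is paged attention?"))
```

The dynamic generator batches and pages requests internally, so the same setup scales from this one-shot call to serving many concurrent chats.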