ExLlamaV2: Chat with an EXL2 model repo
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs. It supports paged attention via Flash Attention.
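As a rough sketch of what chatting with an EXL2-quantized model looks like, the snippet below loads a model and generates a reply. The load/generate flow follows the patterns in ExLlamaV2's own examples; the model path and the plain `User:`/`Assistant:` chat template are placeholder assumptions (real models expect their own template).

```python
def build_prompt(history, user_msg):
    """Flatten a chat history into a single prompt string.
    The "User:/Assistant:" template is a placeholder assumption;
    substitute the template your model was trained with."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_once(model_dir, user_msg):
    # Imports are local so the helper above stays usable without a GPU.
    from exllamav2 import (
        ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer,
    )
    from exllamav2.generator import ExLlamaV2DynamicGenerator

    config = ExLlamaV2Config(model_dir)
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # KV cache, sized lazily
    model.load_autosplit(cache)                # split weights across GPUs
    tokenizer = ExLlamaV2Tokenizer(config)

    generator = ExLlamaV2DynamicGenerator(
        model=model, cache=cache, tokenizer=tokenizer,
    )
    prompt = build_prompt([], user_msg)
    return generator.generate(prompt=prompt, max_new_tokens=200)


# Requires a GPU and a local EXL2 model directory, e.g.:
# print(chat_once("/path/to/exl2-model", "What is paged attention?"))
```

The dynamic generator batches and pages requests internally, so the same setup scales from this one-shot call to serving many concurrent chats.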