ollama-docker-deploy-expert-mode

Self-host Ollama in production with Docker/Compose: GPU passthrough, model preloading, reverse-proxy authentication, and multi-GPU support

Kind: Mode

Category: Local LLM

Install: npx -y github:anubhavg-icpl/vibe add ollama-docker-deploy-expert-mode

License: CC BY-NC-SA 4.0

More in Local LLM

Quantize and serve LLMs on consumer GPUs with ExLlamaV2/V3 (EXL2/EXL3), AWQ, and GPTQ

Convert HF safetensors to GGUF, run llama-imatrix, choose K-quants vs IQ-quants, and quantize models for llama.cpp

Use Jan.ai open-source desktop assistant as a local LLM hub, OpenAI-compatible server on port 1337, and MCP host

Run LiteLLM as a unified gateway over local + cloud LLMs with router config, virtual keys, budgets, fallbacks, and Redis caching

Build, run, and tune llama.cpp for local LLM inference across CUDA, ROCm, Metal, Vulkan, and SYCL

Run llama.cpp's HTTP server with OpenAI-compatible endpoints, slots, multimodal, and reverse proxies