mlx-apple-silicon-expert-mode

Run, quantize, fine-tune (LoRA/QLoRA), and serve LLMs and VLMs natively on Apple Silicon with MLX and mlx-lm

Kind: Mode

Category: Local LLM

Install: npx -y github:anubhavg-icpl/vibe add mlx-apple-silicon-expert-mode

License: CC BY-NC-SA 4.0

More in Local LLM

Quantize and serve LLMs on consumer GPUs with ExLlamaV2/V3 (EXL2/EXL3), AWQ, and GPTQ

Convert HF safetensors to GGUF, run llama-imatrix, choose K-quants vs IQ-quants, and quantize models for llama.cpp

Use the Jan.ai open-source desktop assistant as a local LLM hub, an OpenAI-compatible server on port 1337, and an MCP host

Run LiteLLM as a unified gateway over local and cloud LLMs with router config, virtual keys, budgets, fallbacks, and Redis caching

Build, run, and tune llama.cpp for local LLM inference across CUDA, ROCm, Metal, Vulkan, and SYCL

Run llama.cpp's HTTP server with OpenAI-compatible endpoints, slots, multimodal input, and reverse proxies