mode Local LLM
mlx-apple-silicon-expert-mode
Run, quantize, fine-tune (LoRA/QLoRA), and serve LLMs and VLMs natively on Apple Silicon with MLX and mlx-lm
More in Local LLM
exllama-awq-gptq-expert-mode
Quantize and serve LLMs on consumer GPUs with ExLlamaV2/V3 (EXL2/EXL3), AWQ, and GPTQ
gguf-quantization-expert-mode
Convert HF safetensors to GGUF, run llama-imatrix, choose K-quants vs IQ-quants, and quantize models for llama.cpp
jan-ai-expert-mode
Use the Jan.ai open-source desktop assistant as a local LLM hub, an OpenAI-compatible server on port 1337, and an MCP host
litellm-proxy-expert-mode
Run LiteLLM as a unified gateway over local + cloud LLMs with router config, virtual keys, budgets, fallbacks, and Redis caching
llama-cpp-expert-mode
Build, run, and tune llama.cpp for local LLM inference across CUDA, ROCm, Metal, Vulkan, and SYCL
llama-cpp-server-expert-mode
Run llama.cpp's HTTP server with OpenAI-compatible endpoints, slots, multimodal, and reverse proxies