Abstract: Post-training quantization (PTQ) is an effective solution for deploying deep neural networks on edge devices with limited resources. PTQ is especially attractive because it does not require ...
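For context, PTQ maps a trained model's floating-point weights onto a low-bit integer grid without any retraining, which is what makes it cheap to apply on edge devices. A minimal sketch of symmetric per-tensor int8 quantization (the function names are illustrative, not from the paper):

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 PTQ: map the largest magnitude to 127."""
    scale = float(np.max(np.abs(w))) / 127.0
    scale = scale if scale > 0 else 1.0  # guard against an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: quantize a pretrained layer's weights with no retraining
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_weights_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize_int8(q, scale))))
```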
We tried out Google’s new family of multimodal models, which includes variants compact enough to run on local devices. In our testing, they worked well.
Abstract: In recent years, extreme quantization methods, particularly one-bit quantization, have garnered significant attention in signal processing and data acquisition systems. While one-bit ...
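One-bit quantization reduces each sample to its sign, i.e. one comparator decision per sample, which is why it is attractive for high-rate acquisition hardware. A minimal sketch; the optional dither argument is a common trick in one-bit acquisition and is included here for illustration, not taken from the abstract:

```python
import numpy as np

def one_bit_quantize(x: np.ndarray, dither: np.ndarray | None = None) -> np.ndarray:
    """One-bit quantization: keep only the sign of each (optionally dithered) sample."""
    if dither is not None:
        x = x - dither  # comparing against a known dither sequence aids amplitude recovery
    return np.where(x >= 0, 1, -1).astype(np.int8)

# Example: a noisy sinusoid collapsed to one bit per sample
t = np.linspace(0.0, 1.0, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(t.size)
bits = one_bit_quantize(signal)
print(bits[:10])
```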
APEX beats Q8_0 on perplexity at half the size, and even beats F16. APEX outperforms Unsloth Dynamic 2.0 (UD) quantizations on perplexity, HellaSwag, and inference speed while being 2x smaller: APEX ...
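For scale, the Q8_0 baseline named here is the llama.cpp format that stores weights in blocks of 32 with one fp16 scale per block (about 8.5 bits per weight), which is the size budget APEX reportedly halves. A sketch of that baseline, not of APEX itself:

```python
import numpy as np

BLOCK = 32  # Q8_0 groups weights into blocks of 32

def q8_0_quantize(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Q8_0-style blocks: one fp16 scale plus 32 int8 values per block."""
    blocks = w.reshape(-1, BLOCK)
    amax = np.max(np.abs(blocks), axis=1, keepdims=True)
    scales = (amax / 127.0).astype(np.float16)
    safe = np.where(amax == 0, 1.0, amax / 127.0)  # avoid dividing by zero
    q = np.clip(np.round(blocks / safe), -127, 127).astype(np.int8)
    return q, scales

def q8_0_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales.astype(np.float32)).ravel()

# Example: 32 int8 values + 1 fp16 scale = 272 bits per 32 weights = 8.5 bits/weight
w = np.random.randn(1024).astype(np.float32)
q, scales = q8_0_quantize(w)
print("max abs error:", np.max(np.abs(w - q8_0_dequantize(q, scales))))
```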