Efficient LLMs & Quantization
Techniques for compressing and accelerating large language models: quantization (GPTQ, AWQ), pruning, distillation, speculative decoding, and hardware-aware architectures for edge deployment.
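The quantization methods named above (GPTQ, AWQ) add error compensation and activation awareness on top of a common core idea: mapping floating-point weights to low-bit integers plus a scale factor. As an illustrative sketch only (not GPTQ or AWQ themselves; all names here are made up), symmetric per-tensor int8 quantization looks like this:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Quantize random weights and measure reconstruction error.
w = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # rounding error is at most ~scale/2
```

Storing `q` (1 byte per weight) plus one `scale` in place of float32 weights gives the roughly 4x memory reduction that makes edge deployment feasible; per-channel scales and lower bit-widths refine the same scheme.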