Collections
Multimodal Foundation Models
Large-scale vision-language models covering pretraining, instruction tuning, and alignment. Includes image-text, video-text, and audio-visual models and their evaluation on multimodal benchmarks.
0 papers
Embodied AI & Robot Learning
Robot foundation models, manipulation, locomotion, and scene understanding. Includes imitation learning from human demonstrations, sim-to-real transfer, and language-conditioned robot policies.
0 papers
Reasoning & Mathematical Problem Solving
LLM reasoning capabilities: chain-of-thought, process reward models, tree search, and formal verification. Covers competition math, theorem proving, and code reasoning benchmarks.
0 papers
Efficient LLMs & Quantization
Techniques for compressing and accelerating large language models: quantization (GPTQ, AWQ), pruning, distillation, speculative decoding, and hardware-aware architectures for edge deployment.
0 papers
3D Reconstruction & Gaussian Splatting
Neural scene representations, from 3D Gaussian Splatting and NeRF variants to feed-forward reconstruction models. Includes dynamic scenes, large-scale mapping, and real-time rendering.
0 papers
Healthcare VLM & Medical Imaging
Vision-language models for radiology, pathology, ophthalmology, and clinical decision support. Covers report generation, zero-shot diagnosis, and instruction-tuned medical foundation models.
0 papers
Bioinformatics & Genomics AI
Foundation models and deep learning applied to protein structure prediction, genomics, drug discovery, and single-cell analysis. Includes AlphaFold successors, RNA language models, and multi-omics integration.
0 papers
LLM Survey 2024
72 papers