Papers
12094 papers
ICLR2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Summary pending...
preference optimization; the principle of optimism/pessimism; RLHF theory
ICLR2025
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Summary pending...
LLM; Agent; LLM-based Agent
ICLR2025
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Summary pending...
Hierarchical RL; Reinforcement Learning; LLMs
ICLR2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Summary pending...
Large Language Model Pruning; Probe Pruning
ICLR2025
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Summary pending...
Benchmark; Evaluation; Large Language Model
ICLR2025
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
Summary pending...
machine-generated text detection; evade detection; fine-tuning
ICLR2025
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Summary pending...
Learning-Guided Optimization; Rolling Horizon Optimization; Flexible Job Shop Scheduling
ICLR2025
ImProver: Agent-Based Automated Proof Optimization
Summary pending...
Automated Proof Optimization; Neural Theorem Proving; Formal Mathematics
ICLR2025
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Summary pending...
Large Language Models; Mathematical Problem-Solving; Verifiers
ICLR2025
Reward Learning from Multiple Feedback Types
Summary pending...
Reinforcement Learning; RLHF; Machine Learning
ICLR2025
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Summary pending...
Fine-tuning; transfer learning; foundation models
ICLR2025
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks
Summary pending...
Bayesian Networks; Gaussian Process; Kernel Renormalization
ICLR2025
Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration
Summary pending...
explore-exploit; stochastic Hopfield network; Thompson sampling
ICLR2025
Discovering Influential Neuron Path in Vision Transformers
Summary pending...
Explainability; Vision Transformer; Neuron
ICLR2025
Learning-Augmented Frequent Directions
Summary pending...
learning-augmented algorithms; algorithms with predictions; data streams
ICLR2025
Can Watermarks be Used to Detect LLM IP Infringement For Free?
Summary pending...
large language models; watermark; model copyright
ICLR2025
More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing
Summary pending...
Deep learning; Mixture of Experts; Modularity
ICLR2025
In-context Time Series Predictor
Summary pending...
Time Series Forecasting; In-context Learning; Transformer
ICLR2025
Uncertainty Herding: One Active Learning Method for All Label Budgets
Summary pending...
Active learning
ICLR2025
Efficient Biological Data Acquisition through Inference Set Design
Summary pending...
Active Learning; Data Acquisition; ML for Drug Discovery