Papers

12094 papers

ICLR2025

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

preference optimization; the principle of optimism/pessimism; RLHF theory
ICLR2025

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

LLM; Agent; LLM-based Agent
ICLR2025

MaestroMotif: Skill Design from Artificial Intelligence Feedback

Hierarchical RL; Reinforcement Learning; LLMs
ICLR2025

Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing

Large Language Model Pruning; Probe Pruning
ICLR2025

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Benchmark; Evaluation; Large Language Model
ICLR2025

Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

machine-generated text detection; evade detection; fine-tuning
ICLR2025

Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling

Learning-Guided Optimization; Rolling Horizon Optimization; Flexible Job Shop Scheduling
ICLR2025

ImProver: Agent-Based Automated Proof Optimization

Automated Proof Optimization; Neural Theorem Proving; Formal Mathematics
ICLR2025

Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models

Large Language Models; Mathematical Problem-Solving; Verifiers
ICLR2025

Reward Learning from Multiple Feedback Types

Reinforcement Learning; RLHF; Machine Learning
ICLR2025

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

Fine-tuning; transfer learning; foundation models
ICLR2025

When narrower is better: the narrow width limit of Bayesian parallel branching neural networks

Bayesian Networks; Gaussian Process; Kernel Renormalization
ICLR2025

Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration

explore-exploit; stochastic Hopfield network; Thompson sampling
ICLR2025

Discovering Influential Neuron Path in Vision Transformers

Explainability; Vision Transformer; Neuron
ICLR2025

Learning-Augmented Frequent Directions

learning-augmented algorithms; algorithms with predictions; data streams
ICLR2025

Can Watermarks be Used to Detect LLM IP Infringement For Free?

large language models; watermark; model copyright
ICLR2025

More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing

Deep learning; Mixture of Experts; Modularity
ICLR2025

In-context Time Series Predictor

Time Series Forecasting; In-context Learning; Transformer
ICLR2025

Uncertainty Herding: One Active Learning Method for All Label Budgets

Active learning
ICLR2025

Efficient Biological Data Acquisition through Inference Set Design

Active Learning; Data Acquisition; ML for Drug Discovery