Papers

12094 papers

EMNLP2023

API-Bank: A Benchmark for Tool-Augmented LLMs

Exhaustive Review on [Search Workflows](https://github.com/xinzhel/LLM-Search)

NeurIPS2023

Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)

NeurIPS2023

AdaPlanner: Adaptive Planning from Feedback with Language Models

NeurIPS2023

Self-Refine: Iterative Refinement with Self-Feedback

NeurIPS2023

Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

High Quality

ACL2023

MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting

EMNLP2023

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games

Making Large Language Models into World Models with Precondition and Effect Knowledge

TaskBench: Benchmarking Large Language Models for Task Automation

MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use

Learning From Mistakes Makes LLM Better Reasoner

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

NeurIPS2023

PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

NeurIPS2023

On the Planning Abilities of Large Language Models - A Critical Investigation

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
