- [2025.10.16] π Our paper has been accepted by NeurIPS 2025 Efficient Reasoning Workshop!
- [2025.10.13] πΈ Excited to have a tutorial video for AgentFlow covered by Discover AI on YouTube!
- [2025.10.10] π Our X post received 1K+ likes! Feel free to check out the post and join the discussion! π¬
- [2025.10.08] π₯ We are honored to be featured as π€ HuggingFace Daily Paper #2.
AgentFlow is a trainable, tool-integrated agentic framework designed to overcome the scalability and generalization limits of todayβs tool-augmented reasoning approaches.
Unlike prevailing approaches such as Search-R1 which train a single LLM to interleave reasoning steps with tool calls, AgentFlow introduces a modular agentic system with four specialized modules: π§ Planner, π Executor, β Verifier, and βοΈ Generator.
For effective planning and tool use, the framework directly optimizes planner agent within the system in an online fashion using Flow-based Group Refined Policy Optimization (Flow-GRPO), achieving superior performance across diverse domains with improved tool-calling reliability and long-horizon reasoning capabilities.
Excited to have a tutorial video for AgentFlow covered by Discover AI on YouTube!
- π§© Modular Agentic System β Four specialized agent modules (Planner, Executor, Verifier, Generator) that coordinate via evolving memory and integrated tools across multiple turns.
- π Multi-Tool Integration β Seamlessly connect with diverse tool ecosystems, including
base_generator
,python_coder
,google_search
,wikipedia_search
,web_search
, and more. - π― Flow-GRPO Algorithm β Enables in-the-flow agent optimization for long-horizon reasoning tasks with sparse rewards.
- π Proven Results β AgentFlow (7B Backbone) beats top baselines on 10 benchmarks, with +14.9% search, +14.0% agentic, +14.5% math, +4.1% science, even outperforming ~200B-parameter GPT-4o.
AgentFlow (Qwen-2.5-7B-Instruct Backbone) outperforms top baselines on 10 benchmarks:
- +14.9% on search
- +14.0% on agentic reasoning
- +14.5% on math
- +4.1% on science
π‘ Even surpasses larger proprietary models like GPT-4o (~200B).
- Improved planning and decision-making
- Enhanced tool-calling reliability
- Positive scaling trends with model size & reasoning turns
Explore more in our paper or project page.
- βοΈ Setup
- β‘ Quick Start on AgentFlow Inference
- π₯ Quick Start on AgentFlow Flow-GRPO Training
- π― AgentFlow Benchmark
- π§© Use Your Own Model in AgentFlow
- π€ Core Contributors
- π Advisors
- π Acknowledgements
- π Contributing
bash setup.sh
source .venv/bin/activate
# (Optional) Install `parallel` for running benchmark experiments in parallel:
sudo apt-get update
sudo apt-get install parallel
Copy the .env.template
file from agentflow/.env.template
and rename it to .env
, then place it in the agentflow/
folder. Update the following variables with your own API keys:
OPENAI_API_KEY
(for judging reasponse)GOOGLE_API_KEY
(for Google Search tool)DASHSCOPE_API_KEY
(for calling Qwen-2.5-7B-Instruct as engine for agents and tools)TOGETHER_API_KEY
(alternative for calling Qwen-2.5-7B-Instruct as engine for agents and tools - recommended for international users)- More ways: serve Qwen2.5-7B-instruct model with vLLM (details refer to
serve_vllm_local.md
).
Please check API Key Setup Guide for detailed instructions on how to obtain these keys.
cp agentflow/.env.template agentflow/.env
# Then edit agentflow/.env with your API keys
AgentFlow provides a modular agentic system with four specialized modules (planner, executor, verifier, generator) that coordinate through evolving memory and a toolkit over multiple turns to solve complex reasoning tasks.
To quickly experience the system in action, run the command below (donβt forget to set up your API key):
python quick_start.py
Here is the content of quick_start.py
:
# Import the solver
from agentflow.agentflow.solver import construct_solver
# Set the LLM engine name
llm_engine_name = "dashscope"
# Construct the solver
solver = construct_solver(llm_engine_name=llm_engine_name)
# Solve the user query
output = solver.solve("What is the capital of France?")
print(output["direct_output"])
For effective planning and tool use, the framework directly optimizes the planner agent within the system in an online fashion using Flow-GRPO. Below is a quick start for training.
Before diving in, we recommend verifying that AgentFlow's tools, LLM engines, and network configuration are properly set up. See test_env.md for detailed testing instructions.
We mix two datasets for training: NQ (Natural Questions) for agentic search and DeepMath-103K for mathematical reasoning.
# train data
python data/get_train_data.py
# validation data
python data/aime24_data.py
After that, data dir should be:
data/
βββ train/
β βββ combined_train.parquet (182,190 samples)
βββ val/
β βββ aime24.parquet (30 samples)
βββ aime24_data.py
βββ get_train_data.py
Start agentflow training using Flow-GRPO with tmux:
# Create tmux session and start agentflow service (Window 0)
tmux new-session -s agentflow
bash train/serve_with_logs.sh
# Create new window (Ctrl+B then C) and start training (Window 1)
bash train/train_with_logs.sh
Configuration:
All training hyperparameters are in train/config.yaml
(model settings, tools, RL parameters, resources, etc.)
Logging: We provide a comprehensive logging to monitor training. See logs.md for more details.
Serve the trained planner model with VLLM (here we deploy our 7B Flow-GRPO planner model):
bash scripts/serve_vllm.sh
Run inference on benchmark tasks:
cd test
bash exp/run_all_models_all_datasets.sh
You can find more benchmarking details in benchmark.md.
AgentFlow supports different LLM engines for each agent module. See llm_engine.md for supported models and factory.py
for the corresponding model_string
configuration:
Planner Agent:
- Modify the
llm_engine_name
parameter intest/exp/run_all_models_all_datasets.sh
Other Agents (Executor, Verifier, Generator):
- By default, these agents use a fixed LLM engine (Qwen-2.5-7B-Instruct via DashScope)
- To use your own model, modify
self.llm_engine_fixed
inagentflow/agentflow/models/planner.py:19
:
self.llm_engine_fixed = create_llm_engine(model_string="your-engine", is_multimodal=False, temperature=temperature)
and
- Modify the
llm_engine_name
parameter in the Executor instantiation fromagentflow/agentflow/solver.py:232
:
# Instantiate Executor
executor = Executor(
# llm_engine_name=llm_engine_name,
llm_engine_name="dashscope",
root_cache_dir=root_cache_dir,
verbose=verbose,
# base_url=base_url,
temperature=temperature
)
- For detailed information on supported engines and
model_string
formats, seellm_engine.md
![]() Zhuofeng Li |
![]() Haoxiang Zhang |
![]() Pan Lu |
James Zou |
Yejin Choi |
Yu Zhang |
We thank the following open-source projects:
- verl for the excellent RL framework design.
- vLLM for fast LLM inference support.
- Verl-Tool and agent-lightning for their early-stage exploration in agentic RL Training.
We thank Lambda for GPU support!
We are truly looking forward to open-source contributions to AgentFlow! If youβre interested in contributing, collaborating, or reporting issues, please feel free to open an issue or submit a pull request (PR). You can also reach us at zhuofengli12345@gmail.com, isaacpfino@gmail.com, lupantech@gmail.com or join our Slack community: AgentFlow.
We are also looking forward to your feedback and suggestions!
@article{li2025flow,
title={In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
author={Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
journal={arXiv preprint arXiv:2510.05592},
year={2025}
}