Mobile-Agent: The Powerful GUI Agent Family by Tongyi Lab, Alibaba Group

👏 Welcome to try Mobile-Agent-v3 via our Modelscope online demo or Bailian online demo!

❗️We provide a limited-time free Mobile-Agent-v3 API on Bailian for a quick trial. View the documentation.

🤗 GUI-Owl-32B | GUI-Owl-32B | 🤗 GUI-Owl-7B | GUI-Owl-7B

📢News

  • [2025.9.24]🔥🔥 We've released a demo on ModelScope based on Wuying Cloud Desktop and Phone. There is no need to deploy models locally or prepare devices; just enter your instruction to experience Mobile-Agent-v3! See the ModelScope Demo Link and Bailian Demo Link. For the limited-time free Mobile-Agent-v3 API, please check the documentation. A new version based on Qwen-3-VL is coming soon.
  • [2025.9.19]🔥 GUI-Critic-R1 has been accepted by The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
  • [2025.9.16]🔥 We have released our latest work, UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning. The paper, code, and model are now open-sourced.
  • [2025.9.16]🔥 We've open-sourced the code of GUI-Owl and Mobile-Agent-v3 on OSWorld, AndroidWorld, and real-world mobile scenarios. See the OSWorld Code. The OSWorld RL-tuned checkpoint of GUI-Owl is also released. See the AndroidWorld Code and Real-world Scenarios Code.
  • [2025.8.20] The all-new GUI-Owl and Mobile-Agent-v3 are released! The technical report can be found here, and the model checkpoints will be released at GUI-Owl-7B and GUI-Owl-32B.
    • GUI-Owl is a multi-modal cross-platform GUI VLM with GUI perception, grounding, and end-to-end operation capabilities.
    • Mobile-Agent-v3 is a cross-platform multi-agent framework based on GUI-Owl. It provides capabilities such as planning, progress management, reflection, and memory.
  • [2025.8.14] Mobile-Agent-v3 won the best demo award at the 24th China National Conference on Computational Linguistics (CCL 2025).
  • [2025.3.17] PC-Agent has been accepted by the ICLR 2025 Workshop.
  • [2024.9.26] Mobile-Agent-v2 has been accepted by The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024).
  • [2024.7.29] Mobile-Agent won the best demo award at the 23rd China National Conference on Computational Linguistics (CCL 2024).
  • [2024.3.10] Mobile-Agent has been accepted by the ICLR 2024 Workshop.

📊Results

👀Features

GUI-Owl

  • State-of-the-art results among models within the 7B parameter scale.
  • A native end-to-end multimodal agent designed as a foundational model for GUI automation.
  • Unifying perception, grounding, reasoning, planning, and action execution within a single policy network.
  • Robust cross-platform interaction and multi-turn decision making with explicit intermediate reasoning.
  • GUI-Owl can be instantiated as different specialized agents within Mobile-Agent-v3.

Mobile-Agent-v3

  • Dynamic task decomposition, planning and progress management.
  • A highly integrated operating space reduces how often the model needs to perceive the screen and issue operations.
  • Extensive exception handling and reflection capabilities provide more stable performance in scenarios such as pop-ups and advertisements.
  • The key information recording capability enables cross-application tasks.
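The features above (task decomposition, progress management, reflection, and key information recording) can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the Mobile-Agent-v3 implementation: `AgentState`, `plan`, `run_task`, and `fake_act` are hypothetical names, the planner is a plain string split, and the executor is simulated instead of calling GUI-Owl or a real device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Shared state passed between the (hypothetical) agent roles."""
    instruction: str
    subgoals: List[str] = field(default_factory=list)    # planner output
    completed: List[str] = field(default_factory=list)   # progress management
    notes: Dict[str, str] = field(default_factory=dict)  # recorded key information
    log: List[str] = field(default_factory=list)         # per-attempt history


def plan(instruction: str) -> List[str]:
    """Stub planner (assumption): split the instruction on ', then '."""
    return [part.strip() for part in instruction.split(", then ") if part.strip()]


def run_task(
    instruction: str,
    act: Callable[[str, int], dict],
    max_retries: int = 2,
) -> AgentState:
    """Run each subgoal, retrying when the reflection step reports failure."""
    state = AgentState(instruction=instruction, subgoals=plan(instruction))
    for subgoal in state.subgoals:
        for attempt in range(max_retries + 1):
            result = act(subgoal, attempt)   # executor: perform one operation
            ok = result.get("ok", False)     # reflection: did the step succeed?
            status = "ok" if ok else "retry"
            state.log.append(f"{subgoal} | attempt {attempt} | {status}")
            if ok:
                state.completed.append(subgoal)
                # Memory: keep key information for later subgoals or apps.
                state.notes.update(result.get("notes", {}))
                break
        else:
            # All retries failed: record it and stop the whole task.
            state.log.append(f"{subgoal} | gave up")
            break
    return state


def fake_act(subgoal: str, attempt: int) -> dict:
    """Simulated executor: the first attempt at 'open app' hits a pop-up."""
    if subgoal == "open app" and attempt == 0:
        return {"ok": False}
    if subgoal == "read price":
        return {"ok": True, "notes": {"price": "42"}}
    return {"ok": True}


state = run_task("open app, then read price, then paste price in notes", fake_act)
print(state.completed)
print(state.notes)
```

In this sketch the retry on a failed step stands in for exception handling (for example, a pop-up blocking the first attempt), and the `notes` dictionary stands in for the key information that a cross-application task would carry from one app to the next.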

📝Series of Work

📺Demo

Learn about Mobile-Agent-v3.

pr_en.mp4

💻PC

Create a new blank PPT, and then insert a piece of text in the form of Word Art into the first slide, with the content being "Alibaba".

PPT.mp4

🌐Web

Please help me search for flights from Beijing to Paris on Skyscanner departing on September 18th and returning on September 21st.

Skyscanner.mp4

📱Phone

Please help me search for Jinan travel guides on Xiaohongshu, sort them by the number of collections, and save the first note.

default.mp4

📑Citation

If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:

@article{ye2025mobile,
  title={Mobile-Agent-v3: Fundamental Agents for GUI Automation},
  author={Ye, Jiabo and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Zhu, Zhaoqing and Zheng, Ziwei and Gao, Feiyu and Cao, Junjie and Lu, Zhengxi and others},
  journal={arXiv preprint arXiv:2508.15144},
  year={2025}
}

@article{lu2025ui,
  title={UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning},
  author={Lu, Zhengxi and Ye, Jiabo and Tang, Fei and Shen, Yongliang and Xu, Haiyang and Zheng, Ziwei and Lu, Weiming and Yan, Ming and Huang, Fei and Xiao, Jun and others},
  journal={arXiv preprint arXiv:2509.11543},
  year={2025}
}

@article{wanyan2025look,
  title={Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation},
  author={Wanyan, Yuyang and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Ye, Jiabo and Kou, Yutong and Yan, Ming and Huang, Fei and Yang, Xiaoshan and others},
  journal={arXiv preprint arXiv:2506.04614},
  year={2025}
}

@article{liu2025pc,
  title={PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC},
  author={Liu, Haowei and Zhang, Xi and Xu, Haiyang and Wanyan, Yuyang and Wang, Junyang and Yan, Ming and Zhang, Ji and Yuan, Chunfeng and Xu, Changsheng and Hu, Weiming and Huang, Fei},
  journal={arXiv preprint arXiv:2502.14282},
  year={2025}
}

@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}

@article{wang2024mobile2,
  title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
  author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2406.01014},
  year={2024}
}

@article{wang2024mobile,
  title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
  author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2401.16158},
  year={2024}
}