Upwork’s New Human+Agent Productivity Index Reveals Up to 70% Boost in Work Completion from Human and AI Agent Collaboration vs. Agents Working Alone

Today we’re announcing initial findings from Upwork’s new Human+Agent Productivity Index (HAPI), the industry’s first data-driven evaluation of how human expertise amplifies AI agent performance in real knowledge work.
Unlike other AI agent evaluations that rely on fixed or synthetic datasets, HAPI is built on actual client projects from Upwork that reflect how real knowledge work gets done. AI agents are also typically tested in isolation or evaluated on narrow simulations; HAPI studies AI agents paired with the collaboration and creativity humans bring to real work.
Upwork is the world’s largest human and AI-powered work marketplace, with roughly 800,000 active clients posting more than 3 million jobs annually on the Marketplace. This scale provides a unique view into how real work unfolds every day. Our goal with HAPI is to measure where and how people can use AI agents to do better work faster.
HAPI measures value of human-in-the-loop (HITL) on simple client projects
Recent research underscores the limited performance of current AI agents on real-world tasks: agents completed under 30% of tasks in simulated settings and below 3% on freelance projects. HAPI evaluates AI agents on simple, well-defined, low-complexity projects where they have a reasonable chance of success. Built from more than 300 real projects successfully completed on Upwork’s Marketplace, HAPI studies a diverse range of work categories such as Writing, Data Science & Analytics, and Web, Mobile & Software Development. Open-ended or highly complex jobs, which are typical of the vast majority of work conducted on Upwork but are ill-suited for completion by AI agents, were intentionally excluded. Virtually all of the projects included in the index were priced under $500. These types of simple jobs represent less than 6% of Upwork’s total gross services volume (GSV) and a tiny fraction of freelance and contingent work more broadly.
Initial HAPI results show that human and agent collaboration increased completion rates on simple projects by up to 70% compared to agents operating alone. Even agents based on top AI models like Gemini 2.5 Pro, OpenAI GPT-5, and Claude Sonnet 4 struggle to complete real, simple client jobs on their own, but their completion rates improved dramatically when the agents were paired with expert human professionals. This validates our core thesis: The future of work is humans and AI, working together.

HAPI calculates completion rate by breaking each project down into defined criteria. Expert Upwork freelancers then use clear rubrics to rate whether all of the criteria were completed by the AI agent alone, and again after each round of human guidance. HAPI rates a project ‘complete’ when all criteria are delivered. Importantly, HAPI does not measure subjective details like tone or style. Therefore, a project counted as complete may not meet the same quality standard as work delivered by a human knowledge worker.
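As a rough illustration of the all-criteria rule described above, the following Python sketch scores a handful of projects before and after human feedback. It is not HAPI's actual implementation: the function names, the data layout, and the criterion names are hypothetical, and the ratings are made up for the example rather than taken from HAPI results. Because gains can be stated either in percentage points or as a relative percentage, the sketch computes both without assuming which one any particular HAPI figure uses.

```python
from typing import Dict, List

# Hypothetical layout: one dict per project, mapping a criterion name to
# whether the expert rater marked that criterion as delivered.
Ratings = Dict[str, bool]


def is_complete(ratings: Ratings) -> bool:
    """A project counts as complete only when every criterion is delivered."""
    return len(ratings) > 0 and all(ratings.values())


def completion_rate(projects: List[Ratings]) -> float:
    """Share of projects (0.0 to 1.0) in which all criteria were delivered."""
    if not projects:
        return 0.0
    return sum(is_complete(p) for p in projects) / len(projects)


# Made-up ratings for four projects; NOT HAPI data.
agent_alone = [
    {"format": True, "length": True},
    {"format": True, "length": False},
    {"format": False, "length": False},
    {"format": True, "length": False},
]
after_feedback = [
    {"format": True, "length": True},
    {"format": True, "length": True},
    {"format": False, "length": True},
    {"format": True, "length": True},
]

baseline = completion_rate(agent_alone)    # 1 of 4 projects complete
guided = completion_rate(after_feedback)   # 3 of 4 projects complete

# Two different ways to express the same improvement.
point_gain = (guided - baseline) * 100                # percentage points
relative_gain = (guided - baseline) / baseline * 100  # relative percent

print(f"agent alone: {baseline:.0%}, with human feedback: {guided:.0%}")
print(f"gain: {point_gain:.0f} points, {relative_gain:.0f}% relative")
```

Note that the third project stays incomplete after feedback even though one of its criteria improved. Under an all-or-nothing rule, partial progress does not move the completion rate.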
Humans improve AI agent outputs across work categories
Across categories, agents working without human oversight struggled even on low-complexity tasks and needed multiple rounds of feedback to raise their completion rates. Writing, Translation, and Sales & Marketing projects saw gains of up to 17 percentage points when a human guided the work. Simple Engineering & Architecture projects improved even more, with jumps of up to 23 percentage points. These results underscore that human intuition and domain expertise remain essential for shaping ideas, applying context, and ensuring quality.

Standalone agents performed best on basic technical and computational projects, particularly in the Web, Mobile & Software Development and Data Science & Analytics categories, where tasks were structured and had clear parameters. This reflects our focus on lower-complexity projects where agents can deliver the most reliable results compared to a broader set of high-complexity tasks assessed in other benchmarks. Even here, human guidance improved outcomes. Claude Sonnet 4 saw the strongest improvement, with double-digit percentage point gains, while Gemini 2.5 Pro and OpenAI GPT-5 saw more modest improvements.
For the full methodology and detailed results, see the Human+Agent Productivity Index and our accompanying research paper, which will be presented at a NeurIPS workshop later this year.
Building the human and AI-powered work marketplace of the future
While HAPI’s initial findings are based on a select set of low-complexity projects, the limitations we observe in agent-only performance offer important insight into the trajectory of AI in work. If today’s leading AI agents still require human oversight to reliably complete simple, well-structured tasks, that underscores how indispensable human discernment, ingenuity, and contextual understanding remain, especially as task complexity grows. Extrapolating from these results, it’s clear that the future of work isn’t AI replacing humans, but AI and humans in deep collaboration: agents provide speed and scale, humans provide creativity, judgment, and oversight, and together they build trust and deliver better results.
Explore the power of human and agent collaboration and view the Human+Agent Productivity Index here: https://www.upwork.com/human-agent-productivity-index












