AI’s Limitations in Online Freelance Work: A Contrasting View to the Displacement Narrative

An experiment challenges the widespread notion that AI will rapidly replace office workers en masse, revealing that even the most advanced AI agents struggle significantly with online freelance work.

The Remote Labor Index: Measuring AI’s Work-Automation Capabilities

The Remote Labor Index, a new benchmark developed jointly by researchers at the data annotation firm Scale AI and the nonprofit Center for AI Safety (CAIS), assesses how well state-of-the-art AI models can automate economically valuable tasks.

Experimental Findings: AI’s Underperformance in Freelance Work

The researchers assigned a diverse set of simulated freelance tasks to several leading AI agents. Strikingly, even the top-performing agent completed less than 3 percent of the work, earning just $1,810 of a potential $143,991. Among the tools evaluated, Manus, from the Chinese startup of the same name, proved the most capable, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.

Insights from the CAIS Director

Dan Hendrycks, the director of CAIS, said, “I should hope this provides a far more accurate understanding of the current state of AI capabilities.” He cautioned that although some AI agents have improved substantially over the past year, there is no guarantee that this rate of progress will continue.

The AI Job-Displacement Speculation

Rapid advances in AI have fueled speculation that AI will soon surpass human intelligence and cause large-scale job displacement. In March, Dario Amodei, the CEO of Anthropic, suggested that 90 percent of coding work could be automated within a few months. Yet earlier waves of AI progress also produced predictions of job displacement that never materialized, such as the expected replacement of radiologists by AI algorithms.

Task Generation for the Experiment

The researchers generated a wide array of freelance tasks with the help of verified Upwork workers. These tasks spanned domains including graphic design, video editing, game development, and administrative work such as data scraping. Each task came with a job description, a directory of the necessary files, and an example of a finished project produced by a human.

AI’s Shortcomings in Complex Task Execution

Hendrycks noted that while AI models have made strides in coding, math, and logical reasoning in recent years, they still struggle to use multiple tools and to carry out complex, multi-step tasks. “AI lacks long-term memory storage and the ability to engage in continuous learning from experiences. Unlike humans, it cannot acquire new skills on the job,” he added.

Contrasting with OpenAI’s GDPval Benchmark

This analysis offers a counterpoint to OpenAI’s GDPval benchmark, introduced in September, which also claims to measure economically valuable work. According to GDPval, frontier AI models such as GPT-5 are approaching human performance on 220 tasks across a range of office jobs. OpenAI declined to comment.

Acknowledging the Benchmark’s Imperfections

Bing Liu, the research director at Scale AI, said, “We have debated the relationship between AI and jobs for years, but much of it has been hypothetical or theoretical.” Liu and Hendrycks acknowledge that the new benchmark is not a flawless measure of AI’s economic impact. Many professions involve tasks that it does not cover, and in practice many freelancers are likely to use AI as a productivity-enhancing tool rather than be replaced by it.

The AI Job-Loss Reality?

Nonetheless, the perception that AI is already causing job losses is growing. This week, Amazon announced 14,000 job cuts, attributing the move in part to the rapid rise of generative AI. Beth Galetti, Amazon’s senior vice president of people experience and technology, wrote in a public memo, “This generation of AI is the most transformative technology since the Internet. It enables companies to innovate at an unprecedented pace (in both existing and new market segments).” Yet if the Remote Labor Index is a reliable indicator, AI is unlikely to fill these vacated positions.

Reader Engagement

Are you concerned about AI taking your job? Share your thoughts by emailing ailab@wired.com.

This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
