Is Coding Dead? Dissecting Devin AI

3 engineering experts break down the hype.

Apr 18, 2024

AI startup Cognition Labs is on a mission: build AI teammates, not AI tools.

What's the difference?

Meet Devin AI, the "first fully autonomous software engineer." Cognition claims that Devin has far greater capabilities than any existing AI tool. He's marketed as a "tireless, skilled teammate" who can execute complex projects unassisted and learn from his mistakes.

Naturally, many developers are worried. A Fireship video titled "AI just officially took our jobs…I hate you Devin" has 2.1 million views. In the comments, developers are speculating about where they'll be in 5 years — predictions include "gardening," "learning a trade," and "begging on the road."

Alarmist content about AI replacing developers is nothing new. However, Devin's bold marketing has kicked AI panic into high gear.

At the same time, many experienced software engineers are scrutinizing Devin's capabilities and pushing back on the hype.

After sifting through dozens of blogs and videos so you don't have to, our team at DevPath selected and recapped the 3 most incisive reviews of Devin AI.

Take a look and decide for yourself what you think of Devin!

Internet of Bugs

To showcase Devin's abilities, Cognition had him solve a real Upwork case. This demo became the centerpiece of Devin AI's promotional videos.

The creator of Internet of Bugs, a software engineer with 35+ years of experience, replicates Devin's approach to the Upwork task and finds glaring issues in how he operates. For example, towards the beginning of the demo, the Upwork client asked Devin to generate setup instructions. Instead, Devin provided unrelated code.

The issues only grew from there. At one point, Devin was shown fixing errors in a GitHub repository. However, the files he worked on don't actually exist in that repository — suggesting that Devin created mistakes in order to fix them (a detail not disclosed in the demo).

Internet of Bugs points out that the demo overplayed the need for Devin's involvement. The existing README already contained all the necessary instructions for his task. Devin was only needed for minor tweaks, but contributed unnecessary and redundant code.

Finally, the demo made it seem like Devin solved the task quickly — but chat timestamps told a different story. Devin's work stretched over many hours, while Internet of Bugs could replicate the task in approximately 30 minutes.

Between the demo's deceptive editing and Devin's overcomplicating of the task (plus more inaccuracies detailed in the full video), Internet of Bugs concludes that Devin is a superficial use of AI.

Machine Learning Made Simple

A machine learning engineer and AI researcher, Devansh didn't initially feel compelled to review Devin AI's capabilities. After all, he'd already written extensively about the limitations of LLM coders, and Cognition hadn't advanced the technology so much as packaged it in a new way.

However, Devansh decided to address the topic after seeing Internet of Bugs dissect the infamous Upwork demo.

In an analysis of Cognition's hype tactics, Devansh expresses concern about the tech industry's inclination toward sensationalism and hero worship, which can obscure the real value and limitations of technologies like Devin.

To make this point, Devansh analyzes two demos in addition to the Upwork video.

In the video "AI finds and fixes a bug that I didn’t catch!", a developer uses Devin to debug an algorithm in one of his GitHub repositories. Devansh points out that the algorithm is for competitive programming, a known strength of AI coding tools.

Sure, Devin found the bug. But he did so with a clearly defined problem, expected inputs and outputs, and prior training data. Not exactly a newsworthy accomplishment.

In one demo highlight, Devin sources and resolves an error in some test cases by adding his own code. It's very impressive for an AI coding assistant — less so for a "fully autonomous AI engineer."

The second demo, "Our AI software engineer fixes a bug in Python algebra system," is more of the same: Devin solves a straightforward problem. However, straightforward problems make up a very small percentage of human developers' work. Devin simply doesn't have the skills to navigate ambiguity, make architectural tradeoffs, or interface with stakeholders.

In short: PR campaigns use techniques like cherry-picking and bait-and-switch to create hype around products that can't deliver. Devin AI is a very powerful coding assistant, not your replacement.

The Pragmatic Engineer

Software engineer and tech writer Gergely Orosz applies a critical lens to Devin's performance on SWE-bench. Devin scored significantly higher than other AI coding tools, resolving 1 in 7 GitHub issues unassisted. This surpassed the previous record for state-of-the-art AI tools, which resolved 4.8% of issues unassisted.
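To put those two figures on the same scale, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above ("1 in 7" and 4.8%). The variable names are illustrative, and "1 in 7" is a rounded figure, so treat the results as approximate.

```python
# Figures as quoted in this article; variable names are illustrative.
devin_rate = 1 / 7       # "1 in 7" GitHub issues resolved unassisted
previous_best = 0.048    # previous record: 4.8% of issues resolved unassisted

# How many times higher Devin's resolution rate is than the previous record.
improvement = devin_rate / previous_best

# Share of issues that still go unresolved without human help.
unresolved = 1 - devin_rate

print(f"Devin: {devin_rate:.1%} of issues resolved")
print(f"Previous record: {previous_best:.1%}")
print(f"Relative improvement: about {improvement:.1f}x")
print(f"Left unresolved: {unresolved:.1%}")
```

On these numbers, Devin resolves roughly 14.3% of issues, about three times the previous record, while roughly 85.7% of issues remain unresolved.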

There is reason to hype up Devin — but not as a competitor with human developers. Junior developers can solve at least 1 in 7 GitHub issues. As senior developers, they'll ideally be able to solve all 7. Even the most powerful AI tool isn't yet competitive with junior developers.

Gergely contextualizes Devin within the broader tech landscape, where major source control platforms have cornered the market on AI coding assistants. Between GitHub Copilot, Cody, and Replit AI, the market is highly saturated. It's nearly impossible for startups like Cognition to break through the noise — unless they make bold claims about their technology (e.g., we're replacing developers).

At the end of the day, LLMs often produce code that is syntactically correct, but they may also generate misleading information. We still need human developers to test and verify this code. Plus, we have tons to learn about how LLMs perform over time on unfamiliar technologies.

Of course, the way software developers work is rapidly changing, and will continue to do so. But the notion of a "fully autonomous AI developer" is far removed from reality — albeit an effective sales pitch.

