First Impressions of OpenAI o1: An AI Designed to Think

By Blink AI Team | First created: September 17, 2024



Updated on: September 17, 2024
OpenAI has unveiled its new o1 models, dubbed “Strawberry,” promising enhanced reasoning capabilities. Users are now getting their first taste of this new AI model, which pauses to “think” before providing answers. But does it live up to expectations? Well, the results are mixed.

The Hype vs. Reality

After months of buzz, Strawberry’s release is a step forward, but also a step back in some areas. While the o1 model shines at answering complex questions, it comes at a steep price: roughly four times that of the previous GPT-4o model. o1 also lacks some of GPT-4o’s standout features, such as multimodal abilities and speed. OpenAI itself acknowledges that GPT-4o remains the better option for most everyday prompts, since o1 struggles with simpler tasks.
According to Ravid Shwartz-Ziv, an NYU professor who studies AI, “It’s better at certain problems, but you don’t have this across-the-board improvement.” This reflects the core issue: Strawberry works well, but only for the right kind of tasks.

When to Use OpenAI o1: Big Questions Only

o1’s design centers on handling large, complex problems. Today’s generative AI models generally aren’t great at such problems, but o1 is a step in that direction. It uses “multi-step reasoning,” breaking a complex task into smaller chunks before arriving at an answer. This capability is groundbreaking, but it’s also what makes o1 expensive: OpenAI charges for these additional “reasoning tokens,” even though users may never see the intermediate steps.
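To see how those hidden steps show up on a bill, here is a minimal sketch using the OpenAI Python SDK. It assumes the o1-preview model name and the usage fields OpenAI documented at launch (completion_tokens_details.reasoning_tokens); model IDs and field names may change, so treat this as illustrative rather than definitive.

```python
# Minimal sketch: call an o1-series model and inspect how many hidden
# "reasoning" tokens were billed. Assumes the OpenAI Python SDK (v1.x)
# and the usage fields documented at the o1-preview launch.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "Plan oven usage for a Thanksgiving dinner for 11 people.",
        }
    ],
)

# The visible answer comes back as usual...
print(response.choices[0].message.content)

# ...but the model's hidden "thinking" appears only as billed reasoning tokens.
usage = response.usage
print("prompt tokens:    ", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)
print("reasoning tokens: ", usage.completion_tokens_details.reasoning_tokens)
```

Note that the reasoning tokens are counted inside completion_tokens and billed at the output rate, which is why a short visible answer from o1 can still cost far more than the same answer from GPT-4o.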

Thinking Through Big Ideas: A Unique Approach

What sets OpenAI o1 apart is its ability to break down large questions. For instance, when asked to help plan a Thanksgiving dinner for 11 people, it offered a 750+ word breakdown of how to manage oven space and even suggested renting a portable oven. This level of reasoning is what makes o1 powerful for intricate tasks. However, this approach can feel excessive for simpler queries.
For example, when asked where cedar trees can be found in the U.S., o1 delivered an exhaustive 800-word response. In contrast, GPT-4o provided a concise three-sentence answer. For straightforward tasks, o1 tends to overthink, making it less practical.

Strawberry’s Limitations: Is It Worth the Hype?

Despite its impressive reasoning capabilities, OpenAI o1 falls short in some ways. When rumors first emerged about Strawberry in 2023, many believed it was a step towards AGI (artificial general intelligence). However, OpenAI’s CEO, Sam Altman, clarified that o1 is not AGI. In fact, he tempered expectations, stating that the model is still flawed, limited, and “more impressive on first use than after extended interactions.”

What the AI Industry Thinks: Mixed Reactions

The reception within the AI community has been somewhat muted. Rohan Pandey, a research engineer at ReWorkd, believes that while o1 is good for solving specific, niche problems, it’s not revolutionary. Similarly, Mike Conover, CEO of Brightwave, said, “Everybody is waiting for a step-function change in capabilities, and it’s unclear whether this represents that.”

The Underlying Technology: Not Entirely New

The concepts behind o1 aren’t exactly groundbreaking. Techniques similar to o1’s reasoning process were used in Google DeepMind’s AlphaGo, which in 2016 became the first AI to defeat a world champion at the game of Go. AlphaGo used reinforcement learning, playing millions of games against itself to master its strategies. According to Andy Harrison, a former Googler and venture capitalist, this raises a longstanding debate in AI: should we focus on automating workflows, or should we aim for generalized intelligence that mimics human decision-making?

The Debate: Workflow Automation vs. Generalized Intelligence

There are two camps in AI thinking. Camp one believes that workflows can be automated using a step-by-step agentic process, much like o1’s reasoning. Camp two believes that generalized intelligence would eliminate the need for such workflows by enabling AI to make decisions more like a human. Harrison is a proponent of the first camp, arguing that we’re not yet ready to trust AI to make human-like decisions without structured reasoning.

Real-World Applications: When o1 Excels

Despite its flaws, OpenAI o1 can be a valuable tool for those working through big ideas. For example, Kian Katanforoosh, CEO of Workera, explains how o1 could help him structure a 30-minute job interview, working backward from his goals to determine the best way to assess a candidate’s skills. In scenarios like this, o1’s reasoning abilities can provide significant value.

The Cost of OpenAI o1: Is It Worth It?

The big question remains: is OpenAI o1’s enhanced reasoning worth the hefty price tag? While AI models have generally gotten cheaper over time, o1 stands out as an anomaly, one of the first models to actually increase in cost. Whether that cost is justified depends largely on the type of problems users are trying to solve.
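To make the price difference concrete, here is a rough cost sketch. The per-million-token prices below are approximate list prices around the o1-preview launch and the token counts are hypothetical; check OpenAI’s current pricing before drawing conclusions, since rates change frequently.

```python
# Back-of-the-envelope cost comparison. The prices are approximate USD per
# 1M tokens around the o1-preview launch (September 2024) and the token
# counts are hypothetical; both are placeholders, not current figures.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input":  5.00, "output": 15.00},
}

def request_cost(model, prompt_tokens, visible_output_tokens, reasoning_tokens=0):
    """Hidden reasoning tokens are billed at the output-token rate."""
    p = PRICES[model]
    billed_output = visible_output_tokens + reasoning_tokens
    return (prompt_tokens * p["input"] + billed_output * p["output"]) / 1_000_000

# Hypothetical request: a 1,000-token prompt and an 800-token visible answer,
# with o1 spending an extra 3,000 hidden reasoning tokens along the way.
print(f"o1-preview: ${request_cost('o1-preview', 1_000, 800, 3_000):.4f}")  # ~$0.24
print(f"gpt-4o:     ${request_cost('gpt-4o', 1_000, 800):.4f}")             # ~$0.02
```

On numbers like these, the gap is driven less by the headline per-token price than by the reasoning tokens themselves, which is why the answer depends so heavily on the kind of problem being solved.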

Final Verdict: A Niche Tool for Specific Problems

In conclusion, OpenAI o1 is a specialized tool designed for tackling big, complex problems. While it offers groundbreaking reasoning abilities, it lacks the versatility, speed, and affordability of its predecessor, GPT-4o. For everyday tasks, GPT-4o remains the better option, but for users who need AI to “think” through complicated issues, o1 might be worth the investment. Just make sure the problem justifies the cost.