Anthropic launches a new AI model that ‘thinks’ as long as you want

Spread the love

Anthropic is releasing a new frontier AI model called Claude 3.7 Sonnet, which the company designed to “think” about questions for as long as users want it to.

Anthropic calls Claude 3.7 Sonnet the industry’s first “hybrid AI reasoning model,” because it’s a single model that can give both real-time answers and more considered, “thought-out” answers to questions. Users can choose whether to activate the AI model’s “reasoning” abilities, which prompt Claude 3.7 Sonnet to “think” for a short or long period of time.

The model represents Anthropic’s broader effort to simplify the user experience around its AI products. Most AI chatbots today have a daunting model picker that forces users to choose from several different options that vary in cost and capability. Labs like Anthropic would rather you not have to think about it — ideally, one model does all the work.

Claude 3.7 Sonnet is rolling out to all users and developers on Monday, Anthropic said, but only users paying for Anthropic’s premium Claude chatbot plans will get access to the model’s reasoning features. Free Claude users will get the standard, non-reasoning version of Claude 3.7 Sonnet, which Anthropic claims outperforms its previous frontier AI model, Claude 3.5 Sonnet. (Yes, the company skipped a number.)

Blinking Photo Ad

Claude 3.7 Sonnet costs $3 per million input tokens (meaning you could enter roughly 750,000 words, more words than the entire Lord of the Rings series, into Claude for $3) and $15 per million output tokens. That makes it more expensive than OpenAI’s o3-mini ($1.10 per 1M input tokens/$4.40 per 1M output tokens) and DeepSeek’s R1 ($0.55 per 1M input tokens/$2.19 per 1M output tokens), but keep in mind that o3-mini and R1 are strictly reasoning models — not hybrids like Claude 3.7 Sonnet.

Anthropic’s new thinking modes Image Credits: Anthropic

Claude 3.7 Sonnet is Anthropic’s first AI model that can “reason”, a technique many AI labs have turned to as traditional methods of improving AI performance taper off.

Reasoning models like o3-mini, R1, Google’s Gemini 2.0 Flash Thinking, and xAI’s Grok 3 (Think) use more time and computing power before answering questions. The models break problems down into smaller steps, which tends to improve the accuracy of the final answer. Reasoning models aren’t thinking or reasoning like a human would, necessarily, but their process is modeled after deduction.

Eventually, Anthropic would like Claude to figure out how long it should “think” about questions on its own, without needing users to select controls in advance, Anthropic’s product and research lead, Diane Penn, told TechCrunch in an interview.

“Similar to how humans don’t have two separate brains for questions that can be answered immediately versus those that require thought,” Anthropic wrote in a blog post shared with TechCrunch, “we regard reasoning as simply one of the capabilities a frontier model should have, to be smoothly integrated with other capabilities, rather than something to be provided in a separate model.”

Anthropic says it’s allowing Claude 3.7 Sonnet to show its internal planning phase through a “visible scratch pad.” Lee told TechCrunch users will see Claude’s full thinking process for most prompts, but that some portions may be redacted for trust and safety purposes.

Claude’s thinking process in the claude app (Credit: Anthropic)

Anthropic says it optimized Claude’s thinking modes for real-world tasks, such as difficult coding problems or agentic tasks. Developers tapping Anthropic’s API can control the “budget” for thinking, trading speed and cost for quality of answer.

On one test to measure real-word coding tasks, SWE-Bench, Claude 3.7 Sonnet was 62.3% accurate, compared to OpenAI’s o3-mini model which scored 49.3%. On another test to measure an AI model’s ability to interact with simulated users and external APIs in a retail setting, TAU-Bench, Claude 3.7 Sonnet scored 81.2%, compared to OpenAI’s o1 model which scored 73.5%.

Anthropic also says Claude 3.7 Sonnet will refuse to answer questions less often than its previous models, claiming the model is capable of making more nuanced distinctions between harmful and benign prompts. Anthropic says it reduced unnecessary refusals by 45% compared to Claude 3.5 Sonnet. This comes at a time when some other AI labs are rethinking their approach to restricting their AI chatbot’s answers.

In addition to Claude 3.7 Sonnet, Anthropic is also releasing an agentic coding tool called Claude Code. Launching as a research preview, the tool lets developers run specific tasks through Claude directly from their terminal.

In a demo, Anthropic employees showed how Claude Code can analyze a coding project with a simple command such as, “Explain this project structure.” Using plain English in the command line, a developer can modify a codebase. Claude Code will describe its edits as it makes changes, and even test a project for errors or push it to a GitHub repository.

Claude Code will initially be available to a limited number of users on a “first come first serve” basis, an Anthropic spokesperson told TechCrunch.

Anthropic is releasing Claude 3.7 Sonnet at a time when AI labs are shipping new AI models at a breakneck pace. Anthropic has historically taken a more methodical, safety-focused approach. But this time, the company’s looking to lead the pack.

For how long is the question. OpenAI may be close to releasing a hybrid AI model of its own; the company’s CEO, Sam Altman, has said it’ll arrive in “months.”

Anthropic launches a new AI model that ‘thinks’ as long as you want | TechCrunch

Leave a Comment Cancel Reply