
Google’s Gemini transparency cut leaves enterprise developers ‘debugging blind’


Google's recent decision to hide the raw reasoning tokens of its flagship model, Gemini 2.5 Pro, has drawn a fierce reaction from developers who rely on that transparency to build and debug their prompts.

The change, which echoes a similar move by OpenAI, replaces the model's step-by-step reasoning with a simplified summary. The response highlights a critical tension between crafting a polished user experience and providing the observable, trustworthy tools that enterprises need.

As enterprises integrate large language models (LLMs) into more complex and mission-critical systems, the debate over how much of a model's internal workings should be exposed is becoming a defining issue for the industry.

A "fundamental downgrade" in AI transparency

To solve complex problems, advanced AI models generate an internal monologue, also referred to as the "chain of thought" (CoT). This is a series of intermediate steps (for example, a plan, a draft of code, a self-correction) that the model produces before arriving at its final answer. It might reveal, for instance, how the model is processing data, which pieces of information it is using, or how it is evaluating its own code.

For developers, this reasoning trail often serves as an essential diagnostic and debugging tool. When a model returns an incorrect or unexpected output, the thinking process reveals where its logic went astray. And it had become one of Gemini 2.5 Pro's key advantages over OpenAI's o1 and o3.

On Google's AI developer forum, users called the removal of this feature a "massive regression." Without it, developers are left in the dark. One described being forced to "guess" why the model failed, leading to "incredibly frustrating repetitive loops trying to fix things."

Beyond debugging, this transparency is crucial for building sophisticated AI systems. Developers rely on the CoT to fine-tune prompts and system instructions, which are the primary means of steering a model's behavior. The feature is especially important for building agentic workflows, where the AI must execute a series of tasks. One developer noted: "The CoTs helped enormously in tuning agentic workflows correctly."

For enterprises, this shift toward opacity can be problematic. Black-box models that hide their reasoning introduce significant risk, making it difficult to trust their outputs in high-stakes scenarios. This trend, started by OpenAI's o-series reasoning models and now adopted by Google, creates a clear opening for open-source alternatives such as DeepSeek-R1 and QwQ-32B.

Models that provide full access to their reasoning chains give enterprises more control and transparency over model behavior. The decision for a CTO or AI lead is no longer just about which model posts the highest benchmark scores. It is now a strategic choice between a top-performing but opaque model and a more transparent one that can be integrated with greater confidence.

Google's response

In response to the outcry, members of the Google team explained their rationale. Logan Kilpatrick, a senior product manager at Google DeepMind, clarified that the change was "purely cosmetic" and does not affect the model's internal performance. He noted that for the consumer-facing Gemini app, hiding the lengthy thinking process creates a cleaner user experience. "The % of people who will or do read thoughts in the Gemini app is very small," he said.

For developers, the new summaries were intended as a first step toward programmatic access to reasoning traces through the API, which was not previously possible.
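As a rough illustration of what that programmatic access looks like in practice, the minimal sketch below uses Google's google-genai Python SDK as publicly documented; the model name, the include_thoughts flag, and the part.thought field reflect the current documentation rather than anything stated in this article, and the prompt is a placeholder.

```python
# Minimal sketch: requesting thought summaries alongside the final answer via
# the google-genai Python SDK. Field names follow the public docs at the time
# of writing and may change; the prompt below is purely illustrative.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Outline a migration plan from REST polling to webhooks.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Thought-summary parts are flagged separately from the answer parts.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("Thought summary:\n", part.text)
    else:
        print("Answer:\n", part.text)
```

Whether summaries at this level of detail are enough for real debugging is precisely what the forum complaints dispute.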

The Google team did acknowledge the value of raw thoughts for developers. "I hear that you all want raw thoughts, the value is clear, there are use cases that require them," Kilpatrick wrote, adding that bringing the feature back to developers is "something we can explore."

Google's reaction to the developer backlash suggests a middle ground is possible, perhaps through a "developer mode" that re-enables access to raw thoughts. The need for observability will only grow as AI models evolve into more autonomous agents that use tools and execute complex, multi-step plans.

As Kilpatrick concluded in his remarks, "…I can easily imagine raw thoughts becoming a critical requirement of all AI systems given the increasing complexity and need for observability + tracing."

Are reasoning tokens overrated?

However, experts suggest there is a deeper dynamic at play than just user experience. Subbarao Kambhampati, an AI professor at Arizona State University, questions whether the "intermediate tokens" a reasoning model produces before its final answer can be used as a reliable guide to understanding how the model solves problems. A paper he recently co-authored argues that anthropomorphizing "intermediate tokens" as "reasoning traces" or "thoughts" can have dangerous implications.

Models often wander down endless and unintelligible tangents in their reasoning process. Several experiments show that models trained on false reasoning traces paired with correct results can learn to solve problems just as well as models trained on well-curated reasoning traces. Moreover, the latest generation of reasoning models is trained through reinforcement learning algorithms that only verify the final result and do not evaluate the model's "reasoning trace."

"The fact that intermediate token sequences often look like better-formatted and better-spelled human scratch work… does not tell us much about whether they are used for the same purposes that humans use them for, let alone whether they can serve as an interpretable window into what the LLM is 'thinking,' or as a reliable justification for the final answer," the researchers write.

"Most users can't make out anything from the volumes of raw intermediate tokens that these models spit out," Kambhampati told VentureBeat. "As we mention, DeepSeek R1 produces 30 pages of pseudo-English to solve a simple planning problem! A cynical explanation of why o1/o3 decided not to show the raw tokens in the first place is that they realized people would notice how incoherent they are!"

That said, Kambhampati suggests that summaries or post-facto explanations are likely to be more comprehensible for end users. "The issue becomes to what extent they are actually indicative of the internal operations the LLM went through," he said. "For instance, as a teacher, I might solve a new problem with many false starts and backtracks, but explain the solution in the way I think best facilitates student comprehension."

The decision to hide the CoT also serves as a competitive moat. Raw reasoning traces are incredibly valuable training data. As Kambhampati notes, a competitor can use these traces to perform "distillation," the process of training a smaller, cheaper model to mimic the capabilities of a more powerful one. Hiding raw thoughts makes it much harder for rivals to copy a model's secret sauce, a crucial advantage in a resource-intensive industry.
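For readers unfamiliar with the term, the sketch below is a generic, hypothetical PyTorch illustration of the distillation idea described above: a smaller "student" model is fine-tuned on text generated by a stronger "teacher," such as prompts paired with the teacher's reasoning trace and final answer. It is not how Google, OpenAI, or any particular lab implements it, and student_model and the batch fields are placeholders for any HF-style causal language model and dataset.

```python
# Hypothetical sketch of sequence-level distillation: a student model is
# fine-tuned on teacher-generated text (prompt + reasoning trace + answer).
import torch
import torch.nn.functional as F

def imitation_loss(student_model, input_ids, labels):
    # Standard next-token cross-entropy on the teacher-generated sequence.
    logits = student_model(input_ids).logits            # (batch, seq, vocab)
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,                               # mask prompt/padding tokens
    )

def distillation_step(student_model, optimizer, batch):
    # batch["input_ids"]: tokenized prompt + teacher trace + answer.
    # batch["labels"]: same sequence with prompt positions set to -100.
    loss = imitation_loss(student_model, batch["input_ids"], batch["labels"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Exposed raw traces would simply give a competitor much richer targets to imitate than final answers alone, which is why hiding them doubles as a business decision.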

The debate over chain of thought is a preview of a much larger conversation about the future of AI. There is still a great deal to learn about the inner workings of reasoning models, how we can take advantage of them, and how far model providers are willing to go in giving developers access to them.


