
Model minimalism: The new AI strategy saving companies millions


This article is part of VentureBeat's special issue, "The Real Cost of AI: Performance, Efficiency and ROI at Scale." Read more from this special issue.

The advent of large language models (LLMs) has made it possible for enterprises to envision the kinds of projects they can undertake, leading to a surge in pilot programs now transitioning to deployment.

However, as these projects gained momentum, enterprises realized that the earlier LLMs they had used were unwieldy and, worse, expensive.

Enter small language models and distillation. Models like Google's Gemma family, Microsoft's Phi and Mistral's Small 3.1 have allowed businesses to choose fast, accurate models that work for specific tasks. Enterprises can opt for a smaller model for particular use cases, allowing them to lower the cost of running their AI applications and potentially achieve a better return on investment.

LinkedIn distinguished engineer Karthik Ramgopal told VentureBeat that companies opt for smaller models for a few reasons.

"Smaller models require less compute, memory and faster inference times, which translates directly into lower infrastructure OPEX (operational expenditures) and CAPEX (capital expenditures) given GPU costs, availability and power requirements," Ramgopal said. "Task-specific models have a narrower scope, making their behavior more aligned and maintainable over time without complex prompt engineering."

Model developers price their small models accordingly. OpenAI's o4-mini costs $1.10 per million tokens for inputs and $4.40 per million tokens for outputs, compared to the full o3 version at $10 for inputs and $40 for outputs.
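To make those per-million-token prices concrete, here is a minimal sketch of how a team might project workload cost at the rates quoted above. The workload figures (1,000 requests at 2,000 input and 500 output tokens each) are hypothetical assumptions for illustration, not from the article.

```python
# Illustrative cost projection using the per-million-token prices quoted above.
# The workload numbers below are hypothetical assumptions.

def workload_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Total dollar cost, given prices per million input/output tokens."""
    total_in = requests * in_tokens / 1_000_000   # millions of input tokens
    total_out = requests * out_tokens / 1_000_000  # millions of output tokens
    return total_in * in_price + total_out * out_price

workload = dict(requests=1_000, in_tokens=2_000, out_tokens=500)

o4_mini = workload_cost(**workload, in_price=1.10, out_price=4.40)
o3 = workload_cost(**workload, in_price=10.00, out_price=40.00)

print(f"o4-mini: ${o4_mini:.2f}")  # → o4-mini: $4.40
print(f"o3:      ${o3:.2f}")       # → o3:      $40.00
```

At these rates the smaller model runs the same hypothetical workload at roughly a tenth of the cost, which is the arithmetic driving the pricing comparison above.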

Today, enterprises have a larger pool of small models, task-specific models and distilled models to choose from. These days, most flagship models come in a range of sizes. For example, Anthropic's Claude family of models comprises Claude Opus, the largest model; Claude Sonnet, the all-purpose model; and Claude Haiku, the smallest version. The most compact of these are small enough to run on portable devices, such as laptops or phones.

The question of savings

When discussing return on investment, though, the question is always: What does ROI look like? Should it be a return on the costs incurred, or the time savings that ultimately translate into dollars saved? Experts VentureBeat spoke to said ROI can be difficult to judge, because some companies feel they have already reached ROI by cutting the time spent on a task, while others are waiting for actual dollars saved, or more business brought in, to say whether their AI investments have truly worked.

Normally, enterprises calculate ROI with a simple formula, as described by Cognizant chief technologist Ravi Tola in a post: ROI = (benefits − costs) / costs. But with AI programs, the benefits are not immediately apparent. He suggests that enterprises identify the benefits they expect to achieve, estimate them based on historical data, be realistic about the overall cost of AI, including hiring, implementation and maintenance, and understand that they have to be in it for the long haul.
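The formula above is simple enough to sketch directly. The dollar figures in the example are hypothetical, for illustration only; in practice the hard part is the estimation of the benefit and cost inputs, not the division.

```python
# Minimal sketch of the ROI formula described above: ROI = (benefits - costs) / costs.
# The example figures are hypothetical assumptions, not from the article.

def roi(benefits: float, costs: float) -> float:
    """Return on investment as a fraction of total costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs

# e.g. $150k of estimated benefit against $100k of total AI cost
# (hiring, implementation and maintenance included)
print(f"{roi(150_000, 100_000):.0%}")  # → 50%
```

A negative result simply means the program has not yet recouped its costs, which, per the point above about long time horizons, may be the expected state early on.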

With small models, experts argue that implementation and maintenance costs come down, especially when fine-tuning models to provide them with more context about your enterprise.

Arijit Sengupta, founder and CEO of Aible, said that how people bring context to models dictates how much cost savings they can get. For those who need additional context in prompts, such as long and complex instructions, this can result in higher token costs.

"You have to give models context one way or another," he said. "Think of fine-tuning and post-training as an alternative way of giving models context. I might incur $100 in post-training costs, but it's not astronomical."

Sengupta said they have seen roughly 100x cost reductions from post-training alone, dropping the cost of using the model "from single-digit millions to something like $30,000." He stressed that this number includes software operating expenses and the ongoing cost of the model and vector databases.

"In terms of maintenance cost, if you do it manually with human experts, it can be expensive to maintain, because small models need to be post-trained to produce results comparable to large models," he said.

Experiments the company conducted showed that a task-specific, fine-tuned model performs well for some use cases, just like LLMs, making it more cost-effective to deploy several use-case-specific models rather than one large model to do everything.

The company compared a post-trained version of Llama-3.3-70B Instruct to a smaller 8B-parameter option of the same model. The 70B model, post-trained for $11.30, was 84% accurate in automated evaluations and 92% in manual evaluations. Once fine-tuned at a cost of $4.58, the 8B model achieved 82% accuracy in manual evaluation, which would be suitable for smaller, more targeted use cases.
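One way to operationalize that comparison is a simple selection rule: take the cheapest candidate that clears an accuracy bar for the use case. The sketch below uses the post-training costs and manual-evaluation accuracies reported above; the selection rule itself and the shorthand model labels are illustrative assumptions, not Aible's method.

```python
# Hedged sketch: picking the cheapest model that meets an accuracy bar,
# using the post-training cost and manual-eval accuracy figures above.
# The selection rule is an illustrative assumption, not the company's method.

candidates = [
    {"model": "70B (post-trained)", "train_cost": 11.30, "accuracy": 0.92},
    {"model": "8B (fine-tuned)",    "train_cost": 4.58,  "accuracy": 0.82},
]

def pick(candidates, min_accuracy):
    """Cheapest candidate meeting the accuracy bar, or None if none qualifies."""
    viable = [c for c in candidates if c["accuracy"] >= min_accuracy]
    return min(viable, key=lambda c: c["train_cost"]) if viable else None

print(pick(candidates, 0.80)["model"])  # the cheaper 8B model clears this bar
print(pick(candidates, 0.90)["model"])  # only the 70B model clears this one
```

The point of the sketch is that "best model" depends on the accuracy the use case actually requires: a 10-point accuracy gap may or may not be worth more than double the cost.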

Fit-for-purpose cost factors

Right-sizing models does not have to come at the cost of performance. These days, organizations understand that model choice doesn't just mean picking between GPT-4o or Llama-3.1; it's knowing that some use cases, like summarization or code generation, are better served by a small model.

Daniel Hoske, chief technology officer at contact center AI product provider Cresta, said starting development with LLMs better informs potential cost savings.

"You should start with the biggest model to see if what you're envisioning even works at all, because if it doesn't work with the biggest model, it doesn't mean it would with smaller models," he said.

Ramgopal said LinkedIn follows a similar pattern, because prototyping is the only way these problems start surfacing.

"Our typical approach for agentic use cases begins with general-purpose LLMs, as their broad generalization allows us to rapidly prototype, validate hypotheses and assess product-market fit," LinkedIn's Ramgopal said. "As the product matures and we encounter constraints around quality, cost or latency, we transition to more customized solutions."

In the experimentation phase, organizations can determine what they value most from their AI applications. Figuring this out lets developers plan better what they want to save on, and select the model size that best suits their purpose and budget.

The experts cautioned that while it's important to build with models that work best with what they're developing, high-parameter LLMs will always be more expensive. Large models will always require significant computing power.

However, overusing small and task-specific models also poses problems. Rahul Pathak, vice president of data and AI GTM at AWS, said in a blog post that cost optimization comes not just from using a model with low compute needs, but rather from matching a model to the task. Smaller models may not have a sufficiently large context window to understand more complex instructions, leading to increased workload for human employees and higher costs.

Sengupta also cautioned that some distilled models can be brittle, so long-term use may not result in savings.

Evaluate

Regardless of model size, industry players emphasized the flexibility to address any potential issues or new use cases. So if they start with a large model and later find a smaller model with similar or better performance at lower cost, organizations shouldn't be precious about their chosen model.

Tessa Burg, CTO and head of innovation at brand marketing company Mod Op, told VentureBeat that organizations have to understand that whatever they build now will always be superseded by a better version.

"We started with the mindset that the tech underneath the workflows that we're creating, the processes that we're making more efficient, are going to change. We knew that whatever model we use will be the worst version of a model."

Burg said smaller models have helped save her company and its clients time in researching and developing concepts. Time saved, she said, leads to budget savings over time. She added that it's a good idea to break out high-cost, high-frequency use cases for lightweight models.

Sengupta noted that vendors are making it easier to switch between models automatically, but cautioned users to find platforms that also facilitate fine-tuning, so they don't incur additional costs.


