Physical Address

304 North Cardinal St.
Dorchester Center, MA 02124

AI models need more standards and tests, say researchers


While the use of artificial – benign and adversary intelligence – increases at dizzying speed, more cases of potentially harmful responses are discovered.

Pixdeluxe | E + | Getty images

While the use of artificial – benign and adversary intelligence – increases at dizzying speed, more cases of potentially harmful responses are discovered. These include hate speech,, copyright violation Or sexual content.

The emergence of these undesirable behaviors is aggravated by a lack of regulations and insufficient tests of AI models, researchers told CNBC.

Being with automatic learning models to the way it was intended to do is also a major challenge, said Javier Rando, AI researcher.

“The answer, after almost 15 years of research, is, no, we do not know how to do this, and it does not seem that we were better,” said Rando, which focuses on contradictory automatic learning, at CNBC.

However, there are ways to assess risks in AI, such as red team. The practice implies that individuals test and probe artificial intelligence systems to discover and identify any potential damage – a common opera modus in cybersecurity circles.

Shayne LongPre, AI and Politicians and Responsible researcher Data initiativenoted that there are currently insufficient people working in red teams.

While IA startups now use first -party assessors or second parts contracted to test their models, the opening of tests to third parties such as normal users, journalists, researchers and ethical pirates would lead to a more robust assessment, according to An article published by LongPre and researchers.

“Some of the defects in the systems that people found obliged lawyers required, doctors to actually verify, real scientists who are specialized experts to determine whether it was a defect or not, because the ordinary person probably could not or would not have sufficient expertise,” said LongPre.

The adoption of standardized reports of “IA defects”, incentives and means to disseminate information on these “defects” in AI systems is some of the recommendations presented in the document.

This practice having been successfully adopted in other sectors such as software security, “we need it in AI now,” added LongPre.

Ban this user -centered practice with governance, policy and other tools would guarantee a better understanding of the risks posed by AI tools and users, Handey said.

We are looking for an AI development path which is extremely harmful to many people, explains Karen Hao

Plus a moonshot

Project Moonshot is one of these approaches, combining technical solutions with policy mechanisms. Launched by the Infocomm Media Development Authority of Singapore, Project Monshot is a model for model assessment of large language developed with industry players such as IBM and Boston IT robot.

The toolbox incorporates benchmarking, the red team and the reference tests. There is also an evaluation mechanism that allows AI startups to ensure that their models can be reliable and not to harm the users, the data engineering manager for data and the IBM Asia Pacific told CNBC.

Evaluation is a continuous process This should be done both before and after the deployment of models, said Kumar, who noted that the response to the toolbox was mixed.

“Many startups have taken this as a platform because it was open source, And they started to take advantage of this. But I think, you know, we can do much more. “”

In the future, Project Monshot aims to include personalization for specific use cases of industry and to allow a multilingual and multicultural red team.

Higher standards

Pierre Alquier, professor of statistics at ESSEC Business School, Asia-Pacific, said that technological companies are currently rush to release Their latest AI models without appropriate assessment.

“When a pharmaceutical company designs a new drug, it needs very serious tests and evidence that it is useful and not harmful before being approved by the government,” he noted, adding that a similar process is in place in the aviation sector.

AI models must fulfill a strict set of conditions before being approved, added Alquier. A distance from the wide AI tools to developments that are designed for more specific tasks would facilitate anticipation and control of their abusive use, said Alquier.

“LLM can do too much, but they are not intended for tasks that are sufficiently specific,” he said. Consequently, “the number of possible abuses is too large for the developers to be able to anticipate them”.

Such wide models make the definition of what matters as difficult and secure, according to A search for hike was involved in.

Technological companies should therefore avoid the overexraction that “their defenses are better than they are,” said Rando.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *