QwenLong-L1 solves a long-context reasoning challenge that stumps current LLMs




Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human "slow thinking," where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are mainly observed when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the capacity to perform multi-step analysis. "This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments," the QwenLong-L1 developers write in their paper.

The researchers formalize these challenges in the concept of "long-context reasoning RL." Unlike short-context reasoning, which often draws on knowledge already stored in the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs accurately. Only then can they generate chains of reasoning based on this incorporated information.

Training models for this via RL is tricky and often results in inefficient learning and unstable optimization. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: a multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

Warm-up supervised fine-tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This stage establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-guided phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-aware retrospective sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.
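The curriculum and retrospective-sampling stages above can be sketched in a few lines. Note that the data fields (`input_len`, `difficulty`), the length thresholds, and the ranking rule below are illustrative assumptions for this sketch, not values taken from the paper.

```python
def build_training_phases(examples, length_targets=(20_000, 60_000, 120_000)):
    """Split a long-context dataset into curriculum phases by input length.

    Each phase admits progressively longer inputs, so the model adapts
    to longer contexts step by step instead of all at once.
    """
    phases = []
    for max_len in length_targets:
        phases.append([ex for ex in examples if ex["input_len"] <= max_len])
    return phases


def retrospective_sample(previous_phases, k):
    """Carry the k hardest examples from earlier phases into the final stage.

    "Difficulty" here is any per-example hardness score (e.g. low historical
    reward); higher scores are sampled first.
    """
    seen = [ex for phase in previous_phases for ex in phase]
    seen.sort(key=lambda ex: ex["difficulty"], reverse=True)
    return seen[:k]
```

In practice, the difficulty score would come from the model's own reward history during earlier RL phases, so the final stage keeps pressure on exactly the examples the model has not yet mastered.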

QwenLong-L1 process (source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking strict adherence to correctness criteria, with an "LLM-as-a-judge." The judge model compares the semantics of the generated answer against the ground truth, allowing greater flexibility and better handling of the diverse ways correct answers can be expressed when dealing with long, nuanced documents.
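A hybrid reward of this kind could be sketched as follows. The judge's prompt wording, its yes/no interface, and the use of `max()` to combine the two signals are assumptions of this sketch rather than the paper's exact formulation.

```python
def rule_based_reward(answer: str, gold: str) -> float:
    """Strict check: exact match after simple normalization (illustrative rule)."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def judge_reward(answer: str, gold: str, judge) -> float:
    """Ask a judge LLM whether the answer is semantically equivalent to gold.

    `judge` is any callable that takes a prompt string and returns text
    starting with "yes" or "no" (a stand-in for a real LLM call).
    """
    verdict = judge(
        f"Candidate answer: {answer}\nReference answer: {gold}\n"
        "Are these semantically equivalent? Answer yes or no."
    )
    return 1.0 if verdict.strip().lower().startswith("yes") else 0.0


def hybrid_reward(answer: str, gold: str, judge) -> float:
    """Combine both signals: either an exact match or a judged-equivalent
    paraphrase earns full credit, which suits long, nuanced documents."""
    return max(rule_based_reward(answer, gold), judge_reward(answer, gold, judge))
```

Taking the more generous of the two signals means the strict rule never penalizes a correct paraphrase, while the rule-based check still anchors the reward when the judge is unavailable or unreliable.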

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks showed QwenLong-L1's capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Claude-3.7 Sonnet Thinking, and outperformed models such as OpenAI's o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google's Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important observation relevant to real-world applications is how RL training leads the model to develop specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at "grounding" (linking answers to specific parts of a document), "subgoal setting" (breaking down complex questions), "backtracking" (recognizing and correcting their own mistakes mid-reasoning), and "verification" (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of unrelated analysis, the QwenLong-L1-trained model demonstrated an ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the usefulness of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research into annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer-interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.


