🚨 Alibaba Launches OpenAI Rival That Beats o1-mini in Math & Code Reasoning



On Eleven Labs' NotebookLM alternative, ChatGPT integration with dev apps, Yann LeCun on AGI…

Signup  |  Work With Us  |  Follow on X  |  Read on Web


Hey,

Welcome to AlphaSignal – the most read newsletter by AI developers. 

We bring you the top 1% of news, papers, models, and repos, all summarized to keep you updated on the latest in AI.

IN TODAY’S SIGNAL

Read time: 5 min 38 sec

🎖️ Top News

📌 Intel

⚡️ Trending Signals

📝 Top Papers

  • Star Attention reduces LLM memory usage and inference time by up to 11x, enhancing long-sequence tasks.

  • DynaSaur framework allows LLM agents to dynamically generate and execute actions, improving flexibility.

  • Allegro outperforms commercial video generation models, improving quality and temporal consistency.

🧠 Python Tip

  • Track parallel task progress with Python’s ‘multiprocessing.Queue’ for real-time monitoring in ML workflows.

If you’re enjoying AlphaSignal please forward this email to a colleague. 

It helps us keep this content free.

TOP NEWS

Reasoning Model

Alibaba introduces an open-source model that reasons step-by-step and excels in math and programming tasks

⇧ 2,662 Likes

What’s New

Alibaba’s Qwen team releases QwQ-32B-Preview, an open-source reasoning model that competes with OpenAI’s o1 series.

The model offers a 32K context window, surpassing o1-mini and directly competing with o1-preview on key benchmarks. It focuses on solving complex problems in mathematics and programming through a deep introspective reasoning process.

Key Performance Metrics

  • 65.2% on GPQA, showing strong graduate-level scientific reasoning.
  • 50.0% on AIME, demonstrating good mathematical problem-solving skills.
  • 90.6% on MATH-500, proving solid mathematical comprehension across topics.
  • 50.0% on LiveCodeBench, validating its programming capabilities in real-world scenarios.

Core Innovation
QwQ’s core innovation lies in its deep introspection process, where it rethinks and refines its answers during problem-solving. This ability lets it outperform comparable models on math and coding benchmarks, though its weaker results elsewhere highlight the need for further development in areas like common sense and nuanced language understanding.

Challenges
Despite its strengths, QwQ-32B-Preview still faces challenges. These include entering recursive reasoning loops, occasional language mixing, and difficulties with common sense reasoning. These issues limit the model’s consistency in certain tasks, though it performs well in highly technical domains.

Availability

  • QwQ-32B-Preview is available for use on Hugging Face, where you can access the model, its documentation, and a demo.

  • The model is licensed under Apache-2.0 and can be integrated using Hugging Face’s transformers library (version 4.37.0 or later).
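For a quick start, the model card follows the standard Qwen chat pattern with transformers. A minimal sketch, assuming a GPU with enough memory for a 32B model (the prompt below is made up for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer  # requires transformers>=4.37.0

model_name = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt and let the model reason step-by-step
messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

Expect long outputs: the model's introspective reasoning style generates many intermediate steps before the final answer.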


TRY NOW

Ready to elevate your Gen AI on solid, data-driven foundations?

Intel’s guide for data scientists shows you exactly how to optimize your workflow and boost your AI performance.

By following this guide, you will:

  • Enhance Data Quality: Clean and preprocess massive datasets faster with tools like Pandas and Modin, ensuring your AI models work with the highest-quality data.
  • Gain Insights Faster: Use powerful visualization tools like Matplotlib and Seaborn to quickly interpret data, make informed decisions, and develop better models.
  • Maximize Model Performance: Optimize training and inference with Intel-optimized builds of PyTorch and TensorFlow, increasing both speed and accuracy.
  • Deploy with Confidence: Streamline deployment with Intel’s optimized platforms like OpenVINO, ensuring your models perform efficiently across multiple hardware environments.

Intel’s AI frameworks and tools are optimized for speed, accuracy, and efficiency, giving you the edge to advance in the GenAI world.

Boost your data science expertise today and explore Intel’s resources designed to accelerate your journey. Ready to get started?


READ NOW

partner with us

TRENDING SIGNALS

Text-to-Speech

Eleven Labs launches GenFM: a tool to create personalized podcasts from PDFs, articles and eBooks in 32 languages

⇧ 1,924 Likes

Chatbot Update

OpenAI integrates ChatGPT MacOS app with popular developer apps like Cursor, Windsurf, JetBrains, and VS Code

⇧ 529 Likes

AGI

Yann LeCun predicts human-level AI possible within 5-10 years, aligning with Sam Altman and Demis Hassabis

⇧ 2,961 Likes

Model Training Optimization

PyTorch unveils Float8 training, boosting training throughput by up to 50%, demonstrated on Meta LLaMa models

⇧ 569 Likes

AI Agents

LangGraph releases guide with essential concepts for building AI agents, including memory and human-in-the-loop integration

⇧ 341 Likes

TOP PAPERS

Transformers

Star Attention: Efficient LLM Inference over Long Sequences

⇧ 1,263 Likes

Problem

Processing long sequences with LLMs is computationally expensive due to the quadratic complexity of self-attention, limiting tasks like multi-document summarization and large-scale retrieval. Existing methods reduce costs but often compromise scalability, accuracy, or real-time performance.

Solution
Star Attention, introduced by NVIDIA, uses a two-phase block-sparse attention mechanism. In phase one, it applies blockwise-local attention across distributed hosts. In phase two, query and response tokens use sequence-global attention to access cached tokens. This design reduces communication overhead and fits into existing global attention architectures.
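The two phases can be sketched in a few lines of NumPy. This is a toy illustration of the attention pattern only, not NVIDIA's implementation; the shared "anchor block" that phase one prepends to each host's local window is omitted for brevity:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def phase1_local_attention(q, k, v, block=4):
    # Phase one: each context block attends only within itself,
    # so cost grows linearly with sequence length, not quadratically.
    scale = np.sqrt(q.shape[-1])
    out = np.empty_like(q)
    for s in range(0, q.shape[0], block):
        qs, ks, vs = q[s:s + block], k[s:s + block], v[s:s + block]
        out[s:s + block] = softmax(qs @ ks.T / scale) @ vs
    return out

def phase2_global_attention(q_query, k_cache, v_cache):
    # Phase two: query/response tokens attend to all cached tokens
    # (in the real system, gathered across distributed hosts).
    scale = np.sqrt(q_query.shape[-1])
    return softmax(q_query @ k_cache.T / scale) @ v_cache
```

Because phase one never materializes attention scores across blocks, each host only stores and computes its local slice, which is where the memory and latency savings come from.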

Results
Star Attention reduces memory usage and inference time by up to 11x. It maintains 95–100% accuracy across tasks, improving performance for long-sequence models in real-world applications.

Agent Framework

DynaSaur: Large Language Agents Beyond Predefined Actions

⇧ 908 Likes

Problem

Existing LLM agent systems rely on a fixed set of predefined actions, which limits their ability to plan and requires extensive human effort to enumerate every possible action in complex environments.

Solution
DynaSaur introduces a framework that allows LLM agents to dynamically create and execute actions using a general-purpose programming language. This method enables agents to generate programs at each step and accumulate actions over time for future use.
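The core idea, actions as generated code that accumulates over time, can be sketched in a few lines. The names below (register_action, run_action) are hypothetical, not DynaSaur's actual API:

```python
# Accumulated library of actions the agent has generated so far
action_library = {}

def register_action(name, source):
    """Execute LLM-generated Python source and cache the resulting function."""
    namespace = {}
    exec(source, namespace)
    action_library[name] = namespace[name]

def run_action(name, *args):
    """Invoke a previously accumulated action by name."""
    return action_library[name](*args)

# An agent might generate this snippet when no predefined action fits:
register_action("add", "def add(a, b):\n    return a + b")
print(run_action("add", 2, 3))  # prints 5
```

Because every action is just a program, the agent can compose, repair, or replace actions mid-task instead of failing when a predefined action is missing.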

Results
The framework provides greater flexibility, outperforming traditional methods, and allows agents to recover when predefined actions fail or do not exist. DynaSaur holds the top spot on the GAIA public leaderboard, demonstrating its effectiveness in real-world applications.

Video Generation

Allegro: Open the Black Box of Commercial-Level Video Generation Model

⇧ 256 Likes

Problem

While the open-source community has made significant strides in video generation, existing resources are still insufficient to achieve commercial-level performance.

Solution
The paper introduces Allegro, a video generation model that excels in both quality and temporal consistency. It also highlights the current limitations in the field and presents a comprehensive methodology for training high-performance video generation models, covering key aspects such as data, model architecture, training pipeline, and evaluation.

Results
Allegro outperforms existing open-source models and most commercial models, ranking just behind top models like Hailuo and Kling, based on a user study.

PYTHON TIP

How to Track Progress in Parallel Processing

When running parallel tasks, tracking progress efficiently can help manage workflows and debug issues in large-scale ML or data processing pipelines.

Instead of Using


from multiprocessing import Pool

with Pool() as pool:
    results = pool.map(worker, tasks)

Use


from multiprocessing import Pool, Process, Manager

def worker(task):
    return task ** 2

def wrapped_worker(args):
    # Defined at module level so it can be pickled and sent to pool workers
    task, queue = args
    result = worker(task)
    queue.put(1)  # Signal completion
    return result

def listener(queue, total_tasks):
    completed = 0
    for _ in iter(queue.get, None):  # Loop until the None sentinel arrives
        completed += 1
        print(f"Progress: {completed / total_tasks * 100:.2f}%")

if __name__ == "__main__":
    tasks = list(range(1, 101))  # Example task list
    with Manager() as manager:
        queue = manager.Queue()
        # Run the listener in its own process so it never occupies a pool worker
        progress = Process(target=listener, args=(queue, len(tasks)))
        progress.start()
        with Pool() as pool:
            results = pool.map(wrapped_worker, [(t, queue) for t in tasks])
        queue.put(None)  # Signal the listener to stop
        progress.join()

The code tracks and displays the real-time progress of parallel tasks using a ‘multiprocessing’ queue and a listener: each completed task puts a signal on the queue, and the listener updates the progress percentage as signals arrive during concurrent execution.

Benefits

  • Enables progress tracking.
  • Simplifies monitoring for long-running jobs.
  • Enhances pipeline transparency in multi-process ML workflows.


214 Barton Springs Rd, Austin, Texas, 78704, United States of America