No AI model can score more than 2% in this new benchmark

Hello everyone! Here’s what you need to know about AI today:

👉 All current AI models have failed the new ARC-AGI-2 benchmark

👉 Another hyper-realistic image generation model has been released

👉 OpenAI is delaying the rollout of its new 4o image generation model to free users

and many more!

📧 Did someone forward you this email? Subscribe here for free to get the latest AI news every day!

Read time: 5 minutes

ARC PRIZE

New AGI test reveals AI’s persistent shortcomings

Source: Arc Prize

What’s going on: The Arc Prize Foundation recently introduced ARC-AGI-2, a new benchmark designed to evaluate artificial general intelligence by testing AI models on their ability to solve visual pattern puzzles. Unlike its predecessor, ARC-AGI-1, this updated test emphasizes efficiency and adaptability, aiming to measure how well AI can tackle unfamiliar problems without leaning on brute computational force.

What does it mean: The results are stark. Leading AI models from companies like OpenAI, Anthropic, Google, and DeepSeek scored between 1% and 1.3%, while a human baseline, established by over 400 participants, averaged 60%. This gap underscores a persistent limitation in current AI systems: despite their skill in specific domains, they struggle with the kind of flexible reasoning humans take for granted.

More details: 

  • The test’s design targets a core issue in AI development: the reliance on massive computing power and memorized patterns rather than genuine problem-solving ability. ARC-AGI-2 introduces an efficiency metric, challenging models to interpret patterns dynamically.

  • For context, OpenAI’s o3 model previously hit 75.7% on ARC-AGI-1, but even advanced “reasoning” models like o1-pro and DeepSeek-R1 failed on the new version.

  • Interested in details? Read the official blog post published on the Arc Prize Foundation website.

IDEOGRAM

Ideogram 3.0 unveils advanced AI image generation capabilities

Source: Ideogram | This image was generated by Ideogram 3.0! It’s amazing how realistic it looks.

What’s going on: Ideogram launched its next-gen image generation model “Ideogram 3.0” yesterday. This update, now freely accessible to all users via ideogram.ai and its iOS app, introduces a suite of features aimed at enhancing realism, creativity, and usability. The model excels in photorealism, delivering detailed scenes with precise lighting and color control, and sets a new standard in text rendering, adept at producing complex, stylized text for graphic design, advertising, and marketing.

What does it mean: A key innovation is the “Style References” feature, allowing users to upload up to three images to guide the aesthetic of their generations, bypassing the limitations of text-based prompting for a more efficient and expressive workflow. Additionally, a Random Style option taps into a library of 4.3 billion presets, enabling users to explore unique visual outputs effortlessly, with the ability to save and reuse preferred styles via codes.

More details: 

  • In human evaluations, Ideogram 3.0 consistently outperformed competitors like Google’s Imagen 3, Flux Pro 1.1, and Recraft V3, achieving the highest rating across diverse prompts testing various capabilities, styles, and composition challenges.

  • This new update also enhances the “Describe” feature for richer scene descriptions and integrates with Canvas for advanced editing options like Magic Fill, Extend, and Replace Background.

  • Early access users received 10,000 priority credits to test these capabilities.

  • Interested in the details, benchmarks, and many cool examples? Read their official blog post.

💰 Nvidia is in talks to acquire Lepton AI, a company that rents out servers powered by Nvidia's AI chips, for several hundred million dollars.

🖼 ChatGPT's new native AI image feature “4o Image Generation” is delayed for free users due to unexpectedly high demand. “Images in ChatGPT are wayyyy more popular than we expected (and we had pretty high expectations),” Sam Altman said in a post on X yesterday.

💻 OpenAI is adopting Anthropic's Model Context Protocol (MCP), an open-source standard that allows AI models to draw data from various sources, like business tools and software, to produce more relevant responses; this move will integrate MCP support across OpenAI products.

📦 Amazon has launched "Interests," a new AI-powered feature that allows users to enter personalized shopping prompts reflecting their interests and preferences, leveraging AI to provide more relevant product suggestions and notifications on new items, restocks, and deals.

🚫 A recently leaked database reveals that China has developed an AI system, trained on 133,000 examples, to enhance its censorship capabilities by identifying and flagging content deemed sensitive by the government, including subtle dissent and topics that could stir social unrest.

💸 The Amazon Alexa Fund is expanding its investment strategy beyond voice technology to include AI-enabled hardware, generative media, smart agents, and emerging AI architectures, with recent investments in four startups: NinjaTech AI, Hedra, Ario, and HeyBoss.

📞 Krisp, an AI audio startup, has launched a new AI-powered feature that can change a user's accent during calls, initially supporting the conversion of Indian English accents to US English, with the goal of improving understanding and communication.

AI + Automobile mechanic

Act as an experienced car mechanic with 15+ years of experience. I need somebody with expertise in automobiles to provide troubleshooting solutions, such as diagnosing problems or errors, both those that are visible and those within engine parts, in order to figure out what is causing them (like lack of oil or power issues), and to suggest required replacements while recording details such as fuel consumption type. My first inquiry is “My car won't start although the battery is fully charged. What should I do?”

GPT-4o-mini’s answer

Thank you for staying with us, as always! If you are not subscribed, subscribe here for free to get more of these emails in your inbox! Cheers!