Everybody is scared of Chinese AI
Hi everyone! Here’s what you need to know about AI today:
👉 A new tough benchmark challenges AI
👉 OpenAI’s AI agent is here now
👉 Everybody is freaking out about DeepSeek
and many more!
📧 Did someone forward you this email? Subscribe here for free to get the latest AI news every day!
Read time: 5 minutes
AI BENCHMARK
Humanity’s Last Exam that AI can't pass yet
Source: Scale AI
What’s going on: The Center for AI Safety (CAIS), a nonprofit organization, and Scale AI, a firm offering data labeling and AI development services, have introduced a rigorous new benchmark designed to test advanced AI systems. The benchmark, titled Humanity’s Last Exam, features thousands of crowdsourced questions spanning topics such as mathematics, humanities, and natural sciences. None of the top mainstream AI models (OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and OpenAI o1) could score more than 10% on this benchmark. (Read about the study)
What does it mean: Better benchmarks are a must for building more capable AI. The latest models, such as OpenAI’s o1 and o3 or DeepSeek-R1, can score 80% or higher on today's toughest benchmarks, but we all know that they are still not there intelligence-wise. When Dan Hendrycks, CAIS co-founder, released the MATH benchmark back in 2021, the best model scored less than 10%, and few predicted that scores higher than 90% would be achieved just three years later. We will have to wait and see how long it takes AI models to beat this new benchmark.
More details:
To increase the difficulty of this benchmark, the questions are presented in various formats, including those with diagrams and images.
CAIS and Scale researchers initially collected more than 70,000 trial questions, then narrowed them down to a final set of 3,000.
Humanity’s Last Exam was a global collaborative effort involving nearly 1,000 contributors from more than 500 institutions across 50 countries.
OPENAI
OpenAI's AI agent "Operator" is officially out
Source: OpenAI
What’s going on: OpenAI announced yesterday that it is launching a research preview of Operator, a general-purpose AI agent that can take control of a web browser and independently perform certain actions. Operator is coming first to US users on ChatGPT’s $200-per-month Pro subscription plan. OpenAI says it plans to eventually roll the feature out to users on its Plus, Team, and Enterprise tiers.
What does it mean: Every major tech company has announced agentic AI products, or at least plans for them. It was finally the turn of OpenAI, the industry leader, to step into the AI agents market. According to OpenAI, Operator promises to automate frustrating tasks such as booking travel accommodations, making restaurant reservations, and shopping online.
More details:
Operator is driven by a Computer-Using Agent (CUA) model, which blends the visual capabilities of OpenAI's GPT-4o model with the reasoning skills of its more advanced AI systems.
CUA is specially trained to interact with the front-end of websites.
Operator had an 87% success rate on WebVoyager, a test of live website navigation, and a 58.1% success rate on WebArena, which simulates real-world ecommerce and content management scenarios.
Operator already has strong competitors. Yesterday, ByteDance (TikTok’s parent company) launched its own AI agent, UI-TARS, for controlling web browsers and performing actions on a user’s behalf. The interesting part is that it is completely free and open-source, in contrast to Operator's $200/month price tag!
🏭 Reliance, an Indian multinational holding company, is reportedly planning to build the world's largest AI data center in India with a capacity of 3 gigawatts, five times the capacity of the current largest data center in the world, Microsoft’s 600-megawatt site in Virginia.
🔍 Anthropic has introduced a new citations feature designed to reduce errors in AI-generated responses by providing sources for its answers.
💰 OpenAI and SoftBank are reportedly investing $19 billion each into Stargate, America’s controversial AI venture.
📱 Perplexity has launched a new AI-powered assistant, “Perplexity Assistant,” for Android devices. It is multi-modal: it can use your phone’s camera, and it uses reasoning, search, and apps to help with daily tasks.
👨🏼‍💻 Hackers are now using GhostGPT, a new AI tool designed for cybercrime, to carry out more sophisticated and dangerous attacks, including malware creation and phishing.
🤯 Everyone in the AI industry is freaking out about DeepSeek, a Chinese AI company that has recently stunned the world with its high-performing AI models. Folks at OpenAI and even Meta’s Llama creators are buzzing about the latest DeepSeek model, R1, which is reportedly more capable than OpenAI’s top-notch o1 model, 95% cheaper, much faster, and open-source!
Time management with AI
I have an interview in two weeks. I want to maximize my productivity for the upcoming two weeks. Please create a detailed weekly plan based on these tasks and priorities:
1. Preparing for the interview
2. Studying for an exam
My working hours are from 8:30 to 5 on weekdays, and I can dedicate 6 hours a day on the weekend. Prioritize urgent tasks first, leave buffer time for unforeseen tasks, and include enough time for relaxation or personal development. I want you to give me a weekly plan in table format.
DeepSeek-V3’s answer
Lyft - Senior Software Engineer, AI
Notion - Software Engineer, AI Product
OpenAI - Data Scientist, Business
Thank you for staying with us, as always! If you are not subscribed, subscribe here for free to get more of these emails in your inbox! Cheers!