OpenAI's next-generation audio models are here

Hello pals! Here’s what you need to know about AI today:
👉 OpenAI released two new transcription and voice generation models
👉 Anthropic added a web search feature to Claude
👉 Meta AI is coming to Europe
and many more!
📧 Did someone forward you this email? Subscribe here for free to get the latest AI news every day!
Read time: 5 minutes

OPENAI
OpenAI's next-generation audio models are here

Source: OpenAI
What’s going on: OpenAI has announced upgrades to its transcription and voice-generating AI models, which will be available through its API. The new text-to-speech model, 'gpt-4o-mini-tts,' is designed to provide more realistic and nuanced speech and allows developers to control how things are said using natural language instructions. The new speech-to-text models, 'gpt-4o-transcribe' and 'gpt-4o-mini-transcribe,' are replacements for the older Whisper model.
What does it mean: The voice transcription models are trained on high-quality audio datasets and are better at capturing varied speech, even in noisy environments, with fewer hallucinations. However, transcription accuracy varies by language, with higher error rates for Indic and Dravidian languages. Unlike the previous Whisper family of models, OpenAI does not plan to release these new transcription models openly due to their size; for future open-source releases, it is focusing on models small enough to run on end-user devices.
More details:
These models align with OpenAI's vision of building automated systems capable of independently performing tasks for users.
The maximum number of input tokens for the gpt-4o-mini-tts model is 2,000. OpenAI charges $0.60 per 1 million input text tokens and $12 per 1 million output audio tokens.
For the gpt-4o-transcribe model, the context window is limited to 16,000 tokens, and the company charges $2.50 per 1 million input tokens and $10 per 1 million output tokens. The maximum number of output tokens for this model is 2,000.
Want to try OpenAI’s new text-to-speech model? Visit this page and click the “Try in Playground” button.
Interested in GPT-4o Transcribe? Visit this page.
To learn more about all of the details and the benchmarks, read OpenAI’s official blog post.
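Based on the per-million-token prices listed above, you can roughly estimate what a call to either model would cost. The helper functions and the token counts in the example below are illustrative, not an official OpenAI cost calculator:

```python
def tts_cost_usd(input_text_tokens: int, output_audio_tokens: int) -> float:
    """Estimate a gpt-4o-mini-tts call's cost from the published rates:
    $0.60 per 1M input text tokens, $12 per 1M output audio tokens."""
    return (input_text_tokens / 1_000_000) * 0.60 + (output_audio_tokens / 1_000_000) * 12.0


def transcribe_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a gpt-4o-transcribe call's cost from the published rates:
    $2.50 per 1M input tokens, $10 per 1M output tokens."""
    return (input_tokens / 1_000_000) * 2.50 + (output_tokens / 1_000_000) * 10.0


# Example: a TTS request at the 2,000-token input maximum that produces
# 10,000 audio tokens (the audio-token count here is an assumed figure)
print(round(tts_cost_usd(2_000, 10_000), 4))  # → 0.1212
```

Note that actual billing depends on how much audio the model emits, so the output-token count is only known after the call completes.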
ANTHROPIC
Web Search feature is now available in Claude

Source: Anthropic
What’s going on: Anthropic, the company behind the AI chatbot Claude, has introduced a new feature that allows the chatbot to search the web. This update brings Claude closer to the capabilities of competitors like ChatGPT, Gemini, Grok, and DeepSeek, which already offer web search. For now, the feature is available in preview for paid Claude users in the US, with plans to expand it to free users and other countries later.
What does it mean: Anthropic had previously designed Claude to rely solely on its internal knowledge base rather than real-time internet access. Now, users can activate web search through their profile settings in the Claude web app, and it only works with the latest model, Claude 3.7 Sonnet. When Claude uses web data, it includes citations, making it easier for users to check the sources themselves.
More details:
This feature also introduces the risk of errors, as other AI models have struggled with accuracy when pulling from the web. Recent studies show ChatGPT and Gemini can get things wrong over 60% of the time, and there’s a chance Claude could face similar issues.
Claude web search can be used to enhance various tasks, such as improving sales strategies, making informed investment decisions, strengthening research proposals, and facilitating more informed shopping choices.
Interested in Claude and its new web search feature? Try Claude 3.7 Sonnet.

👨‍💻 Manycore Tech has open-sourced its multimodal spatial comprehension model, SpatialLM, at GTC 2025, aiming to lower barriers and empower embodied intelligence training for robotics and 3D environment understanding. Check it out on Hugging Face.
💰 Perplexity is in early discussions to raise up to $1 billion in funding, which would value the company at $18 billion amid increasing competition in the AI search space and its expansion into new areas like 'agentic' browsers.
🤖 Nvidia released Cosmos-Transfer1, an innovative AI model that generates highly realistic, physics-aware simulations for training robots and autonomous vehicles. Check it out on GitHub.
🍎 Apple is reorganizing its AI leadership, with Mike Rockwell, VP of the Vision Products Group, taking over the Siri team from John Giannandrea, as the company seeks to improve Siri's performance.
✂ Pruna AI has open-sourced its AI model optimization framework that combines advanced compression techniques like caching, pruning, quantization, and distillation to create faster, smaller, and more efficient AI models without significant quality loss. (GitHub Repository)
☀ Microsoft is expanding its renewable energy portfolio by adding 475 megawatts of solar power through a partnership with AES, involving three solar projects across Illinois, Michigan, and Missouri to meet the rising energy demands of its data centers.
🚀 Meta AI is launching in the EU with a limited "intelligent chat" function across its social platforms after facing regulatory hurdles concerning user data and privacy. The EU version of Meta AI hasn't been trained on local user data and will initially offer basic chatbot capabilities in 6 languages.


AI + Research synthesis
Provide a detailed summary of the latest advancements in [specific field, e.g., quantum computing] over the past year, including key players, breakthroughs, and potential applications, drawing from credible web sources and recent X posts.
Grok 3’s answer

Celonis - Senior Product Data Scientist
BetterUp - Senior Data Scientist
PermitFlow - Machine Learning, Software Engineer
Tenable - Staff Data Scientist
Thank you for staying with us as always! If you are not subscribed, subscribe here for free to get more of these emails in your inbox! Cheers!