Gemini now has a Canvas + Audio Overview feature

Hi everyone! Here’s what you need to know about AI today:
👉 Stability AI launched a tool that converts images to 3D video scenes
👉 Google added two new features to Gemini, Canvas + Audio Overview
👉 Nvidia released new models for physical AI
and many more!
📧 Did someone forward you this email? Subscribe here for free to get the latest AI news every day!
Read time: 5 minutes

STABILITY AI
Generating 3D video scenes from 2D images is now possible

Source: Stability AI
What’s going on: Stability AI, the company behind the well-known Stable Diffusion image generation model, has launched a new tool called Stable Virtual Camera, designed to convert 2D images into 3D video scenes. This AI model takes one or more images, up to 32, and generates videos with a sense of depth and perspective, allowing users to define camera angles or choose from preset paths like “Spiral” or “Dolly Zoom.” The resulting videos can be produced in various aspect ratios, including square, portrait, and landscape, with a maximum length of 1,000 frames.
What does it mean: This is a practical step toward making 3D content creation more accessible, especially for those familiar with digital filmmaking or animation tools, where virtual cameras are already widespread. The tool is currently available as a research preview under a noncommercial license on Hugging Face.
More details:
Stable Virtual Camera isn’t without limitations. It struggles with certain scenarios, such as images of people, animals, or tricky textures like water, and can produce flickering or artifacts when dealing with complex scenes or unusual camera movements.
Stability AI seems to be targeting creators who want to explore generative AI without needing extensive 3D modeling expertise, offering a downloadable option for those willing to play around with it.
Interested in learning how they built it? Read their technical paper.
To download the model’s weights, visit this Hugging Face page; for the source code, check out this GitHub repository.
Google adds a ‘Canvas’ and Audio Overview feature to Gemini
Source: Gemini | Gemini’s new ‘Canvas’ feature
What’s going on: Google has rolled out two new features for its AI-powered Gemini chatbot, aiming to boost productivity and creativity for users. The first, called Canvas, is an interactive workspace that allows real-time collaboration on text documents and code. The second feature, Audio Overview, transforms uploaded documents, like notes, slides, or research reports, into podcast-style audio summaries.
What does it mean: With Canvas, users can draft content, tweak it with Gemini’s help by adjusting things like tone or formatting, and even preview code outputs, such as HTML or React web prototypes. It’s built for practical use, letting you refine a speech, essay, or script without jumping between apps, and offers a straightforward export to Google Docs for team collaboration. Audio Overview generates a two-host discussion that breaks down the material, connecting key points in a conversational way. It’s handy for multitasking or reviewing content on the go, with options to download or share the audio.
More details:
The Audio Overview feature is currently limited to English and is rolling out globally to both free and Gemini Advanced users. The code preview in Canvas is web-only for now.
This comes as Google focuses on integrating Gemini deeper into its ecosystem to compete with ChatGPT and other rivals. The Canvas feature mirrors similar efforts by OpenAI (Canvas) and Anthropic (Artifacts), following a trend toward interactive AI workspaces.
How to use Canvas: Open Gemini → click the “Canvas” button in the prompt bar → write your prompt → Gemini will show you the response in a canvas.
How to use Audio Overview: Open Gemini → click the plus (+) icon in the prompt bar → upload a document (e.g. a PDF) → this will trigger the Audio Overview shortcut → click the “Generate Audio Overview” button → enjoy the generated overview!

🩺 Google has introduced new healthcare-focused features, including AI-enhanced search results, medical record APIs, and open AI models for drug discovery, along with a pulse detection feature on the Pixel Watch 3.
🤖 Nvidia unveiled GR00T N1, an open-source AI foundation model with a dual-system architecture for "fast and slow thinking," designed to power generalist humanoid robots in various form factors. (Read more)
💊 Google is developing and planning to release a collection of open AI models called TxGemma through its Health AI Developer Foundations program to help researchers predict the properties of potential new therapies and accelerate drug discovery.
🤖 Nvidia, Disney Research, and Google DeepMind are collaborating to develop Newton, a physics engine that will power Disney's next-generation entertainment robots, including Star Wars-inspired droids, with an early open-source version planned for release later in 2025.
🛰🔥 Google is funding FireSat, a satellite constellation equipped with AI and infrared sensors to detect and track wildfires as small as 5x5 meters globally, providing updated imagery every 20 minutes to aid emergency responders and improve wildfire modeling.
🚙 GM is partnering with Nvidia to integrate AI into its factories, robots, and self-driving cars, using Nvidia's GPUs and AI infrastructure to revolutionize manufacturing, enterprise operations, and in-vehicle experiences, including advanced driver-assistance systems.
📈 Mark Zuckerberg says Meta's Llama family of open-source AI models has reached 1 billion downloads, which is a significant increase since December 2024. Check out the latest Llama models.
💰 Graphite, an AI-powered code review platform backed by Anthropic, raised $52 million to further develop its platform, which leverages AI to provide feedback on code, suggest changes, and automatically catch coding bugs.


AI + Book summary
Summarize [book name] and extract the top 5 key lessons with actionable takeaways for personal growth.
GPT-4o mini’s answer

Coupang - Senior Staff ML Infrastructure Engineer
First Street - Machine Learning Wildfire Scientist
Samsung Semiconductor - Staff Engineer, Machine Learning
Thank you for staying with us, as always! If you are not subscribed, subscribe here for free to get more of these emails in your inbox! Cheers!