
In this tutorial, we build a speech recognition and translation workflow using NVIDIA Canary-1B-v2. We begin by setting up the required audio, NeMo, NumPy, and SciPy dependencies, then load the Canary model on a GPU-enabled runtime for efficient inference. From there, we prepare audio into a clean 16 kHz mono format, perform English ASR, translate speech into multiple languages, generate word and segment timestamps, export translated subtitles as an SRT file, test long-form transcription, run b
NVIDIA Canary-1B-v2 is a speech recognition model that can transcribe audio, translate speech into multiple languages, and generate timestamped subtitles. The tutorial demonstrates how to set up the necessary software dependencies, load the model on a GPU for efficient processing, prepare audio files in the required format, and run various tasks including English speech recognition, translation into supported languages, and automatic subtitle file generation. This workflow enables building a multilingual speech recognition and translation pipeline for applications like subtitle generation and large-scale transcription of audio files.
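Two of the steps summarized above, converting audio to 16 kHz mono and writing timestamped segments out as SRT, can be sketched without loading the model at all. The snippet below is a minimal illustration, not the tutorial's own code: the function names `to_mono_16k`, `srt_time`, and `segments_to_srt` are hypothetical, the input is assumed to be signed integer PCM or float samples in a (frames, channels) layout, and the segment dictionaries with `start`, `end`, and `text` keys are an assumed shape rather than the model's confirmed output format.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sample rate named in the tutorial summary


def to_mono_16k(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz, returning float32 samples."""
    audio = np.asarray(samples)
    if np.issubdtype(audio.dtype, np.integer):
        # Assumption: signed integer PCM; scale it into roughly [-1, 1).
        scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # Assumption: (frames, channels) layout; average channels to mono.
        audio = audio.mean(axis=1)
    if sample_rate != TARGET_SR:
        # Polyphase resampling with the reduced up/down ratio.
        g = gcd(TARGET_SR, sample_rate)
        audio = resample_poly(audio, TARGET_SR // g, sample_rate // g)
    return audio.astype(np.float32)


def srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with hypothetical start/end/text keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

In practice the prepared array would be written back to a WAV file before transcription, and the segment list would come from whatever timestamp structure the model returns, mapped onto the assumed keys.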

The all-cash deal gives MoEngage access to technology that assigns AI agents to individual customers.

A new update for Google Home could make it less likely your smart home cameras mistake you for someone else, just because you're facing away from the camera. Starting June 23rd, Google's expanding its facial recognition feature so that people you've tagged in your Familiar Faces library can continue to be identified when their faces aren't clearly visible, using "additional non-biometric signals (body size, clothing color, etc.)." The Familiar Faces library will also begin aut

A scan of an imaging phantom, segmented to validate how cleanly structures separate under controlled conditions. | Image: Midjourney Medical
Last week, Midjourney, an AI startup best known for its image generator, made an unusual pivot: medical imaging. The company announced a futuristic ultrasound scanner that would dunk users into a vat of water and, hopefully, produce "something as powerful as MRI" yet "as casual as a trip to the spa." Midjourney says the goal is to help peo