
In this tutorial, we build a full Crawlee-for-Python workflow that covers environment setup, local website generation, static crawling, dynamic crawling, structured extraction, and downstream data processing. We begin by configuring a compatible Crawlee runtime with pinned Pydantic support, Playwright browser installation, persistent storage directories, and Colab-safe execution handling. We then generate a realistic local demo website containing product pages, documentation pages, blog content…
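The tutorial's own demo site is not reproduced here. What follows is a minimal standard-library sketch of generating a local site with product, documentation, and blog pages plus an index page that links to all of them. The page paths, titles, and the `render_page` and `build_site` helpers are hypothetical and do not come from the tutorial.

```python
import tempfile
from html import escape
from pathlib import Path

# Hypothetical page catalogue: relative path -> (title, description).
PAGES = {
    "products/widget.html": ("Widget", "A demo product page."),
    "products/gadget.html": ("Gadget", "Another demo product page."),
    "docs/getting-started.html": ("Getting Started", "A demo documentation page."),
    "blog/hello.html": ("Hello, crawler", "A demo blog post."),
}


def render_page(title: str, description: str) -> str:
    """Return a minimal HTML page with a <title> and a meta description."""
    t, d = escape(title), escape(description)
    return (
        "<!doctype html>\n<html><head>"
        f"<title>{t}</title>"
        f'<meta name="description" content="{d}">'
        f"</head><body><h1>{t}</h1><p>{d}</p></body></html>\n"
    )


def build_site(root: Path) -> list[Path]:
    """Write every page in PAGES under root, plus an index.html linking to them."""
    written = []
    for rel, (title, description) in PAGES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render_page(title, description), encoding="utf-8")
        written.append(path)
    # The index links to every page so a crawler starting there can discover them all.
    links = "".join(
        f'<li><a href="{rel}">{escape(title)}</a></li>' for rel, (title, _) in PAGES.items()
    )
    index = root / "index.html"
    index.write_text(
        "<!doctype html>\n<html><head><title>Demo site</title></head>"
        f"<body><ul>{links}</ul></body></html>\n",
        encoding="utf-8",
    )
    written.append(index)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_site(Path(tmp)):
            print(path.relative_to(tmp))
```

To give a crawler a local start URL such as `http://localhost:8000/index.html`, serve the generated folder with `python -m http.server --directory <folder>`.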
This is a tutorial on building web crawling pipelines with Crawlee for Python. It covers three types of crawlers: BeautifulSoupCrawler for fast HTML extraction, ParselCrawler for precise CSS- and XPath-based extraction, and PlaywrightCrawler for rendering JavaScript content in a headless browser. The tutorial demonstrates practical workflows that include environment setup, generating a demo website with product and documentation pages, extracting structured data such as titles and metadata, and handling dynamic DOM elements. These techniques matter because they automate data collection across sites of varying complexity, from static HTML to JavaScript-rendered pages, while respecting robots.txt rules and exporting the results for downstream uses such as retrieval-augmented generation (RAG).
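Crawlee's crawler classes handle fetching, queueing, and parsing for you. The snippet below is only a standard-library sketch of the per-page work described above: extracting the title and meta description, checking robots.txt, and exporting JSON Lines for a RAG pipeline. It uses `html.parser` and `urllib.robotparser` in place of BeautifulSoup, Parsel, or Crawlee's own robots.txt support. The helper names, record schema, and sample URLs are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class TitleMetaParser(HTMLParser):
    """Collect the <title> text and <meta name="..." content="..."> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text that appears inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_record(url: str, html: str) -> dict:
    """Turn one HTML page into a flat record (hypothetical schema)."""
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
    }


def allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    pages = {
        "http://localhost:8000/docs/intro.html": (
            "<html><head><title> Intro </title>"
            '<meta name="description" content="Getting started"></head></html>'
        ),
        "http://localhost:8000/private/admin.html": "<html><head><title>Admin</title></head></html>",
    }
    # Skip disallowed URLs before extracting anything from them.
    records = [extract_record(u, h) for u, h in pages.items() if allowed(robots, u)]
    print(to_jsonl(records), end="")
```

JSON Lines works well here because each record sits alone on its own line, which suits chunk-by-chunk ingestion. Swap in whatever format your RAG indexer expects.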

The all-cash deal gives MoEngage access to technology that assigns AI agents to individual customers.

A new update for Google Home could make it less likely your smart home cameras mistake you for someone else, just because you're facing away from the camera. Starting June 23rd, Google's expanding its facial recognition feature so that people you've tagged in your Familiar Faces library can continue to be identified when their faces aren't clearly visible, using "additional non-biometric signals (body size, clothing color, etc.)." The Familiar Faces library will also begin aut…

In this tutorial, we build a speech recognition and translation workflow using NVIDIA Canary-1B-v2. We begin by setting up the required audio, NeMo, NumPy, and SciPy dependencies, then load the Canary model on a GPU-enabled runtime for efficient inference. From there, we prepare audio into a clean 16 kHz mono format, perform English ASR, translate speech into multiple languages, generate word and segment timestamps, export translated subtitles as an SRT file, test long-form transcription, run b…
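Exporting subtitles as SRT comes down to numbering the cues and formatting timestamps as `HH:MM:SS,mmm`. The sketch below assumes the segments are already available as hypothetical `(start_seconds, end_seconds, text)` tuples. It does not reproduce the actual structure of Canary's or NeMo's timestamp output, so you would need to map that output into this shape first.

```python
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples (assumed shape)."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, subtitle text, then a blank line separator.
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical translated segments for illustration only.
    segments = [(0.0, 2.4, "Bonjour à tous."), (2.4, 5.0, "Bienvenue dans ce tutoriel.")]
    Path("subtitles_fr.srt").write_text(to_srt(segments), encoding="utf-8")
```

The time arithmetic uses integer milliseconds, so rounding happens only once and the `,mmm` field never overflows into the seconds field.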