Prague, Czech Republic, July 23, 2024 – Apify, the world’s leading cloud platform for developing and running web scraping solutions, is excited to announce the launch of Crawlee for Python, a web scraping and browser automation library that helps users build fast and reliable crawlers.
Crawlee was created by a team of experts who scrape for a living and extract data from millions of web pages daily. Building upon the original Crawlee for Node.js, launched in 2022, Crawlee for Python offers an open-source solution that simplifies web crawler development.
“One of the main advantages of Crawlee is that the library has a single interface for both HTTP and headless browsers,” says Jan Čurn, CEO of web scraping and automation platform Apify. “You can write your crawlers using the same base abstraction, and the framework takes care of the heavy lifting such as parallelization, proxy rotation, and scaling.”
Crawlee for Python is developed and maintained by Apify. With clients including Siemens, Intercom, Microsoft, Groupon, and Accenture, Apify has earned industry recognition for its innovative web scraping platform and for its marketplace, where developers can monetize their software. Its open-source web scraping library, Crawlee, is designed to help developers build and maintain their crawlers faster.
“Developers of scrapers shouldn’t need to reinvent the wheel and can just focus on building the ‘business’ logic of their scrapers,” Čurn adds.
Key features of Crawlee for Python include:
- Unified interface for HTTP and headless browser crawling.
  - HTTP: HTTPX with Beautiful Soup.
  - Headless browser: Users can switch their crawlers from HTTP to a headless browser in 3 lines of code. Crawlee builds on top of Playwright, adds its own features, and works with Chrome, Firefox, and other popular browsers.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints to offer better DX (IDE autocompletion) and fewer bugs (static type checking).
- Automatic retries on errors or when you’re getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing – direct URLs to appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Built on asyncio, so it is fully asynchronous.
Crawlee for Python is fully open source and backed by an active Discord community of over 8,000 web scraping developers. The library prioritizes high-quality, readable, and maintainable code and reliable crawlers.
Apify encourages anyone interested in Crawlee for Python to try out the new web scraping and browser automation library today on the Crawlee website, where they can also join the Discord community.