517 Open Source Scraper Software Projects
Free and open source scraper code projects including engines, APIs, generators, and tools.
Huginn Huginn 29783 ⭐
Create agents that monitor and act on your behalf. Your agents are standing by!
Avbook 7247 ⭐
Adult video (AV) management system with crawlers for avmoo, javbus, and javlibrary; an online Japanese adult video library and magnet link database
Weibo_terminater 2275 ⭐
The final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator
Realsirjoe Instagram Scraper 1648 ⭐
Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot
Felipecsl Wombat 1209 ⭐
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Scrapoxy 1226 ⭐
Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Node Website Scraper 849 ⭐
Download a website to a local directory (including all CSS, images, JS, etc.)
Imdbpy 714 ⭐
IMDbPY is a Python package for retrieving and managing IMDb movie database data about movies, people, characters, and companies.
Operative Framework 469 ⭐
Operative Framework is an OSINT investigation framework. You can interact with multiple targets, execute multiple modules, create links with targets, export reports to PDF, add notes to targets or results, interact with a RESTful API, and write your own modules.
Jikan Me Jikan 462 ⭐
Unofficial MyAnimeList PHP+REST API which provides functions other than the official API
Advanced Web Scraping Tutorial 381 ⭐
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Php Goose 372 ⭐
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
Freshonions Torscraper 333 ⭐
Fresh Onions is an open source Tor spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Xidel 301 ⭐
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Hquery.php 290 ⭐
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Webinspector 288 ⭐
Ruby gem to inspect a web page completely. It scrapes a given URL and returns its meta, links, images, and more.
Weibo_terminator_workflow 262 ⭐
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
Cryptocmd 263 ⭐
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Eracle Linkedin 273 ⭐
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Heroku_ebooks 249 ⭐
A script to generate Markov chains and to post to an _ebooks account on Twitter using Heroku
Java Spider 260 ⭐
Goose Parser 210 ⭐
Universal scraping tool that lets you extract data using multiple environments
Scrape Linkedin Selenium 193 ⭐
`scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Jacktuck Unfurl 177 ⭐
Scraper for oEmbed, Twitter Cards, and Open Graph metadata - fast and Promise-based
Readablewebproxy 172 ⭐
Rewriting web proxy and archival tool. At this point, it just tries to download all the things.
Goribot 175 ⭐
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Urs 185 ⭐
Universal Reddit Scraper - Scrape Subreddits, Redditors, and submission comments. A command-line tool written in Python (PRAW).
Gmdb 170 ⭐
GMDB is an ultra-simple, cross-platform movie library with features such as Search, Take Note, Watch Later, Like, Import, Learn, and Instantly Torrent Magnet Watch
Media Scraper 173 ⭐
Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
Javgo 225 ⭐
Anime Dl 165 ⭐
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Serpscrap 141 ⭐
SEO Python scraper to extract data from major search engine result pages. Extract data such as URL, title, snippet, rich snippet, and type from search results for given keywords. Detect ads or take automated screenshots. You can also fetch the text content of URLs found in search results or supplied by you. It's useful for SEO and business-related research tasks.
Skrape.it 144 ⭐
A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.