Various Scrapers
I have written various scrapers to extract and analyze data from websites.
25 October, 2023 • 2 min read
About My Scrapers
Let's dive right in. I've developed three automated processes so far, and I'll go through each one to give you a rough overview.
A little fun fact on the side: all my scrapers are written in Python and use Selenium WebDriver to drive the browser.
1. eBay Kleinanzeigen Scraper
This scraper runs as an API server: I send it a request and get a response back.
GitHub: https://github.com/DanielWTE/ebay-kleinanzeigen-api
Features
- Receive 25 random listings
- Get details of a listing
- Get the views of a listing
cURL Requests
If anyone wants to try it out:
# 25 Random Listings
curl -X GET "http://<ip>/getInserateUrls" -H "accept: application/json"
# 25 Random Listings ***Return***
{"urls": ["https://www.ebay-kleinanzeigen.de/s-anzeige/xyz", ...]}
# Details of a Listing
curl -X GET "http://<ip>/getInseratDetails" -H "accept: application/json" -H "url: https://www.ebay-kleinanzeigen.de/s-anzeige/xyz"
# Details of a Listing ***Return***
{"title": "xyz", "price": "1500.00", "images": ["https://img.ebay-kleinanzeigen.de/api/v1/prod-ads/images/81/xyz"], "tags": ["Kleinanzeigen Berlin", "Electronics", "Mobile & Phone"], "views": "0", "description": "xyz", "uploadDate": "18.02.2023", "adId": "058195681"}
# Views of a Listing
curl -X GET "http://<ip>/getViews" -H "accept: application/json" -H "url: https://www.ebay-kleinanzeigen.de/s-anzeige/xyz"
# Views of a Listing ***Return***
{"views": "xyz000"}
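If you'd rather call the API from Python than from cURL, a minimal sketch using only the standard library could look like the following. It mirrors the requests above: the endpoint names and the "url" header come from the cURL calls, while BASE_URL, the function names, and the assumption that "price" and "views" always arrive as numeric strings (as in the sample response) are mine and not taken from the repository.

```python
import json
from urllib.request import Request, urlopen

# Assumption: placeholder address; point this at wherever the API server runs.
BASE_URL = "http://127.0.0.1"


def build_request(endpoint, listing_url=None, base_url=BASE_URL):
    """Build a GET request for one of the three endpoints shown above."""
    headers = {"accept": "application/json"}
    if listing_url is not None:
        # The listing URL travels in a "url" header, as in the cURL calls.
        headers["url"] = listing_url
    return Request(f"{base_url}/{endpoint}", headers=headers, method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body of the response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_details(payload):
    """Convert the string fields of a details response into numbers.

    Assumes "price" and "views" are numeric strings, as in the sample above.
    """
    return {
        **payload,
        "price": float(payload["price"]),
        "views": int(payload["views"]),
    }
```

To use it, fetch the listing URLs with fetch_json(build_request("getInserateUrls"))["urls"], then pass one of them to build_request("getInseratDetails", url) and run the result through parse_details.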
What can you do with it?
For example, something like this:
2. Shein Scraper
This one is really cool and took a while to complete. With this scraper, you can retrieve products and their details, such as product images. And as if that wasn't enough, I've also included a feature to get the review pictures for each product. That means you can analyze all reviews together with their images!
All you have to do is put categories into the designated text file.
Then you run a few scripts and have your own image collection at home.
To store the data, I used MongoDB. But you can also receive product URLs as JSON.
As a little extra, there are download scripts for the images from the database.
GitHub: https://github.com/DanielWTE/shein-scraper
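To make the workflow a bit more concrete, here is a rough sketch of the two ends of it: reading categories from a text file and handing back product URLs as JSON. This is not the code from the repository. The one-category-per-line file layout, the function names, and the shape of the JSON output are all assumptions for illustration, and the scraping and MongoDB steps in between are left out.

```python
import json
from pathlib import Path


def load_categories(path):
    """Return the non-empty lines of the categories file.

    Assumes one category per line; blank lines are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def product_urls_to_json(urls_by_category):
    """Serialize a {category: [product URLs]} mapping to a JSON string.

    Duplicate URLs within a category are dropped, keeping first-seen order.
    """
    deduplicated = {
        category: list(dict.fromkeys(urls))
        for category, urls in urls_by_category.items()
    }
    return json.dumps(deduplicated, indent=2, ensure_ascii=False)
```

Dropping duplicates before export is a design choice of this sketch: category pages often list the same product more than once, and a clean URL list avoids downloading the same images twice.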
3. AniList Auto Liker
AniList (anilist.co) has so-called Activities, basically a social media feed. As an experiment, I wrote a script that logs into an account and likes all activities.
Of course, I've also built in some exception handling, for example for the case where you get rate-limited.
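One common way to deal with rate limits is to retry with a growing delay. The sketch below shows that idea in isolation. It is not the logic from my script: RateLimitedError, like_with_backoff, and the like_activity callback are hypothetical names, and how a rate limit actually shows up in the browser is not modeled here.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical signal that the site asked the script to slow down."""


def like_with_backoff(like_activity, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call like_activity(), waiting longer after each rate-limit error.

    Returns True on success and False once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            like_activity()
            return True
        except RateLimitedError:
            if attempt == max_attempts - 1:
                break  # no point in sleeping after the final attempt
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

The sleep parameter is only there so the wait can be swapped out, for example in a test; in normal use you would leave it at time.sleep.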