
Web Scraping Workshop

This workshop demonstrates how to scrape the web using the Mysterium network.
With some simple code and a single API, you can launch your own platform to index products and services from around the world. Concert tickets, crypto prices, trainers, hotels, news articles, lead generation, sports match history, or even where to stream the latest movies: source and organise the world’s information by your preferred metrics, such as price, relevance or time. You will learn how to collect open data through the Mysterium node network, which spans more than 100 countries.

Pre-requisites

  • Docker
  • docker-compose
  • Node.js 14
  • Yarn

Running

  1. Make sure your nodes are running. All nodes can be spun up using docker-compose:
docker-compose up fleet

or

yarn fleet-start

  2. After launching all nodes, register each node individually by opening its NodeUI and completing the onboarding process.

Important: set each node's password to qwerty123456, because all scripts expect this value.

  3. After all nodes have been registered, instruct each one to connect to its respective country:
yarn fleet-connect

This tries to connect each node to a residential node in its listed country and tells the node to act as an HTTP proxy.

yarn fleet-info

This prints each node's balance in MYST. A balance of 0 means that the node will not be able to pay for traffic.
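
Once a node acts as an HTTP proxy, a script can route requests through it. The following is a minimal sketch of that idea using Node's built-in http module, not the code used by this project's scripts. The ProxyEndpoint type, the function names, and any host or port you pass in are assumptions for illustration; check the project's docker-compose setup for the real proxy addresses. The sketch only covers plain http:// targets, since https:// targets need a CONNECT tunnel, which is omitted here.

```typescript
import * as http from "http";

// Hypothetical description of a node's proxy endpoint (host and port are
// assumptions, not values taken from this project).
interface ProxyEndpoint {
  host: string;
  port: number;
}

interface ProxyRequestOptions {
  host: string;
  port: number;
  method: string;
  path: string;
  headers: Record<string, string>;
}

// Build options for a plain-HTTP GET sent via a forward proxy.
// The connection goes to the proxy, and the absolute target URL is sent
// as the request path so the proxy knows where to forward it.
function buildProxyRequestOptions(
  proxy: ProxyEndpoint,
  targetUrl: string
): ProxyRequestOptions {
  const target = new URL(targetUrl);
  if (target.protocol !== "http:") {
    throw new Error("this sketch only covers plain http:// targets");
  }
  return {
    host: proxy.host,
    port: proxy.port,
    method: "GET",
    path: target.href,
    headers: { Host: target.host },
  };
}

// Send the request through the proxy and resolve with the body as text.
function fetchViaProxy(proxy: ProxyEndpoint, targetUrl: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = http.request(buildProxyRequestOptions(proxy, targetUrl), (res) => {
      const chunks: Buffer[] = [];
      res.on("data", (chunk: Buffer) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    });
    req.on("error", reject);
    req.end();
  });
}
```

A call such as fetchViaProxy({ host: "127.0.0.1", port: 40001 }, "http://example.com/") would then fetch the page through that node, where 40001 is a made-up port used only as an example.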

  4. Finally, you can begin scraping ;)
  • yarn booking - runs the scraper using a single node (slow, but simple to set up)
  • yarn booking-fleet - runs the scraping use case on multiple nodes at once (fast, but more complex to set up)
  • yarn google-fleet - fast crawl of Google search
  • yarn hotels-fleet - fast crawl of hotels.com
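
To illustrate why the fleet variants are faster than the single-node one, here is a rough sketch of one way to spread URLs across several nodes: assign URLs to workers in round-robin order, then process the per-worker queues in parallel. This is not necessarily how the scripts in this project do it. The names assignRoundRobin, scrapeWithFleet and scrapeOne are hypothetical.

```typescript
// Split the URL list into one queue per worker, in round-robin order.
function assignRoundRobin(urls: string[], workerCount: number): string[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error("workerCount must be a positive integer");
  }
  const queues: string[][] = Array.from({ length: workerCount }, () => [] as string[]);
  urls.forEach((url, i) => queues[i % workerCount].push(url));
  return queues;
}

// Run all queues in parallel. Within a queue, URLs are scraped one at a time,
// so each node handles a single request at any moment.
// scrapeOne is a hypothetical callback that scrapes one URL via one worker.
// Results are grouped by worker, not returned in the original URL order.
async function scrapeWithFleet<W, R>(
  urls: string[],
  workers: W[],
  scrapeOne: (worker: W, url: string) => Promise<R>
): Promise<R[]> {
  const queues = assignRoundRobin(urls, workers.length);
  const perWorker = await Promise.all(
    queues.map(async (queue, index) => {
      const results: R[] = [];
      for (const url of queue) {
        results.push(await scrapeOne(workers[index], url));
      }
      return results;
    })
  );
  return ([] as R[]).concat(...perWorker);
}
```

With a single worker this degrades to the sequential behaviour of the single-node setup; with N workers, up to N requests are in flight at once.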

Fleet commands

  • yarn fleet-start - spins up the Docker containers running the myst nodes
  • yarn fleet-stop - stops all Docker containers
  • yarn fleet-restart - stops and then starts all Docker containers
  • yarn fleet-info - prints each node's balance
  • yarn fleet-connect - connects each node to its respective country (configuration is in config.ts)
  • yarn fleet-disconnect - disconnects all nodes from providers (an active connection drains MYST over time)
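
Because a node with a 0 balance cannot pay for traffic, it can help to separate funded nodes from unfunded ones before starting a crawl. The sketch below assumes a hypothetical NodeBalance shape. The actual output of yarn fleet-info may look different, so treat the field names as placeholders.

```typescript
// Hypothetical record for one node's balance; not the project's real data shape.
interface NodeBalance {
  name: string;        // e.g. a container name
  balanceMyst: number; // balance in MYST
}

// Split nodes into those that can pay for traffic and those that cannot.
function splitByFunding(nodes: NodeBalance[]): {
  funded: NodeBalance[];
  unfunded: NodeBalance[];
} {
  const funded = nodes.filter((node) => node.balanceMyst > 0);
  const unfunded = nodes.filter((node) => !(node.balanceMyst > 0));
  return { funded, unfunded };
}
```

A crawl could then use only the funded list and report the unfunded nodes so they can be topped up.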

Utility commands

  • yarn fix - runs Prettier to fix the code style

NOTE:

!!! DO NOT REMOVE ANY MOUNTED VOLUMES (MAKE BACKUP) !!!

Registration is not free: it requires a tiny amount of MYST. It is also possible to pay a symbolic sum of about $1 by credit card.

The volumes contain the keystore for each node, which is your wallet holding your money (MYST).

The volumes are created in the root of this project. You will have to run chown -R <user> myst-nodes-* in order to access them.