Data harvesting 2.0: from the visible to the invisible web
Abstract: Personal data fuel a fast-emerging industry that transforms them into added value. Harvesting these data is therefore of the utmost importance for the economy. In this paper, we study the flows of personal data at a global level and distinguish countries based on their capacity to harvest data. We establish a cartography of international data channels on the visible and invisible Web. The visible Web is composed of the sites that are available to the general public and are typically indexed by search engines. The invisible Web refers to the tags, Web bugs, pixels and beacons that appear on Websites to track and profile users. It is well known that the US dominates the visible Web, with more than 70% of the top 100 sites in the world. We show that this domination is even stronger on the invisible Web: the largest proportion of trackers in most countries is indeed from the US. Apart from the US, two countries exhibit an original strategy. China dominates its visible Web with a majority of local sites, but, surprisingly, these sites still contain a majority of US trackers. Russia also dominates its visible Web and is the only country with more local trackers than US ones.