How does the Internet track us?


Many people seem to believe that you can be completely anonymous on the Internet. In the 1990s this could have been true; today it certainly is not. A once extremely popular drawing by Peter Steiner captured the mood that prevailed on the Internet at the time: a sense of anonymity and the possibility of hiding your online activity behind a false identity. Over the years, however, this picture has become completely outdated. Our online activity has increased a hundredfold, and the tools that track it have become more sophisticated and, at the same time, harder for an ordinary user to control. If one were to pick a date that marks the turning point in the erosion of privacy on the Internet, it would certainly be 1994. It was then that Lou Montulli, a Netscape employee, created a small file that made it possible to track the activity of particular users on websites. We know this tool very well today: it is, of course, the cookie. The use of cookies, however, is only the tip of the marketing iceberg. So what does online data collection look like in the 21st century? How is this data used, and how can we defend our privacy against being violated?

What data is collected, and how

Online tracking and profiling techniques are becoming increasingly invasive and are beginning to cover nearly every area of our private lives. Data about the health, personal relationships, financial situation, weaknesses and dreams of millions of people is collected, categorized, profiled and used for commercial purposes. How do companies know all these things about us? Here are some examples of the data that many companies most likely hold about you:

  • Browsing history - the websites we have visited are a good basis for a personalized marketing offer. From this data, companies can tell, for example, that we are interested in buying a holiday trip or that we are planning to renovate our apartment. Two methods are usually used to obtain information from our browsing history: tracking cookies and internal storage. In the first case, the browsing history is tracked directly from the computer. Cookies placed on our device, or more precisely in the web browser, allow websites to collect information about the sites we visit. Collecting data directly on the website (internal storage) works differently. It mainly takes place on websites where we already have an account, e.g. Facebook, eBay or Amazon. Once we log in, analytics software built into the platform records our activity and stores it against our account. Because the data is linked to the account rather than to the browser, clearing the browser's cookies does nothing to remove it.
  • Location - many websites collect IP addresses, which are numbers assigned to devices connected to the network and which can be used to determine the approximate location of the device from which the activity is performed. Practically every device connected to the Internet has an IP address.
  • Relationships with friends, family and co-workers - many websites use analytical tools to gather data on our interactions with other users of the network. How does this work? Comprehensive monitoring and analysis programs try to work out the relationships between you and the other people you interact with online. When collecting this data, sites such as Facebook may try to categorize your relationships with others using factors such as how often you are in contact, how long the contact lasts, the likely nature of the relationship, or shared interests.
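The tracking-cookie method from the first item above can be illustrated with a minimal sketch. It is a toy simulation rather than any real tracker's code: the `Tracker` class, the `handle_request` method and the example URLs are all hypothetical, and the page address is passed in directly where a real browser would send it in a request header.

```python
import uuid
from collections import defaultdict


class Tracker:
    """Toy third-party tracker embedded on several unrelated sites (hypothetical)."""

    def __init__(self):
        # cookie ID -> list of pages on which the tracker was loaded
        self.history = defaultdict(list)

    def handle_request(self, cookie_id, page_url):
        """Simulate one request for the tracker's embedded resource.

        cookie_id is the value the browser sent back (None on a first visit);
        page_url stands in for the address of the page embedding the tracker.
        Returns the cookie ID the browser should store.
        """
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex  # first visit: issue a new random ID
        self.history[cookie_id].append(page_url)
        return cookie_id


tracker = Tracker()
cookie = None  # the browser's cookie jar starts empty
for page in ["https://travel.example/holidays",
             "https://diy.example/renovate-kitchen"]:
    cookie = tracker.handle_request(cookie, page)

# Both visits are now filed under the same ID.
print(tracker.history[cookie])
```

Because the browser sends the same ID back on every site that embeds the tracker, visits to unrelated pages end up in one history. If the cookie is deleted, the next request is treated as a first visit and receives a fresh ID.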

The methods of monitoring our online activity described above are fairly traditional and have been known for a long time. There are, however, methods that are far more sophisticated and harder to analyze. One very interesting example is browser fingerprinting, in which a unique "fingerprint" is created for each browser. This tracking technique is based on information about the browser's configuration and settings. Let's trace the process. When you load a website, you automatically send some information about your web browser to the visited site, and to any trackers embedded in it. This information includes the installed fonts, the language setting, installed plugins, screen size and color depth, and the time zone.

The website you visit may analyze your browser and the information listed above using JavaScript, Flash and other methods. It can then create a user profile tied to these browser characteristics and not to any specific tracking cookie. In this way a unique "fingerprint" of your browser is created, and it keeps working even if you make sure your cookies are deleted. It is a unique code that identifies a specific device and, in combination with other data, a specific user as well. As a result of such marking, data about us can be gradually aggregated, and our online presence is never truly anonymous.
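How a set of browser characteristics can be turned into a single identifier is sketched below. This is only an illustration under stated assumptions: the attribute names, the sample values and the choice of SHA-256 are hypothetical, and real fingerprinting scripts collect far more signals and combine them in their own ways.

```python
import hashlib
import json


def fingerprint(attributes):
    """Return a short, stable identifier derived from browser attributes."""
    # Serialize with sorted keys so the same attributes always hash the same.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values, for illustration only.
browser = {
    "fonts": ["Arial", "Helvetica", "Ubuntu"],
    "language": "en-GB",
    "screen": "1920x1080",
    "color_depth": 24,
    "time_zone": "Europe/Warsaw",
}

first_visit = fingerprint(browser)
# Deleting cookies changes none of the attributes, so the result is identical.
after_clearing_cookies = fingerprint(dict(browser))
# A browser that differs in a single setting gets a different identifier.
other_browser = fingerprint({**browser, "time_zone": "America/New_York"})

print(first_visit == after_clearing_cookies)
print(first_visit == other_browser)
```

Nothing is stored on the device here: the identifier is recomputed from the browser's own properties on every visit, which is why removing cookies does not affect it.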

Knowing which sites monitor us and how they collect information about us is certainly the first step towards effective protection. To make this easier, we have created a special website where you can check which trackers (monitoring sites) are the most popular on the Internet, where they obtain data and where they send it.

How is our data used?

On its own, all the data captured by these various tools is fragmentary and unordered, and has no significant value to companies. So how does passing on information about us become the basis of many companies' business operations? Why, and to whom, is this data so valuable? To answer these questions, let's trace the entire process of obtaining, processing and using our data.

Let us imagine a situation where we visit our favorite page with culinary recipes.

The main source of income for this site is the sale of advertising space. The site's real customer is therefore not the person who visits it but the one who pays to display advertisements on it. For those companies, the most important thing is to tailor the offer perfectly to the target, namely anyone visiting the site to look for lunch ideas. Context-sensitive advertising wins here: it is addressed to a defined recipient who, at the moment the advertisement is displayed, is likely to be interested in buying a particular product. Only by knowing our interests, purchasing power and ways of responding can content suppliers ensure a satisfactory click-through rate. This is where all our data comes into play, the information we share more or less consciously with marketing companies: our gender, age, place of residence, interests, financial situation and purchase history.

Once it has this information, obtained by means of tracking cookies and browser fingerprinting, the supplier of online content passes it to a so-called demand-side platform (DSP), which is programmed to find users from a specific market segment. These segments are defined by media agencies on the basis of the various kinds of data we provide via websites (example profiles: a member of the middle class who spends a lot of money online, a senior who is reluctant to shop online, a millennial who follows trends, etc.).

The agency's task is to provide its customer with someone who shows a properly compiled set of characteristics. Those characteristics are derived from data on who bought what in the past. This is supplemented with the features and behaviors of a person who has not yet bought the advertised product but may do so in the future. Such a hypothetical customer profile is called a look-alike in marketing jargon. It is used to determine our value to the businesses that want to show us their advertisements. For example, a consumer who is looking for kitchen appliances will be worth significantly more to companies offering kitchen products than to producers of electronics.

Once the value of the online user has been estimated, companies begin bidding on an online exchange, known as an ad exchange. The purpose of the exchange is to match the advertisement to the user as closely as possible. As a result of the whole process, within seconds of our visiting the website, the company that won the auction for the advertising space shows us messages adapted to our profile and preferences. With each subsequent package of data collected by marketing companies, the advertising becomes more and more closely tailored to us. In marketing jargon, the entire process is called real-time bidding (RTB).
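The bidding step can be pictured with a highly simplified sketch. Everything in it is an assumption made for illustration: the `Bidder` class, the company names, the prices and the rule that the highest bid simply wins. Real exchanges use their own protocols and pricing rules.

```python
from dataclasses import dataclass


@dataclass
class Bidder:
    name: str
    interests: dict  # interest -> price offered per impression (made-up units)

    def bid(self, profile):
        """Offer the highest price among the interests found in the profile."""
        return max((price for interest, price in self.interests.items()
                    if interest in profile["interests"]), default=0.0)


def run_auction(profile, bidders):
    """Return (winner name, winning bid); in this sketch the highest bid wins."""
    bids = {b.name: b.bid(profile) for b in bidders}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


# Hypothetical profile of a visitor to the recipe site.
visitor = {"interests": {"cooking", "kitchen appliances"}}

bidders = [
    Bidder("KitchenCo", {"kitchen appliances": 2.5, "cooking": 1.0}),
    Bidder("ElectroCo", {"electronics": 3.0, "cooking": 0.4}),
]

winner, price = run_auction(visitor, bidders)
print(winner, price)
```

The electronics producer would pay more for a visitor interested in electronics, but this profile contains no such interest, so the kitchen company values the impression more highly and wins the slot.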

Why is it important to care for our privacy?

Seeing advertisements carefully crafted to fit our preferences may seem like a win-win situation. Users get offers they are looking for and are willing to take up, and companies sell their products more effectively. Tracking our online activity and collecting data about us can, however, have seriously negative consequences. It is common practice to make the price of goods depend on the profile of a specific customer.

One example is Orbitz, a service for travel planning and hotel and flight booking, which discovered by analyzing the available data that users of Mac computers spend up to 30% more on a hotel room than owners of other equipment. With this knowledge, the company personalized its offer for Mac owners so that the hotels suggested to them were 11% more expensive than those offered to all other users. Similar overpricing aimed at a specific group of recipients has been commonly used in the sale of flight tickets, clothing and electronics. Profiling can also work against people with lower incomes (or people the system classifies as such). It is enough for a company to run a campaign that encourages purchases by offering discounts and promotions. In all likelihood, Internet users categorized as having a low income or as hardly active on the Internet will never be told that codes or promotional coupons are available.

This peculiar economic segregation is accompanied by a high probability of falling victim to a data leak. One can only imagine the damage that could be done if information about our health, finances or relationships with other users were made public.

With both our privacy and the state of our wallets in mind, it is worth making sure that our information is not treated as a trendy product and that our privacy is not traded between companies.


Picture: The New Yorker, Peter Steiner, 1993.