You have probably heard the phrase “Data is the new oil” many times by now. What it means is that data has become as precious as Oil was until a few years ago. Many wars were fought over Oil. Many economies ran (and still run) on Oil. Most of the current political problems revolve around Oil.
Compare Oil & Data
Data & Oil also have lots of differences. For instance, Data is available in abundance whereas Oil is becoming increasingly scarce. Data is man-made and is increasing by the day, while Oil is a natural resource that is diminishing by the day. We can run out of Oil but will never run out of Data. Having said that, there are a few similarities that make such a comparison valid.
Dependence on Oil v/s Dependence on Data: These seem to be very similar. Economies now increasingly depend on Data to function and prosper, just as they have depended on Oil for a long time.
Monopoly: The Oil monopoly held by a handful of companies and a few countries that have access to this natural resource is well understood. Possession of huge amounts of Data by a handful of companies like Microsoft, Apple, Facebook, Amazon, and Google (now often called the Big Five) is giving them an unfair advantage over the competition.
Oil leak v/s Data leak: Both can cause significant damage. An Oil leak can damage the environment and the economy, whereas a Data leak can damage the economy and even topple governments (WikiLeaks has stirred many controversies by selectively leaking Data). The recent US elections and the current US administration have been increasingly affected by leaks.
Cause of Conflict: Oil has been the cause of various wars and Data is now becoming the tool for cyber warfare.
Data Hacks in US
Last year there were three separate instances of Data hacks in the US. The first was a hack of US personnel Data. The second targeted medical Data (patient records, their ailments and the severity of their illnesses), and the third targeted banking Data (account balances, income and expenses).
Investigations by US cyber intelligence found that the three hacks were very well planned and were sponsored by some hostile nations.
The motive was to get hold of critical classified information from the US. Earlier, getting hold of classified or top secret information meant using spy agencies, who would try to find a mole in the government and obtain the secrets through that person. This is something most countries do to their enemy nations. It is risky, costly, time consuming and not a very effective way of getting hold of classified documents.
Data, if used properly, can be a very potent weapon. This is what the series of cyber-attacks on US systems proved. The aim was to obtain classified information through a mole.
The plot worked as follows. The first attack was meant to produce a list of people in the US with a high level of security clearance relevant to the attackers’ objective. Once they had hacked the HR personnel data, which had security clearances on record, they had the list of people who held the information. Their next task was to find out who among those people was most vulnerable.
The second attack went after medical data: health records that revealed, in particular, who had serious medical conditions. The attackers could then cross-reference the two data sets and find out how many people on the list had such ailments or had close relatives with them. This narrowed the list down to people who had high security clearance and a health crisis. The cost of healthcare in the US is very high, and those suffering from serious, long-term ailments end up spending a large share of their earnings and savings on treatment.
The third and final attack was meant to reveal the financial condition of those with high security clearance and serious or long-term medical conditions that needed expensive treatment.
Voilà! They found some matches. This information could now be passed to their agencies, who could contact the potential targets to see whether they could be bribed. They could easily offer an attractive sum of money to people who had access to the information, who needed the money and who were most vulnerable. It was the most effective way to find a mole: quick, inexpensive, low-risk and effective. Data had been used as a weapon.
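The cross-referencing described above is, at heart, a simple intersection of three data sets on a shared identifier. The sketch below is only an illustration of that idea using made-up records: the person IDs, field names and the low_balance threshold are all assumptions, not details from the actual incidents. It shows why each data set alone reveals little, while the combination is far more revealing.

```python
# Synthetic, made-up records keyed by a hypothetical shared person ID.
personnel = {
    "P001": {"clearance": "top secret"},
    "P002": {"clearance": "none"},
    "P003": {"clearance": "top secret"},
}
medical = {
    "P001": {"serious_condition": True},
    "P002": {"serious_condition": True},
    "P003": {"serious_condition": False},
}
banking = {
    "P001": {"balance": 1200},
    "P002": {"balance": 300},
    "P003": {"balance": 90000},
}


def cross_reference(personnel, medical, banking, low_balance=5000):
    """Return the IDs that meet a condition in all three data sets."""
    cleared = {pid for pid, rec in personnel.items()
               if rec["clearance"] == "top secret"}
    unwell = {pid for pid, rec in medical.items()
              if rec["serious_condition"]}
    strained = {pid for pid, rec in banking.items()
                if rec["balance"] < low_balance}  # assumed threshold
    # Set intersection keeps only IDs present in every filtered set.
    return sorted(cleared & unwell & strained)


matches = cross_reference(personnel, medical, banking)
print(matches)
```

With these sample records only one ID survives all three filters, which is exactly the narrowing-down effect the attackers were after and the reason such data sets need to be protected together, not just individually.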
This is how effectively Data can be used as a tool for proxy war. Today many countries, including the US, Russia, China, North Korea and Japan, are preparing for cyber warfare. Since most weapons are controlled through technology, hacking the enemy’s systems is the easiest way to paralyse and defeat the enemy. If those systems are compromised, one can render enemy missiles useless and ground fighter planes that rely on guidance systems. Cyber security has therefore become one of the key areas of focus for the defence sector and is increasingly being strengthened.
Data Used by Companies
Data is also used by companies to compete against each other. Data has become so important that companies have started to use statistical and qualitative analysis and predictive modelling as a primary element of competition, ahead of the traditional factors.
The traditional factors were pricing, branding, features, offers and discounts. Analytics and predictive modelling now allow companies to identify key patterns in their existing data and then apply those patterns to predict the likely outcome of a strategy. For example, a company might launch a new cologne or a new flavor based on the tastes of its target audience, captured from their purchasing history, browsing patterns and surveys or feedback on their usage experience.
Data is currently being exchanged between companies like Facebook, Google & Amazon, who then offer this information to companies trying to understand the preferences of consumers. If you browse Facebook posts, search for a particular term on Google or show interest in a particular product, these companies capture that information. They can correlate your personal information with your preferences and then apply data analysis and prediction models to understand your behavioural patterns and personal preferences.
So much information is being captured these days because every online activity (and even offline activity) can be stored and reused for further analysis. You go to Amazon to search for a product and compare a few options, and even if you don’t buy anything, Amazon has gained a lot of new information about you: your tastes, your likes, your desires and your plans. This information can then be used to create customised offers. Everywhere you browse, you will be reminded of these offers. This is called re-marketing, and it is a very important tool for companies that is based primarily on your browsing history. Even if you don’t buy any product or service, a simple search combined with your profile can be used to understand your demographics and your choices.
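To make the re-marketing idea concrete, here is a minimal sketch, assuming a site keeps a simple list of the products a visitor viewed and a list of what they bought. The function name remarketing_candidates and the sample data are hypothetical; real re-marketing systems are far more elaborate.

```python
from collections import Counter


def remarketing_candidates(viewed, purchased, top_n=2):
    """Return the most-viewed products that the visitor never bought."""
    bought = set(purchased)
    # Count views only for products that were not purchased.
    counts = Counter(product for product in viewed if product not in bought)
    return [product for product, _ in counts.most_common(top_n)]


# Hypothetical browsing session: every view is logged, purchase or not.
viewed = ["headphones", "laptop", "headphones",
          "phone case", "laptop", "headphones"]
purchased = ["phone case"]

offers = remarketing_candidates(viewed, purchased)
print(offers)
```

The products that come out of this function are the ones a visitor kept looking at but did not buy, which is why they are the natural candidates for the offers that follow you around the web.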
Facebook might appear to be a social networking site where people share experiences such as the places they visit, the places where they like to eat and the places they live. While to most people it appears to be a network for connecting with friends, for a corporation like Facebook it is much more than social networking. Facebook has for a long time been collecting the travel preferences and customer experiences of its subscribers, and it now has enough data to become the most effective travel portal in the world. It possesses information that most travel portals don’t have: for instance, the best places to stay, the most popular places to visit, the most common activities, places to avoid and so on. This is very precious information for any traveller, and Facebook has it all. It can now use this information to provide useful data to travel portals, who can then design customised packages for individuals based on their browsing history on Facebook.
Try going to any travel portal and searching for a particular flight regularly for a few days. Over time you might get a different (maybe higher) price. Through browser cookies these portals can easily collect information about how frequently you look at a particular flight. If they can conclude that you have a concrete plan for a particular date range, they might not give you a special deal that is available to others. Have you ever noticed a deal on a flight that was visible a few times, then disappeared, only to reappear after a while? This is because of targeted pricing, which is now being employed by companies. They use the information collected from your browsing pattern to decide whether you are likely to purchase or are just checking prices.
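The reasoning behind targeted pricing can be sketched in a few lines. This is purely a hypothetical rule for illustration, not any portal’s actual method: the function names, the threshold of three searches and the prices are all assumptions.

```python
def show_special_deal(search_count, threshold=3):
    """Hypothetical rule: frequent searchers are assumed to have firm plans,
    so the special deal is shown only to occasional visitors."""
    return search_count < threshold


def quoted_price(base_price, deal_price, search_count, threshold=3):
    """Return the price a visitor would see under the rule above."""
    if show_special_deal(search_count, threshold):
        return deal_price
    return base_price


# A first-time visitor versus someone who has searched the same flight
# five times (the count would come from a browser cookie).
first_visit = quoted_price(base_price=500, deal_price=450, search_count=1)
repeat_visit = quoted_price(base_price=500, deal_price=450, search_count=5)
print(first_visit, repeat_visit)
```

Under this assumed rule the same flight is quoted at two different prices depending only on how often the visitor has looked at it.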
When you install any app on your phone, you are providing more refined Data about yourself. Now the app’s makers can easily get Data related to your friends as well. You might have seen apps like TrueCaller, which displays the ID of the caller whenever your phone rings. TrueCaller has enough Data not only to give you the name of the contact; it also holds details about your calling pattern, your close friends and associates and the people you interact with most often. It can show when someone was last active and even tell you whether someone is busy on another call before you dial their number.
One of the major reasons why Facebook bought WhatsApp was to collect this kind of information, which lets it suggest friends you might know and offer to connect you with them. The information obtained from WhatsApp and your phone contacts can be shared with Facebook to give you a more desirable and personalized experience.
Data & Artificial Intelligence
The more comfortable you get with these systems, the more you are likely to interact with them, and the more you interact, the more Data you end up sharing with these companies. The more Data these companies have, the smarter their offerings can become using machine learning and Artificial Intelligence.
When we talk about artificial intelligence, some common applications come to mind. The first is Google Maps. I am sure most of you have used Google Maps at some point. Did you know that the more you use Google Maps, the more intelligent it becomes, because it records your travel coordinates and the time it takes you to travel from one place to another? Based on this user data it can estimate the travel time at any given moment, factoring in your travel speed, the traffic and the time of day. Google Maps can now offer you alternative routes when you are about to get stuck in traffic, and its accuracy is improving by the day because the technology behind it feasts on data. The more Data it gets, the better the accuracy of the application.
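As a rough illustration of how recorded trips can turn into travel-time estimates, the sketch below averages trip durations by hour of day. It is a deliberately simplified assumption, not how Google Maps actually works; the function name and the sample trips are made up.

```python
from collections import defaultdict
from statistics import mean


def average_minutes_by_hour(trips):
    """Group recorded trip durations by hour of day and average them."""
    by_hour = defaultdict(list)
    for hour, minutes in trips:
        by_hour[hour].append(minutes)
    return {hour: mean(durations) for hour, durations in by_hour.items()}


# Hypothetical recordings for one route: (hour of day, minutes taken).
trips = [(8, 42), (8, 38), (8, 40), (14, 22), (14, 26)]

estimates = average_minutes_by_hour(trips)
print(estimates[8], estimates[14])
```

Every additional recorded trip refines the average for its hour, which is the simplest form of the “more data, better accuracy” effect described above.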
Other examples are Amazon Alexa, Google Home, IBM Watson and Apple Siri. These are bots designed to interact with humans. They are designed to understand language and interpret the intent of a conversation. Based on that intent they deliver a response that gives a human-like experience. Over time these platforms have become better because they use Natural Language Processing and Machine Learning. The more a bot interacts, the better it gets. Think of it as a baby that is just learning how to communicate. The baby listens to every word we speak in front of it. It uses its cognitive learning skills to remember these words, and as it grows it is able to understand and speak the language. Similarly, these bots are learning every word in the dictionary. Some of them now provide platforms to corporations, who can use NLP (Natural Language Processing) and ML (Machine Learning) to replace customer service representatives with bots.
Capital One is one of the biggest banks in the US. They were evaluating whether they should implement a bot to handle customer service. Based on their data on user queries, they found that the most common question asked by their customers was “What time does the bank open”. This is a simple question that does not need an expensive call centre representative answering the phone. It can easily be handled by a bot. So phone banking can use bots to answer these common queries. Each interaction is then stored to make the experience even better.
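Finding the most common customer query is a straightforward counting exercise. The sketch below assumes a plain list of logged query strings; the sample log and the normalise helper are hypothetical and are not taken from the bank’s actual system.

```python
from collections import Counter


def normalise(query):
    """Lower-case a query, trim punctuation and collapse extra spaces."""
    return " ".join(query.lower().strip(" ?!.").split())


def most_common_queries(queries, top_n=1):
    """Return the top_n most frequent queries with their counts."""
    return Counter(normalise(q) for q in queries).most_common(top_n)


# Hypothetical call-centre log.
log = [
    "What time does the bank open?",
    "what time does the bank open",
    "How do I reset my PIN?",
    "What time does the bank open ?",
]

top = most_common_queries(log)
print(top)
```

Queries that rise to the top of such a count are the obvious first candidates to hand over to a bot.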
One of the reasons we make this comparison between Oil and Data is the emergence of big companies around both oil and data.
Big Oil, or the supermajors, is a name commonly used to describe the world’s six or seven largest publicly owned oil and gas companies. These companies have to a great extent monopolised the Oil industry.
Now similar concerns are being raised about the giants that deal in data, the oil of the digital era. These companies (Google, Amazon, Apple, Facebook and Microsoft) look unstoppable. They are the five most valuable listed firms in the world. Their profits are surging. Amazon captures almost half of all money spent online in America. Google and Facebook accounted for almost all the revenue growth in digital advertising in the US last year.
Such a monopoly has prompted calls for the tech giants to be broken up, as Standard Oil was in the early 20th century.
Various uses of Data
Another similarity between data and oil is how heavily the economy depends on each of them. Data has now become the key driver of growth and change. Flows of data have created new infrastructure, new businesses, new monopolies, new politics and, crucially, new economics. Digital information is unlike any previous resource; it is extracted, refined, valued, bought and sold in different ways. It changes the rules for markets and it demands new approaches from regulators. Many a battle will be fought over who should own, and benefit from, data.
Most important, the value of data is increasing. Facebook and Google initially used the data they collected from users to target advertising better. But in recent years they have discovered that data can be turned into any number of artificial-intelligence (AI) or “cognitive” services, some of which will generate new sources of revenue. These services include translation, visual recognition and assessing someone’s personality by sifting through their writings — all of which can be sold to other firms to use in their own products.
The majors pump from the most bountiful reservoirs. The more users write comments, “like” posts and otherwise engage with Facebook, for example, the more it learns about those users and the better targeted the ads on news feeds become. Similarly, the more people search on Google, the better its search results turn out.
These firms are always looking for new wells of information. Facebook gets its users to train some of its algorithms, for instance when they upload and tag pictures of friends. This explains why its computers can now recognize hundreds of millions of people with a high degree of accuracy.
Today several applications use or plan to use visual recognition as part of authentication. Even Aadhaar authentication (in India) is set to use visual recognition. It is also being increasingly used in criminal justice systems in many parts of the world. Several law enforcement agencies in the US, for instance, are exploring visual recognition technologies for use on patrol. Officers wear body cams that capture video of their interactions with members of the public, which is then uploaded to a server. Visual recognition systems can then search this footage and return valuable information to the officers in real time, helping them take immediate action based on what is retrieved.
Google’s digital butler, called “Assistant”, gets better at performing tasks and answering questions the more it is used.
Several applications are now being developed that leverage the voice recognition, text-to-speech and speech-to-text features of Google Assistant.
Similarly, Google, which has been working on the driverless car, has been collecting data that helps it design and further optimize its self-driving algorithms.
Data is the fuel for machine learning, and once machines have sufficient data they can perform functions and research several times faster than humans.
Last year Google’s AlphaGo defeated the world champion Ke Jie at Go, the ancient Chinese strategy game. Once considered far-fetched, this became reality sooner than most anticipated. With enough data given to these platforms, the results can be far better than expected. AlphaGo was created by London-based DeepMind, which Google acquired in 2014.
Future of Data
One of the reasons why machines can deliver results that humans sometimes cannot is the way humans think. We have a tendency to reach a conclusion first and then research our way to proving it. Machines, on the other hand, have no emotions to cloud their judgement and are able to interpret the data better. Humans have a tendency to ignore some data because of preconceived theories, whereas machines treat each data element equally.
There are several interesting use cases for data in the future:
Medical research: Medical data from facilities across the globe can provide useful information that can help cure and prevent several diseases that are currently considered incurable.
Astronomy: We have been collecting data about various planets and galaxies, and about the possible existence of multiple universes.
Human psychology: So far we have been unable to predict or understand why humans behave the way they do. With so much data related to human behaviour now available, there is a lot of potential to uncover the mystery behind it.
Data is everywhere. We are collecting data at every stage of our life, in every activity we perform, with every gadget we use: while watching TV, travelling, on the phone, on the internet, attending events, voting, buying, transacting, playing sports and even sleeping.
Data is the new oil, and we have it in abundance. We are never going to run out of data. On the contrary, data will only grow, and exponentially at that. Learning to deal with data and use it effectively will be the most sought-after skill of this century.