Author(s): Aadit Kapoor
It is the 21st century: technology is on the rise, and the internet has superseded paper texts. We live in an interconnected world. In this fast-paced, growing world, data is created every second. Algorithms and statistical measures allow us to chart each movement in a form suitable for predictive modeling.
Big data refers to the huge amounts of data accumulated over time through the use of internet services. Traditional econometric methods fail when analyzing such volumes, so we require a host of new algorithms that can crunch this data and provide insights (Harding et al, 2018). Big data can be thought of as the record of all human activity over the last decade, and it is growing exponentially every second.
Being interconnected has its benefits and drawbacks, and one of the major drawbacks concerns privacy. Big data does not just encompass the analysis of data; it also includes data collection. Data collection is one of the ways in which personal user data can become compromised (Kshetri, 2014). Predictive modeling will not only help us improve our services but will also have a deep impact on industries like healthcare and food.
The accumulation of data cannot be stopped, and we must be well aware of the benefits and drawbacks of the holy grail of technology: data. This paper aims to examine the facts and case studies related to data and how it affects our modern life.
Data, or information, can be described as the accumulation of past behavior. Information can also be categorized as a sort of data. Something we took for granted a few years back has now boomed in this decade due to a large amount of human activity and computing technology.
In the 21st century, we are surrounded by data of two types: discrete and continuous. Discrete data consists of entries that can be used for classification, whereas continuous data consists of entries that can be used for regression.
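As a minimal sketch of this distinction (the data and the function name `knn_predict` are hypothetical and chosen only for illustration), the same nearest-neighbour idea can serve both tasks: with a discrete target it votes on a label, and with a continuous target it averages a number.

```python
from statistics import mean, mode


def knn_predict(train, query, k, discrete):
    """Predict a target for `query` from its k nearest training points.

    `train` is a list of (feature, target) pairs with a single numeric feature.
    A discrete target gives classification (majority label among neighbours);
    a continuous target gives regression (mean of the neighbours' values).
    """
    neighbours = sorted(train, key=lambda row: abs(row[0] - query))[:k]
    targets = [target for _, target in neighbours]
    return mode(targets) if discrete else mean(targets)


# Hypothetical data: hours watched per week -> viewer type (discrete target).
viewer_type = [(1, "casual"), (2, "casual"), (8, "heavy"), (9, "heavy"), (10, "heavy")]
# Hypothetical data: hours watched per week -> monthly spend (continuous target).
monthly_spend = [(1, 2.0), (2, 3.0), (8, 9.0), (9, 10.0), (10, 11.0)]

label = knn_predict(viewer_type, query=9, k=3, discrete=True)     # classification
spend = knn_predict(monthly_spend, query=9, k=3, discrete=False)  # regression
print(label, spend)
```

The only difference between the two calls is how the neighbours' targets are combined, which is the practical meaning of the discrete versus continuous split.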
Man and data are inseparable, because data is the flow of information. Data has been a very important part of human existence since time immemorial. Once civilizations were established, they could not function without data. The Indus Valley Civilization had seals (a type of coin) on which data was tabulated.
The Incas, another very old civilization, had the same methods for data collection. As civilization progressed, man-made data tabulation also developed. It graduated into coins that replaced the barter system. There were also numbers which have been used since biblical times.
The seafarers also had a system of data that helped in their trade. Data collection was thus an important aspect of life in ancient times. Around the 1950s, with the rise of computing systems, data could be represented as bits and bytes. In the 21st century, data has been regarded as the new oil.
Privacy is the state of freedom from intrusion and the ability of an individual to keep information to themselves. A person should have the freedom to share that information whenever they choose. In the 21st century, with the boom in data and computing, companies have tried to exploit it using sophisticated algorithms and techniques known as data mining.
Due to limited government enforcement of privacy laws, companies have exploited data to gain more and more users by invading their privacy (Cate, 1997). One reason for the disconnect between data and privacy is that many users are not aware of when their data is being collected (Acquisti et al, 2016).
While we may consider data a valuable resource, we should be aware of how this data can be exploited by companies or politicians to attract a certain set of customers. Users can give out unintended personal information to these platforms in the form of text, images, preferences, and browsing time (Xu et al, 2014).
These data collections pose a threat to humanity. To rectify this, new data mining techniques are being explored extensively, with the main aim of studying, analyzing, and processing data in a way that maintains privacy (Xu et al, 2014).
In the 21st century, human-computer interaction is at its peak. Many companies depend on the accumulation and processing of huge amounts of data (Oussous et al, 2018).
Huge amounts of data, also known as big data, are a resource for a company’s research and development, as they help companies decide where to invest their money. The world economy has turned into a data economy: an ecosystem where data is gathered, organized, and exchanged using big data algorithms.
These days data can be huge, cluttered, and unstructured. One example is when different clients have different accounts on the same platform: to extract useful information, the algorithms first have to preprocess the data in a way that manages bias, outliers, and imbalances (Tummala et al, 2018).
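A minimal sketch of such a preprocessing step, assuming purely numeric values and categorical labels. The helper names and the data are illustrative and are not taken from Tummala et al. It covers only two of the concerns named above: outliers, clipped here with the common 1.5 x IQR rule, and class imbalance, handled with inverse-frequency weights. Bias is left aside.

```python
from collections import Counter
from statistics import quantiles


def clip_outliers(values, k=1.5):
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes get weights below 1,
    so a learner that supports sample weights is not dominated by the majority.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Hypothetical session lengths in minutes, with one implausible entry.
session_minutes = [10, 12, 11, 13, 12, 11, 500]
cleaned = clip_outliers(session_minutes)

# Hypothetical imbalanced labels: 8 "free" accounts and 2 "paid" accounts.
account_types = ["free"] * 8 + ["paid"] * 2
weights = class_weights(account_types)
print(cleaned, weights)  # weights -> {'free': 0.625, 'paid': 2.5}
```

Clipping keeps every record while limiting the pull of extreme values; dropping the outliers instead is an equally common choice and depends on whether the extreme entries are errors or genuine behavior.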
We are surrounded by data to such an extent that services like YouTube receive new video every 24 hours, with a rough estimate of 13 billion to 50 billion data parameters accumulated over a span of 5 years (Fosso Wamba et al, 2015).
Harnessing human data to predict future movement is a common strategy for companies to capitalize on data. While YouTube produces such huge amounts of data, people using the service contribute back to it by storing their likes and dislikes in a “big database” maintained by YouTube. Big data and business analytics are estimated to provide an annual revenue of about 150.8 billion dollars in the US (Tao et al, 2019).
While these firms earn by providing users with a better interface using their data, some firms exploit data to influence a portion of individuals. Using data crunching and data mining techniques, they turn user data into something usable, which they then sell or use to influence people.
Facebook, a social network company, was recently involved with Cambridge Analytica, a data-mining firm that gathered the data of Facebook users by exploiting loopholes. Community profiles were built on this data and used to target customized ads.
As a result, Facebook went into decline, as this was considered a massive breach of personal user data consisting of images, text, posts, and likes. The data gathering is linked to the 2016 US elections, and following the scandal the GDPR (General Data Protection Regulation) came into force in the EU (Tao et al, 2019).
Companies use past data to build recommendation engines that can predict what sort of content a user wants to view. One example is Netflix, which asks users to rate movies on a scale of 1 to 5 to build a personalized profile for each user.
For the Netflix recommendation engine, linear algebra, or to be more precise SVD (Singular Value Decomposition), was used to build a system that can predict what the user might like (Hallinan et al, 2014).
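To illustrate the idea, here is a small sketch of rating prediction with a truncated SVD using NumPy. It is not Netflix's actual system: the rating matrix is invented, 0 marks a missing rating, and filling missing entries with each user's mean before decomposing is a simplifying assumption made for this example.

```python
import numpy as np


def predict_ratings(ratings, k=1):
    """Estimate a full user x movie rating matrix from a partial one.

    Assumes every user has rated at least one movie and that 0 means
    "not rated". Keeps only the k largest singular values.
    """
    ratings = np.asarray(ratings, dtype=float)
    known = ratings > 0
    # Per-user mean over the movies that user actually rated.
    user_means = (ratings * known).sum(axis=1) / known.sum(axis=1)
    # Center known ratings; missing entries become 0, i.e. the user's mean.
    centered = np.where(known, ratings - user_means[:, None], 0.0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rank-k reconstruction captures the dominant taste patterns.
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return np.clip(approx + user_means[:, None], 1.0, 5.0)


# Hypothetical 1-5 ratings: rows are users, columns are movies, 0 = not rated.
ratings = [
    [5, 5, 0, 1, 1, 0],
    [5, 5, 5, 1, 1, 1],
    [1, 1, 1, 5, 5, 5],
    [1, 1, 1, 5, 5, 5],
]
predicted = predict_ratings(ratings, k=1)
print(predicted[0, 2], predicted[0, 5])
```

In this toy matrix the first user's known ratings match the second user's, so the estimate for the unseen third movie comes out higher than the estimate for the unseen sixth movie. Production systems typically learn the factors directly from the known ratings instead of filling in the gaps first.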
To conclude, big data and privacy go hand in hand, as they are interconnected and interdependent. A breach of privacy requires access to huge amounts of data, and building these computing engines requires large-scale distributed computing resources and techniques.
We have seen how data became so popular and how, with the arrival of the right technological tools and algorithms, companies were able to harness its predictive capabilities.
We have also seen how privacy is a big part of the data economy and how data collection methods differ in this fast-paced, growing world. We cannot stop the flow of data, but we can surely be aware of what is being collected. Cambridge Analytica's use of loopholes in the Facebook platform to gather personal user data was surely a breach of privacy, due to which Facebook was called before Congress and its market share dropped drastically. We have provided a logical flow of how data and privacy concerns arose and how, due to the huge amounts of human activity, data came to be called “Big Data”.
For the future direction of this research, we plan to analyze how the flow of data can be controlled by technical means, and to discuss the effect of racial bias in such huge amounts of data, more specifically how racial biases affect data mining algorithms (Obermeyer et al, 2019).
Harding M, Hersh J. Big Data in Economics. IZA World of Labor. 2018. doi:10.15185/izawol.451.
Kshetri N. Big data's impact on privacy, security and consumer welfare. Telecommunications Policy. 2014;38(11):1134–1145. doi:10.1016/j.telpol.2014.10.002.
Data is the new oil. Header_image. [accessed 2020 Jun 22]. https://spotlessdata.com/blog/data-new-oil
Cate FH. Privacy in the information age. Washington, D.C.: Brookings Institution Press; 1997.
Acquisti A, Taylor C, Wagman L. The Economics of Privacy. Journal of Economic Literature. 2016;54(2):442–492.
Oussous A, Benjelloun F, Ait Lahcen A, Belfkih S. Big Data technologies: A survey. Journal of King Saud University — Computer and Information Sciences. 2018 [accessed 2020 Jun 22];30(4):431–448.
Tummala Y, Kalluri D. A review on Data Mining & Big Data Analytics. International Journal of Engineering & Technology. 2018 [accessed 2020 Jun 22];7(4.24):92.
Fosso Wamba S, Akter S, Edwards A, Chopin G, Gnanzou D. How ‘big data’ can make a big impact: Findings from a systematic review and a longitudinal case study. International Journal of Production Economics. 2015;165:234–246.
Xu L, Jiang C, Wang J, Yuan J, Ren Y. Information Security in Big Data: Privacy and Data Mining. IEEE Access. 2014 [accessed 2020 Jun 22];2:1149–1176.
Hallinan B, Striphas T. Recommended for you: The Netflix Prize and the production of algorithmic culture. New Media & Society. 2014;18(1):117–137.
Tao H, Bhuiyan M, Rahman M, Wang G, Wang T, Ahmed M, Li J. Economic perspective analysis of protecting big data security and privacy. Future Generation Computer Systems. 2019 [accessed 2020 Jun 23];98:660–671.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 [accessed 2020 Jun 23];366(6464):447–453.
Economics of Big Data and Privacy: Exploring Netflix and Facebook was originally published in Towards AI - Multidisciplinary Science Journal on Medium.