
Is 'Anonymous' Data Really Anonymous? - Tinfoil Hats vs Informed Consumers


exile360


The issue of privacy vs. telemetry has been coming up a lot lately. With the impending end of support for Windows 7, data breaches becoming the norm, privacy issues becoming more prevalent each day, and even Malwarebytes conducting its own study on user privacy awareness and preferences, it's become a hot topic (not to mention all the flak companies like Facebook, Microsoft and Google have taken over telemetry and user tracking in the past several years). Malwarebytes is even talking about the possibility of deleting social media accounts over these issues.

So I decided to dive a little deeper into why I believe much of this 'anonymous' telemetry data, collected and sorted by massive AI supercomputers, may pose a far more serious risk to our privacy than we might think, even if we don't mind companies sending us targeted ads that creep us out and perhaps reveal more about us than we'd like.

The issue isn't that they collect so much data. It's that this data, the hundreds or even thousands of individual data points collected about each of us, can reveal much more than you think, even when it has been 'anonymized' by decoupling it from any 'officially' personally identifiable information (PII) such as your full name, address, date of birth, phone number or Social Security number. Those remaining data points can be reconnected to identify exactly who you are, or at least to reveal far more than you ever shared or intended to share. What's worse, with Machine Learning/AI behind the wheel, reconnecting this data is a trivial matter, and it is happening (it must be; there's far too much data on each of us out there for it not to be). So any illusion we might have that all this data is 'harmless' and 'anonymous' goes right out the window.
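To make the 'data points can be reconnected' idea concrete, here is a minimal Python sketch using a tiny, entirely made-up dataset (the records and field names are hypothetical, not taken from any real telemetry). It counts how many records share each combination of the attributes left over after names are removed. Any combination that appears only once singles out one person, even though no name is present. This is the idea behind k-anonymity, where k is the size of the smallest such group; treat it as an illustration, not as how any particular company measures anonymity.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, other attributes kept.
records = [
    {"zip": "12345", "birth_year": 1980, "sex": "F", "browser": "Firefox"},
    {"zip": "12345", "birth_year": 1980, "sex": "F", "browser": "Chrome"},
    {"zip": "12345", "birth_year": 1975, "sex": "M", "browser": "Chrome"},
    {"zip": "67890", "birth_year": 1980, "sex": "F", "browser": "Firefox"},
    {"zip": "67890", "birth_year": 1990, "sex": "M", "browser": "Edge"},
]

def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)

def k_anonymity(rows, fields):
    """Size of the smallest group; k == 1 means at least one person is unique."""
    return min(group_sizes(rows, fields).values())

def unique_fraction(rows, fields):
    """Fraction of rows whose field combination appears exactly once."""
    sizes = group_sizes(rows, fields)
    unique = sum(1 for row in rows if sizes[tuple(row[f] for f in fields)] == 1)
    return unique / len(rows)

for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex", "browser"]):
    print(fields, "k =", k_anonymity(records, fields),
          "unique =", unique_fraction(records, fields))
```

With ZIP code alone, the smallest group has two people and nobody is unique; adding birth year makes three of the five records unique, and with all four attributes every record is unique.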

If you haven't been keeping up with technology lately, Artificial Intelligence (sometimes referred to as Machine Learning, Network Intelligence and other such fancy terms) basically means really powerful computers, or computing distributed across many systems, crunching massive amounts of data in ways that weren't possible only a few short years ago. Much of that is thanks to recent advances in discrete graphics hardware: cards that were once used exclusively for gaming and 3D rendering are now run in parallel with hundreds or thousands of other graphics cards, in the same or multiple systems, applying their compute power to huge amounts of data via complex mathematical calculations and software. This technology is so powerful that it is rapidly changing our world, especially in business, and even more so for large technology firms like Microsoft and Google that increasingly rely on telemetry (i.e. user/customer data) for profit, whether through advertising or through using the data to spot statistical trends that inform business and product decisions. These aren't necessarily bad things, of course. Everyone wants software that better fits their needs, solves their problems and makes their daily tasks easier, and better advertising means that if we aren't blocking all the ads, the ones we do see are more relevant and about products, services and solutions that actually interest us. These are good things, and I understand that completely, so I see why many in and around the tech industry raise a red flag and label me a 'tinfoil hat' as soon as I start spouting off about privacy concerns regarding telemetry, AI and big data. But my concerns are valid, and I have the facts to back them up. The answer lies in how all the data they collect is anonymized.
The problem isn't just whether they only collect information that couldn't possibly be linked to a single individual. We all know that in order to serve ads they MUST collect at least our individual browser/machine ID (and maybe more, such as the user accounts we sign in with on Google and Microsoft platforms that synchronize across multiple devices). The bigger problem is that the way they anonymize this data has changed. Where they once collected only the bare essentials for what they were interested in, discarding all PII and unnecessary data points, now they collect it all and leave it to AI to strip out the PII and keep everything else. To work properly, however, these AI data sets must be correlated: the data points for each individual, and for each group of 'similar' individuals (as far as the data points are concerned, at least), must be linked for the data to be statistically useful and for the AI algorithms and marketing teams to draw useful information and make educated assumptions and decisions from it. That linking compromises the data's supposed 'anonymity' right from the start.
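One common way to 'strip out the PII and keep everything else' is pseudonymization: drop the direct identifiers and replace them with a stable token, such as a hash of the e-mail address. The sketch below assumes such a hypothetical pipeline (the field names and the choice of SHA-256 over a normalized e-mail are illustrative assumptions, not any vendor's documented method). Because the same input always produces the same token, records from two unrelated services still line up, which is exactly the kind of correlation that undermines the 'anonymity'.

```python
import hashlib

# Hypothetical set of direct identifiers to drop.
PII_FIELDS = {"name", "email", "phone"}

def pseudonymize(record, id_field="email"):
    """Drop direct identifiers but keep a stable token so records stay linkable."""
    normalized = record[id_field].strip().lower()
    token = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    cleaned["user_token"] = token
    return cleaned

# Hypothetical records from two unrelated services.
os_telemetry = pseudonymize({"name": "Alice Example", "email": "alice@example.com",
                             "apps_used": ["tax software", "VPN client"]})
shopping_data = pseudonymize({"name": "A. Example", "email": "Alice@Example.com ",
                              "purchases": ["baby monitor"]})

# Same normalized e-mail -> same token, so the two "anonymous" records merge.
same_person = os_telemetry["user_token"] == shopping_data["user_token"]
profile = {**os_telemetry, **shopping_data} if same_person else None
print("Linked:", same_person)
print(profile)
```

Neither record contains a name or e-mail address anymore, yet merging them on the token yields one profile combining the apps used and the purchases made.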

Then, of course, there is the issue of combining multiple databases from multiple sources: your web browser and operating system, your smartphone, your social media and email accounts, your search histories across multiple platforms, devices and search providers, and all the websites you visit that they are able to track (either through the browser directly or through fingerprinting). And that doesn't even address all the criminal data breaches that occur each year, and how, in the hands of bad guys, all this data can be used to target us through spam, phishing, social engineering, extortion scams and even identity theft. Remember that malware is big business now. The bad guys are well armed and well funded, meaning they have access to the same level of AI technology (both hardware and software) that big legitimate corporations like Microsoft and Google use to harvest and sort their data. All that correlation can therefore be done by the bad guys too, applied across multiple aggregate databases that were never meant to be cross-referenced, enabling them to identify and target countless individuals.
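Cross-referencing two databases can be as simple as a join on the attributes they happen to share. Here's a minimal sketch of such a linkage, assuming a hypothetical 'anonymous' telemetry set keyed by device ID and a hypothetical named list (say, from a breach or public profiles) that both contain ZIP code, birth year and sex. All names and records are invented for illustration.

```python
# Hypothetical "anonymous" telemetry: no names, but quasi-identifiers remain.
telemetry = [
    {"device_id": "a1f3", "zip": "12345", "birth_year": 1975, "sex": "M",
     "searches": ["debt relief", "payday loans"]},
    {"device_id": "b7c9", "zip": "67890", "birth_year": 1990, "sex": "M",
     "searches": ["used cars"]},
    {"device_id": "c2d4", "zip": "12345", "birth_year": 1980, "sex": "F",
     "searches": ["pharmacy near me"]},
]

# Hypothetical second source that does include names.
profiles = [
    {"name": "Alice Example", "zip": "12345", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "12345", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "zip": "12345", "birth_year": 1980, "sex": "F"},
    {"name": "Dan Example", "zip": "67890", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def qi_key(row):
    """The attributes both datasets share."""
    return tuple(row[f] for f in QUASI_IDENTIFIERS)

def link(anon_rows, named_rows):
    """Re-identify anonymous rows whose quasi-identifiers match exactly one named row."""
    index = {}
    for row in named_rows:
        index.setdefault(qi_key(row), []).append(row["name"])
    matches = {}
    for row in anon_rows:
        candidates = index.get(qi_key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[row["device_id"]] = (candidates[0], row["searches"])
    return matches

for device, (name, searches) in link(telemetry, profiles).items():
    print(device, "->", name, searches)
```

Two of the three devices are re-identified, along with their search histories. The third stays ambiguous only because two people in the named list share its attributes; one more data source could be enough to break that tie.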

Am I blowing all of this way out of proportion? Maybe. But just think: we never had extortion scams like these until we started feeding so much data to these companies and the bad guys became more interested in that data than in trying to infect individual systems (in fact, the largest botnets in the world are now made up of Internet of Things (IoT) devices, not PCs as had been the norm until recently).

TL;DR: If you think I'm nuts, or just want to learn more about why I'm concerned about all this data, please give at least one or two of the following articles a read:

https://www.itprotoday.com/data-privacy/anonymous-data-collection-more-risky-users-think
https://chooseprivacyeveryday.org/choose-privacy-week-2015-what-you-should-know-about-anonymous-aggregate-data-about-you/
https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006

I swear it's the world that's gone crazy, not me. We all used to call this stuff by its true name before Microsoft and Google turned it into a multi-billion dollar industry and it somehow became 'normal'. In fact, this stuff was the very reason companies like Malwarebytes were created in the first place: they targeted these ancillary threats that the big AV vendors were either really bad at detecting or just weren't interested in. These days we call much of it 'PUP' (Potentially Unwanted Programs), but it had other names that go back to a time before Malwarebytes existed: Adware and Spyware. Now that Microsoft, Google, Facebook and everyone else use it to one degree or another (yes, even Malwarebytes; note the Usage and Threat Statistics option under Settings>Application in Malwarebytes), it is suddenly normal and acceptable. However, the information linked above shows that nothing could be further from the truth. Thanks to the sheer volume of information and the multiplicity of data points and sources being collected, sorted and stored, all this data poses a serious risk not only to our privacy but also to our security, thanks to all these breaches, unscrupulous companies selling and/or sharing their databases with others, and companies that go out of business without properly disposing of customer data, not to mention the times we've already seen companies decide to sell customer data. Incidents such as these can lead to some scary stuff.
It's bad enough with all the accidental and unintentional data leaks and breaches, without these organizations deliberately sharing or selling their data to others. Every time this happens, one or more databases full of supposedly 'anonymous' data points get merged and correlated with another, coming that much closer to positively and uniquely identifying and profiling each and every person in them. The problem only gets worse with time as more data is collected and correlated, and AI makes all that work much faster and far less difficult: what would take a large organization of humans years can take these machines days, hours, minutes or even seconds. That's the kind of scale we're talking about here, and a big reason I am so concerned.

But hey, I'm just some uninformed, tinfoil-hat-wearing nutjob, right? If so, then you have absolutely nothing to worry about. The sky isn't falling, your data is secure, your identity is safe, you're as anonymous as you choose to be online, and absolutely no one is doing or planning to do anything malicious with all of this information, because people are good, and corporations and governments care about the public, take privacy seriously and would never do anything to harm us or our privacy just for an advantage or a profit, right?


Hehe, I've watched it (great show, by the way; one of my favs :)), but reality is far from the fiction presented in that show. The only 'AI' we currently have takes the form of complex mathematical algorithms and pattern analysis. It's nothing even remotely resembling 'consciousness' or even basic thought on any level; it's just complex decision trees, data sets, mathematical comparisons and so on. The risk I'm talking about comes from the fact that so much seemingly 'useless' aggregate data exists across all these sources: sites we've visited, services we've used (both online and in real life, where companies have collected data such as our buying, watching and listening habits) and companies we've dealt with. AI could very easily weave that data back together into a highly identifiable profile through a process of elimination. Just think about regional stores that only exist in certain parts of the country, regional foods you might eat, region-specific ISPs and countless other factors. Tie that into more public resources such as social media, and it becomes absurdly simple to nail down exactly who someone is. Far more about them can then be inferred than most people would be comfortable with, just by evaluating all these data points and cross-referencing them with other known individuals or groups with similar habits, tastes, decisions and profiles. It can 'know' (infer) things about you that no one besides you may know: information you haven't made public anywhere or shared with anyone, based purely on the aggregate patterns and profiles.
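The 'process of elimination' can be sketched as simple arithmetic: each clue keeps only the fraction of people who match it, and singling out one person from a pool of N takes about log2(N) bits of information. The Python below assumes a pool of roughly 330 million people (about the U.S. population) and uses made-up clue fractions; it also assumes the clues are independent, which real attributes often aren't, so treat the numbers as illustrative only.

```python
import math

# Hypothetical starting pool: roughly the U.S. population.
pool = 330_000_000

# Hypothetical clues and the rough fraction of people each one matches.
# These fractions are invented for illustration and ASSUMED independent.
clues = [
    ("shops at a regional grocery chain", 1 / 50),
    ("uses a region-specific ISP", 1 / 20),
    ("drives a particular car model", 1 / 100),
    ("follows a niche local sports team", 1 / 200),
    ("orders a regional food brand online", 1 / 10),
]

def narrow(pool_size, clue_list):
    """Yield (clue, expected remaining candidates, bits of information so far)."""
    remaining = float(pool_size)
    bits = 0.0
    for clue, fraction in clue_list:
        remaining *= fraction          # keep only people matching this clue
        bits += -math.log2(fraction)   # information gained from this clue
        yield clue, remaining, bits

needed = math.log2(pool)  # bits needed to single out one person in the pool
for clue, remaining, bits in narrow(pool, clues):
    print(f"{clue:40s} ~{remaining:>14,.1f} left  ({bits:4.1f} of {needed:.1f} bits)")
```

Under these invented numbers, five ordinary-sounding clues leave fewer than two expected candidates out of 330 million, supplying about 27.6 of the roughly 28.3 bits needed to single someone out.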

It really can get pretty intense once you go down the rabbit hole of just how complex and powerful all of this data can become when combined with AI data analysis techniques. It gets worse when you consider how much easier all this information would make it for a scammer to target individuals, deceive them and scam money and/or truly sensitive PII, like account numbers, out of them. All of that becomes much easier when you can determine where someone shops, what they buy, who their mobile carrier is, where they live, what kind of car they drive, how they vote, what their beliefs are, what their entertainment tastes are and countless other factors.


  • 2 weeks later...

My research has continued, and I've revised my list of must-read content with regard to online privacy. Here's the current list of items I believe everyone interested in their privacy should read:

First off, Microsoft's privacy policies and telemetry collection info for Windows and Office:

Next, articles and resources from around the web on various subjects including Windows 10, Windows in general, websites, social media, mobile devices and much more:

I may do a write-up on the various tools I use to help keep my information and browsing private online, though I'm not sure yet. Much of it doesn't apply to Windows 10, because changes Microsoft made in the latest version of its widely used OS removed much of the control users had in previous Windows versions. Since I'm not a Windows 10 user myself, I wouldn't want to recommend tools and instructions that may or may not apply to that OS without being able to say with confidence which ones work with Windows 10 and which don't. That said, there's a certain irony in all of this, given that Microsoft, with Internet Explorer, was among the first to adopt privacy block lists and was one of the earliest adopters of the 'Do Not Track' feature in its browser. Oh, how things have changed since they decided to become an advertising/telemetry/big-data-driven, OS-as-a-service provider rather than a straight-up software developer/vendor.

Edited by exile360
