What Data of Mine Is Out There, and How Did It Get There?

For many of us, the question lurks in the back of our minds, but few of us bother to dig out the answer. The reality is that data about us comes from many, many sources and is widely spread. That may sound ominous, but you’re actually aware of much of it, and it has been out there for some time with your knowledge and blessing. Much of it has been preventing fraud and keeping your digital life working the way you’ve asked it to.

It’s important to understand two basic realities about our digital selves. First, our data is widely distributed but closely held, such that compiling a comprehensive view of the details of a person’s life is more aspiration than reality (despite the hype). Second, the value that data holders place on our information usually stems from their commercial relationship with us as individuals, and that relationship is something they have no desire to share with competitors. The result is that the most widely shared information is the least alarming and most general.

It’s also important to understand that there are many ways to organize our thinking about data and where it originates. Our taxonomy is based on eight data sources, but other frameworks are just as valid. With that in mind, let’s start with the obvious data sources and work our way toward the less obvious and more abstract.

The first source is, of course, direct user input: data you yourself provide when you fill out a form, upload a picture, write a blog post, or record a social connection. While the act takes only a moment, the data provided can linger for a long time, leading to the second data source: databases.

Databases are not new. If you’ve ever interacted with a government agency or a large institution such as a bank or employer, there are records of you and your interactions with the world (some may even be on *paper*). That leads us to source number three: transactions.

Every time you use a credit card, interact with your bank, or use a reward card or coupon, you leave a digital footprint. While capturing this information has become easier in the digital age, the practice has been going on for a long time, both online and off.

The fourth source is your connection to the network through your mobile phone, computer or the many other connected devices now emerging. Networks need to know a fair amount about what you’re doing and where you are just to function.

This takes us to the edge of the network, where operating systems and sensors reside (sources five and six). While the network needs to know enough to connect you, the operating system needs to know enough to manage the various data flows and interactions between you and the outside world. Some of this is user-driven, but increasingly your devices can measure the world around them on their own, through cameras, microphones and other sensors. Managing all this data to perform useful tasks leads us to applications, the seventh source.

Applications are where the power of data is truly in evidence. Because applications act as the go-between for users and data, they are often seen as the biggest challenge in establishing trust and privacy. More often than not, applications simply draw on data from the other sources to perform a service or complete a request, but applications can, and do, capture and generate data of their own.

The final source in our taxonomy is data generated by algorithms and background processes from the first-order data coming from the other sources. This is where artificial intelligence and machine learning reside: programs that use data to create other data or to make decisions, such as who’s creditworthy or how many pet rocks to stock for the holiday rush.

So with that framework as background, what does each of these data sources know about us, and which ones should we fear?