Photo collage showing the outline of a person in front of a spreadsheet of data.
Illustration: Wirecutter

Big Companies Harvest Our Data. This Is Who They Think I Am.

It’s a surreal experience to see all the data you’ve given a company in one place.
Online and off, nearly every life choice you’ve made, every item you’ve purchased, or every website you’ve visited has been logged, categorized, and then entered in a spreadsheet to be sold off. Once it’s laid out in front of you, it may make you rethink how you share information in the future.

At the start of this year, California enacted its new privacy law, the California Consumer Privacy Act (CCPA), which gives people the right to see what data businesses have on them, to delete that data, and to opt out of further collection. Such data requests give everyone a chance to see a sliver of that data, and although it takes some time to jump through the hoops to make the request, doing so is a useful exercise because it can help you make informed decisions about your data in the future.

I spent around 20 hours requesting and reading my data from more than 30 companies. I asked for every single bit of data from the obvious companies, such as Apple and Amazon, as well as from the less obvious data brokers like Acxiom and Clearview. After I filed these requests, the data arrived up to 45 days later, in large spreadsheets or text files, often filled with codes impossible to decipher. Going through my files, I felt like an amnesiac detective trying to piece together clues from his own past.

Retailers usually collect everything they can

It won’t surprise anyone that Amazon collects and stores everything you do on Amazon. The company has a list of everything I’ve purchased or returned, everything I’ve watched on Amazon Prime Video (including what device and location I watched it from), and everything I’ve searched for on Amazon (including whether I clicked an internal or external link). It tracks every customer service email and chat. It tracks everything you can imagine it tracks, and some things you might not.

A chart showing what requested data was received within 45 days.
Companies technically have just 45 days to respond to a request, but several sent a notice saying that they were taking an additional 45 days.

None of this practice is unexpected, but it’s fascinating to see it all in one place. Going through my purchase (and return) history on a giant spreadsheet has yielded little micro-stories about moments in my life: when I got divorced, when I moved to a new state, when I got real into the banjo, or when ants invaded for months on end.

Brick-and-mortar stores collect data, too. For example, Home Depot had a much larger file on me than I had expected considering I don’t have an account with that retailer. My data was complete with details like inferred income level and net worth (both very wrong), ethnicity and gender (both correct), home ownership (incorrect), and purchase history. Best Buy had a similar set of data, adding my inferred religion and political party to the mix alongside other “triggers,” such as how I have a decreased likelihood of buying a cell phone (true), a satellite dish (also true), or a luxury car (very true).

Looking at my marketing profile was as relieving as it was horrifying. For everything it got wildly wrong (a high likelihood to enjoy soccer), there was something so right, I couldn’t help but wonder how they came up with it (a high likelihood of, ahem, “Heavy fiber focused food buyers”). In either case, with all these metrics in front of me, I could easily imagine a hypothetical future where this data would be abused, either to deny me a service or to take advantage of a negative (or positive, for that matter) life event. But where do retailers get this type of data—about political parties, inferred religion, and the like—in the first place?

That brings us to data brokers.

Data brokers can create an accurate profile of you, even when you try to minimize your online footprint

Data brokers are companies that collect and sell information about consumers to other data brokers or to individual companies. Data brokers collect information from everywhere they can, including public records, commercial sources, and Web browsing. They then collate that data into a profile. For a more detailed look, Vice breaks down how these services work, including how data brokers collect and sell this information. Vice also provides a resource for opting out of them.

I was able to get my data from a couple of data brokers, including Acxiom and Equifax. Acxiom has my addresses going all the way back to my childhood home, my email addresses from high school onward (including my first Prodigy.net address), and my age. Acxiom also includes what it infers about gender, income, and children. This is the type of third-party service companies such as Home Depot and Best Buy use to build profiles of me—if not directly, then from a similar data set. I asked Home Depot’s privacy department if my profile was from Acxiom, and although representatives told me that the retailer uses data services like Acxiom, they wouldn’t directly confirm which ones. Best Buy did not reply to my email.

A chart showing data Best Buy collected about the author.
According to the data Best Buy collected from a data broker, it turns out no brand can motivate my laundry choices (the lower the number, the more likely the company thinks the description fits me).

Acxiom tracks my buying behavior across apparel, electronics, general orders, and more. It knows how many purchases I make a month in each category and how much I tend to spend. It knows how much I travel and how much I donate. It uses this information to create a profile that includes my lifestyle interests, discretionary income, and segmentation inferences (I’m an “upscale earner,” “young digerati,” and “wealthy achiever,” which is news to me). Even if we claim our purchases don’t define us, data brokers use what we buy to do exactly that. Based on Acxiom’s inferences, I’m punching well above my actual wealth class.

Clearview, the startup that helps law enforcement match photos of unknown people to their online images, isn’t exactly a data broker, but it operates in a similar gray area of providing surveillance as a service. In my case, my data was boring, including profile photos from various websites I’ve worked for over the years. But as Vice’s Anna Merlan notes, these photos don’t all come from social media, as you might expect. Some come from more obscure sources, in her case including “an enraged post someone wrote accusing me of yellow journalism, and the website of an extremely marginal conspiracy theorist who has written about me a handful of times.”

Gadgets and services track every swipe

Everything you do on every device you own is tracked. It’s not surprising that Netflix has a list of all 5,751 videos I’ve streamed, Spotify knows every track I’ve played (and favorited), Amazon has a log of exactly where I’ve abandoned reading an ebook, and Apple Music tracks how far into a song I’ve listened.

Perhaps more surprising is how much hardware tracks engagement. Amazon knows not only how often I change a page (likely so that the Kindle can display the time remaining in a chapter or book) but also whether I tap or swipe the screen to do so. I assume this tidbit gets used when Amazon designs the interface for the next generation of the Kindle. Netflix’s report gave me a variety of data beyond the videos I’ve watched, such as what I’ve searched for, how far I’ve gotten through movies, and more. This is the kind of data that Netflix mines to come up with show ideas.

Screenshot of Netflix recommended shows.
Netflix doesn’t fully understand what makes “Twin Peaks” great, instead just recommending more crime dramas.

Recommendation algorithms feel like magic when you first see their output. But when you’re given a chance to look at all the input—every bit of data you supply to companies, all in one place—the magic disappears. For example, I’ve listened to Purple Mountains a lot this year, and therefore Spotify recommends Silver Jews. I’ve searched for Zodiac on Netflix several times, and therefore Netflix assumes I’d like to watch Extremely Wicked, Shockingly Evil and Vile, or if it can’t get the rights, something similar enough. The algorithms make obvious connections, but they can’t figure out the nuance of why you liked something, which makes them no more useful than two seconds of Googling would be. What makes Zodiac one of the best movies of all time isn’t that it’s about a serial killer—it’s everything else.

What about social networks?

When you’re requesting data from Facebook (including any services it owns, such as Instagram), Google, or Twitter, each one typically redirects you to internal tools to download everything you’ve submitted to that service (such as photos on Instagram or search history in Google). Social media companies don’t offer any hidden data or insightful details about your account. If you’ve been with these services for a long time, it might be interesting to see your data all in one place, but I didn’t find it worth the effort. All these companies will show you your ad profiles, though.

How to request your own data

If you want to request your own data (or exercise any of your other rights under the CCPA), prepare to hunker down for a few hours. Some companies asked me to prove my identity with a state-issued ID, while others required photos or some sort of address validation. A few simply trusted me. Stick with what you’re comfortable with, and if a company asks for more identification than you think it needs, contact the company directly. In any case, if you don’t have a password manager, now is the time to set one up. Password managers are helpful for securely keeping track of all these requests and accounts.
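If you file many requests at once, a short script can help you keep track of when each response is due. The following is a minimal Python sketch, not an official tool. It assumes the 45-day response window described in this article, plus the additional 45 days some companies said they were taking. The class name, field names, and helper function are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: the 45-day response window described in this article.
RESPONSE_WINDOW = timedelta(days=45)
# Assumption: the additional 45 days some companies said they were taking.
EXTENSION = timedelta(days=45)


@dataclass
class DataRequest:
    """One data request you filed (hypothetical record layout)."""
    company: str
    filed_on: date
    extended: bool = False   # the company sent a notice that it needs more time
    received: bool = False   # the data has arrived

    def due_date(self) -> date:
        """Return the date by which the company should have responded."""
        window = RESPONSE_WINDOW
        if self.extended:
            window += EXTENSION
        return self.filed_on + window


def overdue(requests: list[DataRequest], today: date) -> list[str]:
    """List companies whose response is past due and still outstanding."""
    return [
        r.company
        for r in requests
        if not r.received and today > r.due_date()
    ]
```

To use it, create one DataRequest per company with the date you filed, set extended to True when a company sends an extension notice, and call overdue with today's date to see which companies you may want to contact again.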

As a California resident, I started with this list on GitHub and Common Sense Media’s list to find companies I’d purchased from. If you’re not a California resident, some of the biggest companies allow you to request your data regardless of where you live.

The CCPA does allow you to opt out of the sale of information (though to do so you need to fill out another form or click the Do Not Sell My Personal Information link on every single website you visit), but there’s no way to stop companies from using the data internally. And shopping in the real world and avoiding rewards cards doesn’t insulate you completely, either: Best Buy, for example, had a full profile of me even though I don’t have a My Best Buy account. Marketing companies find a personality metric embedded in every product you choose, and when you take a glimpse at that data, the underbelly of how these services work gets exposed.

Even though the CCPA adds some protections and transparency to personal data collection, it still needs improvements. Companies implement its requirements unevenly, and getting through requests is like wading through sludge, but the law represents a good first step toward the type of transparency that makes it easier for everyone to understand where their data is, how it’s bought and sold, and ultimately, what it’s worth. Having seen what so many companies collect about me, I recommend going through this data request process if you can, if not for every place you’ve ever shopped, then at least for some of the larger companies. The experience is similar to learning how to repair an item you own, where taking it apart gives you a new appreciation for how it works. Learning how your data gets collected and then moves between companies is the first step to understanding the process as a whole.

We give up more data than we’ll ever know. The CCPA shines a light on some of it, and looking at your results will make you rethink how much you give away freely in the future. It’s still nearly impossible to shut down this type of collection completely, but the more tools people get from laws like the CCPA, the more chances everyone will have to stop some of this.
