White Blood Cells for Your Online Data

Published in

Product Management for the People

8 min read · Mar 27, 2024

White blood cells cleaning between red blood cells

White blood cells are amazing, aren’t they? They wander around your blood stream, in between your red blood cells, checking for harmful organisms to absorb and remove. They work tirelessly to keep you free from harmful agents and organisms.

Our bodies are full of these little weirdos, doing their own thing, helping to keep us safe by cleaning up the stuff that shouldn’t be there. We’re blissfully unaware of their actions, but they play a very important role in making sure we continue to live a healthy life. Hold that thought for a moment.

The right to be forgotten

In law, at least in the UK and the EU, there is a right to be forgotten (the “right to erasure” in Article 17 of the GDPR). You can ask any company that holds data about you to disclose what it holds and to delete those records, if you so wish.

In practice, this often reduces to a manual process, in which a human being searches for your data records and then asks somebody with database access permissions to delete them. There frequently isn’t a user interface that lets support staff remove all trace of you from a company’s servers without further technical intervention, so the request has to be performed semi-manually, including in backups of the data.

Consequently, most people don’t bother to request that their data be deleted, and those who do have no guarantee that the data really has been expunged. It all relies on trust and the honour system.

Who has your data?

In many cases, the average internet user doesn’t even know where all the data about them is kept. You may have no relationship whatsoever with data brokers and their clients, who may hold extensive data about you, and you have no way of ensuring it is deleted and forgotten.

What you need, as an ordinary human being, is the equivalent of a white blood cell colony, which automatically and constantly scours the internet, following threads of API connectivity from database to database, to automatically, autonomously delete data about you, from wherever it happens to be kept, while permitting some records to remain, if that is your wish.

That couldn’t just happen by magic. It would require every holder of a database to provide a public API that permitted your data to be discovered and then deleted, without any human having to intervene at a granular level. This API would have to prevent malicious deletion by third parties, as well as providing access to the data clean-up spiders.
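One way to picture how such an API might refuse malicious deletion is to require every request to carry a tag that only the data owner can produce. The sketch below assumes a secret shared between owner and data holder at consent time and uses an HMAC over the request. The names `sign_request`, `is_authorised` and `owner_secret` are illustrative assumptions, not part of any existing scheme, and a real design would more likely use public-key signatures and protection against replayed requests.

```python
import hashlib
import hmac


def sign_request(secret: bytes, action: str, subject_id: str) -> str:
    """Return a hex HMAC-SHA256 tag over the action and the subject identifier."""
    message = f"{action}:{subject_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authorised(secret: bytes, action: str, subject_id: str, tag: str) -> bool:
    """Recompute the tag and compare in constant time; forged tags are rejected."""
    expected = sign_request(secret, action, subject_id)
    return hmac.compare_digest(expected, tag)


# Hypothetical secret, assumed to have been agreed when consent was given.
owner_secret = b"secret-shared-at-grant-time"

# The owner's crawler signs a deletion request for its own records.
delete_tag = sign_request(owner_secret, "delete", "user-123")

# A third party who does not know the secret can only guess.
forged_tag = sign_request(b"attacker-guess", "delete", "user-123")

print(is_authorised(owner_secret, "delete", "user-123", delete_tag))
print(is_authorised(owner_secret, "delete", "user-123", forged_tag))
```

Because the action is part of the signed message, a tag issued for discovery cannot be reused to authorise a deletion.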

Cleaning up after yourself

How might this work? When a data collector initially asks for your data, you might grant them a cryptographic token to hold it, which is recorded in a personal immutable ledger (hosted by the participants in whatever blockchain technology you choose for recording your ledger and granting access to your personal data). A maximum holding time could be agreed at the outset, as part of the data collection grant.
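As a rough illustration of the grant step, the sketch below issues a random token with an agreed expiry and records it in an append-only log in which each entry carries the hash of the previous one. This hash chain is only a stand-in for the blockchain-hosted ledger described above: it makes tampering detectable rather than impossible. The `Ledger` class, the `grant` function and the field names are all assumptions made for the example.

```python
import hashlib
import json
import secrets
from dataclasses import dataclass, field

SECONDS_PER_DAY = 86400


def _digest(event: dict, prev_hash: str) -> str:
    """Hash an event together with the hash of the entry before it."""
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Ledger:
    """Append-only log of grants; each entry is chained to its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {"event": event, "prev_hash": prev_hash,
                 "hash": _digest(event, prev_hash)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if no entry has been altered or reordered."""
        prev_hash = "0" * 64
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(entry["event"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


def grant(ledger: Ledger, collector: str, granted_at: float,
          max_holding_days: int) -> str:
    """Issue a token for one collector and record the grant with its expiry."""
    token = secrets.token_hex(16)
    ledger.append({
        "type": "grant",
        "token": token,
        "collector": collector,
        "granted_at": granted_at,
        "expires_at": granted_at + max_holding_days * SECONDS_PER_DAY,
    })
    return token


my_ledger = Ledger()
shop_token = grant(my_ledger, "example-shop", granted_at=1_700_000_000.0,
                   max_holding_days=30)
print(my_ledger.verify())
```

The same `grant` call could record a sub-processing relationship by naming the downstream processor as the collector, which keeps every hand-off of the data visible in one place.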

Whenever a data collector grants a downstream data processor access to your data, a similar immutable cryptographic token is recorded to sanctify that sub-processing relationship.

Then, at some later data discovery time, whether prompted because the initial token granted for data collection has lapsed (it has timed out, or the relationship with the data collection entity has dissolved) or chosen arbitrarily, a query to the API provided for data discovery will find a match to the token, and everything the entity holds about you can be divulged, by machine. The payload, of course, could be encrypted for transmission back to the original data owner.

If the token representing the grant has lapsed, the data collector is, by law, required to delete the data held and the access token initially granted. The token deletion, again, would be recorded in the data owner’s personal immutable ledger as acknowledgement of the data discovery and/or deletion request.
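The discovery and lapse rules could look something like the following on the data collector’s side. This is a minimal in-memory sketch under stated assumptions: `Collector`, `HeldRecord` and the `discover` method are hypothetical names, time is passed in explicitly so the behaviour is easy to follow, and encryption of the payload and the ledger entry for the deletion are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HeldRecord:
    token: str         # grant token issued by the data owner
    expires_at: float  # Unix time after which the grant has lapsed
    data: dict         # whatever the collector holds about the owner


class Collector:
    """Toy data holder that answers discovery queries by token."""

    def __init__(self) -> None:
        self.records = {}  # token -> HeldRecord

    def store(self, record: HeldRecord) -> None:
        self.records[record.token] = record

    def discover(self, token: str, now: float) -> Tuple[Optional[dict], bool]:
        """Return (data, deleted). A lapsed grant is divulged once, then erased."""
        record = self.records.get(token)
        if record is None:
            return None, False
        if now >= record.expires_at:
            # Grant has lapsed: hand the data back and delete data and token.
            del self.records[token]
            return record.data, True
        return record.data, False


collector = Collector()
collector.store(HeldRecord("tok-1", expires_at=1000.0, data={"email": "a@example.com"}))

print(collector.discover("tok-1", now=500.0))   # grant still valid: data kept
print(collector.discover("tok-1", now=1500.0))  # grant lapsed: data returned and erased
print(collector.discover("tok-1", now=1600.0))  # nothing left to find
```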

If no data record key is found, but a query of personally identifiable fields reveals data exists, then the data collector is obliged, by law, to provide the user with data held matching their personally identifiable data, as an encrypted payload, and silently delete those records, thereby formally ending any data collection (whether agreed or not), by deleting the access token presented.

However, if a key is found matching the one presented to the API, the data collector is obliged, again by law, to divulge not only the data they hold, but also the keys granted to any sub-processing data processors and their associated API universal resource locators. That way, you can crawl down to those other APIs and make similar discovery and, if necessary, deletion requests.
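The crawl itself can be sketched as a recursive walk. Each holder, when queried, returns the data it holds plus the URLs and tokens of the sub-processors it passed the data to, and the crawler follows those links before deleting. Everything here is an in-memory simulation: the `Holder` class, the `registry` dictionary standing in for network calls and the example URLs are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Holder:
    name: str
    records: Dict[str, dict] = field(default_factory=dict)  # token -> data held
    # token -> list of (sub-processor URL, token granted to that sub-processor)
    sub_grants: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def discover(self, token: str):
        """Divulge the data held and the downstream grants for this token."""
        return self.records.get(token), self.sub_grants.get(token, [])

    def delete(self, token: str) -> None:
        self.records.pop(token, None)
        self.sub_grants.pop(token, None)


def crawl_and_delete(registry: Dict[str, Holder], url: str, token: str,
                     visited: Optional[Set[Tuple[str, str]]] = None) -> List[str]:
    """Follow sub-processor links depth-first, deleting wherever data is found."""
    visited = set() if visited is None else visited
    if (url, token) in visited or url not in registry:
        return []
    visited.add((url, token))  # guards against processors that reference each other
    holder = registry[url]
    data, subs = holder.discover(token)
    cleaned: List[str] = []
    for sub_url, sub_token in subs:
        cleaned += crawl_and_delete(registry, sub_url, sub_token, visited)
    if data is not None:
        holder.delete(token)
        cleaned.append(holder.name)
    return cleaned


registry = {
    "https://collector.example/api": Holder(
        "collector", {"t1": {"location": "home"}},
        {"t1": [("https://broker.example/api", "t2")]}),
    "https://broker.example/api": Holder(
        "broker", {"t2": {"location": "home"}},
        {"t2": [("https://insurer.example/api", "t3")]}),
    "https://insurer.example/api": Holder("insurer", {"t3": {"risk": "high"}}),
}

print(crawl_and_delete(registry, "https://collector.example/api", "t1"))
# prints ['insurer', 'broker', 'collector']
```

Deleting downstream copies before the upstream one means the chain of sub-processing keys is still available while the crawl is in progress.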

If a data collector is found to be holding data without agreement, the regulator could compel the discovery query and its result to be recorded in an immutable ledger held by the regulatory authority, for later examination of instances of unwarranted data collection. A data collector would not want to be recorded holding unauthorised data on a regular basis, belonging to multiple non-consenting data owners. They would also not want to be holding data beyond the agreed holding period made at data collection time.

The point of all this seeming complexity is that there is machinery to automatically log data access grants and any instructions to discover and delete data, whether or not a data agreement token is in place.

Compliance

In the event of data deletion being necessary, there are automated means to delete the data and any copies rendered to downstream data processors, under data sub-processing agreements granted by the data collector on behalf of the data owner.

In the event that data is held without agreement, the regulator can be automatically informed and the case reviewed, to sort repeat offenders from unintended mistakes.

As a data owner, it is, today, virtually impossible to query data aggregators and data brokers to discover what they know about you and from which sources they obtained that data. A system like the one proposed here would at least give you a fighting chance of finding those downstream data processors and having your records forgotten by them.

The extent of data abuse today

Your phone and your modern car send your location data to data aggregators, without your explicit consent. Even if they don’t right now, they could and you need protection against that.

They also (potentially) send accelerometer data to your insurer, via a data broker, so that they can adjust your insurance premiums upward, if your data indicates higher risk. Your own devices or car can spy on you to your financial detriment. You don’t even know it’s happening. Take your car to a well-supervised track day and you may find your next car insurance bill is sky high. You may never know why and have little to no recourse to challenge it. That peak track day behaviour may be used to characterise your driving habits in general.

By requiring the primary data collector to grant each data sub-processor an access token, a phone or car user could discover who is getting hold of their behavioural data and delete it. This, of course, would potentially cut off a revenue stream to your phone manufacturer, app vendor or car maker, but so it should!

Why would they?

What would compel data holders to spend money to develop and maintain an API that permits a web crawler to securely delete specific records they hold about you?

Law.

This is going to take agreement from a well-funded data protection authority, some heavyweight regulatory infrastructure and the compliance of data collectors and processors, to work.

If it were compulsory, it could be easily policed by the regulating authority. Launching “secret shopper” crawlers to try to access these APIs, testing both legitimate and malicious data access, would reveal non-compliant data holders and those data collectors with insufficient protection around their data (i.e. your data privacy).
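A “secret shopper” check might be as simple as probing each API twice, once with a credential that should work and once with one that should not, and noting any surprise. In this sketch a data holder’s API is modelled as a plain function from token to data (or `None` when the request is refused). The function names and tokens are hypothetical.

```python
from typing import Callable, List, Optional

DiscoveryApi = Callable[[str], Optional[dict]]


def audit_holder(api: DiscoveryApi, valid_token: str, forged_token: str) -> List[str]:
    """Probe a discovery API with a valid and a forged token; list any failures."""
    findings = []
    if api(valid_token) is None:
        findings.append("refused a legitimate discovery request")
    if api(forged_token) is not None:
        findings.append("answered a forged discovery request")
    return findings


def compliant_api(token: str) -> Optional[dict]:
    # Only the genuine token unlocks the record.
    return {"email": "a@example.com"} if token == "valid-token" else None


def leaky_api(token: str) -> Optional[dict]:
    # Hands over the record no matter who asks.
    return {"email": "a@example.com"}


def stonewalling_api(token: str) -> Optional[dict]:
    # Never answers, even when the request is legitimate.
    return None


for name, api in [("compliant", compliant_api), ("leaky", leaky_api),
                  ("stonewalling", stonewalling_api)]:
    print(name, audit_holder(api, "valid-token", "forged-token"))
```

An empty list of findings means the holder both served the legitimate request and refused the forged one.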

The crawlers could be hosted by the regulatory authority. Much like search engines (such as Google) have vast data centres, hosting banks of servers, for the sole purpose of accessing the world’s web pages and content, to index it all, the white data blood cells which crawl for your personal data could be hosted by your government. A functioning democracy is a necessary pre-condition, of course.

Those crawlers could have the functionality of discovering your data in database APIs that are encountered, recognising your personal cryptographic key to access that data, and optionally deleting any records found, which the owner wants forgotten.

If more people were interested in being forgotten after some period of time, by data holders that are holding data about them, especially inadequately anonymised data about them, then data holders would realise that providing the API for a web crawler under the control of the user would be far more cost efficient than processing the request manually, or worse, not complying with the data deletion request at all.

Caveats

There are significant technical and legal details to be worked out, by people much smarter than me, to ensure this scheme isn’t open to horrific, unintended abuse. However, if you could model a colony of data white blood cells that clean your data up after you, your personal privacy and security could potentially be greatly enhanced.

The onus would be on the data collectors and processors to prove to you why they should legitimately hold data about you over a longer period of time (or at all), not for individual users to have to manually police their data records, across vast numbers of known and unknown parties.

No automated system is perfect and data accessed by a trusted employee of a data collecting entity could always be copied down by writing it on a post-it note, without the data owner ever finding out, but at least it would be no worse than the situation that exists now (where leakage to untrustworthy trusted employees is a risk). It would, I believe, dissuade bulk data broadcasts to unauthorised parties, without a revocation mechanism for the data owners.

Professional security consultants will be able to spot all the egregious flaws in this proposed mechanism and maybe blow it out of the water as absurdly infeasible. I am not one, so I cannot. Real white blood cells in your blood stream can also become dysfunctional, with serious consequences. There is no perfect solution known. Should perfect be the enemy of good?

The purpose of this article was to start a conversation about giving people more control over who collects, processes and holds their data. It is theirs, after all. Today, they have virtually no protections of this sort.

Something to think about.

About the author

Michael Topic is a freelance Product Manager and musician, with over thirty years’ experience delivering products that didn’t exist before. He welcomes contract enquiries to define new, competitive products, design them and deliver them. His speciality is software-based products.

Disclaimer

The hypothetical design ideas discussed are intended to demonstrate possibilities in product design. They are not intended to imply anything whatsoever about past or present employers, nor to infringe on any intellectual property.

About the “Possible Future Designs” Series

This occasional series of articles examines the many ways in which product management discipline can be applied to a variety of markets, to propose and examine innovative, new product opportunities.

Organisations wishing to pursue any of the ideas discussed are encouraged to contact the author to discuss potential research and development collaboration.

Written by Michael Topic