Big Data Needs Variables

Many concepts from Mathematics, Computer Science, and programming should be leveraged to improve our social/market interactions. One such concept is the variable.

You may know one from algebra as x. x = 3; x + y = 7; y = 4. But now x = 11, which means x + y = 7; y = -4.

A sensible, successful information society depends on proper segregation and apportionment of data. But you wouldn’t know it based on the governments’ and corporations’ attitudes towards our data.

What do I mean by segregation of data? I mean that certain information is need-to-know. For example, a corporation does not necessarily need to know my physical address, my e-mail address, my phone number, my date-of-birth, et cetera.

Why do they ask for these things, then? Because they don’t have an alternative choice.

Why should they want an alternative? Look at my examples above and you will notice that except for date-of-birth, they can and will all change from time to time.

What they should want, in lieu of an e-mail address or credit card or other billing data: a variable.

A credit card’s processing information is sort of like a variable. You can pay it off with cash (if the issuing bank is local to you), check, other credit, et cetera. But in its use, it has become largely become a value in itself. It expires, and knowledge of it is treated as authorization to charge to it.

A variable names a piece of ephemeral data. You can e-mail me at variable@variable.invalid (which might as well be programmatically generated for our purposes), and that can then point to my current e-mail.

A strong variable system can mean that I control the value on an ongoing basis, while depending parties don’t have to worry about me updating their copy of my data when it changes.

Have you ever changed e-mail addresses and had to go to umpteen different online accounts to change it? Maybe not if you’re young enough to always have only had one account, but if you’re old enough to have seen your primary e-mail change from, say, hotmail, to gmail, and maybe something else in the future (eg, an employer-based mail or some secure alternative, at least for some uses), you know that pain.

That needless pain, which harms the corporation just as much. Because they see some value in knowing how to contact you, but not enough to recognize the real and profound risk they are placing on themselves by not looking toward a variable-based solution.

Okay, but I mentioned something about apportionment of data. What’s that? The data should have a home, and maybe a vacation place or safe house. But it should not live everywhere. A thousand copies of data that do not follow ACID (Atomicity, Consistency, Isolation, Durability) — in this case the C-as-in-Consistency, is begging for pain.

You want data to be properly allocated across the world for security and privacy, too. If you let the data seep all over, that’s a lot of targets to get your information from.

Other benefits include being able to seamlessly transition between services. The next gmail will have a harder time making a dent in the market, when everyone has to change their services to point to their new shinymail address.

The benefits of variables currently get ignored by big businesses, because they think that their database values outweigh the costs. But my guess is if you look at aging databases, like MySpace or Hotmail, they lost opportunities more than they ever monetized their databases.