Big Data Needs Variables

Many concepts from Mathematics, Computer Science, and programming should be leveraged to improve our social/market interactions. One such concept is the variable.

You may know one from algebra as x. x = 3; x + y = 7; y = 4. But now x = 11, which means x + y = 7; y = -4.

A sensible, successful information society depends on proper segregation and apportionment of data. But you wouldn’t know it based on the governments’ and corporations’ attitudes towards our data.

What do I mean by segregation of data? I mean that certain information is need-to-know. For example, a corporation does not necessarily need to know my physical address, my e-mail address, my phone number, my date-of-birth, et cetera.

Why do they ask for these things, then? Because they don’t have an alternative.

Why should they want an alternative? Look at my examples above and you will notice that except for date-of-birth, they can and will all change from time to time.

What they should want, in lieu of an e-mail address or credit card or other billing data: a variable.

A credit card’s processing information is sort of like a variable. You can pay it off with cash (if the issuing bank is local to you), check, other credit, et cetera. But in its use, it has largely become a value in itself. It expires, and knowledge of it is treated as authorization to charge to it.

A variable names a piece of ephemeral data. You can e-mail me at variable@variable.invalid (which might as well be programmatically generated for our purposes), and that can then point to my current e-mail.

A strong variable system can mean that I control the value on an ongoing basis, while depending parties don’t have to worry about me updating their copy of my data when it changes.
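
As a rough illustration of the idea, here is a minimal Java sketch of such a variable system. The class name VariableRegistry and its set and resolve methods are invented for this example; they are not an existing service or API. The sketch also assumes that only the owner is able to call set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a registry that maps a stable variable name to a
// value its owner can change at any time. Not a real service or API.
class VariableRegistry {
    private final Map<String, String> values = new HashMap<>();

    // The owner binds (or rebinds) the variable to its current value.
    void set(String name, String currentValue) {
        values.put(name, currentValue);
    }

    // A depending party resolves the name each time it needs the value,
    // instead of storing its own copy.
    Optional<String> resolve(String name) {
        return Optional.ofNullable(values.get(name));
    }
}
```

If I set variable@variable.invalid to one address and later set it to another, every party that calls resolve gets the newer address without having to update anything on their side.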

Have you ever changed e-mail addresses and had to go to umpteen different online accounts to change it? Maybe not, if you’re young enough to have only ever had one account. But if you’re old enough to have seen your primary e-mail change from, say, Hotmail to Gmail, and maybe to something else in the future (e.g., an employer-based mail or some secure alternative, at least for some uses), you know that pain.

That needless pain harms the corporation just as much. They see some value in knowing how to contact you, but not enough to recognize the real and profound risk they are placing on themselves by not looking toward a variable-based solution.

Okay, but I mentioned something about apportionment of data. What’s that? The data should have a home, and maybe a vacation place or safe house. But it should not live everywhere. A thousand copies of data that do not follow ACID (Atomicity, Consistency, Isolation, Durability), in this case the C as in Consistency, are begging for pain.

You want data to be properly allocated across the world for security and privacy, too. If you let the data seep all over, that’s a lot of targets to get your information from.

Other benefits include being able to seamlessly transition between services. As things stand, the next Gmail will have a harder time making a dent in the market, because everyone has to change their services to point to their new shinymail address.

The benefits of variables currently get ignored by big businesses, because they think the value of their databases outweighs the costs. But my guess is that if you look at aging databases, like those of MySpace or Hotmail, they lost more in opportunities than they ever made by monetizing their databases.


What’s the Brain?

One thing that we all have in common is our brains.  We all have these immensely powerful computers in our heads.  For perspective, the total of humanity’s computing power (including things like cell phones) is probably a little more than one brain.  Understanding the brain is important to operating it, so today’s post focuses on the brain.

Let’s start with an exercise.  The word brain.  Your eye falls across the word, sending the pattern of shapes into the brain.  Inside your brain, neurons search for a pattern they recognize.  They pass that information to other neurons which are looking for other patterns.  And so on.

What results is a kind of thunderstorm of activity, with many neurons firing in several discrete parts of the brain.  Some of the patterns are discarded, others are highlighted.  Finally your conscious mind is awash in a set of believed-good patterns:

  1. The recognition of the word brain in the environment.
  2. The associations, including possibly a mental picture of the human brain’s physical existence as a folded mass of neurons, also including similar words like brian, rain, etc.
  3. The relative physical location of your own physical brain behind your eyes and between your ears.

Your brain is pulling up patterns constantly.  It is analyzing your visual field for movement, color.  It’s listening for patterns and breaks in the auditory environment.  It’s pulsing out to your body and receiving feedback like an itch on your nose or weight distribution on your feet.  It’s balancing you upright (when you’re standing).

The brain is a pattern recognition wizard.  And it improves with practice.  That’s how chess masters earn that name: through repetition of action, their brains rewire themselves to match chess patterns quickly and see the game as a sequence or cascade of questions (i.e., a decision tree) rather than as a single move at a time.

Your brain may not be a chess master’s, but you have pretty good reading skills.  Your brain is matching patterns as you read this, deriving meaning from the shapes and sequences.

Your brain also has the concept of reward.  The neurons that match patterns well, or that quash invalid matches, cause different chemicals to be released in the brain.  Indeed, the brain tries to make success occur, tries to be rewarded.

But the brain is mostly operating below the conscious level.  And it’s doing a ton of work to keep recompiling itself into a more useful and more functional device in reaction to your environment.  One of the primary tools the brain uses to that end is sleep.  A good night’s rest allows your brain to improve its efficiency and keep the patterns coming smoothly.

One thing you may notice is, upon learning a new videogame for example, you may be better the second day, after you have slept.  Your brain will optimize its pattern systems to the particular controls, physics, and visual and auditory stimuli of that game.  Or if you’re learning to cook, your brain will do the same for chopping onions or estimating cook times.

Neuroscience offers an explanation for why Buddhists meditate.  The meditation process is initially about feeding the brain white noise: a completely placid input, which results in the pattern matchers growing very calm and quiet.  That’s needed because, over time in normal use, the brain’s pattern matchers throw all sorts of patterns up the chain that should be suppressed.  They’ve become overactive, due to the pace of life and the abundance of stimuli.

Meditation allows the matchers to sort of reset, but it also lets the higher-level matchers perform functions similar to those they perform during sleep.  The brain rewires itself.

At some level of pattern matching, the brain uses higher level beliefs to suppress or highlight patterns.  That’s why rival football fans, watching the same game, actually see different outcomes throughout.  Their brains are suppressing what the other side sees, due to higher level beliefs about their team and the other team.  It’s the equivalent of a color filter, which highlights clashing colors and masks similar colors.

The brain would normally satiate itself on certain inputs.  For example, if you play with a dog with the same toy in the same way over and over, it will get tired of it.  In practice, though, it usually won’t, because there’s enough natural variation (and social feedback) to keep it interested.  That’s why football fans don’t stop watching: there’s enough variation (and they have their brains tuned to find certain patterns interesting).

Music.  Your brain likes music.  It likes new music for the novelty of the patterns.  It likes old music for both the dependable patterns it evokes and the various memories/associations that music has.  It likes its own music, too.  Studies have shown the brain produces an internal music during sleep, likely to help with the bookkeeping that goes on.  Studies have also shown that insomnia can be partly alleviated by recording the brain’s music and playing it back when someone is trying to fall asleep.

The brain is damned intriguing.  I think I’ll stop here for now as my brain seems to have exhausted its current thoughts about its kind of system.  There’s a lot more to say, though.


Bin Packing

I wrote a Java program to begin analyzing a certain aspect of bin packing my professor mentioned last semester. The version of bin packing investigated is as follows:

Bins hold up to 1.0 of ‘straws’

The maximum size of straws is constrained as low as 0.1 (in my case I used steps of 0.05, so 0.05) and as high as 1.0

Each straw is passed one at a time and placed into the first bin it fits in (or in the case it fits in no existing bin, a new bin is created)

Waste is defined as the amount of empty space in a bin (and therefore the total amount of wasted space across bins for a given run)
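
The placement rule and the waste definition above can be sketched in Java roughly as follows. This is an illustration only, not the program described in this post: the names FirstFit, pack, and totalWaste are made up for the example, and the small 1e-9 tolerance is an assumption added so that floating-point rounding does not reject a straw that exactly fills a bin.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the first-fit rule described above (hypothetical names).
class FirstFit {
    static final double CAPACITY = 1.0;

    // Places each straw, in order, into the first bin with room for it and
    // returns the fill level of every bin.
    static List<Double> pack(double[] straws) {
        List<Double> bins = new ArrayList<>();
        for (double straw : straws) {
            boolean placed = false;
            for (int i = 0; i < bins.size(); i++) {
                // Assumed tolerance so rounding error does not block an exact fit.
                if (bins.get(i) + straw <= CAPACITY + 1e-9) {
                    bins.set(i, bins.get(i) + straw);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                bins.add(straw); // no existing bin fits, so open a new one
            }
        }
        return bins;
    }

    // Waste is the total empty space across all bins.
    static double totalWaste(List<Double> bins) {
        double waste = 0.0;
        for (double fill : bins) {
            waste += CAPACITY - fill;
        }
        return waste;
    }
}
```

For the straws 0.5, 0.7, 0.5, 0.2 (in that order), this opens two bins, filled to 1.0 and 0.9, for a total waste of about 0.1.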

Now, the weirdness was a result he encountered when he examined bin packing constrained at 0.1, 0.2, …, 0.9, 1.0 straw sizes. According to him there was a strange lump at 0.8 where there was more waste than at higher and lower values. That is, a maximum formed at 0.8.

My data (below) from a single trial reflects similar results. Total waste peaks at 0.8 with a value of 3,263.027; average waste per bin peaks at 0.7 with a value of 0.218. Total bins and average straws per bin do not peak at an unusual place: total bins peaks at the maximum straw size, and average straws per bin at the minimum.

These results are typical of my trials (roughly a few dozen). I’ve not yet added functionality to gather and process the data over many trials. The data below represents 20 runs, one per maximum straw size, each on 30,000 straws randomly chosen to be greater than zero and less than the straw constraint for that run.

I’ll add a follow-up story to this when I have added the ability to automatically gather the data for runs and created some graphs. I’ll also release my source at that time. And, if I’m feeling lucky I may try to give some simple analysis of why I think this occurs.

CSV looks to be the easiest format for the program’s file output, which I’ll then import into OpenOffice.org Calc to create graphs.
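
One way the CSV rows might be produced is sketched below. The class WasteCsv, its column names, and the row method are hypothetical and chosen only for this example. The one design point worth noting is that the values are formatted without grouping commas, since a figure printed as 1,206.66 would otherwise be read as two CSV fields.

```java
import java.util.Locale;

// Hypothetical helper that turns one run's summary into a CSV row.
class WasteCsv {
    static final String HEADER =
        "max_straw_size,total_waste,avg_waste_per_bin,total_bins,avg_straws_per_bin";

    static String row(double maxStrawSize, double totalWaste, int totalBins, int totalStraws) {
        double avgWaste = totalWaste / totalBins;
        double avgStraws = (double) totalStraws / totalBins;
        // Locale.ROOT keeps '.' as the decimal separator; no ',' flag is used,
        // so numbers never contain grouping commas.
        return String.format(Locale.ROOT, "%.2f,%.3f,%.3f,%d,%.3f",
            maxStrawSize, totalWaste, avgWaste, totalBins, avgStraws);
    }
}
```

Calling row(0.45, 1206.66, 7968, 30000) with the figures from the 0.45 run below yields a single five-field line that a spreadsheet can import directly.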

=== Maximum Straw Size: 0.05 ===
Total Waste: 14.195
Average Waste per bin: 0.019
Total Bins: 766
Average Straws per bin: 39.164

=== Maximum Straw Size: 0.1 ===
Total Waste: 51.983
Average Waste per bin: 0.033
Total Bins: 1552
Average Straws per bin: 19.33

=== Maximum Straw Size: 0.15 ===
Total Waste: 119.784
Average Waste per bin: 0.05
Total Bins: 2373
Average Straws per bin: 12.642

=== Maximum Straw Size: 0.2 ===
Total Waste: 216.584
Average Waste per bin: 0.067
Total Bins: 3216
Average Straws per bin: 9.328

=== Maximum Straw Size: 0.25 ===
Total Waste: 338.669
Average Waste per bin: 0.083
Total Bins: 4080
Average Straws per bin: 7.353

=== Maximum Straw Size: 0.3 ===
Total Waste: 489.451
Average Waste per bin: 0.099
Total Bins: 4961
Average Straws per bin: 6.047

=== Maximum Straw Size: 0.35 ===
Total Waste: 691.107
Average Waste per bin: 0.117
Total Bins: 5928
Average Straws per bin: 5.061

=== Maximum Straw Size: 0.4 ===
Total Waste: 941.426
Average Waste per bin: 0.136
Total Bins: 6934
Average Straws per bin: 4.327

=== Maximum Straw Size: 0.45 ===
Total Waste: 1,206.66
Average Waste per bin: 0.151
Total Bins: 7968
Average Straws per bin: 3.765

=== Maximum Straw Size: 0.5 ===
Total Waste: 1,470.779
Average Waste per bin: 0.163
Total Bins: 9013
Average Straws per bin: 3.329

=== Maximum Straw Size: 0.55 ===
Total Waste: 1,808.49
Average Waste per bin: 0.179
Total Bins: 10103
Average Straws per bin: 2.969

=== Maximum Straw Size: 0.6 ===
Total Waste: 2,171.645
Average Waste per bin: 0.195
Total Bins: 11135
Average Straws per bin: 2.694

=== Maximum Straw Size: 0.65 ===
Total Waste: 2,580.432
Average Waste per bin: 0.209
Total Bins: 12339
Average Straws per bin: 2.431

=== Maximum Straw Size: 0.7 ===
Total Waste: 2,926.006
Average Waste per bin: 0.218
Total Bins: 13405
Average Straws per bin: 2.238

=== Maximum Straw Size: 0.75 ===
Total Waste: 3,094.46
Average Waste per bin: 0.217
Total Bins: 14245
Average Straws per bin: 2.106

=== Maximum Straw Size: 0.8 ===
Total Waste: 3,263.027
Average Waste per bin: 0.213
Total Bins: 15297
Average Straws per bin: 1.961

=== Maximum Straw Size: 0.85 ===
Total Waste: 3,257.898
Average Waste per bin: 0.205
Total Bins: 15870
Average Straws per bin: 1.89

=== Maximum Straw Size: 0.9 ===
Total Waste: 3,244.492
Average Waste per bin: 0.195
Total Bins: 16675
Average Straws per bin: 1.799

=== Maximum Straw Size: 0.95 ===
Total Waste: 3,231.088
Average Waste per bin: 0.185
Total Bins: 17501
Average Straws per bin: 1.714

=== Maximum Straw Size: 1.0 ===
Total Waste: 3,037.173
Average Waste per bin: 0.168
Total Bins: 18119
Average Straws per bin: 1.656