
Pruitt’s Data Rule and Deep Learning

(Soon-to-be former?) head of the EPA Pruitt has proposed a public data rule (RIN 2080-AA14). This could be a good rule, but it really depends on the implementation. This post looks briefly at what such a rule would mean for science that relies on deep learning.

In short, deep learning takes normalized, record-based data and creates a mapping from input data to some per-record output determination.

Think of a phone book (the data) with individual listings (the records) and then some determination you want to make about those records. It could be something very simple (last name has n vowels) or something complicated.
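To make the phone book picture concrete, here is a minimal Python sketch. The records, the names `count_vowels` and `determine`, and the choice of n = 2 are all made up for illustration. The rule is also written by hand, where deep learning would instead derive the mapping from example records, so treat this only as a picture of the input-to-output shape.

```python
# Hypothetical phone-book records; every name and number here is invented.
phone_book = [
    {"last_name": "Pruitt", "first_name": "Ann", "number": "555-0100"},
    {"last_name": "Okafor", "first_name": "Ben", "number": "555-0101"},
    {"last_name": "Lee", "first_name": "Cam", "number": "555-0102"},
]


def count_vowels(text: str) -> int:
    """Count the letters a, e, i, o, u in text, ignoring case."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def determine(record: dict, n: int) -> bool:
    """Per-record determination: does the last name have exactly n vowels?"""
    return count_vowels(record["last_name"]) == n


# One output per record. This list is the derived ("secondary") data.
outputs = [determine(record, 2) for record in phone_book]
```

With a hand-written rule like this, publishing the records and the function is enough for anyone to reproduce `outputs`. The difficulty discussed in this post arises when `determine` is replaced by a trained network.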

The data itself may be public, but depending on the implementation of the proposed rule, making the derived outputs (the secondary data) public in any meaningful sense may be very difficult.

There are several challenges. One is simply the number of records that may be used. Another is that the trained network may be proprietary or non-portable or even dependent on custom hardware. There may also be situations where several neural networks act in tandem, each derived from a bulk of training data (some of which may itself be output from other networks), which would further complicate the data access requirements.

But there is also the question of whether the output would be public, even if published. Normally data is public when the individual measurements are available and the methodology behind those measurements is known. But there is a reasonable and inevitable blindness to the internal workings of deep learning. Trying to explain the exact function the machine has derived is increasingly difficult as complexity increases, and even if all the inputs and outputs are public, the transition function may be obscure.


Which isn’t to say that data, methods, and findings should not be replicated, peer reviewed, and subject to introspection. The EPA should, for example, draw a stricter line against carbon fuel companies and other chemical companies, requiring that more of their filings be public.

In the case of deep learning, not for the EPA’s sake, but for the sake of science itself, better rules for how to replicate and make available data and findings are needed.

Others have already pointed out the difficulty of studies predicated on sensitive personal data like medical records. But there is a general need to solve that problem as well, as the inability to examine such information may block important findings from surfacing.

This is similar to the fight over minors buying e-cigarettes online: opponents of e-cigarettes act as though there is a particular, nefarious plot by vendors, but we do not have anything close to a universal age verification system. Better to develop one for all the tasks that require it.

And so it is with the EPA rule: Congress should draft a law that allows all scientific data used by the government to be as public as possible.

The Food and Beverage Network Issues

Arguing that the dietary epidemic hinges on the lack of role models for good diets and lifestyles.

There’s a lot of talk about food lately. The Corporation of Coca-Cola has admitted to supporting science that dictates we should all exercise a lot more.

Is it too much food? Not enough exercise? What’s a human to do?!

The sad thing is, it’s not really either one. It is, indirectly. Too much food and the wrong foods certainly deserve the lion’s share of blame. People should be more active, too. Exercise at any weight is important, and I’m among those who don’t get enough exercise.

But the real problem isn’t too much food, bad food, or too little exercise. The real problem is the lack of acculturation to a healthy lifestyle. People take behavioral cues from those around them and from the media they see. In the case of media, watching a food eating contest doesn’t mean you’ll try to swallow a Buick’s worth of food. But it does mean that continuously seeing commercials for foods cues a mental response that makes you say, “Yes, that grease sandwich does look devourable, even though I ate recently.”

Moreover, nobody knows what a healthy lifestyle and diet look like anymore. Is it eating the culinary equivalent of pocket lint? I’m pretty suspicious that it just might be.

Our primary cues for what to eat and how much of it to eat are from those around us. If you were raised by wolves (not saying you were, mind you, though you do have big eyes and big teeth, etc.) you would have learned a wolf diet. But if you went and lived with wolves for a few months (assuming they didn’t eat you), you would also likely adopt at least some of their dietary habits (quit gnawing that bone!).

Point is, if you go live with vegan granolites, you’ll tend to eat like them. If you join up with the Barbecuists, you’ll eat like them. But if you want to eat healthfully, whom do you join up with?

Think beer. If you know people who mostly drink Budweiser, you’re more likely to, too. If your friends and coworkers like more expensive beers, you probably do, too.

Scanning sites like Instagram and Pinterest for pictures of food won’t do much good. Even visiting a site like ChooseMyPlate.gov probably won’t help. Sites like that, meaning well and based on science, still fail to distill their wisdom into actionable behavioral changes.

Take their PDF, “Sample 2-Week Menus” (ChooseMyPlate.gov), which gives recipes and nutritional information and is based on recipes developed for low-income individuals. Given the know-how and the desire to make home-cooked food, resources like that and many others are useful. But it seems likely that if home cooking were particularly common, we wouldn’t have the food-related issues in the first place. If home cooking is common, then it’s mostly a matter of replacing bad recipes with good ones.

But it seems more likely that dietary habits shy away from home cooking in favor of processed foods and heat-and-serve options. And that’s when people eat at home at all, rather than eating fast food or composing meals entirely of snack food and junk food.

In any case, it seems reasonable to assume that the key feature of better diets is more exposure to better diets around you.

FDA’s Proposed Nutritional Labeling

A look at the FDA’s proposed changes to nutrition labels.

The FDA has proposed new regulations for food labeling and determining portion size. While giving clearer information to consumers is a good first step, when will they finally ban flavored food? Just kidding (also, in solid or liquid form which both pose real dangers (e.g., choking and drowning); also, kidding).

The proposal seems good as far as it goes (see Federal Register: for publication on 3 March 2014: Food Labeling: Revision of the Nutrition and Supplement Facts Labels to download the PDF; the serving-size proposal is a separate document, and both proposals have some information combined at FDA: Press Announcements: 27 February 2014: FDA proposes updates to Nutrition Facts label on food packages). One missing feature would be something to improve digital access to nutritional information and ingredient listings for foods.

There are, apparently, mobile applications that can do optical character recognition (OCR) to import nutrition facts, but something more universal might improve both the adoption of digital food tracking and the use of better physical-to-digital handling in wider industries. Also, using a digital format could keep the printed version succinct while possibly expanding manufacturers’ participation in the publishing of voluntary data.

Also noteworthy is that, at present, the regulations do not require a specific font. Quoting from the proposal (pp. 251-252):

In addition, we are requesting comments on […] requiring the use of a specific font.

It also mentions (pp. 274-275) that the current regulations “[…] specify […] that the type style should be a ‘single easy-to-read type style’ but no specific type style is required. However, […] we urge that certain type styles […] be used” with a parenthetical: “i.e., Helvetica Black, Helvetica Regular, Franklin Gothic Heavy.”

Although I’ve never seen a Nutrition Facts panel in Comic Sans, I do wonder if font variability exists and how much it affects the use of OCR. Also, certain format variations (there are a number of them, even for existing labels) may make OCR very hard, including the lack of an opaque background (e.g., on foods wrapped in clear plastics) and deforming packaging (again, most likely thin plastics).

Digitally available nutrition information could eventually lead to much simpler printed information. Some countries employ much simpler labels, usually in the form of five pips with specific data such as caloric content, fat, sodium, etc. This takes up less space than the FDA’s tabular design, integrates with packaging better, and comes across as gentler, less authoritarian.

They could also go further by setting requirements for display of the information digitally. Junk food would be required to display in Comic Sans, while organic vegan baby food would be required to display in a blackletter.

The proposal would drop calories from fat from the label, not even allowing it voluntarily. But it’s not that simple. It would still allow calories from saturated fat voluntarily, and it says they considered making that mandatory.

They stuck to a reference diet of 2,000 calories. Again, importing the information to a digital system would allow recalculation based on an individual’s dietary need. The printed label should be basic, but the digital display could be very much tailored to the reader. Digitizing the ingredients would also make it much easier for those with allergies and sensitivities to avoid problem foods.
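As a rough illustration of what a digital display could do with imported label data, here is a small Python sketch. It rests on a loud assumption: that a nutrient’s daily value scales linearly with calorie needs, which will not hold for every nutrient. The function names, the ingredient list, and the calorie figures are hypothetical, not anything from the FDA proposal.

```python
REFERENCE_CALORIES = 2000  # the reference diet the printed label is based on


def personal_percent(label_percent: float, personal_calories: float) -> float:
    """Rescale a label's percent daily value to a reader's calorie needs.

    Assumption (not from the proposal): the daily value scales linearly
    with total calories.
    """
    if personal_calories <= 0:
        raise ValueError("personal_calories must be positive")
    return label_percent * REFERENCE_CALORIES / personal_calories


def flagged_ingredients(ingredients: list[str], avoid: set[str]) -> list[str]:
    """Return the ingredients that match a reader's avoid list, ignoring case."""
    avoid_lower = {item.lower() for item in avoid}
    return [item for item in ingredients if item.lower() in avoid_lower]


# Hypothetical usage: a reader who needs 2,500 calories and avoids peanuts.
adjusted = personal_percent(10.0, 2500)
problems = flagged_ingredients(["Wheat flour", "Peanuts", "Salt"], {"peanuts"})
```

Here a printed 10% becomes 8% for the 2,500-calorie reader, and the ingredient check flags "Peanuts". A real system would need nutrient-specific rules and much more careful ingredient matching than an exact name comparison.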

On the whole it is very good to see this vital service get a reroll. The only real danger is that this step in the right direction will end up being followed by such a long pause that we won’t have readily-digitized, expanded information available on foods until around 2034.