
JSON Schemas and Making a Versatile Application.

Please excuse the big wad of JSON in the middle of the post.

Things have come a long way since the previous post (diehealthy.org, 27 October 2022: “Working on a Local Single Page Application”) about working on a local file that provides a complete data-tracking application. While I originally wanted it for tracking games, I’d seen other JSON-schema-based projects online and decided it could be fully schema driven. It doesn’t yet edit its own schemas, but that’s probably feasible in the long run (even if it’s not something I’ll necessarily do).

To further refine it toward that schema system, I forked it and made a version to track the movies I’ve watched. The post image is of that version, though the two are currently at parity aside from the schema differences.

To give an idea of what I mean by schema, here’s an excerpt from the film version:

{
    "name": "filmtracker",
    "defaults": {
        "filters": {
            "recorded": 1,
            "watched": 0
        }
    },
    "props": [
        {
            "name": "name",
            "title": "Name",
            "type": "string",
            "display": "headertext",
            "displaystyle": "italics",
            "sortOn": true,
            "req": true,
            "def": null
        },
        {
            "name": "recorddata",
            "title": "Record data",
            "type": "group",
            "props": [
                {
                    "name": "added",
                    "title": "Added",
                    "hidden": true,
                    "display": "fulldate",
                    "displayprefix": "Added on",
                    "type": "date",
                    "sortOn": true
                },
                {
                    "name": "modified",
                    "title": "Modified",
                    "hidden": true,
                    "display": "fulldate",
                    "displayprefix": "Updated on",
                    "type": "date",
                    "sortOn": true
                }
            ]
        },
        {
            "name": "watchdata",
            "title": "Watch info",
            "type": "group",
            "props": [
                {
                    "name": "recorded",
                    "title": "Recorded",
                    "type": "bool",
                    "display": "iffalse",
                    "displayvalue": "Unrecorded.",
                    "summaryvalue": "📅",
                    "filter": true,
                    "def": 0
                },
                {
                    "name": "watched",
                    "title": "Watched",
                    "type": "bool",
                    "display": "text",
                    "displaydepends": [
                        "recorded"
                    ],
                    "displayvalues": [
                        "Unseen",
                        "Seen"
                    ],
                    "summaryvalues": [
                        "💾",
                        "🍿"
                    ],
                    "filter": true,
                    "def": 0
                },
                {
                    "name": "watchdate",
                    "title": "Date of watch",
                    "type": "date",
                    "display": "monthdate",
                    "displaydepends": [
                        "watched"
                    ],
                    "displayprefix": "Watched",
                    "sortOn": true,
                    "sortDep": "bought",
                    "def": ""
                }
            ]
        },
        {
            "name": "filmdata",
            "title": "Film info",
            "type": "group",
            "props": [
                {
                    "name": "releaseyear",
                    "title": "Year of Release",
                    "type": "num",
                    "display": "text",
                    "displayprefix": "Year:",
                    "sortOn": true,
                    "minv": 1900,
                    "def": 2022
                },
                {
                    "name": "score",
                    "title": "Review score",
                    "type": "range",
                    "display": "stars",
                    "displaydepends": [
                        "watched"
                    ],
                    "displayprefix": "Score:",
                    "summaryvalue": "stars",
                    "sortOn": true,
                    "sortDep": "watched",
                    "minv": 0,
                    "maxv": 5,
                    "step": 1,
                    "def": 0
                }
            ]
        },
        {
            "name": "categories",
            "title": "Tags",
            "type": "strarray",
            "display": "ulist",
            "def": null
        },
        {
            "name": "notes",
            "title": "Notes",
            "type": "text",
            "display": "longtext",
            "summary": "📓",
            "def": null
        }
    ]
}

It’s quite a lot, and messy (as is the rest of the application so far), and it mixes data definitions with their presentation, but it’s mostly serviceable. With the current version, it lets me define:

  • groups
  • numeric inputs
  • string inputs
  • url inputs (unvalidated at the moment)
  • date inputs (though I’d prefer to move to month/year or such)
  • boolean inputs (implemented as radio buttons)
  • tri inputs (radio buttons)
  • range inputs (radio buttons)
  • text inputs (text boxes)
  • string array inputs (for tags, with autocomplete!)
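To sketch how a schema like the one above could drive the inputs: the function below maps a schema prop to the HTML input attributes it would render with. This is a hypothetical illustration, not the actual application code; the field names (`type`, `minv`, `maxv`, `step`) follow the excerpt above, and the mapping choices are assumptions.

```javascript
// Hypothetical sketch: derive HTML input attributes from a schema prop.
// Field names (type, minv, maxv, step) mirror the schema excerpt;
// the actual application's rendering logic may differ.
function inputAttrs(prop) {
  switch (prop.type) {
    case "num":
    case "range":
      // Numeric and range props carry min/max/step constraints.
      return { tag: "input", type: "number",
               min: prop.minv, max: prop.maxv, step: prop.step };
    case "date":
      return { tag: "input", type: "date" };
    case "bool":
      // Rendered as a pair of radio buttons rather than a checkbox.
      return { tag: "input", type: "radio" };
    case "text":
      // Long-form text gets a textarea.
      return { tag: "textarea" };
    default:
      // Strings, urls, etc. fall back to a plain text input.
      return { tag: "input", type: "text" };
  }
}
```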

I’d originally gone with checkboxes and three-state checkboxes for booleans and tris, but decided they were too much trouble, and having separate buttons looked nicer. The basic checkboxes were instead reused for toggling the listing filters on and off for boolean and tri fields.

It can also search in and sort the listing.


One of the nicer parts of all this is the ability to export to and import from JSON, as well as being able to use the browser console to loop through and batch-modify the data. I’m not sure how feasible it would be to implement that through the UI, which would make it more useful to nontechnical users. But for now this is just a little project to handle my own data.
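A console batch edit might look something like the following. The `records` array and its field names are assumptions mirroring the schema excerpt above, not the application’s real internals.

```javascript
// Hypothetical console one-off: assuming the app keeps its entries as an
// array of record objects shaped like the schema above, tag every
// pre-1970 film. "records" and the field names are illustrative.
const records = [
  { name: "Metropolis", filmdata: { releaseyear: 1927 }, categories: [] },
  { name: "Alien", filmdata: { releaseyear: 1979 }, categories: [] }
];

for (const rec of records) {
  if (rec.filmdata.releaseyear < 1970) {
    rec.categories.push("classic");
  }
}
```

After a pass like this, the modified array could be exported back out as JSON and re-imported.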

If I can get it cleaned up, I’ll probably throw a copy up here at some point, though so far I’m still iterating on it a good bit as I go through my old list of movies and fill in their details. I started keeping that list sometime last year, and it was title-only, so it’s been a lot of searching, figuring out which movie I actually watched, and trying to remember enough to give each a rating.

But the main goal is that if I decide I need a list of something, I can copy the file over, throw a schema together, and start making the list in a way that lets me update and review it more easily than a text file, without the ugliness of a spreadsheet. When I kept a list of the legislators who objected on 6 January 2021, that was just a text file. Next time it could be a nice little application that makes the task not necessarily easier, but more useful:

  1. Get a text list of the things
  2. Some light massaging to turn it into JSON
  3. Write a quick schema
  4. Import the data
  5. Fill in any details, then filter, sort, and search easily.
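Step 2, the “light massaging,” can be sketched as a few lines of console JavaScript. The record shape follows the schema excerpt above; the exact fields the importer expects are an assumption.

```javascript
// Hypothetical massaging step: turn a title-per-line text list into
// JSON records. Field names and defaults mirror the schema excerpt;
// the real import format may differ.
const text = "Metropolis\nAlien\nStalker";

const records = text.split("\n").map(name => ({
  name,
  watchdata: { recorded: 0, watched: 0 },
  categories: [],
  notes: null
}));

// Pretty-printed JSON, ready to paste into the import box.
const json = JSON.stringify(records, null, 2);
```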

I can only imagine what data wrangling will look like in 100 years. I hope by then it’s all AI user interfaces that look pretty, work well, and add no friction. Until then, this project has reminded me how powerful, how versatile, and above all how simple HTML plus JavaScript has become.

Pruitt’s Data Rule and Deep Learning

(Soon-to-be former?) EPA head Pruitt has proposed a public data rule (RIN 2080-AA14). It could be a good rule, but that depends on the implementation. This post focuses, briefly, on its implications for deep learning in science.

Briefly, deep learning takes normalized, record-based data and creates a mapping from input data to some per-record output determination.

Think of a phone book (the data) with individual listings (the records), and then some determination you want to make for each record. It could be something very simple (the last name has n vowels) or something complicated.
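The toy determination from the phone-book example can be written out directly; the point of deep learning is that it learns such a record-to-output mapping from examples rather than having it spelled out. The function and record shape here are purely illustrative.

```javascript
// The toy determination: count the vowels in a record's last name.
// A trained network would learn a mapping like this from examples;
// here it is written out explicitly for illustration.
function vowelsInLastName(record) {
  return (record.lastName.match(/[aeiou]/gi) || []).length;
}
```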

The data itself may be public, but depending on the implementation of the proposed rule, making this secondary data public in any meaningful sense may be very difficult.

There are several challenges. One is simply the number of records that may be used. Another is that the trained network may be proprietary, non-portable, or even dependent on custom hardware. There may also be situations where several neural networks act in tandem, each derived from a bulk of training data (some of which may itself be output from other networks), further complicating the data-access requirements.

But there is also the question of whether the output would be public, even if published. Normally data is public when the individual measurements are available and the methodology behind those measurements is known. But there is a reasonable and inevitable blindness to the internal workings of deep learning. Trying to explain the exact function the machine has derived is increasingly difficult as complexity increases, and even if all the inputs and outputs are public, the transition function may be obscure.


Which isn’t to say that data, methods, and findings should not be replicated, peer reviewed, and subject to introspection. The EPA should, for example, draw a stricter line against carbon fuel companies and other chemical companies, requiring that more of their filings be public.

In the case of deep learning, not for the EPA’s sake, but for the sake of science itself, better rules for how to replicate and make available data and findings are needed.

Others have already pointed out the difficulty of studies predicated on sensitive personal data like medical records. But there is a general need to solve that problem as well, as the inability to examine such information may block important findings from surfacing.

This is similar to the fight over minors buying e-cigarettes online: opponents of e-cigarettes act as though there is a particular, nefarious plot by vendors, but we do not have anything close to a universal age verification system. Better to develop one for all the tasks that require it.

And so it is with the EPA rule: Congress should draft a law that allows all scientific data used by the government to be as public as is possible.

Federal Reporting Should Be Automatic

With the recent gun massacre in a church, it came to light that the attacker should have been barred from purchasing firearms on the open market due to a prior conviction. Now Congress may amend the law to try to strengthen mandatory reporting. But that’s the wrong move. Why leave open the option for someone to neglect the mandatory when the system could be required to be automatic?

For this and many other data issues, we still rely on some human to either file a piece of paperwork or otherwise ensure that the relevant notifications are made. That’s wrong. The existence of computerized records means that such notifications and updates should be completely automated. This includes the elimination of the need to acquire certified copies of birth, marriage, and death certificates, along with other routine and necessary data sharing. There should be a widespread effort to let computers do what they’re good at so that humans don’t have to.

With automatic reporting, mistakes will still be made by humans. There needs to be an auditing process and a corrections process. But even there, once corrected, the updates should be automatic.

We can move toward blockchain-backed systems that allow for improved recognition of where errors have occurred and been corrected. But it’s high time we removed error-prone mandates that too often pass without action.