What is Data Infrastructure?

In place of hot air about the definition of infrastructure, some thoughts about data infrastructure.

Data infrastructure is something like a road, or maybe a bridge. It’s a set of standards that allows data systems and consumers of data to operate without too much effort. DNS, the Domain Name System, is infrastructure: it translates readable names into numeric addresses, adding a layer of abstraction so that addresses can change without the names changing. With paved land and spans, you don’t have to load sacks on a burro to carry goods through an untamed land, up and down hills, fording rivers, gazing at the stars and bouncing from landmark to landmark to find your way. Data infrastructure should work the same way. The modern internet needs more data infrastructure.

In email, we have the problem of spam. To deal with spam, many systems have been developed to filter it, to block spammers, and so on. It isn’t perfect, but it generally works. But we haven’t generalized that infrastructure to other problems. Some things, like anti-virus scanning of mail, use some of the same systems, but each is generally its own thing. Elsewhere, ad blockers and browsers keep lists of malicious websites to warn users. Some data infrastructure exists here, but it could be more generalized.

In other places, we see no real data infrastructure. Several times per year I read about journalists and others who are targets of harassment campaigns online. The social systems lack the kind of filter technologies that email has. But email’s filtering could be generalized. It should be generalized.

Identity, the ability to create a digitally signed identity and authenticate with websites, would be a great and welcome form of data infrastructure. It has risks if it lets government censors snuff out dissent, but that already happens too often. A correctly designed identity system would allow for multiple identities or multiple expressions of an identity depending on where and how it’s used.
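One way to picture "multiple expressions of an identity" is a single private secret that yields a different, stable identifier for each place it is used. The sketch below is only an illustration of that idea, not a proposal for a real identity system: the function name `derive_persona` and the site names are made up, and it covers only the derivation step, not signing or authentication.

```python
import hashlib
import hmac
import secrets


def derive_persona(master_secret: bytes, context: str) -> str:
    """Derive a stable, context-specific identifier from one master secret.

    Hypothetical helper: the same secret and context always give the same
    identifier, while different contexts give identifiers that cannot be
    linked to each other without knowing the secret.
    """
    digest = hmac.new(master_secret, context.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# The user keeps one secret; each site sees only its own persona.
master = secrets.token_bytes(32)
forum_id = derive_persona(master, "forum.example")
shop_id = derive_persona(master, "shop.example")

print(forum_id != shop_id)                                  # personas differ per site
print(forum_id == derive_persona(master, "forum.example"))  # but stay stable per site
```

The design choice here is that the user, not the site or an advertiser, holds the one piece that ties the personas together.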

The advertising industry already tries to create identity tokens, but users have limited control over them. Some laws get passed to try to give control, and now every website has to tell you about cookies, but you still don’t have control; you just have an extra piece of cyber garbage floating atop every website.

Infrastructure lowers friction. Building websites has a key barrier: user sign-up. The easier it becomes for a user to sign up, the less advantage incumbents have. That is paramount for competitiveness in many online spaces. This and other barriers are the sort that data infrastructure should break down, in the same way that the transcontinental railroad and other major infrastructure projects opened up lands for new cities and new economies.

Other commerce-related data infrastructure changes would be welcome. One is simple resource links that are platform-agnostic. They would let you link to a song without linking to a particular music service. Or link to a video game without pointing at a specific store. That kind of infrastructure helps to allow competition without forcing fans to play favorites or act as advertisers without their consent.
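As a rough sketch of how such a link might work, the author of a post shares a neutral identifier and each reader's own preferences decide where it opens. Everything here is hypothetical: the `resource:<kind>:<identifier>` format, the service names, and the URL templates are invented for illustration, and a real system would need a shared registry of identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLink:
    kind: str        # e.g. "song" or "game"
    identifier: str  # neutral identifier, independent of any store


# Hypothetical URL templates, keyed by (kind, service name).
SERVICE_TEMPLATES = {
    ("song", "service-a"): "https://music-a.example/track/{id}",
    ("song", "service-b"): "https://music-b.example/songs/{id}",
    ("game", "store-a"): "https://store-a.example/app/{id}",
}


def parse_link(link: str) -> ResourceLink:
    """Parse a neutral link of the made-up form 'resource:<kind>:<identifier>'."""
    scheme, kind, identifier = link.split(":", 2)
    if scheme != "resource":
        raise ValueError(f"not a resource link: {link!r}")
    return ResourceLink(kind, identifier)


def resolve(link: str, preferences: dict) -> str:
    """Turn a neutral link into a URL on the reader's preferred service."""
    resource = parse_link(link)
    service = preferences[resource.kind]
    template = SERVICE_TEMPLATES[(resource.kind, service)]
    return template.format(id=resource.identifier)


# Two readers open the same shared link and land on different services.
shared = "resource:song:abc123"
print(resolve(shared, {"song": "service-a"}))
print(resolve(shared, {"song": "service-b"}))
```

The person sharing the link never names a service, so the choice of where to listen or buy stays with the reader.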

As the internet matures, opportunities arise to define and build data infrastructure. It took humans thousands of years to figure out roads and city planning (and we still get it wrong sometimes), but as we settle into patterns of use and behavior, and as our capacity to store and process data keeps growing, we need to be looking at ways to generalize our tools into outright infrastructure that supports the smooth flow of activities of all sorts.

What are the internet’s rivers that we can send boats down? Where are its mountains, requiring us to seek passes or blast tunnels? What will be the critical pieces of infrastructure that let computers do more work, rather than pretend to be a fancy form of paper?

