Categories
hyperweb

Thoughts on Designing a Vouched-By System for the Web

Motivation

As an avid reader of several planets, as well as a user of the various whitelist and blacklist extensions for Firefox, I sometimes must make allowances for sites that I implicitly trust by virtue of their belonging to the planets.

So, let’s first look at that trust.  I trust the organizations that administer the planets to only syndicate trustworthy blogs, and if a blog is found to be untrustworthy, I trust the planet will remove it (at least until whatever problem has been fixed).

In other words, I’m willing to give the syndicated blogs the benefit of the doubt for the whitelists in the extensions I use, but that adds up to at least 500 blogs, which is more than I would manually wish to add and maintain (as membership changes in planets).

These planets offer what may be a unique opportunity for trust: blogs syndicated to a planet could include something like a <meta name="vouched-by" content="http://planet.example.com/vouched-by/"/> element.

What are the requirements for such a system?

First, it would be convenient if explicit markup could be foregone in favor of referrer data, but feed readers wouldn't supply a referrer, and neither would visiting a blog from a link on a sibling site in the same planet.

It seems a good candidate for page metadata. The metadata specifies who (in the form of a URL) vouches for the page or site.  From there, an extension would ask the browser whether that piece of metadata exists, and if so, would query the claimed vouching site to verify the claim.  Once verified, the appropriate access would be granted to the vouched site.
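As a rough sketch, here is what that extension-side check might look like. The meta tag name matches the example above, but the voucher-list format (plain text, one vouched URL per line) is purely an assumption for illustration, and the network fetch is left out so the example is self-contained:

```python
# Hypothetical sketch of the vouched-by check; nothing here is a real standard.
from html.parser import HTMLParser

class VouchedByFinder(HTMLParser):
    """Collect the content of a <meta name="vouched-by"> element, if any."""
    def __init__(self):
        super().__init__()
        self.voucher_url = None

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            attrs = dict(attrs)
            if attrs.get('name') == 'vouched-by':
                self.voucher_url = attrs.get('content')

def find_voucher(page_html):
    parser = VouchedByFinder()
    parser.feed(page_html)
    return parser.voucher_url

def is_vouched(site_url, voucher_list_text):
    # In a real extension, voucher_list_text would be fetched from the
    # claimed vouching URL; here it is passed in to keep the sketch offline.
    vouched = {line.strip() for line in voucher_list_text.splitlines()}
    return site_url in vouched

page = ('<html><head><meta name="vouched-by" '
        'content="http://planet.example.com/vouched-by/"/></head></html>')
voucher = find_voucher(page)  # -> 'http://planet.example.com/vouched-by/'
listing = "http://blog.example.org/\nhttp://other.example.net/\n"
is_vouched('http://blog.example.org/', listing)  # -> True
```

The extension would then fetch the discovered voucher URL, confirm the blog is listed, and only then extend whitelist access.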

This would also allow for the user to report a dysfunctional member site to the vouching site, helping with maintenance of community/planet standards.

Drawbacks

The main drawback to such a system would be the overhead.  The participating sites and the planet would need to implement this.  So would the extensions that use whitelists.  The extensions would have to complicate their whitelists by allowing users to specify that particular sites would be given authority to vouch for other sites in this way.

Another drawback is that it may be a use case that's limited to this specific type of community.  This isn't necessarily a killer, because none of the implementation requires true standards changes.  But it may be hard to justify adding complexity to extensions that serve a broader community just for the few users who would like this support.

One alternative that avoids that is to add the new functionality to a separate, companion extension that handles voucher discovery and updates the whitelist data accordingly.  The main trouble here would be that not all whitelisting extensions support the concept of a temporary addition, which adds extra complexity to the bookkeeping responsibility for the supplementary extension.

Conclusion

This was more of a thought experiment than a serious proposal.  I believe that, while there may be limited use for this specific idea outside of community planets, there are many applications for vouching-based access authority in the technology world that aren’t currently used.

Access control will be a key consideration as new technologies create links between existing technologies for the added convenience and utility of all.

Categories
software

Python and Simple Web Use

Another post today about some fun with a desktop Python application. Today the situation is dealing with the web: a short look at two of the tools of the trade, urlparse and urllib2.

Let’s start with a look at URIs in structure. This is probably familiar to most web developers, but it’s worth reviewing anyway.

[scheme:][//[user:pass@]hostname[:port]][path][?query][#fragment]

Together, [//[user:pass@]hostname[:port]] is the netloc, though the user:pass portion is deprecated for security reasons.
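That structure can also be inspected programmatically. The sketch below uses Python 3, where the urlparse module was renamed urllib.parse (the functions themselves are unchanged):

```python
# Splitting a URI into the pieces named above.
from urllib.parse import urlsplit

parts = urlsplit('http://user:pass@www.example.com:8080/python/urlparse.html?x=1#top')
parts.scheme    # 'http'
parts.netloc    # 'user:pass@www.example.com:8080'
parts.path      # '/python/urlparse.html'
parts.query     # 'x=1'
parts.fragment  # 'top'
# The netloc breaks down further:
(parts.username, parts.password, parts.hostname, parts.port)
# -> ('user', 'pass', 'www.example.com', 8080)
```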

Relative URIs do not include a netloc; they are resolved against a base URI, which is derived from the URI of the document that references them. The recipe for getting the base URI is to take the document URI and chop off anything after the last path delimiter (i.e., /).
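That chopping recipe can be sketched in a couple of lines. This naive version assumes the document URI has no query string or fragment, which a real resolver would strip first:

```python
def base_uri(document_uri):
    # Keep everything up to and including the last path delimiter.
    return document_uri.rsplit('/', 1)[0] + '/'

base_uri('http://www.example.com/python/urlparse.html')
# -> 'http://www.example.com/python/'
```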

Python offers the urlparse module for working with URLs. In this case, the urlparse.urljoin() function serves us well. Simply pass it the document URI and the relative URI of a reference, and it gives you the full URI for the reference.

>>> import urlparse
>>> urlparse.urljoin('http://www.example.com/python/urlparse.html', 'urlparsedemo.html')
'http://www.example.com/python/urlparsedemo.html'

But there is a catch:

>>> urlparse.urljoin('http://www.example.com/python', 'urlparsedemo.html')
'http://www.example.com/urlparsedemo.html'

You have to get the right base URI, which means a path that names a directory rather than a file needs the trailing /. That slash may not be present in a link to the page, but it should be present in the URL you actually load the page from.

That brings us to the urllib2 module (though you could use a module built on curl's library, or Twisted, etc.). You can easily load a document over HTTP using urllib2.urlopen(). It returns a response object whose body you can read with response.read() or similar reading methods, and you can also get the URL actually used for the response.

As example.com happens to be unresponsive for me at this moment, I will be using google.com for this example:

>>> import urllib2
>>> response = urllib2.urlopen('http://google.com')
>>> response.geturl()
'http://www.google.com/'

We see that Google redirected the request to the www subdomain. Here’s another example that happens to occur on the link from their current homepage:

>>> urllib2.urlopen('http://google.com/nexus').geturl()
'http://www.google.com/nexus/'

Note that the actual URL has the ending slash, which means relative references on that page will resolve against a base URI that includes it. Without it, you would get a series of 404 responses for the CSS and images that are relatively referenced on the page.

The main downside to using urllib2 is that you don’t get any kind of built-in caching. You can build your own caching, but before long you’re building all sorts of infrastructure beyond your small project. This is why I still believe the long-term future of the web on the desktop is having dedicated services for things like HTTP, with some capacity to bypass them through the browser. Having a service to handle the HTTP for a small Python application would save a lot of trouble, but would also let you have multiple browsers without the redundant caching and requests.
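As a rough illustration of that infrastructure creep, here is a minimal in-memory cache of the sort you end up writing. The fetch callable is injected so the example runs without the network, and it ignores expiry, Cache-Control, and redirects, all of which a real cache would have to handle:

```python
class CachingFetcher:
    """Toy cache: remembers each URL's body after the first fetch."""
    def __init__(self, fetch):
        self._fetch = fetch   # e.g. lambda url: urllib2.urlopen(url).read()
        self._cache = {}

    def get(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]

calls = []
def fake_fetch(url):
    calls.append(url)
    return b'body of ' + url.encode()

fetcher = CachingFetcher(fake_fetch)
fetcher.get('http://example.com/')
fetcher.get('http://example.com/')  # served from cache; no second fetch
len(calls)  # -> 1
```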

Furthermore, such a service could still have a permissive API that would allow direct, one-off loading of resources for situations like stale cache or cache control headers that want different behavior.