Categories
software

Numerical Step Function in Python

The classic range() (and, in the 2.* series, xrange()) is useful for getting a sequence of numbers to iterate over. Its full signature is: range(start, stop[, step]).

So you can do, e.g., range(5, 10) = [5, 6, 7, 8, 9].

Or you can do, e.g., range(6, 13, 3) = [6, 9, 12].

But as far as I know there’s not an easy, built-in way to iterate over a range-like set of integers defined both by the range and a number of parts desired.

An example: you want five evenly-distributed numbers starting with 1 and ending with 10. So something like [3, 5, 6, 8, 10].

In my case, I had two similar use-cases. The first was the example above: a semi-arbitrary set of values in a range. I didn’t need to strictly include the endpoints in the values, but wanted a decent distribution of the values between start and end.

The second case was a little different, in that I wanted the range to always include the ends (it was a case of covering a whole range), but I also wanted to know how much each “step” over the range amounted to.

In the naïve version of this case, you don’t need the magnitude, as you could cheat and throw an extra piece in to account for slight differences (e.g., 11 / 3 => 1, 4, 7, 10 with the last piece being 10 through 11).

But there’s a nice way to evenly distribute the extra pieces: using rounding of the fractional value to distribute the extras.

Example:

10 / 4 = 2.5
0 * 2.5 = 0.0
1 * 2.5 = 2.5; round(2.5) = 2
2 * 2.5 = 5.0
3 * 2.5 = 7.5; round(7.5) = 8
[0, 2, 5, 8]

(In Python 3, round(number[, ndigits]) of n.5 goes to the even side when called with a single argument or with ndigits=0. Python 2 instead rounds halves away from zero.)
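As a quick check of the arithmetic above, here is a minimal sketch. It assumes Python 3, where round() on a float returns an int and rounds halves to even; the name boundaries is only for illustration.

```python
# Assumes Python 3: round() rounds n.5 to the nearest even integer.
total, parts = 10, 4
part_size = total / parts  # 2.5

# Round each ideal (fractional) boundary to the nearest integer.
boundaries = [round(i * part_size) for i in range(parts)]
print(boundaries)  # [0, 2, 5, 8]
```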

In this case, the caller could buffer the previous value and calculate the gap/step itself, but this is Python, so we might as well give it a mode to get that itself.

Without further ado, this is what I came up with:

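def equal_parts(start, end, parts, include_step=False):
    # Ideal (fractional) size of each part.
    part_size = (end - start) / float(parts)
    for i in range(parts):
        # Rounding each boundary spreads the leftovers evenly over the parts.
        part = start + round(part_size * i)
        step = start + round(part_size * (i + 1)) - part
        if include_step:
            yield (part, step)
        else:
            # Without the step, yield the upper edge of this part.
            yield part + step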

It’s messier than it needs to be, due to its dual-use nature. It’s arguably cleaner to have a second function that would handle the include_step=False case:

def equal_parts_only(start, end, parts):
    for part, step in equal_parts(start, end, parts):
        yield step + part

That function would remove the conditional business at the end of the original equal_parts:

def equal_parts(start, end, parts):
    part_size = (end - start) / float(parts)
    for i in range(parts):
        part = start + round(part_size * i)
        step = start + round(part_size * (i + 1)) - part
        yield (part, step)

The stepless version has another nice property: what if you do equal_parts(0, 10, 11)? You get: [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]. In other words, you can ask for more parts than there are integers in the range, and the duplicate lands in the middle.
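To make the outputs concrete, here is a self-contained sketch that restates the two-function version and runs it on the examples from this post. It assumes Python 3 rounding (halves to even); under Python 2 the values would differ.

```python
def equal_parts(start, end, parts):
    """Yield (part, step) pairs covering start..end in `parts` pieces."""
    part_size = (end - start) / float(parts)
    for i in range(parts):
        part = start + round(part_size * i)
        step = start + round(part_size * (i + 1)) - part
        yield (part, step)


def equal_parts_only(start, end, parts):
    """Yield just the upper edge of each piece."""
    for part, step in equal_parts(start, end, parts):
        yield step + part


print(list(equal_parts(0, 10, 4)))        # [(0, 2), (2, 3), (5, 3), (8, 2)]
print(list(equal_parts_only(1, 10, 5)))   # [3, 5, 6, 8, 10]
print(list(equal_parts_only(0, 10, 11)))  # [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
```

Note that the steps in the first result add up to 10, so the pieces cover the whole range with no gap or overlap.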

I wrote a GIMP plugin to create stepped (or random) gaussian blurs on an image. The stepless version lets me create the set of blur levels, while the step-including version lets me properly select (mostly-)even parts of the image.
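The plugin itself isn't shown here, so the following is only a sketch of how the step-including version could carve an image into vertical strips. The strips() helper and the (left, top, right, bottom) box layout are assumptions for illustration, not GIMP API calls, and Python 3 rounding is assumed.

```python
def equal_parts(start, end, parts):
    """Yield (part, step) pairs covering start..end in `parts` pieces."""
    part_size = (end - start) / float(parts)
    for i in range(parts):
        part = start + round(part_size * i)
        step = start + round(part_size * (i + 1)) - part
        yield (part, step)


def strips(width, height, count):
    """Hypothetical helper: yield (left, top, right, bottom) boxes."""
    for left, strip_width in equal_parts(0, width, count):
        yield (left, 0, left + strip_width, height)


boxes = list(strips(100, 40, 3))
print(boxes)  # [(0, 0, 33, 40), (33, 0, 67, 40), (67, 0, 100, 40)]
```

Each box starts where the previous one ends, and the last one finishes exactly at the image width.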

Here’s an image made with this plugin that uses the function both ways: [image: Sample image of gaussian blur in sections]

If anyone wants a copy of the plugin, let me know and I’ll put it on GitHub or such.

Categories
software

Pondering a Query Language for Mail

Query languages are typically used for databases. The basic formula of a query follows subject-predicate form: which entries are targeted and what to do with them.

The simplest form is “print (or return) all entries in this database.”

When you’re dealing with a single piece of mail, you don’t really need a query language. It’s just a matter of directly applying predicates to it.

But when you want to do batch operations, it seems like a query language would be much more useful.

Thankfully, there’s always Python. A few lines of it can do the equivalent of what you would do with a query language.

Doing something like (untested):

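import mailbox
import datetime

# factory=None yields MaildirMessage objects rather than rfc822.Message
# (the Python 2 default)
md = mailbox.Maildir('~/Mail', factory=None)
for key, mail in md.iteritems():
    if not mail['Subject']:
        # get_date() is the delivery time in seconds since the epoch
        date = datetime.datetime.fromtimestamp(mail.get_date())
        # drop any empty Subject header so we don't end up with two
        del mail['Subject']
        mail['Subject'] = "[Received on {0}]".format(date)
        md[key] = mail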

This gives any mail that lacks a subject one that says when it was received. That’s probably not very useful, but for other things it might be.

But it’s still a bit complicated: you need to know Python, and operating on several folders makes things more involved.
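For the several-folders case, here is a minimal sketch of a query-like selection step, assuming Python 3's mailbox module. The names select() and missing_subject() are hypothetical helpers, not part of the standard library; list_folders(), get_folder() and iteritems() are real Maildir methods.

```python
import mailbox


def select(maildir, predicate):
    """Yield (folder, key, message) for each message matching predicate.

    Looks at the top-level Maildir and its immediate subfolders only.
    """
    folders = [maildir]
    folders += [maildir.get_folder(name) for name in maildir.list_folders()]
    for folder in folders:
        for key, message in folder.iteritems():
            if predicate(message):
                yield folder, key, message


def missing_subject(message):
    """Example predicate: true when the Subject is absent or empty."""
    return not message['Subject']
```

A caller would then loop over select(mailbox.Maildir('~/Mail'), missing_subject) and apply whatever action it likes to each hit, which keeps the "which entries" part separate from the "what to do with them" part.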

Also, given the number of systems mail might have passed through, all the different filters and handlers, each of which may tack on its own special headers, mail is rather messy. Add to that the fact that you may need to leave messages untouched for reasons of regulatory compliance and/or security, and the medium becomes a great joy to deal with.

My guess is that separating e-mail into the usable copy and the canonical copy is the best strategy. Every mail gets thrown in a read-only archive that’s untouched and only accessed to pull out of, and another copy goes to the user where they can mangle it as much as they please.

I’ll also take issue with the interfaces used for mail. There has to be a better way to build an interface for cases where people have to manually sort through many items.

A few thoughts on that:

  • Don’t necessarily show the user the whole mail as one piece. By breaking up subject, from/to, body, etc. and showing these independently, it may allow the user to make better judgments. They don’t have to look at the subject and the from/to, two types of data, in rapid succession.
  • Reshow the mail several times. By having it appear more than once, the user has less pressure to get it right in one go. In a normal interface, if they misfile the mail, it will be permanently misfiled. If they know they will see it a few more times, they can give their best guess and the system can sequester mail that has conflicting filings over multiple passes.
  • Show progress. A user who sees that they are on page X of Y has some idea of progress, but in most systems they’ll stay on the first page while Y diminishes. They won’t have as clear an idea of how much they’ve accomplished.
  • Let modes be modes. While some activities eschew modal uses, others thrive as modes. Manual sorting is such an activity. Instead of trying to have the same interface function for both browsing and sorting, a dedicated sorting mode is perfectly fine.
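The "reshow the mail several times" idea can be sketched as follows. This is only an illustration under assumed data shapes: each filing is a hypothetical (mail_id, folder) pair recorded on some pass, and resolve_filings() is a made-up name.

```python
from collections import defaultdict


def resolve_filings(filings):
    """Split mail into confidently filed and sequestered.

    filings: iterable of (mail_id, folder) guesses gathered over passes.
    Returns (filed, sequestered): a dict of mail_id -> folder for mail
    that was always filed the same way, and a sorted list of mail ids
    whose filings conflicted.
    """
    votes = defaultdict(set)
    for mail_id, folder in filings:
        votes[mail_id].add(folder)
    filed = {m: next(iter(f)) for m, f in votes.items() if len(f) == 1}
    sequestered = sorted(m for m, f in votes.items() if len(f) > 1)
    return filed, sequestered


passes = [('a', 'work'), ('b', 'spam'), ('a', 'work'), ('b', 'personal')]
print(resolve_filings(passes))  # ({'a': 'work'}, ['b'])
```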

When used with a query language, such a sorting activity might just be to weed out any false positives. Other times, the query language would be used to feed into the sorting, with sorting being the predicate.

It’s important to recognize that what applies to e-mail applies equally to any sequence of complex artifacts: news, search results (web pages, library books, etc.), and so on. Currently, you get a large set of ordered results from services like Google, with little ability to cull them. While the engineers work hard to guarantee the relevance of items, a further search within the set is often better left to the user, given more advanced tools.

Categories
linux

Tips for Python Gtk2 to Gtk3

Today I’ll look at some tips for making a port of a python application using GTK+ 2 to use GTK+ 3.

The first thing you’ll want to do is read over GNOME: Live: PyGObject/IntrospectionPorting. That resource includes a link to pygi-convert.sh, which is a shell script that tries to do a bulk of the menial replacements in your code automatically.

I found that script to do a pretty good job. It misses some things, such as import lines that include multiple modules, but the only place it made a mistake when I used it was with Gtk.Label, where it tried to add extra parameters unnecessarily. One thing it did miss was calls to pack_start() and pack_end(), which now require padding to be specified.

But this got things to a near-working state.

You’ll want to open a Python shell so you can do some manual inspection of GTK objects. For that you’ll almost definitely want something similar to:

import re
def grep(obj, string=None, insensitive=False):
    obj_dir = dir(obj)
    if string is None:
        print(obj_dir)
    else:
        if insensitive:
            expr = re.compile(string, re.IGNORECASE)
        else:
            expr = re.compile(string)
        for text in obj_dir:
            match = expr.search(text)
            if match is not None:
                print(match.string)

>>> from gi.repository import Gtk
>>> grep(Gtk, 'bar')
HScrollbar
HScrollbarClass
Scrollbar
ScrollbarClass
Statusbar
StatusbarClass
StatusbarPrivate
Toolbar
ToolbarClass
ToolbarPrivate
ToolbarSpaceStyle
ToolbarStyle
VScrollbar
VScrollbarClass
>>> grep(Gtk, 'bar', True)
HScrollbar
HScrollbarClass
InfoBar
InfoBarClass
InfoBarPrivate
MenuBar
MenuBarClass
MenuBarPrivate
ProgressBar
ProgressBarClass
ProgressBarPrivate
STYLE_CLASS_INLINE_TOOLBAR
STYLE_CLASS_MENUBAR
STYLE_CLASS_PRIMARY_TOOLBAR
STYLE_CLASS_PROGRESSBAR
STYLE_CLASS_SCROLLBAR
STYLE_CLASS_SIDEBAR
STYLE_CLASS_TOOLBAR
Scrollbar
ScrollbarClass
Statusbar
StatusbarClass
StatusbarPrivate
Toolbar
ToolbarClass
ToolbarPrivate
ToolbarSpaceStyle
ToolbarStyle
VScrollbar
VScrollbarClass

This will make your life easier: often a function still lives on the same object but its name has changed, and searching like this can be faster than consulting the online documentation (though you will probably still have to follow up with that if the signature has changed).

For the online documentation, the main resource you’ll use is the GNOME Developers: GTK+ 3 Reference Manual, as Google Search and other search engines tend to have a lot of GTK+ 2 cruft sitting there to waste your time. But there’s also a promising bit of work at ReadTheDocs.org: The Python GTK+ 3 Tutorial. While it’s brief as of this writing, it’s a decent, hardworking resource, very clean, etc.

You’ll also want to run grep -rn 'except:' [path], or possibly a more sophisticated alternative, to find where the code you are working with handles exceptions. This is especially useful if what you find is:

try:
    foo()
except:
    pass

Those are the worst: you’re wondering why something isn’t working, when it turns out that foo() isn’t being allowed to tattle. Censorship sucks, even in code. Python needs whistleblower protection! Don’t let them silence your flaws; it will come back to bite you in the ass.
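One way to let foo() tattle while porting is to log the traceback instead of discarding it. This is a sketch, not a rule: call_and_report() is a hypothetical helper, and whether to carry on after the failure depends on the code in question.

```python
import logging


def call_and_report(func, *args, **kwargs):
    """Call func; on failure, log the full traceback instead of hiding it."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logging.exception records the message plus the active traceback.
        logging.exception("%s failed", getattr(func, '__name__', func))
        return None
```

Catching Exception rather than using a bare except: also leaves KeyboardInterrupt and SystemExit alone.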

One place where things have changed and maybe are still changing that I noticed was GdkPixbuf and other graphics-related things. These are moving toward cairo. In those cases you’ll need to dig into the pycairo docs as well, but chances are you won’t have to go too deep or the component you’re working with would’ve been written calling cairo from the start.

Another thing to remember is that the way the original developers did it may have been due to earlier versions of GTK+ not supporting something it now supports. In those cases, you can save yourself some trouble if you can realize that and make the change.

Categories
software

WIP: A Python Eye

Today I decided to play around with PIL a bit, working on a python eye. Basic concept is to take an input image, push it through the eye, and output what the eye sees. At present I’ve only got a rough version working, and it doesn’t yet handle proper blending, but if the itch stays around I may work on it some more. For now here’s the output on a test image (found via Flickr’s Creative Commons search):

The original image (under a CC-attribution license) is by Flickr user auntielaurie: Mural

The output:

Like I say, it’s nothing special right now. It doesn’t even send duplicative/overlapped data to multiple neurons so that they can properly compute excitation/attenuation. And the color/intensity processing is just very rudimentary. And it’s not even threaded. And so on.

Indeed, right now it’s basically a very slow image-specific cp, that doesn’t even keep the colors and intensities correctly.

But it’s still fun.

Categories
software

Python and Simple Web Use

Another post today about some fun with a desktop python application. Today the situation is dealing with the web. A short look at two of the tools of the trade, urlparse and urllib2.

Let’s start with a look at URIs in structure. This is probably familiar to most web developers, but it’s worth reviewing anyway.

[scheme:][//[user:pass@][hostname][:port]][path][?query][#fragment]

Together the [user:pass@][hostname][:port] part (everything after the // and before the path) is the netloc. But the password portion (IIRC) is deprecated for security reasons.
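To see those pieces pulled apart, here is a small sketch using urlsplit(). It assumes Python 3, where the urlparse module's functions live in urllib.parse; the URL itself is made up for illustration.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

parts = urlsplit(
    'http://user:pass@www.example.com:8080/python/urlparse.html?q=1#top')

print(parts.scheme)    # http
print(parts.netloc)    # user:pass@www.example.com:8080
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /python/urlparse.html
print(parts.query)     # q=1
print(parts.fragment)  # top
```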

Relative URIs do not include a netloc, and are meant to be resolved against a base URI, which is derived from the URI of the document they are referenced from. The recipe for getting the base URI is to take the document URI and chop off anything after the last path delimiter (i.e., /).
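The base-URI recipe, keeping everything up to and including the last / of the path, can be sketched like this. It assumes Python 3's urllib.parse; base_uri() is a hypothetical helper, and it ignores complications such as an explicit base element in the document.

```python
from urllib.parse import urlsplit, urlunsplit


def base_uri(document_uri):
    """Return document_uri with the path cut after its last '/'."""
    parts = urlsplit(document_uri)
    directory = parts.path[:parts.path.rfind('/') + 1]
    # Query and fragment belong to the document, not the base.
    return urlunsplit((parts.scheme, parts.netloc, directory, '', ''))


print(base_uri('http://www.example.com/python/urlparse.html'))
# http://www.example.com/python/
print(base_uri('http://www.example.com/python'))
# http://www.example.com/
```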

Python offers the urlparse module for interacting with URLs. In this case, the urlparse.urljoin() method serves us well. You can simply pass it the document URI and the relative URI of a reference, and it will give you the full URI for the reference.

>>> import urlparse
>>> urlparse.urljoin('http://www.example.com/python/urlparse.html', 'urlparsedemo.html')
'http://www.example.com/python/urlparsedemo.html'

But there is a catch:

>>> urlparse.urljoin('http://www.example.com/python', 'urlparsedemo.html')
'http://www.example.com/urlparsedemo.html'

You have to get the right base URI, which means that a path that names a directory rather than a file needs that trailing /. It may not be present in a reference to the page, but it should be present if you actually load the page.
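The difference the trailing / makes can be seen side by side. This sketch assumes Python 3, where urljoin() is in urllib.parse rather than urlparse.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

reference = 'urlparsedemo.html'

# Without the slash, "python" is treated as a file and replaced.
without_slash = urljoin('http://www.example.com/python', reference)
# With the slash, "python/" is a directory and is kept.
with_slash = urljoin('http://www.example.com/python/', reference)

print(without_slash)  # http://www.example.com/urlparsedemo.html
print(with_slash)     # http://www.example.com/python/urlparsedemo.html
```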

That brings us to the urllib2 module (though you could use a module that uses curl’s library, or Twisted, etc.). You can easily load a document over HTTP using urllib2.urlopen(). It returns the response object, which you can read with response.read() or similar methods, and which can also tell you the URL actually used for the response.

As example.com happens to be unresponsive for me at this moment, I will be using google.com for this example:

>>> import urllib2
>>> response = urllib2.urlopen('http://google.com')
>>> response.geturl()
'http://www.google.com/'

We see that Google redirected the request to the www subdomain. Here’s another example that happens to occur on the link from their current homepage:

>>> urllib2.urlopen('http://google.com/nexus').geturl()
'http://www.google.com/nexus/'

Note that the actual URL has the ending slash, which means that relative references on that page will use that in the base URI. Without it, you would get a series of 404 responses for the CSS and images that are relatively referenced on that page.
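Putting the two modules together, a sketch of resolving a page's relative references against the URL it was actually served from might look like this. It assumes Python 3, with urllib.request standing in for urllib2; both function names and the reference paths are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def absolute_references(final_url, references):
    """Resolve relative references against the URL actually served."""
    return [urljoin(final_url, ref) for ref in references]


def fetch_and_resolve(url, references):
    """Load url, then resolve references against the post-redirect URL."""
    with urlopen(url) as response:
        return absolute_references(response.geturl(), references)


print(absolute_references('http://www.google.com/nexus/',
                          ['css/site.css', '/images/logo.png']))
# ['http://www.google.com/nexus/css/site.css',
#  'http://www.google.com/images/logo.png']
```

Only fetch_and_resolve() touches the network; the joining logic can be checked offline.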

The main downside to using urllib2 is that you don’t get any kind of built-in caching. You can build your own caching, but before long you’re building all sorts of infrastructure beyond your small project. This is why I still believe the long-term future of the web on the desktop is having dedicated services for things like HTTP, with some capacity to bypass them through the browser. Having a service to handle the HTTP for a small Python application would save a lot of trouble, but would also let you have multiple browsers without the redundant caching and requests.
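To give a sense of where the home-grown infrastructure starts, here is about the smallest cache one could bolt on. It is a sketch only: make_cached_fetcher() is a hypothetical name, the cache lives in memory for the life of the process, and it ignores expiry and cache-control headers entirely, which is exactly the part that grows.

```python
def make_cached_fetcher(fetch):
    """Wrap fetch(url) so each URL is only fetched once per process."""
    cache = {}

    def cached(url):
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    return cached


# Stand-in for a real network fetch, so the sketch runs offline.
calls = []

def fake_fetch(url):
    calls.append(url)
    return 'body of ' + url

get = make_cached_fetcher(fake_fetch)
get('http://www.example.com/')
get('http://www.example.com/')
print(len(calls))  # 1
```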

Furthermore, such a service could still have a permissive API that would allow direct, one-off loading of resources for situations like stale cache or cache control headers that want different behavior.