Posts tagged "python"

Filtering lists in Python, Ruby, and JavaScript

Recently I listened to Gary Bernhardt comparing Python and Ruby. In the talk Gary states that he finds Ruby code ugly and Python code beautiful. He then goes on to say that the things which reduce Ruby's aesthetic appeal are the very things which allow Ruby to do beautiful things impossible in Python.

Gary provides several examples of equivalent code in Python and Ruby to highlight situations in which one language reads better than the other, such as the following.

'\n'.join(obj.name
    for obj in (
        repository.retrieve(id)
        for id in ids)
    if obj)

ids.map do |id|
  repository.retrieve(id)
end.compact.map do |obj|
  obj.name
end.join("\n")

The Ruby code (the one beginning with ids.map) reads top to bottom and is easy to follow. The Python code is equally succinct but takes a bit of effort to decipher.

I've been greatly enjoying the act of writing JavaScript lately, so simply for pleasure I worked out the JavaScript equivalent.

My first attempt used the filter array method.

ids.filter(function (id) {
    var obj = repository.retrieve(id);
    return obj && obj.name;
}).join('\n');

filter, though, just removes from an array the items which fail the provided "test". So the code above is on the right track, but fails to produce a list of names.

reduce is the correct method for the job. reduce "reduces" an array to a single value, which could be a string, an object, another array — whatever!

Note the empty array ([]) on the last line of the snippet below: that's the initial value of our "accumulator".

ids.reduce(function (names, id) {
    var obj = repository.retrieve(id);
    if (obj && obj.name) names.push(obj.name);
    return names;
}, []).join('\n');

Not bad. It's not as elegant as the Ruby code, but it's not "inside out" the way the Python code is.
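For comparison, the same accumulate-as-you-go idea can be sketched in Python with functools.reduce. This is only an illustration: the Repository class below is a hypothetical stand-in for the repository object used in the snippets above, and collect_names is a name invented for this sketch.

```python
from functools import reduce
from types import SimpleNamespace


class Repository:
    """Hypothetical stand-in: retrieve() returns a record, or None if the id is unknown."""

    def __init__(self, records):
        self._records = records

    def retrieve(self, id_):
        return self._records.get(id_)


def collect_names(repository, ids):
    def step(names, id_):
        obj = repository.retrieve(id_)
        if obj and obj.name:
            names.append(obj.name)
        return names  # hand the accumulator on to the next step

    # The empty list is the initial accumulator, as in the JavaScript version.
    return '\n'.join(reduce(step, ids, []))


repository = Repository({
    1: SimpleNamespace(name='Ada'),
    3: SimpleNamespace(name='Grace'),
})
result = collect_names(repository, [1, 2, 3])  # id 2 is missing and is skipped
```

Like the JavaScript, this reads top to bottom inside the step function, at the cost of a few more lines than the generator expression.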

Self-caching functions in JavaScript and Python

Earlier I wrote some code which repeatedly calls a function which performs a database query – often the same query. This encouraged me to explore various ways to cache the results of function calls in both Python (to solve my immediate problem) and JavaScript (because I find that language endlessly fascinating).

I played around with Fibonacci, which is well suited to the task: it can be described in just a couple of lines of code yet benefits enormously from caching due to its recursive nature.

JavaScript Fibonacci without caching

function fibonacci(n) {
    if (n <= 1) return n;
    return fibonacci(n - 2) + fibonacci(n - 1);
}
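To give a feel for what caching buys, here is a minimal Python sketch of the same function wrapped in a hand-rolled cache. The memoize decorator and the calls counter are assumptions made for this illustration, not necessarily the approach the full post settles on.

```python
import functools


def memoize(func):
    """Cache results of a one-argument function, keyed by that argument."""
    cache = {}

    @functools.wraps(func)
    def wrapper(n):
        if n not in cache:
            cache[n] = func(n)
        return cache[n]

    return wrapper


calls = 0  # counts how often the function body actually runs


@memoize
def fibonacci(n):
    global calls
    calls += 1
    if n <= 1:
        return n
    # The recursive calls go through the cached wrapper as well.
    return fibonacci(n - 2) + fibonacci(n - 1)


result = fibonacci(30)
```

With the cache in place the body runs once per distinct n, rather than once per node of the recursion tree. In Python 3, functools.lru_cache offers a ready-made decorator that serves the same purpose.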

Python loops can have an else clause?!

I write a lot of Python. I also write a lot of JavaScript. As I switch between the two (often several times in a day) I sometimes find myself trying to do something in one using the syntax of the other. The most common example is joining a list.

# Python

' '.join(['foo', 'bar'])

// JavaScript

['foo', 'bar'].join(' ')

Often — as is the case above — the syntactical differences are minor, but there are times when there's no direct translation.

MooTools, for example, adds the every method to the Array object. This makes it possible to write some rather terse conditional statements.

var numbers = [87, 33, 21, 75];
if (numbers.every(function (n) { return n % 3 == 0; })) {
    window.alert('The numbers are all divisible by 3.');
}

Python lists have no comparable method, so how would one write this in Python?

numbers = [87, 33, 21, 75]
if [n for n in numbers if n % 3 == 0] == numbers:
    print 'The numbers are all divisible by 3.'

This approach involves using a list comprehension to create a list of numbers which are divisible by 3, and comparing this list to numbers. If the lists are equal, everything in numbers is divisible by 3.
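Although lists have no such method, the built-in all() function covers much the same ground when paired with a generator expression. A minimal sketch (the variable name all_divisible is just for illustration):

```python
numbers = [87, 33, 21, 75]

# all() is True only if every value produced by the generator is truthy.
# It stops consuming the generator at the first falsy value.
all_divisible = all(n % 3 == 0 for n in numbers)

if all_divisible:
    print('The numbers are all divisible by 3.')
```

Unlike the list comparison, this builds no intermediate list and gives up as soon as one number fails the test.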

Now for something a bit more challenging

Assume that we have a list of documents, and we want to know which of the documents contain all the terms in a list of search terms.

// (MooTools) JavaScript

var terms = ['python', 'list', 'methods'], matches = [];
documents.each(function (document) {
    if (terms.every(function (term) {
        return document.body.indexOf(term) != -1;
    })) matches.push(document);
});

Here, we could use the list comprehension approach as before.

# Python

terms = ['python', 'list', 'methods']
matches = []
for document in documents:
    if [t for t in terms if document.body.find(t) != -1] == terms:
        matches.append(document)

This is reasonably succinct, but not terribly efficient since each document is checked for every search term. Given that we're not interested in documents that lack even a single search term, it should be possible to rewrite this code so that we don't waste time on lost causes.

It turns out that Python has just the thing for the job: in Python, loop statements may have an else clause!

terms = ['python', 'list', 'methods']
matches = []
for document in documents:
    for term in terms:
        if document.body.find(term) == -1:
            break
    else: # every term was found
        matches.append(document)

From section 4 of the Python tutorial, "More Control Flow Tools":

Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.
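The rule quoted above can be seen in isolation with a small sketch. The function first_missing is a hypothetical helper written for this illustration; it uses the in operator rather than find, which is an equivalent substring test.

```python
def first_missing(terms, body):
    """Return the first term absent from body, or None if every term is present."""
    for term in terms:
        if term not in body:
            missing = term
            break           # leaving via break skips the else clause
    else:
        missing = None      # runs only when the loop was exhausted
    return missing


found_all = first_missing(['python', 'list'], 'python list methods')
gave_up = first_missing(['python', 'ruby', 'list'], 'python list methods')
```

Note that the else clause also runs when the loop body never executes at all, for example when terms is empty.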

I'm looking forward to finding more good spots to make use of else clauses with my Python loops. :D

Optimization via stringification

One way to reduce the number of HTTP requests a page requires is to group (non-content) images into sprites. An even better way is to remove these images from the server altogether; instead include them as encoded strings in your style sheet.
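As a rough sketch of what "encoded strings" can look like in practice, the snippet below base64-encodes image bytes into a data URI and places it in a CSS rule. The helper names, the .icon selector, and the placeholder bytes are assumptions for illustration; real image data would be read from a file.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode('ascii')
    return 'data:{};base64,{}'.format(mime_type, encoded)


def css_rule(selector, data, mime_type='image/png'):
    """Build a CSS rule that embeds the image instead of linking to it."""
    return '{} {{ background-image: url({}); }}'.format(
        selector, to_data_uri(data, mime_type))


# b'abc' is placeholder data, not a real PNG.
rule = css_rule('.icon', b'abc')
```

Base64 output is roughly a third larger than the raw bytes, so this tends to suit small images best.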

Serializing Django model instances

One might expect the following code to serialize a Django model instance:

import simplejson
simplejson.dumps(instance)

Unfortunately, this raises a TypeError, as the instance is not JSON serializable. I don't understand why model instances are not serializable, but I do have a solution: define a serialization method on the instance's model.

def toJSON(self):
    import simplejson
    return simplejson.dumps(dict([(attr, getattr(self, attr)) for attr in [f.name for f in self._meta.fields]]))

Here's the verbose equivalent for those averse to one-liners:

def toJSON(self):
    fields = []
    for field in self._meta.fields:
        fields.append(field.name)

    d = {}
    for attr in fields:
        d[attr] = getattr(self, attr)

    import simplejson
    return simplejson.dumps(d)

_meta.fields is an ordered list of model fields which can be accessed from instances and from the model itself. _meta.fields is one of the few features not covered in Django's excellent documentation.
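The TypeError is less mysterious once you recall that a JSON encoder only knows how to handle dicts, lists, strings, numbers, booleans, and None; it will not guess how to represent an arbitrary object. The sketch below shows this without Django, using the standard library's json module. Field, Meta, and this Musician class are hypothetical stand-ins that merely mimic the shape of _meta.fields.

```python
import json


class Field:
    def __init__(self, name):
        self.name = name


class Meta:
    def __init__(self, names):
        self.fields = [Field(name) for name in names]


class Musician:
    # Stand-in for the _meta object Django attaches to models.
    _meta = Meta(['id', 'first_name', 'instrument'])

    def __init__(self, id, first_name, instrument):
        self.id = id
        self.first_name = first_name
        self.instrument = instrument

    def toJSON(self):
        # Copy only the declared fields into a plain dict, then serialize that.
        return json.dumps(
            {f.name: getattr(self, f.name) for f in self._meta.fields})


m = Musician(1, 'Norah', 'piano')

try:
    json.dumps(m)           # an arbitrary object: the encoder refuses
    raised = False
except TypeError:
    raised = True

serialized = m.toJSON()     # a dict of plain values: fine
```

The same limitation applies to field values: something like a date would still need converting to a string or number before it can be dumped.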

Linkify tweets with regex

Regular expressions are powerful, useful, and -- in my opinion -- lots of fun! Thanks to the prevalence of Twitter, every web developer will be exposed to regex sooner or later: before outputting tweets in HTML, Twitter names and hyperlinks must be wrapped in anchor tags.

Matching @names

Here's the gist: a match will begin with "@" and the at sign must be followed by one or more word (letter / number / underscore) characters. The @name must either appear at the beginning of the tweet or be preceded by a space. This prevents the regular expression from matching "@example" in "me@example.com".
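One way to express that rule with Python's re module is sketched below. The pattern, the function name linkify_mentions, and the twitter.com URL format are assumptions made for this example. Note that \s matches any whitespace, which is slightly broader than a literal space.

```python
import re

# Group 1: start of string or a whitespace character (kept in the output).
# Group 2: one or more word characters following the at sign.
MENTION = re.compile(r'(^|\s)@(\w+)')


def linkify_mentions(tweet):
    """Wrap each @name in an anchor tag, leaving email addresses alone."""
    return MENTION.sub(r'\1<a href="https://twitter.com/\2">@\2</a>', tweet)


linked = linkify_mentions('@alice hi me@example.com and @bob')
```

Because "@example" in the email address is preceded by a letter rather than whitespace, it is left untouched. This sketch does no HTML escaping of the rest of the tweet.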

Get attributes of Django model or instance

What is the best way to get the attributes of a Django model or instance?

from django.db import models

class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name  = models.CharField(max_length=50)
    instrument = models.CharField(max_length=100)

One option is to use __dict__.keys():

>>> m = Musician(first_name='Norah', last_name='Jones', instrument='piano')
>>> print m.__dict__.keys()
['last_name', 'instrument', 'first_name', 'id']

Another option is to use _meta.fields:

>>> print [f.name for f in m._meta.fields]
['id', 'first_name', 'last_name', 'instrument']

This approach also works on models directly:

>>> print [f.name for f in Musician._meta.fields]
['id', 'first_name', 'last_name', 'instrument']

Advantages of using _meta.fields

  • items in returned list are correctly ordered
  • applicable to both models and instances
  • only fields are returned

The fact that only fields are returned is extremely useful. Django appears to add its own attributes to instances in certain circumstances; using _meta.fields prevents these from interfering with one's own code.