Back to the design phase

“It’s a dangerous business going out your front door.”
– Bilbo Baggins

VintagePost.app is out of control.  It *could* do so many things, and yet right now, it can’t actually do any of them.  Time to step back and stop coding, and reengage with the XD part of my brain.

I’ll probably make an inventory of all the pieces of technology I have “working” at some point, but first I have to channel my inner agilist.

Nearest neighbor for geo data

Given a latitude and longitude, I need to pull the nearest neighbors out of the database and return them in proximity order.  To get this to work, each place is stored with a geohash of its coordinates.

To do the query, we need to compute a bounding box:

swhash = Geohash.encode(latitude-delta, longitude-delta)
nehash = Geohash.encode(latitude+delta, longitude+delta)
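For readers without a Geohash package at hand, here is a minimal pure-Python sketch of the standard geohash encoding. The name geohash_encode and its precision default are my own assumptions for illustration; this is not the library's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(latitude, longitude, precision=12):
    """Encode a (latitude, longitude) pair in degrees as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0         # bits collected for the current character
    bit_count = 0
    use_lon = True   # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_lon else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the interval
            rng[0] = mid
        else:
            bits <<= 1               # lower half of the interval
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

# Corner hashes for a box around a point, using the same delta idea as above
lat, lon, delta = 25.728738, -80.236244, 0.5
sw = geohash_encode(lat - delta, lon - delta)
ne = geohash_encode(lat + delta, lon + delta)
```

Because the bits of the two coordinates are interleaved, any point inside the box sorts between the two corner hashes. The range can also pick up some points that lie outside the box, which is one more reason to sort by true distance afterwards.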

The two Geohash.encode calls produce two hash codes, one to the southwest of the origin and one to the northeast.  A delta of 0.5 is about 138 km, in North America at least.  Geohash codes are great because they can be sorted in lexical order, so my boto query now looks like this:

import boto

sdb = boto.connect_sdb(AWS_ACCESS_KEY, AWS_SECRET_KEY)
domain = sdb.get_domain(PLACES_DOMAIN)
# Build the query from adjacent string literals; a quoted string cannot span lines
query = ("select * from `Places` where "
         "`geohash` >= '%s' and `geohash` <= '%s' LIMIT 10"
         % (swhash, nehash))
places = domain.select(query, max_items=10)

The places object is actually an iterator (many thanks to Chris Moyer for documenting this), but we want a list.  Note that the query is performed lazily, when you pull the first item from the iterator.

results = [place for place in places]

The distance function is very simple:

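import math

def distance(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    R = 6371  # radius of Earth in km
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    # Spherical law of cosines
    c = (math.sin(lat1) * math.sin(lat2) +
         math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    # Clamp so rounding error cannot push acos outside its domain
    return math.acos(max(-1.0, min(1.0, c))) * R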
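The function above uses the spherical law of cosines. A commonly suggested alternative is the haversine formula, which tends to behave better numerically when the two points are very close together. The sketch below is an illustration only; the name haversine_km is my own, and it assumes the same (lat, lon) tuples in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, same value as R above

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    h = (math.sin(dlat / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp h to 1.0 so rounding error cannot break asin/sqrt
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```

It takes the same arguments as distance, so it could be dropped into the sort key unchanged.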

Finally, Python sorts the result:

results = sorted(results, key=lambda place: distance(orig,
    (float(place['latitude']), float(place['longitude']))))

This actually worked on the second try.

Notes: there are better distance functions, but this one works well and is quite fast.  I have not put in the error handling, or the back off code if you get too many or too few results.
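For what it's worth, the missing back-off logic might look something like the sketch below. Everything here is hypothetical: fetch stands in for whatever runs the bounding-box query, and wanted, too_many and max_tries are made-up knobs. It doubles delta when too few places come back and halves it when too many do.

```python
def fetch_with_backoff(fetch, latitude, longitude, wanted=10, too_many=100,
                       delta=0.5, max_tries=6):
    """Grow or shrink the bounding box until the result count looks sane.

    fetch(south, west, north, east) is a hypothetical callable returning
    a list of places inside the box.  Returns (results, final_delta).
    """
    results = []
    for _ in range(max_tries):
        results = fetch(latitude - delta, longitude - delta,
                        latitude + delta, longitude + delta)
        if len(results) < wanted:
            delta *= 2       # too few: widen the box
        elif len(results) > too_many:
            delta /= 2       # too many: narrow the box
        else:
            break            # count is in range, stop adjusting
    return results, delta
```

The max_tries cap is there because the count can bounce between too few and too many; after the last try you simply get whatever the final query returned.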

YAML > JSON > XML

My app gets images from various photo sharing sites.  After checking for copyright correctness, I download images onto my own machine for review and Photoshopping. Then I upload them into my S3/SimpleDB cloud storage.

The metadata I get back from the photo sharing sites is generally in JSON format, but I frequently want to edit this metadata.  Even though JSON is the bee’s knees, it’s not that great to edit by hand, and grepping through it for metadata kinda sucks too.

Enter YAML:

city: ''
country: ''
filtered: 0
geohash: dhwfq5vbb02h
id: '3077164973'
kind: scene
latitude: 25.728738
license: '4'
longitude: -80.236244
owner: 11211909@N00
ownername: corsi photo
state: ''
tags: miami floridawaterbeachskylineshorelineoceanbuildings
title: Miami shoreline
url_z: http://farm4.staticflickr.com/3149/3077164973_a31b4c6a6c_z.jpg
vp_id: flickr.scene.11211909@N00-3077164973

Exit JSON:

{'tags': 'miami floridawaterbeachskylineshorelineoceanbuildings', 'vp_id': 'flickr.scene.11211909@N00-3077164973', 'owner': '11211909@N00', 'id': '3077164973', 'city': '', 'kind': 'scene', 'url_z': 'http://farm4.staticflickr.com/3149/3077164973_a31b4c6a6c_z.jpg', 'license': '4', 'title': 'Miami shoreline', 'country': '', 'longitude': -80.236244, 'state': '', 'geohash': 'dhwfq5vbb02h', 'ownername': 'corsi photo', 'latitude': 25.728738, 'filtered': 0}

PyYAML even has type inferencing, so when I reload the data, it seems to do just what I want.

Sandboxitude

Homebrew and Virtualenv go together like cognac and chocolate (YMMMV: your metaphorical mileage may vary!).

By using MacPorts and being unaware of virtualenv, my upstairs Mac had become an absolute mess.  Finally, I removed all traces of MacPorts from the system.  I think I did, anyway :-)

Now, I am installing and uninstalling recipes, never once sudo’ing.

Even better is virtualenv.  Putting Python code into production is a breeze, since you’re already running in a sandbox, you’ve documented your dependencies, and Python is portable.  What a great world!

Blocked! Unblocked!

It all started with a test…

I wanted to add image renditions to my app server (who doesn’t?), so I thought I would use PIL.

I wrote a unit test:

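def testRenditions(self):
    rv1 = self.app.get('/images/%s/120' % self.test_scene_name)
    assert rv1.mimetype == 'image/jpeg'
    # Read the original in binary mode so the byte counts are comparable
    with open(self.test_scene_path, 'rb') as f:
        original = f.read()
    # The rendition must be non-empty and smaller than the original
    assert 0 < len(rv1.data) < len(original)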

And then I tried to get the test to pass by implementing the server method.  Renditions are trivial in PIL, but getting PIL to work on my Mac, not so much…

I vanished down the rabbit hole of problems with Lion, Xcode, PIL, virtualenv, jpeglib, arghh.  Lots and lots of people have had trouble getting this to work, but seemingly they are all smarter / luckier than me, as most people eventually got it to work.  Every single time, the problem boiled down to my inability to import the _imaging module:

Symbol not found: _jpeg_resync_to_restart

There is a mysterious binary incompatibility that causes the _imaging module (which builds fine) to not load. Even more maddening, this worked on my downstairs computer, and I cannot spot a difference. It also worked great on the server, but I really didn’t want to have untested methods, and I am (so far) committed to running the tests on my local server first.

In the end, I have given up, theorizing that someday someone awesome like the Pillow maintainers will fix the problem.

The solution: use ImageMagick and two lines of Python:

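# Needs "import subprocess"; src holds the original image bytes
p = subprocess.Popen(["convert", "-size", dimension, "-resize", dimension, "-", "-"],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# "-" "-" tells convert to read from stdin and write to stdout
dst = p.communicate(src)[0]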

This might actually be a better solution, but at any rate I’m unblocked.
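A slightly more defensive version of the same idea is sketched below. It assumes ImageMagick's convert is on the PATH and that dimension is a geometry string such as "120x120"; the function names convert_command and make_rendition are made up for illustration.

```python
import subprocess

def convert_command(dimension):
    """Build the ImageMagick argument list; "-" "-" means stdin to stdout."""
    return ["convert", "-size", dimension, "-resize", dimension, "-", "-"]

def make_rendition(src, dimension):
    """Pipe image bytes through convert and return the resized bytes."""
    p = subprocess.Popen(convert_command(dimension),
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    dst, err = p.communicate(src)
    if p.returncode != 0:
        # Surface ImageMagick's own message instead of returning empty data
        raise RuntimeError("convert failed: %s" % err.decode("utf-8", "replace"))
    return dst
```

Checking the return code means a broken or missing image shows up as an exception in the server method rather than as a zero-byte response.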

In a very bad bout of mismanagement, I’ve spent more time on this single issue than literally any other aspect of the project.

PyPI server down, that’s bad!

My morning’s hacking has been a bit slowed by PyPI being down.  A little research leads me to conclude:

  1. this is an ongoing problem
  2. you can’t even think about using the public PyPI as part of your automated build process

Thankfully, there are mirrors in the form of [b|c|d].pypi.python.org, but the main site itself cannot fail over.  Anyway, this excellent blog post http://jacobian.org/writing/when-pypi-goes-down/ from Jacob Kaplan-Moss tells you all about it.  I was not able to get the --use-mirrors option to actually work, but using an explicitly named mirror worked fine.

If you want to have a completely automated build and not rely on the internet, which you do, you’ve got options.  I think you could run your own mirror, or you can use virtualenv (which you are already doing, right?).  Then use pip install -r requirements.dat --index-url …

When I first learned of Maven, this was exactly the house of mirrors that I imagined I would find myself in!

Getting mod_rewrite to work

I finally got my main page to redirect to my blog page.

For posterity, here’s what did the trick:

emacs /etc/httpd/conf/httpd.conf:

<Directory "/var/www/html">
.
.
.
# 
# AllowOverride controls what directives may be placed in
#   .htaccess files.
# It can be "All", "None", or any combination of the keywords:
# Options FileInfo AuthConfig Limit
#
# Change this to All from None (Apache does not allow a comment on the directive line)
 AllowOverride All

Then edit /var/www/html/.htaccess:

RewriteEngine On
# This was non-intuitive: no slash in either part of the rule
RewriteRule ^$ blog

Then restart your server, and voila!