January / February hiatus bearing fruit

I find it devilishly difficult to simplify my own ideas. Other people’s ideas, I whittle them down to their essence in no time. I guess this is what Lennon/McCartney and Rodgers/Hart were all about, but I just have me!

As I regroup from the dead end I created for myself, I am asking:

  • what is super popular in the iPhone-o-sphere that I can cooperate with?
  • how do I eliminate just about all of the processing steps in my design?
  • how can I get this thing designed and coded in the 4 hours / week I seem to have available?

I think I’ve got a plan, but I’m staying in stealth mode for a little bit more. Stay tuned!

In the meantime, I’ll probably start blogging about work a bit, since there are some pretty cool things happening that I can share.

Give blood – Learn Parallelism

A couple weeks ago, my pal Bob Treitman got the bloodmobile from Mass General Hospital to come to Adobe at Waltham.  It was a great experience, thanks Bob!  Lots of people showed up, and even people from neighboring offices came down.

I was amazed by how well organized it was, and how capable the staff was.

The bloodmobile also presents itself as a lovely example of how to build a robust, parallel processing system.

  1. They schedule visitors in 15 minute increments, but if more or fewer show up, this is not a problem because…
  2. They can wait inside, and overflow outside
  3. They give you privacy while they get your data, check your vitals, and ask you personal questions
  4. If need be, it’s back to the waiting area, although in my case they put me on the couch right away
  5. My blood sugar was a little low, so they sent me to the …
  6. Fridge and then back again
  7. When I was done, I got a card and felt great about the whole thing!
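One way to read the analogy in code, as a rough sketch only: the waiting area is a queue that absorbs uneven arrivals, and each couch is a worker that pulls the next donor when it is free. The names here (`run_bloodmobile`, `couches`, the donor strings) are made up for illustration.

```python
import queue
import threading

def run_bloodmobile(donors, couches=3):
    """Process donors with a fixed number of parallel 'couches'."""
    waiting_area = queue.Queue()   # buffer: early or late arrivals just wait here
    cards = []
    lock = threading.Lock()        # protects the shared list of cards

    def staff():
        while True:
            donor = waiting_area.get()
            if donor is None:      # sentinel: the day is over for this couch
                break
            with lock:
                cards.append("card for " + donor)

    workers = [threading.Thread(target=staff) for _ in range(couches)]
    for w in workers:
        w.start()
    for donor in donors:           # arrivals can be bursty; the queue does not care
        waiting_area.put(donor)
    for _ in workers:              # one sentinel per couch
        waiting_area.put(None)
    for w in workers:
        w.join()
    return cards

cards = run_bloodmobile(["Ann", "Bob", "Cy", "Di", "Ed"])
print(len(cards))
```

The order of the cards depends on thread scheduling, which is rather the point: nobody downstream needs the donors to arrive in order.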

I believe that if you can get more than 30 people to show, they will probably come to you.  Visit their site for more information!

 

Back to the design phase

“It’s a dangerous business going out your front door.”
– Bilbo Baggins

VintagePost.app is out of control.  It *could* do so many things, and yet right now, it can’t actually do any of them.  Time to step back and stop coding, and reengage with the XD part of my brain.

I’ll probably make an inventory of all the pieces of technology I have “working” at some point, but first I have to channel my inner agilist.

Nearest neighbor for geo data

Given a latitude and longitude, I need to pull the nearest neighbors out of the database and return them in proximity order.  To get this to work I use geohashes, a SimpleDB range query through boto, and a distance sort in Python.

To do the query, we need to compute a bounding box:

swhash = Geohash.encode(latitude-delta, longitude-delta)
nehash = Geohash.encode(latitude+delta, longitude+delta)

This produces two hash codes, one to the southwest of orig and one to the northeast.  A delta of 0.5 is about 138 km, in North America at least.  Geohash codes are great because they can be sorted in lexical order, so my boto query now looks like this:

import boto

sdb = boto.connect_sdb(AWS_ACCESS_KEY, AWS_SECRET_KEY)
domain = sdb.get_domain(PLACES_DOMAIN)
# Adjacent string literals are joined, so the query stays a single string.
places = domain.select(
    "select * from `Places` where "
    "`geohash` >= '%s' and `geohash` <= '%s' limit 10"
    % (swhash, nehash), max_items=10)

The places object is actually an iterator (many thanks to Chris Moyer for documenting this), but we want a list.  Note that the query is performed lazily, when you pull the first item from the iterator.

results = list(places)
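As an aside on why the range query works: a geohash interleaves longitude and latitude bits, so any point inside the bounding box hashes to a string that sorts between the two corner hashes. Here is a minimal pure-Python sketch of the standard encoding. `geohash_encode` is a stand-in written for illustration, not the library call used above, and the coordinates are just a sample point in Miami.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet, already in ASCII order

def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair (degrees) as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)

lat, lon, delta = 25.728738, -80.236244, 0.5
swhash = geohash_encode(lat - delta, lon - delta)
nehash = geohash_encode(lat + delta, lon + delta)
here = geohash_encode(lat, lon)
print(swhash <= here <= nehash)  # True: the point falls inside the string range
```

The reverse does not hold: some strings in that range belong to points outside the box, which is one reason the results still get sorted by real distance afterwards.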

The distance function is very simple:

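import math

def distance(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    R = 6371  # radius of Earth in km
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    # Spherical law of cosines; clamp to [-1, 1] so rounding cannot break acos.
    c = (math.sin(lat1) * math.sin(lat2) +
         math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.acos(max(-1.0, min(1.0, c))) * R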

Finally, Python sorts the results:

results = sorted(results, key=lambda place: distance(orig,
    (float(place['latitude']), float(place['longitude']))))

This actually worked on the second try.

Notes: there are better distance functions, but this one works well and is quite fast.  I have not put in the error handling, or the back off code if you get too many or too few results.

YAML > JSON > XML

My app gets images from various photo sharing sites.  After checking for copyright correctness, I download images onto my own machine for review and Photoshopping. Then I upload them into my S3/SimpleDB cloud storage.

The metadata I get back from the photo sharing sites is generally in JSON format, but I frequently want to edit this metadata. Even though JSON is the bee’s knees, it’s not that great to edit by hand, and grepping for metadata kinda sucks too.

Enter YAML:

city: ''
country: ''
filtered: 0
geohash: dhwfq5vbb02h
id: '3077164973'
kind: scene
latitude: 25.728738
license: '4'
longitude: -80.236244
owner: 11211909@N00
ownername: corsi photo
state: ''
tags: miami floridawaterbeachskylineshorelineoceanbuildings
title: Miami shoreline
url_z: http://farm4.staticflickr.com/3149/3077164973_a31b4c6a6c_z.jpg
vp_id: flickr.scene.11211909@N00-3077164973

Exit JSON:

{'tags': 'miami floridawaterbeachskylineshorelineoceanbuildings', 'vp_id': 'flickr.scene.11211909@N00-3077164973', 'owner': '11211909@N00', 'id': '3077164973', 'city': '', 'kind': 'scene', 'url_z': 'http://farm4.staticflickr.com/3149/3077164973_a31b4c6a6c_z.jpg', 'license': '4', 'title': 'Miami shoreline', 'country': '', 'longitude': -80.236244, 'state': '', 'geohash': 'dhwfq5vbb02h', 'ownername': 'corsi photo', 'latitude': 25.728738, 'filtered': 0}

PyYAML even has type inference, so when I reload the data, it seems to do just what I want.
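A small round trip shows the idea. This is a sketch that assumes PyYAML is installed, and the record is a trimmed-down version of the one above.

```python
import json
import yaml  # PyYAML, a third-party package

# A trimmed-down record, as it might arrive from a photo sharing site.
raw = ('{"id": "3077164973", "kind": "scene", "latitude": 25.728738, '
       '"longitude": -80.236244, "filtered": 0, "city": ""}')
record = json.loads(raw)

# Block style gives one "key: value" per line, which is easy to edit and grep.
text = yaml.safe_dump(record, default_flow_style=False)
print(text)

# Reloading restores the original types: the quoted id stays a string,
# while latitude comes back as a float and filtered as an int.
reloaded = yaml.safe_load(text)
print(reloaded == record)  # True
```

The quoting matters: PyYAML writes the numeric-looking id as '3077164973' so that it does not turn into an integer on the way back in.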

Sandboxitude

Homebrew and Virtualenv go together like cognac and chocolate (YMMMV: your metaphorical mileage may vary!).

Because I was using MacPorts and was unaware of virtualenv, my upstairs Mac had become an absolute mess.  Finally, I removed all traces of MacPorts from the system (I think I did, anyway :-) ).

Now, I am installing and uninstalling recipes, never once sudo’ing.

Even better is virtualenv.  Putting Python code into production is a breeze, since you’re already running in a sandbox, you’ve documented your dependencies, and Python is portable.  What a great world!

Blocked! Unblocked!

It all started with a test…

I wanted to add image renditions to my app server (who doesn’t?), so I thought I would use PIL.

I wrote a unit test:

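def testRenditions(self):
    # Ask the app server for the 120 rendition of the test scene.
    rv1 = self.app.get('/images/%s/120' % self.test_scene_name)
    assert rv1.mimetype == 'image/jpeg'
    # The rendition should be non-empty and smaller than the original file.
    with open(self.test_scene_path, 'rb') as f:
        assert 0 < len(rv1.data) < len(f.read())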

And then I tried to get the test to pass by implementing the server method.  Renditions are trivial in PIL, but getting PIL to work on my Mac, not so much…

I vanished down the rabbit hole of problems with Lion, Xcode, PIL, virtualenv, jpeglib, arghh.  Lots and lots of people have had trouble getting this to work, but seemingly they are all smarter / luckier than me, as most people eventually got it to work.  Every single time, the problem boiled down to my inability to import the _imaging module:

Symbol not found: _jpeg_resync_to_restart

There is a mysterious binary incompatibility that causes the _imaging module (which builds fine) to not load. Even more maddening, this worked on my downstairs computer, and I cannot spot a difference. It also worked great on the server, but I really didn’t want to have untested methods, and I am (so far) committed to running the tests on my local server first.

In the end, I have given up, theorizing that someday someone awesome like the Pillow maintainers will fix the problem.

The solution: use ImageMagick and two lines of Python:

p = subprocess.Popen(["convert", "-size", dimension, "-resize", dimension, "-", "-"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
dst = p.communicate(src)[0]

This might actually be a better solution, but at any rate I’m unblocked.
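If the two lines grow into a helper, a sketch might look like this. It assumes ImageMagick's `convert` is on the PATH and reuses the same arguments as above; `rendition_command` and `make_rendition` are names invented for illustration, and the only new behavior is that a failed conversion raises instead of silently returning empty bytes.

```python
import subprocess

def rendition_command(dimension):
    """Build the ImageMagick argv used above; the first '-' is stdin, the second stdout."""
    return ["convert", "-size", dimension, "-resize", dimension, "-", "-"]

def make_rendition(src, dimension):
    """Pipe image bytes through ImageMagick and return the resized bytes."""
    proc = subprocess.run(rendition_command(dimension), input=src,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        # Surface ImageMagick's own error message to the caller.
        raise RuntimeError("convert failed: " +
                           proc.stderr.decode("utf-8", "replace"))
    return proc.stdout

# Usage, assuming scene.jpg exists and convert is installed:
#   with open("scene.jpg", "rb") as f:
#       small = make_rendition(f.read(), "120")
```

Keeping the command in its own function means the argument list can be unit tested without ImageMagick installed.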

In a very bad bout of mismanagement, I’ve spent more time on this single issue than literally any other aspect of the project.