Python Futures and Promises · 26 November, 10:31 PM

Recently, I ran into a situation where I had an application downloading content from a number of URLs and combining the results. My initial implementation was simply retrieving the content serially and then accumulating the results. This worked OK, but turned out to be unacceptably slow, so I started researching alternatives.

What I wanted was to parallelize the work in a way that was syntactically clean. Ideally, I'd issue a bunch of calls and then come back to collect the results at some future point. This kind of thing is sometimes described as a future (or sometimes, a "promise"). The essence is that you make a function call which is asynchronous, and at some later point you go ask the resulting future object for its result. In the Python world, this is a bit confusing because Python has a __future__ module, which is unrelated: it switches on upcoming language features. Don't be confused. The Python Cookbook has an implementation for Futures in the parallel programming sense and it turns out to be easy to use. Here's an example from my stub program:

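import httplib2
# Future is the class from the Python Cookbook recipe; import it from wherever you saved it.

def netcall():
  # download some document and return the response headers
  url = 'http://www.example.com'
  http = httplib2.Http()
  result = http.request(url, "GET")  # returns a (response, content) tuple
  return result[0]

futures = []
results = []
for i in range(0, 15):
  a = Future(netcall)
  futures.append(a)
isRunning = True
while isRunning:
  isRunning = False
  # iterate over a copy, because finished futures are removed from the list;
  # removing from the list being iterated would skip the next entry
  for future in futures[:]:
    if future.isDone():
      results.append(future()['status'])
      futures.remove(future)
    else:
      isRunning = True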
# at this point, results contains the HTTP status codes of all our HTTP requests

The polling code is a bit cumbersome, but it was the best I could think of at the moment; I'm guessing that it could be made nicer with a few more brain cells working. In any case, it's boilerplate code: write it once in a routine that manages your future calls.
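
For comparison, here is a minimal sketch of the same collect-as-they-finish pattern using concurrent.futures from the standard library (Python 3.2 and later) instead of the cookbook Future. The netcall below is a stand-in that sleeps rather than hitting the network, and collect_statuses and the worker count are names picked for illustration, not part of the original program.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def netcall(n):
    # Hypothetical stand-in for the download: sleep briefly, then
    # return a dict shaped like the response headers used above.
    time.sleep(0.05)
    return {'status': '200', 'request': n}

def collect_statuses(count=15, workers=5):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(netcall, i) for i in range(count)]
        # as_completed yields each future as soon as it finishes,
        # so no hand-written polling loop is needed.
        for future in as_completed(futures):
            results.append(future.result()['status'])
    return results

if __name__ == '__main__':
    print(collect_statuses())
```

The results arrive in completion order, which is what the polling loop above was approximating by hand.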

— Gordon Weakliem

---

Comment

  1. If you’re using Python 2.5 or more recent, why not use Queues and Threads?

    — masklinn · Nov 27, 02:53 AM · #
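
A rough sketch of what the queues-and-threads approach might look like, written for Python 3 (where the Queue module is named queue). The fetch function is a hypothetical stand-in for the real download, and run, worker, and the thread count are illustrative names and values only.

```python
import queue
import threading

def fetch(url):
    # Hypothetical stand-in for the real download.
    return {'url': url, 'status': '200'}

def worker(tasks, results):
    # Pull URLs off the task queue until the None sentinel arrives.
    while True:
        url = tasks.get()
        if url is None:
            break
        results.put(fetch(url)['status'])

def run(urls, num_threads=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    for t in threads:
        t.join()
    return [results.get() for _ in urls]
```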

  2. For the polling loop, why not do the loop with “while futures:” and pop things off the list each time, then put it back at the front if it’s not done?

    Obviously, that would be more efficient if you use the collections.deque object in the standard library.

    — Carl · Nov 27, 03:48 AM · #
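
One way this suggestion could be sketched, assuming Python 3 and using concurrent.futures futures (done() and result()) in place of the cookbook class's isDone() and call syntax. The netcall stand-in, poll, and demo are hypothetical names; the short sleep is an extra, added so the loop does not spin at full CPU.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def netcall(n):
    # Hypothetical stand-in for the download.
    time.sleep(0.01 * n)
    return {'status': '200'}

def poll(futures):
    pending = deque(futures)
    results = []
    while pending:
        future = pending.popleft()
        if future.done():
            results.append(future.result()['status'])
        else:
            pending.append(future)  # not ready yet: requeue at the back
            time.sleep(0.01)        # yield the CPU between checks
    return results

def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return poll([pool.submit(netcall, i) for i in range(5)])
```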

  3. In VisualWorks Smalltalk, when you ask for the value of a promise (i.e. future()), the thread blocks until the value isDone(), then resumes.
    I.e. the above polling wouldn’t be necessary, unless you want them ordered by earliest received (can still do that though)
    Are you sure the same isn’t possible in Python, as in, what happens if you call future() before it isDone()?

    — Henry · Nov 27, 04:05 AM · #
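
To illustrate the blocking style described here: in the standard library's concurrent.futures (Python 3.2 and later, not the cookbook class used in the article), result() waits until the value is available, so the futures can simply be read in submission order. The netcall stand-in and collect_in_order are hypothetical names for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def netcall(n):
    # Hypothetical stand-in for the download.
    time.sleep(0.02)
    return {'status': '200', 'request': n}

def collect_in_order(count=5):
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(netcall, i) for i in range(count)]
        # result() blocks until that particular call has finished,
        # so no explicit polling is required.
        return [f.result()['request'] for f in futures]
```

The trade-off is that results come back in the order the calls were issued rather than the order they completed.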

  4. Have you seen:

    http://docs.python.org/dev/library/multiprocessing.html

    — Mike Thompson · Nov 27, 05:02 AM · #

  5. Artificially keeping the CPU 100% busy with constant polling is impolite; please insert a time.sleep(0.01) in the poll loop.

    Marius Gedminas · Nov 27, 11:49 AM · #

  6. masklinn – I didn’t know about that, I’ll have to look into it. Though I’d rather not have to muck with threads directly. In .NET you’d use ThreadPool.QueueUserWorkItem, which is more what I’m after.
    Carl – that’s probably a cleaner approach, true.
    Henry – this Future implementation isn’t part of a Python package, it’s just how the cookbook example happened to be written. I’d rather not poll, but I’m just using the example as provided.
    Mike – I have not, but I spent time looking at the options provided in http://wiki.python.org/moin/ParallelProcessing
    Marius – thanks for the tip, I hadn’t observed the CPU getting pegged but I was probably just lucky.

    Gordon Weakliem · Nov 27, 08:51 PM · #
