
Python Memory Views

written by matt, on Mar 9, 2010 8:56:00 AM.

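There's a cool new built-in type added in Python 3.0 (and backported to 2.7) called memoryview. A memoryview basically represents a subsection of another object's data without copying it, and the view is mutable whenever the underlying object is writable. Here are some examples of why they are awesome: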
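To make "mutable subsection" concrete first, here is a minimal sketch written for Python 3 (the variable names are only illustrative). Slicing a memoryview of a bytearray gives a window onto the same memory, so writing through the window changes the original object:

```python
data = bytearray(b"hello world")
view = memoryview(data)

# Slicing a memoryview does not copy; it yields another view of the same bytes.
window = view[6:]

# Writing through the slice mutates the underlying bytearray in place.
window[:] = b"WORLD"

print(data)  # bytearray(b'hello WORLD')
```

The replacement has to be the same length as the window, because a memoryview can't resize the object it looks at.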

A naive way to write a large string of bytes to a socket:

sent = 0
while sent < len(message):
    sent += sock.send(message[sent:])

This makes many unnecessary copies of message: every message[sent:] slice copies all of the bytes that haven't been sent yet. You can imagine how badly it performs if message is 1 GB long and the client only receives 1 KB at a time.
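To put a number on that, here is a rough model rather than a benchmark. The helper bytes_copied_by_slicing is hypothetical. It assumes every send() accepts exactly chunk bytes and that every slice copies the whole unsent tail (some implementations may skip the copy for the very first, full-length slice):

```python
def bytes_copied_by_slicing(total, chunk):
    """Estimate bytes copied by message[sent:] when each send() takes `chunk` bytes."""
    copied = 0
    sent = 0
    while sent < total:
        copied += total - sent            # the slice copies everything still unsent
        sent += min(chunk, total - sent)  # assumed: send() accepts one full chunk
    return copied


print(bytes_copied_by_slicing(4096, 1024))  # 10240
```

Under the same assumptions, a 1 GiB message sent 1 KiB at a time works out to 2**49 + 2**29 bytes copied, which is roughly 512 TiB of copying to deliver 1 GiB.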

You can improve this by using Python's lesser-known buffer built-in (Python 2 only; it was removed in Python 3), which is like a memoryview except it's read-only:

offset = 0
buf = buffer(message, 0)
while len(buf):
    offset += sock.send(buf)  # total number of bytes sent so far
    buf = buffer(message, offset)  # view of the unsent tail, no copy

This doesn't make any copies of message.
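Since buffer no longer exists in Python 3, the same no-copy loop can be written there with a memoryview. This is a sketch, not the only way to do it: send_all is a hypothetical helper name, and SlowSocket is a made-up stand-in for a real socket that only accepts 4 bytes per call, so the example runs without a network:

```python
def send_all(sock, message):
    """Send all of `message` through `sock` without copying the unsent tail."""
    view = memoryview(message)
    while len(view):
        sent = sock.send(view)
        view = view[sent:]  # a new view onto the same bytes, not a copy


class SlowSocket:
    """Hypothetical stand-in for a socket that accepts at most 4 bytes per send()."""

    def __init__(self):
        self.received = bytearray()

    def send(self, data):
        chunk = bytes(data[:4])
        self.received += chunk
        return len(chunk)


sock = SlowSocket()
send_all(sock, b"a large string of bytes")
print(bytes(sock.received))  # b'a large string of bytes'
```

For plain sends, real socket objects also have a sendall method that loops until everything is written, so a hand-rolled loop like this mostly matters when you need control between the partial sends.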

But what if I want the equivalent for efficiently receiving data from a socket into a single buffer without having to allocate a bunch of small strings in the loop? Naive way:

msgparts = []
while True:
    chunk = sock.recv(1024)
    if not chunk:  # an empty string means the peer closed the connection
        break
    msgparts.append(chunk)
message = ''.join(msgparts)

Awesome memoryview way:

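view = memoryview(bytearray(bufsize))  # bufsize: expected message length
while len(view):
    nbytes = sock.recv_into(view, min(len(view), 1024))
    if not nbytes:  # peer closed the connection early
        break
    view = view[nbytes:]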

Slicing view just returns a new memoryview onto the same bytearray, so there aren't any unnecessary string allocations and the received data lands directly in a single buffer.
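Here is a slightly fuller sketch of the receive loop for Python 3, wrapped in a function that keeps a reference to the bytearray so the caller can actually get at the data afterwards. The names recv_into_buffer and FakeSocket are hypothetical. FakeSocket is a stand-in that hands out at most 3 bytes per call so the example runs without a network, and the sketch assumes you know an upper bound on the message size up front:

```python
def recv_into_buffer(sock, bufsize, chunk_size=1024):
    """Fill one preallocated bytearray; return it and the number of bytes received."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    while len(view):
        # Never ask for more bytes than the remaining view can hold.
        nbytes = sock.recv_into(view, min(len(view), chunk_size))
        if nbytes == 0:  # peer closed before the buffer was filled
            break
        view = view[nbytes:]  # advance the window; no bytes are copied
    return buf, bufsize - len(view)


class FakeSocket:
    """Hypothetical stand-in that delivers `data` at most 3 bytes per call."""

    def __init__(self, data):
        self.data = data

    def recv_into(self, buffer, nbytes):
        chunk = self.data[:min(nbytes, 3)]
        self.data = self.data[len(chunk):]
        buffer[:len(chunk)] = chunk
        return len(chunk)


buf, filled = recv_into_buffer(FakeSocket(b"hello world"), 11)
print(buf, filled)  # bytearray(b'hello world') 11
```

Returning the fill count alongside the buffer covers the case where the peer closes early and the tail of the bytearray is still zero bytes.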

I think mutable views of data are one of the major missing pieces of Python. A language shouldn't force you to make copies of data, since copying can be very inefficient. memoryview is a good first step. Next, I'd like to see more built-in objects support operations involving views. For instance, a string operation like string.vsplit that works the same as string.split except that, instead of returning a list of new immutable strings, it returns a list of memoryviews, so the data is never duplicated.
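No such vsplit exists in the standard library, but the idea can be sketched in a few lines of Python 3. This is a hypothetical free function, not a real method. Because memoryview works on bytes-like objects rather than text strings, the sketch assumes data is a bytes or bytearray and only handles an explicit, non-empty separator:

```python
def vsplit(data, sep):
    """Split bytes-like `data` on `sep`, returning memoryviews instead of copies."""
    if not sep:
        raise ValueError("empty separator")
    view = memoryview(data)
    parts = []
    start = 0
    while True:
        end = data.find(sep, start)
        if end == -1:
            parts.append(view[start:])  # last piece: everything after the final sep
            return parts
        parts.append(view[start:end])
        start = end + len(sep)


record = bytearray(b"a,bb,ccc")
fields = vsplit(record, b",")
print([bytes(f) for f in fields])  # [b'a', b'bb', b'ccc']

# The pieces are views, so writing to one changes the original buffer.
fields[1][:] = b"XX"
print(record)  # bytearray(b'a,XX,ccc')
```

Converting a piece with bytes() makes a copy, so the saving only holds as long as the pieces are used as views.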