Python Memory Views
There's a cool new built-in type added in Python 3.0 (and backported to 2.7), called memoryview. A memoryview exposes a subsection of another object's data without copying it, and you can write through the view when the underlying object is mutable. Here are some examples of why they are awesome:
A naive way to write a large string of bytes to a socket:
sent = 0
while sent < len(message):
    sent += sock.send(message[sent:])
This makes many unnecessary copies of message, because every pass through the loop slices off a fresh copy of everything not yet sent. You can imagine how badly it performs if message is 1 GB long and the client only receives 1 KB at a time.
You can improve this by using Python 2's lesser-known buffer built-in (removed in Python 3), which is like a memoryview except it's read-only:
buf = buffer(message, 0)
while len(buf):
    # new offset = bytes already sent + bytes accepted by this call
    buf = buffer(message, len(message) - len(buf) + sock.send(buf))
This doesn't make any copies of message.
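Since buffer only exists in Python 2, here is a rough sketch of the same loop using a memoryview, which should work on both 2.7 and 3.x. The helper name send_all is made up for illustration, and it assumes message is a bytes-like object:

```python
import socket

def send_all(sock, message):
    """Send all of `message` without re-copying the unsent tail.

    `send_all` is a hypothetical helper name, not part of the socket API.
    """
    view = memoryview(message)
    while len(view):
        # send() returns how many bytes were accepted. Slicing the view
        # creates a new small view object, not a copy of the data.
        view = view[sock.send(view):]
```

In real code sock.sendall(message) already does this looping for you; the sketch is only meant to make the copy-free slicing visible.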
But what if I want the equivalent for efficiently receiving data from a socket into a single buffer without having to allocate a bunch of small strings in the loop? Naive way:
msgparts = []
while True:
    chunk = sock.recv(1024)
    if len(chunk): msgparts.append(chunk)
    else: break
message = b''.join(msgparts)
Awesome memoryview way:
buf = bytearray(bufsize)
view = memoryview(buf)
while len(view):
    nbytes = sock.recv_into(view, min(len(view), 1024))
    if not nbytes: break  # peer closed the connection
    view = view[nbytes:]
Slicing view just returns a new memoryview, so there aren't any unnecessary string allocations.
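To make that concrete, here is a small sketch without sockets showing that a sliced view still points at the original bytearray. The variable names are only for illustration:

```python
buf = bytearray(b"hello world")
view = memoryview(buf)

tail = view[6:]      # a new memoryview over the same bytes, no copy made
tail[:] = b"WORLD"   # same-length assignment writes straight through to buf

print(buf)           # bytearray(b'hello WORLD')
```

The write through tail shows up in buf because both views share the same underlying memory.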
I think mutable views of data are one of the major missing pieces of Python. A language shouldn't force you to make copies of data, as copying can be very inefficient. memoryview is a good first step. Next I'd like to see more built-in objects support operations involving views. For instance, a string operation like string.vsplit that is the same as string.split, except that instead of returning a list of new immutable strings, it returns a list of memoryviews so that the data is never duplicated.
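As a rough idea of what that could look like, here is a sketch of vsplit written as a plain function. No such method exists on Python's built-in types; this version assumes data is a bytes or bytearray object (a Python 3 str does not support memoryview) and only handles an explicit, non-empty separator:

```python
def vsplit(data, sep):
    """Split `data` on `sep`, returning memoryview slices instead of copies.

    Hypothetical helper sketching the vsplit idea; assumes `data` is
    bytes or bytearray and `sep` is a non-empty bytes separator.
    """
    if not sep:
        raise ValueError("empty separator")
    view = memoryview(data)
    parts = []
    start = 0
    while True:
        end = data.find(sep, start)
        if end == -1:
            # No more separators: the rest of the data is the last part.
            parts.append(view[start:])
            return parts
        parts.append(view[start:end])
        start = end + len(sep)

line = bytearray(b"GET /index.html HTTP/1.1")
method, path, version = vsplit(line, b" ")
```

Each element is a view into line, so nothing is duplicated until you explicitly ask for a copy, for example with path.tobytes().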
