======================================================================
   Fast I/O Buffers for Network Access Without Copy/String Creation
======================================================================

:Updated Revision: 46090, 46193, 46211, 46281, 46284, 46295, 46311
:Author: Martin Blais
:Abstract:

   A buffer class similar to Java NIO ByteBuffer that is meant to be used for
   network I/O that avoids creating temporary strings and from which binary
   data can be directly decoded.

.. contents::
..
    1  TODO
      1.1  Question
      1.2  Other Features
      1.3  Ideas

TODO
====

* Should getbyterel() implement getting negative offsets from the end of the
  window rather than move backwards from the current position?  I think it
  should.

* Implement the entire string protocol, since that will be a fast path to
  contents (no dict lookup is necessary).

* Also, getbyterel() should be implemented using the slicing operators, but
  you need to check if they allow returning an int rather than a string.

* Make it possible to read from a file directly into a hotbuf

  - Implement fromfile() and tofile() on the hotbuf object

  - Check if there is a tp_file protocol and, if there is, use that instead to
    provide the interface.

  See file_read() from fileobject.c: you need to make it file_read_guts() and
  create a file_read() that allocates a string like before, but also add a
  file_readbuf() that reads directly into a writable buffer.  Note that
  readinto() also exists, although it is deprecated.  Search the ML for
  readinto()'s story.

* Fix the changes to struct and socket with the right types:

* Implement absolute get/put methods (getabs(n), putabs(n, data))

* The hot buffer can unpack in C; similarly, implement pack() in C.

* Implement some of the other sequence methods.

* Add methods to read and write ints, longs, floats and doubles directly
  from/to the buffer contents, without using a temporary string: getlong(),
  putlong(), etc.

* Documentation: write a small PEP about this, when all is said and done.

  - hot buffers are really just a fancy string that can change on top of a
    fixed-allocated memory buffer, and should provide the same interface

  - Note for common use cases: the buffer should have at least the size of the
    minimal line/message that you may ever encounter, otherwise you will have
    to write special parsing routines.


Question
--------

* In the buffer protocol, what is the difference between the
  read-only/read-write buffer interfaces and the char-buffer interface?

  In the buffer protocol, there are four functions::

      typedef int (*getreadbufferproc)(PyObject *, int, void **);
      typedef int (*getwritebufferproc)(PyObject *, int, void **);
      typedef int (*getsegcountproc)(PyObject *, int *);
      typedef int (*getcharbufferproc)(PyObject *, int, char **);

  What is the difference between the read-only/read-write and the char-buffer
  functions (apart from the type)?  What was the original intention of this
  interface?

  Also, why is there a Py_ssize_t version of this?


Other Features
--------------

- Consider the implications of adding a hash function

- Maybe implement a resize function

- Should we support weakrefs?

- Add support for the with-statement protocol for guards


Ideas
-----

- (From Bob): Automatically do the compact-and-read when something hits the
  limit of the buffer.  To discriminate between coding errors (i.e. a message
  parsing much more than it should) and hitting the end of the file, different
  exceptions would be needed.

- Write the loop in C, compacting and windowing as you go, calling a function
  to process for each message.
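
The compact-and-read loop from the two ideas above can be sketched in pure
Python.  This is only an illustration under stated assumptions, not the hotbuf
API: a bytearray and a memoryview stand in for the hot buffer, the names
process_messages, handle, capacity and HEADER are hypothetical, and the
framing (a 2-byte big-endian length prefix before each message) is assumed
for the example.  It raises ValueError for a message that cannot fit in the
buffer and EOFError for a stream that ends inside a message, which is one
possible way to keep the two error cases apart.

```python
import io
import struct

# Assumed framing for this sketch: 2-byte big-endian length, then the payload.
HEADER = struct.Struct("!H")


def process_messages(stream, handle, capacity=64):
    """Feed each length-prefixed message in `stream` to `handle`.

    One fixed buffer is reused for every read.  `handle` receives a
    memoryview window that is only valid until it returns.
    """
    buf = bytearray(capacity)   # fixed allocation, never resized
    view = memoryview(buf)
    position = 0                # start of the unparsed data
    limit = 0                   # end of the valid data
    while True:
        # Window: hand out every complete message currently in the buffer.
        while limit - position >= HEADER.size:
            (length,) = HEADER.unpack_from(buf, position)
            if HEADER.size + length > capacity:
                raise ValueError("message larger than the buffer")
            if limit - position < HEADER.size + length:
                break           # incomplete message, more data is needed
            start = position + HEADER.size
            handle(view[start:start + length])  # no copy of the payload
            position = start + length
        # Compact: move the unparsed tail to the front of the buffer.
        remaining = limit - position
        view[:remaining] = view[position:limit]
        position, limit = 0, remaining
        # Read: fill the free space directly, without a temporary string.
        count = stream.readinto(view[limit:])
        if not count:
            if remaining:
                raise EOFError("stream ended inside a message")
            return
        limit += count
```

Any object with a readinto() method can serve as the stream, for example an
io.BytesIO or a binary file.  A handler that wants to keep a message after it
returns has to copy it, e.g. with bytes(window), because the next compaction
overwrites the window.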