Python list internals

List is a powerful and dynamic data structure in Python, you don’t need to worry about data type/sizing while dealing with lists. it’s a heterogenious structure bla bla, all these things you can find in Google search with high string matching rate.

I really wanted to share how list implemented in Cpython and how it manages sizing, how sorting works and how list index works without contigious memory allocation…

consider that I’m talking about 64bit type data processing..

List definition

typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
Py_ssize_t allocated;
} PyListObject;

As the list definition says it is a vector of pointers to list items, so list stores memory id’s of the respective objects which gets inserted into the list.

###List Sizing

As per the theory list is dynamic data structure which means it manages it’s memory with out explicit allocation/deallocation.

Lets define a simple list and see when and where it allocates and deallocates the memory…

>>> lst = []
>>> import sys
>>> lst = []
>>> sys.getsizeof(lst)
72

Idealy for an empty list memory should be zero, but here 72 bytes of memory got allocated for list definition. This is mainly to store the list definition and header information. Same would be 36 bytes for a 32 type data processing.

>>> lst.append(1)
>>> sys.getsizeof(lst)
104
>>> lst.append(2)
>>> sys.getsizeof(lst)
104
>>> lst.append(3)
>>> sys.getsizeof(lst)
104
>>> lst.append(4)
>>> sys.getsizeof(lst)
104
>>> lst.append(5)
>>> sys.getsizeof(lst)
136

Here we appended values to the list, just after adding the first element allocated memory got raised to 104 bytes from 72 bytes and the newly added memory is 32 bytes (104 - 72). So these 32 bytes can accommodate 4 more elements (4 * 8 bytes = 32 bytes), if you see upto appending 4 to lst memory is still 104 bytes. But after adding the fifth element memory got raised to 136 bytes (diff 136 - 104 = 32) and again added another 32 bytes, this means this memory is sufficient upto 8 elements.

In the above example resize happened twise.. 1. 72 –> 104 at length 1 /32 bytes
2. 104 –> 136 at length 5 /32 bytes

Do you believe that the same pattern will be followed throughout the list creation? Answer is no, below is the growth pattern of list object.

0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …

Lets verify how list memory gets allocated for first 100 appends…

import sys
lst, resize_points = [], {}
for i in range(100):
    current_size = sys.getsizeof(lst)
    lst.append(i)
    list_size = sys.getsizeof(lst)
    if not list_size in resize_points:
        resize_points[list_size] = len(lst)
        print "Resized after %s + 1 append new size is %s, Growth is %s bytes" %(str(len(lst) - 1), str(list_size), str((list_size - current_size))) 

--------------------------output-----------------------------
Resized after 0 + 1 append, new size is 104 bytes, Growth is 32 bytes
Resized after 4 + 1 append, new size is 136 bytes, Growth is 32 bytes
Resized after 8 + 1 append, new size is 200 bytes, Growth is 64 bytes
Resized after 16 + 1 append, new size is 272 bytes, Growth is 72 bytes
Resized after 25 + 1 append, new size is 352 bytes, Growth is 80 bytes
Resized after 35 + 1 append, new size is 440 bytes, Growth is 88 bytes
Resized after 46 + 1 append, new size is 536 bytes, Growth is 96 bytes
Resized after 58 + 1 append, new size is 648 bytes, Growth is 112 bytes
Resized after 72 + 1 append, new size is 776 bytes, Growth is 128 bytes
Resized after 88 + 1 append, new size is 920 bytes, Growth is 144 bytes

If you abserve the above output growth patttern is clearly visible, but the growth is not even all over..

Below is the logic which Python follows to decide the growth in list memory..

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize

==Note==: List resizing happens when ever Append/Insert/Pop/Remove operations occured on a list object.

List Operations

Append : Complexity - O(1)

Appends a new element at end of the list, ideally appending at the end needs List resize and adding the new element at the newly added position.

This is how list append has been implemented in Cpython.

static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);
    ...
    if (list_resize(self, n+1) == -1)
       return -1;
    ....
    PyList_SET_ITEM(self, n, v);
    return 0;
} Here it is clear that append is not replacing/removing any existing items, it is just adding new element at the end with O(1) complexity.
**Insert** :

Complexity - O(n) Insert is a bit complicated compared with append in Python, append adds the new item at the end of the list but insert adds item at the given position in the list.

if (where < 0) {
    where += n;
    if (where < 0)
        where = 0;
}
if (where > n)
    where = n;
items = self->ob_item;
for (i = n; --i >= where; )
    items[i+1] = items[i];
Py_INCREF(v);
items[where] = v;

Lets assume the below list and we’ll see how insert happens. My helpful screenshot Lets try inserting.. Value: 5 Position (where to insert): 1

As per the list insert logic before inserting the value at position 1 first the existing values needs to be swapped to the next positions.

for (i = 4; --i >= 1; )
    items[i+1] = items[i]; As per the above loop values starting at position 4 until position 1 needs to be swapped with the next position. I.e: value at position 3  = 4 this needs to be swapped to position 4 and position 2 to position 3...... After swapping list would look like the below image with an empty cell where the item needs to be inserted. 

alt

items[where] = value; // items[1] = 5

alt

It is clear that if you’ve to insert value at position ‘0’ you got to swap all the other elements to next positions before inserting at position ‘0’, so ideally it requires O(n) exchanges.

**Pop** :

Pop(): Complexity - O(1) & pop(i): Complexity - O(n)

Pop works in two different ways, one can pop the last element without passing any index and other way is by passing the index where that value needs to be removed and returned .

>>> l = [0, 1, 2]
>>> l.pop()
2
>>> l.pop(0)
0
>>> l
[1]

If you see the above example by default it removed the last element and when we mentioned the index it worked.

Published: June 24 2015

  • category:
blog comments powered by Disqus