Sorting data structures

Sorting data structures is more prevalent when dealing with dictionaries. Lists by definition are ordered in the way they are entered. In the following example we have a dictionary that is indexed by a series of numbers.


data={
        0: {"mac":"0004.9ac8.7680", "vlan":"44",  "int": "Po520", "learned": "dynamic", "pos":1},
        1: {"mac":"002a.6a5f.e701", "vlan":"64",  "int": "Po540", "learned": "dynamic", "pos":2},
        2: {"mac":"0009.7cee.91e0", "vlan":"08",  "int": "Po520", "learned": "dynamic", "pos":3},
        3: {"mac":"0005.73da.9560", "vlan":"55",  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        4: {"mac":"0022.4da3.f82c", "vlan":"15",  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        5: {"mac":"1cdf.0f76.ffa0", "vlan":"18",  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        6: {"mac":"f4cf.e211.217e", "vlan":"18",  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}
for index, row in data.iteritems():
    print row

When you iterate through this dictionary it will be ordered.

user@learnpy.cisco.com:

$ python test.py
{'int': 'Po520', 'mac': '0004.9ac8.7680', 'vlan': '44', 'learned': 'dynamic', 'pos': 1}
{'int': 'Po540', 'mac': '002a.6a5f.e701', 'vlan': '64', 'learned': 'dynamic', 'pos': 2}
{'int': 'Po520', 'mac': '0009.7cee.91e0', 'vlan': '08', 'learned': 'dynamic', 'pos': 3}
{'int': 'Eth4/5', 'mac': '0005.73da.9560', 'vlan': '55', 'learned': 'dynamic', 'pos': 4}
{'int': 'Eth4/1', 'mac': '0022.4da3.f82c', 'vlan': '15', 'learned': 'dynamic', 'pos': 5}
{'int': 'Eth1/2', 'mac': '1cdf.0f76.ffa0', 'vlan': '18', 'learned': 'dynamic', 'pos': 6}
{'int': 'Eth2/2', 'mac': 'f4cf.e211.217e', 'vlan': '18', 'learned': 'dynamic', 'pos': 7}

This is no different ( and doing it more dificult ) than just defining this as a list as follows:


data=[
        {"mac":"0004.9ac8.7680", "vlan":"44",  "int": "Po520", "learned": "dynamic", "pos":1},
        {"mac":"002a.6a5f.e701", "vlan":"64",  "int": "Po540", "learned": "dynamic", "pos":2},
        {"mac":"0009.7cee.91e0", "vlan":"08",  "int": "Po520", "learned": "dynamic", "pos":3},
        {"mac":"0005.73da.9560", "vlan":"55",  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        {"mac":"0022.4da3.f82c", "vlan":"15",  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        {"mac":"1cdf.0f76.ffa0", "vlan":"18",  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        {"mac":"f4cf.e211.217e", "vlan":"18",  "int": "Eth2/2", "learned": "dynamic", "pos":7},
]
for row in data:
    print row

For which the results would be:

user@learnpy.cisco.com:

$ python test.py
{'int': 'Po520', 'mac': '0004.9ac8.7680', 'vlan': '44', 'learned': 'dynamic', 'pos': 1}
{'int': 'Po540', 'mac': '002a.6a5f.e701', 'vlan': '64', 'learned': 'dynamic', 'pos': 2}
{'int': 'Po520', 'mac': '0009.7cee.91e0', 'vlan': '08', 'learned': 'dynamic', 'pos': 3}
{'int': 'Eth4/5', 'mac': '0005.73da.9560', 'vlan': '55', 'learned': 'dynamic', 'pos': 4}
{'int': 'Eth4/1', 'mac': '0022.4da3.f82c', 'vlan': '15', 'learned': 'dynamic', 'pos': 5}
{'int': 'Eth1/2', 'mac': '1cdf.0f76.ffa0', 'vlan': '18', 'learned': 'dynamic', 'pos': 6}
{'int': 'Eth2/2', 'mac': 'f4cf.e211.217e', 'vlan': '18', 'learned': 'dynamic', 'pos': 7}

Now while this is useful, what if you wanted to INDEX the array based on a particular piece of data from the list of devices. How about if you wanted to index the array based on the MAC address of the devices.


data={
        "0004.9ac8.7680": {"mac":"0004.9ac8.7680", "vlan":"44",  "int": "Po520", "learned": "dynamic", "pos":1},
        "002a.6a5f.e701": {"mac":"002a.6a5f.e701", "vlan":"64",  "int": "Po540", "learned": "dynamic", "pos":2},
        "0009.7cee.91e0": {"mac":"0009.7cee.91e0", "vlan":"08",  "int": "Po520", "learned": "dynamic", "pos":3},
        "0005.73da.9560": {"mac":"0005.73da.9560", "vlan":"55",  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        "0022.4da3.f82c": {"mac":"0022.4da3.f82c", "vlan":"15",  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        "1cdf.0f76.ffa0": {"mac":"1cdf.0f76.ffa0", "vlan":"18",  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        "f4cf.e211.217e": {"mac":"f4cf.e211.217e", "vlan":"18",  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}

for index, row in data.items():
    print "For MAC:%s data is: %s" %  (index, row)

That will produce the output of:

user@learnpy.cisco.com:

$ python test.py
For MAC:0004.9ac8.7680 data is: {'int': 'Po520', 'mac': '0004.9ac8.7680', 'vlan': '44', 'learned': 'dynamic', 'pos': 1}
For MAC:0009.7cee.91e0 data is: {'int': 'Po520', 'mac': '0009.7cee.91e0', 'vlan': '08', 'learned': 'dynamic', 'pos': 3}
For MAC:1cdf.0f76.ffa0 data is: {'int': 'Eth1/2', 'mac': '1cdf.0f76.ffa0', 'vlan': '18', 'learned': 'dynamic', 'pos': 6}
For MAC:0022.4da3.f82c data is: {'int': 'Eth4/1', 'mac': '0022.4da3.f82c', 'vlan': '15', 'learned': 'dynamic', 'pos': 5}
For MAC:002a.6a5f.e701 data is: {'int': 'Po540', 'mac': '002a.6a5f.e701', 'vlan': '64', 'learned': 'dynamic', 'pos': 2}
For MAC:0005.73da.9560 data is: {'int': 'Eth4/5', 'mac': '0005.73da.9560', 'vlan': '55', 'learned': 'dynamic', 'pos': 4}
For MAC:f4cf.e211.217e data is: {'int': 'Eth2/2', 'mac': 'f4cf.e211.217e', 'vlan': '18', 'learned': 'dynamic', 'pos': 7}

If you observe the list you will see that the position column isn't following the same order that is defined. This can cause problems with the way your data is output, even when reading from databases if you pass the read into a dictionary.

The reason for this is because dictionaries in Python are by definition not ordered and hence the reason that this happens. Now in normal conditions this might not be that big of a problem as you can reference the information directly.


data={
        "0004.9ac8.7680": {"mac":"0004.9ac8.7680", "vlan":"44",  "int": "Po520", "learned": "dynamic", "pos":1},
        "002a.6a5f.e701": {"mac":"002a.6a5f.e701", "vlan":"64",  "int": "Po540", "learned": "dynamic", "pos":2},
        "0009.7cee.91e0": {"mac":"0009.7cee.91e0", "vlan":"08",  "int": "Po520", "learned": "dynamic", "pos":3},
        "0005.73da.9560": {"mac":"0005.73da.9560", "vlan":"55",  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        "0022.4da3.f82c": {"mac":"0022.4da3.f82c", "vlan":"15",  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        "1cdf.0f76.ffa0": {"mac":"1cdf.0f76.ffa0", "vlan":"18",  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        "f4cf.e211.217e": {"mac":"f4cf.e211.217e", "vlan":"18",  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}

print "For MAC:%s the VLAN is:%s" % ( data["0005.73da.9560"]["mac"], data["0005.73da.9560"]["vlan"])

And the output would be:

user@learnpy.cisco.com:

$ python test.py
For MAC:0005.73da.9560 the VLAN is:55

But it might be that you would wish to order this list in some fashion before displaying. For example if you have a web page that you would wish to sort the items based on a particular column of data. Python does make this possible but will require a little work and more in-depth understanding to get there.

Basic Sorts

Python has the sorted method that makes it possible to sort dictionaries. Let's look at this with a simple dictionary. Let's create a simple dictionary and iterate through it:


mydict = { 'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"}
for key,value in mydict.items():
    print "%s: %s" % ( key, value )

when you execute this code:

user@learnpy.cisco.com:

$ python test.py
color: red
car: ferrari
hp: 540hp
location: Raleigh

You can see that the order is messed up completely. We can then introduce the same code with the sorted method.


mydict = { 'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"}
for key,value in sorted(mydict.items()):
    print "%s: %s" % ( key, value )

when you execute this code:

user@learnpy.cisco.com:

$ python test.py
car: ferrari
color: red
hp: 540hp
location: Raleigh

Now the dictionary is iterated through in the order of the keys! ( alphabetically ). You can specify in the sorted method to get more specific on what you wish to sort on. Since we are iterating through the dictionary we have to bring in the operator methods, in particular itemgetter. This will make it possible to define the search of a value instead of the key.


mydict = {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"}

from operator import itemgetter
for key, value in sorted(mydict.iteritems(), key=itemgetter(0)):
    print "%s: %s" % (key, value)

This code is sorting the dictionary based on the first item ( zero ) from the dictionary that is the key.

user@learnpy.cisco.com:

$ python test.py
car: ferrari
color: red
hp: 540hp
location: Raleigh


mydict = {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"}

from operator import itemgetter
for key, value in sorted(mydict.iteritems(), key=itemgetter(1)):
    print "%s: %s" % (key, value)

On this side we are now sorting based on the value of 1, that is the value of the entry in the dictionary.

user@learnpy.cisco.com:

$ python test.py
hp: 540hp
location: Raleigh
car: ferrari
color: red

Sorting Lists

Now let's take a different approach. Let's build a list of dictionary items that would be very similar to what you would get if you queried a database. The list would look like the following:


mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

for row in mydict:
    print "The car is: %s: and is located at: %s" % (row["car"],row["location"])

When you execute this you will get the list in the order that is in the list because lists are ordered in Python.

user@learnpy.cisco.com:

$ python test.py
The car is: ferrari: and is located at: Raleigh
The car is: porsche: and is located at: Durham
The car is: maserati: and is located at: Durham
The car is: toyota: and is located at: Raleigh

So now we can order this list based on the columns that are defined: car,hp,color,location. Using the itemgetter attribute we can sort the list based on the columns. In this example we will utilize the key with itemgetter('location') to sort based on the location of the car.


mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

from operator import itemgetter
for row in sorted(mydict, key=itemgetter('location')):
    print "The car is: %s: and is located at: %s" % (row["car"],row["location"])

And the output would look like:

user@learnpy.cisco.com:

$ python test.py
The car is: porsche: and is located at: Durham
The car is: maserati: and is located at: Durham
The car is: ferrari: and is located at: Raleigh
The car is: toyota: and is located at: Raleigh

Using this list, let's introduce an important concept around sorted. This is the ability to pass to a function the search to provide more complex results. The first example would be to do this with lambda. Yet it is easier to understand when viewing this side by side of a lambda and a function.

Function

In this case you build the function that receives just one value, that is the row and returns to sorted the column that it wants to sort on.


def sortonlocation(row):
    return row["location"]

sorted(mydict, key=sortonlocation )

So looking at all the code you would see the following:


def sortonlocation(row):
    return row["location"]

mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

for row in sorted(mydict, key=sortonlocation ):
    print "The car is: %s: and is located at: %s" % (row["car"],row["location"])

Lambda

In this case we use a lambda to do exactly the same functionality in a single line. lambda sends r and in return it has r["location"] as the key. Then sorted used that key to sort


sorted(dictionary.items(), key=lambda r: r["location"]

So looking at all the code you would see the following:


mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

for row in sorted(mydict, key=lambda r: r["location"] ):
    print "The car is: %s: and is located at: %s" % (row["car"],row["location"])

more on lambda's is available at here

These have been pretty simple examples, these can get more complicated or provide additional functionality that might be useful. Let's make a change to this list of dictionaries.


mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "Porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "Maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

We have capitalized the name of a couple of cars. If we were to sort this based on the car column, the results would not be match a human perceived order.

user@learnpy.cisco.com:

$ python test.py
The car is: Maserati: and is located at: Durham
The car is: Porsche: and is located at: Durham
The car is: ferrari: and is located at: Raleigh
The car is: toyota: and is located at: Raleigh

This is because those capital letters have a different ASCII value than lower letters. We can fix this by passing either a lower case or upper case version of the field. Using functions we can do this.


def sort_on_car(row):
    lower_case = str.lower(row["car"])
    return lower_case

mydict = [
    {'car': "ferrari", 'hp': "540hp", 'color':"red", 'location': "Raleigh"},
    {'car': "Porsche", 'hp': "420hp", 'color':"blue", 'location': "Durham"},
    {'car': "Maserati", 'hp': "390hp", 'color':"black", 'location': "Durham"},
    {'car': "toyota", 'hp': "130hp", 'color':"yellow", 'location': "Raleigh"},
]

for row in sorted(mydict, key=sort_on_car):
    print "The car is: %s: and is located at: %s" % (row["car"],row["location"])

Producing the result that you would expect:

user@learnpy.cisco.com:

$ python test.py
The car is: ferrari: and is located at: Raleigh
The car is: Maserati: and is located at: Durham
The car is: Porsche: and is located at: Durham
The car is: toyota: and is located at: Raleigh

Sorting indexed dictionaries

If you have a datastructure in memory, having to iterate through the data structure every time you want to extract a specific value makes no sense. For this reason dictionaries are better when they are indexed by something. In the same way a normal dictionary has a word and a definition.

Going back to the example we had in the beginning, we had a list of MAC addresses in a table with ports, states, vlans.


data={
        "0004.9ac8.7680": {"mac":"0004.9ac8.7680", "vlan":44,  "int": "Po520", "learned": "dynamic", "pos":1},
        "002a.6a5f.e701": {"mac":"002a.6a5f.e701", "vlan":64,  "int": "Po540", "learned": "dynamic", "pos":2},
        "0009.7cee.91e0": {"mac":"0009.7cee.91e0", "vlan":08,  "int": "Po520", "learned": "dynamic", "pos":3},
        "0005.73da.9560": {"mac":"0005.73da.9560", "vlan":55,  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        "0022.4da3.f82c": {"mac":"0022.4da3.f82c", "vlan":15,  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        "1cdf.0f76.ffa0": {"mac":"1cdf.0f76.ffa0", "vlan":18,  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        "f4cf.e211.217e": {"mac":"f4cf.e211.217e", "vlan":18,  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}

Instead of a list, we have a dictionary that the INDEX is the MAC address and then that contains a dictionary that contains information related to that mac. We can utilize the same mechanism to sort this list, but have to make some changes since now the iteration isn't only a row but also includes an index.


data={
        "0004.9ac8.7680": {"mac":"0004.9ac8.7680", "vlan": 44,  "int": "Po520", "learned": "dynamic", "pos":1},
        "002a.6a5f.e701": {"mac":"002a.6a5f.e701", "vlan": 64,  "int": "Po540", "learned": "dynamic", "pos":2},
        "0009.7cee.91e0": {"mac":"0009.7cee.91e0", "vlan": 11,  "int": "Po520", "learned": "dynamic", "pos":3},
        "0005.73da.9560": {"mac":"0005.73da.9560", "vlan": 55,  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        "0022.4da3.f82c": {"mac":"0022.4da3.f82c", "vlan": 15,  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        "1cdf.0f76.ffa0": {"mac":"1cdf.0f76.ffa0", "vlan": 18,  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        "f4cf.e211.217e": {"mac":"f4cf.e211.217e", "vlan": 18,  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}

for mac,macdata in sorted(data.items()):
    print "The MAC is: %s and has VLAN: %s" % ( mac, macdata["vlan"])

The result would be:

user@learnpy.cisco.com:

$ python test.py
The MAC is: 0004.9ac8.7680 and has VLAN: 44
The MAC is: 0005.73da.9560 and has VLAN: 55
The MAC is: 0009.7cee.91e0 and has VLAN: 11
The MAC is: 0022.4da3.f82c and has VLAN: 15
The MAC is: 002a.6a5f.e701 and has VLAN: 64
The MAC is: 1cdf.0f76.ffa0 and has VLAN: 18
The MAC is: f4cf.e211.217e and has VLAN: 18

As you can see we have sorted based on the MAC address that is the key to the iterated dictionary. That isn't as usefull as sorting based on the ports or the VLAN. For this we now have to get a little more creative with the function call to sort. Now we have to pass two components. Both the index and the value. The value in this case is going to be a dictionary in itself.

To do ths we will utilize a lambda as follows: lambda ( key, value): (value, key). Looking at this code we haven't done much. But we can specify a column on the lambda return value key. That would make the call something aslambda (key, value): (value["column"], key). Let's write this code into our sorting example for the mac address table.


data={
        "0004.9ac8.7680": {"mac":"0004.9ac8.7680", "vlan": 44,  "int": "Po520", "learned": "dynamic", "pos":1},
        "002a.6a5f.e701": {"mac":"002a.6a5f.e701", "vlan": 64,  "int": "Po540", "learned": "dynamic", "pos":2},
        "0009.7cee.91e0": {"mac":"0009.7cee.91e0", "vlan": 11,  "int": "Po520", "learned": "dynamic", "pos":3},
        "0005.73da.9560": {"mac":"0005.73da.9560", "vlan": 55,  "int": "Eth4/5", "learned": "dynamic", "pos":4},
        "0022.4da3.f82c": {"mac":"0022.4da3.f82c", "vlan": 15,  "int": "Eth4/1", "learned": "dynamic", "pos":5},
        "1cdf.0f76.ffa0": {"mac":"1cdf.0f76.ffa0", "vlan": 18,  "int": "Eth1/2", "learned": "dynamic", "pos":6},
        "f4cf.e211.217e": {"mac":"f4cf.e211.217e", "vlan": 18,  "int": "Eth2/2", "learned": "dynamic", "pos":7},
}

for mac,macdata in sorted(data.items(), key=lambda (key, value): (value["vlan"],key) ):
    print "The MAC is: %s and has VLAN: %s" % ( mac, macdata["vlan"])

Which produces the result we had been looking for:

user@learnpy.cisco.com:

$ python test.py
The MAC is: 0009.7cee.91e0 and has VLAN: 11
The MAC is: 0022.4da3.f82c and has VLAN: 15
The MAC is: 1cdf.0f76.ffa0 and has VLAN: 18
The MAC is: f4cf.e211.217e and has VLAN: 18
The MAC is: 0004.9ac8.7680 and has VLAN: 44
The MAC is: 0005.73da.9560 and has VLAN: 55
The MAC is: 002a.6a5f.e701 and has VLAN: 64