4. Object Oriented Programming using Python Classes

In this tutorial we provide an overview of object-oriented programming in Python, including what a class is and how to define one. This tutorial is derived from a session during the Winter 2024 SU OSPO Advanced Python Workshop.

In addition, the Appendix provides some more background on common decorators and different ways to define and call functions.

Namespaces and scope

Before introducing classes, it is instructive to understand how mappings between objects and values are handled in Python.

Suppose we create a function called linef that calculates the y-value of a line given a slope, y-intercept, and x-value:

def linef(x, slope, intercept):
    y = slope * x + intercept
    return y

We can then evaluate the y-value of a line with, say, a slope of 1 and y-intercept of 2 at x=3 with:

m = 1
b = 2
linef(3, m, b)

In the definition of linef we created a variable called y, which we assigned within the function. No variable y was defined prior to the function definition. And even though we have called linef, no variable y exists in the notebook:

print(y)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 print(y)

NameError: name 'y' is not defined

To understand what’s going on here, let’s establish a few definitions. When we set m = 1, we created a mapping from the object called m to the value 1. We created a similar mapping for b. Likewise, when we defined the function linef, we created a mapping from the object called linef to the block of code defined above. A collection of mappings is called a namespace. Currently, the namespace of our notebook/module (called the module’s global namespace) contains m, b, and linef. Our notebook also has access to Python’s builtins namespace. The scope of the notebook, i.e., the collection of namespaces that are searched to resolve objects, is the notebook’s namespace plus the builtins.

When we define a function, we define a new, local namespace within that function. The function’s local namespace includes all of the arguments that the function takes, plus any mappings that are created within that function. So, when we called linef(3, m, b), a namespace was created with local objects x, slope and intercept, which were mapped to 3 and the values of the global mappings m, and b, respectively. The code within linef was then evaluated, which created the local variable y. The line return y then passed the value that y is mapped to (not the variable y) back to the global namespace. At this point, the function’s namespace is deleted, meaning that the objects x, slope, intercept, and y cease to exist.

When we called linef, the value returned was just printed to screen; it was not mapped to anything. If we want the result of linef to persist, we need to map a variable to it so that it will exist in the notebook’s namespace; e.g.:

liney = linef(3, m, b)
print(liney)

Now let’s define a variable x = 0 in the global namespace:

x = 0

If we call linef again with the same arguments, we still get the same result:

linef(3, m, b)

But we now have an x defined in the notebook’s global namespace! Why did 3 get used for x inside the function instead of the global x = 0? The global x = 0 was not used because whenever Python is evaluating a block of code, local mappings take precedence. So, when we called linef(3, m, b), a local namespace was created in which x was mapped to 3. Inside the function, the local x is not the same thing as the global x even though they share the same name.

If a function uses an object that is not defined in its local namespace, then Python will step out to the next enclosing namespace to look for it. This process repeats until it reaches the module’s global namespace, followed by the builtins; if nothing can be found there, a NameError is raised.

For example, when the code inside linef(3, m, b) is being evaluated, 3 namespaces exist. In order of object resolution, they are:

linef namespace: {x: 3, slope: 1, intercept: 2}
global namespace: {x: 0, m: 1, b: 2, linef: <linef code>, ... }
builtin namespace: {print: <print code>, len: <len code>, ... }
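
One way to check this picture is to ask Python for the namespaces directly. The sketch below uses the built-in functions locals() and globals() and the standard builtins module; linef_inspect is a hypothetical variant of linef written only for this illustration.

```python
import builtins

m = 1
b = 2
x = 0  # global x, unrelated to the x inside the function


def linef_inspect(x, slope, intercept):
    """Hypothetical variant of linef that also returns its local namespace."""
    y = slope * x + intercept
    # locals() gives the function's local mappings at this point
    return y, dict(locals())


y_value, local_names = linef_inspect(3, m, b)
print(local_names)               # the function's local namespace
print(globals()["x"])            # the global x is untouched
print(hasattr(builtins, "len"))  # len lives in the builtins namespace
```

The local namespace holds x, slope, intercept, and y, while the global x keeps its own value.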

Understanding nested namespaces is key to understanding how classes work.

Warning

Due to the way mappings are resolved, it is possible to define a function that uses a variable from outside of its local namespace. For example:

def dontdothis(a):
    return a + b

dontdothis(1)

Here, because b was not listed in dontdothis’s arguments (and was not defined anywhere in the function), Python ends up using the b defined in the notebook’s global namespace. We therefore end up with 1 + 2.

If it wasn’t clear from the function name, don’t do this. The danger is that the output of the function can change depending on where it is called in the code, even if the arguments are the same. For example, if at some later point we set b to a different value:

b = 6

Then calling dontdothis with the same argument will yield a different result:

dontdothis(1)

This can lead to unexpected bugs that are hard to track down. Thus, generally, you should include all variables needed by a function in its list of arguments. There are some cases where this rule may need to be broken, but these are rare, and should be avoided if possible.
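
A safer version passes every input explicitly. In this sketch, dothis is a hypothetical replacement for dontdothis; its result depends only on its arguments, so changing the global b has no effect.

```python
def dothis(a, b):
    """Hypothetical explicit-argument version of dontdothis."""
    return a + b


b = 2
first = dothis(1, 2)

b = 6  # changing the global b does not affect the call below
second = dothis(1, 2)

print(first, second)
```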

Classes

The need for classes

It is possible to write code in Python that only uses functions and plain data structures (this style is usually called procedural programming). However, in certain cases this can lead to unwieldy, difficult-to-manage code.

As a simple example (borrowed from here), say we wanted some code to keep track of a student’s progress in a class. We could do this using a dictionary:

# create student Jane
jane = {}
jane['name'] = 'Jane'
jane['homework_grades'] = [87., 90., 82., 75., 97.]
jane['exam_grades'] = [88., 80., 94.]

You also give your students the opportunity to earn extra credit by writing a report. Jane, the hardworking student that she is, takes advantage of this and writes an excellent report. You add that to her grades:

jane['extra_credit'] = 99.

You now write a function to calculate averages:

def average(grades):
    if len(grades) == 0:
        return 0
    return float(sum(grades)) / len(grades)

… which you use in a function you write to calculate your students’ GPA:

def gpa(student):
    homeworkavg = average(student['homework_grades'])
    examavg = average(student['exam_grades'])
    total = average([homeworkavg, examavg]) + student['extra_credit'] 
    return min(4.0 * total / 100., 4.0)

You use this to get Jane’s score:

print(gpa(jane))

4.0

Now you want to do the same for your other student, Susie:

susie = {}
susie['name'] = 'Susie'
susie['homework_grades'] = [80., 73., 77., 50., 0.]
susie['exam_grades'] = [70., 63., 50.]

print(gpa(susie))

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[16], line 1
----> 1 print(gpa(susie))

Cell In[13], line 4, in gpa(student)
      2 homeworkavg = average(student['homework_grades'])
      3 examavg = average(student['exam_grades'])
----> 4 total = average([homeworkavg, examavg]) + student['extra_credit'] 
      5 return min(4.0 * total / 100., 4.0)

KeyError: 'extra_credit'

Oh! Susie - a bit of a slacker - didn’t do the extra credit (even though she needed it more). As a result, you forgot to add an extra credit entry for her, leading to a problem with your function. You can fix this by going back and changing your gpa function, or by adding the missing data to Susie’s dictionary. However, this highlights a disadvantage to this type of programming: the object (in this case a student, represented by a dictionary) is not well defined. This can lead to pitfalls when trying to write functions that will manipulate your objects.
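
As a sketch of the first fix, gpa could fall back to a default when the key is missing. The name gpa_with_default is hypothetical, and treating missing extra credit as 0 is an assumption about the grading policy.

```python
def average(grades):
    if len(grades) == 0:
        return 0
    return float(sum(grades)) / len(grades)


def gpa_with_default(student):
    """Like gpa, but treats a missing 'extra_credit' entry as 0 (an assumption)."""
    homeworkavg = average(student['homework_grades'])
    examavg = average(student['exam_grades'])
    # dict.get returns the second argument when the key is absent
    total = average([homeworkavg, examavg]) + student.get('extra_credit', 0.)
    return min(4.0 * total / 100., 4.0)


susie = {
    'name': 'Susie',
    'homework_grades': [80., 73., 77., 50., 0.],
    'exam_grades': [70., 63., 50.],
}
print(gpa_with_default(susie))
```

This works, but every function that touches a student dictionary has to remember the same guard, which is the underlying problem.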

Defining classes

Python classes are a way to bundle data and functions together into logically coherent objects (this is known as object-oriented programming). Like functions, they create their own namespaces in your program, within which you can manipulate objects. Unlike functions, these namespaces persist even when you are not interacting with the class.

To illustrate how classes work, let’s revisit the above problem. We want an object that represents a student, which has a number of attributes, such as name, homework_grades, etc. To do that, we define a class:

class Student:
    """Stores information about students."""

    def __init__(self, name):
        self.name = name
        self.homework_grades = []
        self.exam_grades = []
        self.extra_credit = 0

    def homework_avg(self):
        """Average homework grade."""
        return average(self.homework_grades)

    def exam_avg(self):
        """Average exam grade."""
        return average(self.exam_grades)

    def gpa(self):
        """GPA (on a 4.0 scale)."""
        total = average([self.homework_avg(), self.exam_avg()]) + self.extra_credit
        return min(4.0 * total / 100., 4.0)

Now we create Student instances, one for Jane, and one for Susie:

jane = Student('Jane')
jane.homework_grades = [87., 90., 82., 75., 97.]
jane.exam_grades = [88., 80., 94.]
jane.extra_credit = 99.

susie = Student('Susie')
susie.homework_grades = [80., 73., 77., 50., 0.]
susie.exam_grades = [70., 63., 50.]

We can now get their GPAs:

jane.gpa()

4.0

susie.gpa()

2.34

What did we do here? First, we defined the class Student, to which we added methods __init__, homework_avg, exam_avg, and gpa. None of the code inside these methods was executed until we created an instance of Student by calling:

jane = Student('Jane')

At this point, a namespace for the instance jane was created, which gave access to the mappings:

{name: 'Jane', homework_grades: [], exam_grades: [], extra_credit: 0,
 homework_avg: <Student.homework_avg code>, exam_avg: <Student.exam_avg code>,
 gpa: <Student.gpa code>}

This is similar to what happens when a function is called. However, unlike a function, the namespace persisted after this line, allowing us to access its attributes with .; e.g., jane.homework_grades. Once we populated the relevant attributes, we were then able to call gpa to get Jane’s GPA.
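
The instance's namespace can be inspected with the built-in vars(). The sketch below uses a pared-down Student (two attributes and one hypothetical method, for brevity). Strictly speaking, vars() shows only the attributes stored on the instance itself; methods are stored on the class and are found through the instance when looked up.

```python
class Student:
    """Pared-down Student used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.extra_credit = 0

    def describe(self):  # hypothetical method, not in the tutorial's class
        return f"{self.name}: {self.extra_credit}"


jane = Student('Jane')
print(vars(jane))                   # attributes stored on the instance
print('describe' in vars(jane))     # methods are not stored on the instance...
print('describe' in vars(Student))  # ...they live in the class's namespace
jane.extra_credit = 99.
print(jane.describe())              # the instance namespace persisted
```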

What’s the deal with `self`?

Functions that are defined in a class are called methods. Their purpose is to act on a class instance’s attributes. In our Student example there were four methods: __init__, homework_avg, exam_avg, and gpa. All of these took self as their first argument. Why?

In order for a method to act on an instance’s attributes, it must have some way to reference the class instance. For example, when we call jane.gpa(), we want the function Student.gpa to act on Jane’s grades. This is the purpose of self: it represents the class instance, so that we can access attributes of the instance within the function definition.

Note that when we called jane.gpa() we did not need to provide any arguments. This is because Python automatically passes the class instance as the first argument when a method is called on it. So:

All method definitions in a class must have self as their first argument. (With two exceptions; see Appendix, below.)
When calling a method of a class instance, you do not pass the instance in the function arguments.

In other words…

Correct:

jane.gpa()

4.0

Not correct:

jane.gpa(jane)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 jane.gpa(jane)

TypeError: Student.gpa() takes 1 positional argument but 2 were given

Although you can do:

Student.gpa(jane)

4.0

Note: The name self in a method definition isn’t special. What matters is the argument order: the first argument of a method is assumed to be the class instance, regardless of what it is named. Using self for this is just a convention.
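
To illustrate both points, the sketch below defines a hypothetical Counter class whose methods name their first argument this instead of self. This is for demonstration only; sticking to self is strongly recommended.

```python
class Counter:
    """Hypothetical class used to illustrate how the instance is passed."""

    def __init__(this, start):
        # 'this' plays the role usually given to 'self'
        this.count = start

    def increment(this):
        this.count += 1
        return this.count


c = Counter(0)
a = c.increment()         # Python passes c as the first argument
b = Counter.increment(c)  # equivalent call with the instance passed explicitly
print(a, b)
```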

The `__init__` method

For the most part, you can add any number of methods with various names to a class. However, there are a few special method names that Python recognizes (all of which begin and end with __). The most common of these is the __init__ method.

The __init__ method is called when an instance of the class is created (initialized). The arguments of __init__ determine what arguments need to be passed to the class when it is initialized. In our example, Student.__init__ was defined as needing name (in addition to self, which is always required). As a result, when we initialized the class we had to pass in a string representing the student’s name.

The __init__ method is used to add attributes to a class instance and set their initial values. It is not necessary to have an __init__ method, however. Nor must attributes be assigned in __init__. Attributes may be added to a class instance after it is initialized. For example:

class Foo:
    pass


foo = Foo()
foo.bar = 10.
print(foo.bar)

10.0

The catch is that any attribute added to an instance only exists for that instance. This means that if we create another instance of Foo:

foo2 = Foo()

It will not have a bar attribute:

print(foo2.bar)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[27], line 1
----> 1 print(foo2.bar)

AttributeError: 'Foo' object has no attribute 'bar'

The benefit of __init__ is that it ensures that all instances of a class have at least the attributes that are set within it.
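
A sketch of the same Foo class with an __init__ that sets bar; the default value of 0. is an arbitrary choice for illustration.

```python
class Foo:
    def __init__(self, bar=0.):
        # every instance gets a bar attribute at creation
        self.bar = bar


foo = Foo(10.)
foo2 = Foo()
print(foo.bar, foo2.bar)
```

Both instances now have bar, so code that reads it no longer depends on what was done to a particular instance beforehand.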

Class vs. instance variables

In the Student example above, we created four attributes in the __init__ method (name, homework_grades, exam_grades, and extra_credit) and assigned values to them. Since these values are assigned separately for each instance when it is initialized, they are known as instance variables.

It is also possible to create what are called class variables. These are variables whose values are assigned in the class definition. For example, let’s redefine our Student class to have a class variable called university, which we’ll set to Syracuse:

class Student:
    """Stores information about students."""

    university = 'Syracuse'

    def __init__(self, name):
        self.name = name
        self.homework_grades = []
        self.exam_grades = []
        self.extra_credit = 0

    def homework_avg(self):
        """Average homework grade."""
        return average(self.homework_grades)

    def exam_avg(self):
        """Average exam grade."""
        return average(self.exam_grades)

    def gpa(self):
        """GPA (on a 4.0 scale)."""
        total = average([self.homework_avg(), self.exam_avg()]) + self.extra_credit
        return min(4.0 * total / 100., 4.0)

Now create two more students and see what their universities are:

bob = Student('Bob')
joe = Student('Joe')

print(bob.name, bob.university)
print(joe.name, joe.university)

Bob Syracuse
Joe Syracuse

We can see that despite having different names, both Bob and Joe have the same university without having to set it. What happens if we change Bob’s university? Does that affect Joe, or any other new Student?

bob.university = 'Cornell'
miranda = Student('Miranda')
for student in [bob, joe, miranda]:
    print(student.name, student.university)

Bob Cornell
Joe Syracuse
Miranda Syracuse

Nope! When we changed Bob’s university, it only reset Bob’s university attribute, but left Joe and Miranda alone. But what happens if we change Student.university? Let’s see how that affects a new student, as well as our current students:

Student.university = 'Stanford'
sam = Student('Sam')
for student in [bob, joe, miranda, sam]:
    print(student.name, student.university)

Bob Cornell
Joe Stanford
Miranda Stanford
Sam Stanford

Joe and Miranda’s universities have now also changed to Stanford! What’s going on? When an instance is created, its class variables are not copied; looking one up on the instance falls back to the value saved in the class definition. In other words, joe.university points to whatever Student.university is set to. Consequently, if you change the value of the class variable, all instances that have not overridden it will immediately see the new value. However, if you assign to a class variable through an instance (as we did with Bob’s university), that instance gets its own attribute pointing to the new value. In Bob’s case, the university attribute now points to the string ‘Cornell’ rather than to Student.university. That is why Bob was unaffected by the change.

Class variables can be extremely useful for setting common parameters across multiple instances. Just be careful about modifying a class variable directly!
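
The lookup order can be seen by inspecting the namespaces with vars(). The sketch uses a pared-down Student; the del statement at the end is an addition not used elsewhere in this tutorial, shown to illustrate that removing the instance's own attribute exposes the class variable again.

```python
class Student:
    """Pared-down Student with one class variable."""

    university = 'Syracuse'

    def __init__(self, name):
        self.name = name


bob = Student('Bob')
joe = Student('Joe')
bob.university = 'Cornell'          # creates an attribute on bob only

print(vars(bob))                    # bob now has his own 'university' entry
print(vars(joe))                    # joe has none, so lookup falls back to the class
print(vars(Student)['university'])  # the value stored on the class

del bob.university                  # remove bob's own entry
print(bob.university)               # lookup falls back to Student.university
```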

Appendix

Some extra stuff that you may or may not find interesting.

A. Decorators

Consider the following modification of our Student class:

class Student:
    """Stores information about students."""

    def __init__(self, name):
        self.name = name
        self.homework_grades = []
        self.exam_grades = []
        self.extra_credit = 0

    @classmethod
    def from_dict(cls, sdict):
        """Initialize a Student using the given dictionary."""
        student = cls(sdict['name'])
        if 'homework_grades' in sdict:
            student.homework_grades = sdict['homework_grades']
        if 'exam_grades' in sdict:
            student.exam_grades = sdict['exam_grades']
        if 'extra_credit' in sdict:
            student.extra_credit = sdict['extra_credit']
        return student

    @staticmethod
    def avg(values):
        if len(values) == 0:
            return 0
        return sum(values) / float(len(values))

    @property
    def homework_avg(self):
        """Average homework grade."""
        return self.avg(self.homework_grades)

    @property
    def exam_avg(self):
        """Average exam grade."""
        return self.avg(self.exam_grades)

    @property
    def gpa(self):
        """GPA (on a 4.0 scale)."""
        score = self.avg([self.homework_avg, self.exam_avg]) + self.extra_credit
        return min(4.0 * score / 100., 4.0)

What are all of those things with @ before the function definitions? And why don’t the from_dict and avg methods have self as their first argument?!?

The names beginning with @, such as @classmethod, are called decorators. Essentially, decorators are wrappers around functions that modify the behavior of that function. They can be used on any Python function, but they are most often seen in classes.
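
As a minimal sketch of the idea, here is a hand-written decorator. The names announce and add are hypothetical; functools.wraps is from the standard library and preserves the wrapped function's name and docstring.

```python
import functools


def announce(func):
    """Hypothetical decorator that records each call before running func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)    # extra behavior added by the decorator
        return func(*args, **kwargs)  # then run the original function
    wrapper.calls = []
    return wrapper


@announce
def add(a, b):
    return a + b


# The @announce line is shorthand for: add = announce(add)
result = add(1, 2)
print(result, add.calls, add.__name__)
```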

Using and creating decorators is a large topic itself, the details of which we won’t get into here (if you’re interested, see here for an excellent tutorial). However, there are three predefined decorators in the Python builtins that often come up in class definitions that I want to highlight here: @classmethod, @staticmethod, and @property.

@classmethod

The @classmethod decorator modifies methods so that instead of taking an instance of a class as the first argument (what we normally call self) it takes the class itself (which we normally call cls). This is typically used to provide a way to instantiate a class using input arguments different from those defined in __init__. For example, say we had a dictionary that specifies all of the information about Susie:

susie_dict = {}
susie_dict['name'] = 'Susie'
susie_dict['homework_grades'] = [80., 73., 77., 50., 0.]
susie_dict['exam_grades'] = [70., 63., 50.]

We can now instantiate a Student representation of Susie using the from_dict method:

susie = Student.from_dict(susie_dict)

print(susie.name)
print(susie.homework_grades)
print(susie.exam_grades)

Susie
[80.0, 73.0, 77.0, 50.0, 0.0]
[70.0, 63.0, 50.0]

Notice that when we used from_dict we only provided the dictionary, even though the definition of from_dict had two arguments, cls and sdict. Just as normal methods automatically receive the instance, a method wrapped with @classmethod automatically receives the class as its first argument.
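
One reason from_dict receives cls rather than hard-coding Student is that it then also works for subclasses. In the sketch below, GradStudent is a hypothetical subclass, and Student is pared down to the parts needed.

```python
class Student:
    def __init__(self, name):
        self.name = name
        self.homework_grades = []

    @classmethod
    def from_dict(cls, sdict):
        student = cls(sdict['name'])  # cls is whichever class from_dict was called on
        student.homework_grades = sdict.get('homework_grades', [])
        return student


class GradStudent(Student):
    """Hypothetical subclass; inherits from_dict unchanged."""


susie = GradStudent.from_dict({'name': 'Susie', 'homework_grades': [80., 73.]})
print(type(susie).__name__, susie.name)
```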

@staticmethod

The @staticmethod decorator modifies methods so that the class instance self is not automatically added when the method is called. As a result, we do not need to reserve the first argument in the definition of a @staticmethod. We see that in the above example: avg takes a single argument values, which is a list of values to calculate an average for.

Since methods wrapped with @staticmethod do not do any automatic substitutions, it’s possible to use them without needing to create a class instance. For example:

Student.avg([60., 70.])

65.0

In fact, we could have just defined avg outside of Student in the global namespace (which is what we did above, using our global function average). In that case we would have called avg(...) instead of self.avg(...) inside of Student.

So why use @staticmethod? Its main purpose is to provide a function that is heavily used by a class, but may have little meaning outside of the class. This can make code easier to read and understand (“syntactic sugar”). Basically, @staticmethod is a way of logically organizing functions.
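
To see what the decorator changes, the sketch below compares two hypothetical classes, one with @staticmethod and one without. Without the decorator, calling avg on an instance passes the instance in as an extra argument and fails.

```python
class WithStatic:
    @staticmethod
    def avg(values):
        return sum(values) / float(len(values)) if values else 0


class WithoutStatic:
    def avg(values):  # no decorator and no self: only works when called on the class
        return sum(values) / float(len(values)) if values else 0


print(WithStatic.avg([60., 70.]))    # called on the class
print(WithStatic().avg([60., 70.]))  # called on an instance; no instance is passed in

try:
    WithoutStatic().avg([60., 70.])  # the instance is passed in ahead of the list
except TypeError as err:
    print("TypeError:", err)
```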

@property

The @property decorator is another form of syntactic sugar that modifies how a method is called. Basically, it makes a method look like an attribute.

For example, in the class above, homework_avg, exam_avg, and gpa are all methods. Normally you would call them like susie.gpa() (as we did in the Classes section). However, because we stuck the @property decorator on each of these methods, we instead do:

susie.gpa

2.34

In other words, we no longer include the () after the name. Note that @property only works with methods that take no arguments other than self.

The @property decorator is used when we want to run some additional code under the hood when an attribute is accessed. It is typically paired with a “setter”, which allows us to also run some code when an attribute is set. For example:
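
A property defined this way is read-only: assigning to it raises an AttributeError. The sketch below shows this with a pared-down Student that keeps only exam_avg.

```python
class Student:
    def __init__(self, name):
        self.name = name
        self.exam_grades = []

    @property
    def exam_avg(self):
        if len(self.exam_grades) == 0:
            return 0
        return sum(self.exam_grades) / float(len(self.exam_grades))


susie = Student('Susie')
susie.exam_grades = [70., 63., 50.]
print(susie.exam_avg)      # computed on access, no parentheses

try:
    susie.exam_avg = 100.  # fails: this property has no setter
except AttributeError as err:
    print("AttributeError:", err)
```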

class Star:
    _mass = None

    @property
    def mass(self):
        if self._mass is None:
            raise ValueError("no mass set")
        return self._mass

    @mass.setter
    def mass(self, value):
        if value <= 0:
            raise ValueError("mass must be > 0")
        self._mass = value

star = Star()

print(star.mass)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[39], line 1
----> 1 print(star.mass)

Cell In[37], line 7, in Star.mass(self)
      4 @property
      5 def mass(self):
      6     if self._mass is None:
----> 7         raise ValueError("no mass set")
      8     return self._mass

ValueError: no mass set

star.mass = 10.
print(star.mass)

10.0

star2 = Star()
star2.mass = -3

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[41], line 2
      1 star2 = Star()
----> 2 star2.mass = -3

Cell In[37], line 13, in Star.mass(self, value)
     10 @mass.setter
     11 def mass(self, value):
     12     if value <= 0:
---> 13         raise ValueError("mass must be > 0")
     14     self._mass = value

ValueError: mass must be > 0