Python Interfaces: Why should a Data Scientist Care? | by Diego Barba | Jun, 2022

By Jessie Hobb On Jun 13, 2022

Class interfaces, abstraction layers, inheritance, isn’t that a software developer problem? Why should you, as a data scientist, care?

Image by author.

Interfaces make almost all of our favorite data science libraries possible. That is a good enough reason, at least for me, to care. But let us go deep into the subject. In the context of the present story, interfaces are an Object-Oriented (OO) concept to define other objects’ properties and behavior.

Interfaces are handy when we are to design a piece of software that:

depends on objects that don’t exist yet, but will exist in the future (e.g., plug-ins or user-defined objects)
allows for interchangeable objects with the same core behavior but slightly different functionality and internal logic
isolates core logic from external dependencies, such as databases or external APIs

These ideas sound promising but honestly, they sound more like software dev’s jargon (an OO software dev, in fact). Why should you, a data scientist, care?

Most libraries we love, such as Keras or scikit-learn, use interfaces to define model properties. Most of them allow you to code custom objects in your models.
If you ever need to code your own tools with APIs that are such a joy to use as sklearn’s API, you will need interfaces.
In Python, everything is an object, so knowing the basics of OO is a must.

Why an Interface?

Think about the following problem. We have two models (each one a class, Layer1 and Layer2) with fit and predict methods. And we have another model (another class, LayeredModel) that takes the two Layer models and combines them somehow. At this point, we test your code, and everything works well; or does it?

Class diagram: model composition with no interface. Image by author.

Although LayeredModel works, it has several weaknesses:

If you change something in Layer1 or Layer2, it is very likely that you will need to change LayerdModel as well; this is a recipe for disaster.
Both models, Layer1 and Layer2, have the same behavior (fit and predict), but there is no clear way of stating this.
What happens if we want to add more layers?
If LayerdModel uses layers, wouldn’t it make sense to let LayerdModel be a layer all by itself? Doing so would allow for constructing more complex compositions.

How do we solve such problems?

Enter the interface.

We define an interface (BaseLayer) for the models. A composite model, such as LayeredModel, will depend on models that implement such interface. We then code the layers such that they all implement (adhere to) the interface. Doing so solves most of our problems.

Class diagram: model composition with an interface dependency on the middle. Image by author.

The only thing missing is to make LayerdModel implement the interface as well, and we will have created a simple and clean way to build pipelines of stackable models.

There are many ways to define and use interfaces in Python, and all of them have their pros and cons. In this story, we will review the most common ways of declaring interfaces and go through some examples and common patterns.

Story Structure

Declaring interfaces
– Informal interface
– Abstract base class (ABC)
– Protocol
– zope.interface
– Summary of pros and cons
Building complex models with ABC
Building complex models with Protocols
ABC: partial implementation
ABC vs. Protocol and the multiple inheritance problem
Final words

Declaring Interfaces — Informal interface

The easiest way to define an interface in Python is through a regular class; the following example defines a class to be used as an interface for a sklearn’s API style model:

You can see that the fit and predict methods are not implemented, just described in terms of types (annotations) and a docstring with what they are supposed to do.

We would use the interface by inheriting from it and overriding the fit and predict methods. We would like to assume that all classes that inherit from the interface implement the interface:

This method has a severe disadvantage; if we inherit from the interface but do nothing (“pass”), then the child class will have the methods, but they will not be implemented (they return None). Hence, we cannot assume that all child classes implement the interface. This behavior can cause problems in our code:

A way to solve this is to define a stronger interface where we raise a “NotImplementedError” exception in all the methods. Doing so will not allow us to repeat the same antipattern of inheriting and then doing nothing (“pass”):

This way of declaring interfaces (informal) has a significant drawback; there is no clear way to state that the class is an interface except by including the word interface in the name. Remember, we were using interfaces to achieve more clarity in our code. Another drawback is that these interface classes can still be instantiated like a regular class, which is not great.

Declaring Interfaces — Abstract base class (ABC)

Python’s ABC (abstract base class) from the abc module solves most of the problems that arise from informal interfaces.

To create the same interface, we create a class and inherit from ABC, then use the abstractmethod decorator for the not implemented methods:

By doing so, we achieve:

Clarity: it is clear that this is not a regular class; it is clearly an abstraction layer and is meant to be inherited.
Implementation constraint: it will raise an error if we inherit from the interface and do not implement the methods.
Instance constraint: an exception will be raised if we try to instantiate the interface.

Example of implementation constraint:

Example of correct implementations:

Using abstract base classes is preferred for declaring interfaces, but there is just one small caveat. They are far too general; ABCs can be used for many other things. It is not always clear when an ABC is just an interface or something more (we will examine this in a subsequent section).

Declaring Interfaces — Protocol

Protocol is the new kid on the block regarding interfaces. It first came about with PEP 544 for Python 3.8 within the typing module. It is an implicit way of defining interfaces; however, it is only helpful when using type hints and doing static type checking (e.g., mypy). Nevertheless, I find this way of declaring interfaces my personal favorite. Anyway, if you are not using type hints and static type checking these days, you probably should.

The way to declare the interface is very similar to that of ABC, but instead, we inherit from Protocol and do not use the abatractmethod decorator.

To implement the interface, we just follow the protocol typing:

You can see that there is no reference whatsoever (except for the docstring) that this class is somehow related to the interface.

Here is another example:

And here is an example of a class that does not adhere to the Protocol:

As you can see, there is no actual reference to the Protocol; in fact, you could ask yourself what the use for such a thing as Protocols is.

Finally, here comes Protocol’s use.

Let us now define some code (with static type info) that uses the model’s interface (Protocol), in this case, a function:

Then use the function with DummyLayer and AnotherDummyLayer if we do static type checking, everything is golden. However, there will be an error if we use WrongDummyLayer as a function argument; since we declared the function’s arguments as Protocol compliant, and WrongDummyLayer does not adhere to it.

The main criticism of Protocol is that it does not comply with the Zen of Python (PEP 20)

… Explicit is better than implicit…

— Zen of Python

Declaring Interfaces — zope.interface

The last stop in the interface tour is the zope.interface. It is a third-party library. I have not used this way of declaring interfaces; however, I feel compelled to include it for legacy reasons. It has been around since Python 2.

Here is how we declare the interface:

We inherit from zope.interface.Interface. So far, so good. We use the zope.interface.implementer decorator to declare that some class implements the interface:

Here is where I found the first problem. In this particular case, the fit method returns an instance of the class itself (“return self”); the purpose is to do chaining like fit().predict(). In the previous cases, the type for the returned object of the fit method of the implementer was the interface type. There was no error with the static type checking. However, in the present example, there is an error. We have to modify it to this:

It does not sound like a big thing, but it is. In terms of types, saying that a class implements an interface means that the implemented is the same type as the interface. So to be general, we want to say that the fit method returns an object of the interface type.

Declaring Interfaces — Summary of pros and cons

Informal interface pros:

intuitive
no need for dependencies

Informal interface cons:

not very clear in the intent while declaring the interface

ABC pros:

intuitive
great functionality

ABC cons:

hard to say whether the base class is just an interface or a more general layer of abstraction and hence another design pattern

Protocol pros:

Protocol cons:

implicit (messes with the Zen of Python)
useless without static type checking

zope.inteface pros:

very clear in intent, the name says it “interface”

zope.interface cons:

third-party dependency
not great with type hints

Building complex models with ABC

Now we turn to a practical example using interfaces declared with ABCs. Let us define our BaseLayer interface; this interface is very similar to the previous examples:

Then we build a model which uses models (layers) which adhere to the BaseLayer interface. This model, LayeredMeanModel, is itself an interface implementer:

Then we build two simple models, one that always predicts zeros and another that always predicts ones:

And we use the models:

Notice that we could have done many things with our interface definition. We could have used N layers in LayeredMeanModel; in fact, we could have used LayeredMeanModels as layers for another LayeredMeanModel. That is the power of interfaces.

Building complex models with Protocols

Now we repeat the same example as the previous one but using Protocols. The code is very similar, except for the ABC inheritance and abstractmethod decorator.

Once again, the interface (Protocol) remains implicit, and it will only arise if there is a type error in a static type checker.

ABC: partial implementation

When we talked about declaring interfaces with ABCs, we said we could use ABCs for much more than an interface. One of the primary uses of ABC is for partial implementation. It is common to create an ABC with some abstract methods (not implemented) and some implemented methods that possibly use the abstract methods.

Let us code the talk, we use a similar example as before, but now we include a fit_predict method (implemented) that uses the fit and the predict methods (not implemented):

Hence, when we inherit from the base class and implement the fit and predict method, the fit_predict method becomes available by inheritance:

This example clearly shows how ABCs are used more generally than an interface.

ABC vs. Protocol and the multiple inheritance problem

A potential “problem” that could arise when using ABCs as mere interfaces is multiple inheritance. When we have two objects that implement the same interface:

and are very similar between them:

except for some attributes (in this case, the name property), we would like to avoid duplicated code and instead make the two LayerZeros inherit the fit and predict method from a fourth class, in this case, LayerZerosMixin:

We end up with both LayerZeros having multiple parents. Most OO devs will frown upon multiple inheritance; some even consider it an antipattern. That is why there are many OO languages where multiple inheritance is not allowed. In Python, adding the word Mixin to the title of the extra inherited classes somehow, magically, fixes the problem. It is an ongoing debate in the OO community.

The problem at a logical level, the problem is that multiple inheritance sometimes breaks the canonical meaning of inheritance, an is a relationship. For example, suppose that we have employees and a cash register to pay them in a business. Suppose we have a Person class and a CashRegister class that pays employees. If we did something like Employee(Person), i.e., the Employee class inherits from Person, everything is ok since the Employee is a person (for now at least). But we might do something like Employee(Person, CashRegister) to include the payments functionality inside the Employee class. After all, payment defines employment. This is tricky because Employee is not a CashRegister. In Python and other languages that support multiple inheritance, a way to indicate that Employee is not a CashRegister would be to rewrite the class name to CashRegisterMixin.

As I said, it is an ongoing debate.

Final words

The right choice is a matter of preference. Going with either ABCs or Protocol is up to you. Personally, I use Protocol for interfaces and ABCs for partial implementation; that is my own rule.

The days when the data scientist’s only job was to create jupyter notebooks and pickled models are long gone. The odds are that you will be expected to develop finished products instead of models and plots. This means coding your way through data acquisition, all the way to deployed APIs of your models. In other words, you will need to merge your data science skills with software development. At that point, interfaces will make your life easier and your code a lot cleaner.