How to add type hints to a Python 2.7 project
Posted on Mon, 11 Dec 2017 in Python
I have written many times that type hints in Python help when you work on medium-sized or big projects. However, if you decide to add them to your project, you have to check the project regularly in CI, and this kind of check is not easy to set up. This article is my story about the obstacles in that process.
I have been trying to add type hints to our project for a while, and by now I have a handful of techniques that make it faster and less painful. I spent many days finding them, and now I want to share my experience.
I should note that the most painful problems are connected with Python 2.7 syntax. Many of them vanish after switching to Python 3.x. Another big problem is the lack of good examples. It is easy to find good code examples in our codebase, but all of them are under NDA and they are relatively big. Small examples, on the other hand, are not illustrative enough.
Small pieces
Do not try to add type hints to the whole project at once. Firstly, if you can add hints to the whole project in a reasonable time, your project probably isn't big enough. Secondly, if your project is big, you will dramatically slow yourself down: you will get a zillion errors.
It is much better to divide the project into smaller pieces and add types to them one by one. Ideally, a piece should be small enough that you can fix all its errors in half an hour or so. For me, that means mypy should return no more than 10-15 errors, so I usually check one medium-sized module at a time. There is nothing wrong with slicing the project into thin pieces.
It is better to fix errors bottom-up: I check submodules first, and only then their parent module.
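The bottom-up order can be sketched in a few lines. This is only an illustration: the module names below are hypothetical, and the helper just sorts dotted names so that the deepest submodules come first. It does not run mypy itself.

```python
from typing import List


def bottom_up(modules):
    # type: (List[str]) -> List[str]
    """Order dotted module names so that the deepest submodules come first."""
    # More dots means deeper nesting; ties are broken alphabetically.
    return sorted(modules, key=lambda name: (-name.count('.'), name))


# Hypothetical project layout, used only for illustration.
order = bottom_up(['app', 'app.db', 'app.db.models', 'app.api'])
print(order)
```

Each name in the resulting list can then be checked with mypy one at a time, fixing its errors before moving on to the parent.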
Check types in every pull request you make
It's very easy to check types for a couple of files. It's a bit harder to do the same for a dozen. A pull request usually consists of fewer than a dozen files, so checking types for those files doesn't take much time. However, it can noticeably speed up adding type annotations to your project.
This approach has a bonus: it can reveal potential problems in the project. It is usually hard to decide where to start adding type checks. A pull request shows you the errors that should be fixed first. Files from PRs are good candidates for checking because they are under active development.
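As a rough sketch of the idea, the helper below picks the Python files out of a list of changed paths. I assume the list comes from something like the output of git diff --name-only. The function name and the sample paths are hypothetical, and passing the result to mypy is left out.

```python
from typing import Iterable, List


def python_files(changed_paths):
    # type: (Iterable[str]) -> List[str]
    """Return the unique .py paths from a list of changed files, sorted."""
    cleaned = (path.strip() for path in changed_paths)
    return sorted(set(path for path in cleaned if path.endswith('.py')))


# Hypothetical list of files touched by a pull request.
changed = ['app/db/models.py\n', 'README.md\n', 'app/api/views.py\n', 'app/db/models.py\n']
print(python_files(changed))
```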
Weak checks first
It's cool to add type hints with maximum detail. It's cool to have code without any Any annotations. However, that is next to impossible to achieve from scratch. Some parts may require refactoring. Sometimes there is no way to pass all checks with the current version of the code. Then you have two options: rewrite those parts or ignore the checks.
Meanwhile, weak checks are better than no checks, and general type hints are better than no hints. In most cases, marking a parameter or a variable with the most generic type is more useful than leaving it without any type hint.
Any is a good option sometimes. At least it allows you to add some annotations, and mypy allows you to find every Any in your project later. So there is no reason to spend much time trying to find an accurate type for every line of code. Make your project pass all the weak checks first.
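Here is a small sketch of what weak first, precise later can look like with Python 2.7 type comments. Both functions are made up for illustration. As far as I know, mypy options such as --disallow-any-explicit can later be used to hunt down the remaining Any annotations.

```python
from typing import Any, Dict, List


def load_settings(raw):
    # type: (Any) -> Dict[str, Any]
    """Weak hint: the input is unchecked, but callers know a dict comes back."""
    return dict(raw)


def total_price(prices):
    # type: (List[float]) -> float
    """Precise hint, the kind you add once the weak checks pass."""
    return sum(prices)


print(load_settings([('debug', True)]))
print(total_price([1.5, 2.5]))
```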
CI first
One of my worst mistakes was adding type hints before setting up automatic checks in CI. mypy reports a significant number of errors even on a project without any type hints. Most of them are something like "Need type annotation for variable", or an error saying that some variable has a different value type than it has in the base class.
class Foo(object):
    boo = None


class Boo(Foo):
    boo = 'boo'

example01.py:6: error: Incompatible types in assignment

boo = []

foo = {}

example02.py:1: error: Need type annotation for variable
example02.py:3: error: Need type annotation for variable
Usually, a project has a lot of places with this kind of error. But they are not hard to fix: it will take no more than a couple of days to fix them all.
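For reference, here is one way such errors are usually fixed with type comments. The concrete element types (str and int) are my assumptions for this sketch; in real code they depend on what the containers actually hold.

```python
from typing import Dict, List, Optional


class Foo(object):
    # Declaring the attribute as Optional[str] lets a subclass assign a str.
    boo = None  # type: Optional[str]


class Boo(Foo):
    boo = 'boo'


# Empty containers need an explicit element type.
names = []  # type: List[str]
counts = {}  # type: Dict[str, int]
```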
But if you have type annotations and no CI checks, the errors can be more difficult to solve.
Firstly, sometimes a function's interface does not match its type annotations.
This type of error is common in Python 2.7 projects, where comments are used for type hints. Such errors are not the easiest to deal with: reading the function's interface is not enough to understand the types of its parameters.
from datetime import datetime


def convert_to_timedelta(step):
    pass  # does something and returns a timedelta


def foo(since, until, step):
    # type: (datetime, datetime) -> Generator[datetime, None, None]
    step = convert_to_timedelta(step)
    while since < until:
        yield since
        since += step
example03.py:8: error: Type signature has too few arguments
What type should step be in this context? Could it be timedelta? Perhaps. Or is it int? Why not? It depends on the implementation of convert_to_timedelta. But there is no type hint for that function, so fixing this error is not easy.
Of course, you can always just delete the hint for foo or choose Any as the type of step. It looks like a step backward, but in a really difficult situation it is an option.
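A minimal sketch of the Any fallback is below. The body of convert_to_timedelta is a hypothetical stand-in that I made up, because the real one is not shown. It assumes that step is either a timedelta or a number of seconds.

```python
from datetime import datetime, timedelta
from typing import Any, Generator


def convert_to_timedelta(step):
    # type: (Any) -> timedelta
    """Hypothetical stand-in: accepts a timedelta or a number of seconds."""
    if isinstance(step, timedelta):
        return step
    return timedelta(seconds=step)


def foo(since, until, step):
    # type: (datetime, datetime, Any) -> Generator[datetime, None, None]
    # Any makes the signature complete without guessing the real type of step.
    delta = convert_to_timedelta(step)
    while since < until:
        yield since
        since += delta


start = datetime(2017, 12, 11, 0, 0)
moments = list(foo(start, datetime(2017, 12, 11, 0, 3), 60))
print(len(moments))
```

The signature now has the right number of arguments, so the check passes, and the Any can be narrowed later once convert_to_timedelta gets its own hint.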
If an unannotated parameter appears somewhere in the middle of the function's interface, it becomes even more difficult. For example, let's use the same function, but this time step appears in the middle.
def foo(since, step, until):
    # type: (datetime, datetime) -> Generator[datetime, None, None]
    step = convert_to_timedelta(step)
    while since < until:
        yield since
        since += step
Let's assume that there was a good reason to put the step parameter in the middle. It is just an example. Could step be a datetime this time or not? In general, it could. So (datetime, datetime, datetime) -> Generator[datetime, None, None] looks like a valid signature for this function. However, looking through the body of foo, I doubt that.
It can be much harder to deal with a real function that is longer and makes more external calls. In that case, the only thing you can do is analyze the source code and all the calls to find a clue.
Secondly, without automatic checks there is no way to know whether the type hints are still correct.
OK. Now we have this function:
def foo(since, until, step):
    # type: (datetime, datetime, int) -> Generator[datetime, None, None]
    step = convert_to_timedelta(step)
    while since < until:
        yield since
        since += step
Later, we decide that int isn't the proper type for step and that timedelta fits better. In all the calls, we carefully change int to timedelta. We also throw away the line with the int to timedelta conversion from the function. But, as usually happens, we forget to change its type hint:
def foo(since, until, step):
    # type: (datetime, datetime, int) -> Generator[datetime, None, None]
    while since < until:
        yield since
        since += step
example05.py:9: error: Unsupported operand types for + ("datetime" and "int")
example05.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "datetime")
Even in this tiny example, mypy's messages point in the wrong direction. Read literally, they suggest that step should be a datetime, and datetime is the wrong type for step. The correct version is this:
def foo(since, until, step):
    # type: (datetime, datetime, timedelta) -> Generator[datetime, None, None]
    while since < until:
        yield since
        since += step
So it's better to use a workflow more or less similar to TDD: check the code first -> make the checks green -> add new type hints.
Be ready to ignore type checks for some parts of the code
Unfortunately, not all libraries have complete type hints. Despite the huge amount of work done by the community, not all of the existing hints are accurate.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def compose_email():
    # type: () -> MIMEMultipart
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject'
    msg['From'] = 'test@domain.com'
    msg['To'] = 'test@domain.com'
    msg.attach(MIMEText('Body'))
    return msg
If you check this code with mypy using the --py2 parameter, it returns some errors:
example06.py:8: error: Unsupported target for indexed assignment
example06.py:9: error: Unsupported target for indexed assignment
example06.py:10: error: Unsupported target for indexed assignment
example06.py:11: error: "MIMEMultipart" has no attribute "attach"
But it doesn't mean that the library is used in a wrong way. No! The stub file for it just isn't complete.
If you get such an error, you have two options:
- Fix the stub files yourself.
- Turn off checks for that part of the project.
Of course, it's better to contribute to the community's work and fix the type hints for the library. But often there is no time for that: there are deadlines and an endless list of tasks. For this case, the typing library has the decorators @typing.no_type_check and @typing.no_type_check_decorator. With them you can turn off checks for a function, a class, or a decorator. For a single line there is the type comment # type: ignore.
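Applied to the email example, the two approaches might look like the sketch below. The second function name is made up for illustration. Which lines need # type: ignore depends on the stubs your mypy version ships with, so treat the placement as an assumption.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import no_type_check


@no_type_check
def compose_email():
    # type: () -> MIMEMultipart
    # The decorator tells the type checker to skip this whole function.
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject'
    msg['From'] = 'test@domain.com'
    msg['To'] = 'test@domain.com'
    msg.attach(MIMEText('Body'))
    return msg


def compose_email_line_by_line():
    # type: () -> MIMEMultipart
    # Here only the lines that the incomplete stub rejects are silenced.
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject'  # type: ignore
    msg['From'] = 'test@domain.com'  # type: ignore
    msg['To'] = 'test@domain.com'  # type: ignore
    msg.attach(MIMEText('Body'))  # type: ignore
    return msg
```

The per-line variant is noisier, but the rest of the function stays checked.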
# type: (...) -> bool
In Python it is OK for a predicate to return something that is not exactly a boolean value: any truthy or falsy object will do. So a typical predicate looks like this:
from typing import Dict, Any


def predicate(elem):
    # type: (Dict[str, Any]) -> bool
    return elem and 'something' in elem
Where such a function is used, bool is a good hint for the predicate's result: it fully describes the expected behavior. However, the actual return type is different. In this example it is Union[Dict[str, Any], bool]. So if you add type hints, you have to cast the result to bool explicitly:
from typing import Dict, Any


def predicate(elem):
    # type: (Dict[str, Any]) -> bool
    return bool(elem and 'something' in elem)
I don't know whether this is good or bad. On the one hand, such code looks strange in Python. On the other hand, it limits how the function can be used: now it is only a predicate, and you can't use it in any other context.
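The difference between the two versions is easy to see at runtime. In this sketch I renamed the functions (loose_predicate and strict_predicate are my names, not from the project) so that both can live in one file, and I annotated the loose one with Any so that it passes a type check.

```python
from typing import Any, Dict


def loose_predicate(elem):
    # type: (Dict[str, Any]) -> Any
    # For an empty dict, `and` short-circuits and returns the dict itself.
    return elem and 'something' in elem


def strict_predicate(elem):
    # type: (Dict[str, Any]) -> bool
    return bool(elem and 'something' in elem)


print(loose_predicate({}))
print(strict_predicate({}))
print(loose_predicate({'something': 1}))
```

For an empty dict the first version returns the dict itself and the second returns False. For non-empty input both return a real boolean.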
In conclusion, I want to remind you that type hinting is worth adding only to big projects. For small projects there is only one reason to add type hints: the project is a library.
Got a question? Hit me on Twitter: avkorablev