A One-liner That Validates Input Types to Your Functions At Runtime
Last Updated on January 13, 2023 by Editorial Team
Author(s): Shangjie Lyu
Originally published on Towards AI.
with source code explanation and simplified implementation
Note: this article is part of my series “Simple Techniques and Tools for Data Science”, a collection of powerful, complementary tools. It is based on mypy 0.991, pydantic 1.10.2 and beartype 0.11.0.
The one-liner to be discussed here is the validate_arguments decorator provided by pydantic. I will also explain the source code and implement a simplified version at the end!
1. Type hints
If you are already familiar with type hints, feel free to skip this section; if not, I’ll explain them with a simple example.
Most statically typed languages, such as Java, require the argument and return types of a function to be declared.
For example, a Java function declared as int addTwoNums(int num1, int num2) tells us that both arguments, num1 and num2, and the return value are of type int. In its Python equivalent, we can declare the types in the docstring or in separate documentation.
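Before type hints existed, a Python version of this function would typically record the types in its docstring. Here is a minimal sketch. It assumes the Google docstring style, which is one common convention but not the only one, and it uses the hypothetical snake_case name add_two_nums:

```python
def add_two_nums(num1, num2):
    """Add two integers.

    Args:
        num1 (int): The first number.
        num2 (int): The second number.

    Returns:
        int: The sum of num1 and num2.
    """
    return num1 + num2
```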
However, Python has a more native way to declare them: type hints.
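A sketch of the same function with type hints follows, again using the illustrative name add_two_nums. The hints are documentation only. The second call passes two strings, and Python simply concatenates them without complaint:

```python
def add_two_nums(num1: int, num2: int) -> int:
    """Add two integers."""
    return num1 + num2


print(add_two_nums(1, 2))      # 3
print(add_two_nums("a", "b"))  # ab, because the hints are not checked at runtime
```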
As the above code shows, we can define the types of the arguments and the return using type hints (: for arguments and -> for return), so we don’t need to write them again in the docstring.
Type hints were introduced in Python 3.5 (PEP 484), so they are a relatively new feature. There is some debate about whether to use them at all because, after all, Python is a dynamically typed language and the hints are not enforced at runtime. Personally, I think adding type hints is good practice, and I do use them in my work: they improve code readability, make documentation easier, and enable various type-related checks, as I will introduce below.
2. Validate input types against type hints
There are several tools that allow us to check the code based on type hints.
2.1. Static checker
As the name suggests, a static type checker analyzes your code without running it. In other words, it can only check the argument types of function calls that already exist in your code base.
mypy (14.4k stars on GitHub) is probably the most popular tool of this kind. Alternatives include Pyright, Pytype, and Pyre.
For example, suppose we have a script that calls a type-hinted function with arguments of the wrong type:
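Below is a minimal sketch of such a script, saved under the hypothetical name script.py. Running mypy script.py should report an incompatible-type error for the float argument. Running python script.py succeeds without complaint, because the interpreter ignores the hints:

```python
# script.py (hypothetical file name)
def add_two_nums(num1: int, num2: int) -> int:
    return num1 + num2


# mypy flags this call (float is not compatible with int),
# but at runtime Python happily returns 3.5
result = add_two_nums(1.5, 2)
print(result)
```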
Running mypy on the script detects these type errors. However, mypy cannot check calls that only happen at runtime, for example when a colleague imports your function and calls it with their own data.
For more on static code checking and linting, see my series “Write Production-ready Code for Data Science”, where I will cover this topic in a dedicated article.
2.2. Runtime checker
Suppose we are developing a Python package for our team and want to validate the input types of our functions at runtime, when colleagues call them. We can simply add the validate_arguments decorator from pydantic (12k stars on GitHub) to each function.
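Here is a minimal sketch of the decorator in use. It assumes pydantic 1.x; in pydantic 2.x the decorator still exists but is deprecated. The function name add_two_nums is illustrative:

```python
from pydantic import ValidationError, validate_arguments


@validate_arguments
def add_two_nums(num1: int, num2: int) -> int:
    """Add two integers."""
    return num1 + num2


print(add_two_nums(1, 2))  # 3

try:
    add_two_nums("one", 2)  # "one" cannot be interpreted as an int
except ValidationError as err:
    print(err)  # the message names the offending argument
```

Be aware that pydantic validates leniently by default. A value that can be converted, such as the string "1", is coerced to the integer 1 instead of being rejected. Annotating the argument with pydantic's StrictInt instead of int turns this coercion off.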
The examples are quite self-explanatory, and we can see that the decorator checks the inputs against the type hints and raises a ValidationError if the types mismatch.
Pydantic is not the only one that offers this functionality; there are other decorators achieving the same thing, such as beartype (1.4k stars on GitHub).
However, beartype’s error messages are clearly not as readable as the ones from pydantic.
3. Allow arbitrary types with pydantic
By default, the validate_arguments decorator only supports types that pydantic knows how to validate, such as Python built-in types. Often, though, we want to pass a pandas DataFrame as an argument, and we can do so by allowing arbitrary types in the config as below:
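Here is a minimal sketch of such a function, assuming pydantic 1.x and pandas. The name get_dataframe_head matches the one used later in this article. The parameter names df and n and the default of 5 rows are my own assumptions:

```python
import pandas as pd
from pydantic import ValidationError, validate_arguments


# arbitrary_types_allowed lets pydantic accept classes it has no built-in
# validator for, such as pd.DataFrame
@validate_arguments(config=dict(arbitrary_types_allowed=True))
def get_dataframe_head(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the first n rows of a DataFrame."""
    return df.head(n)


df = pd.DataFrame({"x": range(10)})
print(get_dataframe_head(df, 3))  # the first three rows

try:
    get_dataframe_head([1, 2, 3], 3)  # a list is not a DataFrame
except ValidationError as err:
    print(err)
```

For arbitrary types like this, pydantic can only perform an isinstance check. It will not convert a list into a DataFrame the way it converts "1" into 1.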
This example function simply returns the first n rows of a dataframe, and we can see from the examples that the validate_arguments decorator can now validate the pd.DataFrame type as well.
Caveat: as the official documentation states, the validate_arguments decorator is in beta; it was added to pydantic in v1.5 on a provisional basis. It may change in future releases, and its interface will not be finalized until v2. (In pydantic v2, it was in fact deprecated in favor of validate_call.)
4. (Optional) How does it work
If you are an intermediate to advanced user of Python, then the last two sections are for you!
While a detailed breakdown of the source code is beyond this article’s scope, let’s look at the two most fundamental pieces of the puzzle: type hints and user inputs.
4.1. Access type hints from a function’s attribute
If we have a function with type hints, we can access the type information in the function’s __annotations__ attribute. For example:
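Here is a minimal sketch using the illustrative add_two_nums function. One annotation is deliberately written as a string, as a forward reference. This shows why a helper such as the standard library's typing.get_type_hints is useful on top of the raw __annotations__ dictionary:

```python
from typing import get_type_hints


def add_two_nums(num1: int, num2: "int") -> int:
    """Add two integers; num2 uses a string (forward-reference) annotation."""
    return num1 + num2


# The raw annotations, stored exactly as written
print(add_two_nums.__annotations__)
# {'num1': <class 'int'>, 'num2': 'int', 'return': <class 'int'>}

# typing.get_type_hints resolves string annotations into real types
print(get_type_hints(add_two_nums))
# {'num1': <class 'int'>, 'num2': <class 'int'>, 'return': <class 'int'>}
```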
And this is used in the _typing_extra.get_type_hints function in pydantic.
4.2. Access type hints from the function’s signature
We can also access the annotations from a function’s signature with Python’s built-in inspect module. For example:
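Here is a minimal sketch with the same illustrative function, this time with a default value. It shows that a signature exposes more than the raw annotations: each parameter's default and kind are available too:

```python
from inspect import signature


def add_two_nums(num1: int, num2: int = 10) -> int:
    """Add two integers."""
    return num1 + num2


sig = signature(add_two_nums)
print(sig)  # (num1: int, num2: int = 10) -> int

for name, param in sig.parameters.items():
    # Each Parameter object carries its annotation, default and kind
    print(name, param.annotation, param.default)
# num1 <class 'int'> <class 'inspect._empty'>
# num2 <class 'int'> 10

print(sig.return_annotation)  # <class 'int'>
```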
This is also used in the ValidatedFunction class in pydantic.
4.3. Access user inputs in a decorator
This will be easy to understand if you have experience with Python decorators. In short, a decorator is (mostly) a function that takes another function as its input, adds some behavior to it, and returns a new function, usually a wrapper that calls the original. The @ symbol is just syntactic sugar: placing @my_decorator above the definition of greet is equivalent to writing greet = my_decorator(greet) after it.
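Here is a minimal sketch of this kind of example. The names my_decorator and greet come from the text. The exact messages printed and the second function, greet_plain, are my own illustrative choices:

```python
def my_decorator(func):
    """Return a wrapper that prints a message before and after calling func."""

    def wrapper(*args, **kwargs):
        # args and kwargs are exactly what the caller passed, e.g. ("John",)
        print(f"Before calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"After calling {func.__name__}")
        return result

    return wrapper


# Method 1: the @ syntax
@my_decorator
def greet(name):
    return f"Hello, {name}!"


# Method 2: the same thing without the @ symbol
def greet_plain(name):
    return f"Hello, {name}!"


greet_plain = my_decorator(greet_plain)

print(greet("John"))
print(greet_plain("John"))
```

Without functools.wraps, both decorated functions now report the wrapper's name (wrapper) instead of their own.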
In this simple example, my_decorator prints a message before and after executing the original greet function. Without diving into more details about decorators, the important takeaway here is that all the arguments to the original function (args and kwargs; in this example, “John”) are first received by the wrapper that the decorator returns, and that’s how the decorator can get the user inputs.
5. (Advanced) Let’s implement our own!
Having learned the fundamental parts of the validate_arguments decorator, why don’t we implement a simplified version of our own?
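Below is a minimal sketch of such a decorator, built only from the standard-library pieces discussed above. It is my own reconstruction, not the author's exact code. It deliberately checks only plain classes such as int, str or pd.DataFrame. Parameterised generics like list[int], Optional[...] and other typing constructs are skipped rather than checked:

```python
from functools import wraps
from inspect import Parameter, signature
from typing import get_origin


def validate_arguments(func):
    """Simplified runtime type checker (a sketch, not pydantic's implementation)."""
    sig = signature(func)
    # Expected argument names and types, skipping *args/**kwargs and
    # parameters without annotations
    expected_types = {
        name: param.annotation
        for name, param in sig.parameters.items()
        if param.annotation is not Parameter.empty
        and param.kind not in (Parameter.VAR_POSITIONAL, Parameter.VAR_KEYWORD)
    }

    @wraps(func)  # copy __name__, __doc__, __annotations__, ... onto wrapper
    def wrapper(*args, **kwargs):
        # Map the caller's positional/keyword arguments to parameter names;
        # bind() itself raises TypeError for missing or unexpected arguments
        bound = sig.bind(*args, **kwargs)
        # Also include default values for arguments the caller omitted
        bound.apply_defaults()

        for name, value in bound.arguments.items():
            expected = expected_types.get(name)
            # Only check plain classes; get_origin() is not None for
            # generics such as list[int] or Optional[int]
            if (
                isinstance(expected, type)
                and get_origin(expected) is None
                and not isinstance(value, expected)
            ):
                raise TypeError(
                    f"Argument {name!r} expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@validate_arguments
def add_two_nums(num1: int, num2: int = 10) -> int:
    """Add two integers."""
    return num1 + num2


print(add_two_nums(1, 2))  # 3
print(add_two_nums(5))     # 15, and the default num2 is validated too

try:
    add_two_nums("1", 2)
except TypeError as err:
    print(err)  # Argument 'num1' expected int, got str
```

Unlike pydantic, this sketch never coerces values: "1" is rejected outright rather than converted.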
Perfect! We have now implemented a validate_arguments decorator ourselves, and as we can see from the examples provided, it works exactly as we would expect. A few key points here:
- First, we use signature(func).parameters to obtain the expected argument names and types
- We then use signature(func).bind(*args, **kwargs) to read all parameter names and values of the function call, including positional and keyword arguments; default values are only included once apply_defaults() is called on the bound result
- Finally, we check each input value against its expected type and call the original function if all types are valid; otherwise, we raise a TypeError
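Here is a small sketch of the binding step, again with the illustrative add_two_nums. It shows that bind maps positional and keyword arguments onto parameter names in signature order. Defaults only appear after apply_defaults() is called. dict() is used so the printed form is the same across Python versions:

```python
from inspect import signature


def add_two_nums(num1: int, num2: int = 10) -> int:
    return num1 + num2


sig = signature(add_two_nums)

# Keyword arguments are mapped to names, in the signature's order
print(dict(sig.bind(num2=3, num1=1).arguments))  # {'num1': 1, 'num2': 3}

bound = sig.bind(5)
print(dict(bound.arguments))  # {'num1': 5}, with no default yet

bound.apply_defaults()
print(dict(bound.arguments))  # {'num1': 5, 'num2': 10}
```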
You might have noticed the @wraps decorator from functools (a built-in Python module), which also appears in pydantic’s source code. It copies the metadata (name, docstring, annotations, etc.) of the original function onto the wrapper, so the decorated function keeps the original’s information; this is good practice when writing decorators. As a result, the metadata of the decorated get_dataframe_head function, e.g. its __doc__ and __annotations__ attributes, is preserved rather than replaced by the wrapper’s.
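Here is a minimal side-by-side sketch of what @wraps changes. The decorator names without_wraps and with_wraps are hypothetical, and add_two_nums stands in for any decorated function:

```python
from functools import wraps


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    @wraps(func)  # copy func's metadata onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def add_two_nums(num1: int, num2: int) -> int:
    """Add two integers."""
    return num1 + num2


plain = without_wraps(add_two_nums)
wrapped = with_wraps(add_two_nums)

print(plain.__name__, plain.__doc__)      # wrapper None
print(wrapped.__name__, wrapped.__doc__)  # add_two_nums Add two integers.
print(wrapped.__annotations__)  # {'num1': <class 'int'>, 'num2': <class 'int'>, 'return': <class 'int'>}
print(wrapped.__wrapped__ is add_two_nums)  # True: wraps also keeps a link to the original
```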
Of course, I don’t expect my simplified implementation to beat pydantic’s, which is far more thorough and has been tested against many edge cases. Nevertheless, I hope you found this helpful and learned something new or gained some inspiration. (That said, our simple decorator actually works pretty well in most cases we would encounter; one example of an edge case that pydantic’s decorator passes and ours fails is a classmethod.)
Thanks for reading, and your feedback is more than welcome. You could also follow or connect with me on LinkedIn, where I created the hashtag #100ArticlesForDS (100 Articles for Data Scientists) to share data science articles that I find insightful and helpful, alongside my comments, thoughts, and additional practical tips.
A One-liner That Validates Input Types to Your Functions At Runtime was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI