# Python libraries and tools to make your scripts suck less

If you're an SRE or DevOps engineer, you'll usually find yourself writing scripts to automate tasks. But a collection of scripts, probably written by different members of the team, with different standards and tools, will soon become unmaintainable and trigger conversations such as "Can the script do this?" or "How do I make this change?". Sometimes, you may even have those conversations with yourself, because after a while you no longer understand your own code.

If you think of your scripts not so much as a set of commands run one after another but as command-line applications, which should meet certain standards and follow good coding practices, they'll become much more user-friendly and maintainable.

**Disclaimers**:

- This list should be especially useful for people writing scripts from scratch. If you're working on a big project as a developer, it's likely that many of the tools I'll describe (or some alternative) are already in use. But if they aren't, keep reading!
- Many of the libraries I'll list are extremely popular, and some of them are even part of the standard library. My goal here is not to introduce them but to explain why you should be using them.

---

## 1. `PyYAML`: load your configs and data from YAML files

In the infrastructure as code world, YAML is everywhere, and you and your team are probably very familiar with it. So why not use it in your CLI apps?

For example, instead of passing arguments that are unlikely to change with every run (such as auth credentials for an API), you can create a YAML configuration file and load that info from there. If your CLI app requires certain data (an example could be setting up monitoring alerts, with all their parameters), you can also define it in YAML format. That way, people used to dealing with infrastructure definitions will find the data easier to read and modify.
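As a minimal sketch, here's what loading such a config might look like (the keys and values are hypothetical):

```python
import yaml

# Hypothetical config contents; in a real app this would live in a
# config.yaml file maintained by your team.
raw_config = """
api:
  user: alice
  token: s3cr3t
retries: 3
"""

# safe_load parses the YAML document into plain Python objects
config = yaml.safe_load(raw_config)
print(config["api"]["user"])  # -> alice
print(config["retries"])      # -> 3
```

With a real file, you'd pass the open file object to `yaml.safe_load` instead of a string.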
Some useful features:

- `yaml.safe_load`: loads YAML from a file and converts it to a Python dictionary. Prefer it over plain `yaml.load`, which can construct arbitrary Python objects from untrusted input.
- `yaml.dump`: creates a YAML document from a Python dictionary.

You can find a detailed tutorial and the full list of features in the [official documentation](https://pyyaml.org/wiki/PyYAMLDocumentation).

Don't like YAML? Check out [TOML][] and [its Python library](https://pypi.org/project/toml/).

## 2. `click`: create intuitive command-line interfaces

Everybody's using [click][], and there's a reason for that. Not only does it make developing command-line applications much easier than the `argparse` package does, but it's also so flexible and has so many features that [a whole book](https://paiml.com/docs/home/books/python-command-line-tools/) has been written about it. Moreover, [its official documentation](https://click.palletsprojects.com/en/8.0.x/) is very clear and complete.

Some useful features:

- `click.option` and `click.argument`: you can use these decorators to add options and arguments to your app with plenty of cool features such as autogenerated help strings, type validation, etc.
- [Nested commands](https://click.palletsprojects.com/en/8.0.x/commands/): this is a game changer in turning your scripts into a full-blown CLI app.

If [click][] looks a bit overwhelming, see [python-fire][].

## 3. `sh`: run shell commands as functions

In many cases, you'll need to run shell commands from your script/app. The `subprocess` module is the obvious choice, but there are other alternatives that can make your life easier (and your code cleaner). As its authors state, [sh][] is a "full-fledged subprocess replacement for Python 2.6-3.8, PyPy and PyPy3 that allows you to call any program as if it were a function".
For example, compare how you can get the following output with `subprocess` vs [sh][]:

```
$ ls -l
total 32
drwxrwxr-x 8 user user 4096 jul  7 08:22 Code
drwxr-xr-x 2 user user 4096 mar 29 19:02 Desktop
drwxr-xr-x 9 user user 4096 jul  4 07:27 Documents
drwxr-xr-x 7 user user 4096 jun 29 21:45 Downloads
drwxr-xr-x 2 user user 4096 mar 29 19:02 Public
drwxr-xr-x 3 user user 4096 mar 29 19:25 snap
```

```python
import subprocess

p = subprocess.Popen(
    ["ls", "-l", "/home/user"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)
output = p.communicate()
print(output[0].decode('utf-8'))
```

```python
from sh import ls

output = ls("-l", "/home/user")
print(output)
```

As you can see, the same result can be achieved with fewer lines of code, and it's more readable since the commands look more like functions. [sh][] provides a much simpler interface for interacting with the shell.

Some useful features:

- [Baking](https://amoffat.github.io/sh/sections/baking.html): if you're using certain arguments over and over again with the same command, you can "bake" them into the command to generate a new one.
- It also supports [piping](https://amoffat.github.io/sh/sections/piping.html).

If you mostly need to run commands to monitor and manage system resources, [psutil][] could be a better choice.

## 4. `logging`: get better execution and debug info

As the script runs, it's very important to know what it's doing, so that if it fails or produces an unexpected result you have information about what happened. You may be tempted to use print statements, since they are friendly and straightforward. Using the Python [logging][] module, however, is a better option. You can find a complete tutorial in the official Python documentation: the [logging cookbook][].

Some useful features:

- Levels: you can log at different levels, which allows you to add very detailed messages but choose at execution time whether you want to see them all or just the most important ones (for example, by adding a `--debug` flag).
- Handlers: you can define multiple handlers that allow you, for example, to log to the console *and* to a file.
- Contextual information: this is a bit more advanced, but if you need to add contextual information to your logs (such as process or networking information), [logging][] provides a few different ways to do it.

## 5. `pathlib`: handle your file paths in a simple and consistent way

Most scripts will likely include a decent amount of file handling, and since file paths are basically strings, there are plenty of low-level details to pay attention to. For example, to form a path from a directory specified by the user and a file name, you need to check whether the directory path ends in `/` before joining them, to avoid duplicating it (or trust that users will always do what is expected, which we know is not a good option!). Also, different operating systems follow different standards, so your script will not be portable.

The [pathlib][] module solves all those problems by providing an intuitive interface for dealing with file paths. There's no need for paths to be represented as strings anymore. If you're familiar with `os.path`, you'll find [pathlib][] quite similar, but with some extra features that are really handy.

Some useful features:

- Create paths by passing string arguments to the `Path` constructor, or join them with `/`: for example, `Path('/home/user', 'code', 'script.py')` will create the `Path` object `/home/user/code/script.py`, without the need to manipulate strings.
- Read or write to files with a single line: `Path("my-file.txt").read_text()`.

## 6. `concurrent.futures`: speed up your code

Many of your CLI apps will need to read multiple files or make many API requests. Using a loop is fine if you only need to make a few, but when they start piling up they'll really slow down execution. For example, imagine you need to scrape events from all your GitHub repositories.
You create a repo list and the following functions:

```python
import json
import time

import requests

repos = ["my-first-app", "my-docs", "my-helper-scripts"]

def get_repo_events(repo: str) -> list:
    url = f"https://api.github.com/repos/muripic/{repo}/events"
    response = requests.get(url)
    return json.loads(response.content.decode("utf-8"))

def print_repo_events(events: list):
    for e in events:
        print(f"EVENT: Repo: {e['repo']['name']}, Type: {e['type']}")
```

You could do something like this:

```python
start = time.time()

for r in repos:
    events = get_repo_events(r)
    print_repo_events(events)

end = time.time()
print(f"Time elapsed: {end - start}")
```

This code will make each request, wait for the response, get the content and then move on to the next one. If you use threading, you can make many requests at once. Concurrency can be complex and Python offers many tools, but for simple cases like this the [concurrent.futures][] module includes a `ThreadPoolExecutor` that should be enough for your needs and is relatively easy to use.

```python
from concurrent.futures import ThreadPoolExecutor

start = time.time()

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(get_repo_events, repos)
    for events in results:
        print_repo_events(events)

end = time.time()
print(f"Time elapsed: {end - start}")
```

The `time.time()` statements are included only to measure how long each version takes. I encourage you to run these examples against your own GitHub repos and see the difference. If you want to learn more about concurrency and threading, check out these really cool tutorials from Real Python:

- [Speed up your Python program with concurrency](https://realpython.com/python-concurrency/)
- [Intro to Python threading](https://realpython.com/intro-to-python-threading/)

## 7. `typing` + `mypy`: improve code readability and prevent bugs

Python is a dynamically-typed language, so there's no need to specify the type of each variable.
However, the [typing][] module, introduced in version 3.5, allows you to add very flexible and informative type annotations that can improve the readability of your code. Type hints will contribute to cleaner code, but if some of the types are wrong or do not reflect the runtime behavior of your variables, you won't find out unless you also use [mypy][]. [mypy][] is a static type checker that analyzes your type hints to ensure that they are consistent.

But why would you want to check typing? Well, just to give an example, imagine that you assume that a certain variable is of type `list` and you use the `append` method on it, only to find out at runtime, after an exception is thrown, that it's `None`. Not cool. [mypy][] will notice this and let you know, so you can prevent this kind of bug, which is extremely common.

Some useful features:

- `Union`: really handy when you want the type of your variable to be flexible but still restricted to a limited set of options. For example, if a function takes a path as a parameter, its type could be `Union[str, Path]`.
- `Optional`: if you have a variable whose value could also be `None`, use it like this: `Optional[List]`. That way, if you try to use a method that won't work on `None`, [mypy][] will let you know so you can handle that case properly.
- `Dict` and `List` embedded typing: dictionaries and lists in Python can have anything inside them, so for some complex variables it's very easy to forget what's supposed to be in there. For example, if you are using a dictionary as a registry for classes and methods, you might have something like this:

```python
# Noodles, Cookies, boil and bake are placeholder names for
# classes and functions defined elsewhere.
my_food_registry = {
    "noodles": {"class": Noodles, "method": boil},
    "cookies": {"class": Cookies, "method": bake},
}
```

The type hint for this object would be `Dict[str, Dict[str, Union[Type, Callable]]]`. This works as a reminder that its keys are strings and its values are dictionaries whose values can be functions (`Callable`) or objects (`Type`).
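To make the `None` example above concrete, here's a minimal sketch (the helper function and the report keys are hypothetical) of the kind of bug [mypy][] catches before runtime:

```python
from typing import List, Optional

# Hypothetical helper: returns None when the report has no "failed" section.
def get_failed_checks(report: dict) -> Optional[List[str]]:
    return report.get("failed")

failed = get_failed_checks({"passed": ["disk_check"]})

# failed.append("net_check")  # mypy flags this line: the value may be None
if failed is not None:  # narrowing the type satisfies both mypy and runtime
    failed.append("net_check")
print(failed)  # -> None, since the report had no "failed" section
```

Running `mypy` over the unguarded version reports the possible `None` long before the script hits production.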
If you accidentally include something else in the registry, [mypy][] will complain.

Check out this [Real Python tutorial](https://realpython.com/python-type-checking/) for a complete guide on how to use [typing][] and [mypy][].

## 8. `black`: format your code with a single command

When many people are working on a project, their formatting styles will inevitably vary. You can even make a mess in your own code if you need to make changes quickly. To keep everything clean and readable, consistent formatting is key. And to be consistent *and* painless, it needs to be automatic. [black][] does exactly that: it formats your code with "sensible defaults" so you can stop sweating the small stuff such as spacing and line length.

Some useful features:

- Although [black][] is meant to be opinionated and using the defaults should be fine, it provides configuration options, either through command-line options or a configuration file.
- You can pass directories to [black][] and skip certain files using regexes.

[black][] will do everything for you except sorting your imports: to do that, use [isort][].

## 9. `pipenv`: manage your environments and dependencies

Have you ever been to [dependency hell](https://xkcd.com/1987/)? Even if you haven't, because scripts usually don't have enough dependencies to cause that kind of chaos, it's quite common to install your dependencies locally and then forget to add them to your requirements files (sometimes, you don't even have one!). To make life easier for others (and yourself, should you need to recreate that virtualenv), keeping dependencies in order is crucial.

[pipenv][] does this for you: it creates a virtualenv and generates a `Pipfile` with your dependencies. Every time you install a package, it is added there.

Some useful features:

- View your dependency tree with `pipenv graph`: this can be very useful for debugging when you have dependency conflicts.
- If you miss your requirements files (or don't want to force others to use [pipenv][]), you can generate them with `pipenv lock -r` and `pipenv lock -r --dev` (for the *dev-requirements*).
- You can scan your dependency tree for vulnerabilities with `pipenv check`.

If you want to take it to the next level, [poetry][] is what you're looking for. Not only does it manage your dependencies, but you can also use it to build and package your app.

## 10. `cookiecutter`: automate project creation using templates

By now, you're probably thinking that all this looks really nice, but who has the time to install all these packages, create logger objects and write click commands for every single script? Is all this really worth doing over and over again? Fortunately, there is [cookiecutter][].

[cookiecutter][] is a tool that allows you to create Python projects from project templates. This will not only save you a lot of time and effort but will also enforce certain standards across your command-line tools, since every project will have the same structure.

Some useful features:

- There are plenty of [templates available](https://cookiecutter.readthedocs.io/en/1.7.2/README.html#available-cookiecutters), so there's a good chance that one of them suits your needs or can be used as a starting point (and as a bonus you'll surely discover amazing new tools in the process of finding the right one).
- It's also a command-line tool, so you can get started by just running `cookiecutter PATH_TO_YOUR_TEMPLATE`.
- You can create as many files and directories as you need (with limitless nesting).

---

Hope you found this helpful to share with your team and start working together on building better ~~scripts~~ CLI apps! Happy (clean, effective and efficient) coding!
---

*Acknowledgements*: I'd like to thank [Rodrigo Loredo](https://github.com/rloredo) and [Cesar Dutten](https://github.com/cdutten) for their very helpful and encouraging reviews, and [Carlos Duelo](https://github.com/carlosduelo) and [Javier Santacruz](https://github.com/jvrsantacruz), who taught me many of these things :)

[black]: https://black.readthedocs.io/en/stable/
[click]: https://click.palletsprojects.com/en/8.0.x/
[cookiecutter]: https://cookiecutter.readthedocs.io/en/1.7.2/
[concurrent.futures]: https://docs.python.org/3/library/concurrent.futures.html
[isort]: https://pycqa.github.io/isort/
[logging]: https://docs.python.org/3/library/logging.html
[logging cookbook]: https://docs.python.org/3/howto/logging-cookbook.html
[mypy]: https://mypy.readthedocs.io/en/stable/
[pathlib]: https://docs.python.org/3/library/pathlib.html
[pipenv]: https://github.com/pypa/pipenv
[poetry]: https://python-poetry.org/docs/
[psutil]: https://pypi.org/project/psutil/
[python-fire]: https://github.com/google/python-fire
[sh]: https://amoffat.github.io/sh/
[threading]: https://docs.python.org/3/library/threading.html
[TOML]: https://github.com/toml-lang/toml
[typing]: https://docs.python.org/3/library/typing.html