
Python libraries and tools to make your scripts suck less

If you're an SRE or DevOps engineer, you'll usually find yourself writing scripts to automate tasks. But a collection of scripts, probably written by different members of the team, with different standards and tools, will soon become unmaintainable and trigger conversations such as "Can the script do this?" or "How do I make this change?". Sometimes, you may even have those conversations with yourself because after a while you no longer understand your own code.

If you think of your scripts not so much as a set of commands run one after another but as command-line applications, which should meet certain standards and follow good coding practices, they'll become much more user-friendly and maintainable.

Disclaimers:

  • This list should be especially useful for people writing scripts from scratch. If you're working on a big project as a developer, it's likely that many of the tools I'll describe (or some alternatives) are already in use. But if they aren't, keep reading!
  • Many of the libraries I'll list are extremely popular, and some of them are even part of the standard library. My goal here is not to introduce them but to explain why you should be using them.

1. PyYAML: load your configs and data from yaml files

In the infrastructure as code world, YAML is everywhere, and you and your team are probably very familiar with it. So why not use it in your CLI apps? For example, instead of passing arguments that are not very likely to change with every run (such as auth credentials for an API), you can create a YAML configuration file and load that info from there. If your CLI app requires certain data (an example could be setting up monitoring alerts, with all their parameters), you can also define it in YAML format. That way, people used to dealing with infrastructure definitions will find the data easier to read and modify.

Some useful features:

  • yaml.load: parses YAML from a file or string and converts it to a Python dictionary (prefer yaml.safe_load unless you fully trust the input).
  • yaml.dump: creates a YAML document from a Python dictionary.
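
For example, here's a minimal sketch of loading API credentials from a config file (the file name and keys below are made up for illustration):

import yaml

# config.yaml is a hypothetical file containing, e.g.:
#   api:
#     url: https://api.example.com
#     token: my-secret-token
with open("config.yaml") as f:
    config = yaml.safe_load(f)  # safe_load won't execute arbitrary tags

print(config["api"]["url"])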

You can find a detailed tutorial and the full list of features in the official documentation.

Don't like YAML? Check out TOML and its Python library.

2. click: create intuitive command-line interfaces

Everybody's using click, and there's a reason for that. Not only does it make developing command-line applications much easier than the argparse package, but it's also so flexible and feature-rich that a whole book has been written about it. Moreover, its official documentation is clear and complete.

Some useful features:

  • click.option and click.argument: you can use these decorators to add options and arguments to your app, with plenty of cool features such as autogenerated help strings, type validation, etc.
  • Nested commands: this is a game changer in turning your scripts into a full-blown CLI app.
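
As a minimal sketch, a small click command could look like this (the command name and options are made up):

import click


@click.command()
@click.argument("name")
@click.option("--shout", is_flag=True, help="Print the greeting in uppercase.")
def greet(name, shout):
    """Greet NAME on the command line."""
    message = f"Hello, {name}!"
    click.echo(message.upper() if shout else message)


if __name__ == "__main__":
    greet()

Running the script with --help prints an autogenerated usage message built from the docstring and the option's help text.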

If click looks a bit overwhelming, see python-fire.

3. sh: run shell commands as functions

In many cases, you'll need to run shell commands from your script/app. The subprocess module is the obvious choice, but there are alternatives that can make your life easier (and your code cleaner). As its authors state, sh is a "full-fledged subprocess replacement for Python 2.6-3.8, PyPy and PyPy3 that allows you to call any program as if it were a function".

For example, compare how you can get the following output with subprocess vs sh:

$ ls -l
total 32
drwxrwxr-x 8 user user 4096 jul  7 08:22 Code
drwxr-xr-x 2 user user 4096 mar 29 19:02 Desktop
drwxr-xr-x 9 user user 4096 jul  4 07:27 Documents
drwxr-xr-x 7 user user 4096 jun 29 21:45 Downloads
drwxr-xr-x 2 user user 4096 mar 29 19:02 Public
drwxr-xr-x 3 user user 4096 mar 29 19:25 snap

# With subprocess:
import subprocess

p = subprocess.Popen(
    ["ls", "-l", "/home/user"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)
stdout, stderr = p.communicate()
print(stdout.decode("utf-8"))

# With sh:
from sh import ls

output = ls("-l", "/home/user")
print(output)

As you can see, the same result can be achieved with fewer lines of code, and it's more readable since the commands look more like function calls. sh provides a much simpler interface for interacting with the shell.

Some useful features:

  • Baking: if you're using certain arguments over and over again with the same command, you can "bake" them into it to generate a new command (see the sketch after this list).
  • It also supports piping.
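
For instance, baking might look like this (a quick sketch; the command and arguments are arbitrary):

from sh import ls

# bake the "-l" flag into ls to get a new, reusable command
ls_l = ls.bake("-l")
print(ls_l("/home/user"))  # equivalent to: ls -l /home/user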

If you mostly need to run commands to monitor and manage system resources, psutil could be a better choice.

4. logging: get better execution and debug info

As the script runs, it's very important to know what it's doing, so that if it fails or produces an unexpected result you have information about what happened. You may be tempted to use print statements, since they're friendly and straightforward. Using the Python logging module, however, is a better option. You can find a complete tutorial in the official Python documentation: the Logging Cookbook.

Some useful features:

  • Levels: you can log at different levels, which allows you to add very detailed messages but choose at execution time if you want to see them all or just the most important ones (for example, adding a --debug flag).
  • Handlers: you can define multiple handlers that allow you, for example, to log to the console and to a file.
  • Contextual information: this is a bit more advanced, but if you need to add contextual information to your logs (such as process or networking information), logging provides a few different ways to do it.
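
Here's a minimal sketch combining levels and handlers (the logger name, file name and format are arbitrary):

import logging

logger = logging.getLogger("my-script")
logger.setLevel(logging.DEBUG)

console = logging.StreamHandler()
console.setLevel(logging.INFO)  # only INFO and above reach the console

logfile = logging.FileHandler("my-script.log")
logfile.setLevel(logging.DEBUG)  # everything is written to the file

formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
console.setFormatter(formatter)
logfile.setFormatter(formatter)

logger.addHandler(console)
logger.addHandler(logfile)

logger.debug("Only in the file")
logger.info("On the console and in the file")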

5. pathlib: handle your file paths in a simple and consistent way

Most scripts will likely include a decent amount of file handling, and since file paths are basically strings, there are plenty of low-level details to pay attention to. For example, to form a path from a directory specified by the user and a file name, you need to check if the path to the directory ends in / before joining them to avoid duplicating it (or trust that the users will always do what is expected, which we know is not a good option!). Also, different operating systems follow different standards, so your script will not be portable.

The pathlib module solves all those problems by providing an intuitive interface to deal with file paths. There's no need for paths to be represented as strings anymore. If you're familiar with os.path, you'll find pathlib quite similar, but with some extra features that are really handy.

Some useful features:

  • Create paths from strings by passing arguments to Path, or join them with the / operator: for example, Path('/home/user', 'code', 'script.py') will create the Path object /home/user/code/script.py, without any string manipulation.
  • Read or write to files with a single line: Path("my-file.txt").read_text().
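
A quick sketch of the basics (the paths are made up):

from pathlib import Path

base = Path("/home/user")  # e.g. a directory supplied by the user
config = base / "code" / "config.yaml"  # the / operator adds separators for you

if config.exists():
    print(config.read_text())

print(config.name)    # config.yaml
print(config.suffix)  # .yaml
print(config.parent)  # /home/user/code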

6. concurrent.futures: speed up your code

Many of your CLI apps will need to read multiple files or make many API requests. Using a loop is fine if you only need to make a few, but when they start piling up they'll really slow down execution.

For example, imagine you need to scrape events from all your GitHub repositories. You create a repo list and the following functions:

import json
import time

import requests

repos = ["my-first-app", "my-docs", "my-helper-scripts"]


def get_repo_events(repo: str) -> list:
    url = f"https://api.github.com/repos/muripic/{repo}/events"
    response = requests.get(url)
    return json.loads(response.content.decode("utf-8"))


def print_repo_events(events: list):
    for e in events:
        print(f"EVENT: Repo: {e['repo']['name']}, Type: {e['type']}")

You could do something like this:

start = time.time()

for r in repos:
    events = get_repo_events(r)
    print_repo_events(events)

end = time.time()
print(f"Time elapsed: {end - start}")

This code will make each request, wait for the response, get the content and then move on to the next one.

If you use threading, you can make many requests at once. Concurrency can be complex and Python offers many tools, but for simple cases like this the concurrent.futures module includes a ThreadPoolExecutor that should be enough for your needs and is relatively easy to use.

from concurrent.futures import ThreadPoolExecutor

start = time.time()

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(get_repo_events, repos)

for events in results:
    print_repo_events(events)

end = time.time()
print(f"Time elapsed: {end - start}")

The time.time() calls are included only to measure how long each version takes. I encourage you to run these examples against your own GitHub repos and see the difference.

If you want to learn more about concurrency and threading, check out Real Python's tutorials on those topics.

7. typing + mypy: improve code readability and prevent bugs

Python is a dynamically-typed language, so there's no need to specify the type of each variable. However, the typing module, introduced in version 3.5, allows you to add very flexible and informative type annotations that can improve the readability of your code.

Type hints will contribute to cleaner code, but if some of them are wrong or don't reflect the runtime behavior of your variables, you won't find out unless you also use mypy. mypy is a static type checker that analyzes your code and makes sure your annotations are consistent.

But why would you want to check typing? Well, to give an example, imagine you assume a certain variable is of type list and use the append method on it, only to find out at runtime, after an exception is thrown, that it's actually None. Not cool. mypy will notice and let you know, so you can prevent this extremely common kind of bug.

Some useful features:

  • Union: really handy when you want the type of your variable to be flexible but still restricted to a limited set of options. For example, if a function takes a path as a parameter, its type could be Union[str, Path].
  • Optional: if you have a variable whose value could also be None, use it like this: Optional[List]. That way, if you try to use a method that won't work on None, mypy will let you know so you can handle that case properly.
  • Dict and List embedded typing: dictionaries and lists in Python can have anything inside them, so for some complex variables it's very easy to forget what's supposed to be in there. For example, if you are using a dictionary as a registry for classes and methods, you might have something like this:
my_food_registry = {
    "noodles": {
        "class": Noodles,        # the class itself
        "method": Noodles.boil,  # a function
    },
    "cookies": {
        "class": Cookies,
        "method": Cookies.bake,
    },
}

The type hint for this object would be Dict[str, Dict[str, Union[Type, Callable]]]. This works as a reminder that its keys are strings and its values are dictionaries whose values can be functions (Callable) or classes (Type). If you accidentally include something else, mypy will complain.
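
To tie these together, here's a short sketch (the function and file name are made up) of hints that mypy can check:

from pathlib import Path
from typing import Optional, Union


def read_config(path: Union[str, Path]) -> Optional[str]:
    """Return the file's contents, or None if it doesn't exist."""
    path = Path(path)
    if not path.exists():
        return None
    return path.read_text()


config = read_config("config.yaml")
# config.upper()  <- mypy would flag this: config might be None
if config is not None:
    print(config.upper())  # after the check, mypy knows config is a str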

Check out this Real Python tutorial for a complete guide on how to use typing and mypy.

8. black: format your code with a single command

When many people are working on a project, their formatting styles will inevitably vary. You can even make a mess in your own code if you need to make changes quickly. To keep everything clean and readable, consistent formatting is key. And for it to be consistent and painless, it needs to be automatic. black does exactly that: it formats your code with "sensible defaults" so you can stop sweating the small stuff such as spacing and line length.

Some useful features:

  • Although black is meant to be opinionated and the defaults should be fine, it provides configuration options, both through command-line flags and a configuration file.
  • You can pass directories to black and skip certain files using regexes.
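
Typical usage looks like this (the paths and the regex are just examples):

# format a file or a whole directory in place
$ black my_script.py
$ black src/

# check formatting without modifying any files (handy in CI)
$ black --check src/

# skip files whose paths match a regex
$ black src/ --exclude "generated_.*\.py"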

black will do everything for you except sorting your imports: to do that, use isort.

9. pipenv: manage your environments and dependencies

Have you ever been to dependency hell? Even if you haven't, because scripts usually don't have enough dependencies to cause that kind of chaos, it's quite common to install your dependencies locally and then forget to add them to your requirements file (sometimes you don't even have one!). To make life easier for others (and yourself, should you need to recreate that virtualenv), keeping dependencies in order is crucial. pipenv does this for you: it creates a virtualenv and generates a Pipfile with your dependencies. Every time you install a package, it is added there.

Some useful features:

  • View your dependency tree with pipenv graph: this can be very useful for debugging when you have dependency conflicts.
  • If you miss your requirements files (or don't want to force others to use pipenv), you can generate them with pipenv lock -r and pipenv lock -r --dev (for the dev requirements).
  • You can scan your dependency tree for vulnerabilities with pipenv check.
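
The day-to-day workflow boils down to a few commands (the package names are just examples):

# create the virtualenv if needed and add a package to the Pipfile
$ pipenv install requests

# add a development-only dependency
$ pipenv install --dev black

# run a command inside the virtualenv
$ pipenv run python my_script.py

# or open a shell in it
$ pipenv shell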

If you want to take it to the next level, poetry is what you are looking for. Not only does it manage your dependencies but you can also use it to build and package your app.

10. cookiecutter: automate project creation using templates

By now, you're probably thinking that all this looks really nice, but who has the time to install all these packages, create logger objects and define click commands for every single script? Is all this really worth doing over and over again?

Fortunately, there is cookiecutter, a tool that allows you to create Python projects from project templates. This will not only save you a lot of time and effort but will also enforce certain standards across your command-line tools, since every project will have the same structure.

Some useful features:

  • There are plenty of templates available, so there's a good chance that one of those suits your needs or can be used as a starting point (and as a bonus you'll surely discover new amazing tools in the process of finding the right one).
  • It's also a command-line tool, so you can get started by just running cookiecutter PATH_TO_YOUR_TEMPLATE (see the example after this list).
  • You can create as many files and directories as you need (with limitless nesting).
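
Templates can be local paths or git URLs. For example, this uses a well-known packaging template hosted on GitHub:

$ cookiecutter https://github.com/audreyfeldroy/cookiecutter-pypackage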

Hope you found this helpful to share with your team and start working together on turning your scripts into better CLI apps! Happy (clean, effective and efficient) coding!


Acknowledgements: I'd like to thank Rodrigo Loredo and Cesar Dutten for their very helpful and encouraging reviews, and Carlos Duelo and Javier Santacruz, who taught me many of these things :)
