Python for teams
@ Openthought · Saturday, Jun 27, 2020 · 13 minute read · Update at Jun 27, 2020

Whether you’re a beginner or you’ve been coding for years, it’s easy to get lazy and just get by on what you already know. Churning out code or maintaining a creaking code base is a bit depressing, but it’s relatively easy to do and you can plod along like that for a while. But there comes a time when it feels like you need to up your game. You can feel the technical debt racking up and you think there should be a better way to do this.

It’s time to be a responsible coding citizen. Even if you’re a team of one, at some point lazy choices will come back and bite you.

You should always code as if you’re in a team, even if that team is just you.

Documentation: Even a little is better than nothing

We don’t like doing it, and it feels like something that gets in the way of all the more important, urgent and, let’s not be coy here, interesting things that could be done. But documentation is one of the things that greases the wheels of productivity once all the fun experimentation and architecture design are all but forgotten. You’ll always have to share your codebase with somebody unfamiliar with it, even if it’s yourself in six months’ time when you have to fix something.

PyDoc and type hints

The first thing to start with is commenting your code and writing docstrings that tools like Sphinx can pick up to generate nice looking, searchable documentation. My recommendation for Python is to use the Google style of docstrings. One of the main reasons I’ve found for putting clear documentation in the code is language-aware editors: it lets them pop up the parameter list and description as you type a function name. And in that cramped pop-up view, the Google style docstring looks the least bad.

So what does it do again?
    def set_location(self, latitude, longitude):
        # type: (float, float) -> None
        """Sets the map location of the object.
        
        The latitude and longitude are set for the vehicle so it
        can be used for distance calculation.
        
        Args:
            latitude (float): latitude of the map location
            longitude (float): longitude of the map location
        """
        self._latitude = latitude
        self._longitude = longitude

The documentation gives a clear idea of the data types that a developer should be passing to a function. But in addition to the developer documentation, type hinting comments give the IntelliSense/autocomplete features of the editor enough information to offer class methods, parameters and properties as you type. All of this speeds up the development process, for yourself and for people coming to the code cold.

Is it just the right type?
from typing import List, Optional

...
    def __init__(self, model, colour):
        # type: (str, str) -> None
        self._model = model                 # type: str
        self._colour = colour               # type: str
        self._latitude = None               # type: Optional[float]
        self._longitude = None              # type: Optional[float]
        self._gears = []                    # type: List[str]

You can help your code editor along with its Jedi mind trick of divining classes and types by using inline type hints during variable assignment. For collections and mappings use the typing module. Here’s a summary of the way to use type hints that are compatible with Python 2 and 3.
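If the code only ever needs to run on Python 3, the same hints can be written as annotations instead of comments. This is a minimal sketch: the `Vehicle` class name and the sample values are assumptions made for illustration, wrapped around the attributes from the snippets above.

```python
from typing import List, Optional


class Vehicle:
    """Hypothetical class wrapping the attributes used in the snippets above."""

    def __init__(self, model: str, colour: str) -> None:
        self._model: str = model
        self._colour: str = colour
        # No location is known until set_location() is called.
        self._latitude: Optional[float] = None
        self._longitude: Optional[float] = None
        self._gears: List[str] = []

    def set_location(self, latitude: float, longitude: float) -> None:
        """Sets the map location of the object."""
        self._latitude = latitude
        self._longitude = longitude


# Sample values, purely illustrative.
car = Vehicle("hatchback", "red")
car.set_location(51.5, -0.12)
```

The comment form stays the safer choice while Python 2 still has to be supported, because annotation syntax on variables is a syntax error there.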

Getting it all up and running

With a reasonably complex project, the most difficult thing is getting another developer up and running with a working development environment so they can get on with the business of development. Writing a good setup guide and making the prerequisites clear is crucial. Ideally, it would be supplemented by a bunch of setup scripts that help them install dependencies. When setting up a git repo, having a default README file is usually one of the options, and for good reason. Use it! It’s your project’s front page, so make a good impression.

A bit of explanation

In addition to generating all your API documentation, Sphinx can also create nice looking HTML from reStructuredText. The way it handles linking between pages and navigation menus is a bit obscure and it will leave you scratching your head a few times. But it’s useful and it’s a lot quicker than writing your own HTML.
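As a starting point, a `docs/conf.py` along these lines is usually enough for Sphinx to read Google style docstrings. Treat it as a sketch: the `..` source path is an assumption about the repo layout, and the project name is borrowed from the wheel name in the CI file later in this article. `sphinx.ext.autodoc` and `sphinx.ext.napoleon` are extensions that ship with Sphinx itself.

```python
# docs/conf.py: minimal Sphinx configuration (sketch)
import os
import sys

# Assumption: the package to document lives one directory above docs/.
sys.path.insert(0, os.path.abspath(".."))

project = "mymaillib"

extensions = [
    "sphinx.ext.autodoc",   # pulls documentation out of docstrings
    "sphinx.ext.napoleon",  # understands Google style "Args:" sections
]

html_theme = "alabaster"  # the theme Sphinx uses by default
```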

You don’t have to be the next Hemingway to start writing documentation, just write as you would explain it to someone like yourself. Don’t over-explain every detail if it’s based on common principles and libraries, but do explain what you’ve added and any architectural decisions you’ve made. And don’t worry too much about poor grammar and spelling mistakes; as long as you can be understood it’ll be fine. If you’re really worried about your English, copy and paste it into Grammarly and have that check it for you.

As they say, a picture is worth a thousand words, and in a lot of cases, diagrams are a lifesaver. I use Lucidchart for quick, easy and clear technical diagrams. I export the diagrams as PNGs and include them in the repo so they are part of any generated documentation. It’s amazing what a handful of boxes and some arrows can convey.

Testing: If you can’t test it you can’t trust it.

We all know, or at least should know, that writing test code is something we should be doing, but like all good things, from healthy eating to regular exercise, it’s not that fun. We all want to get on to the next fun piece of work and spend less time writing tests, but it just needs to be done.

Testing should be automatic

Automated testing is the only real way to know for sure that the changes you have made do what they’re supposed to and didn’t break anything they shouldn’t. It should not rely on somebody spending hours or even minutes going through a checklist of tests to make sure your code works. Tests should be run often and automatically. In the realm of CI/CD, this should be done on every commit to your git repo, every merge, and every tag.

If you’re new to automated testing it will seem like an absolute pain in the butt to start with. But start off small. Instantiate some of the smaller classes in your code, load in some initial data, assert some properties. It’ll be fun! Ok, not fun, but start small.
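A first test really can be this small. The sketch below uses the standard library `unittest` module; the `Vehicle` class is a hypothetical stand-in for one of the smaller classes in your own code.

```python
import unittest


class Vehicle:
    """Hypothetical stand-in for a small class from your own code base."""

    def __init__(self, model, colour):
        self.model = model
        self.colour = colour
        self.gears = []


class TestVehicle(unittest.TestCase):
    def setUp(self):
        # Load in some initial data before every test.
        self.vehicle = Vehicle("hatchback", "red")

    def test_properties_are_stored(self):
        self.assertEqual(self.vehicle.model, "hatchback")
        self.assertEqual(self.vehicle.colour, "red")

    def test_new_vehicle_has_no_gears(self):
        self.assertEqual(self.vehicle.gears, [])
```

Saved in a file whose name starts with `test`, this can be run with `python -m unittest`, and because it is a plain `unittest.TestCase` a runner such as nose2 should pick it up as well.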

A GitLab CI file that runs tests every time you check in code.
image: python:3

stages:
  - test
  - install
  - documentation

services:
  - name: mailhog/mailhog
    alias: smtp
  
unit_test:
  stage: test
  script:
    - pip install -r requirements.txt
    - nose2 --verbose -s tests/unit -t .
 
integration_test:
  stage: test
  script:
    - pip install -r requirements.txt
    - nose2 --verbose -s tests/integration -t .

install_test:
  stage: install
  script:
    - python setup.py bdist_wheel
    - pip install dist/mymaillib-*.whl

docs:
  stage: documentation
  script:
    - pip install -r dev_requirements.txt
    - cd docs
    - make html

Sounds great, but it’s not for me…

Two common complaints from people not already on the testing train are that tests are time-consuming to write and that their code isn’t easy to test. Firstly, the time will be spent anyway: either up front in building automated tests, or further down the deployment pipe in debugging production systems. Doing it up front is far cheaper. Secondly, if the code isn’t easy to test, that points to problems in the way the application code has been structured, and it’ll show cracks further down the line. Refactoring the application code to be easier to test will make it better.

How does this work?

One of the overlooked benefits of writing lots of test cases is that they become a helpful addition to your documentation. With higher level tests you can illustrate the entire setup, usage and best practice for the application classes that have been created. One of the first stops on the guided tour you give new developers should be the automated tests, which should walk them through the usage in a verbose and clear manner. Test code is not like application code and should not be refactored; it’s there for stepping through and being explicit. It’s not meant to be optimised. And seriously, go nuts with the long variable names, pretend you’re a Java developer for a day. It’ll look ugly as sin, but it’ll be clear what the code is doing.
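A higher level test in that spirit might read like the sketch below. The `Vehicle` class and its `distance_to` method are hypothetical, with a plain haversine formula standing in for whatever distance calculation your application really uses; the point is the step-by-step, long-named style of the test itself.

```python
import math
import unittest


class Vehicle:
    """Hypothetical application class, included so the example runs alone."""

    def __init__(self, model, colour):
        self._model = model
        self._colour = colour
        self._latitude = None
        self._longitude = None

    def set_location(self, latitude, longitude):
        self._latitude = latitude
        self._longitude = longitude

    def distance_to(self, other):
        """Great-circle distance in km (haversine, Earth radius 6371 km)."""
        lat1, lon1 = math.radians(self._latitude), math.radians(self._longitude)
        lat2, lon2 = math.radians(other._latitude), math.radians(other._longitude)
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))


class TestDistanceBetweenTwoVehicles(unittest.TestCase):
    def test_distance_between_a_van_in_london_and_a_van_in_paris(self):
        # Step 1: create the vehicles exactly as application code would.
        delivery_van_parked_in_london = Vehicle("van", "white")
        delivery_van_parked_in_paris = Vehicle("van", "blue")

        # Step 2: a vehicle needs a location before any distance is asked for.
        delivery_van_parked_in_london.set_location(51.5074, -0.1278)
        delivery_van_parked_in_paris.set_location(48.8566, 2.3522)

        # Step 3: ask one vehicle how far away the other one is.
        distance_in_kilometres = delivery_van_parked_in_london.distance_to(
            delivery_van_parked_in_paris
        )

        # Step 4: London to Paris is roughly 344 km as the crow flies.
        self.assertAlmostEqual(distance_in_kilometres, 344, delta=5)

    def test_distance_from_a_vehicle_to_itself_is_zero(self):
        delivery_van_parked_in_london = Vehicle("van", "white")
        delivery_van_parked_in_london.set_location(51.5074, -0.1278)
        self.assertEqual(
            delivery_van_parked_in_london.distance_to(delivery_van_parked_in_london),
            0.0,
        )
```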

You’ve got it covered

After you’ve got yourself up to speed on getting a few tests in there, you’ll want to look at doing a coverage report. It’s exceptionally useful for working out how much of your application code is covered by your tests. You can’t test every possibility that your application will encounter, but at least you can make sure every line executes as it’s supposed to. A test runner like nose2 can generate HTML coverage reports that can then be published as part of your automated testing. The coverage report can then be used as a test quality indicator, which you can use to gate feature branches from going into master before they’re fully tested.

Automation: Once is an experiment, twice is a script

Automate everything. If you find that you’ve run a command a few times to do something then it should probably be in a script. If you need to set up a development environment more than zero times, then you should script it. If you’re in a corporate environment and limited to using a Windows laptop for development then you could do a lot worse than look at Vagrant. It allows you to create a development environment running as a VM in VirtualBox while still using all your fancy desktop code editors to actually develop with, giving you the best of both worlds. I’ve had a few buggy moments getting it to work reliably behind a corporate proxy, so I’ve moved to an Ubuntu VM running a full desktop environment. That works if you have the horsepower on your laptop to do it, which isn’t always the case. There is a utopian future on the horizon with the Windows Subsystem for Linux (WSL) but it’s not there yet.

Let me set that up for you
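#!/bin/bash
# Development environment setup for Red Hat based systems.
# Assumes it is run as root from the root of the repo.
set -e  # stop at the first command that fails

if [[ -f "/etc/redhat-release" ]]; then
  echo "Identified Red Hat based release, running setup"
  # System packages: EPEL provides the MongoDB and pip packages
  yum install -y epel-release
  yum install -y mongodb-server python2-pip
  pip install --trusted-host pypi.python.org --trusted-host pypi.org --trusted-host files.pythonhosted.org --upgrade pip
  # Start MongoDB now and on every boot
  systemctl enable mongodb
  systemctl start mongodb
  # Python dependencies for running and for developing
  pip install -r requirements.txt
  pip install -r dev_requirements.txt
else
  echo "Unsupported Linux version, not running setup" >&2
  exit 1
fi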

Creating golden image boxes is dangerous and should never be done, but everyone still does it. If you have servers that have been tweaked over time and everything is hand installed, you’re going to hit a world of pain when it comes time to add another node or reinstall after a drive failure. With the use of Vagrant, Docker or just VMs you can iterate on the setup scripts quite quickly without needing a herd of servers to test on. The best kind of setup script should leave you with an environment that is ready to go and able to run all the automated tests you have set up.

The same setup script can then be used in your Vagrant setup, on a VM and used inside a Dockerfile when you’re ready to move into the world of containers.

I’ll just change that variable name…

Modularization: Make building blocks not cathedrals

One of the most common things to do when working on a project is to keep on adding classes and modules to it. Growing it larger and larger over time. This is a recipe for disaster.

Once the code base gets sufficiently large it becomes too big to maintain in one’s head. Once it reaches that size the cognitive load of working on it becomes stressful, and that’s where mistakes, shortcuts and technical debt come in. A goal of refactoring should be to move as much functionality as is practical out of the core application and into reusable modules. The core application should handle the orchestration of modules and shouldn’t itself contain code that goes more than a couple of layers deep. Ideally, the modules should be kept in completely separate git repos from the main application.

The breaking up of software into modules gives a few advantages.

  • It creates conceptually manageable chunks of code that can be worked on without the complexity becoming overwhelming.

  • Having a distinct separation between a module and the core application necessitates clear documentation and demarcation of responsibilities.

  • The isolation of modules from the main application makes distributing tasks amongst a team a lot easier, without any major merge headaches.

Another benefit of breaking code into modules is that the effort put into developing those modules can be easily reused by other projects without dragging in the huge dependency of the core application. In small companies with only one developer, it’s good practice. In a large company with multiple development teams, this can be hugely beneficial. Not all projects get to completion, and some have a limited lifetime. So being able to repurpose the work for other projects is extremely valuable.
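As a rough illustration of a core that only orchestrates, here is a sketch in which everything is hypothetical: `parse_locations` and `bounding_box` stand in for functions that would live in separate, independently tested modules, and `run` is the whole of the core application.

```python
from typing import List, Tuple

Location = Tuple[float, float]


# Would live in a reusable "parsing" module, in its own repo.
def parse_locations(lines: List[str]) -> List[Location]:
    """Turns 'latitude,longitude' text lines into pairs of floats."""
    locations = []
    for line in lines:
        latitude, longitude = line.split(",")
        locations.append((float(latitude), float(longitude)))
    return locations


# Would live in a reusable "geometry" module, in its own repo.
def bounding_box(locations: List[Location]) -> Tuple[Location, Location]:
    """Returns the (south-west, north-east) corners enclosing all locations."""
    latitudes = [latitude for latitude, _ in locations]
    longitudes = [longitude for _, longitude in locations]
    return (min(latitudes), min(longitudes)), (max(latitudes), max(longitudes))


# The core application: orchestration only, one layer deep.
def run(lines: List[str]) -> Tuple[Location, Location]:
    return bounding_box(parse_locations(lines))


corners = run(["51.5,-0.12", "48.85,2.35"])
```

Neither helper knows the other exists, so either one could be lifted into another project without bringing the core along.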

Think! Does this function need to be part of the core? Or is it useful on its own?

Meetings are a waste of time!

Communication: Meetings! We don’t need no stinkin’ meetings!

It’s a reasonably accurate stereotype that a lot of developers err on the more introverted side and are especially prone to anti-meeting sentiment.

The general grumbles are that they’re long, they’re boring, they’re a waste of time and “I really should be doing something more important”. All of those are true more often than not. Try to change the kinds of meetings you have.

Take a leaf from the Agile Scrum playbook and use meetings specifically, sparingly and in time-constrained ways. The principle is that you can’t avoid having some kind of meeting, so do it in a focused and limited way, and don’t waste anyone’s time.

You don’t need to jump straight into an agile-styled revamp of your entire team’s communication; start small. A catch-up meeting should last 15 minutes. More detailed meetings can be longer but need a tight agenda and should be limited in scope. The more people there are, the shorter the meeting should be, so it is less likely to meander and go off topic. People’s time is precious: look around the room next time you’re in a yawner and see how many man-hours are being wasted. I run daily meetings with developers over a group video chat for 15 minutes, and it involves developers in multiple locations and different time zones. If you want to work well in a team then you need to communicate, be aware of what other people are working on, and share. Even if they were all in the same building, I’d still prefer a group video chat for dailies as it is less disruptive to their work cycle. But if you’re all on one floor, do a stand-up meeting; it’s a good excuse to get up and move about.

Be agile

Agile has been a buzzword for a while now and it has its detractors, but despite the hype, it can be highly effective at keeping a team happy and engaged. It’s difficult to know if it produces significantly better projects, but it does give more clarity and control to the developers. The use of time-limited sprints and clear gauging of workload reduces stress on development teams and helps them communicate better with management. Use something like Jira to manage software projects and it will feel less like you’re swimming in a sea of issues. You can actually see yourself making progress. Gotta love a souped-up to-do list!

In Conclusion

Working in a team, or just working like you’re in a team, is about not skimping on the detail and being generous to the next person to touch the code. Nine times out of ten the next person to touch the code will be you, so it’s the best way to make your own life a little bit easier.
