Saving £millions for the NHS with Pandas
@ Openthought · Thursday, Jun 25, 2020 · 5 minute read · Update at Jun 25, 2020

There has been an open data initiative in the UK since 2010 when was created. After 9 years we now have a huge number of browsable datasets on the website that can be downloaded and used for your own analysis.

One of the larger data sets available on is the GP Practice prescription data, coming in at around 10 million rows of data every month. This is a lot of data for your average spreadsheet to handle, so this is where tools like Pandas come in. Pandas is a data analysis library for Python that can handle many millions of rows of data and run statistical analysis on them to extract useful information.

So what did we find out?

The total prescription expenditure for the NHS across the UK is approximately £700 million a month. It varies from month to month but stays around the same value. Of that £700 million, just over half is spent on generic drugs and the remainder is spent on branded drugs. The full month by month analysis is documented here.

To be clear, the average cost to the NHS of branded and generic drugs is very similar, and in many cases drugs labelled as branded are cheaper than the equivalent generics. But this average hides a great deal of detail.

Total Prescription costs by month

Looking at the raw data shows how it is presented and what information we can extract from it. The primary care trust and the individual prescribing practice are useful and will be used in future analysis of geographic distribution, but they are ignored here. The principal fields for this analysis are the BNF code, the actual cost and the quantity.

First few rows of the raw prescription data available.
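As a rough sketch of how a monthly file could be loaded with only the three fields we need, the snippet below reads a tiny in-memory stand-in for the real CSV. The column names (`bnf_code`, `act_cost`, `quantity`, and the rest) and the sample values are assumptions for illustration; the real file's headers may differ.

```python
import io

import pandas as pd

# Tiny stand-in for one monthly file (the real one has around 10 million rows).
# Column names and values here are illustrative assumptions, not real data.
sample_csv = io.StringIO(
    "practice,bnf_code,bnf_name,items,act_cost,quantity\n"
    "P00001,040702040BIACAM,Branded tablets 300mg,2,30.50,60\n"
    "P00002,040702040AAAMAM,Generic tablets 300mg,5,40.00,150\n"
)

prescriptions = pd.read_csv(
    sample_csv,
    # Only parse the columns this analysis uses; skipping the rest saves memory.
    usecols=["bnf_code", "act_cost", "quantity"],
    # Explicit dtypes avoid type guessing on a very large file.
    dtype={"bnf_code": str, "act_cost": "float64", "quantity": "int64"},
)

print(prescriptions)
```

Restricting `usecols` is a simple way to keep memory use down when most columns are not needed.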

BNF What?

The BNF, or British National Formulary, has been a foundational piece of medical information close at hand for doctors since the first edition in 1949. It is published in book form every 3 years and is available in multiple digital formats. Part of this publication is a coding standard that describes every medicine delivered to patients through the National Health Service. It follows a standard hierarchical structure, which can be useful for analysing the dispensing of drug groups. There is an excellent resource on this from the University of Oxford.

One of the features of BNF coding is that every code contains what is needed to derive the BNF code of the equivalent generic drug. The last two characters of the BNF code give the strength and formulation code of the generic equivalent, e.g. (the relevant characters are shown in parentheses):

  • 040702040BIAC(AM): Tradorec XL Tablets 300mg
  • 040702040AA(AM)(AM): the equivalent generic version
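The mapping in the example above can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual code: it assumes a 15-character code laid out as 9 characters for the chemical substance, 2 for the product (with `AA` marking a generic), 2 for the strength and formulation, and 2 for the generic equivalent's strength and formulation. The function names are my own.

```python
def is_generic(bnf_code: str) -> bool:
    """True when the product part (characters 10-11) is 'AA'.

    Assumption: 'AA' in that position marks a generic product.
    """
    return bnf_code[9:11] == "AA"


def generic_equivalent(bnf_code: str) -> str:
    """Build the generic equivalent of a 15-character BNF code.

    Keeps the first 9 characters, sets the product part to 'AA', and
    repeats the last two characters as the strength/formulation code.
    """
    if len(bnf_code) != 15:
        # Codes of other lengths are outside what this sketch handles.
        raise ValueError(f"expected a 15-character BNF code, got {bnf_code!r}")
    substance = bnf_code[:9]
    equivalent = bnf_code[13:]
    return substance + "AA" + equivalent + equivalent


branded = "040702040BIACAM"  # Tradorec XL Tablets 300mg, from the example above
print(generic_equivalent(branded), is_generic(branded))
```

Applying the function to a code that is already generic returns the same code, which makes it easy to flag generics and group everything under one key.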

The challenge we set ourselves

Can we save the NHS a significant amount of money by switching a handful of drugs to using generics instead of the branded alternatives? Our source information for calculating this is all contained in this single dataset.

Because we use only this single dataset, we set ourselves two constraints:

  • Consider only generics that have already been prescribed by at least one NHS prescribing practice.
  • Use the existing pricing for branded and generic drugs as it has already been costed by the NHS.

How we did the analysis

We took a recent dataset selected at random, in this case October 2018, and processed the data in the following ways:

  • We generated the equivalent generic code for every item and flagged if it was already a generic drug.
  • We created a sum of all the quantities and all the costs for each generic BNF code, one set of sums for the branded equivalents and another for the generic version.
  • We calculated the unit cost difference between the equivalent branded and generic drugs and multiplied it by the quantity of branded drug prescribed.
  • This created an excess cost calculation for every generic drug available where a branded drug was also being prescribed.
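The steps above can be sketched in Pandas roughly as follows. This is a hedged illustration rather than the project's code: the column names (`bnf_code`, `act_cost`, `quantity`), the made-up rows, and the assumed 15-character code layout (`AA` in characters 10-11 marking a generic) are all assumptions.

```python
import pandas as pd

# Invented rows for illustration only; they are not real prescription figures.
rows = pd.DataFrame(
    {
        "bnf_code": [
            "040702040BIACAM",  # branded
            "040702040BIACAM",  # branded, second practice
            "040702040AAAMAM",  # generic equivalent
            "020202020BCADAD",  # made-up branded code with no generic prescribed
        ],
        "act_cost": [30.0, 20.0, 50.0, 50.0],
        "quantity": [60, 40, 200, 10],
    }
)

# Step 1: derive the generic code and flag rows that are already generic.
code = rows["bnf_code"]
rows["generic_code"] = code.str[:9] + "AA" + code.str[13:] + code.str[13:]
rows["is_generic"] = code.str[9:11] == "AA"

# Step 2: sum cost and quantity per generic code, branded and generic separately.
value_cols = ["act_cost", "quantity"]
branded = rows[~rows["is_generic"]].groupby("generic_code")[value_cols].sum()
generic = rows[rows["is_generic"]].groupby("generic_code")[value_cols].sum()

# Keep only codes where both a branded and a generic version were prescribed.
both = branded.join(generic, lsuffix="_branded", rsuffix="_generic", how="inner")

# Steps 3 and 4: unit cost difference times the branded quantity.
both["unit_branded"] = both["act_cost_branded"] / both["quantity_branded"]
both["unit_generic"] = both["act_cost_generic"] / both["quantity_generic"]
both["excess_cost"] = (
    both["unit_branded"] - both["unit_generic"]
) * both["quantity_branded"]

print(both[["unit_branded", "unit_generic", "excess_cost"]])
```

The inner join is what enforces the first constraint: a branded drug only appears in the result if its generic equivalent was prescribed somewhere in the same month.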

The results

If all possible substitutions of generic for branded were made, it would save the NHS £11,347,859.11, just over £11 million a month. That figure extracts every possible saving for every possible product. If we took just the 10 largest cost-saving substitutions, we would save £5,469,671.30, or approximately £5.5 million a month.
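Given a per-drug excess cost, the two totals can be obtained along these lines. The values and labels below are invented for illustration, and counting only positive excess costs towards the overall total is my assumption, not something stated above.

```python
import pandas as pd

# Invented excess costs keyed by placeholder labels, not real results.
excess = pd.Series(
    {"code_a": 500.0, "code_b": 120.0, "code_c": 80.0, "code_d": -15.0},
    name="excess_cost",
)

# Assumption: only substitutions that actually save money count towards the total.
total_saving = excess[excess > 0].sum()

# The article uses the 10 largest; 2 keeps this toy example readable.
top_n = excess.nlargest(2)
top_saving = top_n.sum()

print(total_saving, top_saving)
```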


To be clear, I AM NOT A HEALTH PROFESSIONAL. The cost savings we have outlined here are just a numerical analysis and do not take into account any medical decisions that a prescribing or dispensing practice may have made.

Given the amount the NHS spends on prescriptions every month, the fact that we could only find a 1–2% saving makes me think they have been doing a pretty good job of keeping drug costs down, and they should be highly commended for the work they have already done.
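As a quick check of that percentage, here is the arithmetic using the figures quoted in this post. The £700 million monthly total is the approximate value given earlier, so the results are approximate too.

```python
# Figures quoted in this post (the monthly total is approximate).
monthly_spend = 700_000_000.00
all_substitutions = 11_347_859.11
top_ten_substitutions = 5_469_671.30

# Savings expressed as a percentage of monthly prescription spend.
full_pct = 100 * all_substitutions / monthly_spend
top_pct = 100 * top_ten_substitutions / monthly_spend

print(f"All substitutions: {full_pct:.2f}% of monthly spend")
print(f"Top 10 substitutions: {top_pct:.2f}% of monthly spend")
```

Making every substitution comes to roughly 1.6% of monthly spend, and the top 10 alone to a little under 0.8%.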

For the full analysis and commented code used for these calculations, the project can be found at

Directions for further analysis

This is just scratching the surface of the prescribing dataset and there is plenty that can still be done. Here are some of the future analyses I’m considering for this dataset:

  • Use the geographical information in the dataset to create a heat map of health condition prevalence, e.g. anti-depressants, statins, diabetes etc.
  • Identify the practices that consistently favour branded over generic drugs.
  • Use the US and SNOMED datasets to calculate how much more the US Medicare system is paying for equivalent drugs compared to the NHS.
  • Identify branded drugs where generics are available but not currently being used in the NHS.
