Summarizing the average performance of several products across different tests is tricky. When the results are normalized against a baseline, the correct way to summarize them is the geometric mean, not the arithmetic mean. The catch is that the two means differ noticeably only under a certain condition, so the harm done by using the arithmetic mean is not always obvious. Let’s start with an example.
When the Geometric and Arithmetic Means Look the Same
Let’s assume we have times for a Medium, Slow, and Fast product:
Test | Medium | Slow | Fast |
Test A | 5.66 | 3.43 | 8.33 |
Test B | 6.28 | 2.51 | 9.45 |
Test C | 58.78 | 31.12 | 89.83 |
We would like to say things like, “The Medium product is 100% faster than the Slow product”. To do this, we normalize each result against a baseline product, so that every product is described relative to that baseline. In our example the baseline is the Slow product, so for each product we add a column holding the ratio of its time to the Slow product’s time on the same test.
Test | Medium | Med/Slow | Slow | Slow/Slow | Fast | Fast/Slow |
Test A | 5.66 | 1.65 | 3.43 | 1.00 | 8.33 | 2.43 |
Test B | 6.28 | 2.50 | 2.51 | 1.00 | 9.45 | 3.76 |
Test C | 58.78 | 1.89 | 31.12 | 1.00 | 89.83 | 2.89 |
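The normalization step can be sketched in a few lines of Python. This is only an illustration: the `results` dictionary and the `normalize` helper are names made up here, and the numbers are copied from the table above.

```python
# Raw values from the table above, keyed by product and then by test.
results = {
    "Medium": {"Test A": 5.66, "Test B": 6.28, "Test C": 58.78},
    "Slow":   {"Test A": 3.43, "Test B": 2.51, "Test C": 31.12},
    "Fast":   {"Test A": 8.33, "Test B": 9.45, "Test C": 89.83},
}


def normalize(results, baseline):
    """Divide each value by the baseline product's value on the same test."""
    base = results[baseline]
    return {
        product: {test: value / base[test] for test, value in tests.items()}
        for product, tests in results.items()
    }


ratios = normalize(results, baseline="Slow")
for product, tests in ratios.items():
    print(product, {test: f"{ratio:.2f}" for test, ratio in tests.items()})
```

The baseline product always normalizes to 1.00 on every test, which is a quick sanity check on the ratio columns.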
We’ll now compute the arithmetic and geometric means of each ratio column and add them to the table.
Test | Medium | Med/Slow | Slow | Slow/Slow | Fast | Fast/Slow |
Test A | 5.66 | 1.65 | 3.43 | 1.00 | 8.33 | 2.43 |
Test B | 6.28 | 2.50 | 2.51 | 1.00 | 9.45 | 3.76 |
Test C | 58.78 | 1.89 | 31.12 | 1.00 | 89.83 | 2.89 |
Arithmetic Mean | | 2.01 | | 1.00 | | 3.03 |
Geometric Mean | | 1.98 | | 1.00 | | 2.98 |
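For readers who want to reproduce the two summary rows, here is a minimal sketch. The helper names `arithmetic_mean` and `geometric_mean` are chosen for this example; the geometric mean is computed as the exponential of the average logarithm, which is equivalent to taking the n-th root of the product.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Ratios relative to the Slow product, recomputed from the raw table values.
med_over_slow = [5.66 / 3.43, 6.28 / 2.51, 58.78 / 31.12]
fast_over_slow = [8.33 / 3.43, 9.45 / 2.51, 89.83 / 31.12]

for name, column in [("Med/Slow", med_over_slow), ("Fast/Slow", fast_over_slow)]:
    print(f"{name}: arithmetic={arithmetic_mean(column):.2f}, "
          f"geometric={geometric_mean(column):.2f}")
```

On Python 3.8 or later, the standard library's `statistics.geometric_mean` could be used in place of the hand-written helper.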
Here, we don’t see any big difference between the arithmetic and the geometric means. This is because all of the products react in a similar way to each test. However, when one or more tests break this pattern, we see a discrepancy between the two means, and an inconsistency in the results of the arithmetic mean.
When the Geometric and Arithmetic Means Differ
In this example, we’ve changed the performance of the Slow and Fast products on Test B. The Slow and Fast products now take roughly twice as long as the Medium product on Test B.
Test | Medium | Med/Slow | Slow | Slow/Slow | Fast | Fast/Slow |
Test A | 5.66 | 1.65 | 3.43 | 1.00 | 8.33 | 2.43 |
Test B | 6.28 | 0.49 | 12.73 | 1.00 | 12.91 | 1.01 |
Test C | 58.78 | 1.89 | 31.12 | 1.00 | 89.83 | 2.89 |
Arithmetic Mean | | 1.34 | | 1.00 | | 2.11 |
Geometric Mean | | 1.15 | | 1.00 | | 1.92 |
Now, let’s consider using a different baseline. We would expect that using a different baseline would not change the overall picture of the performance. The next table uses Medium as the baseline, instead of Slow.
Test | Medium | Med/Med | Slow | Slow/Med | Fast | Fast/Med |
Test A | 5.66 | 1.00 | 3.43 | 0.61 | 8.33 | 1.47 |
Test B | 6.28 | 1.00 | 12.73 | 2.03 | 12.91 | 2.06 |
Test C | 58.78 | 1.00 | 31.12 | 0.53 | 89.83 | 1.53 |
Arithmetic Mean | | 1.00 | | 1.05 | | 1.69 |
Geometric Mean | | 1.00 | | 0.87 | | 1.67 |
Compare the two tables in this section. In the first table, the arithmetic mean claims that the Medium product is 34% faster than the Slow product (1.34). In the second table, where only the baseline has changed, it claims that the Slow product is 5% faster than the Medium product (1.05). Each product cannot be faster than the other, so the arithmetic mean is inconsistent. The geometric mean does not exhibit this behavior: the first table shows that the Medium product is 15% faster than the Slow product (1.15), and the second table shows that the Slow product is 13% slower than the Medium product (0.87). These two figures agree, since 1/1.15 ≈ 0.87.
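The baseline swap is easy to check numerically. The sketch below, with helper names invented for this example, recomputes the Medium and Slow summaries from the raw values in this section. The geometric mean of a ratio equals the ratio of the geometric means, so swapping the baseline simply inverts the result; the arithmetic mean has no such property.

```python
import math

# Raw values for Tests A, B, C from the tables in this section.
medium = [5.66, 6.28, 58.78]
slow = [3.43, 12.73, 31.12]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratios(numerators, denominators):
    """Per-test ratio of one product to a baseline product."""
    return [n / d for n, d in zip(numerators, denominators)]


am_med_over_slow = arithmetic_mean(ratios(medium, slow))
am_slow_over_med = arithmetic_mean(ratios(slow, medium))
gm_med_over_slow = geometric_mean(ratios(medium, slow))
gm_slow_over_med = geometric_mean(ratios(slow, medium))

# Arithmetic means: both above 1, so each product appears to beat the other.
print(f"arithmetic: {am_med_over_slow:.2f} and {am_slow_over_med:.2f}")  # 1.34 and 1.05
# Geometric means: exact reciprocals, so the two tables tell the same story.
print(f"geometric:  {gm_med_over_slow:.2f} and {gm_slow_over_med:.2f}")  # 1.15 and 0.87
print(math.isclose(gm_med_over_slow * gm_slow_over_med, 1.0))            # True
```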
Conclusion
When reporting the average of normalized values, use a geometric mean, or a weighted geometric mean. When reporting raw values, use an arithmetic mean, or a weighted arithmetic mean.
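A weighted geometric mean can be sketched as follows. The function name and the weights are hypothetical, chosen purely for illustration; how to weight the tests depends on how much each one matters for your workload.

```python
import math


def weighted_geometric_mean(values, weights):
    """exp of the weighted average of the logarithms of the values."""
    total_weight = sum(weights)
    weighted_logs = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(weighted_logs / total_weight)


# Med/Slow ratios from the second example; the weights are made up and
# count Test C twice as heavily as Tests A and B.
med_over_slow = [5.66 / 3.43, 6.28 / 12.73, 58.78 / 31.12]
hypothetical_weights = [1, 1, 2]

print(f"{weighted_geometric_mean(med_over_slow, hypothetical_weights):.2f}")
```

With equal weights this reduces to the ordinary geometric mean.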
Reference:
Philip J. Fleming, John J. Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29(3):218–221, March 1986.