Thursday, 17 November 2022

Hands up! Measure! And keep it simple!

At the end of this post I will expose an example of a speedup of 8000x. This means that a program that at first took almost five hours to complete, ended up in two second.
A ratio of 8000:1.
Pretty impressive!

Wait! What?!

In my previous post I discussed how C++ can be a clear language if the right constructs are used and if readability is considered a priority in code production.
Unfortunately, it's common to do preemptive optimization, trying to get the fastest possible algorithm at version one. I did the same mistake too, many times.
Once I made a simple shooter game in C. It was something really simple and I developed it on a 2.5GHz machine. So, not exactly the kind of hardware Zero~Wing was supposed to run. But my awesome code, made in C, with -O2 optimization, with structs, malloc, free and OpenGL begun to slow horribly with fifty (50!) sprites on screen.
And on the very same hardware, Quake 3 run perfectly, Jake 2 was a charm and even some 3D games on browser didn't show any slowdown.
So, yes: Java and JavaScript were faster than pure C.

What the fish?!

Well, if you wrote bad C code isn't that surprising!
It's true that C gives you the fastest1 implementation of an algorithm, but my code wasn't correct. To find the mistake I had to understand where the program was slow. I began logging the duration of some functions using the SDL_GetTicks. After some attempts, I was able to find the culprit.
Can you believe it? It was the malloc.
I assumed malloc was a fast operation and I called it for every new sprite needed to be displayed, calling free every time a unit was destroyed. But what malloc does is requiring for a precise amount of memory to the operating system and this operation is relatively slow. Repeated for fifty times it accumulated a sensible delay on the execution.
Pinpointed the problem, I created an array of ~Sprite~s, using them when required. And guess what? My performances raised so good I had to increase the time of some delays.

Prototype. Measure. Repeat.

At the begin of this article I teased you with Antonio Larrosa's impressive work: speeding up his Bard music manager by a factor of 8000.
I suggest you to read the full article because it's really a great piece. What I can anticipate is how Mr. Larrosa worked.

Version Speedup Key Changes Execution time
1 1x Python program 5 hours
2 2x Better comparison algorithm 2 hour 30 minutes
3 127x Refactoring parts in C++17 2.36 minutes
4 1897x Introducing threads 9.48 seconds
5 2442x Copying data from std::map to a std::vector 7.37 seconds
6 6552x Enabling optimization flags on the compiler 2.74 seconds
7 7714x Updating from gcc 7.3 to gcc 8.1 2.33 seconds
8 7998x Use just std::vector instead of \\ std::map (no copy operations) 2.25 seconds

Five hours compared to two seconds.
I wonder how much energy is saved and how much carbon emission. But that's material for a new article.

Science is measuring

The impressive results of optimizing Bard should not mislead you. In the article, Mr. Larrosa says:

[…]the most important note: Measure everything. You can’t improve what you can’t measure (well, technically you can, but you won’t know for sure).

Don't be a bad boy!

The path to optimization is measuring. This will allow you to understand which part is slow and why.
Bard needed to be faster: five hours is truly too much. To measure this, the best option is to use a profiler. This will tell you how long functions and methods last of their calls, how often they are called and how much memory they require. Calculating time-deltas of random code-blocks with

exec_time = GetMillisecs() - start_exec;

is to a profiler as printf is to a debugger: good for a one-shot check, but not for a serious analysis.

So:

  1. Clean code: Writing a clean, with functions and methods doing one thing and doing it well, will make the code more readable and easier to be modified. Also, this will make more granular and precise the work for the…
  2. Profiler: using a profiler will help you to find the bottleneck of your code and decide what must be optimized
  3. Efforts: do you really need more speed? In Bard, from v.6, each improvement was measured in fractions of seconds. Were they necessary? Were they not? It depends.

What does it means, it depends? Well, how many times you will run that program this week and how much total time will take? What's the size of the input and how is gonna affect the execution time? 20.000 records analyzed in 1.5 seconds are probably a good performance; 200.000 will take 7.5 seconds and can be still acceptable if run once a week. But if that program runs on your server and you have ten thousands calls of 5k every second, then it's clearly insufficient and we must work a bit to improve speed.

Software guys are always eager to write super-fast code, it's in our DNA. But this mislead the production of good code to deliver something that "we think it will be fast". That's not computer science, but computer folklore. Optimization must be done after measuring and after evaluating if it will be useful.
Prototype, measure and evaluate. Make many version, improve them!

And may Mr. Larrosa work inspires you.

Footnotes:

1

This assumption is a bit folkloristic too. Maybe some C++ or Rust compilers can optimize even more. We should measure!

No comments: