Wednesday 30 November 2011

The End of Multi Threaded Programming

I warn you: this is a provocation. I really want replies to this post, because I'm talking about one of the most famous topics for computer-science students: threads and multithreaded programming.
Special thanks to Andrea "MEgrez" Talon, who cleared up some doubts I had about Node.js.



Why Threads

First, what's a thread? A thread is a unit of parallel execution inside a process. Why were threads created? The answer is funny: to work around fork's poor performance. There were many attempts to make fork-based servers faster: Apache's web server used preforking, and since Apache 2.0 it can use threads.

Felix von Leitner wrote a very interesting document about various techniques for improving server performance.

Why Threads are Hard

Threads are hard, and even good programmers struggle to write multithreaded applications. Why?

  • Synchronization: good synchronization among threads is fundamental. You must protect critical sections. Doing it in Java is easier (it has monitors), but in C/C++ it's harder.
  • Debugging: many of the most unusual software bugs come from multithreading errors. Also, just think about how hard it can be to reproduce an error condition in a multithreaded program: you have to recreate the crash condition, with the threads executing in the same sequence and with the same timing. How hard can that be?
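To make the synchronization point concrete, here is a minimal sketch in Python of a critical section protected by a lock. The `Counter` class and `run_workers` function are names made up for this illustration. The increment is a read-modify-write, so it is wrapped in a lock to keep two threads from interleaving inside it.

```python
import threading


class Counter:
    """A shared counter whose increment is a critical section."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write: only one thread at a time may run this.
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Start n_threads threads that each increment the counter n_increments times."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run_workers(Counter())` returns exactly `n_threads * n_increments`. Forgetting the lock is the kind of mistake that may pass every test and still fail once in production, which is what makes these bugs so hard to reproduce.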

Enter Node.js

Some servers are really fast and easy to write. They're single-threaded servers that manage multiple connections using the select system call. These servers work with many non-blocking sockets. When an event happens on a socket, select reports it and the server calls the function you wrote to handle that connection. There are no critical sections, because connections are handled one at a time. Doing this efficiently isn't portable across operating systems, but a library, libevent, takes care of that.
Node.js is the V8 JavaScript engine combined with an event library of this kind (libev at first, later libuv, rather than libevent itself). It's an event-based I/O manager. So you can:

  • Write a server in JavaScript
  • It will be fast, because it uses neither threads nor forks
  • Memory usage stays roughly constant over time
  • V8 is one of the most efficient JavaScript engines on earth

Node.js isn't the only event-based I/O manager. There are also Twisted and gevent for Python enthusiasts. But there seems to be more hype around Node.js.
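The select-style loop described above can be sketched with nothing but Python's standard `selectors` module. This is only an illustration of the idea, not how Node.js, Twisted or gevent are implemented. The `EchoServer` class and its `step` method are names invented for this example, and it assumes small messages that fit in the socket's send buffer.

```python
import selectors
import socket


class EchoServer:
    """Single-threaded echo server: one selector, many non-blocking sockets."""

    def __init__(self, host="127.0.0.1", port=0):
        self.selector = selectors.DefaultSelector()
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.listener.bind((host, port))
        self.listener.listen()
        self.listener.setblocking(False)
        # The third argument is the callback to run when the socket is readable.
        self.selector.register(self.listener, selectors.EVENT_READ, self._accept)
        self.address = self.listener.getsockname()

    def _accept(self, sock):
        conn, _addr = sock.accept()
        conn.setblocking(False)
        self.selector.register(conn, selectors.EVENT_READ, self._echo)

    def _echo(self, conn):
        data = conn.recv(4096)
        if data:
            # Assumes a small message; a real server would buffer unsent bytes.
            conn.sendall(data)
        else:
            # Empty read: the client closed the connection.
            self.selector.unregister(conn)
            conn.close()

    def step(self, timeout=None):
        """Wait once for events and run the callback of every ready socket."""
        for key, _mask in self.selector.select(timeout):
            key.data(key.fileobj)

    def close(self):
        # Close the listener and every connection still registered.
        for key in list(self.selector.get_map().values()):
            key.fileobj.close()
        self.selector.close()
```

Calling `step()` in an endless loop gives the server's main loop. Every callback runs to completion before the next one starts, which is why no locks appear anywhere in the sketch.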

So, what's the point of this post?

Threads, mainly invented for performance reasons, can't compete against simpler solutions like Node.js. If I had to create a new server, I'd probably use Node.js or Twisted instead of a custom implementation using Java threads or something similar.
But pay attention! Node.js is single-threaded, so it runs one piece of code at a time. For small requests (an HTML page, a message, coordinates) this approach is unbeatable. But if serving a request keeps that single thread busy for a long time (e.g. sending files of many hundreds of MB without yielding back to the event loop), every client is served one at a time and has to wait until all the clients before it have been served. This means that, in a scenario where 5 clients each ask for a file that takes 10 seconds to download:

  1. the first client will be served immediately
  2. the second after 10 seconds
  3. the third after 20 seconds
  4. the fourth after 30 seconds
  5. the last after 40 seconds

You see: in this situation a multithreaded server is the better choice.
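The waiting times in the scenario above are simple arithmetic, and a few lines of Python reproduce them. This is only a model of the scenario: it assumes each download occupies the server completely until it finishes, and `sequential_schedule` is a name made up for this sketch.

```python
def sequential_schedule(n_clients, seconds_per_download):
    """Return (start, finish) times in seconds for clients served one after another."""
    schedule = []
    clock = 0
    for _ in range(n_clients):
        start = clock
        clock += seconds_per_download
        schedule.append((start, clock))
    return schedule


for i, (start, finish) in enumerate(sequential_schedule(5, 10), start=1):
    print(f"client {i}: starts after {start}s, finishes after {finish}s")
# client 1: starts after 0s, finishes after 10s
# client 2: starts after 10s, finishes after 20s
# client 3: starts after 20s, finishes after 30s
# client 4: starts after 30s, finishes after 40s
# client 5: starts after 40s, finishes after 50s
```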

Conclusions

Node.js (and other similar projects) seems to be the best and easiest way to build efficient real-time message servers. Obviously, it's not good for every problem. But in these days of tweets, status updates, embedded chats and small messages in general, this approach is very promising.
