The performance optimization made early this morning (see previous post) was a major success; some testers reported CPU usage dropping from 62% to 28% (P2/200MHz), and I have been unable to detect any regressions caused by the optimization. Seems you can have your cake and eat it too :)
Gaining confidence from that success, I decided to introduce two more low-level optimizations. The first one is in the Scheduler layer: it slightly tweaks the bandwidth distribution algorithm between sockets, so that the average size of a low-level send()/recv() call is now up to ten times larger than before (from 150 bytes to 1500 bytes). Since every call carries a fixed cost, fewer and larger calls should reduce the TCP overhead considerably.
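To illustrate the idea, here is a minimal sketch of the difference between splitting a per-tick byte budget evenly across all sockets and handing it out in larger chunks round-robin. This is not Hydranode's actual Scheduler code: the names `evenSplit`, `ChunkedDistributor` and `minChunk` are hypothetical, and the 1500-byte chunk size simply mirrors the figure above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive policy: every socket gets budget / numSockets bytes,
// e.g. 1500 bytes over 10 sockets -> ten tiny 150-byte send() calls.
std::vector<std::size_t> evenSplit(std::size_t budget, std::size_t numSockets) {
    return std::vector<std::size_t>(numSockets, budget / numSockets);
}

// Hypothetical chunked policy: hand out whole chunks of at least minChunk
// bytes to as few sockets as possible, rotating the starting socket across
// ticks so every socket is still served over time.
class ChunkedDistributor {
public:
    ChunkedDistributor(std::size_t numSockets, std::size_t minChunk)
        : numSockets_(numSockets), minChunk_(minChunk), next_(0) {
        assert(numSockets > 0 && minChunk > 0);
    }

    // Returns the per-socket byte quota for this tick; the quotas sum to budget.
    std::vector<std::size_t> distribute(std::size_t budget) {
        std::vector<std::size_t> quota(numSockets_, 0);
        // Number of sockets served this tick: one per full chunk, capped at N.
        std::size_t chunks = std::min(budget / minChunk_, numSockets_);
        if (chunks == 0) chunks = 1;  // budget below one chunk: one socket gets it all
        const std::size_t per = budget / chunks;
        const std::size_t rem = budget % chunks;
        for (std::size_t i = 0; i < chunks; ++i) {
            quota[next_] = per + (i == 0 ? rem : 0);  // leftover goes to the first socket
            next_ = (next_ + 1) % numSockets_;        // round-robin for the next tick
        }
        return quota;
    }

private:
    std::size_t numSockets_;
    std::size_t minChunk_;
    std::size_t next_;  // socket that gets the first chunk on the next tick
};
```

With 10 sockets and a 1500-byte budget, the even split yields 150 bytes per socket, while the chunked policy gives one socket the full 1500 bytes and moves on to the next socket on the following tick. Per-tick fairness is traded for fewer, larger calls; fairness is restored over several ticks.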
The second one was done in SocketWatcher, where we call select() on every event loop iteration. The trouble with select() is that it returns as soon as ANY of the polled sockets receives an event. With many sockets, this means we perform the expensive operation of adding all sockets to the set passed to select(), select() returns almost immediately saying "socket #10 is readable", and then we go through the entire thing again. I have now introduced a 50ms sleep right BEFORE the select() call, forcing a delay into the main event loop. As a result, select() reports events on multiple sockets during a single call, which reduces overall CPU usage further.
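A minimal POSIX sketch of one such event loop iteration follows: sleep first, then build the fd_set and call select(), so that events arriving during the sleep are collected in one pass. The function name `pollOnce` is hypothetical, and the zero select() timeout is an assumption (the post does not say which timeout SocketWatcher passes); it is used here because the sleep already provides the waiting.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <algorithm>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical sketch: one event loop iteration with a delay BEFORE select().
// Returns the descriptors from fds that are readable after the delay.
std::vector<int> pollOnce(const std::vector<int>& fds,
                          std::chrono::milliseconds delay) {
    // Let events pile up so that one select() call reports many of them.
    std::this_thread::sleep_for(delay);

    // The O(n) setup cost that we want to pay less often.
    fd_set readSet;
    FD_ZERO(&readSet);
    int maxFd = -1;
    for (int fd : fds) {
        FD_SET(fd, &readSet);
        maxFd = std::max(maxFd, fd);
    }

    // Assumption: zero timeout, i.e. select() only checks and never blocks.
    timeval timeout{0, 0};
    std::vector<int> ready;
    if (select(maxFd + 1, &readSet, nullptr, nullptr, &timeout) > 0) {
        for (int fd : fds) {
            if (FD_ISSET(fd, &readSet)) ready.push_back(fd);
        }
    }
    return ready;
}
```

If two sockets become readable during the 50ms sleep, a single call returns both of them instead of two separate loop passes each rebuilding the full set. The price is up to one sleep interval of added latency before an event is handled.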
I also discovered a memory leak in the Boost.DateTime library, namely in the ptime stream output operator. It causes the current GUI to leak roughly 4 KB/s of memory, and it also causes the core to leak quite large quantities of memory when trace masks are enabled (since the timestamps are written to hydranode.log). Currently there's no fix available, since the leak is deep inside the Boost.DateTime library; however, its effect on release builds is minimal, since those perform very little logging.
Madcat, ZzZz