Notes on Bro 0.8a58 vs. 0.8a70

These notes are meant to help determine the differences in behavior between 0.8a58 and 0.8a70.  While the CHANGES file gave no direct indication that significant internal changes were made, it seems that something has changed which results in less thrashing in both cpu usage and lag time (explained below).

The configuration of the two instances was identical, except for the addition of two policy files "flag-warez.bro" and "flag-irc.bro", neither of which makes a significant memory or cpu impact.

Also note that there were several changes to the base bro config. See here for details.

The data was collected over a single time window with both versions running at the same time, so it should be treated with the skepticism that such a limited view deserves.

Memory Footprint
As seen here, the memory allocation pattern seems similar, but the overall use is significantly lower.

[Figure: mem]
The units on the vertical axis are KB, while the horizontal axis is in multiples of 5-second intervals.

CPU Footprint
In general, 0.8a70 took a higher cpu load than 0.8a58.  What is interesting about the graphs is the consistency of usage: in the older version the average usage is lower (as seen by running top while both were running), but the variation differs significantly between the two versions.

In the new version:


compared to the older version:



Note: I retained the line representation of the data because it gives a better envelope effect.  Dots or hashes end up striated, since the large number of data points falls on only 100 possible values.  What is interesting is the relative stability of the cpu load in the new version compared with the old.
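The average-versus-variation comparison above could be put into numbers with a mean and a standard deviation per run. The following is only a minimal sketch in Python: the function name `summarize_cpu` is hypothetical, and the sample values are made up for illustration, not taken from these runs.

```python
from statistics import mean, pstdev


def summarize_cpu(samples):
    """Return (mean, population standard deviation) of cpu percentages."""
    return mean(samples), pstdev(samples)


# Made-up illustrative values, NOT measurements from either Bro version.
steady = [40, 42, 41, 39, 40, 41]   # higher average, little variation
bursty = [5, 80, 10, 75, 3, 20]     # lower average, large swings

for name, samples in (("steady", steady), ("bursty", bursty)):
    avg, spread = summarize_cpu(samples)
    print(f"{name}: mean={avg:.1f}% stdev={spread:.1f}%")
```

A run with a higher mean but a much smaller standard deviation would correspond to the "higher load, but more stable" pattern described for the new version.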

Lag Time
Lag time is the difference between 'clock time' and the timestamp recorded in the pcap data structure.  The larger the value, the further bro has fallen behind computationally, i.e. a large value implies that packets are backing up somewhere in the analysis line.  Such behavior need not be accompanied by a large cpu value, given the significant amount of IO and interrupt activity in the application.
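As a rough illustration of the definition above, lag can be computed by subtracting the packet's capture timestamp from the current wall-clock time. This is a sketch only, assuming both values are in seconds since the epoch; `lag_seconds` is a hypothetical helper and not part of bro.

```python
import time


def lag_seconds(packet_ts, now=None):
    """Lag = wall-clock time minus the packet's capture timestamp.

    Both values are assumed to be seconds since the epoch. `now` can be
    passed explicitly, which makes the function easy to check by hand.
    """
    if now is None:
        now = time.time()
    return now - packet_ts


# A packet stamped at t=100.0 that is processed at t=102.5 is 2.5 s behind.
print(lag_seconds(100.0, now=102.5))
```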

Here I removed the lag data recorded during and after the checkpoint time.  Not only is it not particularly useful, but its values are several orders of magnitude larger than the data we want to look at and would swamp it.
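The trimming described above might look something like the following sketch. It assumes the lag data is available as (time, lag) pairs and that the checkpoint start time is known; the function name and the sample values are hypothetical and made up for illustration.

```python
def drop_from_checkpoint(samples, checkpoint_start):
    """Keep only (time, lag) pairs recorded strictly before the checkpoint."""
    return [(t, lag) for t, lag in samples if t < checkpoint_start]


# Made-up values: the lag jumps by orders of magnitude once the checkpoint starts.
samples = [(0, 0.02), (5, 0.03), (10, 0.02), (15, 45.0), (20, 120.0)]
print(drop_from_checkpoint(samples, checkpoint_start=15))
```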


Here the units on the vertical axis are seconds.

The difference in lag time between the two versions is quite interesting.  I suspect that whatever smoothed out the cpu usage may also be responsible for this behavior.