2010-09-17 21:42:19
GNU grep speed comparison: fixed strings
sfgrep was designed especially for searching log files for fixed strings. After a bugfix, some tests with gigabytes of data had to be run. GNU (e)grep was invoked with the files directly as arguments. The same task was done with multiple invocations of sfgrep and an additional cat process.
The logfile sizes were about 540 MB when running the comparison:
root@log:~ > time egrep -h -v \
    'disconn|connect|localhost|timeout|2010-09-17T1[3456]' \
    /var/log/cluster/mail*/*/postfix/smtpd |wc -l
217649

real    109m38.380s
user    3m53.587s
sys     105m9.866s
Ooops! Most of the time I use sfgrep, and now I wondered whether GNU grep would ever finish its task. Now let's give the little intruder its chance:
root@log:~ > time cat \
    /var/log/cluster/mail*/*/postfix/smtpd \
    |sfgrep -v disconn |sfgrep -v connect \
    |sfgrep -v localhost |sfgrep -v timeout \
    |sfgrep -v 2010-09-17T13 |sfgrep -v 2010-09-17T14 \
    |sfgrep -v 2010-09-17T15 |sfgrep -v 2010-09-17T16 \
    |wc -l
217649

real    0m13.734s
user    0m4.272s
sys     0m2.460s
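Both command lines are meant to drop the same lines: one extended regex with alternation on the one hand, a chain of inverted fixed-string filters on the other. A minimal sketch of that equivalence on a handful of made-up sample lines follows. It assumes `grep -F` as a stand-in for sfgrep (which may not be installed), and the function names and log lines are purely illustrative.

```shell
# Made-up sample lines, for illustration only (not taken from the real logs).
sample() {
    printf '%s\n' \
        '2010-09-17T12:59:01 mail1 postfix/smtpd: connect from unknown' \
        '2010-09-17T13:00:02 mail1 postfix/smtpd: NOQUEUE: reject' \
        '2010-09-17T17:05:03 mail2 postfix/smtpd: NOQUEUE: reject' \
        '2010-09-17T17:05:04 mail2 postfix/smtpd: timeout after DATA' \
        '2010-09-17T18:10:05 mail2 postfix/smtpd: warning: hostname lookup'
}

# One extended regex with alternation, as in the egrep run.
count_regex() {
    sample | grep -E -v 'disconn|connect|localhost|timeout|2010-09-17T1[3456]' | wc -l
}

# Chain of inverted fixed-string filters; grep -F stands in for sfgrep (assumption).
count_chain() {
    sample | grep -F -v disconn | grep -F -v connect \
           | grep -F -v localhost | grep -F -v timeout \
           | grep -F -v 2010-09-17T13 | grep -F -v 2010-09-17T14 \
           | grep -F -v 2010-09-17T15 | grep -F -v 2010-09-17T16 \
           | wc -l
}

count_regex
count_chain
```

Both functions should print the same count, just as both runs above report 217649 lines.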
sfgrep uses plain open()/read() and has no search algorithm like Boyer-Moore-Horspool (BMH) or Boyer-Moore (BM), yet in this test it was approx. 480 times faster than GNU grep. Funny thing. :-)
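The factor of roughly 480 comes from dividing the two "real" times reported above. A small sketch of that arithmetic, with hypothetical helper names `to_seconds` and `speedup` that are not part of sfgrep or grep:

```shell
# Convert a `time`-style duration such as 109m38.380s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations (slow / fast), rounded to the nearest integer.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.0f\n", slow / fast }'
}

# Wall-clock ("real") times from the two runs.
speedup 109m38.380s 0m13.734s
```

The same helper can be pointed at the "sys" lines (105m9.866s versus 0m2.460s) to compare the time spent in the kernel.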
Don't trust them if they tell you "why GNU grep is fast". Always trust your stopwatch, and never just believe that your code is fast.