[Chart: performance (KB/s) and major collections for each test]
* Test Overview

The objective of these tests is to put the performance of ocamldap in perspective with other popular LDAP libraries.

These tests were performed using two machines. One machine ran the LDAP server, openldap, while the other acted as the LDAP client. The machines were connected with a Cat 5e crossover cable at a data rate of 1Gb/s.

Server Machine
Alienware Area-51m laptop
Intel Pentium 4 @ 3.4GHz, 800MHz FSB
1GB Corsair DDR400 CL2 RAM
Linux 2.6.9-gentoo-r9, gcc 3.4
openldap 2.1.30

bdb-config (slapd.conf)
database bdb
suffix "o=test"
rootdn "o=test"
cachesize 10000000
checkpoint 65534 60
# Cleartext passwords, especially for the rootdn, should
# be avoided. See slappasswd(8) and slapd.conf(5) for details.
# Use of strong authentication encouraged.
rootpw secret
# The database directory MUST exist prior to running slapd AND
# should only be accessible by the slapd and slap tools.
# Mode 700 recommended.
directory /var/lib/openldap-data
# Indices to maintain
index objectClass eq
index cn pres,eq,sub

BDB config (DB_CONFIG file)
set_cachesize 0 268435456 1
set_lg_bsize 1048576
set_lg_max 10485760
set_flags DB_TXN_NOSYNC
set_flags DB_DIRECT_DB

Client Machine
Apple PowerBook G4
Motorola MPC7450 @ 800MHz + 1MB L3 cache
Darwin 7.7.0 xnu-517.9.5

The C compiler was gcc version 3.3 20030304 (Apple Computer, Inc. build 1666) at optimization level -O2. The OCaml compiler was Objective Caml 3.08+2, with -inline 1000 for native code. Perl version 5.8.2 was used. All results are the average of three trials, except Perl, which was run only once. For OCaml, tests using the bytecode compiler are prefixed with "bytecode"; all other tests used the native code compiler.
* Test 1: Perl Net::LDAP NO IO
Uses Perl's Net::LDAP library to download each entry and throw it away without printing it at all.
* Test 2: bytecode ocamldap 2.0.3 + IO optimizations
ocamldap 2.0.3. This is the stable version of ocamldap. The decoder is unoptimized, but test.ml (ldapsearch) uses the optimized IO functions.
* Test 3: bytecode ocamldap 2.0.3 + readbyte_of_ber_element + IO optimizations
ocamldap CVS (will be 2.1.0). The decoder is heavily optimized: time spent garbage collecting has been reduced by a factor of 5.14, and the overall speedup is 2.22.
* Test 4: ocamldap 2.0.3
ocamldap 2.0.3 stable version, native code. Performance was not that bad to begin with, but there is a lot of consing. This run does NOT use the optimized IO functions, so a direct comparison with the bytecode runs is not possible.
* Test 5: openldap 2.1.30 ldapsearch
ldapsearch from openldap 2.1.30. Purportedly (according to some core developers) a huge pig, but it does not look like a slouch on this chart.
* Test 6: ocamldap 2.0.3 + readbyte_of_ber_element
Decoding BER involves looking at the data stream in chunks, where each chunk is a BER element. The chunks have chunks inside them, and so on. What we used to do was copy each chunk, and each nested chunk, out of the stream and deal with them individually. That involves making a lot of copies of the data stream and burning through a lot of memory, which makes work for the garbage collector. What we do now is deal with just ONE data stream, but build virtual chunks and nested chunks that APPEAR to be separate elements when they actually all use the same memory. For native code (with no IO optimizations) this reduced the amount of memory we burn through by a factor of 3.74 and improved performance by a factor of 2.04. Wow! AND, because we didn't change the API at all, just the implementation, none of the significant elegance of the decoder has been lost.
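The "virtual chunk" idea can be sketched in a few lines of OCaml. This is a hypothetical illustration, not the ocamldap decoder: the view type, the function names, and the toy element layout (one tag byte, one length byte, then contents) are all assumptions made for the example. Real BER also allows multi-byte tags and lengths.

```ocaml
(* Hypothetical sketch: a "virtual chunk" is a window onto one shared
   string. These names are illustrative, not the ocamldap API. *)
type view = {
  data : string;      (* the single underlying data stream, never copied *)
  mutable pos : int;  (* next unread byte within this view *)
  limit : int;        (* one past the last byte belonging to this view *)
}

let view_of_string s = { data = s; pos = 0; limit = String.length s }

(* Read one byte and advance; fails at the end of the view. *)
let read_byte v =
  if v.pos >= v.limit then failwith "end of view";
  let c = Char.code v.data.[v.pos] in
  v.pos <- v.pos + 1;
  c

(* Carve out a nested element of [len] bytes. The child shares [data]
   with its parent, so only a small record is allocated. The parent is
   advanced past the child. *)
let sub_view v len =
  if len < 0 || v.pos + len > v.limit then invalid_arg "sub_view";
  let child = { data = v.data; pos = v.pos; limit = v.pos + len } in
  v.pos <- v.pos + len;
  child

(* Toy layout assumed here: tag byte, length byte, then contents. *)
let read_element v =
  let tag = read_byte v in
  let len = read_byte v in
  (tag, sub_view v len)

(* Count leaf elements, descending into sequences (tag 0x30). *)
let rec count_leaves v =
  if v.pos >= v.limit then 0
  else
    let tag, body = read_element v in
    let here = if tag = 0x30 then count_leaves body else 1 in
    here + count_leaves v
```

For instance, calling count_leaves on a view of the bytes 30 06 02 01 05 04 01 41 02 01 07 walks a sequence holding two elements followed by one more top-level element, and every nested view points at the same string. The caller still sees what looks like an independent element, which is how the API can stay the same while the copying goes away.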
* Test 7: openldap 2.2.20 ldapsearch
ldapsearch from openldap 2.2.20. They got a huge performance improvement in 2.2: a speedup of 1.41. I wonder how much more headroom they have?
* Test 8: ocamldap 2.0.3 + readbyte_of_ber_element + IO optimizations
In the earlier tests we used print_endline and printf to output the entries. This test shows that there is a lot of room for improvement there. This time we wrote the entire entry into a buffer (which we allocated globally only ONCE), then used Buffer.output_buffer stdout to print the entry and Buffer.clear to prepare the buffer for the next entry. Doing this gave a speedup of 1.24, AND surpassed openldap 2.2.20. This shows that using a static buffer and printing the entire entry as one syscall is superior for this entry size; the result may change with different entry sizes (our test entries are relatively small). This test does show that the decoder in ocamldap is equivalent, and possibly superior, to the decoder in openldap, which is quite an accomplishment. This shows without a doubt that OCaml is perfectly suited to implementing network protocols and network servers, and that it is under-applied in this very important field.
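A minimal sketch of the buffered-output pattern described above, using only the standard library Buffer module. The entry type, the LDIF-like layout, and the function names are assumptions made for illustration; this is not the actual test.ml.

```ocaml
(* One buffer, allocated once and reused for every entry. *)
let out_buf = Buffer.create 4096

(* Hypothetical entry shape: a DN plus (attribute, values) pairs. *)
type entry = { dn : string; attrs : (string * string list) list }

(* Append a whole entry to [out_buf] in an LDIF-like layout. *)
let fill_buffer e =
  Buffer.add_string out_buf "dn: ";
  Buffer.add_string out_buf e.dn;
  Buffer.add_char out_buf '\n';
  List.iter
    (fun (name, values) ->
      List.iter
        (fun v ->
          Buffer.add_string out_buf name;
          Buffer.add_string out_buf ": ";
          Buffer.add_string out_buf v;
          Buffer.add_char out_buf '\n')
        values)
    e.attrs;
  Buffer.add_char out_buf '\n'

(* Emit the entry with one output call instead of one print per line,
   then reset the buffer so its storage is reused for the next entry. *)
let print_entry oc e =
  fill_buffer e;
  Buffer.output_buffer oc out_buf;
  Buffer.clear out_buf
```

A search loop would then call print_entry stdout on each decoded entry. Buffer.clear keeps the buffer's internal storage, so after the first few entries no further allocation is needed for output.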
* Timings

The timings of the two runs are as interesting as the raw data rates. They provide some insight in the form of the user times. openldap actually consumes less user time than ocamldap, even though its real time is higher; the difference is spent in system calls (8.170s sys versus 0.770s). This indicates that openldap could possibly be sped up further by optimizing its system call usage, whereas ocamldap is well suited to vendor Unixes with unoptimized kernels because it makes very efficient use of system resources at some cost in user time.
* ocamldap 2.0.3 + readbyte_of_ber_element + IO optimizations

real 0m25.014s
user 0m22.260s
sys 0m0.770s
* openldap 2.2.20 ldapsearch

real 0m26.471s
user 0m16.140s
sys 0m8.170s
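
The split between user and system time in the two listings above can be made explicit with a little arithmetic. The sketch below only restates the reported numbers; the record type and helper names are made up for the example.

```ocaml
(* Timings copied from the two runs above, in seconds. *)
type timing = { real : float; user : float; sys : float }

let ocamldap = { real = 25.014; user = 22.260; sys = 0.770 }
let openldap = { real = 26.471; user = 16.140; sys = 8.170 }

(* Total CPU time charged to the process. *)
let cpu_total t = t.user +. t.sys

(* Fraction of that CPU time spent in system calls. *)
let sys_share t = t.sys /. cpu_total t

(* Wall-clock ratio of the openldap run to the ocamldap run. *)
let real_ratio = openldap.real /. ocamldap.real
```

With these figures, ocamldap spends roughly 3% of its CPU time in the kernel and openldap roughly 34%, while the wall-clock ratio between the two runs is only about 1.06.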