BTrDB: Berkeley Tree Database


BTrDB uses a K-ary tree to store timeseries data. The leaves of the tree store the individual time-value pairs. Each internal node stores associative statistics of the data in its subtree; currently, the statistics are the minimum value, mean value, maximum value, and number of data points. BTrDB uses the internal nodes to accelerate processing of these statistical aggregates over arbitrary time ranges. Because the statistics are associative, the statistics at an internal node can always be computed using only the statistics stored at that node's immediate children. As new data arrives or existing data changes, the relevant statistics can be quickly recomputed by traversing nodes from the affected leaf up to the root of the tree.
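As a rough illustration of how a parent's statistics can be derived from its children alone, here is a minimal Go sketch. The `Stats` type and the `FromValues` and `Merge` functions are hypothetical names chosen for this example and are not BTrDB's actual types. The sketch also assumes the sum is kept alongside the count, so that the mean can be recovered exactly after merging.

```go
package main

import (
	"fmt"
	"math"
)

// Stats is a hypothetical per-node summary (not BTrDB's real type).
// It keeps Sum and Count so the mean can be recomputed after a merge.
type Stats struct {
	Min, Max, Sum float64
	Count         uint64
}

// Mean returns the mean of the summarized values, or NaN if there are none.
func (s Stats) Mean() float64 {
	if s.Count == 0 {
		return math.NaN()
	}
	return s.Sum / float64(s.Count)
}

// FromValues summarizes the raw values held in a leaf.
func FromValues(vals []float64) Stats {
	s := Stats{Min: math.Inf(1), Max: math.Inf(-1)}
	for _, v := range vals {
		s.Min = math.Min(s.Min, v)
		s.Max = math.Max(s.Max, v)
		s.Sum += v
		s.Count++
	}
	return s
}

// Merge computes a parent's summary using only its children's summaries.
func Merge(children ...Stats) Stats {
	out := Stats{Min: math.Inf(1), Max: math.Inf(-1)}
	for _, c := range children {
		if c.Count == 0 {
			continue // an empty child contributes nothing
		}
		out.Min = math.Min(out.Min, c.Min)
		out.Max = math.Max(out.Max, c.Max)
		out.Sum += c.Sum
		out.Count += c.Count
	}
	return out
}

func main() {
	left := FromValues([]float64{1, 2, 3})
	right := FromValues([]float64{10, 20})
	parent := Merge(left, right)
	fmt.Println(parent.Min, parent.Max, parent.Mean(), parent.Count)
}
```

When a leaf changes, the same `Merge` step would be repeated once per level on the path to the root, so the work is proportional to the height of the tree rather than to the amount of stored data.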

BTrDB is currently implemented in Go, an open-source programming language with built-in concurrency support that compiles to machine code. It can be configured to store data either on your computer's filesystem or in Ceph, a distributed object store that handles data replication and recovery. Raw data and statistical aggregates can be queried via either HTTP or Cap'n Proto.

By providing efficient computation of statistical aggregates, BTrDB enables a new class of algorithms for timeseries data analytics that scale logarithmically in the size of the data. It also facilitates powerful and responsive techniques for visualizing data that scale linearly in the size of the visualization rather than in the size of the data being visualized. You should try it out!
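To give a feel for why aggregate queries can avoid touching most of the data, the following Go sketch builds a small in-memory tree and answers a range query by reusing the precomputed summary of any subtree that lies entirely inside the queried range. Everything here is an assumption made for illustration: the `Agg`, `Point`, and `Node` types, the `Build` and `Query` functions, the fan-out of 4, and the in-memory layout are hypothetical and do not reflect BTrDB's actual implementation or API.

```go
package main

import (
	"fmt"
	"math"
)

// Agg is a hypothetical summary of a set of values.
type Agg struct {
	Min, Max, Sum float64
	Count         int
}

func empty() Agg            { return Agg{Min: math.Inf(1), Max: math.Inf(-1)} }
func single(v float64) Agg  { return Agg{Min: v, Max: v, Sum: v, Count: 1} }
func combine(a, b Agg) Agg {
	return Agg{Min: math.Min(a.Min, b.Min), Max: math.Max(a.Max, b.Max),
		Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

// Point is one time-value pair.
type Point struct {
	Time  int64
	Value float64
}

// Node summarizes all points in its subtree.
type Node struct {
	First, Last int64   // timestamps of the first and last point covered
	Agg         Agg     // precomputed summary of the subtree
	Children    []*Node // nil for leaves
	Points      []Point // set only on leaves
}

// Build creates a k-ary tree over a non-empty, time-sorted slice of points.
func Build(pts []Point, k, leafSize int) *Node {
	n := &Node{First: pts[0].Time, Last: pts[len(pts)-1].Time, Agg: empty()}
	if len(pts) <= leafSize {
		n.Points = pts
		for _, p := range pts {
			n.Agg = combine(n.Agg, single(p.Value))
		}
		return n
	}
	chunk := (len(pts) + k - 1) / k // ceiling division
	for i := 0; i < len(pts); i += chunk {
		end := i + chunk
		if end > len(pts) {
			end = len(pts)
		}
		child := Build(pts[i:end], k, leafSize)
		n.Children = append(n.Children, child)
		n.Agg = combine(n.Agg, child.Agg)
	}
	return n
}

// Query summarizes points with from <= Time < to and counts visited nodes.
func Query(n *Node, from, to int64, visited *int) Agg {
	*visited++
	if n.Last < from || n.First >= to {
		return empty() // no overlap with the queried range
	}
	if from <= n.First && n.Last < to {
		return n.Agg // fully covered: reuse the stored summary
	}
	out := empty()
	for _, p := range n.Points { // leaf partially covered: scan its points
		if p.Time >= from && p.Time < to {
			out = combine(out, single(p.Value))
		}
	}
	for _, c := range n.Children { // internal node partially covered: recurse
		out = combine(out, Query(c, from, to, visited))
	}
	return out
}

func main() {
	pts := make([]Point, 1<<16)
	for i := range pts {
		pts[i] = Point{Time: int64(i), Value: float64(i % 100)}
	}
	root := Build(pts, 4, 4)
	visited := 0
	a := Query(root, 1000, 60000, &visited)
	fmt.Println("count:", a.Count, "nodes visited:", visited, "of", len(pts), "points")
}
```

In this sketch only the nodes along the two edges of the queried range are partially covered and need to be descended into, which is why the number of visited nodes grows with the height of the tree rather than with the number of points in the range.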