As application monitoring software evolves to handle ever-greater numbers of data points, the ways that data is queried, stored, accessed, streamed, and maintained are changing as well. Rather than use one of the many streaming-oriented database systems built to handle such demanding workloads, LogicMonitor has developed and released its own time series database (TSDB) for its infrastructure monitoring service.
The question is: Why?
LogicMonitor co-founder and CTO Jie Song offered three main reasons behind this choice.
The first is performance. With the addition of its own TSDB, LogicMonitor can now support 800,000 insertions per second on a single node. This database also supports multivariable time series, which LogicMonitor reports will, for example, allow users to run potentially up to 7.5 metrics with an average of 6 million metrics per second.
“OpenTSDB and Sensu don’t support lossless compression. We use an adaptive compression technology. For floating numbers, we get a 10:1 compression ratio, and a higher ratio for other data types,” said Song. Adaptive compression changes the compression algorithm based on the type of data being compressed, which allows for more efficient use of system resources.
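To make the idea of choosing a compression algorithm by data type concrete, here is a minimal Python sketch. It is not LogicMonitor's algorithm, which the company has not published. The function names, the two encodings (delta encoding for integers, raw IEEE 754 doubles for floats), and the use of zlib are all assumptions chosen for illustration.

```python
import struct
import zlib


def _encode_delta_ints(values):
    # Store the first value followed by successive differences. Slowly
    # changing counters yield small, repetitive deltas that deflate well.
    # Assumes every value and delta fits in a signed 64-bit integer.
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}q", *deltas))


def _encode_raw_floats(values):
    # Pack each value as an 8-byte double so the round trip is bit-exact.
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


def compress_series(values):
    """Pick an encoding from the data type; return (tag, payload)."""
    if not values:
        return ("empty", b"")
    if all(type(v) is int for v in values):
        return ("delta_int", _encode_delta_ints(values))
    if all(type(v) is float for v in values):
        return ("raw_float", _encode_raw_floats(values))
    raise TypeError("this sketch only handles all-int or all-float series")


def decompress_series(tag, payload):
    """Invert compress_series using the tag stored next to the payload."""
    if tag == "empty":
        return []
    raw = zlib.decompress(payload)
    count = len(raw) // 8
    if tag == "delta_int":
        out, running = [], 0
        for delta in struct.unpack(f"<{count}q", raw):
            running += delta
            out.append(running)
        return out
    if tag == "raw_float":
        return list(struct.unpack(f"<{count}d", raw))
    raise ValueError(f"unknown encoding tag: {tag}")
```

The tag travels with the payload so the reader knows which decoder to apply. A production system would likely use more specialized float encodings than this sketch does; the point here is only the dispatch on data type and the lossless round trip.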
The second benefit of LogicMonitor’s time series database is its data model compatibility. While LogicMonitor has had a multivariable time series model since its inception, this approach allows users to complete schema-level operations with ease.
Song noted that LogicMonitor hopes to pave the way for the future of TSDB technology in monitoring, working on features such as time series correlation, forecasting, and stream-based analysis.
Building the new TSDB has been exciting, but LogicMonitor has also faced its share of challenges along the way. These included working out how best to handle a pod-based architecture when migrating customer data from one pod to another, making large-scale schema changes, and improving query performance.
Getting the Finer Details
LogicMonitor’s shift away from uploading data to a traditional round robin database (RRD) is surprising. For over a decade, RRD has been the go-to approach for monitoring platforms handling large quantities of data. While some companies have also made avid use of SQL and NoSQL databases, LogicMonitor co-founder and chief product officer Steve Francis explained that it’s not all sunshine and roses when working with RRD at scale.
“With RRD systems usually being set up to consolidate raw data after a day, graphs can now only show consolidated samples. Instead of looking at CPU, requests per second, API call latency, network traffic with a one-minute resolution, developers and administrators are now looking at consolidated data of 15-minute intervals. This makes it impossible to determine what came first at the start of the incident,” Francis said.
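The loss of ordering that Francis describes can be shown with a small Python sketch. The sample values, thresholds, and function names below are hypothetical and chosen only to illustrate the effect; averaging is assumed as the consolidation function.

```python
def consolidate(samples, bucket_size):
    """Average consecutive samples into fixed-size buckets, RRD-style."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        chunk = samples[start:start + bucket_size]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


def first_breach(samples, threshold):
    """Index of the first sample above threshold, or None if there is none."""
    for index, value in enumerate(samples):
        if value > threshold:
            return index
    return None


# Hypothetical one-minute samples covering 30 minutes.
cpu_percent = [20.0] * 30
latency_ms = [50.0] * 30
for minute in range(17, 30):   # CPU saturates from minute 17 onward
    cpu_percent[minute] = 95.0
for minute in range(21, 30):   # latency degrades four minutes later
    latency_ms[minute] = 400.0

# At one-minute resolution the ordering is clear: CPU first, then latency.
raw_cpu = first_breach(cpu_percent, 80.0)        # 17
raw_latency = first_breach(latency_ms, 200.0)    # 21

# After consolidation to 15-minute averages, both breach in the same bucket.
cpu_15 = consolidate(cpu_percent, 15)            # [20.0, 85.0]
latency_15 = consolidate(latency_ms, 15)         # [50.0, 260.0]
bucket_cpu = first_breach(cpu_15, 80.0)          # 1
bucket_latency = first_breach(latency_15, 200.0) # 1
```

With the raw samples, an administrator can see that the CPU spike preceded the latency spike by four minutes. With the consolidated data, both events land in the same 15-minute bucket and the sequence cannot be recovered.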
LogicMonitor’s TSDB was developed in Java, where garbage collection proved to be the central challenge for this kind of data-intensive application. “When full garbage collection happens, the whole application could hang for tens of seconds. We did a lot of work to handle this case by implementing our own slab allocation, using cuckoo hash to store large data sets, object pooling, and parameter tuning. Right now, the system triggers a full garbage collection every three months,” Song explained.
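Song does not describe how the cuckoo hash is implemented, so the following is only a generic sketch of the technique, written in Python rather than Java for brevity. The class name, starting capacity, kick limit, and the way the two hash functions are derived are all assumptions. It shows the core property of cuckoo hashing: every key has exactly two candidate slots, so a lookup never inspects more than two positions.

```python
class CuckooHashMap:
    """Two-table cuckoo hash map: each key lives in one of two slots."""

    MAX_KICKS = 32  # displacements allowed before the tables are grown

    def __init__(self, capacity=8):
        self._capacity = capacity
        self._tables = [[None] * capacity, [None] * capacity]
        self._size = 0

    def __len__(self):
        return self._size

    def _slot(self, which, key):
        # Two hash functions derived by salting Python's built-in hash
        # with the table index (an illustrative choice, not a tuned one).
        return hash((which, key)) % self._capacity

    def get(self, key, default=None):
        # A lookup checks at most two slots, whatever the table size.
        for which in (0, 1):
            entry = self._tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            index = self._slot(which, key)
            entry = self._tables[which][index]
            if entry is not None and entry[0] == key:
                self._tables[which][index] = (key, value)
                return
        self._place((key, value))
        self._size += 1

    def _place(self, entry):
        while True:
            which = 0
            for _ in range(self.MAX_KICKS):
                index = self._slot(which, entry[0])
                # Put the entry in its slot and pick up whatever was there.
                entry, self._tables[which][index] = (
                    self._tables[which][index], entry)
                if entry is None:
                    return
                # The evicted entry moves to its slot in the other table.
                which = 1 - which
            # Too many displacements: grow, then retry the pending entry.
            self._grow()

    def _grow(self):
        stored = [e for table in self._tables for e in table if e is not None]
        self._capacity *= 2
        self._tables = [[None] * self._capacity, [None] * self._capacity]
        for entry in stored:
            self._place(entry)
```

A short usage example: after `m = CuckooHashMap()` and `m.put("cpu", 0.42)`, calling `m.get("cpu")` returns `0.42` and `m.get("missing")` returns `None`. In a Java setting, the tables would typically be flat arrays, which is one reason this layout is often paired with efforts to reduce garbage collection pressure; that connection is a general observation, not a description of LogicMonitor's code.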
Working with unaggregated data offers a variety of options for monitoring today. Francis forecasts that the future for TSDB technology is bright, with LogicMonitor hoping to pave the way for future advancements in the technology. Threshold guidance, identifying time series streams correlated to a single user issue, and granular algorithm replay features are just a few things developers can expect as a result of LogicMonitor’s new TSDB-driven data sets.
LogicMonitor has no immediate plans to release the database software as open source.