Amid the flurry of new cloud logging vendors employing machine learning, Logz.io adds crowdsourcing to the mix.
“I’m not going to go through a terabyte of data a day just to find three log lines that are problematic. I’m not going to do it manually, but the system can. We’ll say, ‘Here’s three lines out of your data that you should look at and here is why you should,’” explained Asaf Yigal, co-founder and vice president of product at Logz.io.
The software then directs you to online discussion threads about that issue and, by monitoring your reaction to its suggestions, adjusts its algorithm to provide better insights in the future.
It’s no secret that IT operations and security teams are overwhelmed by the amount of data being produced in their monitoring and logging solutions. A survey from cloud access security broker Skyhigh Networks found the average enterprise deals with 2.7 billion cloud events per month, yet only 2,500 of those events constituted an “anomaly,” and just 23 of them were actual security incidents.
The cloud logging market has become crowded with entrants including Sumo Logic; Rapid7, which bought Logentries; Graylog; Loggly; Papertrail, which was acquired by SolarWinds; LogDNA; Anodot; Moogsoft; and Sematext.
Most have jumped on the machine learning bandwagon, Fixate IO analyst Chris Riley told InApps previously, though most machine learning capabilities have been merely smart statistics.
Yigal takes issue with the whole approach of monitoring for anomalies.
“[That approach says] if my load is increasing, this is an anomaly and I should alert about it. If the CPU is increasing, I should alert about it. If I see an exception I’ve never seen before, I should care about it and get an alert about it. The problem with that approach is that IT environments are one big anomaly,” he said.
“A load can increase because I have more customers, and that’s a good thing. A problem can be because of a test that runs every day at 4 a.m., and that’s a good thing. Just because something is an anomaly doesn’t mean it’s an issue. And issues don’t always represent themselves as an anomaly, so this approach was not conducive to solving the problem.”
Logz.io’s core offering is a hosted ELK stack (Elasticsearch, Logstash and Kibana) with enterprise-grade features such as role-based access, multifactor authentication and SOC 2 compliance certification. It also offers a suite of free ELK apps, such as S3 Bucket Access, MySQL-monitor, Nagios critical alerts visualization and more.
In August, it added its proprietary Cognitive Insights layer, which combines supervised machine learning with human interaction with data. The company says the layer does for log data what Google’s PageRank algorithm does for web pages and Amazon’s recommendation engine does for products.
“We believe the way to solve the problem of data overloading the IT environment is to combine human intelligence, which knows how to ask the questions to begin with, with machine power to process millions and millions of threads all over the world and understand the context of them — which product is it referring to, how critical it is, and so on,” Yigal said.
“We have about 30,000 different insights from different systems, such as a security insight that something is crawling a bunch of sites with security signatures, and IT operations from different products like SQL Server, Elasticsearch, all the well-known web servers …”
The system gathers information from user forums, Google groups, GitHub and other discussions and matches them in real time to things in your logs.
“[For instance] we search your logs all the time for all the possible SQL exceptions; once something hits, we tell you this is something you should know about and here are a few discussion threads about this issue. You can ask anything you want to know. … What we’re doing is asking all the questions anyone has ever asked about your data in real time and giving you insights about what should be interesting in your data,” he said.
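Logz.io has not published the internals of this matching process, but the idea Yigal describes — continuously scanning incoming log lines against a database of known, crowdsourced problem signatures and surfacing the associated discussion threads — can be sketched roughly as follows. The signature names, patterns and URLs here are purely illustrative assumptions, not the vendor’s actual data:

```python
import re

# Hypothetical signature database: each entry pairs a known log pattern
# with crowdsourced discussion threads (names and URLs are made up for
# illustration and do not reflect Logz.io's real signature set).
SIGNATURES = [
    {
        "name": "mysql-too-many-connections",
        "pattern": re.compile(r"ERROR 1040.*Too many connections"),
        "threads": ["https://example.com/thread/mysql-1040"],
    },
    {
        "name": "es-circuit-breaker",
        "pattern": re.compile(r"CircuitBreakingException"),
        "threads": ["https://example.com/thread/es-circuit-breaker"],
    },
]

def match_insights(log_lines):
    """Scan log lines against known signatures and return matched insights."""
    insights = []
    for lineno, line in enumerate(log_lines, start=1):
        for sig in SIGNATURES:
            if sig["pattern"].search(line):
                insights.append({
                    "line": lineno,
                    "insight": sig["name"],
                    "threads": sig["threads"],
                })
    return insights

logs = [
    "2017-01-01 12:00:00 INFO request served in 12ms",
    "2017-01-01 12:00:01 ERROR 1040 (HY000): Too many connections",
]
print(match_insights(logs))
```

At Logz.io’s claimed scale — roughly 30,000 insights matched against terabytes of daily log volume — a production system would need indexed or streaming pattern matching rather than this nested loop, but the core mapping from log signature to discussion threads is the same.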
The system also learns from how people actually interact with the insights. If the system flags something as important but you dismiss it, the algorithm is adjusted accordingly; if you set up alerts on the insight or forward it, that counts as positive reinforcement and the system continues to surface it.
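The feedback loop described above amounts to nudging a per-insight relevance score up or down based on user actions. This is a minimal sketch of that idea under assumed action names and weights; it is not Logz.io’s actual algorithm:

```python
# Hypothetical feedback weights: each user action nudges an insight's
# relevance score up or down (values are illustrative assumptions).
FEEDBACK_WEIGHTS = {
    "dismissed": -0.2,      # user ignored the insight
    "alert_created": 0.3,   # user set up an alert on it
    "forwarded": 0.2,       # user shared the insight
}

def update_score(score, action, lo=0.0, hi=1.0):
    """Apply the feedback weight for `action`, clamped into [lo, hi]."""
    return max(lo, min(hi, score + FEEDBACK_WEIGHTS.get(action, 0.0)))

score = 0.5
score = update_score(score, "dismissed")      # drops to 0.3
score = update_score(score, "alert_created")  # rises to 0.6
```

Insights whose scores decay toward zero would be shown less often, while reinforced ones keep surfacing — the same adapt-to-the-user behavior the article describes.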
Founded in late 2014 by Israelis Tomer Levy and Yigal, Logz.io keeps its engineering team in Tel Aviv, though it has a marketing and sales office in Boston.
Logz maintains it can help customers find and mitigate problems from log data before they become issues that affect production.
Its customers include Electronic Arts, British Airways and Internet performance management company Dyn, which credits Logz.io with helping it deal with a DDoS attack in October that temporarily shut down or slowed many popular websites such as Twitter, Spotify, Reddit, and others.
“Being actively involved with Dyn, I saw firsthand how instrumental Logz.io was in our ability to quickly respond to and mitigate the recent unprecedented DDoS attacks we experienced, helping us assess the source, target, and protocols used,” said Jim Baum, Dyn’s executive chairman, in a prepared statement. Baum is also on the board of directors for Logz.
In a case study, Dyn explained how it consolidated the manual processes it had been using to monitor log files; the deployment has since scaled to more than 130 users overseeing the company’s 18 global data centers.
Rent-to-own retailer Rent-A-Center had built two separate ELK stacks, one on-premises and one in the cloud. It was ingesting more than 100GB of data per day and struggling to optimize performance and reliability. During a pilot phase, Logz alerted the company to multiple failed root login attempts that Rent-A-Center had not previously detected, boosting its confidence in the new vendor.
Feature image via Pixabay.