Why is it so hard to get predictable performance when deploying big data applications?
That’s the problem that MityLytics set out to solve when it was founded in 2015. It aims to help dev and ops folks know how their workloads will perform on various hardware, and to help IT bosses plan how best to scale for the future.
“We look at what an app is doing and exactly what your capacity and budget is,” Chief Technology Officer Manish Singh explained. Then, if you want to change something, such as the way the app partitions data or the hardware it runs on, the company’s predictive algorithms will give you an idea of the effects that change will have.
This is done by modeling and simulating the behavior of big data workloads, such as Terasort on MapReduce. Singh spoke on identifying bottlenecks in Spark, Hadoop and Hive at Data by the Bay last year and, more recently, talked about running Cassandra on ARM processors at Big Data North America, part of ApacheCon, held in Miami.
MityLytics teamed up with Packet, which provides a bare-metal cloud, to test the performance of Kafka, Spark and Cassandra on bare metal.
Once customers deploy their big data application on the simulation hardware, MityLytics determines how the application will scale and perform with higher volumes of data on scaled-up infrastructure. Its fully automated performance test employs deep-learning software to predict running times and resource utilization, answering questions such as what happens if you add RAM or CPUs.
“You don’t actually have to spin up an application to see how it will run,” Singh said of its simulator, which provides a dynamic assessment across the app’s entire lifecycle.
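To make the idea of predicting running times from a handful of measured runs concrete, here is a minimal sketch. It is not MityLytics' method: the article says the company uses deep learning, and its models are not public. This example swaps in a much simpler technique, a least-squares power-law fit in log-log space, and every function name and measurement in it is hypothetical.

```python
import math


def fit_power_law(sizes_gb, runtimes_s):
    """Fit runtime = a * size**b by ordinary least squares on log-log data."""
    xs = [math.log(s) for s in sizes_gb]
    ys = [math.log(t) for t in runtimes_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line through (log size, log runtime).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_runtime(a, b, size_gb):
    """Extrapolate the fitted curve to a data volume that was never run."""
    return a * size_gb ** b


# Hypothetical measurements from four small test runs (not real data).
sizes = [10, 20, 40, 80]                  # input size in GB
runtimes = [120.0, 250.0, 520.0, 1080.0]  # wall-clock seconds

a, b = fit_power_law(sizes, runtimes)
predicted = predict_runtime(a, b, 320)
print(f"scaling exponent b = {b:.2f}")
print(f"predicted runtime at 320 GB = {predicted:.0f} s")
```

An exponent `b` above 1 would suggest the job scales worse than linearly with data volume, which is the kind of signal a capacity planner would want before committing to larger hardware. A production model would also need to account for cluster size, memory and I/O, which this sketch ignores.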
It can help with planning, Singh said, by outlining the performance you will get for a particular dollar investment. That type of planning is similar to that used in civil engineering and aeronautics projects, he said.
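The "performance for a particular dollar investment" comparison can be illustrated with a small sketch. It assumes predicted running times are already available for each candidate cluster; the `ClusterOption` type, the helper function and all prices and runtimes are hypothetical, not taken from MityLytics or any cloud provider.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    name: str
    hourly_cost_usd: float      # assumed price of the whole cluster per hour
    predicted_runtime_h: float  # assumed output of a performance model

    @property
    def cost_per_run_usd(self) -> float:
        return self.hourly_cost_usd * self.predicted_runtime_h


def cheapest_within_deadline(options, deadline_h):
    """Return the lowest-cost option that finishes in time, or None."""
    feasible = [o for o in options if o.predicted_runtime_h <= deadline_h]
    return min(feasible, key=lambda o: o.cost_per_run_usd, default=None)


# Hypothetical candidates for one nightly job.
options = [
    ClusterOption("small", hourly_cost_usd=4.0, predicted_runtime_h=6.0),
    ClusterOption("medium", hourly_cost_usd=8.0, predicted_runtime_h=2.5),
    ClusterOption("large", hourly_cost_usd=16.0, predicted_runtime_h=1.5),
]

best = cheapest_within_deadline(options, deadline_h=3.0)
if best is not None:
    print(f"{best.name}: ${best.cost_per_run_usd:.2f} per run")
```

With these made-up numbers, the mid-sized cluster is both fast enough and cheapest per run, showing that the largest option is not automatically the best value.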
Developers can test their apps in a simulated environment, which helps ensure the apps will get the resources they need and, as they change, will stay within cost and resource budgets.
The platform provides advance notice of performance and scaling bottlenecks at development and deployment time and makes suggestions for code changes to improve performance across the entire workload.
It works even on custom applications, with just 10 runs of data from all the endpoints in a given infrastructure. It just needs to understand the characteristics of the home-grown environment, Singh said.
Seeing Effects of Change
Co-founder Rishi Divate brings his experience at Sumo Logic into play by applying monitoring to application optimization. Singh brings a background in software-defined networking at Pica8 and cloud services orchestration and automation at GoGrid. A third co-founder, Heather McKelvey, also came from GoGrid and Basho Technologies. Their Alameda, Calif.-based company has seven employees.
It’s offering a test drive of MiCPM, its “point and deploy” SaaS platform with Spark-HDFS clusters. Using aggregated monitoring data and algorithms, MiCPM provides workload simulation and automated performance tuning.
Its technology is intended for private and public cloud big data environments such as AWS Elastic MapReduce and Azure HDInsight, as well as clusters deployed in enterprise data centers.
The company has five proofs of concept under way, including at telecoms and high-performance computing firms. It’s looking for more participants.
The Fortune 500 companies it’s been working with often have on-premises data centers, but want to know how their applications would be affected should they move to the cloud. MityLytics helps them do that, Singh said.
Packet is a sponsor of InApps Technology.
Feature image by Pixabay.