Tecton Helps Data Scientists Own Features, and the Model Lifecycle – InApps 2022

March 30, 2022 by Phu Nguyen

This has been called the year of the feature store, with Databricks and Google among the most recent vendors announcing this technology to smooth the path for harnessing machine learning models in production. Twitter, Facebook, Comcast, Netflix, Pinterest and others also offer feature store platforms.

Not to be confused with Tekton, the open-source framework for creating CI/CD systems, the commercial enterprise feature store Tecton aims to standardize and automate the management of features in production machine learning (ML) applications.

Tecton.ai founders Mike Del Balso, Kevin Stumpf and Jeremy Hermann worked together at Uber as it created the Michelangelo machine learning platform.

Before Michelangelo, data scientists at Uber would create models, then pass them on to engineers who cobbled together open source tools to manage them, Del Balso said. The company had no standardized system for building reliable and reproducible pipelines for creating ML models. Models could not be larger than what would fit on a data scientist’s desktop, there was no centralized storage for training experiments and no way to compare experiments.

“That data management side of machine learning is really the unique thing that we built. And that’s what really inspired us to build Tecton, because we saw how useful that was at catalyzing this explosion of machine learning [that] enabled the company to go from zero to tens of thousands of models in production,” he said.

“We’re trying to bring that same change to the rest of the industry by bringing that same kind of data layer for machine learning, especially for real-time machine learning applications, to other organizations who are trying to figure this stuff out.”

Del Balso, who before his work at Uber helped build the machine learning system for Google’s ad division, notes Tecton is focused on operational machine learning — applying the data the company already has into decision-making for its products, rather than more research-based or analytical uses for data.

“Data scientists often work locally, training models and building the pipelines of data that feed them. But taking that local model into at-scale production is an arduous, time-consuming process, subject to constraints that just aren’t present in the training environment. Furthermore, models trained offline have to be pushed online, and operate on the same type of data (called features) in order to give sensible results. But the tooling to standardize, govern and collaborate around ML data is still incredibly immature,” Martin Casado, general partner at the venture capital firm Andreessen Horowitz, wrote of its investment in Tecton. The company has raised $60 million to date.

Full ML Lifecycle

The technology is more than just a database of features, the variables or attributes (such as name, age or sex) used in machine learning models.

“Tecton allows for the data scientists to be empowered throughout that machine learning lifecycle, and allows them to both build the prototype. But then in the process, the data pipelines are automatically productionized,” Del Balso said. “So the engineering teams, they have a much easier job because there’s not a lot of cumbersome and error-prone rebuilding of different pipelines along the way. … There’s this kind of like prototyping transformation, the productionization, and there’s an element of monitoring and quality management along the way.”

The Tecton platform consists of:

Feature pipelines for transforming raw data into features or labels
A feature store for storing historical feature and label data
A feature server for serving the latest feature values in production
An SDK for retrieving training data and manipulating feature pipelines
A web UI for managing and tracking features, labels, and data sets
A monitoring engine for detecting data quality or drift issues and alerting

It includes the transformation of features; storage, which consists of an online and an offline store for fast retrieval and slow retrieval; feature serving and then a governance layer, “to help ensure, ‘Hey, these features are only accessible to these teams,’ ‘Help me understand the lineage of different features,’ all the metadata and collaboration that’s needed in building these machine learning applications. And then a data quality and monitoring layer for features to understand the debugging processes that you have with data in your machine learning applications,” he said.

Features are defined as code in any Python environment using the Tecton SDK. The platform can pull existing features from external data sources, and it can also compute features from raw data using PySpark, Spark SQL or Python transformations on batch and streaming data.
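
As a rough illustration of what "features defined as code" can look like, the sketch below registers a plain Python transformation under a feature name. The `feature` decorator, the `REGISTRY` dictionary and the `order_count_7d` feature are hypothetical names invented for this example; they are not Tecton SDK calls.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical registry mapping feature names to transformation functions.
REGISTRY: Dict[str, Callable] = {}


def feature(name: str) -> Callable:
    """Register a transformation function under a feature name."""
    def wrap(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return wrap


@feature("order_count_7d")
def order_count_7d(order_times: List[datetime], as_of: datetime) -> int:
    """Count orders in the 7 days up to and including `as_of`."""
    window_start = as_of - timedelta(days=7)
    return sum(1 for t in order_times if window_start < t <= as_of)


# Made-up raw data: three order timestamps for one customer.
orders = [datetime(2021, 6, 1), datetime(2021, 6, 5), datetime(2021, 6, 20)]
value = REGISTRY["order_count_7d"](orders, datetime(2021, 6, 7))
print(value)  # 2
```

Because the definition is ordinary code, it can be versioned, reviewed and reused like any other module, which is the main appeal of the approach.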

The offline store contains historical feature values across time and is used to generate training data in batch. The offline feature store is configurable but defaults to Delta Lake. The online store uses AWS DynamoDB to provide the latest feature values for low-latency retrieval.
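
The split between the two stores can be pictured with a toy in-memory model. This is only a sketch of the idea, assuming a hypothetical `ToyFeatureStore` class; it says nothing about how Delta Lake or DynamoDB are actually wired up inside Tecton.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (entity id, feature name)


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for an offline/online store pair."""

    def __init__(self) -> None:
        # Offline: full history of (timestamp, value) per key.
        self.offline: Dict[Key, List[Tuple[datetime, Any]]] = defaultdict(list)
        # Online: only the most recent (timestamp, value) per key.
        self.online: Dict[Key, Tuple[datetime, Any]] = {}

    def write(self, entity: str, feature: str, ts: datetime, value: Any) -> None:
        key = (entity, feature)
        self.offline[key].append((ts, value))
        current = self.online.get(key)
        # Update the online value only if this write is at least as new.
        if current is None or ts >= current[0]:
            self.online[key] = (ts, value)

    def latest(self, entity: str, feature: str) -> Any:
        """Single-key lookup, the kind used at prediction time."""
        return self.online[(entity, feature)][1]

    def history(self, entity: str, feature: str) -> List[Tuple[datetime, Any]]:
        """Time-ordered values, the kind used to build training data."""
        return sorted(self.offline[(entity, feature)], key=lambda row: row[0])


store = ToyFeatureStore()
store.write("user_1", "trip_count", datetime(2021, 6, 1), 3)
store.write("user_1", "trip_count", datetime(2021, 6, 2), 5)
print(store.latest("user_1", "trip_count"))        # 5
print(len(store.history("user_1", "trip_count")))  # 2
```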

You can specify configurations like the date in the past to backfill features to, the schedule for future jobs, a time to live and more.
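
To make those settings concrete, here is a minimal sketch of how a backfill start date, a job schedule and a time to live might interact. The `FeatureJobConfig` class and its field names are assumptions made for illustration and do not reflect Tecton's actual configuration options.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FeatureJobConfig:
    """Hypothetical settings; field names are illustrative only."""
    backfill_start: datetime  # earliest date to compute features for
    schedule: timedelta       # interval between materialization jobs
    ttl: timedelta            # how long a feature value stays valid


def backfill_windows(cfg: FeatureJobConfig, now: datetime) -> List[Tuple[datetime, datetime]]:
    """List the complete schedule-sized windows between backfill_start and now."""
    windows = []
    start = cfg.backfill_start
    while start + cfg.schedule <= now:
        windows.append((start, start + cfg.schedule))
        start += cfg.schedule
    return windows


def is_expired(cfg: FeatureJobConfig, written_at: datetime, now: datetime) -> bool:
    """A value older than the TTL should no longer be served."""
    return now - written_at > cfg.ttl


cfg = FeatureJobConfig(
    backfill_start=datetime(2021, 6, 1),
    schedule=timedelta(days=1),
    ttl=timedelta(days=2),
)
now = datetime(2021, 6, 4)
windows = backfill_windows(cfg, now)
print(len(windows))                                # 3
print(is_expired(cfg, datetime(2021, 6, 1), now))  # True
```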

Training datasets are delivered as pandas or Spark dataframes. Once you have your dataset, you can use your existing tools, such as XGBoost, TensorFlow or PyTorch, to train and deploy models.
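
One common way to assemble a training set from historical feature values is a point-in-time lookup: for each labeled event, take the latest feature value recorded at or before the event, so that no future information leaks into training. The article does not describe how Tecton does this internally, so the sketch below is an assumption, and `value_as_of`, `trip_count` and `events` are hypothetical names with made-up data.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def value_as_of(history: List[Tuple[datetime, float]], ts: datetime) -> Optional[float]:
    """Return the latest feature value recorded at or before `ts`.

    `history` must be sorted by timestamp in ascending order.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)
    return history[idx - 1][1] if idx else None


# Hypothetical feature history and labeled events for one user.
trip_count = [(datetime(2021, 6, 1), 3.0), (datetime(2021, 6, 3), 5.0)]
events = [(datetime(2021, 6, 2), 0), (datetime(2021, 6, 4), 1)]  # (event time, label)

# Each training row pairs the feature value known at event time with the label.
training_rows = [(value_as_of(trip_count, ts), label) for ts, label in events]
print(training_rows)  # [(3.0, 0), (5.0, 1)]
```

In practice, rows like these would be loaded into a pandas or Spark dataframe before being handed to a training library.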

Tecton lets data scientists use more of the data they already have in their models by bringing data sources together in real time and using that real-time data in their applications, Del Balso said.

Joining Feast

In April, the San Francisco-based company announced it was hiring Willem Pienaar, founder of the open source feature store Feast, and becoming a major contributor to the project. Feast was created while Pienaar led the data science team at Indonesian ride-hailing startup Gojek, in conjunction with Google. Feast recently released version 0.10.

“It’s just like something that allows people to get started really easily with feature stores. And we expect to have a lot of additional elements like compatibility between the Feast user experience and the Tecton user experience over time,” Del Balso said. “Today, they’re separate platforms; tomorrow, they may not be. Our goal is to make it really easy for there to be a bridge between them.”

Going forward, the company plans deeper integrations with the data warehouse ecosystem and support for clouds beyond Amazon Web Services. It plans first-class integrations with Snowflake and Redshift this year. It wants to help users generate better features for their models, find the data most relevant to their decision-making, and figure out how to piece together the ML infrastructure into an architecture that makes sense for their use case, he said. It wants to be able to offer users a template for building a fraud application, a recommendation template, a prediction template, “and have all of the data flows be pre-built for that organization, so they just plug us into their data. This is a pretty big thing that we are spending a lot of time on,” Del Balso said.

Source: InApps.net
