Teradata Unveils New Data Lake and Advanced Analytics Offerings



Teradata today rolled out a pair of new products designed to broaden its appeal to a new generation of users. The first is a new data lake called VantageCloud Lake, which combines the workload management capabilities of its namesake database with the efficiency of cloud object storage. The second is ClearScape Analytics, which introduces new data preparation tools and MLOps, as well as advanced time-series analytics for IoT data.

When Teradata launched Vantage four years ago, the big news was that it delivered the company's data warehouse in a cloud environment with elastic scaling and pricing. The company was tired of losing workloads to disappointing Hadoop clusters, and as AWS and other clouds began to gobble up those failed Hadoop experiments, the writing was clearly on the wall: the cloud was coming.

While Teradata’s cloud pivot has been fairly successful (the cloud now drives most of its revenue growth), the company’s flagship product itself, an MPP column-oriented analytics database, hasn’t changed much in the cloud. Earlier this year, the company gave an overview of the changes it was working on around object storage and the separation of compute and storage. But its largest enterprise and government accounts continue to use the relational database to power mission-critical SQL workloads, whether in the cloud or on-premises.

That changes with today’s launch of VantageCloud Lake, which is Teradata’s first official foray into a true cloud-native architecture. Instead of storing data in the proprietary Teradata file system, which is fast and powerful but also expensive, VantageCloud Lake stores data in Amazon S3 using one of several open file formats, such as Parquet, JSON, CSV, and Databricks’ Delta Lake format (support for Apache Iceberg is in the works).

While VantageCloud Enterprise continues to be optimized for high-end production analytics workloads where performance and efficiency are paramount, the new VantageCloud Lake offering is geared more toward data scientists who want to quickly and easily spin up an exploratory analysis environment, then tear it down just as quickly, said Stephen Brobst, CTO of Teradata.

“Data lake users are more sophisticated, in the sense that they want to provision a data lab, they want to be able to work with data, do data mining, data science work,” Brobst said. “There’s a self-service capability where they can just provision their own resources on demand, and when they’re done, they just shut it down.”

Downtime will be basically non-existent with VantageCloud Lake, which Teradata also calls Cloud Lake Edition, or CLE. “In the past, if you wanted to add resources and so on, you had to restart. That’s all gone with CLE,” Brobst said.

But CLE is not a complete break from VantageCloud Enterprise, the cloud version of the classic Teradata system that was previously called Teradata Vantage. CLE retains Enterprise Edition’s workload management capabilities, as well as its technology hooks.

The workload management capability in CLE is key to avoiding the large bills that users of new cloud-based analytics offerings sometimes receive, according to Teradata. Brobst cited an analyst report finding that more than 80% of cloud analytics customers exceed their budget by more than 50% in their first 18 months. The culprits are the “newbie” players, he said, pointing to Snowflake as a top offender.

Stephen Brobst, CTO of Teradata

“Customers of those entry-level players, which are not sophisticated in workload management, basically give the vendor a blank check,” Brobst told Datanami. “It’s very, very painful. At Teradata, we’re much more efficient in using resources and managing within the customer’s budget.”

The ability to manage concurrency is critical at enterprise scale, Brobst said. When an analytics cloud is deficient in this category, it “uses elasticity as a crutch,” he said. “Beyond a certain concurrency, they start spinning up more and more clusters, which use resources very inefficiently,” he continued. “We don’t have that problem. That’s why our technology is more efficient by factors (not percentages, but factors) in terms of cost per query.”

In addition to sharing the workload management capabilities of its big brother, CLE can also take advantage of the proprietary Teradata file system when a workload stands to benefit from it.

“In Cloud Lake Edition, we’re heavily object store-centric,” Brobst said. “We still have access to the block store, and we do caching and things like that. But the software configuration [in Enterprise Edition] is very different for those quick in-and-out queries compared to the more exploratory work that happens in the Lake edition.”

CLE stores data in Amazon S3 by default, and Teradata has committed to supporting the Microsoft Azure and Google Cloud object stores with CLE in 2023. But object storage is not the only option. CLE also has an “Organized Data Format,” which is a variation of how the Teradata file system reads and writes data from block storage, in this case Amazon Elastic Block Store (EBS).

“I wouldn’t say it’s exactly the Teradata filesystem, because we’re optimizing it for the object store,” he said. “We have, I’ll call it, a variation of cloud-optimized native storage.”

Either way, users probably won’t notice what’s happening under the covers. And if users want to move workloads to VantageCloud Enterprise, that isn’t difficult.

“It’s important to note that the API consistency between Lake Edition and Enterprise is there,” Brobst said. “Any workload that I created as a data scientist in the Lake edition, I can move to the Enterprise edition without any friction. So that’s something very important for this unified architecture approach that we’re taking.”

The changes on the analytics front, with ClearScape Analytics, are almost as significant as the changes with CLE.

ClearScape Analytics is a new suite of in-database analytics and machine learning tools that can run in any Teradata environment, including both VantageCloud offerings and on-premises Vantage environments. The new offering includes existing Teradata functionality and introduces new functionality in two major areas: ModelOps and time series functions.

“Analytics is more than just scoring the model at the end,” Brobst said. “There’s a whole data preparation pipeline. Depending on who you talk to, 80% of the work is processing and transforming the data, and so on. So we have a whole bunch of capabilities in that area.”

When it comes to time series, ClearScape Analytics builds on support for the time series data type with new algorithms and other functions designed to process time series data.

Meanwhile, ClearScape’s MLOps capabilities are designed to help data scientists automate the machine learning lifecycle, including the capture, training, deployment, and monitoring of machine learning models in production. “This is something that used to be done with a lot of scripting and a lot of manual work,” Brobst said. “Now it’s completely automated.”

ClearScape is not designed to be a data science workbench. Instead, Teradata wants the product to be used in conjunction with a data science notebook like Jupyter or with tools from SAS, Dataiku, H2O.ai, RStudio, and others.

“It doesn’t replace those things,” Brobst said. “The data scientist uses whatever tool is best for them. But then we automate the process behind the scenes for model deployment, monitoring, and so on.”

ClearScape Analytics reflects Teradata’s plan to be more aggressive in adopting open source in the age of AI and to enable its customers to use open source predictive and prescriptive analytics with its data management platform.

“A lot of machine learning is absolutely open source,” Brobst said, citing TensorFlow, scikit-learn, and other R and Python libraries. “But we have capabilities in the database that are uniquely aligned with the original architecture that came out of Caltech, which is where the Teradata technology came from. And we can accelerate these libraries by an order of magnitude, or several orders of magnitude.”

Teradata won’t be open-sourcing its core technology anytime soon. But as the use of machine learning explodes and new deep learning techniques emerge, the company sees an opportunity not only to bring more unstructured data (the raw material of deep learning and AI) under its wing, but also to give data scientists better tools to get their AI creations to market.

“We are the data management platform. We’re still the best at it, and that’s where we come from,” Brobst said. “There is no AI without data. Data is the fuel. And so, as I said, it is not our intention from an R&D point of view to invest in new algorithms. But we will take the algorithms invented in the academic and commercial communities, with partners and so on, and we will engineer them to run in parallel, to make them run faster. We will absolutely support that.”
