Comments from VAST Data co-founders CEO Renen Hallak and CMO Jeff Denworth suggest the company will develop data infrastructure software that can help realize the vision of Thinking Machines, a long-defunct supercomputing company.
This goes far beyond simply storing bits of data in the clever way that VAST Universal Storage does.
In May last year, Hallak told a Protocol Enterprise writer that VAST wanted to run its own data science platform, with a trajectory he said would pit it against vendors such as Databricks.
“We think five years from now…this stack of infrastructure needs to be very different. It needs to enable AI supercomputers rather than the apps we had in the past… Vertical integration adds to the simplicity. But more than that, it lets you take full advantage of the underlying technology.”
Hallak hinted that VAST Data would seek to build most of the platform itself: “It wouldn’t be possible for us to just buy someone else…and tie it on top of our system. We always lean towards doing the critical parts ourselves. And if there are peripherals that aren’t as important…then maybe there would be room for acquisitions.
“There is a huge opportunity to compile different data services into a single product suite,” he added.
Denworth told Computer Weekly in November last year: “Over the next 20 years we will see a new class of applications. It won’t be about transactional data, it won’t be about digital transformation. What we will see is computers coming to humans; seeing, hearing and analyzing natural data.
“We are conscious that computing will not happen in a single data center, and that it will use both unstructured and structured data. We are also conscious that data has gravity, but so does compute at the high end.”
This will mean a new IT framework with “very ambitious products” to be announced by VAST.
In January, Denworth told The Next Platform, “Thinking Machines was a very bespoke supercomputing company that focused on building some really interesting systems over time. That’s ultimately what we’re going to aim for: a system that can finally think for itself.”
He added: “We realized that we could go far beyond the classic definitions of a file system, but the realization was that the architecture that has the most intimate understanding of the data can make the best decisions about what to do with that data. First, by figuring out what’s inside. Second, by moving the data to where the computation is, or the computation to where the data is, depending on the most optimized decision at a given time.”
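The data-versus-compute placement trade-off Hallak describes can be made concrete with a toy heuristic. This is purely an illustrative sketch, not a VAST API; the function name and parameters are our invention:

```python
# Illustrative sketch only - not a VAST API. It models the placement choice
# Hallak describes: move the data to the computation, or the computation to
# the data, whichever costs less to ship over the link.

def placement(data_bytes: int, code_bytes: int, link_bytes_per_sec: float) -> str:
    """Return which side of the transfer is cheaper to move."""
    move_data_secs = data_bytes / link_bytes_per_sec   # time to ship the data
    move_code_secs = code_bytes / link_bytes_per_sec   # time to ship the code
    return ("move compute to data" if move_data_secs > move_code_secs
            else "move data to compute")

# A terabyte of data versus a megabyte of code: ship the code to the data.
print(placement(data_bytes=10**12, code_bytes=10**6, link_bytes_per_sec=10**9))
```

A real scheduler would also weigh GPU availability, caching and concurrency, but the asymmetry is the point: whichever side is smaller is the one that moves.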
The Beeler Video Podcast
Denworth told Brian Beeler on a Storage Review video podcast: “The next 20 years could [see] something we call natural transformation, where computers start to adapt to people… Our realization is that if you rethink everything at the infrastructure level, there are gains to be made higher up the stack, and that is where we will take the world over the next two years.”
“Computers are definitely at a point where they can now do the sensory part of what humans could do before; they can see, they can hear, they probably can’t smell as much, but they understand natural information more and more, closer to how humans understand it. And I think the leap from there to having thinking machines may be big, maybe smaller. But once you get to a thinking machine, the game is over; you don’t need anything more than that.
“And so I think that’s warranted, that we’re putting all of our resources into building infrastructure that enables that next wave. And I think we’ll be surprised how far we can go in terms of what’s possible.
He talked about organizations working in different parts of the stack: “We have, obviously, hardware vendors working on GPUs; we have vendors like us working on that middle-of-the-infrastructure-sandwich part and software; we have application vendors working on life sciences, genomics, medical imaging; we have financial institutions taking advantage of all types of information coming into their systems. It’s really exciting.”
The arrival of data will stimulate activity: “I think things are reversed. Previously you had an application, and it read data, either from memory or storage, in order to manipulate it, and then it wrote out the result it arrived at. I think the more we see data-driven applications, the data itself, as it flows through the system, will trigger functions that need to be executed on it based on different characteristics of that information.
“And then you’ll have the recursion of more and more functions that need to be performed because of what we understand about that specific information when we compare it to the rest of the data that we already have stored, specifically as it relates to GPUs,” said Denworth.
“I think the fact that we call ourselves VAST Data is a big clue. We are trying to build this next generation of data infrastructure.
“People will see us expanding the storage space and getting closer and closer to realizing our customers’ true vision of universal storage, without having to think about where they put their data, how they access it, and what can be done with it.
“And at the same time, you’ll see more and more not-necessarily-storage products coming from us as well, based on the feedback we’re getting from customers.”
VAST “will basically work to help customers solve their whole problem of data processing and deep machine learning in a hybrid cloud world, in a way where we take the complexity of prioritization and things like that off the table as considerations… And that seems to be becoming more and more popular as people start to understand some of these natural language processing models, some of these new computer vision or computer audio models. And so it’s pretty exciting. We have a lot going on with Nvidia.”
Thinking Machines and Databricks
Thinking Machines was a supercomputing company established in 1983 to build highly parallel systems using the artificial intelligence technology of that era. The goal was to sift through masses of data much faster than serial computing and thus arrive at decisions in seconds or minutes instead of days or weeks.
The company overextended itself and collapsed in 1994, with parts being purchased by Sun Microsystems. Its architecture typically required a front-end server, back-end SPARC processors, and vector processors.
In February last year, Blocks & Files wrote: “Databricks enables rapid querying and SQL analysis of data lakes without having to first extract, transform and load the data into data warehouses. The company claims that its ‘Data Lakehouse’ technology offers nine times better price-performance than traditional data warehouses. Databricks supports both AWS and Azure clouds and is generally considered a competitor to Snowflake, which had a huge IPO in September 2020… Databricks’ Delta Lake open source software is built on Apache Spark.”
The VAST Future
VAST Data will build a layer of data infrastructure vertically integrated with its existing storage platform to form what would today be called an AI supercomputer. This layer will provide data lake capabilities and can initiate analytics processing itself; data as it flows through the system will trigger functions that need to be performed on it.
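The trigger model described above can be sketched as a minimal event-driven pipeline. All names here (`on_ingest`, `ingest`, the handler) are hypothetical, invented for illustration; nothing below is a published VAST interface:

```python
# Hypothetical sketch of the "data triggers functions" model - invented names,
# not a VAST interface. Arriving records are matched against registered
# predicates, and every matching handler runs on the new data.

from typing import Callable

Predicate = Callable[[dict], bool]
Handler = Callable[[dict], None]
_triggers: list[tuple[Predicate, Handler]] = []

def on_ingest(predicate: Predicate):
    """Register a handler that fires when arriving data matches the predicate."""
    def register(handler: Handler) -> Handler:
        _triggers.append((predicate, handler))
        return handler
    return register

def ingest(record: dict) -> list[str]:
    """Simulate data flowing into the system: run matching handlers, return their names."""
    fired = []
    for predicate, handler in _triggers:
        if predicate(record):
            handler(record)
            fired.append(handler.__name__)
    return fired

@on_ingest(lambda r: r.get("type") == "image")
def classify_image(record: dict) -> None:
    # stand-in for real work, e.g. running a vision model on the new data
    record["tags"] = ["image", "needs-classification"]

print(ingest({"type": "image", "name": "scan-001.png"}))  # ['classify_image']
print(ingest({"type": "text", "name": "notes.txt"}))      # []
```

Denworth's "recursion of more and more functions" would follow naturally: a handler's output is itself ingested, possibly matching further predicates.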
VAST CTO Sven Breuner has previously confirmed this, stating that VAST will bridge the customer’s separate VAST systems: “Now is the time to start scaling up by building in more layers around database-like functionality and around the seamless connection of geo-distributed data centers.”
We believe that VAST will use Apache open source software: Spark, as Databricks does; Druid, as Imply does; and Kafka, as Confluent does.
VAST is studying hearing, speech and vision applications and will use Nvidia hardware, such as Grace and Hopper chip systems. We are confident that penta-level cell (PLC) flash and the CXL bus will play a role in VAST’s storage and infrastructure roadmap.
It will showcase its IT infrastructure systems, both on-premises and in the public cloud, with the goal of helping customers solve all their data processing and deep learning challenges in a hybrid cloud world. We believe that VAST will not port its Universal Storage software to the public cloud. The CNode software could be ported easily, but the DNode hardware (storage-class memory front-end drives with NVMe QLC SSD back-end drives) could be difficult to replicate with appropriate storage instances in the public cloud.
B&F thinks it more likely that a VAST system will sit in a public cloud, available directly or indirectly to CSP customers.
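The CNode/DNode split can be caricatured in a few lines. This toy model only illustrates the disaggregated, shared-everything idea, in which stateless compute nodes all reach every storage enclosure over the fabric; the class and method names are ours, not VAST’s:

```python
# Toy model of a disaggregated, shared-everything layout - illustrative only,
# with invented class names. Stateless CNodes hold no data; every CNode can
# reach every DNode (NVMe enclosure) over the fabric, so any CNode can serve
# any byte that any other CNode wrote.

class DNode:
    """Storage enclosure: in the real design, SCM write buffers front QLC SSDs."""
    def __init__(self, ident: int):
        self.ident = ident
        self.blocks: dict[str, bytes] = {}

class CNode:
    """Stateless compute node: routes each key to a DNode shared with its peers."""
    def __init__(self, fabric: list[DNode]):
        self.fabric = fabric

    def _locate(self, key: str) -> DNode:
        return self.fabric[hash(key) % len(self.fabric)]  # deterministic within a run

    def write(self, key: str, data: bytes) -> None:
        self._locate(key).blocks[key] = data

    def read(self, key: str) -> bytes:
        return self._locate(key).blocks[key]

fabric = [DNode(i) for i in range(4)]    # shared NVMe enclosures
a, b = CNode(fabric), CNode(fabric)      # two stateless compute nodes
a.write("file.bin", b"payload")
assert b.read("file.bin") == b"payload"  # any CNode can serve any data
```

Because the compute side carries no state, porting CNode software to a cloud VM is plausible; reproducing the DNode hardware path with cloud storage instances is the harder half, which is the asymmetry noted above.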
Our understanding is that VAST will announce its 10-year roadmap at an event later this year.