VCSLAB 2020-21 Projects (Mohan Potheri)
This is a list of items that are completed or in progress.
Area | Project | Description | Planned Deliverables | Location | Start Quarter | Status | Owner |
Machine Learning | Distributed Machine Learning with VMware Bitfusion | Showcase Distributed ML with VMware Bitfusion | Video: https://youtu.be/ET2j_zP1_iM | SC2 | 2020 Q1 | Completed | InterraIT |
Modern Databases | Reniac with FPGA Acceleration | Collaborative project with Reniac, Intel and VMware for FPGA-based acceleration | Blog: Accelerating Virtualized & Distributed Cassandra databases with FPGAs | SC2 | 2020 Q3 | Completed |  |
Machine Learning | ML/AI Reference Architecture in collaboration with Dell | This Reference Architecture describes a VMware Cloud Foundation based solution for Machine Learning environments with GPUs. The solution combines VMware virtualization and container orchestration with the latest hardware innovations to provide robust infrastructure for machine learning applications. | Paper: Sharing GPUs in ML/AI environments. (A Reference Architecture with DTC) | SC2 | 2020 Q2 | Completed | Janet Morss (Dell) |
High Performance Computing | HPC Reference Architecture update for Dell | High Performance Computing (HPC) workloads have traditionally run on bare-metal, non-virtualized clusters, and virtualization was often seen as an additional layer that degrades performance. Performance studies have shown that virtualization often has minimal impact on HPC application performance. |  | SC2 | 2020 Q2 | Completed | Janet Morss (Dell) |
Machine Learning & Kubernetes | Integration between Bitfusion, TKG and PKS Enterprise | Modern application developers and data scientists need to use Kubernetes and to access GPUs for training and other tasks, but providing each developer with dedicated GPUs can be prohibitively expensive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows the integration of Bitfusion with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor and TKGI. |  | SC2 | 2020 Q3 | Completed |  |
Machine Learning & HPC | VMware Tanzu Kubernetes on FlashBlade® for ML/HPC Applications | Modern application developers and data scientists need to use Kubernetes and to access GPUs for training and other tasks, but providing each developer with dedicated GPUs can be prohibitively expensive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows the integration of Bitfusion with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor and TKGI. | Blog: https://blog.purestorage.com/purely-technical/vmware-tanzu-kubernetes-on-flashblade/ | SC2 | 2020 Q3 | Completed | Bikash Choudhury |
Machine Learning | NVIDIA & VMware Keynote A100 compute | Collaborative project to showcase COVID-19-related computing and imaging for use in the VMworld keynote | Keynote video: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1598654132634001BSxt | SC2 | 2020 Q3 | Completed |  |
Machine Learning | Data for Good (part of the VMware Tech for Good initiative) | Nonprofits and NGOs have accumulated a tremendous amount of data over the past decades, and this data has great potential to help solve many of the world's most pressing problems. However, these organizations lack the compute and storage capacity, along with the technology, to analyze the data they have accumulated. vSphere is a strong hybrid platform for data analytics thanks to its support for high-performance networking and accelerators such as GPUs. The VMware Solutions Lab has a robust hybrid cloud infrastructure for high-performance data analytics, which it will make available to nonprofits free of charge. Common data analytics problems (such as Kaggle competitions) will be validated to run well on the lab infrastructure. The project will leverage the VMware vMLP platform. | Video: In progress | SC2 | 2020 Q4 | Completed |  |
Data Science | Apache Spark 3 with NVIDIA GPU based analytics | Apache Spark is an open source project that has achieved wide popularity in the analytics space. It is widely used for big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is now a native option for the Spark resource manager. Packaging a Spark application as a container bundles its dependencies with the application as a single entity, so you gain the usual benefits of containers. Given the "embarrassingly parallel" nature of many data processing tasks, GPUs can be of tremendous benefit. This project combines VMware Tanzu, Apache Spark and NVIDIA GPU capabilities to showcase accelerated Apache Spark 3.0. | Video: Accelerated Apache Spark 3 on VMware Tanzu for XGBoost Accelerated Transactions processing leveraging Virtualized Apache Spark and NVIDIA GPUs | SC2 | 2020 Q4 | Completed |  |
Database as a Service | DBaaS with ScaleGrid | Showcase ScaleGrid's end-to-end Database as a Service (DBaaS) capabilities as a partner solution. | Blogs: In progress; Video: https://youtu.be/LxR6RPWWxpE | SC2 | 2021 Q1 | Completed |  |
Multi-Cloud & Kubernetes | Deploy a multi-tiered Tanzu-based solution across clouds and manage it with Tanzu Mission Control and Tanzu Service Mesh | Funded project for Tanzu Multi-Cloud with a distributed modern application | Paper: https://bit.ly/37EKDpP; Blogs: https://blogs.vmware.com/apps/2021/06/multicloud-k8-part1.html; Video Demo: https://youtu.be/j3txpQaEgnk | SC2 | 2021 Q1 | In Progress |  |
Distributed Machine Learning | Accelerated Distributed AI/ML training with PVRDMA | Deploy Horovod with PVRDMA for distributed machine learning training | Blogs: https://blogs.vmware.com/apps/2021/09/horovod-part-1-of-2.html, https://blogs.vmware.com/apps/2021/09/horovod-part-2-of-2.html; Video: https://youtu.be/Jw0XxkCI_jc; VMworld Session: TBD | SC2 | 2021 Q2 | Completed | PVRDMA Team |
Transfer Learning across Multi-Cloud Environments | Training on-prem and inference in AWS | Collaborative project with the AWS team on AI/ML infrastructure across multiple clouds | End-to-end machine learning with training on-premises and inference in AWS using transfer learning: https://blogs.vmware.com/apps/2021/10/transfer-learning-aws-part2.html | SC2 | 2021 Q3 | Completed | Sahil Thapar (AWS), Rakesh Ramdas (AWS) |
Multi-Cloud Machine Learning with data from on-premises and training with Google Cloud Vertex platform | Training with GCP Vertex and data from on-premises | Collaborative project between the Google Cloud team and VMware. | Part 1: https://bit.ly/2ZUwDIt; Part 2: https://bit.ly/3GLfz8x; Video: Multi-Cloud AI/ML with data from on-premises and training with Google Cloud Vertex platform | SC2 | 2021 Q3 | Completed | Mohan Potheri, Wade Holmes (Google) |