VCSLAB 2020-21 Projects (Mohan Potheri)

This is a list of projects that are completed or in progress.

Area

Project

Description

Planned Deliverables

Location

Start Quarter

Status

Owner

Machine Learning

Distributed Machine Learning with VMware Bitfusion

Showcase Distributed ML with VMware Bitfusion

Blogs: https://blogs.vmware.com/apps/2020/03/scaling-distributed-machine-learning-leveraging-vsphere-bitfusion-and-nvidia-gpu-part-1-of-2.html

Video: https://youtu.be/ET2j_zP1_iM

SC2

2020 Q1

Completed

Mohan Potheri

InterraIT

Modern Databases

Reniac with FPGA Acceleration

Collaborative project with Reniac, Intel and VMware for FPGA-based acceleration

Blog: Accelerating Virtualized & Distributed Cassandra databases with FPGAs

SC2

2020 Q3

Completed

Mohan Potheri

 

Machine Learning

ML/AI Reference Architecture in collaboration with Dell

This Reference Architecture describes a VMware Cloud Foundation-based solution for Machine Learning environments with GPUs. The solution combines VMware virtualization and container orchestration with the latest hardware innovations to provide robust infrastructure for machine learning applications.

Blog: Part 1, Part 2, Part 3

Paper: Sharing GPUs in ML/AI Environments (A Reference Architecture with DTC)

SC2

2020 Q2

Completed

Mohan Potheri

Janet Morss (Dell)

High Performance Computing

HPC Reference Architecture update for Dell

High Performance Computing (HPC) workloads have traditionally been run on bare-metal, non-virtualized clusters. Virtualization was often seen as an additional layer that leads to performance degradation. Performance studies have shown that virtualization often has minimal impact on HPC application performance.

Paper: Virtualized HPC Reference Architecture

SC2

2020 Q2

Completed

Mohan Potheri

Janet Morss (Dell)

Machine Learning & Kubernetes

Integration between Bitfusion, TKG and PKS Enterprise

Modern application developers and data scientists need to leverage Kubernetes and to access GPUs for training and other tasks. Giving each of these developers dedicated GPUs can be cost-prohibitive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows how Bitfusion integrates with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor and TKGI.

Blog: Part 1, Part 2

Video: VMware TKG Integration with Bitfusion

SC2

2020 Q3

Completed

Mohan Potheri

Machine Learning & HPC

VMware Tanzu Kubernetes on FlashBlade® for ML/HPC Applications

Modern application developers and data scientists need to leverage Kubernetes and to access GPUs for training and other tasks. Giving each of these developers dedicated GPUs can be cost-prohibitive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows how Bitfusion integrates with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor and TKGI.

Blog: https://blog.purestorage.com/purely-technical/vmware-tanzu-kubernetes-on-flashblade/

SC2

2020 Q3

Completed

Mohan Potheri

Bikash Choudhury

Machine Learning

NVIDIA & VMware Keynote A100 compute

Collaborative project to showcase COVID-related computing and imaging for use in the VMworld keynote

Keynote video: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1598654132634001BSxt

SC2

2020 Q3

Completed

Mohan Potheri

Justin Murray 

Machine Learning

Data for Good (part of the VMware Tech for Good initiative)

Nonprofits and NGOs have accumulated a tremendous amount of data over the past decades. Their data has great potential to help solve many of the world's most pressing problems. However, these organizations often lack the compute and storage capacity, along with the technology, to analyze the data they have accumulated. vSphere is a strong hybrid platform for data analytics, with support for high-performance networking and accelerators such as GPUs. The VMware Solutions Lab has a robust hybrid cloud infrastructure for high-performance data analytics, which it will make available to nonprofits free of charge. Common data analytics problems (such as Kaggle competitions) will be validated to run well on the lab infrastructure. The project will leverage the VMware vMLP platform.

Blog: https://blogs.vmware.com/apps/2020/12/a-data-for-good-solution-empowered-by-vmware-cloud-foundation-with-tanzu-part-1-of-3.html

 

Video: In Progress

SC2

2020 Q4

Completed

Mohan Potheri

Andrii Nevarov

 

Data Science

Apache Spark 3 with NVIDIA GPU based analytics

Apache Spark is an open source project that has achieved wide popularity in the analytics space. It is used for well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is now a native option for the Spark resource manager. Packaging a Spark application as a container brings the usual benefits of containers: the dependencies are packaged along with the application as a single entity. Given the "embarrassingly parallel" nature of many data processing tasks, GPUs can be of tremendous benefit. This project will combine VMware Tanzu, Apache Spark and NVIDIA GPU capabilities to showcase accelerated Apache Spark 3.0.

Blogs: https://blogs.vmware.com/apps/2021/02/accelerated-apache-spark-3-leveraging-nvidia-gpus-on-vmware-cloud-part-1-of-2.html

Video: https://youtu.be/EK2YqcjKkgc

Paper: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmw-gpu-accelerated-spark.pdf

Talks: https://gtc21.event.nvidia.com/media/Apache Spark Acceleration over VMware’s Tanzu with NVIDIA’s GPU and Networking Solutions [S31710]/1_iaacamwx

SC2

2020 Q4

Completed

Mohan Potheri

Database as a Service

DBaaS with ScaleGrid

Showcase the end-to-end Database as a Service capabilities of partner ScaleGrid.

Blogs: In progress

Video: https://youtu.be/LxR6RPWWxpE

SC2

2021 Q1

Completed

Mohan Potheri

 

Multi-Cloud & Kubernetes

Deploy a multi-tiered Tanzu-based solution across clouds and manage it with Tanzu Mission Control and Tanzu Service Mesh.

Funded project for Tanzu Multi-Cloud with a distributed modern application

Paper: https://bit.ly/37EKDpP

Blogs:

Video Demo:

SC2

2021 Q1

In Progress

Mohan Potheri

 

Distributed Machine Learning

Accelerated Distributed AI/ML training with PVRDMA

Deploy Horovod with PVRDMA for distributed machine learning training

Blogs:

Video:

VMworld Session: TBD

SC2

2021 Q2

Completed

Mohan Potheri

PVRDMA Team

Transfer Learning across Multi-Cloud Environments

Training on-prem and inference in AWS

Collaborative project with the AWS team on AI/ML infrastructure across multiple clouds


SC2

2021 Q3

Completed

Mohan Potheri

Sahil Thapar (AWS)

Rakesh Ramdas (AWS)

Multi-Cloud Machine Learning with data from on-premises and training on the Google Cloud Vertex AI platform

Training with GCP Vertex AI using data from on-premises

Collaborative project between the Google Cloud team and VMware.

Part 1: 

Part 2: 

Video: 

SC2

2021 Q3

Completed

Mohan Potheri

Wade Holmes (Google)