VSLAB Projects (2020-2021)

This is a list of projects that are either completed or in progress.

Area

Project

Description

Planned Deliverables

Location

Start Quarter

Status

Owner

Oracle

Oracle Experts event DOAG Germany

Training was conducted for Oracle experts at DOAG. VSLAB was used for demos and hands-on exercises.

 Labs and Workshops

SC2

2020 Q1

Completed

Sudhir Balasubramanian

Oracle

VMware and Rocky Mountain Oracle User Group (RMOUG) - Training Days, Feb 18th, 2020

VMware and Rocky Mountain Oracle User Group (RMOUG) - Training Days, Feb 18th, 2020

Sessions / Workshop

SC2

2020 Q1

Completed

Sudhir Balasubramanian

Oracle

VMware and Oracle together - A brave World – Oracle Cloud Days, 2020

VMware and Oracle together - A brave World – Oracle Cloud Days, 2020

Sessions / Workshop

SC2

2020 Q1

Completed

Sudhir Balasubramanian

 

PCIe Device Placement and NUMA

Device locality and NUMA placement on vSphere

Blogs: https://frankdenneman.nl/2020/01/10/pcie-device-numa-node-locality/

SC2

2020 Q1

Completed

Frank Denneman

 

Pure Service Orchestrator with PKS

Leverage PSO for seamless Kubernetes storage provisioning and management

Blogs: https://blog.purestorage.com/jupyter-flash-kubernetes/

SC2

2020 Q1

In Progress

Pure Storage

 

Distributed Machine Learning with VMware Bitfusion

Showcase Distributed ML with VMware Bitfusion

Blogs: https://blogs.vmware.com/apps/2020/03/scaling-distributed-machine-learning-leveraging-vsphere-bitfusion-and-nvidia-gpu-part-1-of-2.html

Video: https://youtu.be/ET2j_zP1_iM

Paper: TBD

SC2

2020 Q1

Completed

Mohan Potheri

InterraIT

 

Confluent with VMware vSphere

Validate Confluent solution with Apache Kafka on vSphere

Blogs: TBD

Video: TBD

SC2

2020 Q1

In Progress

Justin Murray 

 

Dotscience with VMware vSphere

Validate the Dotscience solution for end-to-end ML/AI workloads

Blogs: Kafka on vSphere with Kubernetes leveraging Confluent Operator

SC2

2020 Q1

Completed

Justin Murray 

 

Research with VMware Research Group

VMware Research is working on a research project on multi-application scheduling for ML and is looking for a GPU cluster.

Blogs: None

SC2

2020 Q1

Completed

Sangeetha Jyothi

 

Reniac with FPGA Acceleration

Collaborative project with Reniac, Intel and VMware for FPGA-based acceleration

Blog: Accelerating Virtualized & Distributed Cassandra databases with FPGAs

SC2

2020 Q3

Completed

Mohan Potheri

 

 

ML/AI Reference Architecture in collaboration with Dell

This Reference Architecture describes a VMware Cloud Foundation-based solution for machine learning environments with GPUs. The solution combines VMware virtualization and container orchestration with the latest hardware innovations to provide robust infrastructure for machine learning applications.

Blog: Part 1, Part 2, Part 3

Paper: Sharing GPUs in ML/AI environments. (A Reference Architecture with DTC)

SC2

2020 Q2

Completed

Mohan Potheri

Janet Morss (Dell)

 

Dynamic infrastructure for HPC on demand

Project Multiverse originates from the OCTO HPC team and implements a virtualized HPC framework that creates and destroys VMs dynamically based on user jobs. In other words, it follows a VM-per-job model that spawns an individual VM on demand for every incoming job in an HPC cluster.

Blog: Project Multiverse

Paper: IEEE Proceedings Paper

SC2

2020 Q2

Completed

Michael Cui



 

HPC Reference Architecture update for Dell

High Performance Computing (HPC) workloads have traditionally been run on bare-metal, non-virtualized clusters. Virtualization was often seen as an additional layer that degrades performance. Performance studies have shown, however, that virtualization often has minimal impact on HPC application performance.

Paper: Virtualized HPC Reference Architecture

SC2

2020 Q2

Completed

Mohan Potheri

Janet Morss (Dell)

 

TKG with UberCloud HPC Containers

Deploy and validate the UberCloud Container platform solution on VMware TKG Plus and TKGI.

Blog: TBD

Paper: TBD

SC2

2020 Q3

In Progress

Mohan Potheri

Daniel Gruber (UberCloud)

 

 

Integration between Bitfusion, TKG and PKS Enterprise

Modern application developers and data scientists need to use Kubernetes and to access GPUs for training and other tasks. Providing each of these developers with dedicated GPUs can be cost-prohibitive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows how Bitfusion integrates with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor, and TKGI.

Blog: Part 1, Part 2

Video: VMware TKG Integration with Bitfusion

SC2

2020 Q3

Completed

Mohan Potheri

 

VMware Tanzu Kubernetes on FlashBlade® for ML/HPC Applications

There is a need for modern application developers and data scientists to leverage Kubernetes and to access GPUs for training and other tasks. Providing each of these developers with dedicated GPUs can be cost-prohibitive. vSphere Bitfusion provides network access to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows the integration of Bitfusion with three types of Tanzu Kubernetes clusters: TKG Guest, TKG Supervisor, and TKGI.

Blog: https://blog.purestorage.com/purely-technical/vmware-tanzu-kubernetes-on-flashblade/

SC2

2020 Q3

Completed

Mohan Potheri

Bikash Choudhury

 

NSX Micro-segmentation and its effect on HPC workloads

NSX micro-segmentation can be used to provide multi-tenancy for HPC workloads. This study measures its impact, if any, on network latency.

Blog: https://blogs.vmware.com/apps/2020/04/highlights-of-new-features-and-improvements-in-vsphere-7-vsan-7-nsx-t-3-for-high-performance-computing-and-machine-learning.html

Paper: TBD

SC2

2020 Q3

Completed

Michael Cui

Na Zhang

 

NVIDIA & VMware Keynote A100 compute

Collaborative project to showcase COVID-related computing and imaging for the VMworld keynote

Keynote video: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1598654132634001BSxt

SC2

2020 Q3

Completed

Mohan Potheri

Justin Murray 

 

Multi-Instance GPU (MIG) with vSphere 7

Collaborative project to showcase MIG capabilities with the NVIDIA A100 GPU

https://blogs.vmware.com/vsphere/2020/09/vmware-vsphere-7-u1-with-nvidia-multi-instance-gpus-mig-for-machine-learning-applications.html

https://blogs.vmware.com/apps/2020/09/vsphere-7-0-u1-with-multi-instance-gpus-mig-on-the-nvidia-a100-for-machine-learning-applications-part-2-profiles-and-setup.html

SC2

2020 Q3

Completed

Justin Murray 

 

NVIDIA & VMware Keynote Smart NICs

Collaborative project to showcase smart NICs from Mellanox with GPUs on vSphere

Blog: https://blogs.vmware.com/vsphere/2020/09/announcing-project-monterey-redefining-hybrid-cloud-architecture.html

SC2

2020 Q3

Completed

Niels Hagoort

 

Bitfusion Use Cases

Showcase of multiple use cases and features

https://blogs.vmware.com/vsphere/2020/11/bitfusion-2-5-0-release-gives-you-the-features-you-want.html

https://blogs.vmware.com/vsphere/2020/09/bitfusion-client-service-lightweight-gpuaas.html

https://blogs.vmware.com/vsphere/2020/08/bitfusion-jupyter-integration-its-full-of-stars.html

https://blogs.vmware.com/vsphere/2020/06/ai-ml-vsphere-bitfusion-and-docker-containers-a-sparkling-refreshment-for-modern-apps.html

SC2

2020 Q2, Q3 & Q4

Completed

James Brogan

 

Virtualizing Parallel Works

Parallel Works proof of concept (POC) on vSphere to validate its capabilities

No publications

SC2

2020

Completed

Mohan Potheri

Parallel Works

 

Virtualizing Algorithmia

Validation of Algorithmia ML platform on vSphere

VMLive Video: https://www.youtube.com/watch?v=PKvkWe8pzPo

SC2

2020 Q3

Completed

Algorithmia

Justin Murray 

 

Greenplum integration with vSphere

VMware Tanzu Greenplum is a massively parallel processing (MPP) data platform, based on the open-source Greenplum Database project, designed to run the full gamut of analytical workloads, from business intelligence (BI) to artificial intelligence (AI). Enterprise data lives and grows throughout an organization, and it is suboptimal to copy large data sets between systems simply because a given system cannot perform fast enough, scale far enough, or offer the right features. Tanzu Greenplum brings compute to where the data lives.

VMworld 2020 Sessions

https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1586467547979001ehEa

https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1589580297282001SUMh

Blogs: https://tanzu.vmware.com/content/blog/analytic-workloads-bi-ai-vmware-tanzu-greenplum

SC2

2020 Q3

Completed

Domino Valdano

Frank McQuillan

Oracle

On Demand Scaling up / Scaling down Storage resources for Oracle production workloads – Hot Add and Hot Remove non-clustered Disks

On Demand Scaling up / Scaling down Storage resources for Oracle production workloads – Hot Add and Hot Remove non-clustered Disks

Blog: https://blogs.vmware.com/apps/2020/08/hot-add-hot-remove-disks.html

SC2

2020 Q3

Completed

Sudhir Balasubramanian

Oracle

On Demand hot extend non-clustered Oracle Disks online without downtime – Hot Extend Disks

On Demand hot extend non-clustered Oracle Disks online without downtime – Hot Extend Disks

Blog: https://blogs.vmware.com/apps/2020/08/on-demand-hot-extend-oracle-disks-online-without-downtime-hot-extend-disks.html

SC2

2020 Q3

Completed

Sudhir Balasubramanian

Oracle

To NUMA or not to NUMA – Oracle workloads and NUMA

To NUMA or not to NUMA – Oracle workloads and NUMA

Blog

SC2

2020 Q3

Completed

Sudhir Balasubramanian

Oracle

PVSCSI Controllers and Queue Depth – Accelerating performance for Oracle Workloads

PVSCSI Controllers and Queue Depth – Accelerating performance for Oracle Workloads

Blog

SC2

2020 Q3

Completed

Sudhir Balasubramanian

Oracle

PVSCSI Controllers and Queue Depth – ASM SAME and Oracle Workloads

PVSCSI Controllers and Queue Depth – ASM SAME and Oracle Workloads

Blog

SC2

2020 Q3

Completed

Sudhir Balasubramanian

 

Data for Good (part of the VMware Tech for Good initiative)

Nonprofits and NGOs have accumulated a tremendous amount of data over the past decades, and this data has great potential to help solve many of the world's most pressing problems. However, these organizations lack the compute and storage capacity, along with the technology, to analyze the data they have accumulated. vSphere is well suited as a hybrid platform for data analytics because of its support for high-performance networking and accelerators such as GPUs. VMware Solutions Lab has a robust hybrid cloud infrastructure for high-performance data analytics, which it will make available to nonprofits free of charge. Common data analytics problems (such as Kaggle competitions) will be validated to run well on the lab infrastructure. The VMware vMLP platform will be used for this project.

Blog:

 

Video: In Progress

SC2

2020 Q4

Completed

Mohan Potheri

Andrii Nevarov

 

 

Apache Spark 3 with NVIDIA GPU based analytics

Apache Spark is an open-source project that has achieved wide popularity in the analytics space. It is used for well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL. Kubernetes is now a native option for the Spark resource manager. Packaging a Spark application as a container brings the usual benefits of containers, because the application and its dependencies are bundled together as a single entity. Given the "embarrassingly parallel" nature of many data processing tasks, GPUs can be of tremendous benefit. This project will combine VMware Tanzu, Apache Spark and NVIDIA GPU capabilities to showcase accelerated Apache Spark 3.0.

Blogs:

Video:

Paper: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmw-gpu-accelerated-spark.pdf

Talks: https://gtc21.event.nvidia.com/media/Apache Spark Acceleration over VMware’s Tanzu with NVIDIA’s GPU and Networking Solutions [S31710]/1_iaacamwx

SC2

2020 Q4

Completed

Mohan Potheri

 

Nimbix on VMware Cloud

Running HPC apps on Nimbix with multi-cloud capabilities

Video Demo: https://youtu.be/Yam83dK3ypg

SC2

2020 Q4

Completed

Michael Cui

Na Zhang

 

SQL Server memory sizing

Monitoring and Rightsizing Memory Resource for virtualized SQL Server Workloads

Blogs:

SC2

2020 Q4

Completed

Oleg Ulyanov

 

Oracle

Backing up Oracle Workloads (RAC & Non-RAC) with VMware Snapshot Technology

Backing up Oracle Workloads (RAC & Non-RAC) with VMware Snapshot Technology

Blog

SC2

2021 Q1

Completed

Sudhir Balasubramanian

Oracle

No downtime Storage vMotion of Oracle RAC Cluster using shared vmdk’s with multi-writer attribute from one vSAN to another vSAN Cluster using VMware HCI Mesh

No downtime Storage vMotion of Oracle RAC Cluster using shared vmdk’s with multi-writer attribute from one vSAN to another vSAN Cluster using VMware HCI Mesh

Blog

SC2

2021 Q1

Completed

Sudhir Balasubramanian

 

DBaaS with ScaleGrid

Showcase the partner capabilities of ScaleGrid for end-to-end Database as a Service (DBaaS).

Blogs: In progress

Video: https://youtu.be/LxR6RPWWxpE

SC2

2021 Q1

Completed

Mohan Potheri

 

 

Deploy a multi-tiered, Tanzu-based solution across clouds and manage it with Tanzu Mission Control and Tanzu Service Mesh.

Funded project for Tanzu Multi-Cloud with a distributed modern application

Paper: In progress

Blogs: In progress

Video Demo: In progress

SC2

2021 Q1

In Progress

Mohan Potheri

 

 

Bitfusion 3.0 Solution and use cases

Use cases for the Bitfusion 3.0 launch

Blogs:

 

SC2

2021 Q1

Completed

Jim Brogan

 

 

Assignable HW in vSphere 7

Demonstration of assignable HW with vSphere 7

Video:

SC2

2021 Q1

Completed

Niels Hagoort

 

Workloads in VMware Cloud

Consistent Workload Performance for Enterprise Apps in VMware Multi-Cloud

Blogs:

SC2

2021 Q1

Completed

Oleg Ulyanov

 

NVIDIA GPU features in vSphere 7 Update 2

Multiple Machine Learning Workloads Using NVIDIA GPUs: New Features in vSphere 7 Update 2

Blogs:

SC2

2021 Q1

Completed

Justin Murray

 

 

Workloads in VMware Cloud

This solution showcases a multi-cloud deployment of a distributed application using Tanzu Kubernetes Grid (TKG). The multi-cloud TKG solution is distributed across two cloud environments: a VMC on AWS SDDC in Oregon and a VMC on Dell EMC SDDC in Santa Clara. Tanzu Mission Control and Tanzu Service Mesh are used to operationalize, secure, and manage the environment.

In part 1 of the blog series, we look at the challenges organizations face when using Kubernetes across multi-cloud environments, along with some of the components of the VMware Tanzu portfolio.

In part 2 of the blog series, we look at the components of the multi-cloud solution and their deployment.

In part 3 of the blog series, we look at the deployment of the application and how the solution works.

Video of the solution:

SC2

2021 Q2

Completed

Mohan Potheri

 

Accelerated Distributed AI/ML training with PVRDMA

Deploy Horovod with PVRDMA for distributed machine learning training

Paper: In progress

VMworld Session: TBD

SC2

2021 Q2

In Progress

Mohan Potheri

PVRDMA Team

 

Training on-prem and inference in AWS

Collaborative project with the AWS team on AI/ML infrastructure across multiple clouds

Blogs: TBD

VMworld Session: TBD

SC2

2021 Q2

In Progress

Mohan Potheri

Sahil Thapar (AWS)

Oracle

Virtualizing Oracle Workloads with VMware vSphere Virtual Volumes on VMware Hybrid Cloud

Oracle on VMware vVols using Pure Storage

Paper Completed

SC2

2021 Q1

Completed

Sudhir Balasubramanian, Jason Massae

Oracle

Oracle VMware Hybrid Cloud Business Continuity and Disaster Recovery Guide

Oracle BC & DR using Pure Storage

Paper in progress

SC2

2021 Q2

In Progress

Sudhir Balasubramanian, Cato Grace, Jason Massae

Oracle

Deploying Oracle Workloads on vSAN HCI Mesh Compute Cluster – Disaggregating Compute and Storage

Oracle on vSAN HCI Mesh

Blog completed

SC2

2021 Q2

Completed

Sudhir Balasubramanian

Oracle

On Demand hot extend clustered vmdk's online without downtime – Hot Extend RAC clustered disks

Oracle RAC blog

Blog completed

SC2

2021 Q2

Completed

Sudhir Balasubramanian

Oracle

Reclaiming dead space from Oracle databases on VMware Hybrid Platform

Oracle Blog

Blog completed

SC2

2021 Q2

Completed

Sudhir Balasubramanian