VSLAB Projects (2020-2021)
This is a list of items that are completed or in progress.
Area | Project | Description | Planned Deliverables | Location | Start Quarter | Status | Owner |
Oracle | Oracle Experts event DOAG Germany | Training was conducted for Oracle Experts in DOAG. VSLAB was used for demo and hands on exercises. | Labs and Workshops | SC2 | 2020 Q1 | Completed | |
Oracle | VMware and Rocky Mountain Oracle User Group (RMOUG) - Training Days, Feb 18th, 2020 | VMware and Rocky Mountain Oracle User Group (RMOUG) - Training Days, Feb 18th, 2020 | Sessions / Workshop | SC2 | 2020 Q1 | Completed | |
Oracle | VMware and Oracle together - A brave World – Oracle Cloud Days, 2020 | VMware and Oracle together - A brave World – Oracle Cloud Days, 2020 | Sessions / Workshop | SC2 | 2020 Q1 | Completed | |
| PCIE Device placement and NUMA | Device locality and NUMA placement on vSphere | Blogs: https://frankdenneman.nl/2020/01/10/pcie-device-numa-node-locality/ | SC2 | 2020 Q1 | Completed | Frank Denneman |
| Pure Storage Orchestrator with PKS | Leverage PSO for seamless Kubernetes storage provisioning and management | Blogs: https://blog.purestorage.com/jupyter-flash-kubernetes/ | SC2 | 2020 Q1 | In Progress | Pure Storage |
| Distributed Machine Learning with VMware Bitfusion | Showcase Distributed ML with VMware Bitfusion | Video: https://youtu.be/ET2j_zP1_iM Paper: TBD | SC2 | 2020 Q1 | Completed | InterraIT |
| Confluent with VMware vSphere | Validate Confluent solution with Apache Kafka on vSphere | Blogs: TBD Video: TBD | SC2 | 2020 Q1 | In Progress | |
| Dotscience with VMware vSphere | Validate Dotscience solution for ML/AI end to end workloads | Blogs: Kafka on vSphere with Kubernetes leveraging Confluent Operator | SC2 | 2020 Q1 | Completed | |
| Research with VMware Research Group | VMware Research is working on a research project on multi-application scheduling for ML and looking for a GPU cluster | Blogs: None | SC2 | 2020 Q1 | Completed | Sangeetha Jyothi |
| Reniac with FPGA Acceleration | Collaborative project with Reniac, Intel and VMware for FPGA based acceleration | Blog: Accelerating Virtualized & Distributed Cassandra databases with FPGAs | SC2 | 2020 Q3 | Completed | |
| ML/AI Reference Architecture in collaboration with Dell | This Reference Architecture describes a VMware Cloud Foundation based solution for Machine Learning environments with GPUs. The solution combines VMware virtualization and container orchestration with the latest hardware innovations to provide robust infrastructure for machine learning applications. | Paper: Sharing GPUs in ML/AI environments. (A Reference Architecture with DTC) | SC2 | 2020 Q2 | Completed | Janet Morss (Dell) |
| Dynamic infrastructure for HPC on demand | Project Multiverse originates from the OCTO HPC team and implements a virtualized HPC framework for dynamic VM creation/destruction based on user jobs. In other words, it is a VM per job model which spawns individual VMs on demand for every incoming job in an HPC Cluster. | Blog: Project Multiverse Paper: IEEE Proceedings Paper | SC2 | 2020 Q2 | Completed | |
| HPC Reference Architecture update for Dell | High Performance Computing (HPC) workloads have traditionally been run on bare-metal, non-virtualized clusters. Virtualization was often seen as an additional layer that leads to performance degradation. Performance studies have shown that virtualization often has minimal impact on HPC application performance. | | SC2 | 2020 Q2 | Completed | Janet Morss (Dell) |
| TKG with UberCloud HPC Containers | Deploy and validate the UberCloud Container platform solution on VMware TKG Plus and TKGI. | Blog: TBD Paper: TBD | SC2 | 2020 Q3 | In Progress | Daniel Gruber (UberCloud) |
| Integration between Bitfusion, TKG and PKS Enterprise | There is a need for modern application developers and data scientists to leverage Kubernetes and be able to access GPUs for their training and other needs. It can be very cost prohibitive to provide these developers individually with GPUs. vSphere Bitfusion provides access over the network to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows the integration between three different types of Tanzu Kubernetes with Bitfusion that includes TKG Guest, TKG Supervisor & TKGI clusters. | | SC2 | 2020 Q3 | Completed | |
| VMware Tanzu Kubernetes on FlashBlade® for ML/HPC Applications | There is a need for modern application developers and data scientists to leverage Kubernetes and be able to access GPUs for their training and other needs. It can be very cost prohibitive to provide these developers individually with GPUs. vSphere Bitfusion provides access over the network to GPUs that are aggregated into a dedicated vSphere cluster or resource pool. The solution shows the integration between three different types of Tanzu Kubernetes with Bitfusion that includes TKG Guest, TKG Supervisor & TKGI clusters. | Blog: https://blog.purestorage.com/purely-technical/vmware-tanzu-kubernetes-on-flashblade/ | SC2 | 2020 Q3 | Completed | Bikash Choudhury |
| NSX Micro-segmentation and its effect on HPC workloads | NSX Micro-segmentation can be leveraged for multi-tenancy for HPC workloads. This study measures its impact on network latency. | Paper: TBD | SC2 | 2020 Q3 | Completed | Michael Cui, Na Zhang |
| NVIDIA & VMware Keynote A100 compute | Collaborative project to showcase COVID-related computing and imaging for use in the VMworld Keynote | Keynote video: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1598654132634001BSxt | SC2 | 2020 Q3 | Completed | |
| Multi Instance GPU with vSphere 7 | Collaborative project to showcase MIG capabilities with NVIDIA A100 GPU | | SC2 | 2020 Q3 | Completed | |
| NVIDIA & VMware Keynote Smart NICs | Collaborative project to showcase smart NICs from Mellanox with GPUs on vSphere | | SC2 | 2020 Q3 | Completed | Niels Hagoort |
| Bitfusion Use Cases | Multiple use cases and feature showcase | https://blogs.vmware.com/vsphere/2020/09/bitfusion-client-service-lightweight-gpuaas.html https://blogs.vmware.com/vsphere/2020/08/bitfusion-jupyter-integration-its-full-of-stars.html | SC2 | 2020 Q2, Q3 & Q4 | Completed | James Brogan |
| Virtualizing Parallel Works | Parallel Works POC on vSphere to prove out capabilities | No Publications | SC2 | 2020 | Completed | Parallel Works |
| Virtualizing Algorithmia | Validation of Algorithmia ML platform on vSphere | VMLive Video: MLOPS on VMware Cloud with Algorithmia | SC2 | 2020 Q3 | Completed | Algorithmia |
| Greenplum integration with vSphere | VMware Tanzu Greenplum is a massively parallel processing (MPP) data platform, based on the open-source Greenplum Database project, designed to run the full gamut of analytical workloads from business intelligence (BI) to artificial intelligence (AI). Enterprise data lives and grows throughout an organization, and it is suboptimal to copy large data sets between different systems because they aren’t able to perform fast enough, scale high enough or offer the right features. Tanzu Greenplum brings compute to where the data lives. | VMworld 2020 Sessions: https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1586467547979001ehEa https://www.vmworld.com/en/video-library/video-landing.html?sessionid=1589580297282001SUMh Blogs: https://tanzu.vmware.com/content/blog/analytic-workloads-bi-ai-vmware-tanzu-greenplum | SC2 | 2020 Q3 | Completed | Domino Valdano, Frank McQuillan |
Oracle | On Demand Scaling up / Scaling down Storage resources for Oracle production workloads – Hot Add and Hot Remove non-clustered Disks | On Demand Scaling up / Scaling down Storage resources for Oracle production workloads – Hot Add and Hot Remove non-clustered Disks | Blog https://blogs.vmware.com/apps/2020/08/hot-add-hot-remove-disks.html | SC2 | 2020 Q3 | Completed | Sudhir Balasubramanian |
Oracle | On Demand hot extend non-clustered Oracle Disks online without downtime – Hot Extend Disks | On Demand hot extend non-clustered Oracle Disks online without downtime – Hot Extend Disks | | SC2 | 2020 Q3 | Completed | Sudhir Balasubramanian |
Oracle | To NUMA or not to NUMA – Oracle workloads and NUMA | To NUMA or not to NUMA – Oracle workloads and NUMA | Blog https://blogs.vmware.com/apps/2020/08/to-numa-or-not-to-numa-oracle-workloads-and-numa.html | SC2 | 2020 Q3 | Completed | Sudhir Balasubramanian |
Oracle | PVSCSI Controllers and Queue Depth – Accelerating performance for Oracle Workloads | PVSCSI Controllers and Queue Depth – Accelerating performance for Oracle Workloads | Blog https://blogs.vmware.com/apps/2020/09/pvscsi_queue_oracle_asm_disk.html | SC2 | 2020 Q3 | Completed | Sudhir Balasubramanian |
Oracle | PVSCSI Controllers and Queue Depth – ASM SAME and Oracle Workloads | PVSCSI Controllers and Queue Depth – ASM SAME and Oracle Workloads | Blog https://blogs.vmware.com/apps/2020/09/pvscsi-asm-same-oracle-workloads.html | SC2 | 2020 Q3 | Completed | Sudhir Balasubramanian |
| Data for Good (Part of the VMware Tech for Good Initiative.) | Nonprofits and NGOs have accumulated a tremendous amount of data over the past decades. Their data has great potential to solve many of the world’s most pressing problems. These organizations lack the compute and storage capacity, along with the technology, to analyze the data that they have accumulated. vSphere is a great hybrid platform for data analytics with its support for high performance networking and accelerators like GPUs. VMware Solutions Lab has a robust hybrid cloud infrastructure for high performance data analytics, which it will make available and free to use for nonprofits. Common data analytics problems (Kaggle Competitions) will be validated to run well in the lab infrastructure. The VMware vMLP platform will be leveraged for this project. | Video: In Progress | SC2 | 2020 Q4 | Completed | |
| Apache Spark 3 with NVIDIA GPU based analytics | Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used for well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is now a native option for the Spark resource manager. By packaging a Spark application as a container, you reap the benefits of containers because your dependencies are packaged along with your application as a single entity. Given the “embarrassingly parallel” nature of many data processing tasks, GPUs can be of tremendous benefit. This project will combine VMware Tanzu, Apache Spark and NVIDIA GPU capabilities to showcase accelerated Apache Spark 3.0. | Video: Accelerated Apache Spark 3 on VMware Tanzu for XGBOOST Accelerated Transactions processing leveraging Virtualized Apache Spark and NVIDIA GPUs | SC2 | 2020 Q4 | Completed | |
| Nimbix on VMware Cloud | Running HPC apps on Nimbix with multi-cloud capabilities | Video Demo: https://youtu.be/Yam83dK3ypg | SC2 | 2020 Q4 | Completed | |
| SQL Server memory sizing | Monitoring and Rightsizing Memory Resource for virtualized SQL Server Workloads | | SC2 | 2020 Q4 | Completed | Oleg Ulyanov |
Oracle | Backing up Oracle Workloads (RAC & Non-RAC) with VMware Snapshot Technology | Backing up Oracle Workloads (RAC & Non-RAC) with VMware Snapshot Technology | | SC2 | 2021 Q1 | Completed | Sudhir Balasubramanian |
Oracle | No downtime Storage vMotion of Oracle RAC Cluster using shared vmdk’s with multi-writer attribute from one vSAN to another vSAN Cluster using VMware HCI Mesh | No downtime Storage vMotion of Oracle RAC Cluster using shared vmdk’s with multi-writer attribute from one vSAN to another vSAN Cluster using VMware HCI Mesh | Blog https://blogs.vmware.com/apps/2021/02/no-dt-svmotion-oracle-rac-vmware-hci-mesh.html | SC2 | 2021 Q1 | Completed | Sudhir Balasubramanian |
| DBaaS with ScaleGrid | Showcase ScaleGrid partner capabilities for end-to-end Database as a Service. | Blogs: In progress Video: https://youtu.be/LxR6RPWWxpE | SC2 | 2021 Q1 | Completed | |
| Deploy a multi-tiered Tanzu-based solution across clouds and manage it with Tanzu Mission Control and Tanzu Service Mesh | Funded project for Tanzu Multi-Cloud with a distributed modern application | Paper: In progress Blogs: In progress Video Demo: In progress | SC2 | 2021 Q1 | In Progress | |
| Bitfusion 3.0 Solution and use cases | Use cases for Bitfusion 3.0 launch | Blogs: https://core.vmware.com/blog/bitfusion-300-unleashed https://core.vmware.com/resource/bitfusion-compatibility-and-interoperability https://core.vmware.com/resource/running-multiple-applications-one-gpu-vsphere-bitfusion | SC2 | 2021 Q1 | Completed | Jim Brogan |
| Assignable HW in vSphere 7 | Demonstration of assignable HW with vSphere 7 | | SC2 | 2021 Q1 | Completed | Niels Hagoort |
| Workloads in VMware Cloud | Consistent Workload Performance for Enterprise Apps in VMware Multi-Cloud | Blogs: https://blogs.vmware.com/apps/2021/03/workload-vmware-multi-cloud.html | SC2 | 2021 Q1 | Completed | Oleg Ulyanov |
| NVIDIA GPU features in vSphere 7 Update 2 | Multiple Machine Learning Workloads Using NVIDIA GPUs: New Features in vSphere 7 Update 2 | | SC2 | 2021 Q1 | Completed | Justin Murray |
| Workloads in VMware Cloud | This solution showcases a multi-cloud deployment of a distributed application leveraging Tanzu Kubernetes Grid. The multi-cloud TKG solution is deployed in a distributed fashion across two different cloud environments: a VMC on AWS SDDC in Oregon and a VMC on Dell EMC SDDC in Santa Clara. Tanzu Mission Control and Tanzu Service Mesh are used to operationalize, secure and manage the environment. | Blogs: Part 1 of the blog series looks at the challenges faced by organizations leveraging Kubernetes across multi-cloud environments and at some of the components of the VMware Tanzu portfolio. Part 2 looks at the components of the multi-cloud solution and their deployment. Part 3 looks at deployment of the application and the workings of the solution. Video of Solution is here: | SC2 | 2021 Q2 | Completed | Mohan Potheri |
| Accelerated Distributed AI/ML training with PVRDMA | Deploy Horovod with PVRDMA for distributed machine learning training | Paper: In progress VMworld Session: TBD | SC2 | 2021 Q2 | In Progress | PVRDMA Team |
| Training on-prem and inference in AWS | Collaborative project with the AWS team on AI/ML infrastructure across multi-cloud | Blogs: TBD VMworld Session: TBD | SC2 | 2021 Q2 | In Progress | Sahil Thapar (AWS) |
Oracle | Virtualizing Oracle Workloads with VMware vSphere Virtual Volumes on VMware Hybrid Cloud | Oracle on VMware vVols using Pure Storage | Paper completed | SC2 | 2021 Q1 | Completed | Sudhir Balasubramanian, Jason Massae |
Oracle | Oracle VMware Hybrid Cloud Business Continuity and Disaster Recovery Guide | Oracle BC & DR using Pure Storage | Paper in progress | SC2 | 2021 Q2 | In Progress | Sudhir Balasubramanian, Cato Grace, Jason Massae |
Oracle | Deploying Oracle Workloads on vSAN HCI Mesh Compute Cluster – Disaggregating Compute and Storage | Oracle on vSAN HCI Mesh | Blog completed | SC2 | 2021 Q2 | Completed | Sudhir Balasubramanian |
Oracle | On Demand hot extend clustered vmdk's online without downtime – Hot Extend RAC clustered disks | Oracle RAC blog | Blog completed | SC2 | 2021 Q2 | Completed | Sudhir Balasubramanian |
Oracle | Reclaiming dead space from Oracle databases on VMware Hybrid Platform | Oracle Blog | Blog completed | SC2 | 2021 Q2 | Completed | Sudhir Balasubramanian |