Might Have Bitten Off More Than I Can Chew With VM System Build - Please Help

Nelson
Jan 13, 2022
Hello,

My company requested that I build a VM server host (to house 12 Windows 11 Pro and Linux guest VMs), and late into the quote process I think I may have made a big mistake in the design of the system.

As for myself, I do not work in IT, but instead am a QA Hardware and Software Manager.

I work for a small company (only 9 individuals), and because of that, we don't have anyone who is knowledgeable in this.

I've been working for weeks to get quotes for all the hardware and software to run on the VM server host (and finally got everything figured out), but I now think the VM software itself either:

1. Can't do what I need it to do, or

2. Will cost nearly as much as the VM server itself to license properly.

The server configuration is shown in the attached image (VM Server Configuration.png).
I actually ended up speaking to a VMware rep (though he seemed a bit young and wasn't 100% certain about everything), who mentioned that I'd only need something like VMware Workstation Pro for:

A. A dual-socket, 128-core system (2x 64-core CPUs).

B. vGPU sharing of the 2x NVIDIA A40s (up to 8 GB for each VM; see the quick math after this list).

C. Direct PCIe connectivity for the following cards:

 C1. DekTec DTA-2179 (for VM 1, plus 8 GB of vGPU).

 C2. DekTec DTA-2179 (for VM 2, plus 8 GB of vGPU).

 C3. AJA Kona 5 (for VM 3, plus 8 GB of vGPU).

 C4. AJA Kona 5 (for VM 4, plus 8 GB of vGPU).

 C5. Mellanox ConnectX-6 (for VM 5, plus 8 GB of vGPU).

 C6. Mellanox ConnectX-6 (for VM 6).
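
As a quick sanity check on item B (my own back-of-envelope math, not anything from the quotes): each A40 has 48 GB of memory, and with NVIDIA vGPU every VM on a given physical GPU gets the same fixed profile size, so 8 GB slices should work out to 6 VMs per card:

```python
# Back-of-envelope check: how many 8 GB vGPU profiles fit on 2x A40?
# Assumes fixed-size profiles that divide the A40's 48 GB of VRAM evenly.
A40_VRAM_GB = 48
PROFILE_GB = 8
NUM_GPUS = 2

vms_per_gpu = A40_VRAM_GB // PROFILE_GB    # 6 VMs per physical GPU
total_vgpu_vms = vms_per_gpu * NUM_GPUS    # 12 VMs across both cards

print(f"{vms_per_gpu} VMs per card, {total_vgpu_vms} total")
```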

But the more I read about direct PCIe passthrough, the more it looks like it's meant only for certain PCIe cards (e.g., graphics and network cards) and not general-purpose PCIe cards like the DekTec DTA-2179 and AJA Kona 5, which are 8K and 4K video/analysis cards.

Furthermore, it looks like I'd have to jump to more expensive VMware licensing for VMware to use all 128 cores: from $250.00 for VMware Workstation Pro to something like vSphere or vCenter, which isn't too bad at first but gets really expensive past a 32-core setup (the cost jumped to something like $40,000.00, which is nearly half the cost of all the system hardware).
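
For anyone following along, here's how I understand the core-count math (the per-license price below is a placeholder assumption on my part, not a quoted figure): vSphere is licensed per CPU, and each license only covers up to 32 cores, so each 64-core CPU needs two licenses:

```python
import math

# Sketch of vSphere per-CPU licensing (the 2020+ model): one license per
# CPU covers up to 32 cores, so 64-core CPUs need two licenses each.
# PRICE_PER_LICENSE is a placeholder assumption, not a real quote.
CORES_PER_CPU = 64
NUM_CPUS = 2
CORES_PER_LICENSE = 32
PRICE_PER_LICENSE = 4_500  # hypothetical list price

licenses = NUM_CPUS * math.ceil(CORES_PER_CPU / CORES_PER_LICENSE)
total = licenses * PRICE_PER_LICENSE
print(f"{licenses} licenses, ~${total:,}")  # 4 licenses, ~$18,000
```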

Given that I've spent weeks getting all the hardware and software figured out (and thought I understood the VM side), I'm extremely worried that the system budget will increase by 1.5x just from adding the VM software licensing cost.

Am I going about this wrong (can something like VMware Workstation Pro work for my setup), or do I really need something much more expensive like vSphere or vCenter to virtualize 12 or so VMs on a single host, with vGPU and direct PCIe connectivity for certain VMs?

My apologies for the long post, and a huge thank you to anyone who could steer me in the right direction.

Nelson
 

Attachments

  • VM Server Configuration.png

Kiska

Golden Member
Apr 4, 2012
Welcome!

It seems like you're in the wrong subforum. You'd want vSphere if the machine is going to be mission-critical; Workstation Pro doesn't have features like remote management of the host or automatic restart of VMs if the host goes offline for some reason (e.g., a power outage), among other things.

If the price tag of vSphere scares you, maybe consider Proxmox. But yes, within VMware's lineup you'll need ESXi to do PCIe passthrough.
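
If you do end up on a KVM-based route like Proxmox, one thing worth checking early is whether those capture cards land in their own IOMMU groups, since passthrough hands the whole group to one VM. A minimal sketch for a Linux host (assumes IOMMU is enabled in the BIOS and kernel):

```python
#!/usr/bin/env python3
# List IOMMU groups by walking sysfs. A PCIe device can only be passed
# through cleanly if everything in its group goes to the same VM.
from pathlib import Path

groups = Path("/sys/kernel/iommu_groups")
if not groups.is_dir():
    raise SystemExit("No IOMMU groups found - is IOMMU enabled in BIOS/kernel?")

for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
    devices = sorted(dev.name for dev in (group / "devices").iterdir())
    print(f"Group {group.name}: {', '.join(devices)}")
```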

On another note, are you certain you'll require 128 cores? You could purchase something smaller now (with a cheaper VMware license) and then upgrade further down the line. AMD does produce EPYC CPUs with fewer than 64 cores each.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
@Kiska, what subforum would you recommend? I can move it.
 

Ajay

Lifer
Jan 8, 2001
We don't have a 'correct' forum for this. I'd suggest Operating Systems. Computer Building is an option, though not usually for stuff this complex.
It comes down to the hypervisor the OP wants to use. Though I haven't used Proxmox, I think @Kiska is sending the OP down the right path for a reduced-cost system.
A basic or standard subscription might prove useful in getting direct support for your setup, rather than spending a lot of time digging through forum posts, etc.
 
Nelson
Jan 13, 2022

Hello Kiska,

Thank you for your input.

I've been doing the numbers, and as crazy as it sounds, it's actually cheaper to just build 10 to 12 physical systems than to create one extremely powerful system and run software like VMware.
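
The back-of-envelope version looks something like this (every figure below is a made-up placeholder, not a number from my actual quotes):

```python
# Toy comparison behind that realization; all numbers are placeholders.
NUM_SYSTEMS = 12
PER_WORKSTATION = 6_000    # hypothetical cost of one standalone box
HOST_HARDWARE = 85_000     # hypothetical cost of the big VM host
VM_LICENSING = 40_000      # hypothetical hypervisor/vGPU licensing

physical_total = NUM_SYSTEMS * PER_WORKSTATION
virtual_total = HOST_HARDWARE + VM_LICENSING

print(f"{NUM_SYSTEMS} physical boxes: ${physical_total:,}")  # $72,000
print(f"One virtualized host: ${virtual_total:,}")           # $125,000
```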

It's just unfortunate that I came to this realization after 4 to 5 weeks of intensive research.

Thanks again for your help,
Nelson
 

thecoolnessrune

Diamond Member
Jun 8, 2005
I know this is the wrong section, but I think you indeed need to take a heavy step back and look at what you're trying to do here. For one thing, you are talking about enterprise-level features on enterprise-grade hardware that comes with enterprise-level pricing. You mention using NVIDIA vGPU.

NVIDIA vGPU is a licensed feature from NVIDIA. You can't just get an A40 and do it. You need to license it from NVIDIA, stand up a licensing server, and apply those licenses to your system on an ongoing basis.

Additionally, NVIDIA only supports two platforms for vGPU: VMware vSphere and Citrix Hypervisor. No Proxmox, no Hyper-V, no AHV, or anything like that. Specifically for VMware, NVIDIA vGPU is only supported on vSphere; VMware Workstation doesn't even enter into this. On top of THAT, vSphere only supports NVIDIA vGPU with Enterprise Plus licensing. That's a list price of nearly $18,000 in vSphere licensing for 128 cores, plus ongoing support costs. Which you need, because NVIDIA vGPU only supports specific vSphere versions, and you'll need ongoing support to keep patching your hosts to stay current and get bug fixes.
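
And the vGPU software itself is a recurring subscription on top of that, licensed per VM / concurrent user (the per-seat price below is an illustrative placeholder, not NVIDIA's actual pricing):

```python
# Illustrative recurring NVIDIA vGPU software cost for 12 VMs.
# PRICE_PER_VM_PER_YEAR is a placeholder - get a real quote from NVIDIA.
NUM_VGPU_VMS = 12
PRICE_PER_VM_PER_YEAR = 450  # hypothetical per-seat annual subscription

print(f"~${NUM_VGPU_VMS * PRICE_PER_VM_PER_YEAR:,}/year recurring")
```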

That doesn't even broach the subject of how you're going to break out this storage. What about redundancy? What about backups?

Yes, the software is as expensive as, and sometimes more expensive than, the hardware, because hardware is a commodity and no longer a differentiator. You should get very comfortable with the idea that your IT budget may need to allocate more funding for the software running on your servers than for the hardware itself.

This doesn't even get into complex topics like bug fixing, support entitlement, and firmware bugs that leave your server in a no-boot scenario. And why 200 Gbps Ethernet? Do you have 200 Gbps switches? This system alone would be considered a complex configuration for any admin to support, especially since it's a single point of failure (why not divide this between 2 or even 3 servers?). And with the licensing costs, it's worth considering why you'd have this solution at all vs. individual terminals.

This sort of solution needs an experienced consultant, because there is a lot to go wrong here and you're correct to be nervous.
 
Nelson
Jan 13, 2022

Hello thecoolnessrune,

You are absolutely correct, and as I mentioned before, it's actually much cheaper to build 10 to 12 physical systems than to virtualize everything (which is crazy when you think about it).

I will be going with multiple physical systems.

Thank you,
Nelson
 

crashtech

Lifer
Jan 4, 2013
I watched this thread because I was looking at the complexity of this setup and wondered the same thing myself. I'm almost disappointed, though; it would have really been something to see.
 

JackMDS

Elite Member
Super Moderator
Oct 25, 1999
I am sorry to say it this way, but the content above suggests that the OP does not have the knowledge to build such a thing and would be better off seeking professional help or assigning the task to someone else.

A task of this magnitude is not well suited to the level of an online forum.


:cool: