Why does vMotion require L2 adjacency?

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
Is it using some kind of broadcast heartbeat? Trying to maximize efficiency by eliminating L3/L4 headers and TCP checksums/acks? I don't get why it has to be L2 adjacent. What do L3/L4 really look like in the protocol?
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Or, like most programmers, they assume bandwidth is infinite. :D

From the testing I did and my 'minimal' understanding (i.e. I didn't design it, so)... there are some timing tolerances in there that can make a vMotion transfer fail. In a properly built network you shouldn't have issues, assuming you can support what is considered the 'minimum' transfer rate, which is north of 500 Mbps, and stay under the maximum latency (which I never pinned down that well). In one of my tests I attached two managed switches to the test machines, ran auto (1000) from each switch to the vMotion ports, and then configured the link between the switches as 10 Mbps/full. vMotion would consistently fail to move the machine, and the frozen machine on the original server would start taking requests again.

Using that information, it is not a stretch to assume that most people don't have routers that can push the packets per second needed to sustain that transfer rate. As the linked article mentioned, vMotion can find the other host via a routed IP, but I didn't have a fast enough router to make the transfer succeed.
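To put that 500 Mbps floor in perspective, here's a quick back-of-the-envelope calculation (the VM sizes and link speeds are just illustrative numbers I picked, and the 90% efficiency factor is a guess):

Code:
# Rough estimate of how long a memory pre-copy takes at different link speeds.
# The 500 Mbps "minimum" comes from the post above; VM sizes are arbitrary examples.

def transfer_seconds(ram_gib: float, link_mbps: float, efficiency: float = 0.9) -> float:
    """Time to push ram_gib of guest RAM over a link_mbps link at the given efficiency."""
    bits = ram_gib * 1024**3 * 8                  # RAM size in bits
    usable = link_mbps * 1_000_000 * efficiency   # usable throughput in bits/sec
    return bits / usable

for ram in (4, 16, 64):                  # GiB of guest RAM
    for mbps in (10, 500, 1000, 10000):  # link speed in Mbps
        print(f"{ram:>3} GiB over {mbps:>5} Mbps: ~{transfer_seconds(ram, mbps):8.1f} s")

A 4 GiB guest comes out to roughly a minute and a quarter at 500 Mbps and over an hour at 10 Mbps, which lines up with the 10 Mbps inter-switch link consistently killing the transfer.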
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
Almost all switches can do wire-speed L2. Not all switches or routers can do L3 at wire speed.
 

freegeeks

Diamond Member
May 7, 2001
5,460
1
81
I also asked myself that question. You also sometimes get random errors using vMotion that all the debugging and support from VMware could not explain.
 

cmetz

Platinum Member
Nov 13, 2001
2,296
0
0
spidey07, I haven't sniffed the protocol, and I don't know exactly how VMware has implemented it (and they sure aren't willing to explain exact details), but other live migration systems I've seen work basically by flipping all RAM associated with the source VM to copy-on-write, and then in parallel, transferring the original read-only copy while keeping a queue of the COW diffs. Then you send over the COW diffs to the other side, and when that queue drains (or you just decide it's time), you actually cease execution on the source VM. Meanwhile, the destination VM's host has been busily applying those COW diffs to the destination, and when it gets the signal, it sends a gratuitous ARP and fires the new VM up.

So once you flip the source VM to COW mode, you need a good bit of RAM in reserve to handle those queued pages, or you're going to have to fail the operation and patch it all back together on the source host (there are a few different strategies you can use, trading between temp RAM and best case vs. worst-case performance, but that's one way it could be done). In order to not consume crazy amounts of RAM or have a dangerous blip if it has to back out, I assume they bound the size of the COW diff list. Similarly, there is a moment, no matter what, when there IS actually a blip in operation, and that requires a handshake between source and dest. The shorter the blip, the happier everyone is with the operation.

The main RAM dump requires bandwidth, as much as possible, and to a lesser extent the same is true of the COW transfer. The actual hand-off requires low latency.
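To make that concrete, here's a toy simulation of the generic pre-copy flow I described (this is not VMware's implementation; every class name and threshold here is invented for illustration):

Code:
# Toy simulation of generic pre-copy live migration: bulk copy, iterative
# dirty-page (COW diff) copy, a short pause, the final diff, then the
# destination announces itself (a gratuitous ARP in real life).
# NOT VMware's code -- all names and bounds are made up.

import random

PAGE_COUNT = 10_000      # pages of guest RAM (arbitrary)
MAX_FINAL_DIFF = 200     # pause the guest once the dirty set is this small

class SourceVM:
    def __init__(self):
        self.ram = {i: f"page-{i}" for i in range(PAGE_COUNT)}
        self.dirty = set()
        self.running = True

    def touch_pages(self, n):
        """Simulate the guest dirtying pages while the migration runs."""
        if self.running:
            self.dirty.update(random.randrange(PAGE_COUNT) for _ in range(n))

    def drain_dirty(self):
        pages, self.dirty = self.dirty, set()
        return pages

class DestinationHost:
    def __init__(self):
        self.ram = {}

    def write_page(self, idx, data):
        self.ram[idx] = data

    def activate(self):
        print("dest: gratuitous ARP sent, VM resumed on the new host")

def migrate(src: SourceVM, dst: DestinationHost):
    # Phase 1: stream the whole RAM image while the guest keeps running
    # (and keeps dirtying pages behind our back).
    for idx, data in src.ram.items():
        dst.write_page(idx, data)
    src.touch_pages(2_000)

    # Phase 2: iteratively resend the dirty set until it is small enough.
    dirty = src.drain_dirty()
    while len(dirty) > MAX_FINAL_DIFF:
        for idx in dirty:
            dst.write_page(idx, src.ram[idx])
        src.touch_pages(len(dirty) // 4)   # fewer new dirty pages each round
        dirty = src.drain_dirty()

    # Phase 3: the unavoidable blip -- pause, ship the last diff, hand off.
    src.running = False
    for idx in dirty:
        dst.write_page(idx, src.ram[idx])
    dst.activate()

migrate(SourceVM(), DestinationHost())

The interesting knob is MAX_FINAL_DIFF: the bigger it is, the longer the blip at hand-off; the smaller it is, the more rounds (and the more RAM for queued diffs) you burn before you dare pause the guest.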

My guess is that they are trying to do everything they can to require as much bandwidth and as little latency as possible, which could be achieved by being on the same 1G/10G switch, or at least in the same 1G/10G L2 domain. I don't believe there's a technical reason why L3 is infeasible, but I do believe they have made some decisions to prevent people from trying things that are unlikely to work reliably (and in turn making them look bad). Look how long it took for ESX to support white-box hardware. Or non-FC-AL SANs.
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
Thanks for not addressing the op.

Thanks for being an asshat. I was adding to cmetz's post and explaining why the L2 reasons are important.

For one, any extra headers add CPU time to encapsulate/decapsulate. With L3 forwarding not always running at wire speed, and L4 even more rarely, it becomes clearer.

The whole push toward L2-based, ultra-low-latency switch gear with ultra-high bandwidth is the platform that makes a lot of this technology possible.

peace out woogie!
 

freegeeks

Diamond Member
May 7, 2001
5,460
1
81
Thanks for being an asshat. I was adding to cmetz's post and explaining why the L2 reasons are important.

For one, any extra headers add CPU time to encapsulate/decapsulate. With L3 forwarding not always running at wire speed, and L4 even more rarely, it becomes clearer.

The whole push toward L2-based, ultra-low-latency switch gear with ultra-high bandwidth is the platform that makes a lot of this technology possible.

peace out woogie!

What you are saying is totally not relevant. L3 forwarding has come to the point where what you are saying is a moot point. I'm an IP/MPLS engineer and we deliver high-speed WAN connections all the time that should have no bandwidth or latency trouble making vMotion possible between two sites. We have customers using MPLS L2 services (VPLS or VLL) to do vMotion, but for some reason it's not possible when using an L3 VPRN. It must have been a VMware engineering decision to limit it to the same L2 broadcast domain. It makes no sense that in theory you could do vMotion between two hosts connected to an old 10 Mbit dumb switch but not over a high-speed, low-latency L3 network.
 

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
LOL! I love it.

I am fortunate to be around long enough to see
A multi-path more beneficial than a tree
A path whose crucial property
Is all active bandwidth via ECMP
A path which must be sure to enable
Loop free paths and small MAC tables
First the architect needs to see
That MAC routing can happen at L2 and L3
When VMs dance from cloud to cloud
Traffic storms and table overload cannot be allowed
If vendor innovations give you a shrill
Let me introduce you to the working groups addressing LISP and TRILL
 

m1ldslide1

Platinum Member
Feb 20, 2006
2,321
0
0
It's so that you can maintain the same IP address after you vMotion a server to a new physical piece of hardware. I don't personally know of any other requirements, although VMware may have programmed some in to make sure that servers get deployed this way. If you have to change IPs when you vMotion a server, then you've kind of defeated the whole purpose. The only other workarounds are to change DNS, which can take hours and hours to propagate (again defeating the purpose), or to use something like an ACE load balancer or GSS.
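For what it's worth, the mechanism that lets the same IP keep working behind a new physical host on one L2 segment is the gratuitous ARP cmetz mentioned: the new host broadcasts "this MAC/IP lives here now" so switches and neighbors update their tables. A minimal sketch of that announcement with Scapy (the interface, IP, and MAC below are placeholders, not anything VMware-specific) might look like:

Code:
# Minimal gratuitous-ARP sketch using Scapy (pip install scapy); run as root.
# Interface, IP, and MAC are placeholders -- this just illustrates the
# "I own this IP now, update your tables" broadcast after a migration.

from scapy.all import ARP, Ether, sendp

VM_IP  = "192.168.10.50"        # the VM's (unchanged) IP address
VM_MAC = "00:50:56:aa:bb:cc"    # the VM's virtual MAC, now behind the new host
IFACE  = "eth0"                 # uplink of the new host

garp = Ether(src=VM_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
    op=2,                       # ARP reply
    hwsrc=VM_MAC, psrc=VM_IP,   # "this MAC owns this IP"
    hwdst="ff:ff:ff:ff:ff:ff", pdst=VM_IP,
)

sendp(garp, iface=IFACE, verbose=False)

Cross an L3 boundary and this trick stops working: the ARP broadcast never leaves the source subnet, so the VM's IP would have to change (or be re-advertised some other way), which is exactly what vMotion is trying to avoid.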