Storage subsystem on a fully virtualized environment

zir_blazer

Golden Member
Jun 6, 2013
1,179
442
136
For several years I have dreamed of using computers in a fully virtualized environment, where I am the overseer of a Hypervisor and the OSes themselves are something secondary and easily swappable, or can even be used simultaneously. While virtualization has been around for years, the fact that a lot of Hardware pieces still needed to be emulated, mainly the GPU, didn't allow for much flexibility: you would still need a native Windows installation to do things like play games, because you couldn't do that inside a VM. Advancements in IOMMU virtualization have made passthrough possible, which somewhat solves this by allowing you to give a Windows VM control of the GPU via VGA passthrough, making it pretty much fully functional even while inside a VM.
Yes, at first it seems to add complexity and overhead for no reason. What do I want to accomplish with this? Eliminating Dual Boot. You see, one of the reasons why I (and many other users) never bothered to spend time in Linux is that Dual Boot ain't for the lazy. If I can do all my everyday tasks on Windows, I won't bother rebooting to do in Linux what I can already do in the current Windows session, because then I would need to reboot again whenever I wanted to play a game (there is Wine on Linux, but support and compatibility are not universal). Basically, while I always heard about Linux and know some of its strengths, I never bothered with it because it couldn't fully replace Windows, and the areas where Linux may have had an edge weren't enough to justify installing and learning to use it.
Since December of last year I have managed to get quite close to what I want, thanks to my new computer with VT-d support: a fully functional Windows XP SP3 VM where I can play games, using Xen on top of an Arch Linux installation. After 3 months of using my Frankenstein setup I can say that stability and compatibility are near perfect (not without some quirks, but almost there), and the whole thing is pretty much production-ready for power users that look at this idea the same way I do. This way you can get the most out of each OS's strengths by using them simultaneously. While making it work was quite an accomplishment itself, fine tuning such a complex setup seems to be even harder, from both the configuration and usage sides.

I was intending to drop an extremely, extremely long wall of text. You will get it anyway in the long run - this is just the tip of the iceberg - but I decided that dividing it into parts, each focusing on one aspect of the setup, would make it more palatable. Overall, the points I am interested in are:
1 - Set in stone the storage subsystem, which, being the hardest thing to change after you get it working, should also be the most important to take care of
2 - Proper setup of a slim, lightweight Dom0 (currently Arch Linux + Xen, though I should re-check XenServer because there are free versions with nearly all features) that is secure, stable, and possibly even isolated from the Internet, yet has administrative tools for plenty of management and Hardware monitoring
3 - Tuning performance and usability, making sure all of this increases my productivity. This is the step where scripts and cheat sheets of commands come in, to learn how to manage everything


The storage part is the first and possibly most important step. It is something better to get right the very first time, because it is a royal pain in the ass to redo later: at the bare minimum you will need another HD with enough free space to temporarily copy everything to while you partition from scratch again, should you have a new idea at a later time. Partitioning itself is a ridiculous mess, as every guide you read has its own view on the most efficient way to do it (some based on technical merit, some on personal views of how to organize data), namely the number and recommended size of partitions, File Systems, etc.
As I'm intending to do something very different from what is established in all the guides to meet my use case, I have to somehow define my own style - but I don't have the required knowledge to do so, which is why I need people to fill in my gaps. I have written a lot about both my use case and the ideas I had for how to do it, but several things are still lacking.




STORAGE
From everything I know about this matter, modifications like resizing partitions at a later stage are a pain in the butt, so I take it that everything I do here is pretty much final, which is why I spend a lot of time thinking about it so as not to make mistakes. As this should also have a noticeable impact on I/O performance, it is even more important to get it right on the first try.

BOOT LOADER / BOOT MANAGER
In my case I have a single 4 TB Hard Disk, so using GPT instead of MBR is pretty much mandatory. My original idea was to use Gummiboot as Boot Manager to launch the xen.efi executable directly; however, Xen in UEFI mode never worked for me, and due to the lack of tools to debug it I wasn't able to push any further. I didn't have issues with Syslinux, which works as a BIOS Boot Loader but has GPT support, so I suppose I will keep using it to boot the main OS until I can get everything working under UEFI.
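For reference, booting Xen through Syslinux in BIOS mode goes through the mboot.c32 module. A minimal sketch of what such a syslinux.cfg entry looks like (the paths, Dom0 memory and root device here are placeholders for illustration, not my exact values):

LABEL xen
    MENU LABEL Arch Linux + Xen
    COM32 mboot.c32
    APPEND ../xen.gz dom0_mem=2048M,max:2048M --- ../vmlinuz-linux root=/dev/sda2 rw --- ../initramfs-linux.img

The --- separators are how mboot.c32 is told where the Hypervisor arguments end and where the Dom0 kernel and initramfs begin.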


PARTITIONING
The first thing to take into consideration is that HD performance varies depending on where on the platter the data physically sits, being faster at the outer edge and slower near the spindle. This means that the data used most often (which should include the main Arch Linux + Xen Hypervisor installation, and maybe some of the most important VHDs) should be on the outer edge. As far as I know, LBA addresses start at the outer edge and end on the inner tracks, so if you create the partitions in order of importance on a brand new HD, you will get it right.
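For anyone wanting to check the outer-vs-inner difference on their own drive, a quick sketch with hdparm (assuming a version that supports --offset, and /dev/sda as the disk - adjust to taste):

# sequential read speed near the start of the disk (outer tracks)
hdparm -t /dev/sda
# the same test starting roughly 3500 GiB into the disk (inner tracks)
hdparm -t --offset 3500 /dev/sda

On a healthy 4 TB drive the second number should come out noticeably lower.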
The actual question is how many partitions are actually needed, and what an appropriate or comfortable size for them would be. The sweet spot is where I'm not wasting tons of space that will never be used, yet will never need to resize them because they're too small and run out of space for critical stuff that HAS to be there.
At the very least I would need 3 partitions. The first one will be the EFI System Partition (which for me will be unused, so it is mostly a placeholder for when things work as I intend them to). According to some Microsoft info on the ESP, it has a recommended size of around 3xx MB and has to be FAT32 formatted; I decided to settle on 512 MB. The second one will be the Hypervisor installation, for which 10 GB seems to be enough according to my tests (it is what I'm using currently), though I don't know how much it could grow if, say, I had anything doing intensive logging of the Hypervisor's activities. It could also need to be bigger if I were to store multiple installation ISOs there instead of somewhere else. Finally, the third partition could be a single, big storage partition with all the remaining space (3+ TB).
If I were to use other GPT-capable OSes, I would need a partition for each one I intended to run natively, but as the idea is to run everything virtualized and not even bother with a native option, I don't see a need for those. Also, there could be more than one storage partition if I wanted to guarantee that certain data physically stays within the outer tracks, so instead of a single, big data partition I could have two or three, as if they were priority layers.
Examples of how my HD could end up partitioned would look like this (also in this LBA order):

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB)
3- Storage fast (1 TB)
4- Storage slow (Remaining 2.9 TB or so)

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB)
3- Native OS 1 (120 GB or so)
4- Native OS 2 (120 GB or so)
5- Storage fast (1 TB)
6- Storage slow (Remaining 2.6 TB or so)
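For what it's worth, the first layout could be laid down with sgdisk along these lines (just a sketch: the device name is a placeholder and it assumes a blank disk):

# fresh GPT plus the four partitions of the first example, on /dev/sdX
sgdisk --zap-all /dev/sdX
sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"ESP" /dev/sdX            # EFI System Partition
sgdisk -n 2:0:+10G -t 2:8300 -c 2:"Hypervisor" /dev/sdX      # Arch Linux + Xen Dom0
sgdisk -n 3:0:+1T -t 3:8300 -c 3:"Storage fast" /dev/sdX     # outer-track storage
sgdisk -n 4:0:0 -t 4:8300 -c 4:"Storage slow" /dev/sdX       # remaining space
sgdisk -A 2:set:2 /dev/sdX   # legacy BIOS bootable attribute, needed for Syslinux on GPT
mkfs.vfat -F32 /dev/sdX1     # the ESP has to be FAT32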


LOGICAL VOLUMES
This part overlaps a bit with partitioning, as deciding whether or not to use LVM may influence how I should partition in the first place. I suppose the FAT32 ESP is untouchable as a physical partition, which makes the absolute bare minimum on a bootable HD two partitions: the ESP for UEFI boot, and another partition spanning the rest of the HD used by LVM, where I just set up logical volumes following the previously proposed hierarchy without the drawbacks of physical partitions.
An important fact about LVM partitions is that they are a possible storage choice for Xen. Xen can use either standalone files as VHDs, or LVM partitions, which are supposedly faster in I/O. However, when I tried LVM partitions, managing them seemed harder than just creating VHD files on the fly according to my needs, which is why I'm currently using files as VM storage. Maybe with more tools or knowledge LVM makes more sense, and I could get more performance from the storage subsystem with it. I also don't know how much overhead LVM adds over standard physical partitions, in case I decide to futureproof the storage subsystem by using LVM but then keep using VHD files instead of LVM partitions for Xen storage.
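To make the comparison concrete, the Xen domU config accepts both backends in its disk line; a hedged sketch with made-up paths and names:

# file-backed raw image as the VM disk
disk = [ 'file:/srv/xen/winxp.img,hda,w' ]
# the same VM backed by a raw LVM logical volume instead
disk = [ 'phy:/dev/vg0/winxp,hda,w' ]

Switching between the two is just a matter of pointing the config at the other backend.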


FILE SYSTEMS
Regarding File Systems, for the ESP FAT32 is a fixed choice. The partition where the Hypervisor will sit will probably be EXT4. However, in case I take the LVM route, I don't know how much using LVM may influence the File System choice for logical volumes instead of physical partitions.
Yet another thing that caught my attention was how much next-generation File Systems are hailed...
http://arstechnica.com/information-...-and-atomic-cows-inside-next-gen-filesystems/
http://arstechnica.com/information-...h-using-the-zfs-next-gen-filesystem-on-linux/
Most seem to hail these File Systems for systems using RAID arrays with tons of HDs. I don't know how useful they will actually be for a partition on a single HD (for example, ZFS is useful against bitrot in a RAID array with redundancy, but to get that protection on a single HD it literally halves the usable space, because it needs to keep two copies of everything). I also don't know about performance, and how much I win/lose due to the possible added overhead of all the new features. ZFS is additionally said to love tons of RAM.
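If I understand it right, that single-disk protection comes from the ZFS copies property; a small sketch, with a pool name (tank) that is purely for the example:

# store two copies of every block of this dataset on the same disk
# (protects against bitrot and bad sectors, at the cost of half the space)
zfs set copies=2 tank/storage
# a periodic scrub verifies checksums and repairs blocks from the extra copy
zpool scrub tank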
Basically, I'm interested in input on whether ZFS or BTRFS are worth the extra complication of learning all the quirks and usage patterns of something more complex, compared to just using EXT4. Having looked for benchmarks, what I found on sites like Phoronix seems to be about RAID arrays and not Single Disk setups. Also, ZFS supposedly takes care of Logical Volumes, which is also LVM's job, in which case they overlap. But I haven't researched that a lot.


GENERAL STORAGE
Another thing I was pondering is where and how to store general data. While the VMs' VHDs with the Windows installations and all that are self explanatory, being local data exclusive to each VM, I will also have data that is shared or needs to be easily accessible between many VMs, even if only temporarily. Examples include the ISO collections of applications and games I have, movies or videos, etc. While I could store them on an LVM partition or VHD file that I assign to a given VM at boot via the Xen Configuration File (I suppose storage devices are hotplug capable, but I haven't looked into that), if I were to use ZFS for that purpose Windows would not be able to see them directly. I suppose that in order to take advantage of ZFS or BTRFS, I would have a dedicated Linux VM for storage and let Windows access it via Shared Folders as on a network. Otherwise, I will have to store stuff in NTFS-formatted VHDs.
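The storage VM idea would boil down to exporting the data volume over Samba so the Windows VMs see it as a regular network share; a minimal sketch of the smb.conf share section (path, share name and user are made up for illustration):

[storage]
    path = /srv/storage
    browseable = yes
    read only = no
    valid users = someuser

Windows would then just map it as a network drive, with the Linux VM free to keep the underlying data on ZFS or BTRFS.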
One of the critical things about this is the tools of the trade for managing files inside the LVM partitions or VHD files if I have to take a VM down for maintenance. Tools that can turn a VHD file into an LVM partition or vice versa would be quite useful, too.
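From what I gather, the usual way to poke at a raw image from Dom0 while its VM is down is a loop device (plus ntfs-3g for Windows guests), and moving between a VHD file and an equally sized logical volume is just a raw copy; a sketch with made-up paths:

# expose the partitions inside a raw image and mount the Windows one
losetup --find --show --partscan /srv/xen/winxp.img    # prints e.g. /dev/loop0
mount -t ntfs-3g /dev/loop0p1 /mnt/winxp

# VHD file -> logical volume (the reverse is just swapping if= and of=)
lvcreate -L 40G -n winxp vg0
dd if=/srv/xen/winxp.img of=/dev/vg0/winxp bs=4M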


RAMDISK
Since I have 32 GB of RAM, I was thinking about the possibility of using the excess RAM as a RAMDisk, which provides beyond-SSD I/O performance (an SSD is something I didn't have budget left for). Many games should fit on a 20 GB or so RAMDisk, and as the computer is on 24/7, RAMDisk volatility is a non-issue as long as the important stuff like Save Games stays on the HD. I have experience working with one on WXP SP3 (yep, using all 32 GB of RAM on 32 Bits via PAE, long story) and had some success using symlinks (which NTFS does support), though to get the most out of it you need batch files to copy, rename, and create the symlinks. I see it as workable.



That is all I have thought about for the storage part of my virtualized system. I expect there are people who have already experimented with these things and settled on a way or style of managing this stuff that they may want to share, to help me decide what to do and how. It has been more than 2 months since I last toyed with the configuration of this system, as after getting it to a usable state I decided to enjoy it instead of optimizing further (I was out of gaming for a whole 3 weeks until getting VGA Passthrough working, which was pretty much the Dark Ages). However, as Xen 4.4 was released recently, I was intending to start from scratch, applying any ideas I had in the meantime for a final setup.
 

Scarpozzi

Lifer
Jun 13, 2000
26,389
1,778
126
1. Don't install XP. It's losing support very soon. Switch to Windows 7 and run applications in XP Mode if you must.

2. Pick your hypervisor, pick your physical storage, then build your VMs from there.

Your hypervisor will dictate what virtual filesystems you can support.

Honestly, it sounds like you just need a Windows/Linux box with Oracle VirtualBox. XenServer or VMware would be overkill unless you want VMware desktop (if it supports VMFS).
 

zir_blazer

Golden Member
Jun 6, 2013
1,179
442
136
1 - Unless you want me to get a pirated copy, I'm not touching Windows 7/8 with a fishing pole - I don't have any real need to spend money on that. And unless there is a critical zero-day exploit that magically appears after support ends, wrecking any WXP machine connected to the Internet like Blaster did, I also don't get the deal with the "support ending". Heck, I'm currently using a vanilla WXP SP3 copy with no hotfixes on top of it - already half a decade old. Sensitive data will not be in the Windows VM anyway, so it isn't something I'm concerned about.

2 - My Hypervisor of choice is currently Xen on top of an Arch Linux Dom0. As far as I know, XenServer is pretty much Xen on a custom Linux distribution with ready-to-use system administration tools, but as it is Xen at the core too, I suppose most supported features will be the same.
As I have spent most of my time researching Xen, chances are I stick with it. The only KVM feature I considered important was Sound Card emulation; Xen does it with QEMU, but the VM has no easy means of audio output, forcing me to rely on passing through a USB Sound Card.
I never considered VirtualBox because VGA Passthrough is critical. I think VMware added support recently, but I'm not sure it is mature enough to even consider.


For my current use case I could as well ditch everything and stay on WXP SP3 forever; I spent money and time on this build without any real need to, as my old Athlon II X4 did what I wanted. But that would defeat the whole purpose of what I'm trying to do.
 

zir_blazer

Golden Member
Jun 6, 2013
1,179
442
136
After some heavy googling I gathered a lot of data (I didn't take note of most of the links, so I would have to search again), yet I still can't draw conclusions from it.

First of all, LVM is a Logical Volume Manager, while EXT4 and BTRFS are File Systems. The black sheep is ZFS, which is BOTH a Logical Volume Manager and a File System. There is a lot of mix-and-match between them, which makes some comparisons hard.

While EXT4 is not quite the mainstream File System yet - the older EXT3 still seems to be widely used because it is more tried-and-true - EXT4 looks like an acceptable common denominator if someone wanted to define "average".
BTRFS is the next-gen File System meant to replace it. However, it is experimental in nature, and while it has for some time been deemed stable enough for production use, it still has quirks and needs polishing. According to benchmarks, performance is the same as or slower than EXT4 - not faster. That would be justifiable given the amount of extra features it has, assuming I specifically needed one of them.
I'm considering using EXT4 for the Hypervisor partition (be it physical or logical), since BTRFS or ZFS may involve more work while not providing any feature that is useful there.

Regarding LVM, the idea behind it is that I can make just one big physical partition and use LVM to create logical volumes that are more flexible to manage than physical partitions. LVM itself seems to have no overhead in a brand new state (1, 2, 3, 4, 5), which is great, as it means I have no reason NOT to go for it and get all the flexibility I need to be future proof even if I stay with EXT4. I'm not sure whether LVM performance drastically changes over time, or whether the overhead differs significantly depending on the File System. I expect that resizing volumes should degrade performance, as it may place new data too far away depending on free space, causing fragmentation (for example, putting the new data on the inner HD tracks for a volume whose data used to be contiguous on the outer edge), but at least initially it seems good enough.

Up to this point, mixing LVM + EXT4/BTRFS seems to be a very viable path. The choices would be either a physical partition with EXT4 for the Hypervisor plus another spanning the entire remaining space with LVM, where I could have as many logical volumes as I need for VMs (raw) or data storage (EXT4, or BTRFS if better); or just a single LVM partition with the Hypervisor also on a logical volume, though that is a bit less reliable because some modules have to be loaded beforehand in order to boot from it.
LVM should also be the easiest to work with, because I can give Xen raw logical volumes, which should be pretty much the best option for storage performance from inside a VM.
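If I go this route, the whole thing would look roughly like this sketch (volume group name, sizes and the underlying partition are placeholders for illustration):

# turn the big physical partition into a volume group
pvcreate /dev/sda3
vgcreate vg0 /dev/sda3
# raw logical volumes for Xen VMs, plus an EXT4 volume for general data
lvcreate -L 40G -n winxp vg0
lvcreate -L 500G -n storage vg0
mkfs.ext4 /dev/vg0/storage
# the flexibility part: grow a volume (and its filesystem) later if needed
lvextend -L +100G /dev/vg0/storage
resize2fs /dev/vg0/storage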


As an alternative to the still unfinished BTRFS there is ZFS. Since it is unique in being both a Logical Volume Manager and a FS, I suppose that using LVM with ZFS-formatted partitions doesn't make much sense, as it would be redundant. However, I haven't researched ZFS enough to check whether the logical volumes it provides can be given to Xen or formatted with another File System, the same way you can with LVM.
While BTRFS is supposed to be better than ZFS in several aspects once it matures, ZFS currently seems to be the ultimate option in File Systems, with advanced features like a complex cache hierarchy, etc. It is supposed to be very RAM hungry, but that's what I have 32 GB of RAM for. Also, most of the info where it gets praised always involves a RAID array with several HDs, not a single HD like mine, so I don't know how useful ZFS can be in this scenario.

I suppose I will add more info as soon as I find it. LVM seems an overall decent choice, but I want to know whether it is worth pushing for ZFS or simply discarding it.
 

Scarpozzi

Lifer
Jun 13, 2000
26,389
1,778
126
ext3/4 are both solid and can be large, but the volume size limit varies with the block size you select during formatting. LVM is OK if you need a volume manager. I sometimes install systems without it if I physically can't resize the volume...there's less to break that way.

About XP...yeah, you can run it, it's fast. I don't ever think about MS licensing because I have copies I've purchased and never installed... I install them at work all the time, but never worry about licensing there as they're under contract. I'm happy with 7 in a VM, though it takes up a ton of hard drive space. I just like staying within mainstream support where I can.
 

zir_blazer

Golden Member
Jun 6, 2013
1,179
442
136
Isn't that Unix File System? Should be archaic by now. Any real reason to consider it?


I looked around regarding ZFS, and with ZFS you can create logical volumes and format them with something else, similar to LVM. I suppose I could create a logical volume, make an LVM partition on it, and feed it to Xen. I will have to research further whether Xen can use a ZVOL directly, but that link is a good start on what I could expect from ZFS itself.
What I have no idea about is performance compared to an LVM + EXT4 solution, which should be around the same as a physical partition. Also, ZFS as a File System seems heavier than the traditional ones (some CPU, but a lot of RAM) and I don't know how much overhead it needs to reach its nominal performance.
The Arch Linux Wiki has some articles regarding ZFS installation and administration. It is more complex than what I'm used to, and I will need a crash course before getting it right.
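From what I have read so far, the ZVOL route would look something like this sketch (pool and volume names are made up, and it assumes the ZFS-on-Linux packages are already installed and loaded):

# create a pool on the big partition and a 40 GB ZVOL for a Windows VM
zpool create tank /dev/sda3
zfs create -V 40G tank/winxp
# the ZVOL appears as a block device that could then be handed to Xen:
# disk = [ 'phy:/dev/zvol/tank/winxp,hda,w' ]

Whether Xen accepts that device directly is exactly what I still have to test.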


To make things even harder, I'm out of luck: my HD has developed at least one bad block, which has caused minor data corruption issues (at least it seemed that way to me) on a VM that just happened to be there. If I RMA it, I want to deploy this ASAP on the new arrival. And considering that ZFS seems to be better at data integrity, I think it may have an edge here.
 

smakme7757

Golden Member
Nov 20, 2010
1,487
1
81
And unless there is a critical zero-day exploit that magically appears after support ends..../Snip

Oh just wait and see. I bet the cyber-criminals have bucketloads they have been saving up since Microsoft announced the end of XP support.
Just had to write the comment :). Carry on!
 
Feb 25, 2011
16,801
1,474
126
Isn't that Unix File System? Should be archaic by now. Any real reason to consider it?

It's been updated a lot.

It's less RAM-hungry than ZFS, performance is fine (it's FINE, dammit!), and Linux/UFS doesn't exhibit some of the weird bugs I've seen with FreeBSD/ZFS.

Most of the features of ZFS are done better by dedicated hardware for more money. The portability of a zpool is nice, but not necessary if you're in an enterprise environment (with access to specific replacement hardware), and if you're a home user you don't need them in the first place. As far as ZFS's anti-corruption tech goes - screw it. I'm already backing up everything three times. If something bad happens, I click a button and it un-happens. Oddly, however, nothing bad ever happens.

But I'm mostly just sour on ZFS because of issues at work. So... grain of salt. I use ZFS/FreeNAS at home and it's been very set-it-and-forget-it.
 

zir_blazer

Golden Member
Jun 6, 2013
1,179
442
136
Are you sure you didn't mean FreeBSD/UFS? That would make much more sense. From what I've looked around, UFS support on Linux is reliable only for reading, not writing, so I don't think it can work as an everyday FS. It also requires custom tinkering to get working, as Arch Linux doesn't support it out of the box, and documentation for UFS itself seems scarce.

As I have 32 GB of RAM, ZFS's RAM hungriness doesn't really bother me, as long as it performs around the same as LVM + EXT4 and protects my data better. I still believe it to be the ultimate choice, but the installation, configuration and maintenance instructions are far beyond my understanding, and I have to learn several things before getting it right (that means downtime). I could go for LVM + EXT4 now (after zero-filling or RMAing my HD due to the bad block issue, so I get a fresh start), for which the instructions are well covered, or launch myself kamikaze-style into ZFS.
I will try to tinker in a VM with Arch Linux and ZFS, to see if I can get that thing working.
 
Feb 25, 2011
16,801
1,474
126
Sorry, that was poorly written. I meant them individually.

"Linux doesn't exhibit some of the .... with FreeBSD."
"UFS doesn't exhibit some of the .... with ZFS."

I use ext4 with Linux.
 

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
I think most people who want to virtualize their personal desktop do it backwards.

Put Windows as the primary OS and use a type-2 hypervisor like VMware Workstation, VirtualBox, etc. to virtualize your primary Linux desktop.

Linux behaves better in a VM, and Windows games will perform better.

Better yet, use Windows 8 with Client Hyper-V and you don't even take the performance hit of a type-2 hypervisor. Easy peasy, and you can use whatever filesystem you want on the Linux VM (I suggest ext4).