Lots of FUD in this thread; saying the OS just sees all RAM as equal is naive and outdated knowledge at best.
Multi-socket motherboards and the like give the OS a ripe opportunity to optimize performance by allocating memory from the RAM attached to the processor that the allocating process/thread is running on, avoiding expensive accesses across the processor interconnect.
The OS does this with NUMA-aware allocation policies (e.g. Linux's first-touch policy); because virtual memory is mapped to physical memory through page tables (cached in the Translation Lookaside Buffer, TLB), the OS can back a virtual page with physical memory on whichever node it prefers without the application noticing.
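To make the NUMA point concrete, here is a minimal sketch using libnuma on Linux (the library calls are real, but the size and setup are purely illustrative; in practice the first-touch policy usually handles this for you without any explicit code):

```c
/* Minimal sketch: explicitly requesting node-local memory with libnuma.
 * Build with: gcc -D_GNU_SOURCE numa_demo.c -lnuma */
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Which NUMA node is the calling thread currently running on? */
    int node = numa_node_of_cpu(sched_getcpu());

    /* Allocate 64 MiB backed by pages on that node, so later accesses
     * stay off the cross-socket interconnect. */
    size_t size = 64u << 20;
    void *buf = numa_alloc_onnode(size, node);
    if (!buf) return 1;

    memset(buf, 0, size);  /* touch the pages so they actually get faulted in */
    printf("allocated %zu bytes on node %d\n", size, node);

    numa_free(buf, size);
    return 0;
}
```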
Furthermore, it's disingenuous to compare VRAM to RAM: they have vastly different latency and bandwidth characteristics, and they aren't programmed against in the same way either. Regular code accessing RAM is compiled and optimized on the developer's machine, meaning the optimization of memory accesses happens offline and only once (A LOT of compiler effort goes into this, since a cache miss stalls the CPU for roughly 100 cycles; setting compiler flags for a specific CPU family will sometimes even tweak memory access patterns for that family's memory controller).
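To illustrate the "offline and only once" part, here's a toy sketch (my own example, not from the thread): the traversal order is baked in at compile time, and flags like -O3 -march=native let the compiler vectorize and schedule loads for one specific CPU family.

```c
/* Toy example of offline memory-access optimization.
 * Build with e.g.: gcc -O3 -march=native -c sum_demo.c */
#include <stddef.h>

#define N 4096

/* Row-major traversal: walks memory contiguously, so the caches and the
 * hardware prefetcher are happy and the compiler can vectorize freely. */
double sum_rows(const double (*m)[N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal of the same data: each access strides N*8 bytes,
 * so nearly every load misses the cache and stalls for ~100 cycles. */
double sum_cols(const double (*m)[N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```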
This is very different on GPUs: they don't share a common instruction set, so shaders must be compiled at runtime by the
DRIVER. What does this mean? The driver becomes the compiler that optimizes the memory access patterns, which is SUPER DUPER MEGA important for massively parallel architectures like GPUs, since their throughput depends entirely on wavefronts being scheduled and interleaved with the right timing relative to memory latency and bandwidth so that the ALUs are never idle.
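For the "driver becomes the compiler" point, here is a minimal OpenGL sketch (assuming a GL context is already current and the entry points are loaded; the shader source is just a placeholder): the GLSL string below only turns into GPU machine code when the vendor's driver compiles it at runtime.

```c
/* Minimal sketch of runtime shader compilation in OpenGL. Assumes a GL
 * context is current; core GL headers alone don't expose these entry
 * points on every platform, hence the loader header. */
#include <GL/glew.h>
#include <stdio.h>

static const char *frag_src =
    "#version 330 core\n"
    "out vec4 color;\n"
    "void main() { color = vec4(1.0, 0.0, 0.0, 1.0); }\n";

GLuint compile_fragment_shader(void) {
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 1, &frag_src, NULL);

    /* This is where the driver's compiler runs: instruction selection,
     * register allocation and memory-access scheduling all happen here,
     * differently for every vendor and GPU generation. */
    glCompileShader(shader);

    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(shader, sizeof log, NULL, log);
        fprintf(stderr, "driver compile failed: %s\n", log);
    }
    return shader;
}
```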
This scheduling is essentially black magic and highly dependent on the GPU architecture. Trust me, I'm a software engineer.