From my understanding, the latency difference between DDR3 ICs and GDDR5 ICs is practically nonexistent.
At similar speeds there shouldn't be much of a difference, and GDDR5 may even be faster (e.g., overclocked DDR3 vs. stock GDDR5). They rarely run at similar speeds, though: common GDDR5 has been pushing 2GHz, while common DDR3 is still a bit below 1GHz.
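To put rough numbers on the "similar latency" point: what matters is latency in nanoseconds, not in clock cycles. Here's a quick sketch of the conversion. DDR3-1600 CL9 is a common real configuration, but the second part (15 cycles at 1250 MHz) is a made-up example for illustration, not a real GDDR5 spec, and the function name is my own.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Convert a CAS latency given in clock cycles to nanoseconds.

    clock_mhz is the clock the cycle count is measured against.
    """
    return 1000.0 * cas_cycles / clock_mhz


# DDR3-1600 at CL9: the cycle count is against an 800 MHz clock.
ddr3_ns = cas_latency_ns(9, 800)

# Hypothetical faster-clocked part (assumed values, not a datasheet figure):
# more cycles of latency, but each cycle is shorter.
fast_part_ns = cas_latency_ns(15, 1250)

print(f"DDR3-1600 CL9:     {ddr3_ns:.2f} ns")
print(f"Hypothetical part: {fast_part_ns:.2f} ns")
```

With those assumed numbers the two land within about a nanosecond of each other, even though the cycle counts look very different.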
Since the memory controller variable stays constant, but the physical distance is decreasing, the latency should decrease.
Only if a lot of the latency comes from trace length, which it typically hasn't: system RAM latency is usually only slightly higher than that of the chips on the DIMM. Not having to worry about DIMMs should allow for a substantial real latency reduction, though (no bank selecting), along with high concurrency (a greater chance of your data being available on some channel/link somewhere, and if not, a good chance that memory utilization is high enough not to worry about it), which should matter much more for ~200 threads.
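For a back-of-the-envelope feel for why trace length isn't the big factor, here's a sketch. Everything in it is an assumption on my part: a signal speed of roughly 15 cm/ns on a typical PCB (about half the speed of light), a 10 cm trace shrinking to 2 cm, and a 60 ns total memory access latency. None of these figures come from a real board.

```python
# Assumed signal speed on a typical PCB trace (roughly half the speed of light).
SIGNAL_SPEED_CM_PER_NS = 15.0


def round_trip_delay_ns(trace_length_cm, speed_cm_per_ns=SIGNAL_SPEED_CM_PER_NS):
    """Time for a signal to travel the trace to the chip and back."""
    return 2.0 * trace_length_cm / speed_cm_per_ns


def trace_share(trace_length_cm, total_latency_ns):
    """Fraction of total access latency spent on the wire."""
    return round_trip_delay_ns(trace_length_cm) / total_latency_ns


# Illustrative values only: a DIMM-style trace vs. a much shorter one.
long_trace_cm, short_trace_cm = 10.0, 2.0
total_latency_ns = 60.0

saved_ns = round_trip_delay_ns(long_trace_cm) - round_trip_delay_ns(short_trace_cm)
print(f"Round trip at {long_trace_cm} cm: {round_trip_delay_ns(long_trace_cm):.2f} ns")
print(f"Share of total latency:  {trace_share(long_trace_cm, total_latency_ns):.1%}")
print(f"Saved by going to {short_trace_cm} cm: {saved_ns:.2f} ns")
```

Under those assumptions, cutting the trace to a fifth of its length saves on the order of a nanosecond out of tens, so any real gain would have to come from elsewhere.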