This is simply untrue.
They're massive because they have a lot of unique texture detail. Rather than using a few base textures that are repeated over and over to paint large swaths of the level, they take a more painterly approach where designers can make every part of the level unique.
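To put rough numbers on why unique texturing gets so big, here is a back-of-the-envelope sketch. Every figure in it is a hypothetical assumption for illustration, not data from any shipped game: 20 tiled 2048x2048 textures versus one 65536x65536 unique texture, 4 bits per texel (as with BC1/DXT1 block compression), and full mip chains on both.

```python
# Back-of-the-envelope texture memory comparison.
# All sizes below are hypothetical, chosen only to illustrate the scaling.

BITS_PER_TEXEL = 4  # assumes BC1/DXT1-style block compression (4 bits per texel)


def mip_chain_texels(size: int) -> int:
    """Texels in a square power-of-two texture plus its full mip chain.

    Ignores the 4x4 block padding real compressed formats apply to the
    tiniest mips, which is negligible at these sizes.
    """
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total


def to_mib(texels: int) -> float:
    """Convert a texel count to mebibytes at BITS_PER_TEXEL."""
    return texels * BITS_PER_TEXEL / 8 / 2**20


# Tiled approach: a small library of textures repeated across the level.
tiled_texels = 20 * mip_chain_texels(2048)

# Unique approach: one huge texture so every surface can be painted individually.
unique_texels = mip_chain_texels(65536)

print(f"tiled:  {to_mib(tiled_texels):8.1f} MiB")
print(f"unique: {to_mib(unique_texels):8.1f} MiB")
```

Under these made-up assumptions the tiled set comes to roughly 53 MiB while the unique texture is about 2.7 GiB, which is why the unique approach has to be streamed rather than loaded up front.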
The pop-in is in large part due to console hardware being diabolically bad. Consoles often stream data from the disc in real time, or at best from an HDD, and that streaming is laggy and slow. On top of that, they come with very little system memory and very little GPU memory.
If the engine had been designed around streaming off an SSD, on PCs with 16-32 GB of RAM and 4-6 GB of VRAM, you could stuff most of the textures into memory and have next to no pop-in. It's just a side effect of pandering to console users.
On the flip side, this technology looks a lot better on consoles than other engines do there. This kind of tiered streaming is the only way they can handle textures this large on such limited hardware.
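A minimal sketch of the tiered idea, assuming a simple two-cache hierarchy (VRAM in front of system RAM, with the disc/HDD/SSD as backing storage) and plain LRU eviction. The class names, page IDs and capacities are all hypothetical; a real engine fetches pages asynchronously and draws a blurrier mip until the page arrives, and every "storage" hit below is where that pop-in would show up.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity least-recently-used set of texture page IDs."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages: OrderedDict = OrderedDict()

    def get(self, page) -> bool:
        """Return True on a hit and mark the page as most recently used."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        return False

    def put(self, page) -> None:
        """Insert a page, evicting the least recently used one if over capacity."""
        self.pages[page] = True
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)


class TieredTextureStreamer:
    """Hypothetical hierarchy: VRAM <- RAM <- storage (disc/HDD/SSD)."""

    def __init__(self, vram_pages: int, ram_pages: int):
        self.vram = LRUCache(vram_pages)
        self.ram = LRUCache(ram_pages)
        self.counts = {"vram": 0, "ram": 0, "storage": 0}

    def request(self, page) -> str:
        """Return the tier that served the page, promoting it toward VRAM."""
        if self.vram.get(page):
            tier = "vram"
        elif self.ram.get(page):
            tier = "ram"
            self.vram.put(page)
        else:
            tier = "storage"  # slow read: this is where pop-in would appear
            self.ram.put(page)
            self.vram.put(page)
        self.counts[tier] += 1
        return tier


def walk_level_twice(vram_pages: int, ram_pages: int, level_pages: int = 100) -> dict:
    """Visit every page of a hypothetical level twice and tally tier hits."""
    streamer = TieredTextureStreamer(vram_pages, ram_pages)
    for _ in range(2):
        for page in range(level_pages):
            streamer.request(page)
    return streamer.counts


print("console-sized RAM:", walk_level_twice(vram_pages=10, ram_pages=40))
print("PC-sized RAM:     ", walk_level_twice(vram_pages=10, ram_pages=128))
```

With the small RAM cache the second pass has to go back to storage for every page, while the larger cache serves the whole second pass from RAM. That mirrors the point above: more memory means fewer slow reads and less pop-in.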