Pretraining requires huge datasets and too much compute to be done on the fly with a single consumer-grade GPU.
That depends on the use case. We're not talking about a neural net that can optimise light sampling for all possible scene configurations, only for specific geometry and environments, which (as I mentioned) could be trained during game production, similar to how UE4 or Unity bake light maps when changes are made.
The training dataset size scales with the ambiguity of the features/patterns you want the NN to recognise in a given image. If the NN is pre-trained on first-light samples of a piece of geometry from all angles, and is only ever used for that specific piece of geometry, then the pattern isn't ambiguous at all.
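To make that concrete, here's a toy sketch of the "bake per asset" idea. Everything in it is a stand-in assumption, not any shipping pipeline: we tabulate a fake first-bounce radiance signal for one fixed piece of geometry over incoming angles, fit a tiny model to it offline (a low-order Fourier fit standing in for a small NN, since the fixed geometry makes the pattern unambiguous), and then sample directions at "runtime" proportionally to the baked importance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend renderer output: first-light samples over 64 angles around one
# fixed asset. In a real pipeline these would come from the offline
# path tracer during the production-time bake.
angles = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
radiance = 1.0 + 0.8 * np.cos(angles - 0.5) ** 2  # stand-in signal

# "Training": least-squares fit of a low-order Fourier basis. Because the
# geometry is fixed, a tiny model captures the pattern exactly.
def features(theta):
    return np.stack([np.ones_like(theta),
                     np.cos(theta), np.sin(theta),
                     np.cos(2 * theta), np.sin(2 * theta)], axis=1)

coeffs, *_ = np.linalg.lstsq(features(angles), radiance, rcond=None)

def importance(theta):
    # Baked importance for this asset; clipped so it stays a valid weight.
    return np.clip(features(theta) @ coeffs, 1e-6, None)

# "Runtime": guided sampling -- draw directions proportional to the baked
# importance via a discrete distribution over the tabulated angles.
pdf = importance(angles)
pdf /= pdf.sum()
samples = rng.choice(angles, size=1000, p=pdf)

# Sanity check: the fitted model reproduces the baked signal.
err = float(np.max(np.abs(importance(angles) - radiance)))
```

The same shape applies if you swap the Fourier fit for a small MLP: the expensive part happens at bake time, and runtime only evaluates a cheap learned function to guide where rays go.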
As to whether it could be done at load time, I'd never say never, given how much ML techniques have improved over the last few years.
Look up "Path Guiding" on Google; it's quite an interesting subject given the huge compute demands of ray-traced and path-traced rendering.
I researched the current state of the field (ML in rendering and DCC) for my master's dissertation, and the benefits already look extremely promising for the likes of Pixar/Disney, who are working on it; likely every developer of offline renderers is already salivating at the prospect of decreased render times. I'd expect ML- and GFX-focused hardware manufacturers like AMD/Nvidia to be close behind, looking for ways to exploit similar techniques in real-time rendering.
Some of the Path Guiding / Neural Importance Sampling methods I've seen were trained on just a CPU, since optimising variants of a method for GPU seemingly takes longer. Ergo, once the ideal methods are discovered, heavy optimisation can be applied to the NN for GPU/Tensor Cores, or whatever works best, usually in both memory footprint and running speed.