You're right; I think it would make sense to use interpolation to guess at how to make the upscaled image look better.
But I'm torn between being a purist, using only the actual source information, and letting the upscaler guess at filling in the gaps with new information that was never in the original.
The text example above looks pretty good with interpolation. But surely there are cases where interpolation introduces artificial information that looks a little odd, or where it simply guesses wrong?
The aliasing on the text looks wrong and the interpolated version looks better because we know the text should be a smooth slanted line, so we are fine with adding information that guesses at the true slant of the strokes. But what if we were instead displaying a porcupine, or something else that was meant to be blocky or jagged rather than a pure slanted line? Interpolation would blur all of that together, which is not what you want.
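To make the "adding information" point concrete, here is a minimal sketch in plain Python (the function name `upscale_linear_2x` is hypothetical). It assumes a deliberately simplified 1D linear interpolation that keeps each original sample and inserts a midpoint after it; real scalers use bilinear, bicubic or fancier kernels with different sample positions, but the effect on a hard edge is similar: a value appears that was never in the source.

```python
def upscale_linear_2x(row):
    """Double the length of a row of pixel values with simple linear interpolation.

    Each original sample is kept, and a new sample is inserted halfway
    toward the next one (the last sample is simply repeated).
    """
    out = []
    for i, value in enumerate(row):
        out.append(float(value))
        nxt = row[i + 1] if i + 1 < len(row) else value
        out.append((value + nxt) / 2)  # the "guessed" in-between pixel
    return out

# A hard black-to-white step, like one stair of a deliberately jagged outline.
edge = [0, 0, 255, 255]
result = upscale_linear_2x(edge)
print(result)                          # → [0.0, 0.0, 0.0, 127.5, 255.0, 255.0, 255.0, 255.0]
print(sorted(set(result) - set(edge)))  # → [127.5], a gray that the source never contained
```

On a slanted text stroke that invented gray is a welcome smoothing; on an edge that was meant to be crisp it is exactly the blur described above.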
Still, it is interesting to think about which combinations of viewing distance and screen size let the human eye perceive the difference. I think there could be many situations where you could, in fact, get away with the simple 1-to-4 zoom (each 1080p pixel copied into a 2x2 block of 4K pixels). And in situations where it does become noticeable, I still tell myself that it's the purest form of the original 1080p source, without any blurring or guessing layered on top to make it look better. A good analogy might be how some people prefer to disable the faux 120 fps motion interpolation on their TVs when watching DVDs, preferring the source's original frame rate (about 30 fps, or 24 fps for film). You could argue that the interpolated 120 fps version is 'better', but a purist might disagree because the TV is guessing at filling the gaps, and the original source never contained that data in the first place.
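The 1-to-4 zoom works cleanly because 1920x1080 to 3840x2160 is exactly 2x in each direction. A minimal sketch of that nearest-neighbour mapping in plain Python follows (the function name `upscale_nearest_2x` and the list-of-rows image format are assumptions for illustration). Every output pixel is a straight copy of a source pixel, so nothing new is invented.

```python
def upscale_nearest_2x(image):
    """Nearest-neighbour 2x upscale: copy every pixel into a 2x2 block.

    `image` is a list of rows, each row a list of pixel values.
    """
    out = []
    for row in image:
        doubled = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(doubled)
        out.append(list(doubled))  # repeat the whole row vertically
    return out

# A tiny 2x2 "frame" with a hard diagonal, standing in for a 1080p image.
src = [[0, 255],
       [255, 0]]
big = upscale_nearest_2x(src)
for row in big:
    print(row)
# [0, 0, 255, 255]
# [0, 0, 255, 255]
# [255, 255, 0, 0]
# [255, 255, 0, 0]
```

The jagged diagonal stays jagged (just twice as big), which is the "purist" result: the output contains only values that were already in the source.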