Imagine for a second that we're down-sampling just a row of pixels for simplicity.
We have a native res of 10 pixels, which looks like this:
[x][x][x][x][x][x][x][x][x][x]
Now we render internally at native res (10 px), which means each pixel of the render output maps exactly onto one pixel on your monitor, 1:1. You get a perfect fit and it looks good.
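(If you like seeing it in code: at native res the mapping is literally "output pixel i = rendered pixel i", nothing to decide. A throwaway Python sketch, with made-up stand-in values:)

    # Native res: 10 rendered pixels onto a 10-px display, a pure 1:1 copy.
    rendered = list(range(10))           # stand-in values for the 10 rendered pixels
    displayed = [rendered[i] for i in range(10)]
    assert displayed == rendered         # perfect fit, no guessing involved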
Now assume we up the rendering resolution to 15 px and try to fit this:
[1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
into this
[1][2][3][4][5][6][7][8][9][10]
How do we do this? Well, we could sample only every Nth pixel from the render, but that won't work cleanly because the two resolutions aren't exact multiples of each other. That means we'd have to pick unevenly spaced pixels, which distorts the final output. An attempt might look like this:
[1][3][4][6][8][10][11][12][13][15]
Notice that we jump an uneven number of pixels each time: 2, then 1, 2, 2, 2, 1, 1, 1, 2 - this is an obvious mess.
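If you want to see that spacing problem in code, here's a rough Python sketch of the "just pick the closest source pixel" idea (0-based indices here, and the exact picks depend on how you align the two grids, so treat it as an illustration rather than what any particular scaler actually does):

    # Squeeze a 15-px row into 10 px by picking one source pixel per output pixel.
    src_width = 15
    dst_width = 10

    # Floor-map each destination pixel back to a source pixel.
    picked = [(d * src_width) // dst_width for d in range(dst_width)]
    print(picked)    # [0, 1, 3, 4, 6, 7, 9, 10, 12, 13]

    # The jumps between picked pixels are uneven - sometimes 1, sometimes 2 -
    # so some source pixels survive while their neighbours get thrown away.
    gaps = [b - a for a, b in zip(picked, picked[1:])]
    print(gaps)      # [1, 2, 1, 2, 1, 2, 1, 2, 1]

However you shuffle the alignment, you can't avoid mixing jumps of 1 and 2, because 15 pixels have to collapse into 10 slots.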
The other solution is to take a weighted average of neighbouring colours, but that produces blended colours which might not even exist in the original scene. If the original scene is, for example, just black and white pixels alternating every pixel, what you'd get is a grey mess.
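Here's the same 15-px row run through a simple box filter (one way of doing the weighted average - again just a sketch, not any specific scaler), with the source alternating pure black (0) and pure white (255):

    # Down-sample an alternating black/white 15-px row to 10 px with a box filter.
    src = [0 if i % 2 == 0 else 255 for i in range(15)]
    dst_width = 10
    scale = len(src) / dst_width              # 1.5 source pixels per output pixel

    dst = []
    for d in range(dst_width):
        start = d * scale
        end = start + scale
        total = 0.0
        # Weight each source pixel by how much of it falls inside [start, end).
        for s in range(int(start), int(end - 1e-9) + 1):
            overlap = min(end, s + 1) - max(start, s)
            total += src[s] * overlap
        dst.append(round(total / scale))

    print(dst)   # [85, 85, 170, 170, 85, 85, 170, 170, 85, 85]

Every output value is a grey (85 or 170) that never existed in the source - exactly the grey mess described above.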
Down-sampling from a non-native resolution, or up-sampling for that matter, is always an approximation unless the resolutions are exact multiples of each other.
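Run the same pick-a-pixel mapping with resolutions that are exact multiples (say 20 px down to 10 px) and the unevenness disappears:

    # 20 px down to 10 px: the floor mapping lands on every 2nd pixel, evenly.
    src_width, dst_width = 20, 10
    picked = [(d * src_width) // dst_width for d in range(dst_width)]
    print(picked)                                       # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    print({b - a for a, b in zip(picked, picked[1:])})  # {2} - every jump is identical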