Ronin, if you understand how it works, why are you misapplying the term "texture"? Why not say subpixel (sample) color and depth values? Framebuffer size (by that I mean the color depth and pixel count, not texture size), together with the sample count, is what determines MSAA memory usage. Higher-resolution textures don't change buffer sizes, yet that is what you seem to imply when you say "textures" account for memory usage.
Can you explain the math behind "600,000 textures @ 4xAA = 9.7MB"? What units are you using for each of the first two quantities? And how did you arrive at 600,000? 1280*720 = 921,600 (~900,000); 720*480 = 345,600 (~350,000). Did you just divide 9.7MB by 4 bytes per pixel and then by 4 samples to arrive at 600k? That makes "pixel" the unit for 600k, not "texture." I assume you mean that, according to your formula, a 10MB buffer can only hold about 600k pixels at 4xAA?
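To make the guessed arithmetic concrete, here is a minimal sketch of the calculation I suspect you did. It assumes 4 bytes per sample (32-bit color only, no separate depth buffer) and 1MB = 1,000,000 bytes; both are my assumptions, not something you stated. The helper names `msaa_buffer_bytes` and `max_pixels` are hypothetical, just for illustration.

```python
def msaa_buffer_bytes(width, height, samples, bytes_per_sample):
    """Bytes for a multisampled buffer: every pixel stores `samples` samples."""
    return width * height * samples * bytes_per_sample


def max_pixels(buffer_bytes, samples, bytes_per_sample):
    """Largest pixel count whose samples fit in `buffer_bytes`."""
    return buffer_bytes // (samples * bytes_per_sample)


# Assumptions: 4 bytes per sample (32-bit color, no depth), 4xAA, 1MB = 10**6 bytes.
BYTES_PER_SAMPLE = 4
SAMPLES = 4
BUDGET = 9_700_000

print(max_pixels(BUDGET, SAMPLES, BYTES_PER_SAMPLE))              # 606250
print(msaa_buffer_bytes(1280, 720, SAMPLES, BYTES_PER_SAMPLE))    # 14745600
print(msaa_buffer_bytes(720, 480, SAMPLES, BYTES_PER_SAMPLE))     # 5529600
```

Under these assumptions the result is a pixel count (~606k), with no texture term anywhere in the formula: 720x480 fits in the budget, while 1280x720 needs roughly 14.7MB and does not.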
It seems to me that you're misusing the terminology. OTOH, maybe you're just using older terminology from when textures were the main component of pixel color (though even then you'd probably want to say "texel" rather than "texture").
I'm curious: are you a developer or a reviewer, or are you simply, like many of us, an interested amateur?