In other words, GRE doesn't fit precisely in the 4-layer TCP/IP model.
MPLS runs on top of Ethernet and other layer-2 technologies. Maybe layer 2.5? So it doesn't fit precisely in the 4-layer TCP/IP model.
Exactly. But the 4-layer TCP/IP model stops at layer 4, TCP. It does not have an application layer. So the TCP/IP 4-layer model should actually be a 5-layer model.
But ARP deals with layer-2 addresses. And it doesn't run on top of layer 1. It runs inside a layer-2 Ethernet frame. Again, maybe layer 2.5?
My point is: the TCP/IP technologies do not fit precisely into the 4-layer TCP/IP network model. If that is fine (and it is), then why is it a problem if TCP/IP does not fit precisely into the 7-layer OSI network model?
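To make the ARP point concrete, here is a minimal sketch of an ARP request built by hand. The function name, the MAC address and the IP addresses are made up for illustration. The thing to notice is that the ARP message sits directly behind the Ethernet header (EtherType 0x0806). There is no IP header anywhere in the frame, even though ARP carries IP addresses.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806   # EtherType that marks the payload as ARP
ETHERTYPE_IPV4 = 0x0800  # used inside ARP as the "protocol type"

def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Return a raw Ethernet frame carrying an ARP request (hypothetical helper)."""
    # Ethernet header: broadcast destination, our source MAC, EtherType = ARP
    eth_header = struct.pack("!6s6sH", b"\xff" * 6, src_mac, ETHERTYPE_ARP)
    # ARP body: hardware type 1 (Ethernet), protocol type IPv4,
    # address lengths 6 and 4, opcode 1 (request), then the four addresses
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1, ETHERTYPE_IPV4, 6, 4, 1,
        src_mac, socket.inet_aton(src_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth_header + arp_body

# Example addresses only (locally administered MAC, documentation IP range)
frame = build_arp_request(bytes.fromhex("020000000001"), "192.0.2.1", "192.0.2.2")
```

The frame is 14 bytes of Ethernet header plus 28 bytes of ARP. So ARP is carried like a layer-3 payload, but its job is resolving layer-2 addresses. Which layer is that?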
The 7-layer network model is there to make us think about problems and solutions. Divide and conquer. You are not required to follow the layering in an absolutely strict way. You can be flexible. (Although you will probably pay the price many years later for your deviation from the clean solution.)
And I think it doesn't. ARP, MPLS, GRE, I am sure there are more examples. Even the wiki page http://en.wikipedia.org/wiki/OSI_model talks about how things do not fit nicely into either the OSI or the TCP/IP model.
I don't understand this.
See the 12 fundamental truths about networking.
http://www.ietf.org/rfc/rfc1925.txt
Point 8. "It is more complicated than you think".
Real life, and real protocols, are more complicated than we think at first. Because of this, real-life protocols will always be a bit more complicated than the textbook version and might not always fit nicely into pre-invented models.
You might think that the OSI model is too complicated. I kinda agree. Layers 4 and 5 can be combined (see TCP). Is layer 6 necessary? I think it is. But the TCP/IP model leaves it to the implementor of each protocol to deal with it. So layers 6 and 7 are combined, and every implementor has to find their own solution. (Although there's an easy one for byte order in integers: just write the number in ASCII.)
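A small sketch of the byte-order problem, and of the ASCII way around it. The value 258 is just an example I picked because its two bytes differ (0x01 and 0x02).

```python
import struct

value = 258  # 0x0102, so the high and low bytes are different

# Binary encodings: the same number, two different byte sequences
big_endian = struct.pack(">H", value)     # b'\x01\x02' (network byte order)
little_endian = struct.pack("<H", value)  # b'\x02\x01'

# A receiver that assumes the wrong byte order reads a different number
misread = struct.unpack("<H", big_endian)[0]  # 0x0201 = 513

# ASCII encoding: digits are sent most-significant first, no byte order to agree on
as_text = str(value).encode("ascii")      # b'258'
decoded = int(as_text.decode("ascii"))    # 258 on any machine
```

This is presentation-layer work. In the TCP/IP world each protocol picks its own answer: binary protocols like IP and TCP fix the byte order in the spec, text protocols like HTTP and SMTP write numbers as ASCII digits.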
I think the OSI model is fine. Not perfect. But good enough to teach people who are new to networking. And good enough to help you think about the problems and solutions. Even if it isn't accurate anymore, there is imho benefit in comparing two different layering models. It makes you understand things better.
Just like I think that if you had learned about Novell, AppleTalk, CLNS and other network protocols, you would have a better understanding of the strengths and weaknesses of each protocol. Including the weaknesses of TCP/IP.
This