I've been reading up on the different protocols involved, and I've become curious about what mechanism determines which application gets the traffic. Let's say it's HTML data over HTTP: using TCP port 80 ensures that the traffic is interpreted as HTTP. But let's say you have two different web browsers open, both communicating over port 80. How is it determined which browser gets the packets?
Browsers don't use port 80 on their own end of the connection; they connect to port 80 on the server. Let me explain in degibson-detail.
Normally, only one process can bind to a given port (for a given protocol and local address). That port then belongs to that one process until the process unbinds or dies. There is a set of well-known ports that certain services (e.g., web servers, SSH servers, SMTP servers) will almost always use (ports 80, 22, and 25, respectively). This convention makes it easy to 'look for' those services, given an IP. E.g., on IP 1.2.3.4, the web server on that node is probably bound to 80.
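A minimal Python sketch of the "only one process can bind to a port" rule, assuming a typical Linux, macOS, or Windows host with default socket options (options such as SO_REUSEADDR or SO_REUSEPORT can relax the rule and are deliberately not set here). The helper name second_bind_fails is made up for illustration: it binds one listening socket on the loopback address and checks whether a second bind to the same port is rejected.

```python
import socket

def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Bind one TCP socket, then try to bind a second socket to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen()                 # the port is now actively in use
        port = first.getsockname()[1]  # which port the OS actually gave us
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError:
            return True                # typically "address already in use"
        return False
    finally:
        second.close()
        first.close()

if __name__ == "__main__":
    print("second bind rejected:", second_bind_fails())
```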
TCP connections have two endpoints. Each endpoint is an IP/port pair. As we've just established, servers have known IPs and, usually, known ports: 1.2.3.4:80 in our previous example. Clients, on the other hand, don't need known ports, because no other entity is usually trying to connect to a client (usually, clients connect to servers).
So, when a browser starts, it generally doesn't care which port it uses. It is customary for client applications to use a port above 1024, and it's generally wise to avoid ports that belong to some other service (IANA has a list here: http://www.iana.org/assignments/port-numbers).
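In practice, clients rarely choose a port by hand. A sketch using Python's standard socket module: binding to port 0 asks the OS for any free port, which is also what happens implicitly when a client calls connect() without binding first. The range the OS draws from is system-dependent (for example, recent Linux kernels default to 32768 through 60999, while IANA's suggested ephemeral range is 49152 through 65535). os_assigned_port is a hypothetical helper name.

```python
import socket

def os_assigned_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for any free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))          # 0 means "any available port"
        return s.getsockname()[1]  # the port the OS actually chose

if __name__ == "__main__":
    print("OS picked port", os_assigned_port())
```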
A browser will typically bind to the client IP at some high-numbered port. For example, my Chrome session looking up the IANA port numbers shows up in 'netstat':
Code:
C:\>netstat

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    hms-clover:1041        localhost:27015        ESTABLISHED
  TCP    hms-clover:27015       localhost:1041         ESTABLISHED
  ...
  TCP    hms-clover:1146        www.iana.org:http      ESTABLISHED
In the case above, that Chrome process (consisting of one tab in Chrome) used local port 1146. It connected to www.iana.org:http. netstat opportunistically does reverse DNS (which is why it shows www.iana.org instead of the IP) and routinely translates port numbers to service names (i.e., www.iana.org:http = 192.0.32.8:80); 'netstat -n' shows the raw numbers instead. Most browsers pick ports opportunistically, i.e., they just take any available port (usually handed out by the operating system) and use it for the connection.
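To see that each connection gets its own client-side port, here is a sketch that plays both roles on the loopback interface. One listening socket stands in for the web server (on an OS-chosen port rather than 80, since binding ports below 1024 often needs elevated privileges on Unix-like systems), and two client sockets stand in for two browsers. The function name is hypothetical. Both clients share the same remote endpoint but end up with different local ports, which is what tells the two connections apart.

```python
import socket

def two_clients_one_server(host: str = "127.0.0.1"):
    """Open two connections to one server port; return each client's (local, remote) endpoints."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))   # stand-in for a well-known port such as 80
    server.listen()
    clients, accepted = [], []
    try:
        for _ in range(2):
            # No explicit bind: the OS assigns the client's local port.
            c = socket.create_connection(server.getsockname())
            clients.append(c)
            conn, _addr = server.accept()
            accepted.append(conn)
        # getsockname() is the local endpoint, getpeername() the remote one.
        return [(c.getsockname(), c.getpeername()) for c in clients]
    finally:
        for s in clients + accepted + [server]:
            s.close()

if __name__ == "__main__":
    for local, remote in two_clients_one_server():
        print(f"local {local[0]}:{local[1]} -> remote {remote[0]}:{remote[1]}")
```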
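The server, upon seeing a connect request from a client, knows how to respond to that client because the TCP header encodes the port number the client is using. That's also the answer to the original question: two browsers talking to the same server port 80 still have different local ports, so when a reply comes back, the client's OS looks at the destination port (really, the whole local IP/port, remote IP/port combination) and hands the data to whichever process owns that connection.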
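The client-side lookup can be sketched in a few lines. This is a simplified model, not how any particular kernel implements it: the first four bytes of a TCP header are the source and destination ports (unsigned 16-bit, big-endian), and the OS keeps a table from each connection's four-tuple to the socket that owns it. The client address 10.0.0.5 and the two browser entries are made up; 192.0.32.8:80 is the iana.org endpoint from the netstat example. Real stacks also fall back to listening sockets for segments that match no established connection.

```python
import struct
from typing import Tuple

# Hypothetical connection table, like the one the client's OS keeps.
# Key: (local_ip, local_port, remote_ip, remote_port) -> owning application.
CONNECTIONS = {
    ("10.0.0.5", 1146, "192.0.32.8", 80): "browser A",
    ("10.0.0.5", 1147, "192.0.32.8", 80): "browser B",
}

def tcp_ports(header: bytes) -> Tuple[int, int]:
    """Return (source_port, destination_port) from the first 4 bytes of a TCP header."""
    # Both fields are unsigned 16-bit integers in network (big-endian) byte order.
    src, dst = struct.unpack("!HH", header[:4])
    return src, dst

def deliver(src_ip: str, dst_ip: str, header: bytes) -> str:
    """Pick the application for an incoming segment by its full four-tuple."""
    src_port, dst_port = tcp_ports(header)
    # For an incoming segment, "local" is the destination and "remote" is the source.
    return CONNECTIONS.get((dst_ip, dst_port, src_ip, src_port), "no matching socket")

if __name__ == "__main__":
    # A reply from the web server (port 80) to local port 1147;
    # the rest of a minimal 20-byte header is zero-filled.
    reply = struct.pack("!HH", 80, 1147) + bytes(16)
    print(deliver("192.0.32.8", "10.0.0.5", reply))  # browser B
```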
One last thing: port numbers are a convention only. It's possible to be nefarious and send non-HTTP traffic on 80, or non-SSH traffic on 22.
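Because the port is only a convention, a server can't know what protocol a peer actually speaks until it reads the bytes. A hypothetical heuristic, shown only to illustrate the point (the method list is incomplete and the function name is made up), might check whether the first bytes on a connection look like an HTTP/1.x request line:

```python
# Hypothetical heuristic: a few common HTTP/1.x request methods (not exhaustive).
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")

def looks_like_http(first_bytes: bytes) -> bool:
    """Guess whether the first bytes on a connection start an HTTP/1.x request."""
    # bytes.startswith accepts a tuple of prefixes.
    return first_bytes.startswith(HTTP_METHODS)

if __name__ == "__main__":
    print(looks_like_http(b"GET / HTTP/1.1\r\n"))
    print(looks_like_http(b"SSH-2.0-OpenSSH_8.9\r\n"))
```

Nothing at the TCP level stops an SSH client from connecting to port 80; only a check like this, done by the application, would notice.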