The compute cost of routing is dominated by per-packet overhead, largely independent of how many bytes each packet carries, so throughput in bits per second varies wildly with packet size. This is why routing performance is measured in packets per second, not bandwidth.
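To put numbers on that, here's a back-of-envelope sketch (assuming standard Ethernet framing overhead of 20 bytes per frame for the preamble, start-of-frame delimiter, and inter-frame gap):

```python
# Line-rate packets per second for a given link speed and frame size.
# Every Ethernet frame carries 20 extra bytes on the wire:
# 7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap.
def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    wire_bits = (frame_bytes + 20) * 8
    return link_gbps * 1e9 / wire_bits

for size in (64, 512, 1500):
    print(f"10 GbE @ {size:>4}B frames: {line_rate_pps(10, size) / 1e6:5.2f} Mpps")
# 10 GbE @   64B frames: 14.88 Mpps
# 10 GbE @  512B frames:  2.35 Mpps
# 10 GbE @ 1500B frames:  0.82 Mpps
```

Same link, same bit rate, an 18x swing in packets per second.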
A modern high-end x86 processor has enough compute power and memory bandwidth to handle several million packets per second. That's plenty for <10Gb routing, but at 40Gb/s you're going to have a very difficult time keeping up, particularly with real-world traffic mixes that include lots of small packets. Best case, small packets degrade your throughput; worst case (e.g., a DDoS flood of minimum-size packets), your router locks up entirely. And that's just with extremely simple forwarding; you can forget about things like ACLs, traffic shaping, VRRP, etc.
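The per-packet time budget makes the problem concrete. A rough sketch, using the same 20-byte framing assumption as above:

```python
# Per-packet time budget: nanoseconds available to fully process one
# packet at line rate, assuming minimum-size (64B) Ethernet frames.
def ns_per_packet(link_gbps: float, frame_bytes: int = 64) -> float:
    wire_bits = (frame_bytes + 20) * 8
    return wire_bits / link_gbps  # bits / (Gbit/s) = nanoseconds

for gbps in (1, 10, 40):
    print(f"{gbps:>2} GbE: {ns_per_packet(gbps):6.1f} ns per 64B packet")
# 1 GbE -> ~672 ns; 10 GbE -> ~67 ns; 40 GbE -> ~17 ns,
# i.e. roughly 50 cycles on a 3 GHz core to do *everything*:
# RX, route lookup, header rewrite, TX.
```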
Before you say "throw more processors at it!": Linux's (and, I assume, BSD's) TCP/IP stack can scale across cores, but only to a point. NICs expose a limited number of receive queues, and that necessarily caps how many cores can usefully service a given NIC. Also, a single fast multi-core CPU is the ideal case for software routing; multiple sockets add NUMA-related headaches that can easily decrease performance if things aren't tuned just right.
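The queue limit is easy to see with a toy model of receive-side scaling (RSS), where the NIC hashes each flow's addresses and ports to one of a fixed set of RX queues. This is an illustrative sketch, not a real driver; real NICs use a Toeplitz hash, and the queue and core counts here are made up:

```python
import random

NUM_RX_QUEUES = 8   # hypothetical NIC exposing 8 RX queues
NUM_CORES = 32      # cores beyond the queue count get no RX work

def rss_queue(src_ip, dst_ip, src_port, dst_port):
    """Toy stand-in for the NIC's RSS hash, which maps a flow
    tuple to an RX queue (real NICs use a Toeplitz hash)."""
    return hash((src_ip, dst_ip, src_port, dst_port)) % NUM_RX_QUEUES

# Simulate 100k random flows and count how many cores can share the load.
flows = [(random.getrandbits(32), random.getrandbits(32),
          random.randrange(65536), random.randrange(65536))
         for _ in range(100_000)]
busy = {rss_queue(*f) for f in flows}
print(f"{len(busy)} of {NUM_CORES} cores receive packets")  # at most 8
```

However many cores you add, only as many as there are queues will ever pull packets off that NIC.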
A few years ago, a Linux kernel developer gave a presentation about using Linux as a bi-directional 10GbE router. It worked in that role for larger packet sizes, but performance didn't scale when adding more 10GbE links, and it tanked with smaller packet sizes. Granted, server hardware has improved since then, but not enough to guarantee line-rate routing performance at >10GbE.