I'm not worried about the PCI bus being a bottleneck for this physics processor, for a number of reasons. For one thing, video cards barely utilize even the 2X AGP bus, because once the textures are on the card they stay there and aren't shared with system memory, except on budget systems. Even with a 16X PCI Express bus, it's terribly slow to use system memory for texture storage, which is why people like to stick with onboard memory. Therefore the transfer of data from the system bus to the graphics card is limited to textures being loaded and unloaded.
Now, physics calculations are very different from graphics because they mostly involve wireframe models and other calculations. These don't take up a very large portion of memory but instead take a lot of processing power. It's like distributed computing, like SETI. SETI sends some fairly small packets to the computer doing the workload, which then crunches them. Each packet is a fairly small bit of information (at most 1MB), but it still takes hours to calculate. It's also like Prime95: a small program, but the calculations take ages with little memory usage (except when using benchmark mode).
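To make the "small input, lots of crunching" point concrete, here is a minimal sketch of the Lucas-Lehmer test, the Mersenne prime check that Prime95 is known for. The function name and the sample exponents are just my picks for illustration. The whole input is one small integer, yet the amount of work grows quickly as that integer gets bigger:

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p must itself be prime)."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1  # the Mersenne number being tested
    s = 4
    for _ in range(p - 2):  # p - 2 rounds of: square, subtract 2, reduce mod m
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (7, 11, 13, 127):
        print(p, lucas_lehmer(p))
```

Real exponents run into the millions, so the same few bytes of input turn into days of squaring very large numbers.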
So having a "physics calculator add-in card" in the PCI slot isn't such a bad idea, because the workload it receives comes in fairly small packets (nowhere near 133MB/s) and all the calculating is done on the card, which can have its own internal bus that isn't limited by PCI speeds (assuming the card comes with a little bit of memory). Remember, we're in the early stages of working with these physics calculations; in the future these calculations could take upwards of 100MB! But for now we're dealing with relatively simple calculations (in comparison to what they could be) that simply take a lot of horsepower to compute and not so much memory.
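Here is a rough back-of-the-envelope check of the "nowhere near 133MB/s" claim. The 133MB/s figure is the theoretical peak of a standard 32-bit, 33MHz PCI bus. Everything else (1000 bodies, 64 bytes of state per body, 60 frames per second, state crossing the bus in both directions every frame) is an assumption I made up for the sake of the sum, not a measurement of any real card:

```python
PCI_PEAK_BYTES_PER_S = 133_000_000  # theoretical peak of 32-bit, 33MHz PCI


def bus_utilization(num_bodies, bytes_per_body, frames_per_s,
                    peak=PCI_PEAK_BYTES_PER_S):
    """Fraction of peak bus bandwidth used if every body's state crosses
    the bus twice per frame (once to the card, once back)."""
    bytes_per_frame = 2 * num_bodies * bytes_per_body
    return bytes_per_frame * frames_per_s / peak


if __name__ == "__main__":
    # Hypothetical scene: 1000 bodies, 64 bytes each, 60 frames per second.
    print(f"{bus_utilization(1000, 64, 60):.1%} of the PCI peak")
```

With those made-up numbers the traffic comes to just under 6% of the bus, which is the kind of headroom I'm talking about.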
So here is how I can see it happening:
1. Game is loaded into video card memory/system memory, and right before you see the game's interface, physics calculations are sent to the add-in card.
2. The add-in card has a little bit of its own memory, and the tasks are queued there.
3. The physics processor does all the calculations it needs (things queued in memory aren't there for very long, something like 100ms if the task is relatively simple).
4. It sends the finished calculations either directly to the graphics card, or to the processor and then on to the graphics card.
5. Now you see the video game's interface (you're in the game now, looking around).
6. The physics add-in card waits for more tasks and the cycle continues.
(Keep in mind that this queueing, calculating and sending the information back all happens in fairly short order.)
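The cycle above can be sketched as a toy simulation. This is only my guess at the shape of it, not how any real card or driver works: the `PhysicsCard` class, its methods and the falling-body maths are all hypothetical stand-ins. The point is that small tasks go into the card's queue, get crunched on the card, and small results come back:

```python
from collections import deque

GRAVITY = 9.81  # m/s^2, pulling downward


class PhysicsCard:
    """Toy stand-in for the add-in card: a task queue plus a compute step."""

    def __init__(self):
        self.queue = deque()  # plays the role of the card's own memory

    def submit(self, body, dt):
        """Step 2: the host queues a small task, a (height, velocity) pair."""
        self.queue.append((body, dt))

    def process_all(self):
        """Steps 3 and 4: crunch every queued task and hand back the results."""
        results = []
        while self.queue:
            (height, velocity), dt = self.queue.popleft()
            velocity -= GRAVITY * dt   # update velocity first
            height += velocity * dt    # then move using the new velocity
            results.append((height, velocity))
        return results


if __name__ == "__main__":
    card = PhysicsCard()
    bodies = [(10.0, 0.0), (5.0, 2.0)]   # hypothetical scene
    for frame in range(3):               # step 6: the cycle repeats
        for body in bodies:
            card.submit(body, dt=0.1)
        bodies = card.process_all()      # results go on to the renderer
        print(frame, bodies)
```

Each task and each result here is just two numbers, while all the arithmetic happens on the "card" side of the queue.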
Anyway, the issue won't be the amount of bandwidth the bus can handle between the add-in card and the system board, but how fast it moves the data (latency), because physics calculations aren't graphics, just simple binary data that needs to be processed quickly.
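A simple way to see why latency matters more than bandwidth for small packets is to model one transfer as a fixed delay plus size divided by bandwidth. The two buses below are made up (all four numbers are assumptions for illustration only, not specs of any real bus), one narrow but quick to respond and one wide but slow to respond:

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move one packet: fixed per-transfer latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Two hypothetical buses; every figure here is invented for the comparison.
NARROW_QUICK = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 133e6}
WIDE_SLOW = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 4e9}

if __name__ == "__main__":
    for size in (1024, 100 * 1024 * 1024):  # a 1KB packet and a 100MB blob
        a = transfer_time(size, **NARROW_QUICK)
        b = transfer_time(size, **WIDE_SLOW)
        print(f"{size} bytes: narrow/quick {a:.6f}s, wide/slow {b:.6f}s")
```

With these numbers the 1KB packet arrives in roughly 9 microseconds on the narrow bus versus roughly 50 on the wide one, and the wide bus only wins once the payload gets large, like the 100MB case.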