1. To render a polygon, you have to fill the pixels inside it. That is trivial as long as the polygon is convex. A triangle can never be concave, so triangles are always safe to fill with the fastest algorithms. I don't think the guy above me is correct about the math, though: transforms are done vertex by vertex, not for a whole polygon at a time (at least in the engines I've read about, but they were pretty old, so newer engines may do it differently; I don't know).
2. To light each polygon, you have to calculate the angle at which each light strikes it (and consider things like distance, depending on the lighting method used). Every light you add means running all those lighting calculations again.
3. In a scene without reflections, you can figure out which objects are "behind" the viewer at an early stage and drop them completely. This cuts down what you have to worry about in the later stages (filling polygons, texturing them, lighting them, clipping the view to show only what should be onscreen). With reflections, you can no longer drop those objects, because something behind the viewer may still show up in a reflective surface, so they have to go through the later stages as well.
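To make point 2 a bit more concrete, here is a rough Python sketch of a per-light diffuse calculation. It assumes a simple Lambert (cosine) term with inverse-square distance falloff, which is only one of many possible lighting methods, and the names `normalize` and `diffuse_intensity` are made up for this example. The thing to notice is the loop: the work is repeated once for every light.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def diffuse_intensity(point, normal, lights):
    """Sum a Lambert term over all lights hitting `point`.

    `lights` is a list of (position, strength) pairs. This is an
    illustrative model, not any particular engine's formula, and it
    assumes no light sits exactly on the surface point.
    """
    n = normalize(normal)
    total = 0.0
    for light_pos, strength in lights:  # one full calculation per light
        to_light = tuple(l - p for l, p in zip(light_pos, point))
        dist = math.sqrt(sum(c * c for c in to_light))
        direction = tuple(c / dist for c in to_light)
        # Cosine of the angle between the surface normal and the light;
        # clamp to zero when the light is behind the surface.
        cos_angle = max(0.0, sum(a * b for a, b in zip(n, direction)))
        # Inverse-square falloff with distance (an assumed model).
        total += strength * cos_angle / (dist * dist)
    return total

one_light = [((0.0, 0.0, 2.0), 4.0)]
print(diffuse_intensity((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), one_light))      # 1.0
print(diffuse_intensity((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), one_light * 2))  # 2.0
```

Doubling the number of lights doubles the number of trips through the loop, which is the extra cost described in point 2.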
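Point 3 can be sketched in a similar way. The example below treats every object as a single point and uses a dot product against the camera's forward direction to decide what is "behind" the viewer. That is a simplification made for illustration (a real engine would more likely test bounding volumes against the view frustum), and `visible_candidates` and its `reflections` flag are hypothetical names, not any engine's API.

```python
def visible_candidates(objects, camera_pos, forward, reflections=False):
    """Return the objects that later stages still have to process.

    `objects` is a list of (name, position) pairs. Without reflections,
    anything behind the camera is dropped early. With reflections,
    nothing can be dropped this way, so everything is kept.
    """
    if reflections:
        return list(objects)
    kept = []
    for name, pos in objects:
        to_obj = tuple(p - c for p, c in zip(pos, camera_pos))
        # Positive dot product: the object lies in front of the viewer.
        if sum(a * b for a, b in zip(to_obj, forward)) > 0.0:
            kept.append((name, pos))
    return kept

scene = [("tree", (0.0, 0.0, 5.0)), ("car", (0.0, 0.0, -3.0))]
camera = (0.0, 0.0, 0.0)
looking = (0.0, 0.0, 1.0)

print([n for n, _ in visible_candidates(scene, camera, looking)])
# ['tree']
print([n for n, _ in visible_candidates(scene, camera, looking, reflections=True)])
# ['tree', 'car']
```

With reflections switched on, the car behind the viewer stays in the list and has to be filled, textured and lit like everything else.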