So if the insulation fails, the device will short? I have also seen transistors bolted onto the metal heatsink. I believe I saw a car amp with bolted on transistors.
That is because you can buy transistors with an electrically isolated case (these have a higher thermal resistance), and you can buy transistors with a bare metal case (these, of course, have a lower thermal resistance). It all depends on how much power the transistor needs to dissipate. If the amount is low, get a version with an electrically isolated case. If the power dissipation is too high for an isolated version, get one with a metal case and mount it with a mica plate or another insulator. If the power dissipation is still too high with the mica plate or insulator in place, leave the insulator out, use thermal paste and mount the transistor directly on the heatsink. In that last case, if the transistor carries a high voltage, you must use a separate heatsink, because the heatsink is then at the same voltage as the transistor's case. If that is still not enough, use active cooling.
In the end it all comes down to preventing the silicon in the transistor from overheating to the point where it fails and conducts like a copper wire. When that happens, your power supply (or amplifier) starts behaving differently.
It is a trade-off in component cost: the best design for a certain price.
As I mentioned before, there are rules and regulations to prevent dangerous situations. Because electronics is everywhere, people take it for granted, but it is still a field of engineering.
An isolated transistor :
http://uk.farnell.com/toshiba/2sk3569/mosfet-n-600v-to-220sis/dp/1300779
datasheet :
http://www.farnell.com/datasheets/55583.pdf
thermal resistance, channel to case : 2.78 °C/W.
A non-isolated transistor :
http://uk.farnell.com/fuji-electric/2sk3682-01/mosfet-n-to-220ab/dp/1208660
datasheet :
http://www.farnell.com/datasheets/39857.pdf
thermal resistance, channel to case : 0.463 °C/W.
When you look at the thermal resistance figures, you will see that the non-isolated one is better at conducting heat away. These transistors have similar dimensions; both come in a TO-220 type case.
EDIT :
Thermal resistance tells you how well heat is transferred: it is the temperature rise in °C for every watt dissipated. The higher the number, the less heat energy you can transfer in a given time for the same temperature difference.
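To put numbers on that, here is a small Python sketch comparing the two TO-220 parts above. The thermal resistances are the datasheet figures already quoted; the 10 W of dissipation and the 50 °C case temperature are values I assumed purely for illustration, and the function name is my own.

```python
def junction_temperature(power_w, r_th_c_per_w, case_temp_c):
    """Return the channel (junction) temperature in °C.

    Temperature rise = dissipated power x thermal resistance,
    added on top of the case temperature.
    """
    return case_temp_c + power_w * r_th_c_per_w


# Channel-to-case thermal resistances quoted above, in °C/W.
R_TH_ISOLATED = 2.78       # 2SK3569, isolated case
R_TH_NON_ISOLATED = 0.463  # 2SK3682-01, metal tab

# Assumed operating point, for illustration only (not from a datasheet).
POWER_W = 10.0
CASE_TEMP_C = 50.0

for name, r_th in [("isolated", R_TH_ISOLATED),
                   ("non-isolated", R_TH_NON_ISOLATED)]:
    t_j = junction_temperature(POWER_W, r_th, CASE_TEMP_C)
    print(f"{name}: channel at about {t_j:.1f} °C")
```

With these assumed numbers the channel of the isolated part sits roughly 28 °C above its case, while the non-isolated part stays less than 5 °C above it.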
Look at this transistor :
It is as big as a Mars bar.
http://uk.farnell.com/fuji-electric/1mbi200s-120/igbt-module-1200v-200a/dp/1208669
This baby can dissipate in excess of 1000 watts of heat when properly cooled.
http://www.farnell.com/datasheets/39743.pdf
The reason this transistor can dissipate more is that its thermal resistance is so much lower, because it has a bigger surface to transfer heat.
thermal resistance : 0.096 °C/W (for the transistor only).
thermal resistance : 0.260 °C/W (reverse current diode).
thermal resistance, contact surface : 0.0125 °C/W (with thermal paste).
You have to add up the numbers that apply to your specific scenario.
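As a rough sketch of that adding-up in Python: the thermal resistances in the heat path are summed, and the allowed temperature difference divided by that sum gives the power you can dissipate. The transistor and contact figures are the ones quoted above. The 150 °C maximum junction temperature, the 25 °C ambient and the 0.05 °C/W heatsink are assumptions of mine, so check the datasheet and your own heatsink before relying on them.

```python
def total_thermal_resistance(resistances_c_per_w):
    """Thermal resistances in series simply add up (°C/W)."""
    return sum(resistances_c_per_w)


def max_power(t_junction_max_c, t_ambient_c, resistances_c_per_w):
    """Largest power (W) that keeps the junction at or below its limit."""
    r_total = total_thermal_resistance(resistances_c_per_w)
    return (t_junction_max_c - t_ambient_c) / r_total


# Figures quoted above for the IGBT module, in °C/W.
R_JUNCTION_TO_CASE = 0.096   # transistor only
R_CASE_TO_HEATSINK = 0.0125  # contact surface, with thermal paste

# Assumed values, for illustration only.
R_HEATSINK_TO_AMBIENT = 0.05  # hypothetical heatsink, °C/W
T_JUNCTION_MAX_C = 150.0      # assumed limit, check the datasheet
T_AMBIENT_C = 25.0            # assumed ambient temperature

path = [R_JUNCTION_TO_CASE, R_CASE_TO_HEATSINK, R_HEATSINK_TO_AMBIENT]
print(f"total: {total_thermal_resistance(path):.4f} °C/W")
print(f"max power: {max_power(T_JUNCTION_MAX_C, T_AMBIENT_C, path):.0f} W")
```

The better the heatsink (the smaller its thermal resistance), the closer you get to the figure that the transistor and contact surface alone would allow.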