Cisco recalls suicidal UCS blade servers

During the week of January 30, Cisco Systems put out a field notice to customers using its Unified Computing System B440 server blades, stating that the failure of a MOSFET power transistor on the blade can “cause the component to overheat and emit a short flash which could lead to complete board failure.”

The company said “in extreme circumstances it could affect the other blades in the chassis by disrupting power flow.” Cisco had warned customers on July 12 that something was wrong with the MOSFETs, and said at that time there was “no indication of a systemic issue with the MOSFET components, and the observed failure in the field is considered to be a random component failure.”

Cisco’s system engineers then issued a firmware fix for the blade, intended to keep a failing MOSFET from overheating and flashing and so taking down the system board. On January 26, Cisco notified customers using the B440 servers that the firmware patch did detect MOSFET failures and prevent a “potential thermal event,” but that another B440 in the field had failed since the firmware was distributed.

As a result, Cisco made hardware modifications to the B440 system board and is now replacing all machines currently used by customers. Cisco said in the field notice that no other UCS B Series blade servers or C Series rack servers are affected by this MOSFET failure issue.

For users with these B440s in production, Cisco recommends upgrading to the most recent UCS blade management controller software, which includes the patch for monitoring the B440 MOSFETs, and arranging to get replacement blades as soon as possible.

Source: http://www.theregister.co.uk/2012/02/06/cisco_b440_server_recall/