It turns out the problem really was that the address was busy. The busyness was caused by other problems in how we handle network communications. Your input helped me figure this out. Thank you.
EDIT: To be specific, the problem in our network handling was that a status update would be re-sent constantly whenever the first attempt failed. It was only a matter of time until every distributed slave was trying to send its status update at the same time, which saturated our network.
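For anyone hitting the same thing: a common way to avoid this kind of retry storm is to space the retries out with exponential backoff plus random jitter, so the nodes do not all retry in lockstep. Below is a minimal sketch in Python, not our actual code. The names `backoff_delays` and `send_with_backoff` and the default numbers are made up for illustration, and `send` stands in for whatever function delivers the status update.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield one randomized delay (in seconds) per retry attempt.

    The upper bound doubles each attempt and is clamped to `cap`;
    the actual delay is a random fraction of that bound ("full jitter").
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def send_with_backoff(send, sleep, **kwargs):
    """Call send() until it succeeds or the retry budget runs out.

    `send` is a hypothetical callable that raises OSError on failure.
    `sleep` is passed in (e.g. time.sleep) so the waiting can be swapped out.
    """
    for delay in backoff_delays(**kwargs):
        try:
            return send()
        except OSError:
            # Wait a randomized, growing interval instead of re-sending at once.
            sleep(delay)
    raise TimeoutError("status update not delivered")
```

In real use you would call it as `send_with_backoff(send_status, time.sleep)`. The jitter matters as much as the backoff: without it, nodes that failed together would still retry together.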
Maybe SO_REUSEADDR helps here?
http://www.unixguide.net/network/socketfaq/4.5.shtml
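In case it helps, here is roughly what setting that option looks like. The thread does not say which language is in use, so Python is an assumption, and `make_listener` is just an illustrative name. The option has to be set before `bind()`.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    port=0 asks the OS to pick any free port; pass a real port in practice.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() to have any effect on the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

On Unix-like systems this mainly lets a restarted server rebind to an address that still has connections in TIME_WAIT. It does nothing about a network that is saturated by retries, and the option behaves differently on Windows.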