Since last November, we’ve been receiving sporadic reports of ORF logging “General socket error 0” errors.
With help from one of our clients (thank you Peder, your packet capture was the key), we tracked this issue down to a bug in the Microsoft DNS Server. Microsoft PSS has confirmed the bug and a hotfix is now available (not yet public at the time of writing). If you are getting the above error, or you have problems with DNS name resolution under high load, please contact your local Microsoft PSS and ask for hotfix 946565.
A few details for the technically minded: the issue may occur when the DNS server receives two or more concurrent requests to resolve the same DNS resource record. When that happens, one or more of the DNS responses may be corrupted, which eventually results in the “General socket error 0” message logged by ORF.
How the DNS response gets corrupted is worth a few words. Every DNS query sent by a DNS client (like ORF) has a unique Transaction ID. The DNS response is expected to contain exactly this ID, which helps the DNS client sort out which response belongs to which request. The Transaction ID is a two-byte value, e.g. 0xAABB in hexadecimal notation. In the corrupted DNS responses, the two Transaction ID bytes are swapped: if ORF sent the query with Transaction ID 0xAABB, the response it receives has Transaction ID 0xBBAA. The DNS client in ORF, seeing an unknown Transaction ID, discards the DNS response data and generates an error.
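For readers who want to look for this symptom in their own packet captures, here is a minimal Python sketch of the check described above. It is not ORF's actual code. The function names (`build_header`, `swap_bytes`, `classify_response`) are made up for illustration, and the only protocol fact it relies on is that the Transaction ID occupies the first two bytes of a DNS message in network byte order.

```python
import struct


def build_header(transaction_id: int, flags: int = 0x8180) -> bytes:
    """Build a 12-byte DNS header (hypothetical helper, for the demo only).

    Fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    The default flags value 0x8180 marks a standard response.
    """
    return struct.pack("!HHHHHH", transaction_id, flags, 1, 0, 0, 0)


def swap_bytes(value: int) -> int:
    """Swap the high and low byte of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def classify_response(sent_id: int, packet: bytes) -> str:
    """Compare the Transaction ID of a raw DNS response with the one we sent."""
    # The Transaction ID is the first two bytes, big-endian.
    (received_id,) = struct.unpack("!H", packet[:2])
    if received_id == sent_id:
        return "match"
    if received_id == swap_bytes(sent_id):
        return "byte-swapped"
    return "unknown"


if __name__ == "__main__":
    sent = 0xAABB
    print(classify_response(sent, build_header(0xAABB)))
    print(classify_response(sent, build_header(0xBBAA)))
```

A Transaction ID whose two bytes are identical (0xAAAA, for instance) looks the same after swapping, so a response affected by the bug would still be reported as a match in that case.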