Glibc IPv6 bug
While working on the JACK IPv6 network stack, I ran into bizarre getaddrinfo() behaviour: it returned the address families in the wrong order for a passive request with no host name. It appears people have been running into this bug since 2009 (at least, that is as far back as I found references to this behaviour on mailing lists).
Moreover, I found bug reports already submitted to the Ubuntu and Glibc bug trackers. Unfortunately, there has been no movement on the official tracker, even after I submitted a fix.
The nature of the bug is tied to how address validation is performed inside Glibc's getaddrinfo() function. First it obtains a list of address candidates, then validates them, and finally sorts the validated addresses using the algorithm and rules recommended by RFC 3484.
So, the validation. It ensures that the candidate addresses are valid addresses the host can actually use, although POSIX doesn't require that, and its example snippets perform this validation outside the call. In any case, the sorting function needs two addresses, the local peer and the remote peer, in order to select the best candidate first and then order the remaining valid candidates by decreasing preference. A connect() call lets us, in one shot, verify that an address is feasible and obtain both peers, local and remote.
That works perfectly well, and is logically correct, for remote addresses. But what happens when we request information about an address we want to bind locally (AI_PASSIVE)? That also works as long as you specify an address: getaddrinfo() checks whether the address is local, connects to it (no problem there) and sorts the self/self peer pairs. However, once you pass NULL as the address, which should mean any address (a common case for servers), the problem appears.
With a NULL (unspecified) address, the only viable candidates are 0.0.0.0 and ::. But if you read the POSIX or even the BSD specification of connect(), you'll see that the proper behaviour for connect() to the unspecified address is to connect to localhost, i.e. the loopback address 127.0.0.1 or ::1. That means that after connecting to such a remote peer, the local peer is set to the loopback address. Furthermore, sorting is always performed on IPv6 addresses: IPv4 addresses are mapped or converted (depending on the AI_V4MAPPED flag) to their IPv6 representation before sorting. After this conversion, the IPv6 local and remote peers end up in different scopes (loopback vs. global), while the IPv4-mapped addresses both fall into the IPv4-mapped class, whether they are global or loopback.
As a result, RFC 3484 sorting gives priority to IPv4 by Rule 5 (prefer matching label), even though the machine is IPv6-enabled and dual-stack, and IPv6 should have the higher priority.
I don't know whether my proposed patch is the best possible solution, but it is the first one that came to mind: if we have this combination of input parameters (UNSPEC + AI_PASSIVE) and we discover after connect() that the local peer is the loopback address for the INET6 family, just reset that local peer to UNSPEC as well, so that the sorting function sees both peers in the global scope.
Update
As it turns out, the getaddrinfo() implementation in Glibc is quite poor, and this is just one of many bugs sitting there. There is even a project, started by Red Hat, to rework the implementation. Details are here.
Mon Mar 4 00:06:35 2013 Upd.: Sat Mar 9 11:05:16 2013