[[!meta title="The Nodejs in IPv6 only networks problem"]]

For some years I have been seeing nodejs based applications that do
not work in IPv6 only networks.
More recently, [I again found a situation in which a nodejs based
application does not even
install](https://twitter.com/NicoSchottelius/status/1352243030368116739)
if you try to install it in an IPv6 only network.

As the situation is not straightforward, I started to collect
information about it on this website.

## The starting point

I wanted to install
[etherpad-lite](https://github.com/ether/etherpad-lite) and it failed
with the following error:

    174 error request to https://registry.npmjs.org/express-session/-/express-session-1.17.1.tgz failed, reason: connect EHOSTUNREACH 104.16.25.35:443

The message **connect EHOSTUNREACH 104.16.25.35:443** already clearly
points to the problem: npm is trying to connect via IPv4 on an IPv6
only VM. This clearly cannot work.

## A bug in NPM?

My first suspicion was that it [must be a bug in
npm](https://github.com/npm/cli/issues/2519). But on Twitter
[I was told that npm should work in IPv6 only
networks](https://twitter.com/A1bi/status/1352574621594300416). That's
strange.
However, it turns out that [somebody else had this problem
before](https://github.com/npm/cli/issues/348#issuecomment-751143040)
and it seems to be specific to using npm on [Alpine
Linux](https://alpinelinux.org/).

## A bug in Alpine Linux?

Alpine Linux is currently the main distribution that I use. Not
because of the [small libc called musl](https://musl.libc.org/), but
because the whole system is straightforward, correct and easy to
use. But what does that have to do with etherpad-lite failing to
install in an IPv6 only network?

It turns out that there is
[a difference between musl and glibc in the default behaviour of
getaddrinfo()](https://github.com/libuv/libuv/issues/2225), which is
used to retrieve DNS results from the operating system.

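To make the role of getaddrinfo() more concrete, here is a minimal sketch in Python, chosen only because its `socket.getaddrinfo` is a thin wrapper around the libc call. The helper name `resolve_all` is made up for this illustration and is not part of nodejs or npm. It returns every address the resolver reports, together with its address family, so the list can be compared between a musl based and a glibc based system.

```python
import socket

# Human readable labels for the two address families of interest.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_all(host, port):
    """Return every (family, address) pair that getaddrinfo() reports.

    Hypothetical helper for illustration only. Restricting the lookup to
    SOCK_STREAM avoids one duplicate entry per socket type.
    """
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [
        (FAMILY_NAMES.get(family, str(family)), sockaddr[0])
        for family, _socktype, _proto, _canonname, sockaddr in results
    ]
```

On a machine with working DNS, looping over `resolve_all("registry.npmjs.org", 443)` and printing each pair shows which families the libc hands back and in which order. That order is exactly what an application relies on if it only looks at the first entry.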
## A bug in musl libc?

I got in touch with the developers of musl and their statement is rather
simple: musl [is behaving according to the
spec](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html)
and the caller, in this
context nodejs, cannot just use the **first** result, but has to
potentially try **all results**.

## A DNS or a design bug?

And at this stage the problem gets tricky. Let's review what I
wanted to do and why we are so deep in the rabbit hole.

I wanted to install etherpad-lite, which uses resources from
registry.npmjs.org. So npm wants to connect via HTTPS to
registry.npmjs.org and download a file. To achieve this, npm has to
find out which IP address registry.npmjs.org has. And for this it
does a DNS lookup.

So far, so good. Now the trouble begins:

A DNS lookup can return 0, 1 or many answers.

**And in the case of the libc call getaddrinfo, the result is a list of IPv6
and IPv4 addresses, potentially 0 to many of each.**

So an application that "just wants to connect somewhere" cannot just
take the first result.

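What "trying all results" could look like is sketched below, again in Python and purely as an illustration, not as what nodejs or npm actually does. The names `connect_any`, `resolver` and `socket_factory` are hypothetical; the two injectable parameters exist only so that the loop can be exercised without a network.

```python
import socket


def connect_any(host, port, timeout=5.0,
                resolver=socket.getaddrinfo,
                socket_factory=socket.socket):
    """Try every address from the resolver until one of them connects.

    Illustrative sketch: returns the first connected socket, or raises the
    last connection error if no address was reachable.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in resolver(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket_factory(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            # For example EHOSTUNREACH for an IPv4 address on an IPv6 only
            # host: remember the error and move on to the next candidate.
            last_error = error
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no addresses found for {host}")
```

With a result list that starts with an IPv4 address, the first attempt fails on an IPv6 only machine and the loop simply continues with the next entry. Python's own `socket.create_connection` follows the same pattern of walking through the getaddrinfo results until one connects.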
## A bug in nodejs?

The assumption at this point is that nodejs only takes the first
result from DNS and tries to connect to it. However, so far I have not
been able to spot the exact source code location to support that
claim.

Stay tuned...

[[!tag ipv6 net nodejs]]