www.nico.schottelius.org/blog/nodejs-and-ipv6-only-networks.mdwn
Nico Schottelius 6a67754323 ++nodejs bug
2021-01-23 09:48:25 +01:00


[[!meta title="The Nodejs in IPv6 only networks problem"]]
For some years I have been seeing nodejs-based applications that do
not work in IPv6-only networks.

More recently, [I again found a situation in which a nodejs-based
application does not even
install](https://twitter.com/NicoSchottelius/status/1352243030368116739)
if you try to install it in an IPv6-only network.

As the situation is not straightforward, I started to collect
information about it on this website.
## The starting point
I wanted to install
[etherpad-lite](https://github.com/ether/etherpad-lite) and it failed
with the following error:

    174 error request to https://registry.npmjs.org/express-session/-/express-session-1.17.1.tgz failed, reason: connect EHOSTUNREACH 104.16.25.35:443

The message **connect EHOSTUNREACH 104.16.25.35:443** already clearly
points to the problem: npm is trying to connect via IPv4 on an
IPv6-only VM. This clearly cannot work.
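
To double-check which address family such an error refers to, the address at the end of the message can be parsed. The following is only a small Python sketch: the helper name `address_family_in_error` is made up for illustration, and it assumes that the message ends in `address:port`, as in the error above.

```python
import ipaddress


def address_family_in_error(message: str) -> str:
    """Return "IPv4" or "IPv6" for the address:port at the end of a message."""
    # Assumption: the last whitespace-separated token is "address:port".
    endpoint = message.split()[-1]
    # Split on the LAST colon, so that IPv6 addresses (which contain
    # colons themselves) keep their address part intact.
    host, _, _port = endpoint.rpartition(":")
    return f"IPv{ipaddress.ip_address(host.strip('[]')).version}"


print(address_family_in_error("connect EHOSTUNREACH 104.16.25.35:443"))
# prints: IPv4
```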
## A bug in NPM?
My first suspicion was that it [must be a bug in
npm](https://github.com/npm/cli/issues/2519). But on Twitter
[I was told that npm should work in IPv6 only
networks](https://twitter.com/A1bi/status/1352574621594300416). That's
strange.

However, it turns out that [somebody else had this problem
before](https://github.com/npm/cli/issues/348#issuecomment-751143040)
and it seems to be specific to using npm on [Alpine
Linux](https://alpinelinux.org/).
## A bug in Alpine Linux?
Alpine Linux is currently the main distribution that I use. Not
because of the [small libc called musl](https://musl.libc.org/), but
because the whole system is straightforward. Correct. And easy to
use. But what does that have to do with etherpad-lite failing to
install in an IPv6-only network?

It turns out that there is
[a difference between musl and glibc in the default behaviour of
getaddrinfo()](https://github.com/libuv/libuv/issues/2225), which is
used to retrieve DNS results from the operating system.
## A bug in musl libc?
I got in touch with the developers of musl and their statement is
rather simple: musl [is behaving according to the
spec](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html)
and the caller, in this context nodejs, cannot just use the **first**
result, but potentially has to try **all results**.
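
What "try all results" means for a caller can be sketched as a simple loop. This is a minimal illustration in Python, not the code of nodejs, npm or musl: the names `open_socket`, `connect_any` and `ipv6_only_connect` are hypothetical, `2001:db8::1` is a documentation placeholder address, and the order of the sample list (IPv4 first) is an assumption made only to show the failure case.

```python
import errno
import socket


def open_socket(family, socktype, proto, sockaddr, timeout=5.0):
    """Open a real socket to one address; raise OSError if it is unreachable."""
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_any(results, connect=open_socket):
    """Try every getaddrinfo()-style entry in order and return the first success."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in results:
        try:
            return connect(family, socktype, proto, sockaddr)
        except OSError as error:
            last_error = error  # remember the failure and try the next address
    if last_error is None:
        raise OSError("lookup returned no addresses")
    raise last_error


def ipv6_only_connect(family, socktype, proto, sockaddr):
    """Stand-in for an IPv6-only host: every IPv4 connect fails with EHOSTUNREACH."""
    if family == socket.AF_INET:
        raise OSError(errno.EHOSTUNREACH, "No route to host")
    return sockaddr


# Hand-written lookup result with the IPv4 address listed first.
results = [
    (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("104.16.25.35", 443)),
    (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("2001:db8::1", 443, 0, 0)),
]

print(connect_any(results, connect=ipv6_only_connect))
# prints: ('2001:db8::1', 443, 0, 0)
```

A caller that stops after the first entry would end with EHOSTUNREACH here, while the loop moves on and reaches the IPv6 address. Python's own `socket.create_connection()` follows the same pattern of trying each address in turn.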
## A DNS or a design bug?
And at this stage the problem gets tricky. Let's review what I
wanted to do and why we are so deep in the rabbit hole.

I wanted to install etherpad-lite, which uses resources from
registry.npmjs.org. So npm wants to connect via HTTPS to
registry.npmjs.org and download a file. To achieve this, npm has to
find out which IP address registry.npmjs.org has. And for this it
does a DNS lookup.

So far, so good. Now the trouble begins:
A DNS lookup can return 0, 1 or many answers.
**And in the case of the libc call getaddrinfo, the result is a list of IPv6
and IPv4 addresses, potentially 0 to many of each.**

So an application that "just wants to connect somewhere" cannot just
take the first result.
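
To make the shape of such a result visible, the entries can be grouped by address family. The following Python sketch relies on `socket.getaddrinfo()`, which wraps the libc call. The helper names `group_by_family` and `lookup` are made up for this example, and the sample list is hand-written (with `2001:db8::1` as a placeholder), not a real answer for registry.npmjs.org.

```python
import socket


def group_by_family(results):
    """Split getaddrinfo()-style tuples into IPv6 and IPv4 address lists."""
    grouped = {"IPv6": [], "IPv4": []}
    for family, _socktype, _proto, _canonname, sockaddr in results:
        if family == socket.AF_INET6:
            grouped["IPv6"].append(sockaddr[0])
        elif family == socket.AF_INET:
            grouped["IPv4"].append(sockaddr[0])
    return grouped


def lookup(host, port=443):
    """Resolve a name via getaddrinfo() and group the answers by family."""
    return group_by_family(
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    )


# Hand-written sample in the same shape that getaddrinfo() returns.
sample = [
    (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("104.16.25.35", 443)),
    (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("2001:db8::1", 443, 0, 0)),
]

print(group_by_family(sample))
# prints: {'IPv6': ['2001:db8::1'], 'IPv4': ['104.16.25.35']}
```

On a machine with working DNS, `lookup("registry.npmjs.org")` would show how many addresses of each family the resolver actually returns there. Either list may be empty.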
## A bug in nodejs?
The assumption at this point is that nodejs only takes the first
result from DNS and tries to connect to it. However, so far I have not
been able to spot the exact source code location to support that
claim.

Stay tuned...
[[!tag ipv6 net nodejs]]