diff --git a/blog/nodejs-and-ipv6-only-networks.mdwn b/blog/nodejs-and-ipv6-only-networks.mdwn
new file mode 100644
index 00000000..6e1e2855
--- /dev/null
+++ b/blog/nodejs-and-ipv6-only-networks.mdwn
@@ -0,0 +1,90 @@
+[[!meta title="The Nodejs in IPv6 only networks problem"]]
+
+For some years I have been seeing problems with nodejs based
+applications that do not work in IPv6 only networks.
+More recently, [I again found a situation in which a nodejs based
+application does not even
+install](https://twitter.com/NicoSchottelius/status/1352243030368116739),
+if you try to install it in an IPv6 only network.
+
+As the situation is not straightforward, I started to collect
+information about it on this website.
+
+## The starting point
+
+I wanted to install
+[etherpad-lite](https://github.com/ether/etherpad-lite) and it failed
+with the following error:
+
+    174 error request to https://registry.npmjs.org/express-session/-/express-session-1.17.1.tgz failed, reason: connect EHOSTUNREACH 104.16.25.35:443
+
+The message **connect EHOSTUNREACH 104.16.25.35:443** already clearly
+points to the problem: npm is trying to connect via IPv4 on an IPv6
+only VM. This clearly cannot work.
+
+## A bug in NPM?
+
+My first suspicion was that it [must be a bug in
+npm](https://github.com/npm/cli/issues/2519). But on Twitter
+[I was told that npm should work in IPv6 only
+networks](https://twitter.com/A1bi/status/1352574621594300416). That's
+strange.
+However, it turns out that [somebody else had this problem
+before](https://github.com/npm/cli/issues/348#issuecomment-751143040)
+and it seems to be specific to using npm on [Alpine
+Linux](https://alpinelinux.org/).
+
+## A bug in Alpine Linux?
+
+Alpine Linux is currently the main distribution that I use. Not
+because of the [small libc called musl](https://musl.libc.org/), but
+because the whole system is straightforward. Correct. And easy to
+use. But what does that have to do with etherpad-lite failing to
+install in an IPv6 only network?
+
+It turns out that there is
+[a difference between musl and glibc in the default behaviour of
+getaddrinfo()](https://github.com/libuv/libuv/issues/2225), which is
+used to retrieve DNS results from the operating system.
+
+## A bug in musl libc?
+
+I got in touch with the developers of musl and their statement is rather
+simple: musl [is behaving according to the
+spec](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html)
+and the caller, in this
+context nodejs, cannot just use the **first** result, but has to
+potentially try **all results**.
+
+## A DNS or a design bug?
+
+And at this stage the problem gets tricky. Let's review again what I
+wanted to do and why we are so deep into the rabbit hole.
+
+I wanted to install etherpad-lite, which uses resources from
+registry.npmjs.org. So npm wants to connect via HTTPS to
+registry.npmjs.org and download a file. To achieve this, npm has to
+find out which IP address registry.npmjs.org has. And for this it is
+doing a DNS lookup.
+
+So far, so good. Now the trouble begins:
+
+    A DNS lookup can return 0, 1 or many answers.
+
+**And in case of the libc call getaddrinfo, the result is a list of IPv6
+and IPv4 addresses, potentially 0 to many of each.**
+
+So an application that "just wants to connect somewhere" cannot just
+take the first result.
+
+## A bug in nodejs?
+
+The assumption at this point is that nodejs only takes the first
+result from DNS and tries to connect to it. However, so far I have not
+been able to spot the exact source code location to support that
+claim.
+
+Stay tuned...
+
+
+[[!tag ipv6 net nodejs]]