Discussion:
[systemd-devel] systemd-resolve SERVFAIL on lookups found by upstream DNS server
Francesco Belladonna
2021-04-16 16:33:17 UTC
Greetings,
I’ve been trying to debug why systemd-resolved cannot resolve
static-exp1.licdn.com: nslookup static-exp1.licdn.com returns SERVFAIL.
Altering /etc/resolv.conf to point directly to the DNS server (my router
in this case) solves the problem, which suggests the problem is isolated
to systemd-resolved.
The problem is identical on both my laptops, which run two different
operating systems (Kubuntu 18.04 and Fedora 33).
The entire DNS configuration is provided by the router acting as DHCP
server.

The system I’m testing on is Kubuntu, where the systemd version is:

systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS
+KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

SYSTEMD_LOG_LEVEL is set to debug.
Is there any other useful tool I can use to debug this further?
The problematic domain is static-exp1.licdn.com which is the CDN for
LinkedIn. I have no idea why *this* specific domain is affected.

The output of journalctl -u systemd-resolved -f is available at this URL:
https://gist.github.com/Fire-Dragon-DoL/8369b07dc27f57f7f05ffa25fd1d6962
It was captured by running nslookup static-exp1.licdn.com right after
systemd-resolve --flush-caches.

Any help is greatly appreciated.
My temporary fix is to alter /etc/resolv.conf to sidestep systemd-resolved.
I’d like to avoid this approach since it removes local caching.
—
Francesco Belladonna
Mantas Mikulėnas
2021-04-17 12:10:51 UTC
Post by Francesco Belladonna
The problematic domain is static-exp1.licdn.com which is the CDN for
LinkedIn. I have no idea why *this* specific domain is affected.
Most likely because it has one of those *interesting* CNAME chains:

$ rdt static-exp1.licdn.com
static-exp1.licdn.com = 2-01-2c3e-003d.cdx.cedexis.net
2-01-2c3e-003d.cdx.cedexis.net = li-prod-static.azureedge.net
li-prod-static.azureedge.net = li-prod-static.afd.azureedge.net
li-prod-static.afd.azureedge.net = star-azureedge-prod.trafficmanager.net
star-azureedge-prod.trafficmanager.net = dual.t-0009.t-msedge.net
dual.t-0009.t-msedge.net = t-0009.t-msedge.net
t-0009.t-msedge.net = Edge-Prod-LON21r3.ctrl.t-0009.t-msedge.net
edge-prod-lon21r3.ctrl.t-0009.t-msedge.net = standard.t-0009.t-msedge.net
standard.t-0009.t-msedge.net = 13.107.213.19, 13.107.246.19, 2620:1ec:46::19, 2620:1ec:bdf::19

(Though sometimes it's shorter, pointing at epsiloncdn instead of Azure. It
depends on where you're making the query from.)
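
To make the length of that chain concrete, here is a minimal Python sketch
that walks the CNAME records copied from the rdt output above and counts the
redirects. It performs no DNS queries; the CNAMES table, the follow_cnames
helper and its max_hops parameter are illustrative assumptions, and max_hops
is not meant to reflect whatever limit systemd-resolved actually applies.

```python
# CNAME records copied from the rdt output above (keys lowercased).
# This is static sample data; a real tool would query DNS instead.
CNAMES = {
    "static-exp1.licdn.com": "2-01-2c3e-003d.cdx.cedexis.net",
    "2-01-2c3e-003d.cdx.cedexis.net": "li-prod-static.azureedge.net",
    "li-prod-static.azureedge.net": "li-prod-static.afd.azureedge.net",
    "li-prod-static.afd.azureedge.net": "star-azureedge-prod.trafficmanager.net",
    "star-azureedge-prod.trafficmanager.net": "dual.t-0009.t-msedge.net",
    "dual.t-0009.t-msedge.net": "t-0009.t-msedge.net",
    "t-0009.t-msedge.net": "Edge-Prod-LON21r3.ctrl.t-0009.t-msedge.net",
    "edge-prod-lon21r3.ctrl.t-0009.t-msedge.net": "standard.t-0009.t-msedge.net",
}


def normalize(name):
    """DNS names compare case-insensitively; drop any trailing dot too."""
    return name.lower().rstrip(".")


def follow_cnames(name, cnames, max_hops=16):
    """Return every name visited, ending at the first one without a CNAME.

    max_hops is a hypothetical redirect limit chosen for this sketch.
    """
    chain = [normalize(name)]
    while chain[-1] in cnames:
        if len(chain) - 1 >= max_hops:
            raise RuntimeError("more than %d CNAME redirects" % max_hops)
        target = normalize(cnames[chain[-1]])
        if target in chain:
            raise RuntimeError("CNAME loop at %s" % target)
        chain.append(target)
    return chain


if __name__ == "__main__":
    chain = follow_cnames("static-exp1.licdn.com", CNAMES)
    print("%d redirects, final name: %s" % (len(chain) - 1, chain[-1]))
```

With the records above the walk takes eight redirects before reaching a name
that has address records, so a resolver with a low redirect limit would give
up on this name even though every individual record is valid.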

I *think* this was fixed in git a few weeks ago. There's already an Ubuntu
bug report for the same issue:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1921636
--
Mantas Mikulėnas
Francesco Belladonna
2021-04-17 18:08:57 UTC
Thanks for the input. I had speculated that the problem was related to how my
router answers some DNS queries, given that the router is the other component
the two laptops have in common.
The Fedora laptop has systemd-resolved version 246, which is far more up to
date.

What you are saying makes sense, though. I reviewed the bug report and it
seems to match perfectly.