systemd failing with vfs-scale-working patch-set
Sedat Dilek
2010-11-29 13:36:41 UTC
Sorry for resending this, but systemd-devel ML does not accept file
attachments >= 40KiB.
I know that linking to pastebin services is not well liked, but the
original picture resized with -10% (convert tool) is still larger than 40KiB.
The moderators of the ML should reconsider whether 40KiB is too low (100KiB
should be fine when people want to attach a screenshot taken with their
digicam).

- Sedat -

Sedat Dilek
2010-11-29 12:55:00 UTC
Hi,

I have tried the vfs-scale-working patch-set from [1] (for the GIT tree see [2]).
Unfortunately, I cannot boot my Debian/sid i386 system with systemd
(the system freezes), but it boots with sysvinit.
I have attached a screenshot of the call trace; I hope this helps.

What I tried to get things working:
* Downgrade systemd from v15 down to v12
* Remove "mtab hackz" [3]
* Remove native mount feature [4] (also in combination with "mtab hackz")
* Mask systemd-remount-api-vfs.service (for testing purposes)

Not sure what the real problem is (as it is very early), but sysvinit is fine.

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-2.6.37-rc3-686
root=UUID=1ceb69a7-ecf4-47e9-a231-b74e0f0a9b62 ro radeon.modeset=1
lapic 3 init=/sbin/init.sysvinit

How can I log very early messages (I don't think it makes any
sense to activate systemd debugging [5] right now)?
Any advice on this?

Any idea on the Call-trace or digging into the problem?
Any help appreciated.

Kind Regards,
- Sedat -

[1] http://lkml.org/lkml/2010/11/27/54
[2] http://git.kernel.org/?p=linux/kernel/git/npiggin/linux-npiggin.git;a=shortlog;h=refs/heads/vfs-scale-working
[3] http://wiki.debian.org/systemd#KnownIssuesandWorkarounds: "Issue
#4: Warning: "/etc/mtab is not a symlink or not pointing to
/proc/self/mounts"
[4] http://wiki.debian.org/systemd#Usenativemount
[5] http://wiki.debian.org/systemd#Debuggingsystemd

P.S.: Mask systemd-remount-api-vfs.service

$ cd /etc/systemd/system ; ln -sf /dev/null systemd-remount-api-vfs.service
Sedat Dilek
2010-12-20 21:07:14 UTC
Hi Nick,

after upgrading my toolchain and kernel build system I found some time
to gather more information on the systemd w/ vfs-scale problem.
Eric reported in [1] the same problems as I have seen.

My systemd is now v15+git20101210.8a9ef77 and Linux kernel is:

# cat /proc/version
Linux version 2.6.37-rc6-686 (Debian
2.6.37~rc6-5~next20101220.dileks.1) (***@gmail.com) (gcc
version 4.5.2 (Debian 4.5.2-1) ) #1 SMP Mon Dec 20 19:03:29 CET 2010

I could take some new pics [2].

[3] shows a kernel NULL pointer dereference (dmesg command in
kdb-console), but the address is not displayed; I may need to set
some additional kernel debug options.

[4] shows the output of "btp 4" (backtrace of PID #4 from kdb-console);
see also in [3] that the line with "*pde = 00000000" (8 zeroes) shows
<4> (PID #4) at the beginning.

I also have a "btp 1" (systemd has PID #1).

Hope this helps to narrow down the problem.
If you need additional information or (disassembled) files, please let me know.

Some pics are duplicated. Sorry for the bad quality; I would not make a
good surgeon (it needs a steady hand, though having turned off the
auto-flash of my digicam is the real cause, I guess ;-)).

Regards,
- Sedat -


[1] http://marc.info/?l=linux-fsdevel&m=129287321819101&w=2
[2] http://files.iniza.org/for_npiggin/2010-12-20/
[3] (image: dmesg output in the kdb console; link not preserved)
[4] (image: output of "btp 4" in the kdb console; link not preserved)
Nick Piggin
2010-12-21 04:57:13 UTC
Post by Sedat Dilek
Hi Nick,
after upgrading my toolchain and kernel build system I found some time
to gather more information on the systemd w/ vfs-scale problem.
Eric reported in [1] the same problems as I have seen.
# cat /proc/version
Linux version 2.6.37-rc6-686 (Debian
version 4.5.2 (Debian 4.5.2-1) ) #1 SMP Mon Dec 20 19:03:29 CET 2010
I could take some new pics [2].
[3] shows a kernel NULL pointer dereference (dmesg command in
kdb-console), but the address is not displayed; I may need to set
some additional kernel debug options.
[4] shows the output of "btp 4" (backtrace of PID #4 from kdb-console);
see also in [3] that the line with "*pde = 00000000" (8 zeroes) shows
<4> (PID #4) at the beginning.
I also have a "btp 1" (systemd has PID #1).
Hope this helps to narrow down the problem.
If you need additional information or (disassembled) files, please let me know.
Some pics are duplicated. Sorry for the bad quality; I would not make a
good surgeon (it needs a steady hand, though having turned off the
auto-flash of my digicam is the real cause, I guess ;-)).
Thanks to you both for testing and reporting this. The important part is
the NULL instruction pointer at dput.

I have a patch to set various d_flags according to what d_op functions
have been defined. This allows branch and cacheline load reduction in
common cases in fastpaths. However those flags were set but not cleared,
not expecting d_ops to be switched on active dentries.

cgroups filesystem actually switches from simple dentry ops to its own
one, when turning from a negative to positive dentry. That's possibly OK
technically (although I didn't consider all races), but AFAIKS it is not
something that a filesystem is allowed to "know".

I'll submit a patch to fix cgroups, and a bugcheck to catch such things
again.

Thanks,
Nick

Nick Piggin
2010-11-29 13:55:26 UTC
Post by Sedat Dilek
Hi,
I have tried the vfs-scale-working patch-set from [1] (GIT tree see [2]).
Unfortunately, I cannot boot my Debian/sid i386 system with systemd
(system freezes) but with sysvinit.
I have attached a screenshot of the Call-trace, hope this helps.
* Downgrade systemd from v15 down to v12
* Remove "mtab hackz" [3]
* Remove native mount feature [4] (also in combination with "mtab hackz")
* Mask systemd-remount-api-vfs.service (for testing purposes)
Not sure what the real problem is (as it is very early), but sysvinit is fine.
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-2.6.37-rc3-686
root=UUID=1ceb69a7-ecf4-47e9-a231-b74e0f0a9b62 ro radeon.modeset=1
lapic 3 init=/sbin/init.sysvinit
How can I log very early messages (I don't think it makes any
sense to activate systemd debugging [5] right now)?
Any advice on this?
Any idea on the Call-trace or digging into the problem?
Any help appreciated.
Thanks for this, it definitely looks like a bug in vfs scale patches.

I wonder if you can run with frame pointers turned on to get a more
reliable back trace, and then also try to capture the information
surrounding the oops.

Thanks,
Nick
Sedat Dilek
2010-11-29 14:08:42 UTC
Post by Nick Piggin
Hi,
I have tried the vfs-scale-working patch-set from [1] (GIT tree see [2]).
Unfortunately, I cannot boot my Debian/sid i386 system with systemd
(system freezes) but with sysvinit.
[ ... ]
Post by Nick Piggin
Any idea on the Call-trace or digging into the problem?
Any help appreciated.
Thanks for this, it definitely looks like a bug in vfs scale patches.
I wonder if you can run with frame pointers turned on to get a more
reliable back trace, and then also try to capture the information
surrounding the oops.
Hi Nick,

I was just discussing the issue on #systemd and, as you mention, several
people there pointed out that the backtrace is not sufficient.
Serial-console or net-console was recommended to me, this requires a
2nd machine...
So let me see if I can set up netconsole and get it running for a backtrace.
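
For reference, a minimal sketch of how a netconsole= boot parameter could be assembled, assuming the format given in the kernel's netconsole documentation. The function name netconsole_param and every port, address, MAC and interface name below are made-up placeholders, not values from this thread:

```shell
#!/bin/sh
# Build a netconsole= kernel parameter.
# Assumed format (kernel netconsole documentation):
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# netconsole_param is a hypothetical helper; all values are placeholders.
netconsole_param() {
    src_port=$1
    src_ip=$2
    dev=$3
    tgt_port=$4
    tgt_ip=$5
    tgt_mac=$6
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Sender 192.168.0.2 on eth0, receiver 192.168.0.3 listening on UDP 6666.
netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.3 00:11:22:33:44:55
```

The resulting string would go on the kernel command line of the machine under test; the second machine only needs something that listens for UDP packets on the target port.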

# grep -i frame /boot/config-$(uname -r) | grep -i point
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set

Damn, needs a rebuild :-).
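
The same check can be sketched as a small function that reports the state of a single option. This is only an illustration: config_state is a hypothetical helper, and the sample file simply mirrors the grep output above.

```shell
#!/bin/sh
# Print "y", "m" or "unset" for one option in a kernel config file.
# config_state is a hypothetical helper name, not an existing tool.
config_state() {
    opt=$1
    cfg=$2
    # Match only real assignments; "# CONFIG_X is not set" is skipped.
    val=$(sed -n "s/^${opt}=\([ym]\)\$/\1/p" "$cfg" | head -n 1)
    printf '%s\n' "${val:-unset}"
}

# Self-contained demo on a sample file mirroring the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set
EOF
config_state CONFIG_FRAME_POINTER "$sample"
config_state CONFIG_ARCH_WANT_FRAME_POINTERS "$sample"
rm -f "$sample"
```

On a running system it would be called as: config_state CONFIG_FRAME_POINTER "/boot/config-$(uname -r)"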

- Sedat -
Andrey Borzenkov
2010-11-29 17:30:51 UTC
Post by Sedat Dilek
Serial-console or net-console was recommended to me, this requires a
2nd machine...
I used a VM with serial port redirection when I needed an early console
trace. It is easier than a second system.
Sedat Dilek
2010-11-29 17:53:36 UTC
Post by Andrey Borzenkov
Post by Sedat Dilek
Serial-console or net-console was recommended to me, this requires a
2nd machine...
I used a VM with serial port redirection when I needed an early console
trace. It is easier than a second system.
After activating CONFIG_FRAME_POINTER=y I now have a bit more
verbose output, but I think I haven't seen all the interesting lines
in the boot process.

Your solution sounds fine to me.
Can you give a few more details on this?
Which VM solution did you use (KVM, VMware, VirtualBox, etc.)?
Which settings have to be made in the VM?
Which kernel-config parameters do I have to set or are recommended (my
current k-c is attached)?

- Sedat -
Andrey Borzenkov
2010-11-29 18:25:59 UTC
Post by Sedat Dilek
Post by Andrey Borzenkov
Post by Sedat Dilek
Serial-console or net-console was recommended to me, this requires a
2nd machine...
I used a VM with serial port redirection when I needed an early console
trace. It is easier than a second system.
After activating CONFIG_FRAME_POINTER=y I now have a bit more
verbose output, but I think I haven't seen all the interesting lines
in the boot process.
Your solution sounds fine to me.
Can you give a bit more details on this?
Which VM solution did you use (KVM, VMware, VirtualBox, etc.)?
I use VMware Player (for historical reasons); as of version 3.x it is
self-contained, i.e. it allows you to create a new VM.
Post by Sedat Dilek
Which settings have to be done in the VM?
Add a new serial port (maybe one is included by default, I do not know;
it is not listed in a new VM); the serial port then appears under removable
devices and you can connect/disconnect it at any time. You can
redirect the output to a file or connect it to a socket (not tried).
Post by Sedat Dilek
Which kernel-config parameters do I have to set or are recommended (my
current k-c is attached)?
As per serial-console.txt, add

console=ttyS0 console=tty0

to the kernel command line. This could be ttyS1; I do not remember.
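
As a rough sketch of that step, the following function appends the two console= arguments to an existing command line unless one is already present. The name with_serial_console and the root=/dev/sda1 example are assumptions for illustration only; the port defaults to ttyS0 and can be overridden, since the right one may differ.

```shell
#!/bin/sh
# Append console= arguments to a kernel command line, unless the line
# already contains a console= setting.
# with_serial_console is a hypothetical helper; $2 selects the serial port.
with_serial_console() {
    cmdline=$1
    port=${2:-ttyS0}
    case " $cmdline " in
        *" console="*) printf '%s\n' "$cmdline" ;;
        *) printf '%s console=%s console=tty0\n' "$cmdline" "$port" ;;
    esac
}

# Placeholder command line, not the one from this thread.
with_serial_console "root=/dev/sda1 ro"
with_serial_console "root=/dev/sda1 ro" ttyS1
```

With both arguments given, kernel messages should reach the serial port as well as the normal screen, so the VM's serial redirection can capture the early boot output to a file.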
Kay Sievers
2010-11-29 14:27:31 UTC
Post by Sedat Dilek
The moderators of the ML should re-think if 40KiB is a bit low (100KiB
should be fine, when ppl want to attach a screenshot taken with their
digicam).
It's 300kb now.

Kay