Discussion:
journald fsync() errors
v***@pengaru.com
2018-04-03 21:10:29 UTC
Back when I worked on making fsync() in journald asynchronous, I
preserved the existing strategy of ignoring fsync() errors.

In reading [1], I am reminded of this situation and am again wondering
why this is the case. Shouldn't journald trigger a journal rotate when
fsync() realizes an IO error, marking the previous journal as corrupt?

Can someone remind me of the rationale behind the existing approach?

Regards,
Vito Caputo

[1] https://www.postgresql.org/message-id/flat/CAMsr%2BYE5Gs9iPqw2mQ6OHt1aC5Qk5EuBFCyG%2BvzHun1EqMxyQg%40mail.gmail.com
Lennart Poettering
2018-04-04 14:49:29 UTC
Post by v***@pengaru.com
> Back when I worked on making fsync() in journald asynchronous, I
> preserved the existing strategy of ignoring fsync() errors.
>
> In reading [1], I am reminded of this situation and am again wondering
> why this is the case. Shouldn't journald trigger a journal rotate when
> fsync() realizes an IO error, marking the previous journal as corrupt?
>
> Can someone remind me of the rationale behind the existing approach?

Hmm, you are right, we should rotate if fsync() fails, indeed.

Would love to review/merge a patch for that.

Lennart
--
Lennart Poettering, Red Hat
v***@pengaru.com
2018-04-04 21:05:01 UTC
Post by Lennart Poettering
> Post by v***@pengaru.com
>> Back when I worked on making fsync() in journald asynchronous, I
>> preserved the existing strategy of ignoring fsync() errors.
>>
>> In reading [1], I am reminded of this situation and am again wondering
>> why this is the case. Shouldn't journald trigger a journal rotate when
>> fsync() realizes an IO error, marking the previous journal as corrupt?
>>
>> Can someone remind me of the rationale behind the existing approach?
>
> Hmm, you are right, we should rotate if fsync() fails, indeed.
>
> Would love to review/merge a patch for that.

Slapped this [1] together today. I did not scrutinize the higher-order
functions to verify they Do The Right Thing when the -EIO propagates
out, but considering mmap_cache_got_sigbus() already produced -EIO, I
assume things work.
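
For readers following along, here is a minimal sketch of the general idea discussed in this thread: check the fsync() result instead of discarding it, and treat -EIO as a signal to mark the journal corrupt and rotate. It is not the code from the PR and not systemd's implementation. The names `struct journal`, `sync_journal_fd()`, and `handle_sync_result()` are hypothetical, invented purely for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a journal file; NOT systemd's JournalFile. */
struct journal {
        int fd;
        bool corrupt;   /* set once a failed sync makes the contents untrustworthy */
};

/* Flush fd to stable storage. Returns 0 on success or a negative errno
 * value on failure, rather than silently ignoring the error. */
static int sync_journal_fd(int fd) {
        if (fsync(fd) < 0)
                return -errno;
        return 0;
}

/* Decide what to do with a sync result (hypothetical policy):
 *   -EIO        -> mark the journal corrupt and return 1 ("please rotate")
 *   other < 0   -> pass the error through unchanged
 *   0           -> success, nothing to do */
static int handle_sync_result(struct journal *j, int r) {
        if (r == -EIO) {
                j->corrupt = true;
                return 1;
        }
        return r;
}

/* Convenience wrapper: sync the journal and apply the policy above. */
static int journal_sync(struct journal *j) {
        return handle_sync_result(j, sync_journal_fd(j->fd));
}
```

A caller would start a new journal file whenever journal_sync() returns 1 and leave the old file flagged as corrupt. Separating the policy from the fsync() call keeps the -EIO path exercisable without a failing disk.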

As mentioned in the PR, this is 100% untested. But I should be able to
make time to iterate on the PR if it's desirable and review requires
some changes.

Regards,
Vito Caputo

[1] https://github.com/systemd/systemd/pull/8654
