Discussion:
[systemd-devel] [RFC PATCH v3 2/5] ACPI: button: Extends complement switch event support for all modes
Lv Zheng
2017-05-26 23:42:32 UTC
Permalink
Surface Pro 3 is a typical platform where suspend/resume loop problem
can be seen.

The problem is due to a systemd 229 bug:
1. "ignore": always can trigger endless suspend/resume loop
2. "open": sometimes suspend/resume loop can be stopped
3. "method": always can trigger endless susped/resume loop
The buggy systemd unexpectedly waits for an explicit "open" event after
boot/resume or it will suspends. However even when kernel can send a
faked "open" to it, its state machine is still wrong, systemd may not
respond "close" events arrived after "open" or may suddenly suspend
without seeing any instant events.

Recent systemd 233 has fixed this issue:
1. "ignore": everything works fine;
2. "open": no suspend/resume cycle, but sometimes cannot suspend the
platform again after the first resume;
3. "method": no suspend/resume cycle, but always cannot suspend the
platform again after the first resume.
The conclusion is: for suspend/resume cycle issue, "ignore" mode fixes
everything, but current "method" mode is still buggy.
Differences are because button driver only implements complement switch
events for "ignore" mode. Without complement switch events, firmware
triggered "close" cannot be delivered to userspace (confirmed by
evemu-record).

The root cause of the lid state issues is the variation of the platform
firmware implementations:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.

Let's check the docking external display issues (see links below):
1. For case 1, both "method"/"ignore" modes can work correctly;
2. For case 2/4/5, both "method"/"ignore" modes cannot work correctly;
3. For case 3, "method" can work correctly while "ignore" mode cannot.
The conclusion is: for docking external display issue, though the issue
still needs graphics layer (graphics drivers or desktop managers) to be
improved to ensure no breakages for case 2/4/5 platforms, there is a case
where "method" mode plays better.

Thus ACPI subsystem has been pushed to revert back to "method" mode due to
regression rule and case 3 (platforms reported on the links should all be
case 3 platforms), and libinput developers have volunteered to help to
provide workarounds when graphics layer is not fixed or systemd is not
updated.

Thus this patch extends the complement switch event support to other modes
using new indication: generating complement switch event for BIOS notified
"close". So that when button driver is reverted back to "method" mode, it
won't act worse than "ignore" mode on fixed systemd.

Tested with systemd 233, all modes worked fine (no suspend/resume loop and
can suspend any times) after applying this patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
https://bugzilla.redhat.com/show_bug.cgi?id=1430259
Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>

Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 116 +++++++++++++++++++++++++-------------------------
1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 874ba60..a72f5bf 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -108,6 +108,7 @@ struct acpi_button {
unsigned long pushed;
int last_state;
ktime_t last_time;
+ bool last_is_bios;
bool suspended;
};

@@ -144,78 +145,71 @@ static int acpi_lid_notify_state(struct acpi_device *device,
struct acpi_button *button = acpi_driver_data(device);
int ret;
ktime_t next_report;
- bool do_update;

/*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
+ * Ignore frequently replayed switch events.
+ *
+ * AML tables can put Notify(LID, xxx) in a notification method,
+ * and handling the hardware events by executing the entry methods
+ * (ex., _Qxx) may cause the notification method to be invoked
+ * several times.
+ * This check doesn't apply to the faked events because if a BIOS
+ * notification comes after a faked event, it must pass this check
+ * in order to be reliablely delivered to user space.
*/
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
next_report = ktime_add(button->last_time,
ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
+ if (button->last_is_bios && button->last_state == !!state &&
+ !ktime_after(ktime_get(), next_report))
+ return 0;
+
+ /*
+ * Send the unreliable complement switch event:
+ *
+ * On most platforms, the lid device is reliable. However there are
+ * exceptions:
+ * 1. Platforms returning initial lid state as "close" by default
+ * after booting/resuming:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=89211
+ * https://bugzilla.kernel.org/show_bug.cgi?id=106151
+ * 2. Platforms never reporting "open" events:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=106941
+ * On these buggy platforms, the usage model of the ACPI lid device
+ * actually is:
+ * 1. The initial returning value of _LID may not be reliable.
+ * 2. The open event may not be reliable.
+ * 3. The close event is reliable.
+ *
+ * But SW_LID is typed as input switch event, the input layer
+ * checks if the event is redundant. Hence if the state is not
+ * switched, the userspace cannot see this platform triggered
+ * reliable event. By inserting a complement switch event, it then
+ * is guaranteed that the platform triggered reliable one can
+ * always be seen by the userspace.
+ */
+ if (button->last_state == !!state && bios_notify) {
/* Complain the buggy firmware */
pr_warn_once("The lid device is not compliant to SW_LID.\n");

/*
- * Send the unreliable complement switch event:
- *
- * On most platforms, the lid device is reliable. However
- * there are exceptions:
- * 1. Platforms returning initial lid state as "close" by
- * default after booting/resuming:
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * 2. Platforms never reporting "open" events:
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * lid device actually is:
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
+ * Do not generate complement switch event for "open"
+ * events - faking "close" events can trigger unexpected
+ * behaviors.
+ * Thus only generate complement switch event for BIOS
+ * notified "close".
*/
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
+ if (!state) {
+ input_report_switch(button->input, SW_LID, state);
+ input_sync(button->input);
}
}
+
/* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);
+ button->last_state = !!state;
+ button->last_time = ktime_get();
+ button->last_is_bios = bios_notify;

if (state)
pm_wakeup_event(&device->dev, 0);
@@ -444,6 +438,8 @@ static int acpi_button_resume(struct device *dev)
struct acpi_button *button = acpi_driver_data(device);

button->suspended = false;
+ /* ignore replay frequency check between suspend/resume */
+ button->last_is_bios = false;
if (button->type == ACPI_BUTTON_TYPE_LID)
acpi_lid_initialize_state(device);
return 0;
@@ -492,6 +488,8 @@ static int acpi_button_add(struct acpi_device *device)
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
button->last_state = !!acpi_lid_evaluate_state(device);
button->last_time = ktime_get();
+ /* ignore replay frequency check after boot */
+ button->last_is_bios = false;
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
--
2.7.4
Lv Zheng
2017-05-26 23:42:24 UTC
Permalink
This patch adds a parameter to acpi_lid_notify_state() so that it can act
differently against BIOS notification and kernel faked events.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 6d5a8c1..874ba60 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -138,7 +138,8 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
return lid_state ? 1 : 0;
}

-static int acpi_lid_notify_state(struct acpi_device *device, int state)
+static int acpi_lid_notify_state(struct acpi_device *device,
+ int state, bool bios_notify)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
@@ -360,7 +361,8 @@ int acpi_lid_open(void)
}
EXPORT_SYMBOL(acpi_lid_open);

-static int acpi_lid_update_state(struct acpi_device *device)
+static int acpi_lid_update_state(struct acpi_device *device,
+ bool bios_notify)
{
int state;

@@ -368,17 +370,17 @@ static int acpi_lid_update_state(struct acpi_device *device)
if (state < 0)
return state;

- return acpi_lid_notify_state(device, state);
+ return acpi_lid_notify_state(device, state, bios_notify);
}

static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
case ACPI_BUTTON_LID_INIT_OPEN:
- (void)acpi_lid_notify_state(device, 1);
+ (void)acpi_lid_notify_state(device, 1, false);
break;
case ACPI_BUTTON_LID_INIT_METHOD:
- (void)acpi_lid_update_state(device);
+ (void)acpi_lid_update_state(device, false);
break;
case ACPI_BUTTON_LID_INIT_IGNORE:
default:
@@ -398,7 +400,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event)
case ACPI_BUTTON_NOTIFY_STATUS:
input = button->input;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_update_state(device);
+ acpi_lid_update_state(device, true);
} else {
int keycode;
--
2.7.4
Benjamin Tissoires
2017-05-29 16:04:00 UTC
Permalink
Hi Lv,
Post by Lv Zheng
This patch adds a parameter to acpi_lid_notify_state() so that it can act
differently against BIOS notification and kernel faked events.
---
Answering to this one for the entire series:
last week was a mix of public holidays and PTO from me. I was only
able to review this series today, so sorry for the delay.

I still have a feeling this driver is far too engineered for a simple
input node. There are internal states, defers, mangle of events and too
many kernel parameters.

I still need to get my head around it, but the more I think of it, the
more I think the solution provided by Lennart in
https://github.com/systemd/systemd/issues/2807 is the simplest one:
when we are not sure about the state of the LID switch because _LID
might be wrong, we shouldn't export a LID input node.
Which means that all broken cases would be fixed by just a quirk
"unreliable lid switch".

Give me a day or two to get this in a better shape.

Cheers,
Benjamin
Zheng, Lv
2017-05-31 08:57:33 UTC
Permalink
Hi,
Post by Lv Zheng
Tissoires
Subject: Re: [RFC PATCH v3 1/5] ACPI: button: Add indication of BIOS notification and faked events
Hi Lv,
Post by Lv Zheng
This patch adds a parameter to acpi_lid_notify_state() so that it can act
differently against BIOS notification and kernel faked events.
---
last week was a mix of public holidays and PTO from me. I was only
able to review this series today, so sorry for the delay.
Here we were having "the Dragon Boat Festival".
But we really won't have chances of seeing see dragon boats in major cities.
Post by Lv Zheng
I still have a feeling this driver is far too engineered for a simple
input node. There are internal states, defers, mangle of events and too
many kernel parameters.
That's the firmware world and windows compliance world. :)
Post by Lv Zheng
I still need to get my head around it, but the more I think of it, the
more I think the solution provided by Lennart in
when we are not sure about the state of the LID switch because _LID
might be wrong, we shouldn't export a LID input node.
Which means that all broken cases would be fixed by just a quirk
"unreliable lid switch".
I checked the post and had no idea about what was going on.
However, my test shows systemd 233 works fine with button.lid_init_state=ignore.
I don't know what has been improved.

But a noticeable thing on old systemd 229 is:
Even with button.lid_init_state=open, systemd still behaves strangely.
It looks systemd 229 really has a problem with its timeout logic.
And in systemd 233, the timeout mechanism seems to work better.

Anyway, the problem disappears when there are only user space changes.
Post by Lv Zheng
Give me a day or two to get this in a better shape.
Sure.

Cheers,
Lv
Post by Lv Zheng
Cheers,
Benjamin
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lv Zheng
2017-05-31 09:41:58 UTC
Permalink
Surface Pro 3 is a typical platform where suspend/resume loop problem
can be seen.

The problem is due to a systemd 229 bug:
1. "ignore": always can trigger endless suspend/resume loop
2. "open": sometimes suspend/resume loop can be stopped
3. "method": always can trigger endless susped/resume loop
The buggy systemd unexpectedly waits for an explicit "open" event after
boot/resume or it will suspends. However even when kernel can send a
faked "open" to it, its state machine is still wrong, systemd may not
respond "close" events arrived after "open" or may suddenly suspend
without seeing any instant events.

Recent systemd 233 has fixed this issue:
1. "ignore": everything works fine;
2. "open": no suspend/resume cycle, but sometimes cannot suspend the
platform again after the first resume;
3. "method": no suspend/resume cycle, but always cannot suspend the
platform again after the first resume.
The conclusion is: for suspend/resume cycle issue, "ignore" mode fixes
everything, but current "method" mode is still buggy.
The differences are due to button driver only implements complement switch
events for "ignore" mode. Without complement switch events, firmware
triggered "close" cannot be delivered to userspace (confirmed by
evemu-record).

The root cause of the lid state issues is the variation of the platform
firmware implementations:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.

Let's check the docking external display issues (see links below):
1. For case 1, both "method"/"ignore" modes can work correctly;
2. For case 2/4/5, both "method"/"ignore" modes cannot work correctly;
3. For case 3, "method" can work correctly while "ignore" mode cannot.
The conclusion is: for docking external display issue, though the issue
still needs graphics layer (graphics drivers or desktop managers) to be
improved to ensure no breakages for case 2/4/5 platforms, there is a case
where "method" mode plays better.

Thus ACPI subsystem has been pushed to revert back to "method" mode due to
regression rule and case 3 (platforms reported on the links should all be
case 3 platforms), and libinput developers have volunteered to help to
provide workarounds when graphics layer is not fixed or systemd is not
updated.

Thus this patch extends the complement switch event support to other modes
using new indication: generating complement switch event for BIOS notified
"close". So that when button driver is reverted back to "method" mode, it
won't act worse than "ignore" mode on fixed systemd.

Tested with systemd 233, all modes worked fine (no suspend/resume loop and
can suspend any times) after applying this patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
https://bugzilla.redhat.com/show_bug.cgi?id=1430259

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 116 +++++++++++++++++++++++++-------------------------
1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 725a15a..36485cf 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -108,6 +108,7 @@ struct acpi_button {
unsigned long pushed;
int last_state;
ktime_t last_time;
+ bool last_is_bios;
bool suspended;
};

@@ -144,78 +145,71 @@ static int acpi_lid_notify_state(struct acpi_device *device,
struct acpi_button *button = acpi_driver_data(device);
int ret;
ktime_t next_report;
- bool do_update;

/*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
+ * Ignore frequently replayed switch events.
+ *
+ * AML tables can put Notify(LID, xxx) in a notification method,
+ * and handling the hardware events by executing the entry methods
+ * (ex., _Qxx) may cause the notification method to be invoked
+ * several times.
+ * This check doesn't apply to the faked events because if a BIOS
+ * notification comes after a faked event, it must pass this check
+ * in order to be reliablely delivered to user space.
*/
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
next_report = ktime_add(button->last_time,
ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
+ if (button->last_is_bios && button->last_state == !!state &&
+ !ktime_after(ktime_get(), next_report))
+ return 0;
+
+ /*
+ * Send the unreliable complement switch event:
+ *
+ * On most platforms, the lid device is reliable. However there are
+ * exceptions:
+ * 1. Platforms returning initial lid state as "close" by default
+ * after booting/resuming:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=89211
+ * https://bugzilla.kernel.org/show_bug.cgi?id=106151
+ * 2. Platforms never reporting "open" events:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=106941
+ * On these buggy platforms, the usage model of the ACPI lid device
+ * actually is:
+ * 1. The initial returning value of _LID may not be reliable.
+ * 2. The open event may not be reliable.
+ * 3. The close event is reliable.
+ *
+ * But SW_LID is typed as input switch event, the input layer
+ * checks if the event is redundant. Hence if the state is not
+ * switched, the userspace cannot see this platform triggered
+ * reliable event. By inserting a complement switch event, it then
+ * is guaranteed that the platform triggered reliable one can
+ * always be seen by the userspace.
+ */
+ if (button->last_state == !!state && is_bios_event) {
/* Complain the buggy firmware */
pr_warn_once("The lid device is not compliant to SW_LID.\n");

/*
- * Send the unreliable complement switch event:
- *
- * On most platforms, the lid device is reliable. However
- * there are exceptions:
- * 1. Platforms returning initial lid state as "close" by
- * default after booting/resuming:
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * 2. Platforms never reporting "open" events:
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * lid device actually is:
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
+ * Do not generate complement switch event for "open"
+ * events - faking "close" events can trigger unexpected
+ * behaviors.
+ * Thus only generate complement switch event for BIOS
+ * notified "close".
*/
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
+ if (!state) {
+ input_report_switch(button->input, SW_LID, state);
+ input_sync(button->input);
}
}
+
/* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);
+ button->last_state = !!state;
+ button->last_time = ktime_get();
+ button->last_is_bios = is_bios_event;

if (state)
pm_wakeup_hard_event(&device->dev);
@@ -444,6 +438,8 @@ static int acpi_button_resume(struct device *dev)
struct acpi_button *button = acpi_driver_data(device);

button->suspended = false;
+ /* ignore replay frequency check between suspend/resume */
+ button->last_is_bios = false;
if (button->type == ACPI_BUTTON_TYPE_LID)
acpi_lid_initialize_state(device);
return 0;
@@ -492,6 +488,8 @@ static int acpi_button_add(struct acpi_device *device)
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
button->last_state = !!acpi_lid_evaluate_state(device);
button->last_time = ktime_get();
+ /* ignore replay frequency check after boot */
+ button->last_is_bios = false;
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
--
2.7.4
Lv Zheng
2017-05-31 09:41:50 UTC
Permalink
This patch adds a parameter to acpi_lid_notify_state() so that it can act
differently against BIOS notification and kernel faked events.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 9ad8cdb..725a15a 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -138,7 +138,8 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
return lid_state ? 1 : 0;
}

-static int acpi_lid_notify_state(struct acpi_device *device, int state)
+static int acpi_lid_notify_state(struct acpi_device *device,
+ int state, bool is_bios_event)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
@@ -360,7 +361,8 @@ int acpi_lid_open(void)
}
EXPORT_SYMBOL(acpi_lid_open);

-static int acpi_lid_update_state(struct acpi_device *device)
+static int acpi_lid_update_state(struct acpi_device *device,
+ bool is_bios_event)
{
int state;

@@ -368,17 +370,17 @@ static int acpi_lid_update_state(struct acpi_device *device)
if (state < 0)
return state;

- return acpi_lid_notify_state(device, state);
+ return acpi_lid_notify_state(device, state, is_bios_event);
}

static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
case ACPI_BUTTON_LID_INIT_OPEN:
- (void)acpi_lid_notify_state(device, 1);
+ (void)acpi_lid_notify_state(device, 1, false);
break;
case ACPI_BUTTON_LID_INIT_METHOD:
- (void)acpi_lid_update_state(device);
+ (void)acpi_lid_update_state(device, false);
break;
case ACPI_BUTTON_LID_INIT_IGNORE:
default:
@@ -398,7 +400,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event)
case ACPI_BUTTON_NOTIFY_STATUS:
input = button->input;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_update_state(device);
+ acpi_lid_update_state(device, true);
} else {
int keycode;
--
2.7.4
Lv Zheng
2017-06-07 09:43:28 UTC
Permalink
The following approach fixes button.lid_init_state=method mode for
systemd 233:
https://patchwork.kernel.org/patch/9756457/
https://patchwork.kernel.org/patch/9756467/
But it is not working well with old systemd 229. This solution tries to
make a more comfortable approach for systemd 229.

There are platform variations implementing ACPI lid device in different
way:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.

[PATCH 1] tries to fix case 2,4, making them working with any systemd.
[PATCH 2] tries to fix case 5, making it working with systemd 233.
This is also a replacement of the following solution:
https://patchwork.kernel.org/patch/9760867/
It seems adding/removing input node and requesting systemd to
change again is unnecessary for such platforms, so this patch
simply converts "lid_unreliable" into
"button.lid_init_state=ignore".

This material is just sent to demonstrate solutions and issues, the
final solution is not determined yet. So marking them as RFC.

Lv Zheng (2):
ACPI: button: Fix issue that button notify cannot report stateful
SW_LID state
ACPI: button: Add a quirk mode for Surface Pro 1 like laptop

drivers/acpi/button.c | 188 ++++++++++++++++++++++++--------------------------
1 file changed, 89 insertions(+), 99 deletions(-)
--
2.7.4
Lv Zheng
2017-06-07 09:43:39 UTC
Permalink
There are platform variations implementing ACPI lid device in different
ways:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, only case 1,3 works fine with systemd 229.

Case 2,4 can be treated as an order issue. This patch first fixes this
issue by defer sending initial lid state 10 seconds later after resume,
which ensures acpi_ec_resume() is always invoked before
acpi_button_resume().

However we can see different problems due to systemd bugs:
systemd won't suspend right after seeing "close" event, it has a timeout,
within the timeout, user may opens lid again. But even lid
firmware/driver properly deliver this "open" to user space, when the
timeout tickes, systemd still suspends the platform.
Then user has to close/open again to wake the system up. Noticing that
the first close event will remain in firmware, after resume, user space
can still see a "close" followed by "open", and nothing can stop systemd
from suspending again.
This problem can only be fixed by continously updating lid state. Thus
this patch doesn't kill the timer after seeing the BIOS notification, but
continously sending _LID return value to the input layer for
button.lid_init_state=method mode.

The users can configure update interval via button.lid_update_interval.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index e19f530..fd8eff6 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -28,6 +28,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/input.h>
+#include <linux/timer.h>
#include <linux/slab.h>
#include <linux/acpi.h>
#include <acpi/button.h>
@@ -79,6 +80,7 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
static int acpi_button_add(struct acpi_device *device);
static int acpi_button_remove(struct acpi_device *device);
static void acpi_button_notify(struct acpi_device *device, u32 event);
+static void acpi_lid_timeout(ulong arg);

#ifdef CONFIG_PM_SLEEP
static int acpi_button_suspend(struct device *dev);
@@ -104,6 +106,7 @@ static struct acpi_driver acpi_button_driver = {
struct acpi_button {
unsigned int type;
struct input_dev *input;
+ struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
int last_state;
@@ -119,6 +122,10 @@ static unsigned long lid_report_interval __read_mostly = 500;
module_param(lid_report_interval, ulong, 0644);
MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");

+static unsigned long lid_update_interval __read_mostly = 10 * MSEC_PER_SEC;
+module_param(lid_update_interval, ulong, 0644);
+MODULE_PARM_DESC(lid_update_interval, "Interval (ms) between lid state updates");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -371,17 +378,25 @@ static int acpi_lid_update_state(struct acpi_device *device)
return acpi_lid_notify_state(device, state);
}

-static void acpi_lid_initialize_state(struct acpi_device *device)
+static void acpi_lid_tick(struct acpi_device *device)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+
+ mod_timer(&button->lid_timer,
+ jiffies + msecs_to_jiffies(lid_update_interval));
+}
+
+static void acpi_lid_timeout(ulong arg)
{
+ struct acpi_device *device = (struct acpi_device *)arg;
+
switch (lid_init_state) {
case ACPI_BUTTON_LID_INIT_OPEN:
(void)acpi_lid_notify_state(device, 1);
break;
case ACPI_BUTTON_LID_INIT_METHOD:
- (void)acpi_lid_update_state(device);
- break;
- case ACPI_BUTTON_LID_INIT_IGNORE:
- default:
+ acpi_lid_update_state(device);
+ acpi_lid_tick(device);
break;
}
}
@@ -432,6 +447,8 @@ static int acpi_button_suspend(struct device *dev)
struct acpi_device *device = to_acpi_device(dev);
struct acpi_button *button = acpi_driver_data(device);

+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer(&button->lid_timer);
button->suspended = true;
return 0;
}
@@ -443,7 +460,7 @@ static int acpi_button_resume(struct device *dev)

button->suspended = false;
if (button->type == ACPI_BUTTON_TYPE_LID)
- acpi_lid_initialize_state(device);
+ acpi_lid_tick(device);
return 0;
}
#endif
@@ -467,6 +484,7 @@ static int acpi_button_add(struct acpi_device *device)
error = -ENOMEM;
goto err_free_button;
}
+ init_timer(&button->lid_timer);

name = acpi_device_name(device);
class = acpi_device_class(device);
@@ -490,6 +508,8 @@ static int acpi_button_add(struct acpi_device *device)
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
button->last_state = !!acpi_lid_evaluate_state(device);
button->last_time = ktime_get();
+ setup_timer(&button->lid_timer,
+ acpi_lid_timeout, (ulong)device);
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
@@ -526,7 +546,7 @@ static int acpi_button_add(struct acpi_device *device)
if (error)
goto err_remove_fs;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_initialize_state(device);
+ acpi_lid_tick(device);
/*
* This assumes there's only one lid device, or if there are
* more we only care about the last one...
@@ -550,6 +570,8 @@ static int acpi_button_remove(struct acpi_device *device)
{
struct acpi_button *button = acpi_driver_data(device);

+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer(&button->lid_timer);
acpi_button_remove_fs(device);
input_unregister_device(button->input);
kfree(button);
--
2.7.4
Lv Zheng
2017-06-07 09:43:45 UTC
Permalink
Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.

Such platforms cannot work well with systemd 229 in
button.lid_init_state=method mode, but button.lid_init_state=open
workaround is available for them to work with systemd 229 and they can work
perfectly with systemd 233 in button.lid_init_state=ignore mode.

This patch introduces a boot parameter to mark such platform lid device as
unreliable to replace old button.lid_init_state=ignore mode. So that users
can use this quirk to make such platforms working with systemd 233. Since
such platform only sends "close", old complicated "open" complement event
mechanism is replaced by a simpler one of always prepending "open" before
any events.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 164 ++++++++++++++++++++------------------------------
1 file changed, 66 insertions(+), 98 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index fd8eff6..02b85c1 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -56,9 +56,8 @@
#define ACPI_BUTTON_DEVICE_NAME_LID "Lid Switch"
#define ACPI_BUTTON_TYPE_LID 0x05

-#define ACPI_BUTTON_LID_INIT_IGNORE 0x00
+#define ACPI_BUTTON_LID_INIT_METHOD 0x00
#define ACPI_BUTTON_LID_INIT_OPEN 0x01
-#define ACPI_BUTTON_LID_INIT_METHOD 0x02

#define _COMPONENT ACPI_BUTTON_COMPONENT
ACPI_MODULE_NAME("button");
@@ -109,23 +108,20 @@ struct acpi_button {
struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
- int last_state;
- ktime_t last_time;
bool suspended;
};

+static DEFINE_MUTEX(lid_device_lock);
static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
static struct acpi_device *lid_device;
static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;

-static unsigned long lid_report_interval __read_mostly = 500;
-module_param(lid_report_interval, ulong, 0644);
-MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
-
static unsigned long lid_update_interval __read_mostly = 10 * MSEC_PER_SEC;
module_param(lid_update_interval, ulong, 0644);
MODULE_PARM_DESC(lid_update_interval, "Interval (ms) between lid state updates");

+static bool lid_unreliable __read_mostly = false;
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -149,79 +145,12 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
- ktime_t next_report;
- bool do_update;
-
- /*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
- */
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
- next_report = ktime_add(button->last_time,
- ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
- /* Complain the buggy firmware */
- pr_warn_once("The lid device is not compliant to SW_LID.\n");

- /*
- * Send the unreliable complement switch event:
- *
- * On most platforms, the lid device is reliable. However
- * there are exceptions:
- * 1. Platforms returning initial lid state as "close" by
- * default after booting/resuming:
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * 2. Platforms never reporting "open" events:
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * lid device actually is:
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
- */
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
- }
- }
- /* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ if (lid_unreliable)
+ input_report_switch(button->input, SW_LID, 0);
+
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);

if (state)
pm_wakeup_event(&device->dev, 0);
@@ -382,22 +311,25 @@ static void acpi_lid_tick(struct acpi_device *device)
{
struct acpi_button *button = acpi_driver_data(device);

- mod_timer(&button->lid_timer,
- jiffies + msecs_to_jiffies(lid_update_interval));
+ if (!lid_unreliable)
+ mod_timer(&button->lid_timer,
+ jiffies + msecs_to_jiffies(lid_update_interval));
}

static void acpi_lid_timeout(ulong arg)
{
struct acpi_device *device = (struct acpi_device *)arg;

- switch (lid_init_state) {
- case ACPI_BUTTON_LID_INIT_OPEN:
- (void)acpi_lid_notify_state(device, 1);
- break;
- case ACPI_BUTTON_LID_INIT_METHOD:
- acpi_lid_update_state(device);
- acpi_lid_tick(device);
- break;
+ if (!lid_unreliable) {
+ switch (lid_init_state) {
+ case ACPI_BUTTON_LID_INIT_OPEN:
+ (void)acpi_lid_notify_state(device, 1);
+ break;
+ case ACPI_BUTTON_LID_INIT_METHOD:
+ acpi_lid_update_state(device);
+ acpi_lid_tick(device);
+ break;
+ }
}
}

@@ -506,8 +438,6 @@ static int acpi_button_add(struct acpi_device *device)
strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
sprintf(class, "%s/%s",
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
- button->last_state = !!acpi_lid_evaluate_state(device);
- button->last_time = ktime_get();
setup_timer(&button->lid_timer,
acpi_lid_timeout, (ulong)device);
} else {
@@ -551,7 +481,10 @@ static int acpi_button_add(struct acpi_device *device)
* This assumes there's only one lid device, or if there are
* more we only care about the last one...
*/
+ mutex_lock(&lid_device_lock);
+ get_device(&device->dev);
lid_device = device;
+ mutex_unlock(&lid_device_lock);
}

printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
@@ -570,8 +503,15 @@ static int acpi_button_remove(struct acpi_device *device)
{
struct acpi_button *button = acpi_driver_data(device);

- if (button->type == ACPI_BUTTON_TYPE_LID)
+ if (button->type == ACPI_BUTTON_TYPE_LID) {
del_timer(&button->lid_timer);
+ mutex_lock(&lid_device_lock);
+ if (device == lid_device) {
+ put_device(&device->dev);
+ lid_device = NULL;
+ }
+ mutex_unlock(&lid_device_lock);
+ }
acpi_button_remove_fs(device);
input_unregister_device(button->input);
kfree(button);
@@ -588,9 +528,6 @@ static int param_set_lid_init_state(const char *val, struct kernel_param *kp)
} else if (!strncmp(val, "method", sizeof("method") - 1)) {
lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
pr_info("Notify initial lid state with _LID return value\n");
- } else if (!strncmp(val, "ignore", sizeof("ignore") - 1)) {
- lid_init_state = ACPI_BUTTON_LID_INIT_IGNORE;
- pr_info("Do not notify initial lid state\n");
} else
result = -EINVAL;
return result;
@@ -603,17 +540,48 @@ static int param_get_lid_init_state(char *buffer, struct kernel_param *kp)
return sprintf(buffer, "open");
case ACPI_BUTTON_LID_INIT_METHOD:
return sprintf(buffer, "method");
- case ACPI_BUTTON_LID_INIT_IGNORE:
- return sprintf(buffer, "ignore");
default:
return sprintf(buffer, "invalid");
}
return 0;
}

+static int param_set_lid_unreliable(const char *val, struct kernel_param *kp)
+{
+ int result = 0;
+ struct device *dev;
+ struct acpi_device *device;
+ struct acpi_button *button;
+
+ result = param_set_bool(val, kp);
+ if (result)
+ return result;
+
+ mutex_lock(&lid_device_lock);
+ if (lid_device) {
+ dev = get_device(&lid_device->dev);
+ mutex_unlock(&lid_device_lock);
+ device = to_acpi_device(dev);
+ button = acpi_driver_data(device);
+ if (lid_unreliable)
+ del_timer(&button->lid_timer);
+ else
+ acpi_lid_tick(device);
+ put_device(&device->dev);
+ mutex_lock(&lid_device_lock);
+ }
+ mutex_unlock(&lid_device_lock);
+ return result;
+}
+
module_param_call(lid_init_state,
param_set_lid_init_state, param_get_lid_init_state,
NULL, 0644);
MODULE_PARM_DESC(lid_init_state, "Behavior for reporting LID initial state");

+module_param_call(lid_unreliable,
+ param_set_lid_unreliable, param_get_bool,
+ &lid_unreliable, 0644);
+MODULE_PARM_DESC(lid_unreliable, "Mark lid device as unreliable");
+
module_acpi_driver(acpi_button_driver);
--
2.7.4
kernel test robot
2017-06-13 06:17:00 UTC
Permalink
FYI, we noticed the following commit:

commit: 4d0c35c1af080033a9c2b0ac5734ca78f9bc3c63 ("ACPI: button: Fix issue that button notify cannot report stateful SW_LID state")
url: https://github.com/0day-ci/linux/commits/Lv-Zheng/ACPI-button-Fix-button-lid_init_state-method-mode/20170608-210525


in testcase: netperf
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_CRR
cpufreq_governor: performance

test-description: Netperf is a benchmark that can be use to measure various aspect of networking performance.
test-url: http://www.netperf.org/netperf/


on test machine: 4 threads Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz with 4G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+---------------------------------------------------------------------------+----------+------------+
| | v4.9-rc8 | 4d0c35c1af |
+---------------------------------------------------------------------------+----------+------------+
| boot_successes | 1249 | 2 |
| boot_failures | 100 | 10 |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/irq/manage.c | 17 | |
| calltrace:SyS_write | 8 | |
| calltrace:init_netconsole | 9 | |
| calltrace:SyS_finit_module | 40 | 1 |
| WARNING:at_drivers/gpu/drm/i915/intel_display.c:#intel_modeset_init[i915] | 20 | 1 |
| calltrace:i915_init | 20 | 1 |
| drm:intel_set_cpu_fifo_underrun_reporting[i915]] | 1 | |
| drm:intel_cpu_fifo_underrun_irq_handler[i915]] | 1 | |
| drm:intel_set_pch_fifo_underrun_reporting[i915]] | 1 | |
| drm:intel_pch_fifo_underrun_irq_handler[i915]] | 1 | |
| WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup | 11 | |
| calltrace:parport_pc_init | 11 | |
| WARNING:at_lib/kobject.c:#kobject_add_internal | 11 | |
| invoked_oom-killer:gfp_mask=0x | 27 | 2 |
| Mem-Info | 27 | 2 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 27 | 2 |
| BUG:kernel_hang_in_test_stage | 11 | |
| WARNING:at_net/mac80211/driver-ops.h:#ieee80211_set_default_key[mac80211] | 7 | |
| calltrace:SyS_sendmsg | 7 | |
| WARNING:at_net/mac80211/driver-ops.h:#ieee80211_key_replace[mac80211] | 7 | |
| BUG:kernel_hang_in_boot_stage | 2 | |
| BUG:kernel_in_stage | 11 | |
| BUG:scheduling_while_atomic | 0 | 8 |
| WARNING:at_kernel/time/timer.c:#call_timer_fn | 0 | 8 |
| BUG:sleeping_function_called_from_invalid_context_at_mm/slab.h | 0 | 8 |
| calltrace:intel_lid_notify | 0 | 8 |
| calltrace:SyS_sendto | 0 | 8 |
| calltrace:SyS_connect | 0 | 8 |
| calltrace:SyS_read | 0 | 4 |
| calltrace:smpboot_thread_fn | 0 | 8 |
| calltrace:SyS_recvfrom | 0 | 7 |
| calltrace:SyS_accept | 0 | 5 |
| BUG:unable_to_handle_kernel | 0 | 6 |
| Oops:#[##] | 0 | 6 |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0 | 6 |
| calltrace:SyS_socket | 0 | 3 |
| WARNING:at_net/sched/sch_generic.c:#dev_watchdog | 0 | 1 |
| BUG:soft_lockup-CPU##stuck_for#s | 0 | 2 |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 2 |
| WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0 | 1 |
| INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 0 | 1 |
| RIP:smp_call_function_many | 0 | 1 |
| calltrace:SyS_seccomp | 0 | 1 |
| calltrace:SyS_nanosleep | 0 | 1 |
| calltrace:SyS_bind | 0 | 1 |
+---------------------------------------------------------------------------+----------+------------+



[ 16.440882] BUG: scheduling while atomic: swapper/2/0/0x00000102
[ 16.441400] Modules linked in:
[ 16.441750] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0-rc8-00001-g4d0c35c #1
[ 16.442346] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 16.443007] ffff880112a837b0 ffffffff81477139 0000000000000000 ffff880112a993c0
[ 16.443830] ffff880112a837c0 ffffffff810a7d04 ffff880112a83818 ffffffff81962ed4
[ 16.444651] 0000000000000422 0000000000000008 ffff880112a993c0 ffff880112a837f8
[ 16.445473] Call Trace:
[ 16.446670] <IRQ>
[ 16.446857] [<ffffffff81477139>] dump_stack+0x63/0x8a
[ 16.447329] [<ffffffff810a7d04>] __schedule_bug+0x54/0x70
[ 16.447766] [<ffffffff81962ed4>] __schedule+0x554/0x6f0
[ 16.448191] [<ffffffff819630ad>] schedule+0x3d/0x90
[ 16.448594] [<ffffffff819666f6>] schedule_timeout+0x1d6/0x3f0
[ 16.449052] [<ffffffff810f0340>] ? del_timer_sync+0x50/0x50
[ 16.449501] [<ffffffff810c634f>] ? prepare_to_wait_event+0x7f/0x100
[ 16.449989] [<ffffffff81506972>] ec_guard+0x173/0x1b1
[ 16.450405] [<ffffffff810c63d0>] ? prepare_to_wait_event+0x100/0x100
[ 16.450897] [<ffffffff81507956>] acpi_ec_transaction+0x13f/0x2f2
[ 16.451369] [<ffffffff81507b56>] acpi_ec_read+0x4d/0x59
[ 16.451794] [<ffffffff81507cb9>] acpi_ec_space_handler+0xce/0x183
[ 16.452271] [<ffffffff814fdb35>] ? acpi_os_signal_semaphore+0x74/0x84
[ 16.452771] [<ffffffff81542030>] ? acpi_ut_release_mutex+0x11c/0x127
[ 16.453264] [<ffffffff8151b2b3>] acpi_ev_address_space_dispatch+0x2ce/0x33a
[ 16.453792] [<ffffffff81507beb>] ? ec_transaction+0x4f/0x4f
[ 16.454239] [<ffffffff81522ef6>] acpi_ex_access_region+0x414/0x4ca
[ 16.454719] [<ffffffff810a81da>] ? __might_sleep+0x4a/0x80
[ 16.455159] [<ffffffff81523386>] acpi_ex_field_datum_io+0x178/0x3fa
[ 16.455647] [<ffffffff8152392c>] acpi_ex_extract_from_field+0x151/0x2bb
[ 16.456157] [<ffffffff81522592>] acpi_ex_read_data_from_field+0x380/0x3d0
[ 16.456676] [<ffffffff815279f2>] acpi_ex_resolve_node_to_value+0x326/0x443
[ 16.457202] [<ffffffff81527e83>] acpi_ex_resolve_to_value+0x374/0x40e
[ 16.457700] [<ffffffff81515151>] acpi_ds_evaluate_name_path+0xa3/0x143
[ 16.458204] [<ffffffff815157c0>] acpi_ds_exec_end_op+0xd1/0x6c6
[ 16.458670] [<ffffffff815365c5>] acpi_ps_parse_loop+0x7d7/0x873
[ 16.459137] [<ffffffff8153f1f3>] ? acpi_ut_trace+0x26/0x66
[ 16.459579] [<ffffffff8153786c>] acpi_ps_parse_aml+0x1ac/0x4a1
[ 16.460041] [<ffffffff81538526>] acpi_ps_execute_method+0x1f4/0x2b6
[ 16.460529] [<ffffffff8152f567>] acpi_ns_evaluate+0x2ee/0x435
[ 16.460985] [<ffffffff81534209>] acpi_evaluate_object+0x178/0x38e
[ 16.461463] [<ffffffff814fe210>] acpi_evaluate_integer+0x44/0xca
[ 16.461934] [<ffffffff810aba12>] ? check_preempt_curr+0x52/0x90
[ 16.462403] [<ffffffff8154533d>] ? acpi_button_notify+0x133/0x133
[ 16.462880] [<ffffffff81544b04>] acpi_lid_evaluate_state+0x1c/0x33
[ 16.463401] [<ffffffff815451f9>] acpi_lid_update_state+0x16/0x27
[ 16.463875] [<ffffffff8154535d>] acpi_lid_timeout+0x20/0x37
[ 16.464320] [<ffffffff810f0385>] call_timer_fn+0x35/0x130
[ 16.464756] [<ffffffff810f0962>] run_timer_softirq+0x222/0x4e0
[ 16.465218] [<ffffffff810f8c01>] ? ktime_get+0x41/0xb0
[ 16.465640] [<ffffffff81053256>] ? lapic_next_deadline+0x26/0x30
[ 16.466112] [<ffffffff8196af74>] __do_softirq+0x104/0x2ab
[ 16.466550] [<ffffffff810869c1>] irq_exit+0xf1/0x100
[ 16.466959] [<ffffffff8196ad72>] smp_apic_timer_interrupt+0x42/0x50
[ 16.467446] [<ffffffff81969f2c>] apic_timer_interrupt+0x8c/0xa0
[ 16.467908] <EOI>
[ 16.468089] [<ffffffff817d3b12>] ? cpuidle_enter_state+0x122/0x2e0
[ 16.468669] [<ffffffff817d3d07>] cpuidle_enter+0x17/0x20
[ 16.469099] [<ffffffff810c6983>] call_cpuidle+0x23/0x40
[ 16.469525] [<ffffffff810c6bb4>] cpu_startup_entry+0x114/0x200
[ 16.469988] [<ffffffff81051d57>] start_secondary+0x107/0x130
[ 16.470515] ACPI : button: The lid device is not compliant to SW_LID.
[ 16.471013] ------------[ cut here ]------------
[ 16.471397] WARNING: CPU: 2 PID: 0 at kernel/time/timer.c:1315 call_timer_fn+0x128/0x130
[ 16.472117] timer: acpi_lid_timeout+0x0/0x37 preempt leak: 00000101 -> 00000000
[ 16.472699] Modules linked in:
[ 16.473046] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 4.9.0-rc8-00001-g4d0c35c #1
[ 16.473713] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 16.474368] ffff880112a83de8 ffffffff81477139 ffff880112a83e38 0000000000000000
[ 16.475191] ffff880112a83e28 ffffffff810803db 0000052300000246 0000000000000001
[ 16.476013] ffff8800b4f0b190 0000000000000101 ffffffff8154533d ffff88010d4e5800
[ 16.476835] Call Trace:
[ 16.477082] <IRQ>
[ 16.477262] [<ffffffff81477139>] dump_stack+0x63/0x8a
[ 16.477734] [<ffffffff810803db>] __warn+0xcb/0xf0
[ 16.478128] [<ffffffff8154533d>] ? acpi_button_notify+0x133/0x133
[ 16.478603] [<ffffffff8108044f>] warn_slowpath_fmt+0x4f/0x60
[ 16.479054] [<ffffffff81544a59>] ? acpi_lid_tick+0x2f/0x32
[ 16.479494] [<ffffffff8154533d>] ? acpi_button_notify+0x133/0x133
[ 16.479970] [<ffffffff810f0478>] call_timer_fn+0x128/0x130
[ 16.480410] [<ffffffff810f0962>] run_timer_softirq+0x222/0x4e0
[ 16.480871] [<ffffffff810f8c01>] ? ktime_get+0x41/0xb0
[ 16.481290] [<ffffffff81053256>] ? lapic_next_deadline+0x26/0x30
[ 16.481763] [<ffffffff8196af74>] __do_softirq+0x104/0x2ab
[ 16.482199] [<ffffffff810869c1>] irq_exit+0xf1/0x100
[ 16.482607] [<ffffffff8196ad72>] smp_apic_timer_interrupt+0x42/0x50
[ 16.483093] [<ffffffff81969f2c>] apic_timer_interrupt+0x8c/0xa0
[ 16.483555] <EOI>
[ 16.483734] [<ffffffff817d3b12>] ? cpuidle_enter_state+0x122/0x2e0
[ 16.484313] [<ffffffff817d3d07>] cpuidle_enter+0x17/0x20
[ 16.484745] [<ffffffff810c6983>] call_cpuidle+0x23/0x40
[ 16.485169] [<ffffffff810c6bb4>] cpu_startup_entry+0x114/0x200
[ 16.485630] [<ffffffff81051d57>] start_secondary+0x107/0x130
[ 16.486081] ---[ end trace 1414925b5a440cdd ]---


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Xiaolong
Lv Zheng
2017-06-21 08:55:09 UTC
Permalink
There are platform variations implementing ACPI lid device in different
ways:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work fine with systemd 233, but only case 1,2 work
fine with systemd 229.

Case 3,4 can be treated as an event missing issue:
After seeing a LID "close" event, systemd 229 will wait several seconds
(HoldoffTimeoutSec) before suspending the platform. Thus on case 4
platforms, if users close lid, and re-open it during the
HoldoffTimeoutSec period, there is still no "open" events seen by the
userspace. Thus systemd still considers the last state as "close" and
suspends the platform after the HoldoffTimeoutSec times out.

Note that not only systemd 229, desktop managers (ex.,
gnome-settings-daemon) also suffer from this issue.

This patch tries to fix this issue by periodically sending _LID return
value to userspace, which ensures to trigger a SW_LID event when the
underlying hardware state has changed. As adding a periodic timer is not a
power friendly way, this patch prepares an option for users to enable on
failure platforms for old userspace programs.

Users can configure update interval via button.lid_update_interval.
This should be configured to a smaller value than HoldoffTimeoutSec in
/etc/systemd/logind.conf.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 67a0d78..a8b119e 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -126,6 +126,14 @@ static unsigned long lid_notify_timeout __read_mostly = 10;
module_param(lid_notify_timeout, ulong, 0644);
MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");

+static bool lid_periodic_update __read_mostly = false;
+module_param(lid_periodic_update, bool, 0644);
+MODULE_PARM_DESC(lid_periodic_update, "Periodically sending lid state updates");
+
+static unsigned long lid_update_interval __read_mostly = 1 * MSEC_PER_SEC;
+module_param(lid_update_interval, ulong, 0644);
+MODULE_PARM_DESC(lid_update_interval, "Interval (ms) between lid periodic updates");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -395,6 +403,8 @@ static void acpi_lid_initialize_state(struct acpi_device *device)
break;
case ACPI_BUTTON_LID_INIT_METHOD:
(void)acpi_lid_update_state(device);
+ if (lid_periodic_update)
+ acpi_lid_start_timer(device, lid_update_interval);
break;
case ACPI_BUTTON_LID_INIT_IGNORE:
default:
@@ -560,8 +570,11 @@ static int acpi_button_add(struct acpi_device *device)
* more we only care about the last one...
*/
lid_device = device;
- acpi_lid_start_timer(device,
- lid_notify_timeout * MSEC_PER_SEC);
+ if (lid_periodic_update)
+ acpi_lid_initialize_state(device);
+ else
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
}

printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
--
2.7.4
Benjamin Tissoires
2017-06-23 14:03:08 UTC
Permalink
Post by Lv Zheng
There are platform variations implementing ACPI lid device in different
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work fine with systemd 233, but only case 1,2 work
fine with systemd 229.
After seeing a LID "close" event, systemd 229 will wait several seconds
(HoldoffTimeoutSec) before suspending the platform. Thus on case 4
platforms, if users close lid, and re-open it during the
HoldoffTimeoutSec period, there is still no "open" events seen by the
userspace. Thus systemd still considers the last state as "close" and
suspends the platform after the HoldoffTimeoutSec times out.
Note that not only systemd 229, desktop managers (ex.,
gnome-settings-daemon) also suffer from this issue.
This patch tries to fix this issue by periodically sending _LID return
value to userspace, which ensures to trigger a SW_LID event when the
underlying hardware state has changed. As adding a periodic timer is not a
power friendly way, this patch prepares an option for users to enable on
failure platforms for old userspace programs.
Users can configure update interval via button.lid_update_interval.
This should be configured to a smaller value than HoldoffTimeoutSec in
/etc/systemd/logind.conf.
---
NACK: what is the point if you just want to provide an event to user
space? We already explained to you that the events do not matter, only
the state is. If user space has an issue with a state not being in sync,
it's a user space bug and it has to be fixed in user space.

This patch could be useful if the purpose was to monitor the changes
while the state is unreliable (in case 4):
- if _LID returns a different value than right after suspend, it is
expected to be something reliable, and so we could then re-create the
input node and present it to user space
- if _LID still returns the same value after a somewhat long delay
(10-20 seconds), then we can consider the value to be correct and
re-create the input node.

Polling for polling and hoping user space will have more chances of
seeing an event for an EV_SW is moot.

Cheers,
Benjamin
Post by Lv Zheng
drivers/acpi/button.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 67a0d78..a8b119e 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -126,6 +126,14 @@ static unsigned long lid_notify_timeout __read_mostly = 10;
module_param(lid_notify_timeout, ulong, 0644);
MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");
+static bool lid_periodic_update __read_mostly = false;
+module_param(lid_periodic_update, bool, 0644);
+MODULE_PARM_DESC(lid_periodic_update, "Periodically sending lid state updates");
+
+static unsigned long lid_update_interval __read_mostly = 1 * MSEC_PER_SEC;
+module_param(lid_update_interval, ulong, 0644);
+MODULE_PARM_DESC(lid_update_interval, "Interval (ms) between lid periodic updates");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -395,6 +403,8 @@ static void acpi_lid_initialize_state(struct acpi_device *device)
break;
(void)acpi_lid_update_state(device);
+ if (lid_periodic_update)
+ acpi_lid_start_timer(device, lid_update_interval);
break;
@@ -560,8 +570,11 @@ static int acpi_button_add(struct acpi_device *device)
* more we only care about the last one...
*/
lid_device = device;
- acpi_lid_start_timer(device,
- lid_notify_timeout * MSEC_PER_SEC);
+ if (lid_periodic_update)
+ acpi_lid_initialize_state(device);
+ else
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
}
printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
--
2.7.4
Lv Zheng
2017-06-21 08:55:03 UTC
Permalink
There are platform variations implementing ACPI lid device in different
ways:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work find with systemd 233, but only case 1 works fine
with systemd 229.

Case 2,4 can be treated as an order issue:
If the button driver always evaluates _LID after seeing a LID
notification, there shouldn't be such a problem.

Note that this order issue cannot be fixed by sorting OS drivers' resume
code:
1. When acpi.ec_freeze_events=Y, the order depends on whether PNP0C0D(LID)
or PNP0C09(EC) appears first in the namespace (probe order).
2. Even when acpi.ec_freeze_events=N, which forces OS to handle EC events
as early as possible, the order issue can still be reproduced due to
platform delays (reproduce ratio is lower than case 1).
3. Some platform simply has a very big delay for LID open events.

This patch tries to fix this issue for systemd 229 by defer sending initial
lid state 10 seconds later after resume, which ensures EC events can always
arrive before button driver evaluates _LID. It finally only fixes case 2
platforms as fixing case 4 platforms needs additional efforts (see known
issue below).

The users can configure wait timeout via button.lid_notify_timeout.

Known issu:
1. systemd/logind 229 still mis-bahaves with case 3,4 platforms
After seeing a LID "close" event, systemd 229 will wait several seconds
(HoldoffTimeoutSec) before suspending the platform. Thus on case 4
platforms, if users close lid, and re-open it during the
HoldoffTimeoutSec period, there is still no "open" events seen by the
userspace. Thus systemd still considers the last state as "close" and
suspends the platform after the HoldoffTimeoutSec times out.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 36 ++++++++++++++++++++++++++++++++++--
1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index e19f530..67a0d78 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -28,6 +28,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/input.h>
+#include <linux/timer.h>
#include <linux/slab.h>
#include <linux/acpi.h>
#include <acpi/button.h>
@@ -79,6 +80,7 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
static int acpi_button_add(struct acpi_device *device);
static int acpi_button_remove(struct acpi_device *device);
static void acpi_button_notify(struct acpi_device *device, u32 event);
+static void acpi_lid_timeout(ulong arg);

#ifdef CONFIG_PM_SLEEP
static int acpi_button_suspend(struct device *dev);
@@ -104,6 +106,7 @@ static struct acpi_driver acpi_button_driver = {
struct acpi_button {
unsigned int type;
struct input_dev *input;
+ struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
int last_state;
@@ -119,6 +122,10 @@ static unsigned long lid_report_interval __read_mostly = 500;
module_param(lid_report_interval, ulong, 0644);
MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");

+static unsigned long lid_notify_timeout __read_mostly = 10;
+module_param(lid_notify_timeout, ulong, 0644);
+MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -371,6 +378,15 @@ static int acpi_lid_update_state(struct acpi_device *device)
return acpi_lid_notify_state(device, state);
}

+static void acpi_lid_start_timer(struct acpi_device *device,
+ unsigned long msecs)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+
+ mod_timer(&button->lid_timer,
+ jiffies + msecs_to_jiffies(msecs));
+}
+
static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
@@ -386,6 +402,13 @@ static void acpi_lid_initialize_state(struct acpi_device *device)
}
}

+static void acpi_lid_timeout(ulong arg)
+{
+ struct acpi_device *device = (struct acpi_device *)arg;
+
+ acpi_lid_initialize_state(device);
+}
+
static void acpi_button_notify(struct acpi_device *device, u32 event)
{
struct acpi_button *button = acpi_driver_data(device);
@@ -432,6 +455,8 @@ static int acpi_button_suspend(struct device *dev)
struct acpi_device *device = to_acpi_device(dev);
struct acpi_button *button = acpi_driver_data(device);

+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer(&button->lid_timer);
button->suspended = true;
return 0;
}
@@ -443,7 +468,8 @@ static int acpi_button_resume(struct device *dev)

button->suspended = false;
if (button->type == ACPI_BUTTON_TYPE_LID)
- acpi_lid_initialize_state(device);
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
return 0;
}
#endif
@@ -490,6 +516,9 @@ static int acpi_button_add(struct acpi_device *device)
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
button->last_state = !!acpi_lid_evaluate_state(device);
button->last_time = ktime_get();
+ init_timer(&button->lid_timer);
+ setup_timer(&button->lid_timer,
+ acpi_lid_timeout, (ulong)device);
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
@@ -526,12 +555,13 @@ static int acpi_button_add(struct acpi_device *device)
if (error)
goto err_remove_fs;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_initialize_state(device);
/*
* This assumes there's only one lid device, or if there are
* more we only care about the last one...
*/
lid_device = device;
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
}

printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
@@ -551,6 +581,8 @@ static int acpi_button_remove(struct acpi_device *device)
struct acpi_button *button = acpi_driver_data(device);

acpi_button_remove_fs(device);
+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer_sync(&button->lid_timer);
input_unregister_device(button->input);
kfree(button);
return 0;
--
2.7.4
Benjamin Tissoires
2017-06-23 14:02:48 UTC
Permalink
Post by Lv Zheng
There are platform variations implementing ACPI lid device in different
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work find with systemd 233, but only case 1 works fine
with systemd 229.
If the button driver always evaluates _LID after seeing a LID
notification, there shouldn't be such a problem.
Note that this order issue cannot be fixed by sorting OS drivers' resume
1. When acpi.ec_freeze_events=Y, the order depends on whether PNP0C0D(LID)
or PNP0C09(EC) appears first in the namespace (probe order).
2. Even when acpi.ec_freeze_events=N, which forces OS to handle EC events
as early as possible, the order issue can still be reproduced due to
platform delays (reproduce ratio is lower than case 1).
3. Some platform simply has a very big delay for LID open events.
This patch tries to fix this issue for systemd 229 by defer sending initial
lid state 10 seconds later after resume, which ensures EC events can always
arrive before button driver evaluates _LID. It finally only fixes case 2
platforms as fixing case 4 platforms needs additional efforts (see known
issue below).
The users can configure wait timeout via button.lid_notify_timeout.
1. systemd/logind 229 still mis-bahaves with case 3,4 platforms
After seeing a LID "close" event, systemd 229 will wait several seconds
(HoldoffTimeoutSec) before suspending the platform. Thus on case 4
platforms, if users close lid, and re-open it during the
HoldoffTimeoutSec period, there is still no "open" events seen by the
userspace. Thus systemd still considers the last state as "close" and
suspends the platform after the HoldoffTimeoutSec times out.
---
NACK on this one (at least in this current form): You are presenting a
device to user space with an unknown state.
If you want to delay the query of the state, you also need to delay the
call of input_register_device().

Cheers,
Benjamin
Post by Lv Zheng
drivers/acpi/button.c | 36 ++++++++++++++++++++++++++++++++++--
1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index e19f530..67a0d78 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -28,6 +28,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/input.h>
+#include <linux/timer.h>
#include <linux/slab.h>
#include <linux/acpi.h>
#include <acpi/button.h>
@@ -79,6 +80,7 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
static int acpi_button_add(struct acpi_device *device);
static int acpi_button_remove(struct acpi_device *device);
static void acpi_button_notify(struct acpi_device *device, u32 event);
+static void acpi_lid_timeout(ulong arg);
#ifdef CONFIG_PM_SLEEP
static int acpi_button_suspend(struct device *dev);
@@ -104,6 +106,7 @@ static struct acpi_driver acpi_button_driver = {
struct acpi_button {
unsigned int type;
struct input_dev *input;
+ struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
int last_state;
@@ -119,6 +122,10 @@ static unsigned long lid_report_interval __read_mostly = 500;
module_param(lid_report_interval, ulong, 0644);
MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
+static unsigned long lid_notify_timeout __read_mostly = 10;
+module_param(lid_notify_timeout, ulong, 0644);
+MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -371,6 +378,15 @@ static int acpi_lid_update_state(struct acpi_device *device)
return acpi_lid_notify_state(device, state);
}
+static void acpi_lid_start_timer(struct acpi_device *device,
+ unsigned long msecs)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+
+ mod_timer(&button->lid_timer,
+ jiffies + msecs_to_jiffies(msecs));
+}
+
static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
@@ -386,6 +402,13 @@ static void acpi_lid_initialize_state(struct acpi_device *device)
}
}
+static void acpi_lid_timeout(ulong arg)
+{
+ struct acpi_device *device = (struct acpi_device *)arg;
+
+ acpi_lid_initialize_state(device);
+}
+
static void acpi_button_notify(struct acpi_device *device, u32 event)
{
struct acpi_button *button = acpi_driver_data(device);
@@ -432,6 +455,8 @@ static int acpi_button_suspend(struct device *dev)
struct acpi_device *device = to_acpi_device(dev);
struct acpi_button *button = acpi_driver_data(device);
+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer(&button->lid_timer);
button->suspended = true;
return 0;
}
@@ -443,7 +468,8 @@ static int acpi_button_resume(struct device *dev)
button->suspended = false;
if (button->type == ACPI_BUTTON_TYPE_LID)
- acpi_lid_initialize_state(device);
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
return 0;
}
#endif
@@ -490,6 +516,9 @@ static int acpi_button_add(struct acpi_device *device)
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
button->last_state = !!acpi_lid_evaluate_state(device);
button->last_time = ktime_get();
+ init_timer(&button->lid_timer);
+ setup_timer(&button->lid_timer,
+ acpi_lid_timeout, (ulong)device);
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
@@ -526,12 +555,13 @@ static int acpi_button_add(struct acpi_device *device)
if (error)
goto err_remove_fs;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_initialize_state(device);
/*
* This assumes there's only one lid device, or if there are
* more we only care about the last one...
*/
lid_device = device;
+ acpi_lid_start_timer(device,
+ lid_notify_timeout * MSEC_PER_SEC);
}
printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
@@ -551,6 +581,8 @@ static int acpi_button_remove(struct acpi_device *device)
struct acpi_button *button = acpi_driver_data(device);
acpi_button_remove_fs(device);
+ if (button->type == ACPI_BUTTON_TYPE_LID)
+ del_timer_sync(&button->lid_timer);
input_unregister_device(button->input);
kfree(button);
return 0;
--
2.7.4
Lv Zheng
2017-06-21 08:55:15 UTC
Permalink
There are platform variations implementing ACPI lid device in different
ways:
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work find with systemd 233, but only case 1,2,3,4 work
fine with systemd 229.

After fixing all the other issues for old userspace programs, case 5 is the
only case that the exported input node is still not fully compliant to
SW_LID ABI and thus needs quirks or ABI changes.

The original button.lid_init_state=ignore ABI change solution is now too
complicated if its purpose is to only solve this final incompliant use
case. This patch re-works it by unconditionally prepending "open"
complement events.

Cc: <systemd-***@lists.freedesktop.org>
Cc: Benjamin Tissoires <***@redhat.com>
Cc: Peter Hutterer <***@who-t.net>
Signed-off-by: Lv Zheng <***@intel.com>
---
drivers/acpi/button.c | 85 +++------------------------------------------------
1 file changed, 5 insertions(+), 80 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index a8b119e..1256a8c 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -109,8 +109,6 @@ struct acpi_button {
struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
- int last_state;
- ktime_t last_time;
bool suspended;
};

@@ -118,10 +116,6 @@ static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
static struct acpi_device *lid_device;
static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;

-static unsigned long lid_report_interval __read_mostly = 500;
-module_param(lid_report_interval, ulong, 0644);
-MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
-
static unsigned long lid_notify_timeout __read_mostly = 10;
module_param(lid_notify_timeout, ulong, 0644);
MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");
@@ -157,79 +151,12 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
- ktime_t next_report;
- bool do_update;
-
- /*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
- */
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
- next_report = ktime_add(button->last_time,
- ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
- /* Complain the buggy firmware */
- pr_warn_once("The lid device is not compliant to SW_LID.\n");

- /*
- * Send the unreliable complement switch event:
- *
- * On most platforms, the lid device is reliable. However
- * there are exceptions:
- * 1. Platforms returning initial lid state as "close" by
- * default after booting/resuming:
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * 2. Platforms never reporting "open" events:
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * lid device actually is:
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
- */
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
- }
- }
- /* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE)
+ input_report_switch(button->input, SW_LID, 0);
+
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);

if (state)
pm_wakeup_event(&device->dev, 0);
@@ -524,8 +451,6 @@ static int acpi_button_add(struct acpi_device *device)
strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
sprintf(class, "%s/%s",
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
- button->last_state = !!acpi_lid_evaluate_state(device);
- button->last_time = ktime_get();
init_timer(&button->lid_timer);
setup_timer(&button->lid_timer,
acpi_lid_timeout, (ulong)device);
--
2.7.4
Benjamin Tissoires
2017-06-23 14:03:31 UTC
Permalink
Post by Lv Zheng
There are platform variations implementing ACPI lid device in different
1. Some platforms send "open" events to OS and the events arrive before
button driver is resumed;
2. Some platforms send "open" events to OS, but the events arrive after
button driver is resumed, ex., Samsung N210+;
3. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, and the update events arrive before
button driver is resumed;
4. Some platforms never send "open" events to OS, but send "open" events to
update the cached _LID return value, but the update events arrive after
button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send "open" events, _LID returns value sticks to
"close", ex., Surface Pro 1.
Currently, all cases work find with systemd 233, but only case 1,2,3,4 work
fine with systemd 229.
After fixing all the other issues for old userspace programs, case 5 is the
only case that the exported input node is still not fully compliant to
SW_LID ABI and thus needs quirks or ABI changes.
The original button.lid_init_state=ignore ABI change solution is now too
complicated if its purpose is to only solve this final incompliant use
case. This patch re-works it by unconditionally prepending "open"
complement events.
Ouch. I thought the purpose of "ignore" was to not query the state on
boot/resume and only rely on acpi notification to switch the states.

I like the patch, the commit message is IMO too far from what it
actually does:
- it removes the caching update of _LID switch
- it removes the filtering of too close notifications.
Post by Lv Zheng
---
drivers/acpi/button.c | 85 +++------------------------------------------------
1 file changed, 5 insertions(+), 80 deletions(-)
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index a8b119e..1256a8c 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -109,8 +109,6 @@ struct acpi_button {
struct timer_list lid_timer;
char phys[32]; /* for input device */
unsigned long pushed;
- int last_state;
- ktime_t last_time;
bool suspended;
};
@@ -118,10 +116,6 @@ static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
static struct acpi_device *lid_device;
static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
-static unsigned long lid_report_interval __read_mostly = 500;
-module_param(lid_report_interval, ulong, 0644);
-MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
-
You shouldn't do that. It is kernel ABI, and if you remove the module,
people that tuned their kernel with this parameter will have the module
failing to load. You should keep this around for a few kernel releases,
and mark its usage deprecated by issuing a warning in the dmesg.

Cheers,
Benjamin
Post by Lv Zheng
static unsigned long lid_notify_timeout __read_mostly = 10;
module_param(lid_notify_timeout, ulong, 0644);
MODULE_PARM_DESC(lid_notify_timeout, "Timeout (s) before receiving lid notification");
@@ -157,79 +151,12 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
- ktime_t next_report;
- bool do_update;
-
- /*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
- */
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
- next_report = ktime_add(button->last_time,
- ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
- /* Complain the buggy firmware */
- pr_warn_once("The lid device is not compliant to SW_LID.\n");
- /*
- *
- * On most platforms, the lid device is reliable. However
- * 1. Platforms returning initial lid state as "close" by
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
- */
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
- }
- }
- /* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE)
+ input_report_switch(button->input, SW_LID, 0);
+
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);
if (state)
pm_wakeup_event(&device->dev, 0);
@@ -524,8 +451,6 @@ static int acpi_button_add(struct acpi_device *device)
strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
sprintf(class, "%s/%s",
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
- button->last_state = !!acpi_lid_evaluate_state(device);
- button->last_time = ktime_get();
init_timer(&button->lid_timer);
setup_timer(&button->lid_timer,
acpi_lid_timeout, (ulong)device);
--
2.7.4
kernel test robot
2017-06-30 01:34:02 UTC
Permalink
FYI, we noticed the following commit:

commit: c4a9ff748c1ca2fe81e012ac32e1daa70702778b ("ACPI: button: Add a workaround to fix an order issue for old userspace")
url: https://github.com/0day-ci/linux/commits/Lv-Zheng/ACPI-button-Fix-button-lid_init_state-method-mode/20170622-143854


in testcase: netperf
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 1
cluster: cs-localhost
test: TCP_CRR
cpufreq_governor: performance

test-description: Netperf is a benchmark that can be use to measure various aspect of networking performance.
test-url: http://www.netperf.org/netperf/


on test machine: 4 threads Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz with 4G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-----------------------------------------------------------------------------------------------------+-----------+------------+
| | v4.12-rc6 | c4a9ff748c |
+-----------------------------------------------------------------------------------------------------+-----------+------------+
| boot_successes | 9404 | 3 |
| boot_failures | 2560 | 9 |
| ACPI_Error:Field[CPB3]at#exceeds_Buffer[NULL]size#(bits)(#/dsopcode-#) | 365 | |
| ACPI_Error:Method_parse/execution_failed[~_SB._OSC](Node#),AE_AML_BUFFER_LIMIT(#/psparse-#) | 365 | |
| cpu_clock_throttled | 362 | |
| Kernel_panic-not_syncing:Fatal_hardware_error | 2 | |
| Mem-Info | 560 | |
| invoked_oom-killer:gfp_mask=0x | 528 | |
| Out_of_memory:Kill_process | 80 | |
| BUG:Bad_page_state_in_process | 25 | |
| BUG:Bad_rss-counter_state_mm:#idx:#val | 18 | |
| BUG:Bad_page_map_in_process | 7 | |
| page_allocation_failure:order:#,mode:#(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK),nodemask=(null) | 3 | |
| WARNING:at_lib/list_debug.c:#__list_add_valid | 2 | |
| INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 2 | |
| BUG:Bad_page_state:#messages_suppressed | 1 | |
| WARNING:at_drivers/gpu/drm/drm_irq.c:#drm_wait_one_vblank[drm] | 19 | |
| WARNING:at_lib/list_debug.c:#__list_del_entry_valid | 2 | |
| general_protection_fault:#[##] | 2 | |
| Kernel_panic-not_syncing:Fatal_exception | 3 | |
| page_allocation_failure:order:#,mode:#(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO),nodemask=(null) | 18 | |
| WARNING:at_drivers/gpu/drm/drm_vblank.c:#drm_wait_one_vblank[drm] | 5 | |
| page_allocation_failure:order:#,mode:#(GFP_KERNEL|__GFP_NORETRY),nodemask=(null) | 11 | |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c | 4 | |
| BUG:non-zero_nr_ptes_on_freeing_mm | 13 | |
| ACPI_Error:Field[CPB3]at_bit_offset/length#exceeds_size_of_target_Buffer(#bits)(#/dsopcode-#) | 116 | |
| ACPI_Error:Method_parse/execution_failed~_SB._OSC,AE_AML_BUFFER_LIMIT(#/psparse-#) | 116 | |
| ACPI_Error:Method_parse/execution_failed~_SB.PC00.IOTR._CRS,AE_AML_NO_RESOURCE_END_TAG(#/psparse-#) | 129 | |
| ACPI_Error:Method_execution_failed~_SB.PC00.IOTR._CRS,AE_AML_NO_RESOURCE_END_TAG(#/uteval-#) | 129 | |
| Kernel_panic-not_syncing:Hard_LOCKUP | 103 | |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 20 | |
| ACPI_Error:[~_SB_.PCI0.XHC_.RHUB.HS11]Namespace_lookup_failure,AE_NOT_FOUND(#/dswload-#) | 15 | |
| ACPI_Error:#table_load_failures,#successful(#/tbxfload-#) | 15 | |
| WARNING:at_drivers/gpu/drm/i915/intel_display.c:#intel_modeset_init[i915] | 259 | 9 |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c | 3 | 1 |
| INFO:creating/lkp/benchmarks/ltp/output_directory | 351 | |
| INFO:creating/lkp/benchmarks/ltp/results_directory | 351 | |
| INFO:ltp-pan_reported_some_tests_FAIL | 88 | |
| INFO:ltp-pan_reported_all_tests_PASS | 208 | |
| page_allocation_failure:order:#,mode:#(GFP_HIGHUSER_MOVABLE|__GFP_ZERO),nodemask=(null) | 2 | |
| WARNING:at_kernel/events/core.c:#add_event_to_ctx | 5 | |
| WARNING:at_kernel/events/core.c:#event_sched_out | 5 | |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/irq/manage.c | 1 | |
| calltrace:init_netconsole | 1 | |
| calltrace:SyS_finit_module | 2 | |
| calltrace:SyS_write | 1 | |
| BUG:unable_to_handle_kernel | 1 | |
| Oops:#[##] | 1 | |
| WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup | 1 | |
| calltrace:parport_pc_init | 1 | |
| WARNING:at_lib/kobject.c:#kobject_add_internal | 1 | |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 447 | |
| BUG:kernel_hang_in_test_stage | 263 | |
| BUG:soft_lockup-CPU##stuck_for#s | 1 | |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 1 | |
| BUG:kernel_reboot-without-warning_in_boot_stage | 1 | |
| BUG:kernel_reboot-without-warning_in_test_stage | 12 | |
| WARNING:at_kernel/workqueue.c:#process_one_work | 1 | |
| BUG:scheduling_while_atomic | 0 | 8 |
| WARNING:at_kernel/time/timer.c:#call_timer_fn | 0 | 8 |
+-----------------------------------------------------------------------------------------------------+-----------+------------+



[ 14.828015] BUG: scheduling while atomic: swapper/0/0/0x00000102
[ 14.828552] Modules linked in:
[ 14.828846] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc6-00001-gc4a9ff74 #1
[ 14.829454] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 14.830113] Call Trace:
[ 14.830363] <IRQ>
[ 14.831578] dump_stack+0x63/0x86
[ 14.831887] __schedule_bug+0x54/0x70
[ 14.832212] __schedule+0x633/0x850
[ 14.832530] schedule+0x3d/0x90
[ 14.832826] schedule_timeout+0x174/0x310
[ 14.833176] ? call_timer_fn+0x150/0x150
[ 14.833518] ec_guard+0x1c6/0x1d0
[ 14.833821] ? ec_guard+0x1c6/0x1d0
[ 14.834135] ? remove_wait_queue+0x70/0x70
[ 14.834487] acpi_ec_transaction+0x181/0x3b0
[ 14.834850] acpi_ec_read+0x42/0x50
[ 14.835166] acpi_ec_space_handler+0xa5/0x240
[ 14.835534] ? __update_load_avg_se+0x15b/0x180
[ 14.835955] ? acpi_os_signal_semaphore+0x4d/0x90
[ 14.836345] ? acpi_ut_release_mutex+0x11c/0x127
[ 14.836727] acpi_ev_address_space_dispatch+0x2ce/0x33a
[ 14.837147] ? ec_transaction+0x50/0x50
[ 14.837483] acpi_ex_access_region+0x414/0x4ca
[ 14.837858] ? __might_sleep+0x4a/0x80
[ 14.838187] acpi_ex_field_datum_io+0x178/0x3fa
[ 14.838564] acpi_ex_extract_from_field+0x151/0x2bb
[ 14.838964] acpi_ex_read_data_from_field+0x380/0x3d0
[ 14.839374] acpi_ex_resolve_node_to_value+0x326/0x443
[ 14.839790] acpi_ex_resolve_to_value+0x374/0x40e
[ 14.840179] acpi_ds_evaluate_name_path+0xa3/0x143
[ 14.840573] acpi_ds_exec_end_op+0xd1/0x6c6
[ 14.840930] acpi_ps_parse_loop+0x819/0x8b5
[ 14.841287] ? acpi_ut_trace+0x26/0x66
[ 14.841617] acpi_ps_parse_aml+0x1ac/0x4a1
[ 14.841968] acpi_ps_execute_method+0x1f4/0x2b6
[ 14.842349] acpi_ns_evaluate+0x2ee/0x435
[ 14.842696] acpi_evaluate_object+0x178/0x38e
[ 14.843065] ? acpi_lid_update_state+0x50/0x50
[ 14.843439] acpi_evaluate_integer+0x3e/0xd0
[ 14.843802] acpi_lid_update_state+0x27/0x50
[ 14.844164] acpi_lid_timeout+0x1d/0x30
[ 14.844501] call_timer_fn+0x35/0x150
[ 14.844828] run_timer_softirq+0x1e8/0x460
[ 14.845180] ? ktime_get+0x41/0xa0
[ 14.845490] ? lapic_next_deadline+0x26/0x30
[ 14.845851] ? clockevents_program_event+0x7a/0xf0
[ 14.846246] __do_softirq+0x104/0x2ab
[ 14.846572] irq_exit+0xf1/0x100
[ 14.846872] smp_apic_timer_interrupt+0x3d/0x50
[ 14.847249] apic_timer_interrupt+0x93/0xa0
[ 14.847608] RIP: 0010:cpuidle_enter_state+0x121/0x2d0
[ 14.848013] RSP: 0018:ffffffff81e03df0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 14.848618] RAX: 0000000000000000 RBX: 0000000000000005 RCX: 000000000000001f
[ 14.849151] RDX: 0000000373cf2cc5 RSI: ffff88011f2195d8 RDI: 0000000000000000
[ 14.849682] RBP: ffffffff81e03e28 R08: 00000000000f41c9 R09: 0000000000000018
[ 14.850214] R10: ffffffff81e03dc0 R11: 000000000007cf4d R12: ffff88011f225800
[ 14.850747] R13: ffffffff81fa9c58 R14: 0000000373cf2cc5 R15: ffffffff81fa9c40
[ 14.851281] </IRQ>
[ 14.851516] cpuidle_enter+0x17/0x20
[ 14.851837] call_cpuidle+0x23/0x40
[ 14.852152] do_idle+0x189/0x1e0
[ 14.852451] cpu_startup_entry+0x1d/0x20
[ 14.852791] rest_init+0x85/0x90
[ 14.853093] start_kernel+0x3f8/0x405
[ 14.853419] ? early_idt_handler_array+0x120/0x120
[ 14.853812] x86_64_start_reservations+0x2f/0x31
[ 14.854195] x86_64_start_kernel+0x143/0x152
[ 14.854557] secondary_startup_64+0x9f/0x9f
[ 14.854990] ACPI : button: The lid device is not compliant to SW_LID.
[ 14.855491] timer: acpi_lid_timeout+0x0/0x30 preempt leak: 00000101 -> 00000000
[ 14.856086] ------------[ cut here ]------------
[ 14.856473] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1275 call_timer_fn+0x13c/0x150
[ 14.857188] Modules linked in:
[ 14.857479] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.12.0-rc6-00001-gc4a9ff74 #1
[ 14.858158] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 14.858818] task: ffffffff81e104c0 task.stack: ffffffff81e00000
[ 14.859279] RIP: 0010:call_timer_fn+0x13c/0x150
[ 14.859655] RSP: 0018:ffff88011f203e98 EFLAGS: 00010292
[ 14.860073] RAX: 0000000000000043 RBX: 0000000000000001 RCX: ffffffff81e61488
[ 14.860607] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000246
[ 14.861138] RBP: ffff88011f203ec0 R08: 0000000000000000 R09: 000000000000035e
[ 14.861672] R10: ffffea0004714b40 R11: 000000008237d701 R12: 0000000000000101
[ 14.862203] R13: ffff8800c7535990 R14: ffffffff815734e0 R15: ffff880101cb6800
[ 14.862735] FS: 0000000000000000(0000) GS:ffff88011f200000(0000) knlGS:0000000000000000
[ 14.863371] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.863819] CR2: 00007fc0dcc49267 CR3: 000000011de09000 CR4: 00000000001406f0
[ 14.864354] Call Trace:
[ 14.864605] <IRQ>
[ 14.864831] run_timer_softirq+0x1e8/0x460
[ 14.865185] ? ktime_get+0x41/0xa0
[ 14.865493] ? lapic_next_deadline+0x26/0x30
[ 14.865854] ? clockevents_program_event+0x7a/0xf0
[ 14.866248] __do_softirq+0x104/0x2ab
[ 14.866573] irq_exit+0xf1/0x100
[ 14.866874] smp_apic_timer_interrupt+0x3d/0x50
[ 14.867252] apic_timer_interrupt+0x93/0xa0
[ 14.867609] RIP: 0010:cpuidle_enter_state+0x121/0x2d0
[ 14.868017] RSP: 0018:ffffffff81e03df0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 14.868619] RAX: 0000000000000000 RBX: 0000000000000005 RCX: 000000000000001f
[ 14.869151] RDX: 0000000373cf2cc5 RSI: ffff88011f2195d8 RDI: 0000000000000000
[ 14.869682] RBP: ffffffff81e03e28 R08: 00000000000f41c9 R09: 0000000000000018
[ 14.870216] R10: ffffffff81e03dc0 R11: 000000000007cf4d R12: ffff88011f225800
[ 14.870747] R13: ffffffff81fa9c58 R14: 0000000373cf2cc5 R15: ffffffff81fa9c40
[ 14.871278] </IRQ>
[ 14.871513] cpuidle_enter+0x17/0x20
[ 14.871831] call_cpuidle+0x23/0x40
[ 14.872143] do_idle+0x189/0x1e0
[ 14.872443] cpu_startup_entry+0x1d/0x20
[ 14.872783] rest_init+0x85/0x90
[ 14.873083] start_kernel+0x3f8/0x405
[ 14.873411] ? early_idt_handler_array+0x120/0x120
[ 14.873807] x86_64_start_reservations+0x2f/0x31
[ 14.874192] x86_64_start_kernel+0x143/0x152
[ 14.874554] secondary_startup_64+0x9f/0x9f
[ 14.874910] Code: 03 48 85 c0 75 eb 65 ff 0d e2 9f f1 7e e9 0c ff ff ff 44 89 e2 4c 89 f6 48 c7 c7 08 5d ca 81 c6 05 d1 24 e2 00 01 e8 81 66 0a 00 <0f> ff e9 16 ff ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f
[ 14.876305] ---[ end trace dfc99d55faf65cc8 ]---


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Xiaolong

Loading...