Forked from Brainiarc7/fix-intel_wifi_aer-avell_g1513_fire_v3
Created
July 31, 2020 19:06
Revisions
flisboac revised this gist
May 7, 2017 . 1 changed file with 3 additions and 0 deletions.
@@ -7,3 +7,6 @@ Type=oneshot
# Change your device and vendor (or bus/slot/function accordingly)
ExecStart=/usr/bin/setpci -v -d 8086:a114 CAP_EXP+0x8.w=0xe
RemainAfterExit=yes

[Install]
WantedBy=network.target
flisboac revised this gist
May 7, 2017 . 2 changed files with 1 addition and 0 deletions.
@@ -0,0 +1 @@
silly gist hack, why do we need you? :(
File renamed without changes.
flisboac created this gist
May 7, 2017 .
@@ -0,0 +1,213 @@
# How to use

Drop the `.service` file into `/etc/systemd/system/`, and then activate the unit via `systemctl`:

```shell
# systemctl daemon-reload
# systemctl enable fix-intel_wifi_aer-avell_g1513_fire_v3.service
# systemctl start fix-intel_wifi_aer-avell_g1513_fire_v3.service
```

This disables "corrected" severity reporting for the device, and saves you loads of (logging) disk space. :)

# Reasoning

Sorry for the poor explanation, future self. I'm kinda tired right now. I don't even know if all of this is correct. :(

When AER logs errors too actively, it generally comes down to buggy hardware or drivers. What most people recommend is to disable AER via a kernel parameter such as `pci=noaer`. If you know that the affected device is fine, and that the device's driver has a still-unfixed bug that doesn't affect proper usage, you can instead disable reporting for specific severity levels by setting the flags directly on the device via `setpci`, rather than disabling AER globally. For more info on `setpci`, please [see its docs](http://linuxcommand.org/man_pages/setpci8.html).

AER (Advanced Error Reporting) is a PCIe capability. Linux adds support for it through a kernel module that is started sometime during `systemd-modules-load.service`'s execution. The AER driver initializes reporting for PCIe devices at startup, so it's important that we only reset the flags AFTER systemd's module loading service.
According to the AER module's [source code](http://elixir.free-electrons.com/linux/latest/source/drivers/pci/pcie/aer/aerdrv_core.c#L41), the four error-reporting enable flags (Correctable, Non-Fatal, Fatal and Unsupported Request) are always set when AER is enabled for a device:

```c
// From `/usr/include/uapi/linux/pci_regs.h`
#define PCI_EXP_DEVCTL          8       /* Device Control */
#define  PCI_EXP_DEVCTL_CERE    0x0001  /* Correctable Error Reporting En. */
#define  PCI_EXP_DEVCTL_NFERE   0x0002  /* Non-Fatal Error Reporting Enable */
#define  PCI_EXP_DEVCTL_FERE    0x0004  /* Fatal Error Reporting Enable */
#define  PCI_EXP_DEVCTL_URRE    0x0008  /* Unsupported Request Reporting En. */

// From `source/drivers/pci/pcie/aer/aerdrv_core.c`
#define PCI_EXP_AER_FLAGS       (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
                                 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
    if (pcie_aer_get_firmware_first(dev))
        return -EIO;

    if (!dev->aer_cap)
        return -EIO;

    return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
}
```

Inspecting the kernel's source code some more, one can find that `PCI_EXP_DEVCTL` is an offset from `dev->pcie_cap`, the start of the device's PCI Express capability structure, and that `dev->pcie_cap` is itself an offset into the device's configuration space.
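The two-level offset described above can be sketched as plain arithmetic. This is only an illustration: the helper name `devctl_addr` is made up here, and the capability base `40` is an assumption taken from the `lspci` output shown further down (`Capabilities: [40] Express Root Port`); other devices will likely place the capability elsewhere.

```shell
#!/bin/sh
# Offset of Device Control inside the PCI Express capability,
# copied from the pci_regs.h excerpt above.
PCI_EXP_DEVCTL=8

# devctl_addr CAP_BASE_HEX (hypothetical helper)
# Prints the config-space address of the Device Control register,
# given the capability's base address in hex (without 0x prefix).
devctl_addr() {
    printf '%02x\n' $(( 0x$1 + $PCI_EXP_DEVCTL ))
}

# Assumed base for this particular root port: 0x40
devctl_addr 40
```

For the root port in this write-up that yields `48`, which matches the `@48` that `setpci -v` reports later on.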
If you follow the implementation of `pcie_capability_set_word` and the functions it calls, you end up in `pcie_capability_write_word`:

```c
// From `source/drivers/pci/access.c`
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val)
{
    if (pos & 1)
        return -EINVAL;

    if (!pcie_capability_reg_implemented(dev, pos))
        return 0;

    return pci_write_config_word(dev, pci_pcie_cap(dev) + pos, val);
}

// From `/usr/include/linux/pci.h`
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
                                           u16 set)
{
    return pcie_capability_clear_and_set_word(dev, pos, 0, set);
}

static inline int pci_pcie_cap(struct pci_dev *dev)
{
    return dev->pcie_cap;
}
```

Depending on the machine's setup, `setpci` may list the register name `CAP_EXP` as available through `setpci --dumpregs`. This register refers to the `dev->pcie_cap` offset.

To identify how AER is configured, one needs the vendor/device ID or the bus/slot/function combination for the affected device. AER's logged messages already have this information. Below is an example, from which we can take two different identifiers for the device: `8086:a114` (vendor/device ID) and `0000:00:1c.4` (domain/bus/slot/function).
```text
# dmesg | tail -n 4
[ 4455.385233] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 4455.385242] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 4455.385250] pcieport 0000:00:1c.4: device [8086:a114] error status/mask=00000001/00002000
[ 4455.385254] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
```

To check which device is affected, see `lshw` or `lspci`:

```text
[flisboac@sonic ~]$ sudo lspci -v -s 00:1c.4
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 124
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: df200000-df2fffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [90] Subsystem: Device 1d05:1021
        Capabilities: [a0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Access Control Services
        Capabilities: [220] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp
```

In this case, the error may refer to a device attached to a PCIe port. One can check which device is attached to said port with `lshw`:

```text
# lshw -numeric
sonic
    description: Notebook
    product: 1513 (To be filled by O.E.M.)
    vendor: Avell High Performance
    version: To be filled by O.E.M.
    serial: To be filled by O.E.M.
    width: 4294967295 bits
    capabilities: smbios-3.0 dmi-3.0 smp vsyscall32
    configuration: boot=normal chassis=notebook family=To be filled by O.E.M. sku=To be filled by O.E.M. uuid=00020003-0004-0005-0006-000700080009
  *-core
       description: Motherboard
       physical id: 0
       version: 0.1
       serial: To be filled by O.E.M.
       slot: To be filled by O.E.M.
(... lshw is so verbose ...)
     *-pci
          description: Host bridge
          product: Skylake Host Bridge/DRAM Registers [8086:1910]
          vendor: Intel Corporation [8086]
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 07
          width: 32 bits
          clock: 33MHz
          configuration: driver=skl_uncore
          resources: irq:0
(... lshw is so verbose ...)
        *-pci:2
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #5 [8086:A114]
             vendor: Intel Corporation [8086]
             physical id: 1c.4
             bus info: pci@0000:00:1c.4
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:124 memory:df200000-df2fffff
           *-network
                description: Wireless interface
                product: Wireless 7265 [8086:95A]
                vendor: Intel Corporation [8086]
                physical id: 0
                bus info: pci@0000:03:00.0
                logical name: wlp3s0
                version: 48
                serial: 64:80:99:f3:9d:d7
                width: 64 bits
                clock: 33MHz
                capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
                configuration: broadcast=yes driver=iwlwifi driverversion=4.10.13-1-ARCH firmware=17.459231.0 ip=192.168.1.26 latency=0 link=yes multicast=yes wireless=IEEE 802.11
                resources: irq:137 memory:df200000-df201fff
```

Summarizing, `CAP_EXP` is the base register, and we do some pointer arithmetic with it: we offset `CAP_EXP` by `PCI_EXP_DEVCTL` and write the proper flags there as a single word. Just remember that some `PCI_EXP_*` offsets are defined in decimal, while `setpci` only accepts hexadecimal (with or without the `0x` prefix), so a base conversion may be needed, although not for `PCI_EXP_DEVCTL`, since 8 reads the same in both bases.

So, to read the current configuration:

```text
[flisboac@sonic ~]$ sudo setpci -v -d 8086:a114 CAP_EXP+0x8.w
0000:00:1c.4 (cap 10 @40) @48 = 000f
```

`000f` tells us that all four error-reporting enable flags are set.
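To make that reading less of a leap, here is a small sketch that decodes the word into the four flags. The function name `decode_devctl` is hypothetical; the bit values are the `PCI_EXP_DEVCTL_*` constants quoted earlier. It only looks at the low four bits and ignores whatever else the Device Control register may hold.

```shell
#!/bin/sh
# decode_devctl WORD_HEX (hypothetical helper)
# Lists the state of each error-reporting enable bit in a Device Control
# word, given as hex without the 0x prefix (e.g. the "000f" from setpci).
decode_devctl() {
    val=$(( 0x$1 ))
    # name:bit pairs, taken from the PCI_EXP_DEVCTL_* defines
    for flag in CERE:1 NFERE:2 FERE:4 URRE:8; do
        name=${flag%%:*}
        bit=${flag##*:}
        if [ $(( val & bit )) -ne 0 ]; then
            echo "$name on"
        else
            echo "$name off"
        fi
    done
}

decode_devctl 000f
```

Fed `000f`, every flag comes out `on`; fed `000e`, only `CERE` (Correctable Error Reporting Enable) flips to `off`.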
The Corrected severity is bit 0 in that word, so we just need to set the new value to `000e` to disable only the Corrected severity reporting:

```text
[flisboac@sonic ~]$ sudo setpci -v -d 8086:a114 CAP_EXP+0x8.w=0x0e
0000:00:1c.4 (cap 10 @40) @48 000e
```

And that's it!

@@ -0,0 +1,9 @@
[Unit]
Description=Fix for AER's excessive logging for Intel Wireless (Avell G1513 Fire V3)
After=systemd-modules-load.service

[Service]
Type=oneshot
# Change your device and vendor (or bus/slot/function accordingly)
ExecStart=/usr/bin/setpci -v -d 8086:a114 CAP_EXP+0x8.w=0xe
RemainAfterExit=yes
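The unit hard-codes `0xe`, which is only right if the register read `000f` beforehand, since `setpci` replaces the whole word. As a hedged sketch, the value could instead be derived from whatever is currently set. `clear_cere` is a made-up helper name, and the commented `setpci` lines are untested here because they need root and the actual hardware.

```shell
#!/bin/sh
PCI_EXP_DEVCTL_CERE=1   # 0x0001, from the pci_regs.h excerpt

# clear_cere WORD_HEX (hypothetical helper)
# Prints WORD with only the Correctable Error Reporting Enable bit
# cleared; every other bit is passed through unchanged.
clear_cere() {
    printf '%04x\n' $(( 0x$1 & ~$PCI_EXP_DEVCTL_CERE ))
}

clear_cere 000f

# Possible use on real hardware (assumes exactly one matching device):
#   cur=$(setpci -d 8086:a114 CAP_EXP+0x8.w)
#   setpci -v -d 8086:a114 CAP_EXP+0x8.w="$(clear_cere "$cur")"
```

With `000f` as input this prints `000e`, the same value the unit writes, but a word with other bits set would keep them.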