Fixing frequent link drops on the Intel 82574L Gigabit NIC

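I run OpenWRT as a VM on ESXi with two Intel 82574L Gigabit NICs: eth0 is the LAN and eth1 is the WAN. Whether or not eth1 is passed through to OpenWRT, the network drops after running for a while, anywhere from a few days to more than a month. After a drop, the connection cannot be redialed, and only rebooting OpenWRT brings it back. Sometimes nothing is logged; sometimes the kernel reports "Reset adapter unexpectedly" or "Detected Hardware Unit Hang". From searching, the most likely cause is a problem with the ASPM power-saving mode. Switching to a different OpenWRT image did not help. Kernel log: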

[292169.214098] e1000e 0000:03:00.0 eth1: Reset adapter unexpectedly
[292172.699452] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[292177.118943] e1000e 0000:03:00.0 eth1: Detected Hardware Unit Hang:
[292177.118943]   TDH                  <0>
[292177.118943]   TDT                  <8>
[292177.118943]   next_to_use          <8>
[292177.118943]   next_to_clean        <0>
[292177.118943] buffer_info[next_to_clean]:
[292177.118943]   time_stamp           <104595cea>
[292177.118943]   next_to_watch        <0>
[292177.118943]   jiffies              <104596138>
[292177.118943]   next_to_watch.status <0>
[292177.118943] MAC Status             <80080783>
[292177.118943] PHY Status             <796d>
[292177.118943] PHY 1000BASE-T Status  <3c00>
[292177.118943] PHY Extended Status    <3000>
[292177.118943] PCI Status             <10>
[292181.119093] e1000e 0000:03:00.0 eth1: Detecte

NIC information:

root@OpenWrt:~# lspci | grep Ethernet
02:00.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

root@OpenWrt:~# ethtool -i eth1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.5-0
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
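
Since ASPM is the suspected cause, it can help to check whether ASPM is actually enabled on the NIC's PCIe link before changing anything. The sketch below is a hypothetical helper (aspm_link_state is my own name, not part of any tool) that extracts the ASPM field from the LnkCtl line of lspci -vv output. It assumes the full pciutils lspci is installed, since a BusyBox lspci may not support -vv.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print the ASPM
# state from the LnkCtl line, e.g. "Disabled" or "L0s L1 Enabled".
aspm_link_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Usage (run as root so the capabilities are visible; 03:00.0 is the
# bus address of eth1 shown above):
#   lspci -vv -s 03:00.0 | aspm_link_state
```

If the helper prints nothing, the LnkCtl line was not present in the input, which usually means lspci was run without enough privileges or without -vv.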

Upgrading the driver

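The current driver version is 3.2.6-k, while the latest driver on Intel's site is 3.8.4 (https://www.intel.com/content/www/us/en/download/14611/15817/intel-network-adapter-driver-for-pcie-intel-gigabit-ethernet-network-connections-under-linux.html?). Upgrading the driver, however, means building OpenWRT myself. I use an image prebuilt by eSir, and the learning curve for building my own is fairly steep. There is a report of someone fixing the problem on Ubuntu by replacing the driver (https://gist.github.com/amit177/63c86ee05110091f6fdda4c87a4209d0), and their original driver version matches mine. On the other hand, someone else with the same driver version and the same symptoms upgraded the driver and still had the problem (https://www.codenong.com/cs110631505/).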

ethtool -K eth1 tx off rx off

Some people report fixing the problem by disabling NIC offload (https://www.codenong.com/cs110631505/, https://blog.csdn.net/sxyllxy/article/details/110631505, https://gaomf.cn/2019/07/28/PVE_OpenWRT_Network_Broken/, https://www.dazhuanlan.com/clumsy_girl/topics/1507825), so I tried it. It had no effect.

root@OpenWrt:~# ethtool --show-offload eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

root@OpenWrt:~# ethtool -K eth1 tx off rx off
Actual changes:
rx-checksumming: off
tx-checksumming: off
    tx-checksum-ip-generic: off
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-mangleid-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]

root@OpenWrt:~# ethtool --show-offload eth1
Features for eth1:
rx-checksumming: off
tx-checksumming: off
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: off
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

ethtool -K eth1 gso off gro off tso off

This also disables offload, but turns the features off one by one (https://serverfault.com/a/616623, https://web.archive.org/web/20160205153351/http://ehc.ac:80/p/e1000/bugs/378/, https://bbs.archlinux.org/viewtopic.php?id=162841, https://forum.proxmox.com/threads/eno1-detected-hardware-unit-hang.57025/, https://itniels.com/2019/10/28/proxmox-5x-e1000-driver-hang-fix/, https://bbs.ikuai8.com/forum.php?mod=viewthread&tid=106407&ordertype=2, https://www.right.com.cn/forum/thread-4066580-1-1.html). I tried it; again, no effect.

root@OpenWrt:~# ethtool --show-offload eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

root@OpenWrt:~# ethtool -K eth1 gso off gro off tso off

root@OpenWrt:~# ethtool --show-offload eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
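
The --show-offload dumps above are long, and only the entries without [fixed] can be toggled. As a reading aid, here is a small sketch that filters such a dump down to the features that are currently on and changeable. list_active_offloads is a made-up name, and the parsing assumes the "name: on" line format shown in the dumps above.

```shell
# Hypothetical helper: read `ethtool --show-offload IFACE` output on stdin
# and print the names of features that are "on" and not marked [fixed].
list_active_offloads() {
    sed -n 's/^[[:space:]]*\([a-z0-9_-]*\): on[[:space:]]*$/\1/p'
}

# Usage:
#   ethtool --show-offload eth1 | list_active_offloads
```

Applied to the last dump above, this makes it easy to see that checksumming, scatter-gather, VLAN offload, and receive hashing were still on after the gso/gro/tso command, so neither of the two commands alone disables every changeable offload.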

EEPROM fix, attempt 1

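I first saw this method here (https://forum.openwrt.org/t/network-issues-on-new-openwrt-install-on-x86/74678/9). While reading the README of the driver downloaded from Intel's site, I also found a section under Known Issues/Troubleshooting called "82573(V/L/E) TX Unit Hang Messages". The symptoms look very similar and the NIC model is close. The original text: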

82573(V/L/E) TX Unit Hang Messages
Several adapters with the 82573 chipset display “TX unit hang” messages during
normal operation with the e1000e driver. The issue appears both with TSO enabled
and disabled and is caused by a power management function that is enabled in
the EEPROM. Early releases of the chipsets to vendors had the EEPROM bit that
enabled the feature. After the issue was discovered newer adapters were
released with the feature disabled in the EEPROM.
If you encounter the problem in an adapter, and the chipset is an 82573-based
one, you can verify that your adapter needs the fix by using ethtool:
# ethtool -e eth0
Offset Values
------ ------
0x0000 00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
0x0010 ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83
                                                 ^^
The value at offset 0x001e (de) has bit 0 unset. This enables the problematic
power saving feature. In this case, the EEPROM needs to read “df” at offset
0x001e.
A one-time EEPROM fix is available as a shell script. This script will verify
that the adapter is applicable to the fix and if the fix is needed or not. If
the fix is required, it applies the change to the EEPROM and updates the
checksum. The user must reboot the system after applying the fix if changes
were made to the EEPROM.
Example output of the script:
bash fixeep-82573-dspd.sh eth0
eth0: is a “82573E Gigabit Ethernet Controller”
This fixup is applicable to your hardware
executing command: ethtool -E eth0 magic 0x109a8086 offset 0x1e value 0xdf
Change made. You MUST reboot your machine before changes take effect!
The script can be downloaded at
http://e1000.sourceforge.net/files/fixeep-82573-dspd.sh.

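So I downloaded the script from http://e1000.sourceforge.net/files/fixeep-82573-dspd.sh and read the code. It checks the device ID to decide whether the NIC is an 82573(V/L/E): if not, it refuses to flash; if so, it modifies the EEPROM with the ethtool -E command. The code: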

#!/bin/bash

if [ -z "$1" ]; then
	echo "Usage: $0 \<interface\>"
	echo "       i.e. $0 eth0"
	exit 1
fi

if ! ifconfig $1 > /dev/null; then
	exit 1
fi

dev=$(ethtool -e $1 | grep 0x0010 | awk '{print "0x"$13$12$15$14}')

case $dev in
	0x108b8086)
		echo "$1: is a \"82573V Gigabit Ethernet Controller\""
		;;
	0x108c8086)
		echo "$1: is a \"82573E Gigabit Ethernet Controller\""
		;;
	0x109a8086)
		echo "$1: is a \"82573L Gigabit Ethernet Controller\""
		;;
	*)
		echo "No appropriate hardware found for this fixup"
		exit 1
		;;
esac

echo "This fixup is applicable to your hardware"

var=$(ethtool -e $1 | grep 0x0010 | awk '{print $16}')
new=$(echo ${var:0:1}`echo ${var:1} | tr '02468ace' '13579bdf'`)

if [ ${var:0:1}${var:1} == $new ]; then
	echo "Your eeprom is up to date, no changes were made"
	exit 2
fi

echo "executing command: ethtool -E $1 magic $dev offset 0x1e value 0x$new"
ethtool -E $1 magic $dev offset 0x1e value 0x$new

echo "Change made. You *MUST* reboot your machine before changes take effect!"

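The principle: take the byte at offset 0x1e in the EEPROM and change its rightmost bit (bit 0) from 0 to 1. Knowing this, I could skip the device model check and try it by hand on the command line.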
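
The tr table in the script is equivalent to OR-ing the byte with 0x01. A minimal sketch of the same computation with shell arithmetic follows; set_bit is a hypothetical helper name of mine, not something taken from Intel's script.

```shell
# Hypothetical helper: set one bit in a byte given as two hex digits.
#   $1 = byte as hex without the 0x prefix (e.g. d8)
#   $2 = bit index, 0 being the rightmost bit
set_bit() {
    printf '%02x\n' $(( 0x$1 | (1 << $2) ))
}

# set_bit d8 0 prints d9, the value the 82573 script would write for d8
```

Working with the bit index directly makes it explicit which bit a fix touches, which the tr tables hide.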

The commands used here are (https://blog.csdn.net/yiyeshuanglinzui/article/details/98584028):

ethtool -e|--eeprom-dump devname [raw on|off] [offset N] [length N]
ethtool -E|--change-eeprom devname [magic N] [offset N] [length N] [value N]
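
The fix script locates the byte with grep 0x0010 and a fixed awk column, which depends on the exact dump layout. As an alternative sketch, the hypothetical helper below (eeprom_byte is my own name) picks a byte by offset from an ethtool -e dump on stdin. It assumes each data line starts with a 0x-prefixed offset, optionally followed by a colon, and then space-separated bytes.

```shell
# Hypothetical helper: print the byte at a given offset from an
# `ethtool -e IFACE` dump read on stdin.
#   $1 = byte offset, decimal or 0x-prefixed (e.g. 0x1e)
eeprom_byte() {
    want=$(( $1 ))
    while read -r off bytes; do
        case $off in 0x*) ;; *) continue ;; esac   # skip header lines
        base=$(( ${off%:} ))                       # line start offset
        set -- $bytes                              # split bytes into $1..$n
        idx=$(( want - base ))
        if [ "$idx" -ge 0 ] && [ "$idx" -lt $# ]; then
            shift "$idx"
            printf '%s\n' "$1"
            return 0
        fi
    done
    return 1                                       # offset not in the dump
}

# Usage:
#   ethtool -e eth1 | eeprom_byte 0x1e
```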

The magic argument is composed of the DeviceID followed by the VendorID. The 82574L's DeviceID is 10d3 and Intel's VendorID is 8086, so the magic value is 0x10d38086.
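
Instead of typing the IDs by hand, they can usually be read from sysfs. This is a sketch under the assumption that /sys/class/net/<iface>/device exposes vendor and device files containing 0x-prefixed IDs, as PCI network devices normally do on Linux. magic_from_sysfs is a hypothetical helper name.

```shell
# Hypothetical helper: build the ethtool -E magic value (DeviceID followed
# by VendorID) from the vendor and device files of a PCI device directory.
#   $1 = device directory, e.g. /sys/class/net/eth1/device
magic_from_sysfs() {
    dev_dir=$1
    vendor=$(cat "$dev_dir/vendor") || return 1   # e.g. 0x8086
    device=$(cat "$dev_dir/device") || return 1   # e.g. 0x10d3
    printf '0x%04x%04x\n' "$device" "$vendor"
}

# Usage:
#   magic_from_sysfs /sys/class/net/eth1/device
```

Taking the directory as a parameter keeps the helper independent of the interface name and makes it easy to try against a copy of the two files.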

root@OpenWrt:~# ethtool -e eth1
Offset        Values
------        ------
0x0000:        00 e8 4c 68 7d c4 ff ff ff ff 50 00 ff ff ff ff
0x0010:        ff ff ff ff 6b 02 40 6c 62 14 d3 10 ff ff d8 83
0x0020:        00 00 01 20 74 7e ff ff 00 00 c8 00 00 00 04 27
0x0030:        c9 6c 50 21 3e 07 0b 45 84 2d 40 00 00 f0 06 07
0x0040:        00 60 80 00 04 0f ff 7f 01 4d ec 92 5c fc 83 f0
0x0050:        20 00 83 00 a0 00 1f 7d 61 19 83 01 50 00 ff ff
0x0060:        00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff 88 df
0x0080:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0200:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0210:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0220:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0230:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0240:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0250:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0260:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0270:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0280:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0290:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0300:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0310:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0320:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0330:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0340:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0350:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0360:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0370:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0380:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0390:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
root@OpenWrt:~# ethtool -e eth1 | grep 0x0010 | awk '{print "0x"$13$12$15$14}'
0x10d3ffff
root@OpenWrt:~# ethtool -e eth1 | grep 0x0010 | awk '{print $16}'
d8
root@OpenWrt:~# echo 8 | tr '02468ace' '13579bdf'
9
root@OpenWrt:~# ethtool -E eth1 magic 0x10d38086 offset 0x1e value 0xd9 length 1
root@OpenWrt:~# ethtool -e eth1
Offset        Values
------        ------
0x0000:        00 e8 4c 68 7d c4 ff ff ff ff 50 00 ff ff ff ff
0x0010:        ff ff ff ff 6b 02 40 6c 62 14 d3 10 ff ff d9 83
0x0020:        00 00 01 20 74 7e ff ff 00 00 c8 00 00 00 04 27
0x0030:        c9 6c 50 21 3e 07 0b 45 84 2d 40 00 00 f0 06 07
0x0040:        00 60 80 00 04 0f ff 7f 01 4d ec 92 5c fc 83 f0
0x0050:        20 00 83 00 a0 00 1f 7d 61 19 83 01 50 00 ff ff
0x0060:        00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff 87 df
0x0080:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0200:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0210:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0220:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0230:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0240:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0250:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0260:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0270:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0280:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0290:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0300:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0310:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0320:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0330:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0340:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0350:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0360:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0370:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0380:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0390:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

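The value at 0x1e was originally d8 and is now d9. Since eth1 is passed through to OpenWRT, I simply rebooted OpenWRT after changing the EEPROM, but within a few minutes the link dropped. I quickly flashed it back to d8 and rebooted again, and things returned to normal. It seems this method only applies to the 82573.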
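
Comparing the two dumps, the byte at 0x7e also changed (88 to 87), which matches the README's remark that the checksum is updated along with the fix. For Intel e1000-family EEPROMs, the 16-bit little-endian words 0x00 through 0x3f are expected to sum to 0xBABA. The sketch below recomputes that sum from an ethtool -e dump. nvm_checksum is a hypothetical helper, and the 0xBABA rule comes from my understanding of the e1000e driver, not from the README quoted here.

```shell
# Hypothetical helper: sum the first 64 little-endian 16-bit words of an
# `ethtool -e IFACE` dump read on stdin and print the result modulo 0x10000.
nvm_checksum() {
    sum=0
    words=0
    while read -r off bytes; do
        case $off in 0x*) ;; *) continue ;; esac   # skip header lines
        set -- $bytes                              # split bytes into $1..$n
        while [ $# -ge 2 ] && [ "$words" -lt 64 ]; do
            # bytes are stored low byte first, so the word is $2 then $1
            sum=$(( (sum + 0x$2$1) & 0xffff ))
            words=$(( words + 1 ))
            shift 2
        done
    done
    printf '0x%04x\n' "$sum"
}

# Usage:
#   ethtool -e eth1 | nvm_checksum
```

For the original dump above (d8 at 0x1e, 88 at 0x7e) the words do sum to 0xbaba, and the d9/87 pair after the write keeps the same total.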

I ran into two pitfalls while flashing:

ethtool -E eth1 magic 0x10d3ffff offset 0x1e value 0xd9 
offset & length out of bounds

Appending length 1 to the command fixes this (https://forums.servethehome.com/index.php?threads/patching-intel-x520-eeprom-to-unlock-all-sfp-transceivers.24634/post-324427):

ethtool -E eth1 magic 0x10d3ffff offset 0x1e value 0xd9 length 1
Cannot set EEPROM data: Bad address

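This looks like a wrong device number. The device number is composed of DeviceID + VendorID (8086 == Intel, 1019 == 82547EI Gigabit Ethernet Controller in my example) (http://blog.vodkamelone.de/archives/146-Unbricking-an-Intel-Pro1000-e1000-network-interface.html). The DeviceID I read from the EEPROM was fine, but the VendorID came back as ffff, which caused the error. Since Intel's VendorID is always 8086, I changed the device number to 0x10d38086 and tried again, and this time the write succeeded.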
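
Both pitfalls can be caught before touching the hardware. The sketch below only prints the ethtool -E command instead of running it, always appends length 1, and rejects a magic whose vendor half is ffff. build_eeprom_write is a hypothetical helper, and the ffff check is just a heuristic for the failure seen here, not a general validity test.

```shell
# Hypothetical helper: print (do not run) a single-byte EEPROM write command.
#   $1 = interface, $2 = magic, $3 = offset, $4 = new byte as two hex digits
build_eeprom_write() {
    iface=$1 magic=$2 offset=$3 value=$4
    case $magic in
        0x????[Ff][Ff][Ff][Ff])
            # vendor half read back as ffff: the driver would reject this
            echo "magic $magic has vendor ID ffff, refusing" >&2
            return 1
            ;;
    esac
    printf 'ethtool -E %s magic %s offset %s value 0x%s length 1\n' \
        "$iface" "$magic" "$offset" "$value"
}

# Usage:
#   build_eeprom_write eth1 0x10d38086 0x1e d9
```

Printing the command first leaves a chance to review it, and to save a copy of the current contents with ethtool -e, before anything is written.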

EEPROM fix, attempt 2

Searching further, I found a fix script written specifically for the 82574 (https://support.unitrends.com/hc/en-us/articles/360013179057-Prevent-Intel-82574-NICs-from-going-offline), so I downloaded it from the address given in the article (ftp://ftp.unitrends.com/support/scripts/nic82574.sh). The code:

#!/bin/sh
log=/var/log/nicfix.log

function fix_eeprom()
{
   eth=$1
   if [ -z "$eth" ]; then
      return 1;
   fi
   if ! ifconfig $eth > /dev/null; then
      echo "$eth does not exist" 
      return 1;
   fi
   bdf=$(ethtool -i $eth | grep "bus-info:" | awk '{print $2}')
   dev=$(lspci -s $bdf -x | grep "00: 86 80" | awk '{print "0x"$5$4$3$2}')
   case $dev in
	0x10d38086)
		echo "$eth: is a 82574L Gigabit Network Connection"
		;;
	0x10f68086)
		echo "$eth: is a 82574L Gigabit Network Connection" 
		;;   
	0x150c8086)
		echo "$eth: is a 82583V Gigabit Network Connection" 
		;;
	*)
		echo "No appropriate hardware found for this fixup" 
		return 2
		;;
   esac
   echo "This fixup is applicable to your hardware" 
   var=$(ethtool -e $eth | grep 0x0010 | awk '{print $16}')
   new=$(echo ${var:0:1}`echo ${var:1} | tr '014589bc' '2367abef'`)
   if [ ${var:0:1}${var:1} == $new ]; then
	echo "Your eeprom is up to date, no changes were made" 
	return 0
   fi
   echo "Applying Intel NIC EEPROM fix ..."
   echo "executing: ethtool -E $eth magic $dev offset 0x1e value 0x$new" 
   ethtool -E $1 magic $dev offset 0x1e value 0x$new
   echo "Change made. You must reboot before changes take effect." 
   return 0
}

function ethmatch()
{
   bdf=$(ethtool -i $1 | grep "bus-info:" | awk '{print $2}')
   dev=$(lspci -s $bdf -x | grep "00: 86 80" | awk '{print "0x"$5$4$3$2}')
   case $dev in
        0x10d38086)
                echo "$1: is a 82574L Gigabit Network Connection"
                ret=1
                ;;
        0x10f68086)
                echo "$1: is a 82574L Gigabit Network Connection"
                ret=1
                ;;
        0x150c8086)
                echo "$1: is a 82583V Gigabit Network Connection" 
                ret=1
                ;;
        *)
                echo "$1: is not a match" 
                ret=0
                ;;
   esac
   return $ret
}

lspci |grep Eth |grep -q 82574
i1=$?
lspci |grep Eth |grep -q 82583
i2=$?
if [ $i1 -eq 0 -o $i2 -eq 0 ]; then
   # found matching NICs, set kernel param
   grubby --update-kernel="$( grubby --default-kernel )" --args="pcie_aspm=off"

   # find which eth device matches these NICs
   doeth=eth0
   ethlist=`ifconfig |grep eth |awk '{ print $1 }' |sed -e 's/\n/ /g'`
   echo "$0 started" >$log
   for eth in $ethlist
   do
     echo "check $eth ..."  >>$log
     ethmatch $eth   >>$log
     if  [ $? -eq 1 ]; then
        echo "ethmatch $eth found"  >>$log
	doeth=$eth
	break
     fi
   done
   echo "matching eth = $doeth" >>$log

   # do the EEPROM fixup from Intel
   fix_eeprom $doeth  >>$log
fi

As the code shows, this script specifically targets 82574L and 82583V NICs.

Since this source is not Intel itself, I looked for an official equivalent, and there is one (https://sourceforge.net/projects/e1000/files/e1000e%20historic%20archive/eeprom_fix_82574_or_82583/).

Packet drop issues may occur in some 82574 and 82583-based adapters. Neither the
e1000e driver nor the hardware itself show any packets being dropped, however
packets ARE actually being dropped.
If you encounter packet drop issues in an 82574 or 82583-based adapter, you can
verify that your adapter needs the fix by using ethtool:
# ethtool -e eth0
Offset Values
------ ------
0x0000 00 1b 21 51 39 8c 20 0d 46 f7 a1 10 ff ff ff ff
0x0010 29 e6 02 64 6b 02 00 00 86 80 0c 15 ff ff 58 9c
                                                 ^^
The value at offset 0x001e (58) has bit 1 unset. This enables the problematic
power saving feature. In this case, the EEPROM needs to read “5a” at offset
0x001e.
A one-time EEPROM fix is available as a shell script. This script will verify
that the adapter is applicable to the fix and whether the fix is needed or not.
If the fix is required, it applies the change to the EEPROM and updates the
checksum. The user must reboot the system after applying the fix if changes
were made to the EEPROM.
Example output of the script:
# bash fixeep-82574_83.sh eth0
eth0: is a “82583V Gigabit Network Connection”
This fixup is applicable to your hardware
executing command: ethtool -E eth0 magic 0x150c8086 offset 0x1e value 0x5a
Change made. You MUST reboot your machine before changes take effect!
The script can be downloaded at
[https://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/fixeep-82574_83.sh]

The script differs somewhat from the unofficial one above, but the core is the same:

#!/bin/bash

if [ -z "$1" ]; then
	echo "Usage: $0 \<interface\>"
	echo "	     i.e. $0 eth0"
	exit 1
fi

if ! ifconfig $1 > /dev/null; then
	exit 1
fi

bdf=$(ethtool -i $1 | grep "bus-info:" | awk '{print $2}')
dev=$(lspci -s $bdf -x | grep "00: 86 80" | awk '{print "0x"$5$4$3$2}')

case $dev in
	0x10d38086)
		echo "$1: is a \"82574L Gigabit Network Connection\""
		;;
	0x10f68086)
		echo "$1: is a \"82574L Gigabit Network Connection\""
		;;   
	0x150c8086)
		echo "$1: is a \"82583V Gigabit Network Connection\""
		;;
	*)
		echo "No appropriate hardware found for this fixup"
		exit 1
		;;
esac

echo "This fixup is applicable to your hardware"

var=$(ethtool -e $1 | grep 0x0010 | awk '{print $16}')
new=$(echo ${var:0:1}`echo ${var:1} | tr '014589bc' '2367abef'`)

if [ ${var:0:1}${var:1} == $new ]; then
	echo "Your eeprom is up to date, no changes were made"
	exit 2
fi

echo "executing command: ethtool -E $1 magic $dev offset 0x1e value 0x$new"
ethtool -E $1 magic $dev offset 0x1e value 0x$new

echo "Change made. You *MUST* reboot your machine before changes take effect!"

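Looking closely at the code, there is one difference from the 82573 script that failed earlier.

Before:
new=$(echo ${var:0:1}`echo ${var:1} | tr '02468ace' '13579bdf'`)
Now:
new=$(echo ${var:0:1}`echo ${var:1} | tr '014589bc' '2367abef'`)

The earlier script changes the rightmost bit (bit 0) from 0 to 1, while this one targets the second bit from the right (bit 1). Note that its tr table maps b to e and c to f and leaves d unchanged, so it is an exact "set bit 1" only for the other digits; my value d8 is handled correctly. This appears to be the difference between the 82573 and the 82574, so I tried again: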
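
To check what each tr table really does, it can be compared against a plain bitwise OR for all sixteen hex digits. The helper below (tr_mismatches, a hypothetical name) prints the digits where the two disagree. It is only a verification sketch and does not touch the NIC.

```shell
# Hypothetical helper: list the hex digits for which `tr FROM TO` gives a
# different result than OR-ing the digit with MASK.
#   $1 = tr source set, $2 = tr target set, $3 = mask (e.g. 1 or 2)
tr_mismatches() {
    from=$1 to=$2 mask=$3 out=""
    for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        via_tr=$(printf '%s' "$d" | tr "$from" "$to")
        via_or=$(printf '%x' $(( 0x$d | $mask )))
        [ "$via_tr" = "$via_or" ] || out="$out$d "
    done
    printf '%s\n' "${out% }"
}

# tr_mismatches 02468ace 13579bdf 1   prints an empty line
# tr_mismatches 014589bc 2367abef 2   prints: b c d
```

The 82573 table matches OR 0x1 for every digit. The 82574/82583 table matches OR 0x2 except for b, c and d, so a low nibble of 8, as in d8, is unaffected, while a byte ending in one of those three digits would be worth computing by hand.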

root@OpenWrt:~# ethtool -e eth1
Offset        Values
------        ------
0x0000:        00 e8 4c 68 7d c4 ff ff ff ff 50 00 ff ff ff ff
0x0010:        ff ff ff ff 6b 02 40 6c 62 14 d3 10 ff ff d8 83
0x0020:        00 00 01 20 74 7e ff ff 00 00 c8 00 00 00 04 27
0x0030:        c9 6c 50 21 3e 07 0b 45 84 2d 40 00 00 f0 06 07
0x0040:        00 60 80 00 04 0f ff 7f 01 4d ec 92 5c fc 83 f0
0x0050:        20 00 83 00 a0 00 1f 7d 61 19 83 01 50 00 ff ff
0x0060:        00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff 88 df
0x0080:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0200:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0210:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0220:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0230:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0240:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0250:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0260:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0270:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0280:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0290:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0300:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0310:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0320:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0330:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0340:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0350:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0360:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0370:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0380:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0390:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
root@OpenWrt:~# ethtool -e eth1 | grep 0x0010 | awk '{print "0x"$13$12$15$14}'
0x10d3ffff
root@OpenWrt:~# ethtool -e eth1 | grep 0x0010 | awk '{print $16}'
d8
root@OpenWrt:~# echo 8 | tr '014589bc' '2367abef'
a
root@OpenWrt:~# ethtool -E eth1 magic 0x10d38086 offset 0x1e value 0xda length 1
root@OpenWrt:~# ethtool -e eth1
Offset        Values
------        ------
0x0000:        00 e8 4c 68 7d c4 ff ff ff ff 50 00 ff ff ff ff
0x0010:        ff ff ff ff 6b 02 40 6c 62 14 d3 10 ff ff da 83
0x0020:        00 00 01 20 74 7e ff ff 00 00 c8 00 00 00 04 27
0x0030:        c9 6c 50 21 3e 07 0b 45 84 2d 40 00 00 f0 06 07
0x0040:        00 60 80 00 04 0f ff 7f 01 4d ec 92 5c fc 83 f0
0x0050:        20 00 83 00 a0 00 1f 7d 61 19 83 01 50 00 ff ff
0x0060:        00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff 87 df
0x0080:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0100:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0110:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0120:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0130:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0140:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0150:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0160:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0170:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0180:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0190:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x01f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0200:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0210:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0220:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0230:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0240:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0250:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0260:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0270:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0280:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0290:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x02f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0300:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0310:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0320:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0330:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0340:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0350:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0360:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0370:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0380:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0390:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03a0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03b0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03c0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03d0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03e0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x03f0:        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
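
The two values passed to ethtool -E above (the new byte 0xda and the magic 0x10d38086) can also be computed rather than read off the hex dump. The following is a minimal sketch, assuming the intent of the fix is to set bit 1 (0x02) of EEPROM byte 0x1e and that the magic is the PCI device ID followed by the vendor ID; the function names new_byte and eeprom_magic are made up for illustration. Note that the tr table used above maps b to e and c to f, which is not a plain OR with 0x02 for those two digits; for the digit 8 seen here both approaches give a.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ethtool.

# new_byte CUR -> CUR (two hex digits, no 0x prefix) with bit 1 set
new_byte() {
    printf '%02x\n' $(( 0x$1 | 0x02 ))
}

# eeprom_magic VENDOR DEVICE -> 0x<device><vendor>, the form used with -E above
eeprom_magic() {
    printf '0x%04x%04x\n' $(( $2 )) $(( $1 ))
}

new_byte d8                  # prints da
eeprom_magic 0x8086 0x10d3   # prints 0x10d38086

# On a real system the IDs could be read from sysfs instead of the EEPROM dump:
#   eeprom_magic "$(cat /sys/class/net/eth1/device/vendor)" \
#                "$(cat /sys/class/net/eth1/device/device)"
```

Reading the IDs from sysfs avoids the 0x10d3ffff result shown earlier, where the EEPROM bytes that the awk command picked up as the vendor ID were ff ff.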

After 19 hours of use there have been no problems so far, and the RX errors counter has stayed at 0.

root@OpenWrt:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E8:4C:68:7D:C4
          inet6 addr: fe80::2e8:4cff:fe68:7dc4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:68415745 errors:0 dropped:14056 overruns:0 frame:0
          TX packets:139180940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:23126153714 (21.5 GiB)  TX bytes:155759343593 (145.0 GiB)
          Interrupt:18 Memory:fd4c0000-fd4e0000

OpenWRT has now run stably for 62 days without a reboot, and the RX errors counter is still 0.

root@OpenWrt:/# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E8:4C:68:7D:C4  
          inet6 addr: fe80::2e8:4cff:fe68:7dc4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2537497798 errors:0 dropped:1316658 overruns:0 frame:0
          TX packets:5027265910 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1290535609583 (1.1 TiB)  TX bytes:4870561140560 (4.4 TiB)
          Interrupt:18 Memory:fd4c0000-fd4e0000 

Update: the link still drops occasionally, sometimes reconnecting on its own afterwards, and the errors counter grows by single digits.
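
To catch this kind of slow counter growth without rerunning ifconfig by hand, the counters can be read from sysfs and logged periodically. This is only a sketch: the helper names read_stat and stat_line are hypothetical, and the optional sysfs root argument exists purely so the functions can be tried against a fake directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for logging interface counters.

# read_stat IFACE COUNTER [SYSFS_ROOT] -> value of one statistics counter
read_stat() {
    cat "${3:-/sys/class/net}/$1/statistics/$2"
}

# stat_line IFACE [SYSFS_ROOT] -> "rx_errors=N rx_dropped=N"
stat_line() {
    printf 'rx_errors=%s rx_dropped=%s\n' \
        "$(read_stat "$1" rx_errors "$2")" \
        "$(read_stat "$1" rx_dropped "$2")"
}

# Example use, e.g. from a cron job (the log path is an arbitrary choice):
#   echo "$(date) $(stat_line eth1)" >> /tmp/eth1-stats.log
```

Comparing consecutive log lines shows roughly when the errors counter moved, which can then be matched against the kernel log.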

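After updating OpenWRT, I found that e1000e now uses the in-kernel driver, and the version it reports follows the kernel version. Still under observation.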

https://www.intel.sg/content/www/xa/en/support/articles/000005480/ethernet-products.html

Both the e1000e and e1000 drivers have changed to a kernel-only support model. Thus, the latest e1000e release is 3.8.7 and the latest for e1000 is 8.0.35. In brief, the kernel drivers (drivers included with the Operating System) will be the latest. Bug fixes and changes are made upstream in the Linux kernel.
root@OpenWrt:~# ethtool -i eth1
driver: e1000e
version: 5.15.53
firmware-version: 0.5-0
expansion-rom-version: 
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Checking the ASPM state

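Because the WAN NIC is passed through to OpenWRT, I installed the pciutils package in OpenWRT and ran lspci -vv over SSH to check the NIC's ASPM state. Surprisingly, it showed ASPM as disabled, even though I had never turned it off.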

1b:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
	DeviceName: pciPassthru1
	Physical Slot: 256
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 30
	Region 0: Memory at fd2c0000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at 7000 [size=32]
	Region 3: Memory at fd2fc000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00000  Data: 0025
	Capabilities: [e0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- SlotPowerLimit 0W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x32, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x32
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
	Capabilities: [a0] MSI-X: Enable- Count=3 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000

Adding pcie_aspm=off in GRUB

After adding pcie_aspm=off in GRUB, the longest run lasted forty-odd days before I rebooted manually; apart from that, Reset adapter unexpectedly showed up three more times.
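
When testing a kernel parameter like this, it is worth confirming that it actually reached the running kernel. The sketch below (aspm_param is a hypothetical helper name) extracts the pcie_aspm= value from a command line string such as the contents of /proc/cmdline. Also keep in mind that, according to the kernel's parameter documentation, pcie_aspm=off means the kernel does not touch the ASPM configuration and leaves whatever the firmware set up, so the parameter alone does not switch ASPM off in hardware.

```shell
#!/bin/sh
# Sketch only: aspm_param is a hypothetical helper name.

# aspm_param CMDLINE -> value of pcie_aspm= if present, otherwise "unset"
aspm_param() {
    for word in $1; do
        case "$word" in
            pcie_aspm=*) echo "${word#pcie_aspm=}"; return 0 ;;
        esac
    done
    echo unset
}

# Example use on the running system:
#   aspm_param "$(cat /proc/cmdline)"
```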

Final solution: disable ASPM in the BIOS

In the BIOS, each PCIe port has its own page with an ASPM switch. I turned all of them off, and the problem has not come back since.

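ASPM (Active State Power Management) and ACPI (Advanced Configuration and Power Interface) are two different power-management standards used by computer hardware and operating systems.
ACPI (Advanced Configuration and Power Interface): ACPI is a power-management standard meant to help the operating system manage the power consumption and configuration of hardware devices. It defines the interfaces and data structures used to control device states, power modes, and system resource allocation. ACPI lets the operating system change device power states when appropriate, to save energy and optimize performance. It is also used for system sleep and wake-up and for handling various hardware events.
ASPM (Active State Power Management): ASPM is defined by the PCI Express specification, not by ACPI, and manages the power state of PCIe links. PCIe is a high-speed bus used to connect components such as graphics cards and storage devices. ASPM lets a PCIe link drop into a low-power state (L0s or L1) while it is idle, which reduces system power consumption. ASPM can be configured in both the operating system and the system BIOS, and enabled or disabled as needed.
In short, ACPI is the broader standard that covers the power states and configuration of the whole system, while ASPM deals specifically with power management on PCIe links. The two meet where the firmware uses ACPI to tell the operating system whether it may control ASPM.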
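
At the register level, the ASPM setting of a device is the two low bits of the Link Control register in its PCI Express capability: 00 is disabled, 01 is L0s, 10 is L1, and 11 is both. The sketch below decodes such a value; decode_aspm is a hypothetical helper name, and the setpci call in the comment assumes your pciutils build includes setpci.

```shell
#!/bin/sh
# Sketch only: decode_aspm is a hypothetical helper name.

# decode_aspm LNKCTL -> ASPM Control field (bits 1:0 of Link Control)
decode_aspm() {
    case $(( $1 & 0x3 )) in
        0) echo "Disabled" ;;
        1) echo "L0s" ;;
        2) echo "L1" ;;
        3) echo "L0s L1" ;;
    esac
}

decode_aspm 0x0040   # prints Disabled
decode_aspm 0x0042   # prints L1

# Reading the live register (CAP_EXP+10.w is Link Control; setpci prints hex
# without a 0x prefix, so one is added here):
#   decode_aspm "0x$(setpci -s 1b:00.0 CAP_EXP+10.w)"
```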
