


 XA-10: FreeBSD / pfSense NIC freeze 

Joined: Fri Jan 29, 2016 7:18 pm
Posts: 3
Hi,

We and our customers use the fitlet-XA-10 with pfSense and have had the same problem for a very long time now: the NICs simply hang after a while. We are very experienced with pfSense and have managed pfSense installations on different platforms for many years (most of them on APUs and Supermicro), but we have not been able to solve the problem.

After a while (2 hours to 20 days) one or more of the NICs hang, and the only remedy is a reboot. ifconfig shows the affected link/NIC as not active/down, and the switches show the port as down. The links cannot be restarted (up/down); they are frozen. We tried many "optimizations" and experimented with many settings at OS level, and we disabled all ACPI/ASPM options in the BIOS. Nothing changed.

We see things like this:
Name  Mtu   Network   Address            Ipkts            Ierrs            Idrop            Opkts            Oerrs            Coll
igb2  1500  <Link#3>  00:01:c0:1a:37:87  149834229777726  899005374320220  149834229053370  149834230284128  299668458106740  149834229053370

The LEDs at the NIC port freeze in whatever state they were in: nothing blinks, some are off, some are on. No particular NIC is affected more often than the others; it happens to all of them.

Yes, we did a fresh install on a fitlet with BIOS defaults (pfSense 2.4.3-RELEASE-p1, BIOS SBCFLT_0.08.08). Same result.

I have seen others with similar problems, even on Linux (quote from Amazon: "Every so often, one or more network ports would fail. I ended up running a software watchdog to reboot the machine if it was unable to ping some known-good IPs."). I have also seen that this problem is not uncommon for pfSense or OPNsense installations on fitlet.
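For reference, the kind of software watchdog described in that quote could look roughly like the sketch below. It is only a workaround sketch, not something taken from the quoted review: the target addresses, the thresholds, and the function names are all hypothetical placeholders, and it assumes the system `ping` and `shutdown` commands are available.

```python
import subprocess
import time

# Hypothetical settings; replace with addresses that are known-good on your network.
TARGETS = ["192.0.2.1", "198.51.100.1"]
MAX_FAILED_ROUNDS = 3      # consecutive failed rounds before rebooting
INTERVAL_SECONDS = 60      # pause between rounds


def ping_ok(host):
    """Return True if one ICMP echo to host succeeds within 10 seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def next_failure_count(previous, reachable):
    """Reset the counter when any target answers, otherwise increment it."""
    return 0 if any(reachable) else previous + 1


def watchdog(probe=ping_ok, reboot=None, sleep=time.sleep):
    """Probe all targets in rounds; reboot after MAX_FAILED_ROUNDS failed rounds in a row."""
    if reboot is None:
        reboot = lambda: subprocess.run(["shutdown", "-r", "now"])
    failures = 0
    while failures < MAX_FAILED_ROUNDS:
        failures = next_failure_count(failures, [probe(t) for t in TARGETS])
        if failures < MAX_FAILED_ROUNDS:
            sleep(INTERVAL_SECONDS)
    reboot()
```

Calling `watchdog()` as root starts the loop. Requiring every target to fail for several rounds in a row is a deliberate choice here, so that a single unreachable host or a short outage does not trigger a reboot.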

But the issue does not seem to be specific to the XA-10; it seems to be related to ASPM being enabled on the Intel I211. There are more than a few discussions and very good "debugging" sessions suggesting that ASPM on the I211 is the cause. I have heard that some vendors using the I211 ship BIOS versions with ASPM disabled on the NICs, exactly because of this problem. I also talked to tech people from Supermicro, who confirmed that there seems to be a problem with the I211 (and said that the I211 is not the only one affected).

We disabled ASPM in BIOS but still get this:
[root@ipfw02 ~]# pciconf -lvc igb2
igb2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM L0s/L1(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 1 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 0001c0ffff1a3787
    ecap 0017[1a0] = TPH Requester 1
If ASPM were fully disabled, the output would not mention ASPM (as verified on another machine).
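To check this on many machines at once, the ASPM field can be pulled out of the `pciconf -lvc` output with a small script. The sketch below assumes the field reads as current state followed by the supported states in parentheses, in the same way the link width is printed as `x1(x1)`; that reading is an assumption and should be checked against the pciconf documentation. The function names are made up for illustration.

```python
import re

# Matches the ASPM field on the link line, e.g. "ASPM L0s/L1(L0s/L1)".
# Assumed layout: <current state>(<supported states>).
ASPM_RE = re.compile(r"ASPM\s+(?P<current>[^\s(]+)\((?P<supported>[^)]+)\)")


def aspm_state(pciconf_output):
    """Return (current, supported) ASPM strings, or None if no ASPM field is printed."""
    match = ASPM_RE.search(pciconf_output)
    if match is None:
        return None
    return match.group("current"), match.group("supported")


def aspm_active(pciconf_output):
    """True if an ASPM field is present and its first value is not 'disabled'."""
    state = aspm_state(pciconf_output)
    return state is not None and state[0].lower() != "disabled"


sample = "link x1(x1) speed 2.5(2.5) ASPM L0s/L1(L0s/L1)"
print(aspm_state(sample))   # ('L0s/L1', 'L0s/L1')
print(aspm_active(sample))  # True
```

Feeding it the full output of `pciconf -lvc igb2` works the same way, since the regular expression only looks for the ASPM field.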

On all pfSense installations we configured the LAN side as a LAGG, because normally only one link fails at a time, and all WAN links are always redundant. But this is not how it should work: firewalls and their hardware should run for years without needing a reboot (except for upgrades).

Some say they have been running pfSense on fitlet for several hundred days without problems. Are they using them in a production environment? What is the difference? Why do all the fitlet installations that we manage behave the same way (and freeze)?

Does anyone have an idea what we can do?
Does anybody know how to disable ASPM at NIC level?
Or maybe there is a new/other NIC firmware?
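On the question of disabling ASPM at NIC level: in the PCI Express capability, the Link Control register sits at offset 0x10 and its two lowest bits are the ASPM control field. Since the listing above shows the capability at `a0`, the register would be at 0xb0. The sketch below only computes the address and the value with those bits cleared and prints matching `pciconf -r` / `pciconf -w` command lines; it does not touch the hardware. It is untested on the fitlet, the example reading 0x0043 is made up, and whether clearing the bits on the NIC alone is enough (rather than also on the upstream port) is an open assumption.

```python
LINK_CONTROL_OFFSET = 0x10   # Link Control register, relative to the PCIe capability
ASPM_CONTROL_MASK = 0x0003   # bits 1:0: 00 = disabled, 01 = L0s, 10 = L1, 11 = both


def link_control_address(pcie_cap_offset):
    """Config-space address of the Link Control register."""
    return pcie_cap_offset + LINK_CONTROL_OFFSET


def clear_aspm(link_control_value):
    """Return the 16-bit register value with the ASPM control bits cleared."""
    return link_control_value & ~ASPM_CONTROL_MASK & 0xFFFF


def pciconf_commands(device, pcie_cap_offset, current_value):
    """Build the read command and the write command that clears ASPM."""
    addr = link_control_address(pcie_cap_offset)
    read_cmd = f"pciconf -r -h {device} {addr:#x}"
    write_cmd = f"pciconf -w -h {device} {addr:#x} {clear_aspm(current_value):#06x}"
    return read_cmd, write_cmd


# 0x0043 is a hypothetical register reading, not a value from this thread.
for cmd in pciconf_commands("igb2", 0xA0, 0x0043):
    print(cmd)
```

In practice the value passed in would be whatever the read command returns on the affected machine. A register write like this does not survive a reboot, so it would have to be repeated at boot time.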

If we cannot fix this, we will have to switch to other hardware and recommend that our customers do the same (which would be unfortunate, because we recommended fitlet to them).

Thank you very much for any help!

-lutzn


Thu Jun 28, 2018 10:07 am
Site Admin

Joined: Mon Dec 25, 2017 4:21 pm
Posts: 138
Re: XA-10: FreeBSD / pfSense NIC freeze
Compulab provides full support for Linux Mint only.
Nevertheless, we will try to suggest possible solutions, but we cannot guarantee proper functionality on FreeBSD.

We have forwarded the information to R&D for advice.

Thank you.


Mon Jul 02, 2018 10:23 am

Joined: Sun Apr 26, 2009 3:24 pm
Posts: 295
Re: XA-10: FreeBSD / pfSense NIC freeze
Hi lutzn,

We are not aware of widespread network port failures on Linux/Windows 7/Windows 10 machines. Possibly it is something specific to FreeBSD-family kernels.

Please clarify what exactly hangs. Do you see related kernel messages? Do you see the device in the PCI listing after the hang? Are you able to dump its PCI space after the hang?

Can you please test and confirm the same behaviour on a mainstream Linux distro, for example Linux Mint 19 or Ubuntu 18.04?

If you could share a scenario that reproduces the problem quickly, it would help us a lot.

Thanks,
Denis

_________________
Compulab's Linux support


Tue Jul 10, 2018 6:41 am