I will leave that one for the Alta team to answer. I know it is being worked on.
Yes. We've already fixed several cases where changes cause unexpected drops in connectivity, and we'll always work to resolve these where possible. Obviously some changes are more disruptive than others, but modifying client information typically shouldn't be one of those cases.
The exposed setting in Control is for switches. We don't actually run STP on the APs at all; the kernel messages are indicating a state change, though (which could mean blocking, or the interface flapping).
Were you intentionally making any changes at the time in question, or is this a seemingly random port state change? Will the gateway reply to ICMP? How often does this happen? Is there any known activity or device that triggers it?
No changes were being made. It happens randomly, sometimes early in the morning. I believe what is happening is some clients are connecting to both APs, maybe preparing to roam from one to another, and the switch is seeing this as a potential loop.
I do not allow ICMP traffic to the gateway. The gateway is a Palo Alto PA-440.
Before I added "spanning-tree bpdufilter enable" to the switchport interfaces, it happened multiple times a day. This morning there appeared to be a client jumping back and forth between the two access points, and this was logged on the switch:
Aug 27 06:40:43 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 8a73.d6e8.5453 in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Each port connects to an AP6-PRO.
Not that I'm aware of.
Thanks for the information!
By the way, about ICMP: this only matters if meshing is in use, or if Always On is disabled (the default is enabled). ICMP to the gateway is part of the health check. Just to explain why I asked about it.
Have you tried enabling PortFast on the two AP interfaces? I'm curious if it would help here.
This guide may help with that: Catalyst 3560 Switch Software Configuration Guide, Rel. 12.2(25)SE - Configuring Optional Spanning-Tree Features [Cisco Catalyst 3560 Series Switches] - Cisco
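If it's useful, here is a rough sketch of what that could look like, assuming the AP-facing ports are Gi1/0/5 and Gi1/0/44 as in your log and that they are configured as trunks (drop the trunk keyword on access ports):
conf t
! example only: enable PortFast edge behaviour on the trunk ports facing the APs
interface GigabitEthernet1/0/5
 spanning-tree portfast trunk
interface GigabitEthernet1/0/44
 spanning-tree portfast trunk
end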
BPDUs generally shouldn't be passed unless you were using meshing, but you mention that's disabled, so it shouldn't be relevant here.
So while you can see a client STA on more than one AP at a time, with roaming that's a pre-authorization; once the actual full authentication process is complete, it would only be associated with one AP. So it's not fully connected to both, and shouldn't be causing a loop.
The frequent flapping between APs could be an issue for that specific device, at least. Do you have other known devices behaving like this, or is it just this device? If it's just this device, how does the overlap between the two APs look in the area where it's typically used?
EDIT: updated the link to point to the correct article
Thanks for sharing. I also just confirmed mesh is disabled and Always On is enabled for both access points.
I did add "spanning-tree portfast trunk" and it did not seem to help.
I added "spanning-tree bpdufilter enable" directly on the interfaces to stop them from participating in spanning-tree. Since doing this, the access points have remained online.
I'm not entirely sure. I do feel I'm seeing this issue more with Apple devices than anything else. I'd need to reset the interfaces to the baseline I posted in my initial message and then reshare the access point logs. I don't mind doing that, but I need help from support with reviewing the access point logs. The log below is from a client, and the client is an iPhone; a command to check where that MAC is currently learned on the switch is sketched after the log.
Aug 27 17:15:55 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:11 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:43 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:52 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/5 and port Gi1/0/44
Aug 27 17:19:08 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:19:42 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/5 and port Gi1/0/44
Aug 27 17:19:50 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:20:06 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:20:31 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:22:47 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:23:01 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
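For reference, checking which port the switch currently has that MAC learned on would be something along these lines (MAC taken from the log above):
3650-CORE#show mac address-table address 9a4b.7148.6a7f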
With my interface configurations as shown below, the access points have been stable and only go offline after I apply an update in the cloud controller, such as changing the device name, icon, etc. I did receive a power alert a few times within a week, which caused the access points to go offline and reboot. The switch is capable of 775.0 watts, and only 169.4 watts are being used. A quick bit of research suggested this is more of a Cisco-side issue. I enabled LLDP on the switch (a sketch of the relevant commands follows the output below) and will monitor it. I'm unsure if the access points use LLDP to communicate their power needs, but I figured it was worth a shot.
interface GigabitEthernet1/0/5
description Family Room AP6-PRO
switchport trunk native vlan 666
switchport trunk allowed vlan 10,20,30,100
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
spanning-tree bpdufilter enable
end
interface GigabitEthernet1/0/44
description Garage AP6-PRO
switchport trunk native vlan 666
switchport trunk allowed vlan 10,20,30,100
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
spanning-tree bpdufilter enable
end
3650-CORE#sh lldp neighbors
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
Garage Gi1/0/44 120 B,W,R bcb9.2305.d870
FamilyRoom Gi1/0/5 120 B,W,R bcb9.2305.d648
Total entries displayed: 2
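For completeness, LLDP was enabled globally and the PoE draw checked with roughly the following (a sketch; exact output varies by platform and IOS version):
conf t
! enable LLDP globally so the switch and the APs can exchange TLVs
lldp run
end
3650-CORE#show power inline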
So to my understanding, PortFast and BPDU Filter are typically used hand in hand. I believe BPDU Filter can also be applied automatically to PortFast-enabled interfaces when the global "spanning-tree portfast bpdufilter default" command is configured. The main difference, and I probably don't need to say this, is that PortFast keeps the interface in a Forwarding state. My thought process was to add this to see if it helped mitigate the unexpected interface state changes being reported from the AP and/or improve client connectivity.
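For reference, the two forms I'm talking about would be configured something like this (per-interface filtering drops all BPDUs unconditionally, while the global default form only filters on PortFast-operational ports and backs off if a BPDU arrives):
conf t
! global form: filter BPDUs only on interfaces that are PortFast-operational
spanning-tree portfast bpdufilter default
! per-interface form: unconditionally stop sending and receiving BPDUs on this port
interface GigabitEthernet1/0/5
 spanning-tree bpdufilter enable
end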
Glad to hear things are generally stable.
We do use LLDP-MED to exchange PoE info. The info should be reviewable in the power-via-MDI
part of the TLVs. Please let us know if this helps or if the issue persists.
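On the Catalyst side, something like the following should show the power information carried in the LLDP TLVs (a sketch; field names vary by IOS version, and FamilyRoom is just the neighbor name from your earlier output):
3650-CORE#show lldp neighbors detail
3650-CORE#show lldp entry FamilyRoom
3650-CORE#show power inline GigabitEthernet1/0/5 detail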
Okay, well that can make it easier. It's likely just some finessing of the config that's required. Did you end up re-enabling Fast Roaming? If so, are there certain areas where you notice it being worse than others? My first guess would be that power levels need some fine tuning for the environment, but that isn't necessarily accurate either. I'm just guessing that there may be locations with an overlap that makes it more favourable to flap between APs.
Leveraging the iPhone Wi-Fi diagnostic profile would likely be very beneficial during the fine-tuning stage, and an iPhone is probably a better client to use for that, initially at least (or another mobile). The profile can be used to see how the phone sees the network and when it roams, to better understand the overlap and why it's making the decisions it does.
The full URL for the profile and instructions on Apple's developer portal are HERE (including a search to narrow results). The direct link to the profile you need is HERE. Full instructions HERE.
If you read the instructions, steps 1-3 under Enable Logging are really all that matter for what we're doing here. Download and install the profile on the device, then go to Settings; there should be a profile-installed prompt just below the iCloud account info and above Airplane Mode, and when you follow that it will ask for the device passcode to install. After that you can get to diagnostics by going to Settings > Wi-Fi, tapping the i on the right beside the SSID (your device must be connected to the SSID), then tapping Diagnostics. Near the top it will show the connected BSSID, channel, and received signal level, and it updates frequently enough to be useful.
If it would be helpful to discuss some of the nuances of fine-tuning the mobility domain over the phone, I'm more than happy to do so. Please just let me know; happy to continue the discussion here, too.
@Joe I'd be curious, now that you've enabled LLDP for the power side of things, whether things remain stable if you remove the bpdufilter enable command from the interfaces. On the Cisco side, if a PortFast-enabled interface receives a BPDU it will transition out of PortFast. Bpdufilter ignores all BPDUs, so you might just be masking a downstream problem. It's worth making sure the AP is not bridging frames back to the switch that it shouldn't be.
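If you do try it, here is a rough sketch of removing the filter and then checking whether the ports are actually seeing BPDUs (interface names from your posted config):
conf t
! remove the per-interface BPDU filter so BPDUs are no longer silently dropped
interface GigabitEthernet1/0/5
 no spanning-tree bpdufilter enable
interface GigabitEthernet1/0/44
 no spanning-tree bpdufilter enable
end
3650-CORE#show spanning-tree interface GigabitEthernet1/0/5 detail
The detail output includes BPDU sent/received counters for the port, which should make it clear whether the AP is sourcing BPDUs toward the switch.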
Good idea. LLDP can help with spanning-tree and VLAN tagging. I've removed "spanning-tree bpdufilter" and I'll see how it goes.
Thank you for the in-depth information. I believe the issue with devices jumping back and forth is due to too much overlap. Where my children place their iPhones in the evening is right in the middle of the access points. Interestingly, after enabling LLDP, I'm no longer seeing the devices flapping in the logs.
No problem! Iām really glad to hear that it has improved. Interesting that enabling LLDP has helped with the flapping messages.
When it comes to overlap, a difference of 1 dB could be enough to help prevent the devices from flapping, assuming that doesn't cause issues with devices elsewhere in the cell. If it's helpful, HERE is Apple's guide on roaming design considerations. Their trigger threshold is typically -75 dBm, but the details are all there.
Close to 24 hours with no issues after enabling LLDP.
Still no issues. I believe enabling LLDP was the key here.