I will leave that one for the Alta team to answer. I know it is being worked on.
Yes. We've already fixed several cases where changes cause unexpected drops in connectivity, and we'll always work to resolve these where possible. Obviously some changes are more disruptive than others, but modifying client information typically shouldn't be one of those cases.
The exposed setting in Control is for switches. We don't actually run STP on the APs at all; the kernel messages are indicating a state change, though (which could mean blocking, or the interface flapping).
Were you intentionally making any changes at the time in question, or is this a seemingly random port state change? Will the gateway reply to ICMP? How often does this happen? Is there any known activity or device that triggers it?
No changes were being made. It happens randomly, sometimes early in the morning. I believe what is happening is some clients are connecting to both APs, maybe preparing to roam from one to another, and the switch is seeing this as a potential loop.
I do not allow ICMP traffic to the gateway. The gateway is a Palo Alto PA-440.
Before I added "spanning-tree bpdufilter enable" to the switchport interfaces, it happened multiple times a day. This morning there appeared to be a client jumping back and forth between the two access points, and this was logged on the switch:
Aug 27 06:40:43 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 8a73.d6e8.5453 in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Each port connects to an AP6-PRO.
Not that I'm aware of.
Thanks for the information!
By the way, about ICMP: this only matters if meshing is in use, or if Always On is disabled (the default is enabled). ICMP to the gateway is part of the health check. Just to explain why I asked about it.
Have you tried enabling PortFast on the two AP interfaces? I'm curious if it would help here.
This guide may help with that: Catalyst 3560 Switch Software Configuration Guide, Rel. 12.2(25)SE - Configuring Optional Spanning-Tree Features [Cisco Catalyst 3560 Series Switches] - Cisco
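If it's useful, here is a rough sketch of what that could look like, assuming the AP-facing ports are Gi1/0/5 and Gi1/0/44 as in your log and that they are configured as trunks (drop the trunk keyword on access ports):
conf t
! example only: enable PortFast edge behaviour on the trunk ports facing the APs
interface GigabitEthernet1/0/5
 spanning-tree portfast trunk
interface GigabitEthernet1/0/44
 spanning-tree portfast trunk
end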
BPDUs generally shouldn't be passed unless you were using meshing, but you mention that's disabled, so it shouldn't be relevant here.
So while you can see a client STA on more than one AP at a time, with roaming that's a pre-authorization; once the actual full authentication process is complete, it would only be associated with one AP. So it's not fully connected to both, and shouldn't be causing a loop.
The frequent flapping between APs could be an issue for that specific device, at least. Do you have other known devices behaving like this, or is it just this device? If it's just this device, how does the overlap between the two APs look in the area where it's typically used?
EDIT: updated the link to point to the correct article
Thanks for sharing. I also just confirmed mesh is disabled and Always On is enabled for both access points.
I did add "spanning-tree portfast trunk" and it did not seem to help.
I added "spanning-tree bpdufilter enable" directly on the interfaces to stop them from participating in spanning-tree. Since doing this, the access points have remained online.
I'm not entirely sure. I do feel I'm seeing this issue more with Apple devices than anything else. I'd need to reset the interfaces to the baseline I posted in my initial message and then reshare the access point logs. I don't mind doing that, but I need help from support with reviewing the access point logs. The log below is from a client, and the client is an iPhone; a command to check where that MAC is currently learned on the switch is sketched after the log.
Aug 27 17:15:55 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:11 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:43 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:18:52 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/5 and port Gi1/0/44
Aug 27 17:19:08 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:19:42 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/5 and port Gi1/0/44
Aug 27 17:19:50 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:20:06 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:20:31 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:22:47 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
Aug 27 17:23:01 CDT: %SW_MATM-4-MACFLAP_NOTIF: Host 9a4b.7148.6a7f in vlan 30 is flapping between port Gi1/0/44 and port Gi1/0/5
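For reference, checking which port the switch currently has that MAC learned on would be something along these lines (MAC taken from the log above):
3650-CORE#show mac address-table address 9a4b.7148.6a7f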
With my interface configurations as shown below, the access points have been stable and only go offline after I apply an update in the cloud controller, such as changing the device name, icon, etc. I did receive a power alert a few times within a week, which caused the access points to go offline and reboot. The switch is capable of 775.0 watts, and only 169.4 watts are being used. A quick bit of research suggested this is more of a Cisco-side issue. I enabled LLDP on the switch (a sketch of the relevant commands follows the output below) and will monitor it. I'm unsure if the access points use LLDP to communicate their power needs, but I figured it was worth a shot.
interface GigabitEthernet1/0/5
description Family Room AP6-PRO
switchport trunk native vlan 666
switchport trunk allowed vlan 10,20,30,100
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
spanning-tree bpdufilter enable
end
interface GigabitEthernet1/0/44
description Garage AP6-PRO
switchport trunk native vlan 666
switchport trunk allowed vlan 10,20,30,100
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
spanning-tree bpdufilter enable
end
3650-CORE#sh lldp neighbors
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
Garage Gi1/0/44 120 B,W,R bcb9.2305.d870
FamilyRoom Gi1/0/5 120 B,W,R bcb9.2305.d648
Total entries displayed: 2
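For completeness, LLDP was enabled globally and the PoE draw checked with roughly the following (a sketch; exact output varies by platform and IOS version):
conf t
! enable LLDP globally so the switch and the APs can exchange TLVs
lldp run
end
3650-CORE#show power inline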
So to my understanding, PortFast and BPDU Filter are typically used hand in hand. I believe BPDU Filter can also be applied automatically to PortFast-enabled interfaces when the global "spanning-tree portfast bpdufilter default" command is configured. The main difference, and I probably don't need to say this, is that PortFast keeps the interface in a Forwarding state. My thought process was to add this to see if it helped mitigate the unexpected interface state changes being reported from the AP and/or improve client connectivity.
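For reference, the two forms I'm talking about would be configured something like this (per-interface filtering drops all BPDUs unconditionally, while the global default form only filters on PortFast-operational ports and backs off if a BPDU arrives):
conf t
! global form: filter BPDUs only on interfaces that are PortFast-operational
spanning-tree portfast bpdufilter default
! per-interface form: unconditionally stop sending and receiving BPDUs on this port
interface GigabitEthernet1/0/5
 spanning-tree bpdufilter enable
end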
Glad to hear things are generally stable.
We do use LLDP-MED to exchange PoE info. The info should be reviewable in the power-via-MDI
part of the TLVs. Please let us know if this helps or if the issue persists.
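On the Catalyst side, something like the following should show the power information carried in the LLDP TLVs (a sketch; field names vary by IOS version, and FamilyRoom is just the neighbor name from your earlier output):
3650-CORE#show lldp neighbors detail
3650-CORE#show lldp entry FamilyRoom
3650-CORE#show power inline GigabitEthernet1/0/5 detail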
Okay, well that can make it easier. It's likely just some finessing of the config that's required. Did you end up re-enabling Fast Roaming? If so, are there certain areas where you notice it being worse than others? My first guess would be that power levels need some fine tuning for the environment, but that isn't necessarily accurate either. I'm just guessing that there may be locations with an overlap that makes it more favourable to flap between APs.
Leveraging the iPhone Wi-Fi diagnostic profile would likely be very beneficial during the fine-tuning stage, and an iPhone is probably a better client to use for that, initially at least (or another mobile). The profile can be used to see how the phone sees the network and when it roams, to better understand the overlap and why it's making the decisions it does.
The full URL for the profile and instructions on Apple's developer portal are HERE (including a search to narrow results). The direct link to the profile you need is HERE. Full instructions HERE.
If you read the instructions, steps 1-3 under Enable Logging are really all that matter for what we're doing here. Download and install the profile on the device, then go to Settings; there should be a profile-installed prompt just below the iCloud account info and above Airplane Mode, and when you follow that it will ask for the device passcode to install. After that you can get to diagnostics by going to Settings > Wi-Fi, tapping the i on the right beside the SSID (your device must be connected to the SSID), then tapping Diagnostics. Near the top it will show the connected BSSID, channel, and received signal level, and it updates frequently enough to be useful.
If it would be helpful to discuss some of the nuances of fine-tuning the mobility domain over the phone, I'm more than happy to do so. Please just let me know; happy to continue the discussion here, too.
@Joe I'd be curious, now that you've enabled LLDP for the power side of things, whether things remain stable if you remove the bpdufilter enable command from the interfaces. On the Cisco side, if a PortFast-enabled interface receives a BPDU it will transition out of PortFast. Bpdufilter ignores all BPDUs, so you might just be masking a downstream problem. It's worth making sure the AP is not bridging frames back to the switch that it shouldn't be.
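If you do try it, here is a rough sketch of removing the filter and then checking whether the ports are actually seeing BPDUs (interface names from your posted config):
conf t
! remove the per-interface BPDU filter so BPDUs are no longer silently dropped
interface GigabitEthernet1/0/5
 no spanning-tree bpdufilter enable
interface GigabitEthernet1/0/44
 no spanning-tree bpdufilter enable
end
3650-CORE#show spanning-tree interface GigabitEthernet1/0/5 detail
The detail output includes BPDU sent/received counters for the port, which should make it clear whether the AP is sourcing BPDUs toward the switch.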
Good idea. LLDP can help with spanning-tree and VLAN tagging. I've removed "spanning-tree bpdufilter" and I'll see how it goes.
Thank you for the in-depth information. I believe the issue with devices jumping back and forth is due to too much overlap. Where my children place their iPhones in the evening is right in the middle of the access points. Interestingly, after enabling LLDP, I'm no longer seeing the devices flapping in the logs.
No problem! Iām really glad to hear that it has improved. Interesting that enabling LLDP has helped with the flapping messages.
When it comes to overlap, a difference of 1 dB could be enough to help prevent the devices from flapping, assuming that doesn't cause issues with devices elsewhere in the cell. If it's helpful, HERE is Apple's guide on roaming design considerations. Their trigger threshold is typically -75 dBm, but the details are all there.
Close to 24 hours with no issues after enabling LLDP.
Still no issues. I believe enabling LLDP was the key here.