Me too! I’m holding my breath because it has lasted overnight before, but we are certainly beyond the longest I’ve ever been able to keep it up.
I will ping back if it goes down. If it doesn’t, then I will start playing with the transceiver to see if it’s just bad. I have a few, so I can just swap it. Maybe it’s some kind of strange incompatibility.
Thanks for the diligence!
I concur about not calling it resolved, or even “safely improved”, until we have higher uptime. Based on the overall report, I would say 36 hours is a solid soft metric, with Monday as the hard metric, maybe?
I’m still trying to find the actual root of the issue, but if it is what I think it is, it’s very much timing related. That is, there may only be brief periods, measured in minutes, in which I would be able to confirm my theory.
Yeah, that sounds great to me. Both would be longer than any amount of sustained workload I’ve been able to manage.
If you need me to try anything if it goes down again, I’m more than willing to help you get whatever you need!
Just passed the 36 hour mark.
I see! I’m excited. Now, what do you think? Is this a product of the super secret changes? Or the move to WAN1?
I see the DNS servers are manually assigned in LAN management. I’m not sure what else you may have done, as I’ve not really looked.
Do you think we should move in a certain direction to see what’s causing the failure?
The DNS change was the change I referenced earlier.
I think the key here was switching all the Alta gear to DHCP. After doing so, I checked the ARP tables across the Route10 and the S8. It showed addresses that were assigned to Alta gear long after the cache should’ve timed out.
The Route10 showed an IP assigned to the S8. The S8 showed a different IP assigned to an AP. In both cases, the MAC portion of ARP showed <incomplete>. They persisted for well over 15 minutes. I cleared the ARP cache to see if they would show back up and they haven’t thus far.
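For anyone who wants to repeat that check, here is a rough sketch of how the unresolved entries could be picked out of a saved ARP dump. It assumes the BSD-style text that `arp -a` prints on many Linux-based devices; whether the Route10 or S8 shell produces exactly this format is an assumption. The sample addresses, MAC, and `br-lan` interface name are made up for illustration, and `find_incomplete` is a hypothetical helper name.

```python
import re

# Assumed line format (BSD-style `arp -a` output):
#   ? (192.168.1.10) at aa:bb:cc:dd:ee:ff [ether] on br-lan
#   ? (192.168.1.23) at <incomplete> on br-lan
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+).* on (?P<iface>\S+)")


def find_incomplete(arp_output: str) -> list[tuple[str, str]]:
    """Return (ip, interface) pairs whose MAC address never resolved."""
    unresolved = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("mac") == "<incomplete>":
            unresolved.append((match.group("ip"), match.group("iface")))
    return unresolved


# Illustrative sample only; not taken from the devices in this thread.
SAMPLE = """\
? (192.168.1.10) at aa:bb:cc:dd:ee:ff [ether] on br-lan
? (192.168.1.23) at <incomplete> on br-lan
"""

print(find_incomplete(SAMPLE))
```

Running it against dumps taken a few minutes apart would show whether the same incomplete entries keep surviving past the expected cache timeout.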
For now, I’ve reverted the DNS change to make sure that was a benign change. Other than that I’d recommend we let it sit until we get to that hard metric of Monday. If there are no anomalous issues between now and then, we can be confident that it’s the IP issue. Then we can start re-assigning static IPs, one device per day and see if the issue comes back again. On paper, it appeared to be an IP conflict, but I would’ve expected to see a complete ARP entry in that event; very peculiar situation.