Odd behavior requiring reboot every 24 hours

donsandro · October 28, 2024, 10:26am

Hi all,

I’ve noticed some interesting behavior recently. There have been no other network changes apart from introducing the Route10, which is not really a small change.

In the past 25 hours I’ve had to unplug the Route10 and plug it back in a few times.
What I see is this: a device on the network (phone or laptop) works fine, both locally and to the internet, and then suddenly it can’t reach the cloud controller or anything else on the internet. The local network keeps working, and as you can see from the image, the Pi-hole sees excessive traffic at those times. The spikes mark where internet egress disappears. The image shows 2 events and is missing the one from 5am the previous morning.
I’ve tried unplugging the cable modem, waiting, and plugging it back in, with no joy. Once the Route10 is unplugged for about 30 seconds and plugged back in, however, everything springs back to life.
Any thoughts on what I can check, or which logs I can pull?

Thank you in advance,
Sandro

Alta-Jeff · October 28, 2024, 3:24pm

@donsandro We aren’t seeing anything like this in any of our many testing sites. When the problem occurs, you’ll need to document what exactly doesn’t work, whether that is:

DNS lookups
Reachability of your ISP’s gateway (run “arp -an” and “ip r” on the Route10)
Routability of the Route10 itself (ping 1.1.1.1 from a LAN device, and then from the Route10 itself)
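As a rough way to turn the checks above into a single verdict, here is a minimal Python sketch that takes the manual observations (gateway present in `arp -an` / `ip r`, `ping 1.1.1.1`, a DNS lookup, and whether HTTP/app traffic works) and names the lowest layer that fails. The `Observations` class, the layer names, and the bottom-up ordering are hypothetical illustrations, not an Alta Labs procedure.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    gateway_ok: bool  # ISP gateway appears in `arp -an` and as the default route in `ip r`
    icmp_ok: bool     # `ping 1.1.1.1` succeeds (from the LAN device and/or the Route10)
    dns_ok: bool      # name lookups succeed
    app_ok: bool      # HTTP/app traffic to the internet works


def lowest_failing_layer(obs: Observations) -> str:
    """Check layers bottom-up and return the first one that fails, or 'none'."""
    checks = [
        ("gateway", obs.gateway_ok),
        ("icmp", obs.icmp_ok),
        ("dns", obs.dns_ok),
        ("app", obs.app_ok),
    ]
    for name, ok in checks:
        if not ok:
            return name
    return "none"
```

Checking bottom-up means a gateway or routing failure is reported first, since it would also break everything above it. With DNS and ICMP working but web traffic failing, the sketch reports `"app"`.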

If you can grab logs from the Route10 (/var/log/messages, or via a remote logging server), that would also help us understand more.
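For the remote-logging option, a very small UDP syslog receiver is enough to keep a copy of the router’s messages off-box. The Python sketch below is an assumption-laden example, not a supported tool: the port `5514` (an unprivileged alternative to the standard 514) and the file name `route10-remote.log` are hypothetical, and it assumes the Route10 can be pointed at a remote UDP syslog host.

```python
import re
import socketserver
from datetime import datetime

LOG_PATH = "route10-remote.log"  # hypothetical output file
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str):
    """Split a '<PRI>' syslog prefix into (facility, severity, rest); (None, None, msg) if absent."""
    m = PRI_RE.match(message)
    if not m:
        return None, None, message
    pri = int(m.group(1))
    return pri // 8, pri % 8, message[m.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _sock = self.request  # for UDP servers the request is (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        _facility, severity, rest = parse_pri(text)
        # Stamp with the receiver's own clock so entries stay comparable
        # even if the sender's clock is wrong.
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(LOG_PATH, "a", encoding="utf-8") as f:
            f.write(f"{stamp} {self.client_address[0]} sev={severity} {rest}\n")


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on an always-on LAN machine and pointing the router’s remote log target at that machine’s IP and port would capture messages across router reboots.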

Also, if you can remove any non-standard variables such as Pi-holes, that would be helpful.

donsandro · October 31, 2024, 11:13am

Morning @Alta-Jeff

Just sharing that the issue presented itself again.
Happy to share logs via DM or whatever method you prefer.
I have logs both from the router itself as well as from a logging server.

A few points to note: the internal network is working fine (routing, DNS, …).
Externally, I was able to ping out from the router, and I ran the arp commands as well.

However, app and HTTP traffic to the internet did not succeed. I wasn’t able to complete a packet capture, but I can do that if this occurs again.
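To catch the exact moment app traffic stops (useful for lining up a packet capture or the router logs), one option is a small watchdog on a LAN machine that repeatedly opens a TCP connection and prints only the state changes. This Python sketch is hypothetical: the target `1.1.1.1:443`, the 10-second interval, and the function names are illustrative choices, and a successful TCP connect is only a proxy for “HTTP works”.

```python
import socket
import time
from datetime import datetime


def tcp_ok(host: str = "1.1.1.1", port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def transitions(samples):
    """Yield (timestamp, state) for the first sample and every change of state."""
    prev = None
    for ts, state in samples:
        if state != prev:
            yield ts, state
            prev = state


def watch(interval: float = 10.0) -> None:
    """Probe forever, printing a line only when reachability flips."""
    def samples():
        while True:
            yield datetime.now().isoformat(timespec="seconds"), tcp_ok()
            time.sleep(interval)

    for ts, ok in transitions(samples()):
        print(f"{ts} internet TCP {'UP' if ok else 'DOWN'}", flush=True)
```

Printing only transitions keeps the output short over a multi-day run, so the DOWN timestamp can be matched directly against router log entries.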

One more thing to note: in the router logs I only see events from 10/24 and from this morning, with no other dates, which I find odd considering the reboots earlier this week.

Thank you in advance,
Sandro

Alta-Jeff · October 31, 2024, 12:36pm

@donsandro Sure, I’ll DM you so we can exchange logs. Were the events only as new as 10/24 on the logging server as well? How are you obtaining the logs from the router? Did you disable the Pi-hole and any other non-standard devices that could potentially be ARP poisoning?
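On the ARP-poisoning question, one quick check from a Linux or macOS LAN client is to look for a single MAC address answering for several IPs, for example the gateway and another host. The Python sketch below parses `arp -an` output; it assumes the common `? (IP) at MAC ...` line format, which may differ on the Route10 itself, and a shared MAC is only a hint (some devices legitimately hold several addresses), not proof of poisoning.

```python
import re
from collections import defaultdict

# Matches "(192.168.1.1) at aa:bb:cc:dd:ee:ff"; macOS may drop leading zeros (e.g. "0:11:...").
ARP_LINE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"(?P<mac>[0-9a-fA-F]{1,2}(?::[0-9a-fA-F]{1,2}){5})"
)


def normalize_mac(mac: str) -> str:
    """Lower-case and zero-pad each octet so equivalent MACs compare equal."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def parse_arp(output: str) -> dict:
    """Map IP -> normalized MAC; incomplete entries are skipped."""
    table = {}
    for line in output.splitlines():
        m = ARP_LINE.search(line)
        if m:
            table[m.group("ip")] = normalize_mac(m.group("mac"))
    return table


def shared_macs(table: dict) -> dict:
    """Return MACs that answer for more than one IP, with their IPs sorted."""
    by_mac = defaultdict(list)
    for ip, mac in table.items():
        by_mac[mac].append(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}
```

A result that pairs the gateway IP with another LAN host’s IP would be the case worth investigating further.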

donsandro · November 2, 2024, 11:17am

Thanks @Alta-Jeff for the help and engagement on this.

For anyone watching the thread, here is what has been done so far.

I had another event on 10/31, ran some of the steps above, and captured logs.

The 10/24 date is the default date stamped on log entries when the router is restarted. I’ve pulled logs from the router and from my external log server.
I did not disable the Pi-hole. However, after the 10/31 event I reset my Route10, moved from WAN2 (SFP) to WAN1, and set it up again as a new device.
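Since the router’s clock falls back to that default date on every restart, a backward jump in log timestamps is a usable marker for a reboot. The Python sketch below splits a log into per-boot segments on that basis. It assumes classic syslog prefixes such as `Oct 24 12:00:01`, which is a guess about the Route10’s log format, and the helper names are hypothetical; a Dec-to-Jan rollover would also be misread as a reboot.

```python
from datetime import datetime


def parse_stamp(line: str):
    """Parse a 'Mon DD HH:MM:SS' prefix; return None if the line has none."""
    try:
        # Syslog lines carry no year; a fixed leap year lets 'Feb 29' parse too.
        return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def split_boots(lines):
    """Start a new segment whenever a timestamp goes backwards (likely a reboot)."""
    segments, current, prev = [], [], None
    for line in lines:
        ts = parse_stamp(line)
        if ts is not None and prev is not None and ts < prev:
            segments.append(current)
            current = []
        current.append(line)  # lines without a stamp stay with the current segment
        if ts is not None:
            prev = ts
    if current:
        segments.append(current)
    return segments
```

Each segment then starts at a boot, and the first forward jump inside a segment roughly marks when the clock was corrected.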

What we observed: internal routing and DNS are fine, and external ICMP also works, but application access (HTTP, apps, …) appears to fail.

I’ll keep monitoring since the reset and will update with additional details.

Thanks again