Alta Switch Stopped Working

This morning I left for work, and shortly afterward 1 of my 2 S8-POE switches “stopped working”. When I came home the ports were lit, but no traffic was passing to or from anything else on my network. I unplugged it and plugged it back in, and everything came back up.
Unfortunately there are no logs to pull from, since I had switched my Syslog server IP.

Similar to this prior thread: Switch took a dive today

You’ve been quite active on the community, would it be fair to assume you’ve set up SSH keys to SSH into the devices?

Identical thing happened to me yesterday during the night with my 8 port switch. At 4.30am GMT I received an email to say my switch had gone offline. Port lights were on and I believe the APs were powered but connectivity had been lost.

This occurred within about 12 hours of installing the new POE firmware.

I pulled the power to reboot it and everything came back up and has been fine ever since.

I don’t currently have SSH keys enabled, so I’m not able to access anything. This is the first time this has happened to me and hopefully it’s just a one-off. Something must have crashed.

Here is an article on setting up SSH keys. Of course, this will need to be done when the switch is in a normal, operating state. Once done, or if the switch is operational, I can either provide a command to make logs persist a bit more or do the command myself; the latter would require an invite to the site.

Thanks @Alta-Matt_v2.
SSH is configured now, didn’t realize the
Is the command: touch /cfg/.persistent.log ?
I assume the logs persist across a reboot once this command is run? Also, will these rotate so I don’t have a space issue?

Thanks again

Correct on all counts. touch /cfg/.persistent.log creates an empty file named .persistent.log; the leading period hides it from a normal ls command. (I’m sure you know this, but I’m throwing it out there for those who may not.) With that file in place, /var/log/messages is rotated when the device restarts, keeping up to 3 files (you end up with messages.1, messages.2 and the normal messages), which keeps the logs from filling up the AP/switch’s storage. Obviously, because the files rotate, it’s not infallible, but it’s something to try to retain logs.
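For anyone curious what a 3-file rotation like this amounts to, here is a minimal Python sketch. It only illustrates the scheme described above and is not Alta’s actual firmware logic; the function name rotate_logs and the keep parameter are made up for the example, and it should be pointed at a scratch directory rather than a real device.

```python
import os
import tempfile


def rotate_logs(log_dir, base="messages", keep=3):
    """Shift base -> base.1 -> base.2 so that at most `keep` files remain."""
    # Walk from the oldest slot down so nothing is overwritten too early.
    for idx in range(keep - 1, 0, -1):
        src_name = base if idx == 1 else f"{base}.{idx - 1}"
        src = os.path.join(log_dir, src_name)
        dst = os.path.join(log_dir, f"{base}.{idx}")
        if os.path.exists(src):
            # os.replace overwrites dst, which discards the oldest log.
            os.replace(src, dst)
    # Start a fresh, empty current log.
    open(os.path.join(log_dir, base), "w").close()


def demo():
    """Run one rotation in a temporary directory and return the file names."""
    scratch = tempfile.mkdtemp()
    for name in ("messages", "messages.1", "messages.2"):
        with open(os.path.join(scratch, name), "w") as handle:
            handle.write(name)
    rotate_logs(scratch)
    return sorted(os.listdir(scratch))
```

The point of the sketch is the trade-off mentioned above: each rotation throws away the oldest file, so a busy log can end up covering only a short window.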

Hi @Alta-Matt_v2, @Alta-Chase ,

Had another event this morning where my switch was unresponsive (from the network) until I physically unplugged and plugged the power back in.

Sharing a snippet from the logs. Looks like the 3 log files rotated as I only have about 1 hour across all 3 logs. It seems to show a repeating set of events like the below.

Thank you in advance

Feb 12 22:29:43 Alta-Switch-1 user.notice root: netd: Failed, failures: 30
Feb 12 22:29:44 Alta-Switch-1 user.notice root: netd: Next state: enet_normal
Feb 12 22:29:45 Alta-Switch-1 user.notice root: netd: Failed, failures: 1
Feb 12 22:29:47 Alta-Switch-1 user.notice root: netd: Failed, failures: 2
Feb 12 22:29:48 Alta-Switch-1 user.notice root: netd: Failed, failures: 3
Feb 12 22:29:50 Alta-Switch-1 user.notice root: netd: Failed, failures: 4
Feb 12 22:29:51 Alta-Switch-1 user.notice root: netd: Failed, failures: 5
Feb 12 22:29:53 Alta-Switch-1 user.notice root: netd: Failed, failures: 6
Feb 12 22:29:53 Alta-Switch-1 daemon.notice rc: [2024/02/12 17:29:53:7888] N: [wsicli|3|WS/h1/default/manage.alta.inc]: lws_client_connect_3_connect: dns lookup failed -3
Feb 12 22:29:53 Alta-Switch-1 daemon.err rc: [2024/02/12 17:29:53:7894] E: CLIENT_CONNECTION_ERROR: Closed before conn
Feb 12 22:29:53 Alta-Switch-1 daemon.notice rc: [2024/02/12 17:29:53:7918] N: __lws_lc_untag:  -- [wsicli|3|WS/h1/default/manage.alta.inc] (0) 15.949s
Feb 12 22:29:54 Alta-Switch-1 user.notice root: netd: Failed, failures: 7
Feb 12 22:29:55 Alta-Switch-1 user.notice root: netd: Failed, failures: 8
Feb 12 22:29:57 Alta-Switch-1 user.notice root: netd: Failed, failures: 9
Feb 12 22:29:58 Alta-Switch-1 user.notice root: netd: Failed, failures: 10
Feb 12 22:30:00 Alta-Switch-1 user.notice root: netd: Failed, failures: 11
Feb 12 22:30:01 Alta-Switch-1 user.notice root: netd: Failed, failures: 12
Feb 12 22:30:02 Alta-Switch-1 user.notice root: netd: Failed, failures: 13
Feb 12 22:30:04 Alta-Switch-1 user.notice root: netd: Failed, failures: 14
Feb 12 22:30:05 Alta-Switch-1 user.notice root: netd: Failed, failures: 15
Feb 12 22:30:07 Alta-Switch-1 user.notice root: netd: Failed, failures: 16
Feb 12 22:30:08 Alta-Switch-1 user.notice root: netd: Failed, failures: 17
Feb 12 22:30:09 Alta-Switch-1 user.notice root: netd: Failed, failures: 18
Feb 12 22:30:11 Alta-Switch-1 user.notice root: netd: Failed, failures: 19
Feb 12 22:30:12 Alta-Switch-1 user.notice root: netd: Failed, failures: 20
Feb 12 22:30:14 Alta-Switch-1 user.notice root: netd: Failed, failures: 21
Feb 12 22:30:15 Alta-Switch-1 user.notice root: netd: Failed, failures: 22
Feb 12 22:30:16 Alta-Switch-1 user.notice root: netd: Failed, failures: 23
Feb 12 22:30:17 Alta-Switch-1 user.alert : Port: 3  Link: Down  Speed: 0 Mb/s  Duplex: Half
Feb 12 22:30:18 Alta-Switch-1 user.notice root: netd: Failed, failures: 24
Feb 12 22:30:19 Alta-Switch-1 user.alert : Port: 3  Link: Up  Speed: 100 Mb/s  Duplex: Full
Feb 12 22:30:19 Alta-Switch-1 user.notice root: netd: Failed, failures: 25
Feb 12 22:30:21 Alta-Switch-1 user.notice root: netd: Failed, failures: 26
Feb 12 22:30:22 Alta-Switch-1 user.notice root: netd: Failed, failures: 27
Feb 12 22:30:23 Alta-Switch-1 user.notice root: netd: Failed, failures: 28
Feb 12 22:30:25 Alta-Switch-1 user.notice root: netd: Failed, failures: 29
Feb 12 22:30:26 Alta-Switch-1 user.notice root: netd: Failed, failures: 30
Feb 12 22:30:27 Alta-Switch-1 daemon.notice rc: [2024/02/12 17:30:27:1341] N: __lws_lc_tag:  ++ [wsicli|4|WS/h1/default/manage.alta.inc] (1)
Feb 12 22:30:27 Alta-Switch-1 user.notice root: netd: Next state: enet_normal
Feb 12 22:30:29 Alta-Switch-1 user.notice root: netd: Failed, failures: 1

Yes, the logs will rotate for a total of 3 with that touch /cfg/.persistent.log command. We have to put the limit somewhere, otherwise the switch’s storage will fill up.

As you can probably surmise from the log output, the uplink on the switch is failing.
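If you want to summarize a longer capture rather than reading it line by line, a rough Python sketch along these lines can tally the netd failure counter, state changes, port link events and DNS lookup failures. The regular expressions are assumptions based only on the lines posted in this thread, and summarize is a hypothetical helper, not an Alta tool.

```python
import re

# Patterns inferred from the log lines posted in this thread
# (assumptions, not an official log format).
NETD_FAIL = re.compile(r"netd: Failed, failures: (\d+)")
NETD_STATE = re.compile(r"netd: Next state: (\S+)")
LINK_EVENT = re.compile(r"Port:\s*(\d+)\s+Link:\s*(Up|Down)")
DNS_FAIL = re.compile(r"dns lookup failed")


def summarize(lines):
    """Tally netd failures, state changes, link events and DNS failures."""
    summary = {
        "netd_failures": 0,
        "max_failure_counter": 0,
        "state_changes": [],
        "link_events": [],
        "dns_failures": 0,
    }
    for line in lines:
        stamp = line[:15]  # syslog timestamp, e.g. "Feb 12 22:30:17"
        fail = NETD_FAIL.search(line)
        if fail:
            summary["netd_failures"] += 1
            summary["max_failure_counter"] = max(
                summary["max_failure_counter"], int(fail.group(1))
            )
            continue
        state = NETD_STATE.search(line)
        if state:
            summary["state_changes"].append((stamp, state.group(1)))
            continue
        link = LINK_EVENT.search(line)
        if link:
            summary["link_events"].append(
                (stamp, int(link.group(1)), link.group(2))
            )
            continue
        if DNS_FAIL.search(line):
            summary["dns_failures"] += 1
    return summary
```

Usage would be something like summarize(open("messages")) on a copy of /var/log/messages pulled from the switch. Link events landing next to the failure streaks would point at a specific port or cable.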

As I understand it, the switch is plugged directly into your router, correct? If so, do you have logs from the router that can confirm the loss of link?

I would try a different port on the switch; same cable, same port on the router (or whatever the switch is plugged into) and continue to monitor.

Hi @Alta-Matt_v2

Thanks for looking into this.
The port on the router and the router itself show no issues, just logs of duplicate leases during the 2/12 outage. Example below.

I’ll move the cable to another port on the Alta switch and see what happens.

Thank you,
Sandro

Feb 12 10:45:16 EdgeRouter-X-5-Port dhcpd3: uid lease 192.168.1.46 for client 00:b0:4c:36:0a:36 is duplicate on LAN
Feb 12 12:51:30 EdgeRouter-X-5-Port dhcpd3: uid lease 192.168.1.40 for client 00:b0:4c:36:0b:89 is duplicate on LAN