Just wanted to circle back here. My apologies, the above script has a pretty nasty bug in a failover multi-WAN environment. My primary WAN is PPPoE and the backup is DHCP, but the bug may occur in other multi-WAN combinations too.
I have worked on a new version that handles cold/warm boots, firmware upgrades and provisioning more gracefully. It seems to work, although it adds some time to some of those events. At least it’s graceful: the old one would cause the routing tables and connections to get mixed up (state drift of sorts).
If you hit this bug and can reach the device via SSH, you should be able to issue a reboot, which should fix it. Otherwise, a power cycle seems to fix it reliably for me. Do note that I use an XGS-PON module, and its slower boot time might be why a cold boot works reliably as a fix for me, so it may not work in all environments.
I’m refraining from sharing the new version just yet as it’s specific to my connection, but I wanted to at least make anyone using the old script aware. I can try to share an early version if you reach out via DM, but ideally I hope to iterate on it first to reduce the time the mwan3 logic takes and to make it portable/universal (say, for both failover and load-balancing environments).