All Yealink Phones Down at One Site



  • So this is about the weirdest phone issue I've seen in a long time. We have a FreePBX customer with one large site and hundreds of satellite users. Satellite users are fine, some are Yealink, some are Linphone, etc. No issues.

    The main site has twenty or so phones, all Yealink except for one, a Polycom in a conference room. The Polycom is on and working normally (which is what is so weird.)

    Other than that one Polycom, every Yealink at that one site is offline and cannot register. We've updated firmware, changed DNS, removed DNS (moved to IP registration), checked STUN (tried on and off), etc. Nothing. PBX is showing no attempts to register, either. Site is whitelisted in both IDS and Firewall. And we tested with them off, just in case.

    All Yealinks are T42S. FreePBX 14 with Asterisk 15.5. (Issue started with 15.3 I believe, we updated as part of the process.) OS and Modules are fully up to date.

    We just can't figure out what is common about these phones and what could be making them not work. They all stopped working last night, so they were okay yesterday, but not working first thing this morning.



  • So only Yealink devices on premises can't register with the server? Is the server reachable via telnet from both on premises and off?
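
A scripted version of the telnet check suggested above might look like this. It is a minimal sketch, assuming the PBX also listens for SIP on TCP port 5060; `tcp_reachable` is a hypothetical helper name and `pbx.example.com` is a placeholder. Since SIP registration commonly runs over UDP, a TCP result only speaks to basic routing and filtering, not to registration itself.

```python
import socket


def tcp_reachable(host, port=5060, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (placeholder host): run once from inside the site and once from outside.
# print(tcp_reachable("pbx.example.com"))
```

Running the same check from a satellite user's network and from the main site would show whether the difference is in the path rather than in the phones.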



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    Asterisk 15.5. (Issue started with 15.3 I believe, we updated as part of the process.) OS and Modules are fully up to date.
    We just can't figure out what is common about these phones and what could be making them not work. They all stopped working last night, so they were okay yesterday, but not working first thing this morning.

    I am assuming you restarted the firewall service and/or the entire PBX already? I had an issue once where a Grandstream phone (just 1 out of 10) failed to register. Rebooting the PBX resolved the issue, and I think it had to do with something in the firewall.



  • Also, are all the Yealinks connected to a common switch? Has that been power cycled?



  • Are the phones getting DHCP addresses?



  • @dustinb3403 said in All Yealink Phones Down at One Site:

    Are the phones getting DHCP addresses?

    Yes, networking is working.



  • @fuznutz04 said in All Yealink Phones Down at One Site:

    Also, are all the Yealinks connected to a common switch? Has that been power cycled?

    Yes, but so is everything else. One switch for the whole facility.



  • I see similar behavior sometimes here. Come in and all the phones will not be working, but everything else is. Usually this is because the firewall/IPS/IDS has been triggered due to lots of SIP attempts at once, like if many of the phones try to reregister at once. I just go and look for blocked sites in the firewall/UTM/whatever, and usually it is our VoIP/SIP provider's IP listed there. Remove it and the phones start working almost instantly.



  • @momurda said in All Yealink Phones Down at One Site:

    I see similar behavior sometimes here. Come in and all the phones will not be working, but everything else is. Usually this is because the firewall/IPS/IDS has been triggered due to lots of SIP attempts at once, like if many of the phones try to reregister at once. I just go and look for blocked sites in the firewall/UTM/whatever, and usually it is our VoIP/SIP provider's IP listed there. Remove it and the phones start working almost instantly.

    We control all those points and have it whitelisted.



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    @momurda said in All Yealink Phones Down at One Site:

    I see similar behavior sometimes here. Come in and all the phones will not be working, but everything else is. Usually this is because the firewall/IPS/IDS has been triggered due to lots of SIP attempts at once, like if many of the phones try to reregister at once. I just go and look for blocked sites in the firewall/UTM/whatever, and usually it is our VoIP/SIP provider's IP listed there. Remove it and the phones start working almost instantly.

    We control all those points and have it whitelisted.

    Have you actually temporarily disabled the Responsive FW?



  • Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.
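
Alongside a packet capture, one way to test whether SIP over UDP from the site reaches the PBX at all is to send a single SIP OPTIONS request and watch for any reply. The following is a rough sketch, not anything the posters ran: `build_options`, `probe_sip_options`, the `probe` user, and the `sip-probe-sketch` User-Agent are all made-up names, and UDP port 5060 is assumed. Any status line coming back, even an error status, would suggest the path is open; a timeout would match the "no attempts seen at the PBX" symptom.

```python
import socket
import uuid


def build_options(pbx_host, local_ip, local_port, pbx_port=5060):
    """Build a minimal SIP OPTIONS request (RFC 3261 layout) as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch must start with this magic cookie
    lines = [
        f"OPTIONS sip:{pbx_host}:{pbx_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{pbx_host}:{pbx_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "User-Agent: sip-probe-sketch",  # hypothetical name, for illustration only
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_sip_options(pbx_host, pbx_port=5060, timeout=3.0):
    """Send one OPTIONS over UDP; return the reply's status line, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on UDP only fixes the peer and selects the local address.
        sock.connect((pbx_host, pbx_port))
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(pbx_host, local_ip, local_port, pbx_port))
        try:
            data = sock.recv(4096)
        except OSError:
            # Timeout or ICMP port-unreachable: nothing usable came back.
            return None
    return data.split(b"\r\n", 1)[0].decode("ascii", errors="replace")


# Example (placeholder host):
# print(probe_sip_options("pbx.example.com"))
```

Running this from a PC on the phone VLAN while capturing at the site firewall would show whether the request leaves the site and whether a response returns.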



  • @momurda said in All Yealink Phones Down at One Site:

    sometimes here. Come in and all the phones will not be working, but everything else is. Usually this is because the firewall/IPS/IDS has been triggered due to lots of SIP attempts at once, like if many of the phones try to reregister at once. I just go and look for blocked sites in the firewall/UTM/whatever, and usually it is our VoIP/SIP provider's IP listed there. Remove i

    @scottalanmiller Have you rebooted the PBX yet?



  • Did fail2ban block that entire subnet? That wouldn't explain why a single Polycom device was working correctly though.



  • I had this issue with a small office network, but with Polycom phones. I upgraded all firmware on the switches and router and rebooted everything, to no avail. It only started working when I factory reset the main router and redeployed from scratch! So strange!



  • @coliver said in All Yealink Phones Down at One Site:

    Did fail2ban block that entire subnet? That wouldn't explain why a single Polycom device was working correctly though.

    No, that's the IDS and we turned it off to test.



  • @fuznutz04 said in All Yealink Phones Down at One Site:

    @momurda said in All Yealink Phones Down at One Site:

    sometimes here. Come in and all the phones will not be working, but everything else is. Usually this is because the firewall/IPS/IDS has been triggered due to lots of SIP attempts at once, like if many of the phones try to reregister at once. I just go and look for blocked sites in the firewall/UTM/whatever, and usually it is our VoIP/SIP provider's IP listed there. Remove i

    @scottalanmiller Have you rebooted the PBX yet?

    Yes, we did early on. No change.



  • @i3 said in All Yealink Phones Down at One Site:

    Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.

    This is basically where we went. We worked through this series of steps and it appears to be fixed...

    1. Rebooted the firewall.
    2. Removed SIP-ALG, but this appears to have had no effect.
    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.
    6. Ensured STUN was on and set correctly.
    7. Added the PBX as a proxy and manually enabled STUN for the proxy.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    @i3 said in All Yealink Phones Down at One Site:

    Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.

    This is basically where we went. We worked through this series of steps and it appears to be fixed...

    1. Rebooted the firewall.
    2. Removed SIP-ALG, but this appears to have had no effect.
    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.
    6. Ensured STUN was on and set correctly.
    7. Added the PBX as a proxy and manually enabled STUN for the proxy.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.

    Ah yes, I love it when the solution is unknown. Been there plenty of times.



  • At least the fix was repeatable.



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    At least the fix was repeatable.

    I think that is silly.

    Was a packet capture not possible?



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    @i3 said in All Yealink Phones Down at One Site:

    Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.

    This is basically where we went. We worked through this series of steps and it appears to be fixed...

    1. Rebooted the firewall.
    2. Removed SIP-ALG, but this appears to have had no effect.
    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.
    6. Ensured STUN was on and set correctly.
    7. Added the PBX as a proxy and manually enabled STUN for the proxy.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.

    To me it looks like the DNS was caching something the devices needed to reach. That's my theory.

    @scottalanmiller said in All Yealink Phones Down at One Site:

    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.
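
The DNS theory could be checked by asking each resolver for the PBX's A records directly and comparing the answers. The sketch below does that with only the standard library by hand-building a DNS query; it is illustrative only. The function names are invented, `192.168.1.10` stands in for the AD DNS server and `pbx.example.com` for the PBX hostname, and `1.1.1.1` is Cloudflare's public resolver. It ignores truncation, retries, and non-A record types.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a DNS query packet asking for the A records of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer is two bytes long
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Extract IPv4 addresses from the answer section of a DNS response."""
    _id, _flags, qdcount, ancount = struct.unpack("!HHHH", packet[:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def resolve_with(server, hostname, timeout=3.0):
    """Ask one specific DNS server for hostname's A records over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        data, _ = sock.recvfrom(4096)
    return sorted(parse_a_records(data))


# Example (placeholder addresses and hostname):
# print(resolve_with("192.168.1.10", "pbx.example.com"))  # AD DNS
# print(resolve_with("1.1.1.1", "pbx.example.com"))       # Cloudflare
```

If the two resolvers returned different addresses for the PBX, that would support the stale-record idea; identical answers would point back toward the firewall or NAT side.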



  • @dbeato said in All Yealink Phones Down at One Site:

    @scottalanmiller said in All Yealink Phones Down at One Site:

    @i3 said in All Yealink Phones Down at One Site:

    Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.

    This is basically where we went. We worked through this series of steps and it appears to be fixed...

    1. Rebooted the firewall.
    2. Removed SIP-ALG, but this appears to have had no effect.
    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.
    6. Ensured STUN was on and set correctly.
    7. Added the PBX as a proxy and manually enabled STUN for the proxy.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.

    To me it looks like the DNS was caching something the devices needed to reach. That's my theory.

    If so, why did changing the DNS alone not fix it?



  • @scottalanmiller said in All Yealink Phones Down at One Site:

    @dbeato said in All Yealink Phones Down at One Site:

    @scottalanmiller said in All Yealink Phones Down at One Site:

    @i3 said in All Yealink Phones Down at One Site:

    Factory reset a phone and try manually programming it? How about a packet capture at the site firewall: do you see attempts from that IP phone, and if so, to where? I am assuming that the PBX is hosted elsewhere.

    This is basically where we went. We worked through this series of steps and it appears to be fixed...

    1. Rebooted the firewall.
    2. Removed SIP-ALG, but this appears to have had no effect.
    3. Updated the firmware on each phone.
    4. Switched to statically set DNS rather than DHCP-assigned DNS.
    5. Changed DNS to Cloudflare instead of AD.
    6. Ensured STUN was on and set correctly.
    7. Added the PBX as a proxy and manually enabled STUN for the proxy.

    After those seven steps, all phones are now registered. Couldn't find any step there that individually made a difference.

    To me it looks like the DNS was caching something the devices needed to reach. That's my theory.

    If so, why did changing the DNS alone not fix it?

    Not sure, I am just going by what you said. Basically very strange.