SIP Troubles
-
I thought I'd just share my SIP issues.
I'll start by telling you that we have:
Axxess digital system with 60 phones, Building 1
Mitel 5000 VOIP with 35 phones, Building 2
Mitel 5000 VOIP with 20 phones, Building 3
Buildings 1 and 2 are connected via private fiber at 1 Gb.
Buildings 2 and 3 are connected via VPN.
We have SIP provided by Cox Communications with an onsite EdgeMarc Session Border Controller (SBC) appliance.
IP access to the SBC is limited to the Mitel 5000 in Building 2 (same building, same subnet, same VLAN, flows through two switches).
About 2 weeks ago one of my operators tells me that she's getting an alarm 145 on her phone. It would be there for a few seconds, then gone.
I looked alarm 145 up and found this: A145 ALARM - SIP Peer Out-of-Service Mitel 5000 HX Ver5.1
OK a SIP related problem. So I started with my PBX vendor - please look over the logs and tell me what you see. They tell me they see no problems - sigh, OK, then they suggest I call Cox.
I call Cox, they open a ticket and call me back the next day.
Cox reports that they see some errors, nothing big, but they believe the errors are related to the fact that the network side of the EdgeMarc is set to 10 Mb half duplex, and they want to change it to 100 Mb full duplex. I schedule for them to make this change at 6 PM that night. They call me at 6:10 and tell me the change is made and the errors they were seeing are gone - Great, or so I thought.
So the next day, my operator reports more Alarm 145's. Hmmm... So I call Cox again. They report no errors on their side. So I call the PBX vendor and ask them to look again. They say they will look into it and get back to me.
Now keep in mind that no one in the office has reported any issues with the phones, the only reason we are looking at anything is because of the Alarm 145's on the operator console.
So 3 or 4 days go by, we're now in the second week (2/8/2016), and I get an email from a user with a picture of her phone with an error on it:
Destination is not Responding
Now we have production impact. We've probably been having it since the first reports of the Alarm 145, but no one was complaining.
As it turns out, this only happens when making outgoing calls, and appears to happen randomly.
This information was provided to the vendor and a global email was sent to the company asking people specifically to report any weirdness with the phones. Well that just opened the flood gates. Within 5 mins of sending that email I had around 10 people telling me it's been happening for days.
No one bothered to mention it because after a try or two or three, they could usually make their phone call, and well, heck I'm done calling now so what does it matter?
Moving on - I reported this additional information to our PBX vendor. The PBX vendor collected a new set of logs and sent them off to the PBX manufacturer.
Now by Wednesday, the PBX vendor and I agreed that they and Cox needed to get together to try to solve this problem. I called Cox and told them to contact the vendor directly so they could work on this problem. Cox presented no issue with this request. Come some time on Thursday the manufacturer suggested a change - that change was made which made the Destination is not Responding problem much worse. Now when trying to dial out, it would take an average of 4 tries to make a call. That change was undone within 30 mins and things settled back down.
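To put the "average of 4 tries" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes each dial attempt succeeds or fails independently with the same probability, which may not hold for a real trunk, and the function name is only illustrative.

```python
def success_rate_from_average_tries(average_tries: float) -> float:
    """Estimate per-attempt success probability from the mean number of tries.

    Assumption: attempts are independent with a fixed success probability p,
    so the number of tries until success is geometric with mean 1/p.
    """
    if average_tries < 1:
        raise ValueError("average_tries must be at least 1")
    return 1.0 / average_tries


p = success_rate_from_average_tries(4)
print(f"success per attempt: {p:.0%}, failure per attempt: {1 - p:.0%}")
```

Under that assumption, needing 4 tries on average would mean roughly three out of four outgoing attempts were failing while the change was in place.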
Now this was happening on a Friday. This report was provided to the PBX manufacturer, who responded after we closed with a recommendation to enable SIP logging.
Note - Cox never called anyone in response to my request that they work directly with the vendor to solve this.
Monday SIP logging was enabled and a request was made to me to grab the logs as quickly as possible after someone had the issue. No problem - within 20 mins I had sent three sets of logs (three different people reported the problem in short order, I grabbed a log after each call). The PBX vendor sent those logs to the manufacturer - who didn't respond, and as far as I know still hasn't responded.
So yesterday, at the end of the day I called Cox and told them that they will put a tech on the phone with my vendor at 7:30 Tuesday morning. The plan was made and it was kept.
Shortly thereafter I get a call from the PBX vendor telling me that Cox is looking into it. At 11:30 my PBX vendor calls again, says Cox needs me to inform them (Cox) of when errors are happening so they can watch the logs specifically at those times in an effort to work on this.
I schedule for the PBX vendor to be onsite at 1 PM and we'll conference Cox in and see if we can make the issue happen.
**This is important:** I don't recall the exact timing, but at some point before this, Cox had discovered that there were rejections in their logs for invalid authentication on calls.
Once we were all conferenced together, the very first test call I made failed with Destination is not Responding - I give Cox the phone number I was trying to dial and they locate the record. We make calls for another 30 mins, most succeeding, a few failing.
We were off the phone by 1:50 and Cox is currently still trying to find why authentication is failing on some calls and not all or none.
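For anyone chasing something similar, here is a minimal sketch of tallying SIP response codes from a plain-text trace, to see how often calls are being rejected versus answered. It assumes the logs can be exported as text containing standard SIP status lines; the sample trace below is hypothetical and not taken from the Mitel or Cox logs. Keep in mind that 401 and 407 are often part of a normal digest-authentication challenge, so they are not failures by themselves.

```python
import re
from collections import Counter

# Matches a SIP response status line such as "SIP/2.0 401 Unauthorized".
STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3}) ", re.MULTILINE)


def count_sip_responses(trace_text: str) -> Counter:
    """Tally SIP response codes found in a plain-text trace."""
    return Counter(int(code) for code in STATUS_LINE.findall(trace_text))


# Hypothetical sample trace, for illustration only.
sample_trace = """\
INVITE sip:5551234@example.invalid SIP/2.0
SIP/2.0 100 Trying
SIP/2.0 403 Forbidden
INVITE sip:5551234@example.invalid SIP/2.0
SIP/2.0 100 Trying
SIP/2.0 200 OK
"""

counts = count_sip_responses(sample_trace)
# Codes commonly tied to authentication: challenges (401, 407) and refusals (403).
auth_related = {code: n for code, n in counts.items() if code in (401, 403, 407)}
print(counts)
print(auth_related)
```

Comparing the timestamps of auth-related responses against the times users report "Destination is not Responding" is one way to confirm whether the two are the same event.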
I'll continue the story as there are more details.
-
I have a similar situation but on a completely different platform. In my case we are on PRIs with NEC PBX SV9100 phone system. All phones are connected via LAN. For the first couple of weeks I would receive reports that phones were "locking up", meaning they were not responsive to anything. The issue was usually resolved by soft-resetting the phone.
It was very annoying to end users. After a whole week of monitoring and troubleshooting we found the problem was with DHCP and subnets. I was not aware that our DHCP scope was running out. When the phones were installed, half of them picked up subnet .1.x while the other half picked up subnet .2.x. This caused issues because the phones would sometimes lose connection to the PBX phone system.
The fix for us was to move everything to static IP addresses and put every phone on the same subnet. The issue was resolved and I have not heard of anything since.
I highly doubt this is the same cause as what the OP is experiencing, but I thought it was worth shedding some light.
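A quick way to spot this kind of split is to check which phone addresses fall outside the PBX's subnet. This is only a sketch using Python's standard `ipaddress` module; the addresses and the /24 mask are made-up examples, not the real ones from our network.

```python
import ipaddress


def phones_outside_pbx_subnet(pbx_cidr: str, phone_ips: list[str]) -> list[str]:
    """Return the phone IPs that are not in the same subnet as the PBX.

    pbx_cidr is the PBX address with its prefix length, e.g. "192.168.1.10/24".
    """
    pbx_net = ipaddress.ip_interface(pbx_cidr).network
    return [ip for ip in phone_ips if ipaddress.ip_address(ip) not in pbx_net]


# Hypothetical addresses, for illustration only.
pbx = "192.168.1.10/24"
phones = ["192.168.1.51", "192.168.1.52", "192.168.2.17", "192.168.2.18"]
print(phones_outside_pbx_subnet(pbx, phones))
```

Any phone listed by this check has to go through a router to reach the PBX, so it depends on routing between the two subnets being set up correctly.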
-
@LAH3385 said:
I have a similar situation but on a completely different platform. In my case we are on PRIs with NEC PBX SV9100 phone system. All phones are connected via LAN. For the first couple of weeks I would receive reports that phones were "locking up", meaning they were not responsive to anything. The issue was usually resolved by soft-resetting the phone.
It was very annoying to end users. After a whole week of monitoring and troubleshooting we found the problem was with DHCP and subnets. I was not aware that our DHCP scope was running out. When the phones were installed, half of them picked up subnet .1.x while the other half picked up subnet .2.x. This caused issues because the phones would sometimes lose connection to the PBX phone system.
The fix for us was to move everything to static IP addresses and put every phone on the same subnet. The issue was resolved and I have not heard of anything since.
I highly doubt this is the same cause as what the OP is experiencing, but I thought it was worth shedding some light.
That's an interesting problem.
So you were deploying to different subnets worth of IPs (assuming /24) on the same VLAN?
What was the cause for the phone to not be able to reach the PBX? a router problem?
-
@Dashrender said:
That's an interesting problem.
So you were deploying to different subnets worth of IPs (assuming /24) on the same VLAN?
What was the cause for the phone to not be able to reach the PBX? a router problem?
That... I don't know the real cause. I am too noobish to understand how signals are translated from phone to PBX and vice versa. My uneducated guess would be that... it's the router's fault. Sorry, I just don't know what happened.
We only have one floor/office so everything is connected via LAN. Unlike yours, which runs across buildings and VPN.
-
You need to configure quite a bit on the router for SIP phones to traverse with no issues, depending on the phone system.
-
@Jason said:
You need to configure quite a bit on the router for SIP phones to traverse with no issues, depending on the phone system.
His network seems pretty simple. Unless he was doing any type of filtering between networks via the router, there shouldn't have been anything affecting this in the router. He's not using SIP to the telco provider, he said he has a PRI.
-
@LAH3385 said:
That... I don't know the real cause. I am too noobish to understand how signals are translated from phone to PBX and vice versa. My uneducated guess would be that... it's the router's fault. Sorry, I just don't know what happened.
We only have one floor/office so everything is connected via LAN. Unlike yours, which runs across buildings and VPN.
Mind if I ask your position in the company? Are there full-time IT personnel there?
-
@Dashrender said:
Mind if I ask your position in the company? Are there full-time IT personnel there?
Yes. I am full time system admin.
-
@LAH3385 said:
Yes. I am full time system admin.
Then you need to hire an IT service provider to help you get things cleaned up.
From the sound of your post, your network is a mess.
-
Getting this thread back on track.
My PBX vendor came onsite and did a remote session with Mitel. Mitel made several tweaks that solved nothing.
Then they added:
5000 CP > System > Devices and Feature Codes > SIP Peers > SIP Trunk Groups > 9203 > Configuration > Route Sets > 1
I'm not sure what this even does. My PBX vendor swears that they have never enabled this option before for any SIP trunks, including Cox.
Plus the fact that this has been working fine since November until Jan 29.
But shortly after this change, the errors stopped and calls have been working well ever since.
tosses hat in the air whatev's.
-
OK the problem seems to be gone.
My local vendor contacted Mitel and they offered the following fix.
As mentioned above, we've been on the SIP trunk since Nov/Dec, but now for whatever reason this change was required to make the problems stop... and it appears to have done just that.
-
@Dashrender what was the change? the entire section?
-
@JaredBusch said:
@Dashrender what was the change? the entire section?
yes the addition of the entire 1 option under Route Sets.
-
@Dashrender said:
yes the addition of the entire 1 option under Route Sets.
That is setting your SIP to UDP. Interesting.
-
@JaredBusch said:
That is setting your SIP to UDP. Interesting.
Interesting... I stared at that for 20 mins trying to figure out what it was doing.
Isn't SIP UDP by default? If not, why would it need to be TCP?
-
@Dashrender said:
Interesting... I stared at that for 20 mins trying to figure out what it was doing.
Isn't SIP UDP by default? If not, why would it need to be TCP?
Honestly, SIP has no default that I am aware of. SIP works better over TCP and is explained decently in this thread.
RTP, on the other hand, runs over UDP by default.
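To make the transport question a bit more concrete: in a SIP message the transport shows up in the Via header (`SIP/2.0/UDP` or `SIP/2.0/TCP`). Below is a rough sketch that only builds the text of a SIP OPTIONS request for either transport and does not send anything. The host names and the `probe` user are hypothetical, and this is not what the Mitel Route Set option does internally.

```python
import uuid


def build_options_request(target_host: str, local_host: str, transport: str = "UDP") -> str:
    """Build the text of a minimal SIP OPTIONS request for the given transport."""
    transport = transport.upper()
    if transport not in ("UDP", "TCP"):
        raise ValueError("transport must be UDP or TCP")
    # Branch values start with the "z9hG4bK" magic cookie defined in RFC 3261.
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/{transport} {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    # SIP uses CRLF line endings and a blank line to end the headers.
    return "\r\n".join(lines)


# Hypothetical hosts, for illustration only.
udp_msg = build_options_request("sbc.example.invalid", "pbx.example.invalid", "UDP")
tcp_msg = build_options_request("sbc.example.invalid", "pbx.example.invalid", "TCP")
print(udp_msg.splitlines()[1])
print(tcp_msg.splitlines()[1])
```

If a packet capture is available, the Via line of the outgoing INVITE is a quick way to confirm which transport the PBX is actually using toward the SBC.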
-
Interesting. In my situation there are no routers, only a switch between my PBX and their first box. And their box is locked down to only talk to my IP.