howto://troubleshoot microsoft vpn connections part three-tales from the trenches

by Ed Fisher on 2010-02-12

Well if you are still with me after reading part one and part two of this series, thank you for your dedication. I’m glad you found something there to keep you interested. This last part of the series is where I want to share a few things that I have found that just didn’t fit in with my expectations, didn’t throw an error code that pointed me to the solution, and won’t fit in any other neat and tidy category.

In all of these, the root cause was not readily apparent. In most cases, the resolution took an hour or more and required more than the normal amount of detective work, an occasional swag, or divine inspiration to find and resolve the boggle. Maybe one of these will save you time and frustration some day.

 

RFC1918 gone wild!

I once encountered a user who could successfully connect to the VPN, but could not access any resources. No matter what was tried, the user just could not get anything to respond. There were no errors on the VPN server, none on the client side…we were clueless. As is usually the case when I can’t find a clue, I resorted to Wireshark to see what traffic was coming across the VPN. Once the connection was made, there was no further traffic, either on the VPN tunnel or from the client’s Internet connection. On our VPN, we assigned ip.addrs from the pool 10.255.255.0/24, and the rest of our internal addresses were also in the 10.x range. In this particular hotel, the pool was 10.0.0.0/8. Yes, you see it now, don’t you? The hotel pool encompassed the entire /8 as a local subnet, so the client did exactly as it should have…it tried to ARP for everything as local since, according to its routing table, everything was local (well, at least its DNS servers). Some static routing table magic got him up and running, and we added that particular hotel to the list of places never to stay. Basically, we had to add routing entries for the specific machines/subnets he needed to reach on the internal network, using his VPN assigned ip.addr as the gateway instead of his hotel assigned ip.addr. It wasn’t pretty, but it worked.
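To make the routing logic concrete, here is a minimal sketch of longest-prefix route selection in Python. It is a simplified model, not how Windows actually builds its table: it ignores metrics, and the internal subnet 10.1.2.0/24, the host 10.1.2.3, and the gateway labels are all made up for illustration. Only the 10.0.0.0/8 hotel pool comes from the story above.

```python
import ipaddress

def pick_route(routes, destination):
    """Return the (network, gateway) entry with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical gateway labels, for illustration only.
HOTEL_LAN = "on-link (hotel LAN)"
VPN_GW = "VPN-assigned address"

# Simplified client routing table while sitting in the hotel.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),
    (ipaddress.ip_network("10.0.0.0/8"), HOTEL_LAN),  # the hotel's pool
]

# Hypothetical internal server: the /8 wins, so the client ARPs locally.
before = pick_route(routes, "10.1.2.3")[1]

# The fix: a more specific route for the internal subnet via the VPN address.
routes.append((ipaddress.ip_network("10.1.2.0/24"), VPN_GW))
after = pick_route(routes, "10.1.2.3")[1]

print(before)
print(after)
```

The more specific /24 beats the hotel's /8, which is why adding routes for just the machines/subnets the user needed was enough.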

Kerberos error 34

This one affects lots of VPN clients, not just Windows’ version, but only with Windows XP. Vista and 7 will use TCP by default, but Windows XP’s implementation of Kerberos favours UDP, and will always try to use it. Most of the time that will work just fine, but when you have a user with a large number of group memberships, on a pipe with a smaller than normal MTU, things can get ugly. The symptom will be that the user can connect to the VPN, but cannot authenticate to anything. Checking the client’s logs only shows authentication failures, Netlogon 5719s, or SPNEGO 40960s. Checking the domain controller’s logs shows Kerberos error 34 (that is hex 0x34, KRB_ERR_RESPONSE_TOO_BIG), Response too large. When the client tries to use UDP, and the server determines that the ticket will be too large for UDP, it sends back an error 34 message, which should be enough to force the client to try again using TCP. However, these messages often don’t make it, either because they appear to be unsolicited UDP messages, or they get filtered by NAT, or they can’t make it back up the pipe.
You can check the MTU using PING. By default, Ethernet and cable connections should support a maximum frame size of 1518 bytes, which works out to a 1500 byte MTU; DSL using PPPoE is usually a little lower. Once you take into account the headers, the checksums, and the overhead of the tunnel itself, that leaves roughly 1372 bytes worth of ping payload. To see if your connection can carry that, try this, using the FQDN of one of your internal servers.

ping -l 1372 -f fqdn [enter]

That says to send a ping with 1372 bytes of payload, and to set the “Don’t Fragment” flag. If you have an MTU issue, you’ll get this as a response.

Packet needs to be fragmented but DF set.

You can play around with the MTU to find the maximum, but if it is much below 1200, Kerberos over UDP is going to be an issue. Setting the client to always use Kerberos over TCP is a fast and easy fix, and won’t make any noticeable difference to them when on the LAN. See KB244474 for the steps to force Kerberos to always use TCP, and see this post for a complete list of Kerberos response codes.
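If you would rather not guess at payload sizes by hand, the search can be sketched as a binary search. This is only a sketch under assumptions: the probe here is a stand-in function that pretends the path MTU is 1400 bytes (an assumed value that happens to line up with the 1372 byte payload above), and the header sizes assume plain IPv4 with no options. A real probe would have to run ping with -f and -l and look for the fragmentation message, which is left out.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def payload_for_mtu(mtu):
    """Largest ping payload that fits in one unfragmented packet of this MTU."""
    return mtu - IP_HEADER - ICMP_HEADER

def largest_payload(probe, low=0, high=1472):
    """Binary search for the largest size where probe(size) is True.

    Assumes probe(low) succeeds and that every size below a working
    size also works.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid fits, look for something bigger
        else:
            high = mid - 1   # mid needs fragmenting, look smaller
    return low

def fake_probe(size, path_mtu=1400):
    """Stand-in for a real ping; pretends the path MTU is 1400 bytes."""
    return size <= payload_for_mtu(path_mtu)

best = largest_payload(fake_probe)
print(best, best + IP_HEADER + ICMP_HEADER)
```

Adding the 28 bytes of headers back onto the largest working payload gives the effective MTU, which is the number to compare against that 1200 rule of thumb.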

Hard coded DNS servers, HOSTS files, and LMHOSTS files

We tend to assume name resolution works, because it usually does. When it doesn’t, things can get ugly fast, and if we aren’t doing things like pings to notice it, we might spend a lot of time troubleshooting the wrong thing. It only takes a second to make sure that DNS servers aren’t hard coded, or to peek at the HOSTS and LMHOSTS files to see if they have bad data.
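As a rough illustration of that peek, here is a small Python sketch that parses HOSTS-style text and reports which of the names you care about are being overridden. The host names and addresses are hypothetical, and the parser only handles the plain HOSTS format (on Windows the file lives at %SystemRoot%\System32\drivers\etc\hosts). It does not understand LMHOSTS directives such as #PRE.

```python
def parse_hosts(text):
    """Map each lowercase host name to the address listed for it."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # an address with no names, nothing to record
        address, names = parts[0], parts[1:]
        for name in names:
            entries.setdefault(name.lower(), address)  # keep the first entry seen
    return entries

def overrides(text, names_to_check):
    """Return the names from names_to_check that the HOSTS text pins to an address."""
    entries = parse_hosts(text)
    return {n: entries[n.lower()] for n in names_to_check if n.lower() in entries}

# Hypothetical file content for illustration.
sample = """
# sample HOSTS content
127.0.0.1     localhost
192.168.1.50  fileserver.corp.example.com fileserver  # stale entry
"""

found = overrides(sample, ["fileserver.corp.example.com", "mail.corp.example.com"])
print(found)
```

Anything that shows up in the result is being answered from the file rather than from DNS, so a stale address there will follow the user no matter which DNS servers the VPN hands out.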

Hard coded Ethernet port speeds/duplex settings

At some point in the past, a well-meaning but misguided desktop support tech went into the properties of the Ethernet port on a user’s laptop, and set it to 10Mbps/Half-duplex. While that worked in the office (the switches were set to auto-detect), the user called up from the hotel because they couldn’t get any connection out to the Internet, let alone to the VPN. Seems the hotel’s wired services in the room were 100-Full only. It probably took us a full two hours to get down low enough in the OSI model to check that. If the user cannot even get a DHCP assigned address, that is the first thing I check now.

Local Credentials

One of the neat time saving options in the Microsoft VPN client is the option to log on using the credentials of the currently signed-in user. Of course, if they logged on to the local o/s using a machine local account, that is not going to successfully authenticate them to the domain when they try to access the VPN.

Binding order of NDIS/WAN versus physical interfaces

I’ve only run into this on a couple of XP image builds, and we never did figure out what it was about that particular build that caused this issue, but something made the client ALWAYS query their local DNS server (provided by their ISP) for names. Since internal names of course would not resolve, the end result was the ability to connect to the VPN, and to access any resources using ip.addrs or NetBIOS names, but never using FQDN. KB311218 contains the reg hack needed to fix this behaviour.  

Preshared Key typos

IPSec can use pre-shared keys for authentication, instead of certificates. You can also drive without a seatbelt. Neither is recommended, and while you can get away with it for a very long time, eventually something comes up that makes you regret it. PSKs are case sensitive, and most folks think more complicated PSKs are better. While a longer PSK might resist brute-force attacks better than a short one, PSKs are visible in the clear by accessing the properties of the connection, and are commonly known. I’ve seen manually created connections where the PSK that was typed in sounds like the correct one, but wasn’t, and I’ve also seen them entered in all lowercase when the correct key is all caps. If you can’t get past phase 1 of your IPSec negotiation, and you’re using PSKs, check them.
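When two keys look the same to the eye, a quick comparison can tell you what kind of mismatch you are dealing with. This is just an illustrative Python sketch with made-up key values; nothing like it ships with the VPN client.

```python
def diagnose_psk(entered, expected):
    """Describe how an entered pre-shared key differs from the expected one."""
    if entered == expected:
        return "match"
    if entered.strip() == expected.strip():
        return "differs only by leading/trailing whitespace"
    if entered.lower() == expected.lower():
        return "differs only by case"
    return "different key"

# Hypothetical keys for illustration.
print(diagnose_psk("ourvpnkey2010", "OURVPNKEY2010"))
print(diagnose_psk("OURVPNKEY2010 ", "OURVPNKEY2010"))
```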

Assuming NAT when it is not there

Some hotel access lets you select the option of getting a live ip.addr instead of a NAT’ed one. The wording that usually accompanies that check box is something like “Check to get a live address.” Heck, that sounds better than a dead address, so users who read, but don’t understand, might do just that. Remember that IPSec and NAT don’t get along well unless the AssumeUDPEncapsulationContextOnSendRule registry value is set appropriately. Don’t assume. If you are using IPSec, use http://www.whatismyip.com and ipconfig to be sure you know what is happening on the user’s side, and set your flag appropriately. Remember, changing that requires a reboot to take effect!
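The comparison itself is simple enough to sketch in Python: if the address ipconfig shows on the adapter is not the address the outside world sees, something in between is translating. The addresses below are placeholders, and the function name is my own for illustration.

```python
import ipaddress

def behind_nat(local_address, public_address):
    """True when the adapter's address differs from the address seen from outside."""
    local = ipaddress.ip_address(local_address)
    public = ipaddress.ip_address(public_address)
    return local != public

# Placeholder addresses: the first argument is what ipconfig reports,
# the second is what a "what is my IP" page reports.
print(behind_nat("192.168.1.23", "203.0.113.10"))   # NAT'ed room connection
print(behind_nat("203.0.113.10", "203.0.113.10"))   # "live" address, no NAT
```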

Final thoughts

This series has tried to cover all the bases, but there is so much more out there, you may have no choice but to dig deeper. You can enable various debug logging by following the steps at http://technet.microsoft.com/en-us/library/cc783498.aspx, and there are several tools for further troubleshooting available at http://technet.microsoft.com/en-us/library/cc754825.aspx. You’ll also find the complete list of VPN error codes at http://support.microsoft.com/kb/923944. If you encounter a unique situation, please leave a comment to share what you found and how you fixed it. We can all benefit from one another when we share this way. In parting, let me just make sure we are all keeping it real by sharing this little nugget I found on the tubes. After all our efforts and brilliant troubleshooting to resolve a user’s issue, we may just find that they didn’t really need to use the VPN after all.
 

Do you have any war stories you’d like to share? Leave a comment letting us know your most unique situation.

Alan C 2011-01-25 at 08:25

Thanks for this – very good article – helped me solve a problem that has been bugging me for ages with one user – the XP using local gateway.

Ed Fisher 2011-01-25 at 08:50

Delighted to be of service! Thanks for letting me know.
Cheers,
Ed
