Device and User Tunnel on Windows 11 - it was hell

It all began a few months ago when I started a new job and took over some problems from the previous client engineer. One of those problems was setting up Microsoft’s AlwaysOn VPN. I’m not particularly familiar with Microsoft’s AlwaysOn solution, which is why I had to read up on it at the beginning. I quickly realized that this was not as easy as I had hoped.

The initial situation:

I think everyone who has dealt with configuring Microsoft’s AlwaysOn VPN will sooner or later come across this article:

https://docs.microsoft.com/en-us/windows-server/remote/remote-access/vpn/always-on-vpn/deploy/vpn-deploy-client-vpn-connections#bkmk_fullscript

We used these scripts to distribute the VPN profiles to our Windows 10 devices, and this has worked well so far. We use PSADT and MECM (SCCM) for software distribution and have distributed both the DeviceTunnel (split tunnel) and the UserTunnel (full tunnel) to the clients via SCCM (both in the task sequence and additionally as a “Required” deployment).

The problem:

With Windows 11, the problem was that only the DeviceTunnel was created during setup and installation via the “Required” deployment. The UserTunnel was not available. The script ran without errors (from PSADT), but the tunnel was not present, although no changes were made to the script compared to the deployment of Windows 10.

The script by Richard Hicks was not changed; only an XML file for the DeviceTunnel and an XML file for the UserTunnel were created. The following PowerShell script was used in “Deploy-Application.ps1”:

Pre-Installation:

## <Disconnect Device Tunnel>
rasdial.exe DeviceTunnel /d

## <Remove Device Tunnel>
## -ErrorAction SilentlyContinue suppresses the error Get-VpnConnection writes when the profile is missing
if (Get-VpnConnection -Name "DeviceTunnel" -AllUserConnection -ErrorAction SilentlyContinue)
{Execute-Process -Path "powershell.exe" -Parameters "-Command & {Remove-VpnConnection DeviceTunnel -force -AllUserConnection}" -Wait}
else {
    Write-Host "VPN *DeviceTunnel* does not exist"
}

## <Disconnect User Tunnel>
rasdial.exe UserTunnel /d

## <Remove User Tunnel>
if (Get-VpnConnection -Name "UserTunnel" -AllUserConnection -ErrorAction SilentlyContinue)
{Execute-Process -Path "powershell.exe" -Parameters "-Command & {Remove-VpnConnection UserTunnel -force -AllUserConnection}" -Wait}
else {
    Write-Host "VPN *UserTunnel* does not exist"
}

Installation:

Execute-Process -Path "powershell.exe" -Parameters "-Command & { & `"$dirFiles\AovpnConnection.ps1 -xmlFilePath $dirFiles\VPNtoDAP_DeviceTunnel_1.1.0.xml -ProfileName 'DeviceTunnel' -DeviceTunnel`"; Exit `$LastExitCode }" -Wait

## <User Tunnel VPNtoDAP Profile>	
Execute-Process -Path "powershell.exe" -Parameters "-Command & { & `"$dirFiles\AovpnConnection.ps1 -xmlFilePath $dirFiles\VPNtoDAp_UserTunnel.xml -ProfileName 'UserTunnel' -AllUserConnection`"; Exit `$LastExitCode }" -Wait

I started troubleshooting by reading up on the whole issue thoroughly. I saw various blog entries in which they talked about VPN problems in relation to Windows 11, but these were supposedly fixed by a KB from Microsoft. I could not confirm this.

The Solution:

To begin with, I decided to create a separate installation for each of the two tunnels within SCCM. At first I did not change anything in the DeviceTunnel except that I used the latest script from Richard Hicks. For the UserTunnel, I came across the following page:

https://github.com/ConfigJon/AlwaysOnVPN/blob/master/New-AovpnUserTunnel.ps1

I used this script and only changed the name of the tunnel and the version within the PS file. I integrated this in the “Deploy-Application.ps1” as follows:

Pre-Installation:

Execute-Process -Path "$PSHome\powershell.exe" -Parameters "-Command & { & `"$dirFiles\Remove-AovpnUserTunnel.ps1 -ProfileName 'UserTunnel'`"; Exit `$LastExitCode }" -Wait

Installation:

Execute-Process -Path "$PSHome\powershell.exe" -Parameters "-Command & { & `"$dirFiles\New-AovpnUserTunnel.ps1 -xmlFilePath $dirFiles\ProfileXML-User_1.2.xml`"; Exit `$LastExitCode }" -Wait

Deployment Settings:

Install behaviour: Install for system
Logon requirement: Only when a user is logged on

… and BAM! IT WORKED!

A few days later, my boss told me I needed to add a new IP address to the routing table in the XML because an IP had changed for the DeviceTunnel. So I thought: “Well, that shouldn’t be a problem because the DeviceTunnel never was a problem”. But I thought wrong.

The next problem

I tried to update the DeviceTunnel by simply removing it and creating it again. This worked as long as the DeviceTunnel was not connected. But as soon as the device was connected to a hotspot or home WiFi, this did not work. The PSADT-Log reported that a connected VPN Tunnel cannot be removed.

After a few hours of troubleshooting, I stumbled across the following Reddit post:

https://www.reddit.com/r/SCCM/comments/h9emsg/aovpn_profile_deployment_with_sccm_lessons_learned/

This post describes exactly this problem. The OP explains that you can change the authentication method so that the DeviceTunnel is disconnected, after which you can remove it:

 Set-VPNConnection -AllUserConnection -Name "TunnelName01" -AuthenticationMethod EAP 

Unfortunately, this didn’t quite work out. The DeviceTunnel then had the status “Action needed” and could not be removed. I then found out that if I clicked on “Retry” several times in the Software Center, the installation suddenly worked. And indeed, if you enter the command “rasdial.exe DeviceTunnel /d” four times (exactly four times!) after changing the authentication method, the DeviceTunnel is disconnected. So I have adapted my script as follows:

Pre-Installation:

 #Set a different AuthenticationMethod to disconnect Device Tunnel
 #-ErrorAction SilentlyContinue suppresses the error Get-VpnConnection writes when the profile is missing
 if (Get-VpnConnection -Name "DeviceTunnel" -AllUserConnection -ErrorAction SilentlyContinue)
    {
      Set-VPNConnection -AllUserConnection -Name "DeviceTunnel" -AuthenticationMethod EAP
      ## <Disconnect Device Tunnel>
      rasdial.exe DeviceTunnel /d
      Start-Sleep -Seconds 15
      rasdial.exe DeviceTunnel /d
      Start-Sleep -Seconds 3
      rasdial.exe DeviceTunnel /d
      Start-Sleep -Seconds 3
      rasdial.exe DeviceTunnel /d
      Start-Sleep -Seconds 3
     }
else
     {
      Write-Host "VPN *DeviceTunnel* does not exist"
     }

## <Remove Device Tunnel>
if (Get-VpnConnection -Name "DeviceTunnel" -AllUserConnection -ErrorAction SilentlyContinue)
{Execute-Process -Path "powershell.exe" -Parameters "-Command & {Remove-VpnConnection DeviceTunnel -force -AllUserConnection}" -Wait}
else {
    Write-Host "VPN *DeviceTunnel* does not exist"
  }

Installation:

Execute-Process -Path "powershell.exe" -Parameters "-Command & { & `"$dirFiles\AovpnConnection.ps1 -xmlFilePath $dirFiles\ProfileXML-Device_1.6.xml -ProfileName 'DeviceTunnel' -DeviceTunnel`"; Exit `$LastExitCode }" -Wait

… and finally I was able to update the DeviceTunnel with the new IP-Address in the XML File.
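Instead of hard-coding four rasdial calls, the disconnect step could also poll the tunnel state and stop as soon as it is down. The following is only a minimal sketch of that idea, not what the script above does: the function name Disconnect-TunnelWithRetry and its parameters are made up for illustration, and the attempt count and delay are assumptions you would have to tune.

```powershell
# Sketch only: Disconnect-TunnelWithRetry and its parameter names are hypothetical.
function Disconnect-TunnelWithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $IsConnected,  # returns $true while the tunnel is still up
        [Parameter(Mandatory)] [scriptblock] $Disconnect,   # performs one disconnect attempt
        [int] $MaxAttempts = 6,                             # assumed upper bound
        [int] $DelaySeconds = 3                             # assumed pause between attempts
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Stop as soon as the tunnel reports that it is down
        if (-not (& $IsConnected)) { return $true }
        # Discard the output of the attempt so it does not pollute the return value
        & $Disconnect | Out-Null
        Start-Sleep -Seconds $DelaySeconds
    }

    # Final check after the last attempt
    return (-not (& $IsConnected))
}

# Possible use in the Pre-Installation section (after Set-VpnConnection ... -AuthenticationMethod EAP):
# $down = Disconnect-TunnelWithRetry `
#     -IsConnected { (Get-VpnConnection -Name 'DeviceTunnel' -AllUserConnection -ErrorAction SilentlyContinue).ConnectionStatus -eq 'Connected' } `
#     -Disconnect  { rasdial.exe DeviceTunnel /d }
```

The state check and the disconnect command are passed in as script blocks so the loop itself stays independent of the tunnel name and can be tried out without a VPN profile present.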

The files and folder structure for DeviceTunnel:

AppDeploymentToolkit
Files
_AovpnConnection.ps1
_ProfileXML-Device_1.6.xml
_Remove-AovpnConnection.ps1
SupportFiles
Deploy-Application.exe
Deploy-Application.exe.config
Deploy-Application.ps1

The files and folder structure for UserTunnel:

AppDeploymentToolkit
Files
_New-AovpnUserTunnel.ps1
_ProfileXML-User.xml
_Remove-AovpnUserTunnel.ps1
SupportFiles
Deploy-Application.exe
Deploy-Application.exe.config
Deploy-Application.ps1

I hope this post is helpful for some of you guys. Have a nice weekend everyone!

None of this applies to my environment but upvoted because of the effort in that write up.

I always assumed this was just something people in MCSE tests used, and no actual company ever used it.

All of that sounds horrible.

We followed Richard Hicks’ guides and it was pretty seamless moving from DirectAccess to AoVPN. We have the device tunnel with limited access to our DCs, SCCM and CA, and a forced user tunnel with MS exceptions, which is deployed like the device tunnel via SCCM or Intune. We Autopilot devices from home and it’s seamless, and we have over 2000 connections most days.

I will add that the user tunnel doesn’t let you change the config using PowerShell on Windows 11. Like you said, it needs to be removed, or a new one created under another name and the existing one deleted afterwards. Richard describes running it while a user is logged on, as the user tunnel is per profile unless you set it as an all-user connection. I would suggest you give the device tunnel access to your SCCM servers.

If you have Intune, you may be better off using that for Windows 11. It seems to work better, and you just import the same XML.

I have been through the same struggles with AoVPN! I created a deployment script from scratch, but we went a slightly different way. This is because we have 2 domains and we require AlwaysOn True & False (auto connect or not) for specific users (mainly IT). We also have two user configurations: one has the default route out to the internet and the other defaults into our datacentre. The type of connection the user receives is based on registry keys written to the machine with targeted GP Preferences.

Instead of SCCM deploying it directly, we have SCCM deploying what we call a “Deployer”. This contains a copy of the current PS1 file. The PS1 file itself is split into different sections (the script is run with different command line switches in order to run a specific section): the install/uninstall of the deployer, the repair tool and the 3 tunnel configurations (1 device, 2 user). The install places the PS1 script into a folder, then creates a scheduled task that runs the PS1 on every user login. The benefit of this is that GP Preferences can be used to update the PS1, meaning less administration.

The repair basically runs a total reinstall of the deployer and device/user tunnels, so if any issues occur it can be fixed without the users having to be on the network. This is done from the Software Centre “repair” button.

What we call the run section pretty much works as normal in that it will check the version of the currently applied script and skip if it matches (both for device and user tunnel). It can also switch between the two default routes if needed, and it sets specific DNS server IPs for each domain.

Uninstall section removes everything.

The sections are based on PSADT’s Install/Uninstall/Repair method with slight tweaks.

I see you are having the same issues as I did recently with rasdial /disconnect not working correctly every time, so this code might help (it’s what I currently use; you will need to change the wildcard name it’s looking for to what you named the device connection):

Well I would share the code if I could get it to format correctly!!

How do I get code to show correctly on here!?

Hope that helps!

Did you think about using the application rules to stop processes so that the tunnel would no longer be in use? I believe PowerShell has those same abilities.

We’re in the middle of a companywide AOVPN device tunnel rollout (just crossed the 50% mark) and it’s by far one of the smoothest rollouts I’ve ever been involved in (I’m running it, LOL, so I might be a little biased).

We’re using just a GPO and two tasks - one copies the latest version of the tunnel script to a specific location on the PC, and the other actually starts the script based on specific triggers and events, e.g. event 10000 (change in network connections - I might drop that), event 5719 (PC cannot establish session with a DC) and event 1129 (GP failed because of lack of network connectivity to a DC). This ensures that the PC will always try to start or rebuild the device tunnel if there’s no connectivity back to the office, and if some numpty with admin rights manages to stop it, the task will start it up again after the next GP refresh attempt. I’ve also got the script creating log files whenever it runs, and these logs are copied to a central location for any required analysis. Works smooth as butta …

Connectivity through the device tunnels is proving to be so reliable that I’m going to make a case for replacing our 3rd party VPN solution with AOVPN user tunnels.

Ok, some gotchas…

It’s annoying that you have to restart the RRAS server if you want to increase or decrease the number of “ports”. Why can’t this be done on the fly? Yes, of course, I can just configure the server with a huge number of ports from the beginning, but this is the first time we’re deploying it and I’d prefer if a prospective client does not connect, rather than connects and then is impacted by a congested server. I’m increasing the port count slowly through the deployment while monitoring the maximum number of connections and the server’s performance.

Oh, restarting the RRAS service while it’s supporting active device tunnel connections can lead to some interesting network connectivity problems on the clients. You’d think that their device tunnels would just drop, but no, some of them remain up for some reason while providing no actual connection, which impacts the PC’s operation if it connects back to the corporate network by other means (e.g. 3rd party VPN). I updated the device tunnel script with a fairly easy workaround for this - if it was triggered and found that the tunnel was up but there was no connectivity, it called rasdial /disconnect to stop the tunnel and then rebuilt it.
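The decision behind that workaround can be sketched roughly as below. This is only an illustration under assumptions, not the actual tunnel script: the function name Get-TunnelAction, the action names and the idea of probing a single internal host are all made up for the example.

```powershell
# Sketch only: Get-TunnelAction and its action names are hypothetical.
function Get-TunnelAction {
    param(
        [Parameter(Mandatory)] [bool] $TunnelConnected,        # tunnel reports "Connected"
        [Parameter(Mandatory)] [bool] $InternalHostReachable   # an internal host answered the probe
    )

    # Connectivity back to the office exists: leave everything alone
    if ($InternalHostReachable) { return 'None' }
    # Tunnel claims to be up but nothing is reachable: disconnect and rebuild
    if ($TunnelConnected) { return 'Rebuild' }
    # Tunnel is down and nothing is reachable: just start it
    return 'Connect'
}

# Possible use inside the triggered script (host name is a placeholder):
# $connected = (Get-VpnConnection -Name 'DeviceTunnel' -AllUserConnection -ErrorAction SilentlyContinue).ConnectionStatus -eq 'Connected'
# $reachable = Test-Connection -ComputerName 'dc01.corp.example' -Count 1 -Quiet
# switch (Get-TunnelAction -TunnelConnected $connected -InternalHostReachable $reachable) {
#     'Rebuild' { rasdial.exe DeviceTunnel /disconnect }   # then recreate the profile
#     'Connect' { rasdial.exe DeviceTunnel }
# }
```

Which host to probe, and whether a single ping is a good enough test, depends on the environment.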

There are no fancy charts or graphics available in RRAS, and you know the TPTB love their fancy charts! No real problem - I configured a scheduled task on the server that collected several statistics (active PCs in the domain, PCs configured to use the service, maximum number of connections, etc.) and then e-mailed them to a DL as a CSV file every morning. From there it’s an easy cut’n’paste into an Excel file to generate the required charts to show how the deployment was going.

Oh, “stale connections” are still happening, even though MS supposedly provided a fix for it. I updated the statistics script to identify and stop the stale connections before capturing the statistics.

Of course, TPTB, after seeing the fancy deployment chart, also wanted to know what the usage of the AOVPN service was like during the day. No problem - I configured another scheduled task to run on the server that basically takes a snapshot of active connections every hour and dumps it into a CSV file in a way that it keeps data for the last 7 days. Oh, and that led to some amusing observations about the WFH people and the times that they actually started working, but that’s a story for another day … :-).
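A rolling snapshot file like that could look roughly like the sketch below. It is not the script described above, just one way to do it under assumptions: the function name Add-Snapshot, the column names and the file path in the usage comment are all hypothetical.

```powershell
# Sketch only: Add-Snapshot, the column names and the retention default are assumptions.
function Add-Snapshot {
    param(
        [Parameter(Mandatory)] [string] $Path,              # CSV file holding the snapshots
        [Parameter(Mandatory)] [int] $ActiveConnections,    # number of connections right now
        [datetime] $Now = (Get-Date),
        [int] $KeepDays = 7
    )

    # Load the existing rows, if the file is already there
    $rows = @()
    if (Test-Path -LiteralPath $Path) { $rows = @(Import-Csv -LiteralPath $Path) }

    # Keep only rows that are not older than the retention window
    $cutoff = $Now.AddDays(-$KeepDays)
    $kept = @($rows | Where-Object {
        [datetime]::ParseExact($_.Timestamp, 's', [cultureinfo]::InvariantCulture) -ge $cutoff
    })

    # Append the new snapshot with a sortable, culture-independent timestamp
    $new = [pscustomobject]@{
        Timestamp         = $Now.ToString('s')
        ActiveConnections = $ActiveConnections
    }
    @($kept) + $new | Export-Csv -LiteralPath $Path -NoTypeInformation
}

# Possible hourly call on the RRAS server (path is a placeholder):
# Add-Snapshot -Path 'C:\Logs\aovpn-usage.csv' -ActiveConnections (@(Get-RemoteAccessConnectionStatistics).Count)
```

Rewriting the whole file each hour is fine at this size (at most 7 x 24 rows) and keeps the pruning logic trivial.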

Finally, the device tunnel uses certificate-based authentication, using certificates distributed from our PKI. It seemed to make sense to provide access to the subordinate CA through the device tunnel, just in case one of the PC clients needed to auto-update any of its assigned certificates.

It’s the mirror universe version of DenverCoder9

I guess it works fine once it is set up and running. But yes, getting there was horrible lol, but I guess that is what you have to expect from a Microsoft solution

It’s actually a pretty good solution and basically trouble-free once you’ve got it up and running. And I’ve seen worse with other IT deployments. A LOT worse. Deploying AOVPN was a breeze in comparison.

welcome to microsoft! Bend over and give us your wallet!

Hey Mienzo

Appreciate your advice. Unfortunately, we didn’t use Intune for Software deployment so far. But I will keep this in mind!

Did you run into issues with user logons taking up to 5 minutes while on Wi-Fi with the device VPN? We are experiencing this in our current IKEv2 deployment with the RRAS server deployed in Azure.

Does he have a specific guide for migrating? I can’t seem to find it on his site.

How do you repair the application? I have a repair function as well, but basically it just removes and adds the tunnels again.

I used the codeblock function but I think it is only available on PC.

Are you able to share the code for the rasdial disconnect issue? I’m currently doing proof of concept at my job and would love to know what you used to solve this issue.

Oh, I forgot one more thing - the stupid “metric” problem, where if the device tunnel received a high metric, stupid Windows would route ALL DNS queries to the device tunnel, basically breaking DNS for the client. While there’s a workaround that involves fondling the device tunnel’s assigned metric via script, I went with the lazy-ass method - I just configured the RRAS server to run its own DNS, with a conditional forwarder to the domain. As the clients take their DNS settings from the server, this ensures that DNS still works for them if poxy Windows decides to route all DNS traffic through the tunnel.

The only time we see a slight delay is if we do a fresh start when they are at home. This is more due to it needing another reboot after renaming the device. Do you give your device tunnel access to all your DCs etc.? It could be a group policy issue.

I followed Richard Hicks’ article and only provided access to the DC in Azure on the same subnet. The strange thing is that there is no logon delay if the device is wired in. I will try to expand the list of DCs it has access to and see if that helps.