Script to clone a VM with free VMware ESXi

clones

Many people run free versions of ESXi, particularly in lab environments. Unfortunately with the free version of ESXi, the VMware API is read-only. This limits (or complicates) automation.

I was looking for a way to clone guest VMs with the minimum of effort. This script, which took inspiration from many sources on the internet, is the result. It takes advantage of the fact that although the API is limited, there are plenty of actions you can take via SSH, including calls to vim-cmd.

<#

.SYNOPSIS
    Clones a VM,

.DESCRIPTION
    This script:

    - Retrieves a list of VMs attached to the host
    - Enables the user to choose which VM to clone
    - Clones the VM

    It must be run on a Windows machine that can connect to the virtual host.

    This depends on the Posh-SSH and PowerCLI modules, so from an elevated
    PowerShell prompt, run:

        Install-Module PoSH-SSH
        Install-Module VMware.PowerCLI

    For free ESXi, the VMware API is read-only. That limits what we can do with
    PowerCLI. Instead, we run certain commands through SSH. You will therefore
	need to enable SSH on the ESXi host before running this script.
	
	The script only handles simple hosts with datastores under /vmfs. And it
	clones to the same datastore as the donor VM. Your setup and requirements
	may be more complex. Adjust the script to suit.

.EXAMPLE
    From a PowerShell prompt:

      .\New-GuestClone.ps1 -ESXiHost 192.168.101.100

.COMPONENT
    VMware scripts

.NOTES
    This release:

        Version: 1.0
        Date:    8 July 2021
        Author:  Rob Pomeroy

    Version history:

        1.0 - 8 July 2021 - first release

#>
param
(
    [Parameter(Mandatory = $true, Position = 0)][String]$ESXiHost
)

####################
## INITIALISATION ##
####################

# Load necessary modules
Write-Host Loading PowerShell modules...
Import-Module PoSH-SSH
Import-Module VMware.PowerCLI

# Change to the directory where this script is running
Push-Location -Path ([System.IO.Path]::GetDirectoryName($PSCommandPath))


#################
## CREDENTIALS ##
#################

# Check for the creds directory; create it if it doesn't exist
If(-not (Test-Path -Path '.\creds' -PathType Container)) {
    New-Item -Path '.\creds' -ItemType Directory | Out-Null
}

# Looks for credentials file for the VMware host. Passwords are stored encrypted
# and will only work for the user and machine on which they're stored.
$credsFile = ('.\creds\' + $ESXiHost + '.creds')
If(-not (Test-Path -Path $credsFile)) {
    # Request credentials
    $creds = Get-Credential -Message "Enter root password for VMware host $ESXiHost" -User root
    $creds.Password | ConvertFrom-SecureString | Set-Content $credsFile
}
$ESXICredential = New-Object System.Management.Automation.PSCredential( `
    "root", `
    (Get-Content $credsFile | ConvertTo-SecureString)
)


#########################
## List VMs (PowerCLI) ##
#########################
#
# Disable HTTPS certificate check (not strictly needed if you use -Force) in
# later calls.
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false | Out-Null

# Connect to the ESXi server
Connect-VIServer -Server $ESXiHost -Protocol https -Credential $ESXICredential -Force | Out-Null
If(-not $?) {
    Throw "Connection to ESXi failed. If password issue, delete $credsFile and try again."
}

# Get all VMs, sorted by name
$guests = (Get-VM -Server $ESXiHost | Sort-Object)

# Work out how much we need to left-pad the array index, when outputting
$padWidth = ([string]($guests.Count - 1)).Length

# Output the list of VMs, with array index padded so it lines up nicely
Write-Host ("Existing VMs (" + $guests.Count + "), sorted by name:")
for ( $i = 0; $i -lt $guests.count; $i++)
{
    If($guests[$i].PowerState -eq "PoweredOn") {
        Write-Host -ForegroundColor Red ("[" + "$i".PadLeft($padWidth, ' ') + "](ON) : " + $guests[$i].Name) 
    } Else {
        Write-Host ("[" + "$i".PadLeft($padWidth, ' ') + "](off): " + $guests[$i].Name) 
    }
}
Write-Host


##########################
## Choose a VM to clone ##
##########################

$chosenVM = 0
do {
    $inputValid = [int]::TryParse((Read-Host 'Enter the [number] of the VM to clone (the donor)'), [ref]$chosenVM)
    if($chosenVM -lt 0 -or $chosenVM -ge $guests.Count) {
        $inputValid = $false
    }
    if (-not $inputValid) {
        Write-Host ("Must be a number in the range 0 to " + ($guests.Count - 1).ToString() + ". Try again.")
    }
} while (-not $inputValid)

# Check the VM is powered off
if($guests[$chosenVM].PowerState -ne "PoweredOff") {
    Throw "ERROR: VM must be powered off before cloning"
}

# Get VM's datastore, directory and VMX; we assume this is at /vmfs/volumes
If(-not ($guests[$chosenVM].ExtensionData.Config.Files.VmPathName -match '\[(.*)\] ([^\/]*)\/(.*)')) {
    Throw "ERROR: Could not calculate the datastore"
}
$VMdatastore = $Matches[1]
$VMdirectory = $Matches[2]
$VMXlocation = ("/vmfs/volumes/" + $VMdatastore + "/" + $VMdirectory + "/" + $Matches[3])
$VMdisks     = $guests[$chosenVM] | Get-HardDisk


###############################
## File test (PoSH-SSH SFTP) ##
###############################

# Clear any open SFTP sessions
Get-SFTPSession | Remove-SFTPSession | Out-Null

# Start a new SFTP session
(New-SFTPSession -Computername $ESXiHost -Credential $ESXICredential -Acceptkey -Force -WarningAction SilentlyContinue) | Out-Null

# Test that we can locate the VMX file
If(-not (Test-SFTPPath -SessionId 0 -Path $VMXlocation)) {
    Throw "ERROR: Cannot find donor VM's VMX file"
}


#################
## New VM name ##
#################

$validInput = $false
While(-not $validInput) {
    $newVMname = Read-Host "Enter the name of the new VM"
    $newVMdirectory = ("/vmfs/volumes/" + $VMdatastore + "/" + $newVMname)

    # Check if the directory already exists
    If(Test-SFTPPath -SessionId 0 -Path $newVMdirectory) {
        $ynTest = $false
        While(-not $ynTest) {
            $yn = (Read-Host "A directory already exists with that name. Continue? [Y/N]").ToUpper()
            if (($yn -ne 'Y') -and ($yn -ne 'N')) {
                Write-Host "ERROR: enter Y or N"
            } else {
                $ynTest = $true
            }
        }
        if($yn -eq 'Y') {
            $validInput = $true
        } else {
            Write-Host "You will need to choose a different VM name."
        }
    } else {
        If($newVMdirectory.Length -lt 1) {
            Write-Host "ERROR: enter a name"
        } else {
            $validInput = $true

            # Create the directory
            New-SFTPItem -SessionId 0 -Path $newVMdirectory -ItemType Directory | Out-Null
        }
    }
}


###################################
## Copy & transform the VMX file ##
###################################

# Clear all previous SSH sessions
Get-SSHSession | Remove-SSHSession | Out-Null

# Connect via SSH to the VMware host
(New-SSHSession -Computername $ESXiHost -Credential $ESXICredential -Acceptkey -Force -WarningAction SilentlyContinue) | Out-Null

# Replace VM name in new VMX file
Write-Host "Cloning the VMX file..."
$newVMXlocation = $newVMdirectory + '/' + $newVMname + '.vmx'
$command = ('sed -e "s/' + $VMdirectory + '/' + $newVMname + '/g" "' + $VMXlocation + '" > "' + $newVMXlocation + '"')
($commandResult = Invoke-SSHCommand -Index 0 -Command $command) | Out-Null

# Set the display name correctly (might be wrong if donor VM name didn't match directory name)
$find    = 'displayName \= ".*"'
$replace = 'displayName = "' + $newVMname + '"'
$command = ("sed -i 's/$find/$replace/' '$newVMXlocation'")
($commandResult = Invoke-SSHCommand -Index 0 -Command $command) | Out-Null

# Blank the MAC address for adapter 1
$find    = 'ethernet0.generatedAddress \= ".*"'
$replace = 'ethernet0.generatedAddress = ""'
$command = ("sed -i 's/$find/$replace/' '$newVMXlocation'")
($commandResult = Invoke-SSHCommand -Index 0 -Command $command) | Out-Null


#####################
## Clone the VMDKs ##
#####################

Write-Host "Please be patient while cloning disks. This can take some time!"
foreach($VMdisk in $VMdisks) {
    # Extract the filename
    $VMdisk.Filename -match "([^/]*\.vmdk)" | Out-Null
    $oldDisk = ("/vmfs/volumes/" + $VMdatastore + "/" + $VMdirectory + "/" + $Matches[1])
    $newDisk = ($newVMdirectory + "/" + ($Matches[1] -replace $VMdirectory, $newVMname))

    # Clone the disk
    $command = ('/bin/vmkfstools -i "' + $oldDisk + '" -d thin "' + $newDisk + '"')
    Write-Host "Cloning disk $oldDisk to $newDisk with command:"
    Write-Host $command
    # Set a timeout of 10 minutes/600 seconds for the disk to clone
    ($commandResult = Invoke-SSHCommand -Index 0 -Command $command -TimeOut 600) | Out-Null
    #Write-Host $commandResult.Output
   
}


########################
## Register the clone ##
########################

Write-Host "Registering the clone..."
$command = ('vim-cmd solo/register "' + $newVMXlocation + '"')
($commandResult = Invoke-SSHCommand -Index 0 -Command $command) | Out-Null
#Write-Host $commandResult.Output


##########
## TIDY ##
##########

# Close all connections to the ESXi host
Disconnect-VIServer -Server $ESXiHost -Force -Confirm:$false
Get-SSHSession | Remove-SSHSession | Out-Null
Get-SFTPSession | Remove-SFTPSession | Out-Null

# Return to previous directory
Pop-Location

You can download the latest version of the script from my GitHub repository.

Cover photo by Dynamic Wang on Unsplash

Using a Canon EOS 60D as a webcam

Canon EOS 60D
At the time of writing, the excellent EOS 90D is the modern equivalent of my 60D. I really want a Sony Alpha though! In the interests of transparency: these are affiliate links. See my affiliate disclosure page for an explanation.

Necessity is the mother of invention. In the midst of the trauma and struggles of coronavirus, one positive theme has consistently emerged: innovation. In particular, the explosive rise of home working, podcasting and vlogging has resulted in significant improvements in associated technology.

So when I recently started researching ways to raise my webcam game (for conference calls and church meetings), I was spoilt for choice. Since I’m a fan of both cost-efficiency and quality, I was particularly interested in seeing what could be achieved with my existing equipment, including my faithful DSLR camera – a Canon EOS 60D.

The last time I looked into this, the main way to take a feed from this camera model, was through its HDMI port and a separate video capture device (which I don’t own). But Canon has pulled an innovation blinder, releasing and improving its EOS Webcam Utility and ensuring that it not only works with the latest hardware, but also with such aging models as my ten-year-old 60D. Oh Canon, I love you.

“There must be a catch,” I thought, as I read reports of camera sensors overheating, or timing out after 30 minutes. So with no great expectations, I downloaded and tested the software.

Oh. My. Word. Did I mention Canon how I love you?

Screenshot of a Zoom session using the EOS webcam utility, my Canon EOS 60D and a 17-55 f/2.8 lens.

With zero effort and no tweaking of camera settings, the improvement was immediately visible. From the screenshot you can see I need to work on contrast and lighting. But for a first test, this made me very happy.

Did the camera overheat or timeout? No. I ran the session for about two hours. The camera was slightly warm at the end. Granted it’s not a hot summer’s day here in the UK (is it ever!) but it looks to me like this setup would work all day, every day. The only snag is battery life. So instead of being on the market for a superior webcam, I’m instead on the market for an external power supply (like Canon’s ACK-E6). Clones of the OEM adapter are available for about £23. Bargain.

Photo of EOS 60D courtesy of John Torcasio, CC BY-SA 3.0, via Wikimedia Commons.

SOLVED: first-time login problems when enforcing MFA with AWS

If you’re reading this page, you probably already have an MFA strategy sorted. But for those still making decisions: I love my YubiKey 5 NFC. I use it constantly to log into AWS, to secure my GitHub account, to protect my “I would be completely ruined if it were hacked” email account, etc. My wife was extremely puzzled when I asked for one for my birthday. She didn’t really know what it was she was buying me. What a great present though!

In the interests of transparency: this is an affiliate link. See my affiliate disclosure page for an explanation.

AWS has a tutorial about enforcing MFA for all users. The general thrust of the article is to create a policy that allows users without MFA to do nothing other than log in and set up MFA. Having enabled and logged in using MFA, other permissions become available to the user (according to whatever other permissions are assigned).

This works well apart from one snag: having created a user, and set the flag forcing the user to change password on first login, the user cannot log in. Instead the user is greeted with the following error:

Either user is not authorized to perform iam:ChangePassword or entered password does not comply with account password policy set by administrator

The problem lies in a policy statement called “DenyAllExceptListedIfNoMFA”. As its name suggests, for a user without MFA, this blocks all bar the specified actions. In AWS’s recommended policy, the section effectively allows the following actions:

"iam:CreateVirtualMFADevice",
"iam:EnableMFADevice",
"iam:GetUser",
"iam:ListMFADevices",
"iam:ListVirtualMFADevices",
"iam:ResyncMFADevice",
"sts:GetSessionToken"

You’ll notice that those actions don’t include anything about changing a password! So without MFA already enabled on your account, there’s no way to change your password when first logging on (if “force password change” is enabled). The trick is to add two more permissions:

"iam:ChangePassword",
"iam:CreateLoginProfile",

For a user that has not yet logged into the AWS console, this will allow creation of the user’s login profile and setting a new password.

SOLVED: Windows 10 forbidden port bind

Angry budgie - Photo by Егор Камелев on Unsplash

Ever have this problem, launching a Docker container (in this case, Nginx on port 8000)?

Error: Unable to start container: Error response from daemon: Ports are not available: listen tcp 0.0.0.0:8000: bind: An attempt was made to access a socket in a way forbidden by its access permissions.

or maybe this problem, trying to run PHP’s built-in webserver?

php -S localhost:8080
[Fri Sep 11 09:00:09 2020] Failed to listen on localhost:8080 (reason: An attempt was made to access a socket in a way forbidden by its access permissions.)

Who to trust?

Like me, you may already have read many “solutions”, on a whole bunch of spammy websites. The “fixes” are often no more than workarounds – and in some cases, pretty bad workarounds, at that. Such as:

  • Disable VPN
  • Disable Internet Connection Sharing
  • Disable third party firewall
  • Disable antivirus (for goodness’ sake!)

More sensibly, use (e.g.) netstat to find out if something has already bound to the port.

None of these helped in my case. (Well I didn’t try disabling my antivirus or firewall, because c’mon!) Nothing was bound to the ports in question. I couldn’t disable ICS because I’m using its capabilities to provide NAT routing for Hyper-V networks.

The cause

It turns out the problem is down to Docker and Hyper-V reserving a shed load of ports. You can verify if this is the case for you by running the following command (which despite advice elsewhere on the internet does not need to be in an elevated PowerShell prompt; plain old no-privileges cmd will do):

netsh interface ipv4 show excludedportrange protocol=tcp

In my case, I could see that a lot of ports were reserved, between 1128 and 55437:

Start Port End Port
---------- --------
1128 1227
1228 1327
1328 1427
1428 1527
1528 1627
1628 1727
1728 1827
1828 1927
1928 2027
[snip]
50000 50059 *
53610 53709
53710 53809
54210 54309
54610 54709
54710 54809
54910 55009
55113 55212
55214 55313
55338 55437

* - Administered port exclusions.

I confirmed that this is the issue by picking a port that hadn’t been reserved:

php -S localhost:50080
[Fri Sep 11 09:12:21 2020] PHP 7.4.8 Development Server (http://localhost:50080) started

(For me, the PHP web server would also start quite happily on port 80, incidentally. But you probably shouldn’t do that!)

People who have identified this issue tend to recommend disabling Hyper-V, excluding whatever ports you need and re-enabling Hyper-V. I’m nervous of that approach however, having spent a lot of time configuring Hyper-V networking and having seen this approach nuke networking in the past.

If you’re happy taking that approach, I suggest reading and understanding this Microsoft article. Personally, I prefer to approach this as follows.

Find gaps in port reservations

Hyper-V and Docker between them seem to reserve different sets of ports on each reboot. Helpful. You can look for gaps in the port reservations using the following method, but note that these gaps will not persist, without other measures. Here’s how to find the gaps:

  1. Run the netsh command above.
  2. Copy and paste the output into Notepad++ and use search and replace (in regular expression mode) to turn all the spaces into tabs – replace ( +) with \t.
  3. Copy and paste the result into Excel (which will now put all the ports nicely into cells.
  4. Use an Excel formula to identify gaps in the reserved ranges: =IF(A4=(B3+1), "continuous", "## " & TEXT(A4-B3-1, "0") & " PORT GAP ##")
    List of reserved port ranges, showing any gaps
  5. Where “PORT GAP” appears, there is a gap between the end port on that line and the start port on the next (this would be 2115-2379 in the example above, which is 265 ports, inclusive).

As you can see, this approach does find you an available port (unless something else has bound to it):

php -S localhost:2115
[Fri Sep 11 09:36:58 2020] PHP 7.4.8 Development Server (http://localhost:2115) started

The fix: reserve your own ports

Well, two can play that game. Once you’ve found a gap, you can permanently reserve it for your own use. I found the largest gap between 12970 and 49670, so decided to reserve a memorable slice of ports: 20000- 21000. The appropriate incantation follows, which does need to be elevated this time. Swap port numbers and range to suit your environment and requirements:

netsh int ipv4 add excludedportrange protocol=tcp startport=20000 numberofports=1000 store=persistent

You will see that the range is now showing as administratively reserved (indicated by the asterisk):

netsh interface ipv4 show excludedportrange protocol=tcp

Protocol tcp Port Exclusion Ranges
Start Port End Port
---------- --------
1215 1314
…
20000 20999 *
…
51490 51589
- Administered port exclusions.

And once again, I can use a port within my preferred range:

php -S localhost:20080
[Fri Sep 11 10:37:23 2020] PHP 7.4.8 Development Server (http://localhost:20080) started

This exclusion persists between reboots and protects your range from being stolen by Hyper-V or anything else.

Angry budgie featured photo by Егор Камелев on Unsplash

Hyper-V virtual switch creation woes

man rubbing his temples in frustration
UPDATE: You may wish to leap straight to the comment below from Craig S., in which he recommends a workaround for this problem. Many subsequent commenters have tried his suggestion, with success. Thanks, Craig!

For my own part, I moved to using an internal switch configured for NAT and run a pfSense VM in that network, for DHCP. Without doubt, Craig’s solution is easier to implement!

If you found this post helpful, you might also be interested in John Savill’s book, Mastering Windows Server 2016 Hyper-V. In the interests of transparency: this is an affiliate link. See my affiliate disclosure page for an explanation.

High-end laptop hardware does not approach enterprise-grade server quality. That’s my takeaway.

I’ve been wrangling with Hyper-V on a very nice ultrabook that has 32GB of RAM and a Core i7 processor (quad-core). Highly portable and useful for running multiple VMs, which indeed was the idea.

This ultrabook is also now sporting the 2004 release of Windows 10. Security-obsessed folks like me have taken a keen interest in the application sandbox features.

Since sandboxing is all about virtualisation, it made sense, I reasoned, to use Hyper-V rather than any other virtualisation platform. It’s native to Windows and this way there should be fewer potential conflicts between hypervisors.

And it was all going well until I attempted to create an “external” (bridged) switch. The creation process failed with the following unhelpful Virtual Switch Manager errors:

  • “Adding ports to the switch ‘[switch name]‘ failed. The operation failed because the object was not found.”
  • “Failed while adding virtual Ethernet switch connections. Ethernet port ‘{[insert long GUID here]}’ bind failed: Cannot create a file when that file already exists. (0x800700B7).”

Worse than that, whatever process had partially completed could not easily be reversed – clicking ‘cancel’ did not back out any changes. Rather, it left my laptop with networking utterly ruined.

I confirmed this by repeated attempts to create a vSwitch, following resetting the network stack and removing/reinstalling Hyper-V. Same results every time. If you find yourself in this mess, utter the incantation “netcfg -d” at an elevated command prompt and reboot. (You may also need to remove and re-add WAN miniport devices as described here, since this process can break existing L2TP VPN connections.)

Is there a way to fix this problem? I believe not, at present. It’s almost certainly connected to the attempted use of a WiFi adapter which Hyper-V can’t always support. My adapter is an Intel Wi-Fi 6 AX200, FWIW. No matter how expensive the laptop, you can’t assume that the hardware will provide across-the-board support for virtualisation. For that, you really need a decent-quality server-grade NIC.

All is not lost. Although it’s not possible to bridge the Wi-Fi adapter, the default network (which offers NAT) works fine. And internal virtual networks (which aren’t hardware bound) are also unaffected. Granted, this means you can’t route externally into your VMs, but that’s not the end of the world, particularly if you’re au fait with port forwarding (read also the comments at that link).

(Yes, you could also use an external network interface for this, though that reduces your laptop’s portability of course. Quality matters for that too – YMMV.)

If you have cracked this problem or can provide any further thoughts or guidance, please do let me know in the comments!

This article’s featured photo by Siavash Ghanbari on Unsplash

How to recruit the perfect employee (with free resources)

You may have arrived at this page thinking, “Okay, this is a clickbait article, but I’ll bite.” Clickbaity, because you’d say there’s no such thing as a perfect employee. And I confess, I’d half agree with you. That said, when hiring new people for a role, we’re usually looking not simply for someone who can just do the work but who also will be a good fit. When you get someone with both attributes, that’s pretty close to perfect, I’d say.

Interview processes are artificial. If like me you don’t have lots of time to spare for recruiting and you don’t have a department dedicated to taking new recruits through a fortnight-long getting-to-know-you exercise, you’ll appreciate some effective time-savers.

Most interviews I’ve conducted have been on the telephone or face-to-face in the office – and relatively short. These are not ideal scenarios for really getting to know someone. And for me, it is vitally important to know who the candidate is, not just what the candidate can do.

Will candidates fit in well with the team? Will they share our values? Will they bring toxicity and poor attitudes? Some people pride themselves on being a good judge of character. If this is you, I’m really, genuinely happy for you. The rest of us though, we need some tools to help us. Some tools that take us closer to understanding the core of someone’s being, within the time constraints of an hour-long interview.

Virtues toolkit

If you want to know what someone is like, one approach is to ask them to comment on their own character. Many people find that difficult enough when there’s no pressure, never mind in an interview situation, not least because they’re second-guessing, wondering what might be the ‘right answer’.

Of course there is no right answer. You will always and only be you. Even if you’re a good actor and can play the part of someone with a different personality, in time, under stress, the real you will emerge. And in the meantime, you’ll waste a lot of emotional energy.

So with our candidates, to get to the real person we need to draw out the values that are important to them. To what characteristics would they say they aspire? Chances are, they are already innately keyed into those characteristics and are striving to improve. Maybe they’re passionate about justice, or desperate to be more professionally dispassionate (to become a better negotiator). None of these characteristics or values are good or bad, per se. But some will be more pertinent to your business and in interactions within the team.

My virtues toolkit is really simple. Without judgement, I created a diverse list of characteristics that might be considered to be virtues. I printed them, laminated them and cut them up into individual pieces. Then within interviews I gave candidates the pile of virtues and asked them to select the five to which they most aspired or which they most admired.

Can candidates ‘game’ this system? Well yes, of course. But that’s why we have probationary periods! If you discover your candidates do not in any way reflect the values they claimed to espouse, you can be sure you have an issue of integrity, which may make them unsuitable to work in your business. Or possibly, if the gap is not too wide, it identifies areas for improvement during the probationary period or beyond.

Here are the templates. Feel free to use as they are or customise them according to your own tastes and requirements. At the very least I have no doubt you can make them look better. My design skills are feeble at best!

Categorisation exercise

This one’s a bit more specialised. It may or may not be of use to you, but if nothing else, I hope it might inspire other creative methods of conducting candidate assessments.

When I was recruiting for a Security Analyst position recently, I spent some time thinking about the qualities that might be advantageous for such a role. How might an analyst think?

In many different analyst roles, not only within information security, it is fundamentally important to be able to sift through data. To see the wood for the trees. To identify different characteristics in the information presented. To spot those factors that are relevant. To think like an analyst, you probably need to be pretty good at classification, categorisation, developing or utilising taxonomies.

As I said before, finding the right person for the role is important – possibly more important than finding someone with the right experience. If you recruit someone who has all the right tendencies and the ability to learn quickly, a lack of experience is of less importance in many roles. (Leaving aside for the moment those roles where prestige and documented credibility or political power are key.)

So I developed a categorisation exercise. I produced a sheet of 48 different items of clothing, all different from each other in various ways. Again, I laminated them and cut them up into individual cards. I gave these to the candidates and invited them to make notes on the various different ways in which you might categorise the cards – and whether a particular method stood out as most useful or appropriate.

While the candidates completed the task, I observed and made notes on how readily they adapted to this and how quickly they were able to work. A logical analytical mind makes light work of this sort of task, even if the person has no previous experience of a game like this. (Let’s face it. It’s definitely a game, albeit not a very fun game.)

I made it clear to the candidates that this was not a colour perception test. The results were illuminating and as far as I can tell, their interaction with this test correlated strongly with what I knew about them or later came to know.

The original template for this exercise is in SVG format. You can edit this using the free (and very competent) graphics program Inkscape.

Conclusion

With a bit of innovation and some lateral thinking, there are definitely some strategies you can use to get closer to understanding the people you interview. Clearly there’s no substitute for putting your candidates at their ease, giving them time and space and getting to know them. But for a busy hiring manager, these tips and resources might get you a shade closer to perfect – and if not these exact resources, whatever new resources you are now inspired to create.

I’d love to hear if this post has helped you or if you have any other ideas about how we can improve the interview process for recruiter and candidate. Let me know your thoughts in the comments below. If you have great resources of your own, get in touch and perhaps I can provide links to them here.

Better than a risk matrix

There’s something better than a risk matrix? It’s a bold claim. But risk matrices have significant weaknesses, as I have discussed elsewhere.

In information security, we know (I hope) that our role is primarily concerned with the control of risk. We may agree that’s what we’re doing – but unless we can measure risk and show how our efforts change our risk exposure, where’s our credibility?

When suggesting we should measure risk in infosec, a common objection is that we can’t measure this – there are too many unknowns. But we do ourselves a disservice here: we know a lot. We’ve trained, we have experience. And much as I hate to use the term, we’re experts. (Certainly when compared to people outside our field.)

So we take our expert opinions, maybe some historical data and we feed it into a model. ‘Modelling’ can be an intimidating concept until we understand what we mean. We model all the time – tabletop DR exercises, threat modelling, run books, KPI projections. The point is once we know what we’re doing, it’s not intimidating at all.

So let’s get over that hump.

Risk modelling is a matter of statistics. Again, don’t let that put you off. You don’t need to be a stats whizz to use a statistical model. Many of us are engineers. We know how to use tools; we like using tools; we often make our own tools. Statistical models are simply tools; we just need to learn how to use them. We don’t necessarily need to be able to derive them. But usually we like to have some idea of what’s going on ‘under the hood’. So here’s a whistle-stop tour of the concepts underpinning the model I present below.

Distributions

Let’s say we’re finding out how many televisions there are in the households in our city. We take a poll, where each household reports how many televisions they have and then we plot a graph. The x-axis shows number of televisions, say from one to ten. The y-axis shows number of households with that number of televisions.

We might expect to see a graph a bit like this (yeah, that’s a lot of televisions):

This is a normal distribution. Normal distributions are common in statistics and produce a curve that’s often called a ‘bell curve’, due to its shape. The data tends to congregate around a central point, with roughly equal values either side. The amount of spread is called the ‘standard deviation’.

Reality often shapes itself into a normal distribution. But in information security risk, you may find that other types of distribution better reflect the circumstances. One such option is the log-normal distribution. Here’s its curve:

If we’re measuring the impact of an event, a log-normal distribution often fits the bill. It does not go below zero (an impact by definition means an above-zero loss). The values tend to congregate toward the left of the graph, but it leaves open the option for a low probability of a very high number, to the right of the graph.

What we’re doing is creating a mathematical/statistical model of reality. Since it’s a model, we choose whatever tool works best. A log-normal distribution is a good starting point.

Confidence interval

In measuring risk, we take account of our uncertainty. If we were in the enviable position of having absolute certainty about future events, we would be better off turning this astonishing talent to gambling. As it is, we’re not sure about the likelihood and impact of detrimental events, and hence we estimate.

When estimating impact, we could simply select a value from one to five and produce a risk matrix. But as we saw before, this is not particularly informative and in fact it can be misleading. One aspect that the ordinal scoring overlooks is that impact is best expressed as a range. If the negative event occurs, e.g. we experience a ransomware outbreak, that may prove to be a low-impact incident, or things may go horribly wrong and it costs us millions.

The confidence interval allows us to express a range for the impact, based on our level of uncertainty.

A confidence interval of 90% is often used. This means that the actual impact has a 5% chance of being above this range and a 5% chance of being below the range. In practice this proves to be good enough for our purposes.

Side-note: the log-normal curve that models a 90% confidence interval has a standard deviation of 3.29. We’ll use this fact shortly.

Imagine you have ten identical slips of paper. On nine of them, you write ‘winner’; on the other one, you write ‘loser’. All ten slips are placed into a hat and you draw one out at random. If you draw the ‘winner’ slip, you win £1,000. Otherwise, you win nothing.

On the other hand, there is a football match coming up today and your favourite team is playing. You are asked to predict the likely number of goals your team will score, within a range, with 90% confidence (nine in ten). Again, if you are right, you win £1,000.

In the book How to Measure Anything in Cybersecurity Risk, the authors call this an ‘equivalent bet’ test. If you prefer the idea of drawing slips out of a hat, because you think you’re more likely to win, that means you didn’t really use a 90% confidence interval for the football score prediction. You need to widen the range.

On the other hand, if you prefer the football bet, that means that your range prediction was probably too wide. You’ve used perhaps a 95% confidence interval.

The trick is to balance the two such that the potential reward is equivalent in either scenario. In this way, you will have achieved a true 90% C.I. with your football prediction. It takes a little effort to wrap your head around this but press on: this is an invaluable concept in risk analysis, which we’ll use shortly.

Estimating likelihood

When you’ve defined your threat event, estimating its likelihood is straightforward: define a time period and a probability (percentage likelihood) of the event occurring. To be meaningful, don’t make the event too specific. So ‘a ransomware outbreak’ is probably a better event definition for most companies than ‘a solar flare causing communications anomalies that disrupt global networking such that intercontinental backups are delayed by three hours’.

Over a 12-month period, you may estimate that you have a 5% likelihood of experiencing a ransomware outbreak. This percentage need be no more than the considered opinion of one or more experts. You can improve the estimation through the use of data: historical ransomware attacks in your sector, global threat activity, etc. But the data is no more required for this model than it was for a traditional risk matrix.

At this stage, do not consider the severity of the attack, just the likelihood. This likelihood percentage stands in the place of the 1 to 5 numbering in the risk matrix; it is simply more helpful.

Estimating impact

For impact, you now estimate a range, using a 90% confidence interval. You can use the equivalent bet test if you like, to guide your estimate. Again, you can take into account any available data, including known costs of breaches as they are reported worldwide. You can also consider things like the possible duration of an outage, the costs associated to an outage, the costs of paying the ransom or the excess on a cyber risk insurance policy. So you might say, all things considered, you are 90% confident that the impact would be between £250k and £1.5m.

We’re going to use a log-normal distribution, as shown above. During modelling, due to the long tail on this graph, you may find that the model produces some extremely high values that are simply not realistic (£75m, say) – unrealistic for whatever reason, such as the fact perhaps this exceeds your company’s annual global sales. You can therefore introduce a cap to this impact; e.g. a 90% C.I. of £250k to £1.5m, capped at £10m.

Modelling the risk

You now have all the data you need, to model the risk. What does this mean? Merely that you will insert these numbers into a formula and find out what happens when you run the formula a thousand, ten thousand, a hundred thousand times.

And why will you see different answers each time you apply the formula? Because we introduce two random variables. The random variables are based on all the above. The first random variable represents whether the event occurred ‘this year’, based the likelihood probability. To express this as a formula in Excel, Google Sheets or >insert favourite spreadsheet editor here< you do:

= IF (RAND() > likelihood, 0, 1)

Here, likelihood is the percentage expressed as a decimal (0.05). And RAND() produces a decimal number less than 1. So if RAND() is greater than 5%, the event didn’t occur and the result is 0. If RAND() is less than 5%, it did occur, hence the result is 1. We will multiply this by the impact, which we calculate next.

For impact, we use an inverse log-normal function. We chose to use a log-normal distribution above, we know the impact is somewhere within that distribution and again we use RAND() to work out the precise impact (‘this time’) based on this knowledge. Remember that with a 90% CI, there’s a 5% chance the impact will be higher or lower than the range we specified. And remember further that 3.29 is the standard deviation for such a curve.

So we have the range and the confidence interval and we’re working back to a single figure. The formula is this:

= LOGNORMAL.INV(RAND(), mean, standard deviation)

Or, where high and low represent the upper and lower bounds:

= LOGNORMAL.INV(
  RAND(), 
  ( LN( high ) + LN( low ) ) / 2,
  ( LN( high ) - LN( low ) ) / 3.29
)

Use LOGNORMAL.INV in Excel. In Google Sheets the function is LOGNORM.INV. With log-normal distributions, the figures are based on the log of the mean and standard deviation – that’s why you see here the log function LN.

To prove that you don’t need fancy software or a powerful computer to do this modelling, I ran this simulation using Google Sheets on a modestly-specced Chromebook, over 1,000 rounds. You can download the spreadsheet at the end of this article to see how this worked. To cut a long story short, on one run of this simulation I ended up with the figure £45,540.19, being the annual risk exposure for this threat. (With just 1,000 rounds, you can expect to see some variation each time you recalculate, but it’s enough to demonstrate the model.)

You’ll see in the spreadsheet some columns mysteriously labelled “Loss Exceedance Calculations”. At some point, I may write an article to address that!

For now, I hope this has whetted your appetite and given you enough to start improving upon your risk matrices. Happy modelling! And if you’re interested in learning a bit more about statistics in general (some measure of stats fluency is well worth it), I suggest taking a look at Statistics for Dummies. Don’t be put off by the title!

How a risk matrix can kill you

As a security professional, unless you work for an MSSP (Managed Security Service Provider), security is simply a cost to the business. Fitting a swipe card door entry system will not in itself generate more revenue. Increased password complexity rules are unlikely to result in increased sales.

How then do we justify our existence? By the way we reduce risk.

If you work in penetration testing or you’re a network security engineer, you might find this to be a very unsexy subject. Risk. Boring. But it underpins everything you do and should be at the heart of every choice you make. We do not secure our networks simply because the manual says we should, or because the compliance department insists on it; we do it because it reduces risk. In reducing risk, we’re improving stability, longevity, our ability to service our customers; in short, we’re protecting our jobs.

Well then. How do you know if your actions as a pen tester or engineer reduce risk? Through risk assessments. Stay with me.

You may already have been subject to a risk assessment in your workplace – checking you know how to use your office chair, for example. And if this is your only experience of risk assessment, I can understand why you might be completely put off by the whole thing.

But there’s more to it than that. And risk assessment in information security can be a lot more interesting (and less patronising). Truly. I’m an engineer by disposition and yet I’m excited about risk.

Now if you’ve been around for any length of time, you’ve no doubt seen one of these, the humble risk matrix:

5 x 5 risk matrix

On the face of this, a risk matrix couldn’t be easier to understand. (Perhaps that’s why we like them.) For any given risk, figure out the likelihood of it occurring (on a scale from one to five) and also the consequence, again on a scale of one to five. You can now position this risk on the matrix. And hey, maybe even multiply those two numbers together, to give you a single overall risk figure. Why not. Likelihood 4, consequence 5, okay that’s a risk of 20. Simple.

But what do we mean by ’20’? What do you mean? What does your CEO mean? Is it an absolute number? Or relative? Is a 20 twice as bad as a 10? The fact is that different people can have completely different perspectives on the meaning of these figures. We only have the illusion of communication. We’re using ordinal numbers, not cardinals.

A five by five risk matrix is an example of qualitative (as opposed to quantitative) risk analysis. A relatively arbitrary value is assigned on the basis of a subjective view of a risk.

Some organisations partially overcome the limitations of this approach by assigning explicit values to different ranks in the risk matrix. The likelihood can be split into five bands of 20% each, for example. Financial values are applied to impact. This is semi-quantitative analysis. E.g.:

Such divisions can also be misleading, unfortunately. Consider:

  1. A 41% likelihood (3) of a £5.1m impact (4) would be scored as 12.
  2. A 60% likelihood (3) of a £5m impact (3) would be scored as 9.

In this risk matrix analysis, risk a seems to be more severe than risk b. But consider the probability calculations:

  1. A 41% likelihood of a £5.1m impact gives financial exposure of £2,091,000.
  2. A 60% likelihood of a £5m impact gives financial exposure of £3,000,000.

So the risk matrix shows a to be the higher risk, but the calculation shows b to be higher. Both can’t be correct. Using a probability scale combined with likelihood range slices demonstrably results in incorrect prioritisation. (And a lack of these clarifying measures must inevitably result in even less reliable ‘calculations’.)

This may seem like a trivial example, but this could be the difference between a project being approved or postponed – on the basis of faulty analysis. And if that project is life-or-death – which it might be, in military or healthcare applications – then the risk matrix could indeed kill you*.

I’ll leave it on that bombshell – in later posts I will discuss some alternatives.

*(You got me, I’m exaggerating for effect.)

Free CTF and Digital Forensics Resources

Are you interested in the forensic side of information security? Want to hone your pen testing skills but not sure where to start? Heard of reverse engineering, but it seems like a black art?

This article is a link dump (so it might go out of date, sorry) of free tools and resources to help you along the way. It started from an email I sent to a security analyst who was interested in learning more about this field.

My interests lie more in cybersecurity risk and management, rather than this low-level, detail-orientated stuff. Yes, that is my way of saying, “No way would I be good enough to reverse engineer malware.” But if I were going to get into this field, this is where I would start.

Feel free to suggest further links in the comments!

Capture The Flag

CTFs are intended to be a fun way to improve your forensic and testing skills.

Learn about CTF: https://ctftime.org/ctf-wtf/

How to get started: https://www.endgame.com/blog/technical-blog/how-get-started-ctf

Intro (biased towards competitions): https://www.alienvault.com/blogs/security-essentials/capture-the-flag-ctf-what-is-it-for-a-newbie

Online CTFs (and other similar challenges):

Forensics Courses

There are a few free digital forensics courses out there. Including:

And a bunch of articles and tutorials at Null Bytes: https://null-byte.wonderhowto.com/how-to/forensics/

Deliberately Vulnerable Services

For flexing all kinds of penetration testing muscles:

Be careful adopting IoT – seriously

IoT device cloud

Those who know me well have probably heard me grumbling about IoT devices – things like Nest, Google Homehub, Amazon Echo, etc. 🙂 That’s for a very good reason – you are surrendering your privacy and opening your home up to potential cyber invasion.

There’s a lot of technobable in the following articles, but the short versions are that there was a major security breach in relation to Samsung’s smart devices. And a particular model of D-Link webcam was so badly designed that it was possible to intercept the video stream without the owner’s knowledge. These kinds of thing are becoming increasingly prevalent. These two articles are from the last week alone.

Please think very carefully before you bring these devices into your home. Is convenience worth the associated risk?

Samsung private GitLab tokens exposed including source code, credentials, secret keys

Man-in-the-Middle vulnerabilities in D-Link cameras