VMware ESXi Error Code Reference

Searchable VMware vSphere and ESXi error reference with the real cause and a step-by-step fix for each one.

This VMware ESXi error code reference is the searchable list I keep open when a host throws a purple screen or vCenter just says Not Responding. Paste in the code, the red banner text, or a half-remembered symptom, and it filters live across PSOD, install and boot failures, host and vCenter link problems, storage (APD, PDL, VMFS locks, datastore full), networking, vMotion, HA and DRS, snapshots, VMs that will not power on, VMware Tools, licensing, vSAN and the VCSA appliance. Open any error and you get the actual cause, a numbered fix with the exact esxcli or PowerCLI command to run, which log to read, and a quick way to confirm it worked. There is a category filter too. The whole dataset is baked into the page, so the search runs as plain JavaScript on your own machine. Nothing you type ever leaves your browser, and it keeps working with the network cable yanked.

100% in your browser. Nothing you type ever leaves this page.

VMware vSphere / ESXi error reference, with fixes

Got an ESXi or vCenter error and no idea where to start? Paste in the code or the message, even a half-remembered symptom. You get back the cause and the exact steps to fix it: the esxcli or PowerCLI command to run, and which log is worth reading or which service to bounce. Click any error to open the full resolution, along with a way to check it actually worked. There is a category filter as well, handy when you would rather browse everything under storage, or just the PSOD entries. It all runs in your browser and nothing leaves the page.

How to use this VMware ESXi error reference

This VMware ESXi error reference pulls the common vSphere failures together so you can paste a code or a symptom and get the real cause plus a numbered fix. VMware breaks in a dozen different ways. A purple diagnostic screen (PSOD) on the host. A red banner in the vSphere Client. Some cryptic esxcli return that tells you almost nothing. Or just a quiet "Host not responding" sitting in vCenter. Here is the annoying part: the same root problem shows up worded completely differently depending on where you happen to catch it. So this reference covers ESXi, vCenter and vSphere, and for each error it gives you the actual cause and a fix you can run right away. Drop any piece of the message into the search box, or filter by category, then open the card.

Honestly, most vSphere trouble sorts itself into a handful of families. A host crash (PSOD). Boot and install failures. The host losing its link to vCenter. Storage, which is its own whole world: APD, PDL, datastore locks, VMFS. Then networking, the migration and availability stuff (vMotion, HA, DRS), snapshots, VMs that will not power on, and VMware Tools acting up. Figure out the family and you have usually figured out both the fix and which log to go open.

The logs and commands that solve most VMware issues

  • vmkernel.log (/var/log/vmkernel.log) is your first stop for host, driver and storage events, and anything near a PSOD.
  • hostd.log and vpxa.log when the management agent or the link to vCenter is the thing that broke.
  • vmware.log lives inside each VM folder on the datastore. That is where the per-VM power-on and disk errors hide.
  • Need to restart the management agents? It is safe to do from the host shell, and running VMs stay up: /etc/init.d/hostd restart and /etc/init.d/vpxa restart.
  • Check storage paths with esxcli storage core path list, and network adapters with esxcli network nic list.
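The list above boils down to a lookup from problem family to log. Here is a minimal Python sketch of that idea. The family names and the logs_for helper are made up for illustration, and only the log assignments come from the list itself.

```python
from typing import Dict, List

# Hypothetical family labels chosen for this sketch. The log assignments
# mirror the list above; nothing here is read from a real host.
LOGS_BY_FAMILY: Dict[str, List[str]] = {
    "psod": ["/var/log/vmkernel.log"],
    "driver": ["/var/log/vmkernel.log"],
    "storage": ["/var/log/vmkernel.log"],
    "management-agent": ["hostd.log", "vpxa.log"],
    "vcenter-link": ["hostd.log", "vpxa.log"],
    "vm-power-on": ["vmware.log (in the VM folder on the datastore)"],
    "vm-disk": ["vmware.log (in the VM folder on the datastore)"],
}


def logs_for(family: str) -> List[str]:
    """Return the logs worth opening first for a problem family.

    Unknown families return an empty list rather than a guess.
    """
    return LOGS_BY_FAMILY.get(family.strip().lower(), [])
```

Calling logs_for("Storage") returns the vmkernel.log entry, and anything the table does not know about comes back empty so you are not sent to the wrong log.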

PSOD, APD and PDL: the families that trip people up

A Purple Screen of Death stops the ESXi host cold and prints the exception, the module that failed, and a backtrace. The exception type is the part that tells you the most. A #PF Exception 14 (page fault) is almost always a driver. A hardware LINT1/NMI means memory or a dying component. Configure a coredump target first so the host actually writes a dump you can dig into next time, then go after the driver, firmware or failing part.

On the storage side, two states cause the most confusion. APD (All Paths Down) is a temporary loss of every path where the array might still come back, so ESXi keeps retrying. PDL (Permanent Device Loss) is the array flat-out telling ESXi the device is gone for good, spotted by its SCSI sense codes. APD is usually a fabric, zoning or controller problem to bring back. PDL means you pull the device and remediate the datastore. Mix these two up and you will burn hours chasing the wrong thing.
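Since PDL is recognised by its SCSI sense codes, a quick way to triage a vmkernel.log line is to pull out the sense triple and compare it against the known PDL codes. The sketch below is Python and makes a few assumptions: the four triples are the ones commonly documented as PDL indicators (verify them against VMware's documentation for your release), the line format is the usual "Valid sense data: 0x.. 0x.. 0x.." wording, and classify_sense is a hypothetical helper name.

```python
import re
from typing import Optional

# (sense key, ASC, ASCQ) triples commonly documented as PDL indicators.
# Assumption: check this set against current VMware guidance before relying on it.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x0),  # LOGICAL UNIT NOT SUPPORTED
    (0x4, 0x4C, 0x0),  # LOGICAL UNIT FAILED SELF-CONFIGURATION
    (0x4, 0x3E, 0x3),  # LOGICAL UNIT FAILED SELF-TEST
    (0x4, 0x3E, 0x1),  # LOGICAL UNIT FAILURE
}

SENSE_RE = re.compile(
    r"sense data:\s*(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def classify_sense(line: str) -> Optional[str]:
    """Return "PDL" for a known PDL sense triple, "other" for any other
    sense triple, and None when the line carries no sense data at all."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    triple = tuple(int(value, 16) for value in match.groups())
    return "PDL" if triple in PDL_SENSE_CODES else "other"
```

An "other" result does not mean APD. APD is about every path being gone, so it is confirmed from the path state, not from a sense code.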

Privacy and how this tool runs

The whole error dataset sits right inside the page, and the search runs locally with plain JavaScript. Nothing you type is sent or kept anywhere. Load it once and it works with no connection at all, which is handy when the host that died was also your gateway.

Frequently asked questions

What causes a VMware ESXi purple screen (PSOD)?

Almost always a bad or mismatched driver. Failing hardware does it too, anything from a dying memory stick to a flaky PCIe card, and so does storage heap exhaustion. ESXi itself is rarely the culprit. Read the exception on the screen first: a #PF Exception 14 points at a driver, a LINT1 or NMI points at hardware. Set up a coredump target, then update the driver and firmware the screen blamed, checking them against the VMware Compatibility Guide (HCL).
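The first-pass read of the exception line can be written down as a tiny rule. This Python sketch encodes only the two rules of thumb from this answer. psod_hint is a hypothetical helper, and anything it does not recognise comes back as "unknown" instead of a guess.

```python
import re


def psod_hint(exception_line: str) -> str:
    """Map the PSOD exception line to a first suspect.

    Encodes only two rules of thumb: a page fault (#PF Exception 14)
    points at a driver, LINT1 or NMI points at hardware.
    """
    text = exception_line.upper()
    if "#PF" in text or re.search(r"\bEXCEPTION 14\b", text):
        return "driver"
    if re.search(r"\b(LINT1|NMI)\b", text):
        return "hardware"
    return "unknown"
```

It is a starting point for where to look, not a diagnosis. The module name and backtrace on the screen still decide which driver or part you chase.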

How do I fix an ESXi host showing as Not Responding in vCenter?

Start by bouncing the management agents from the host shell: /etc/init.d/hostd restart and /etc/init.d/vpxa restart. Then make sure the management network is actually up and that vCenter can reach the host on ports 902 and 443. Once both check out, right-click the host in vCenter and hit Reconnect. If the agents flat-out will not start, the usual culprit is a full scratch location on the host, so check free space there and read hostd.log.
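The reachability check is easy to script from any machine that sits where vCenter sits on the network. A minimal Python sketch, assuming plain TCP connects are a good enough signal. tcp_port_open and check_host are hypothetical helper names, and the host name in the usage note is a placeholder.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host: str, ports: Iterable[int] = (443, 902)) -> Dict[int, bool]:
    """Check each port and return a {port: reachable} map."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, check_host("esxi01.example.local") returns something like {443: True, 902: False}, which tells you which port to chase in the firewall. This only tests TCP. Heartbeat traffic on 902 also uses UDP, which this sketch does not cover.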

What is the difference between APD and PDL?

APD (All Paths Down) is temporary. The device might come back, so ESXi just keeps retrying and hoping. PDL (Permanent Device Loss) is the array telling you straight up, via SCSI sense codes, that the device is gone for good. The fixes do not overlap either. APD is a fabric or array thing you bring back online. PDL means you pull the dead device and then remediate whatever datastores and VMs it took down with it.

How do I power on a VM that says the file is locked?

Something else is holding a lock on the VMDK or the .vmx, either another host or a stale process that never let go. Run vmkfstools -D on the locked file and read the owner field in the output. Its last 12 hex digits are the MAC address of the host that holds the lock. Make sure the VM really is not running over there. Then clear the stale lock by restarting the management agents on the host that is holding it, or reboot that host if it comes to that. One rule though: never just delete .lck files because you are in a hurry.
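Picking the MAC out of the vmkfstools -D output by eye is error-prone, so here is a small Python sketch that does it. It assumes the output contains an owner field shaped like owner xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxxxxxx, where the last 12 hex digits are the MAC. lock_owner_mac is a hypothetical helper, and the sample line in the usage note is illustrative, not captured from a real host.

```python
import re
from typing import Optional

# Assumed shape of the owner field: 8-8-4-12 hex digits, MAC in the last group.
OWNER_RE = re.compile(
    r"owner\s+[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})",
    re.IGNORECASE,
)


def lock_owner_mac(vmkfstools_output: str) -> Optional[str]:
    """Return the lock owner's MAC as aa:bb:cc:dd:ee:ff, or None when there
    is no owner field or the owner is all zeros (no single owning host)."""
    match = OWNER_RE.search(vmkfstools_output)
    if match is None:
        return None
    raw = match.group(1).lower()
    if raw == "0" * 12:
        return None
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))
```

Feeding it a line such as "gen 27, mode 1, owner 4d4ed3c7-6d5d64a2-6a4a-001a64c335dc mtime 3420" gives 00:1a:64:c3:35:dc. Match that against the NICs listed by esxcli network nic list on each host to find the owner.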

Why can't I delete or consolidate a snapshot?

Usually one of two things. Either the datastore does not have enough free space for the merge, or a backup job still has an open file handle on the delta disk. So free up space, at minimum the size of the whole delta chain, kill any backup that is touching the VM, then run Consolidate from the VM's Snapshots menu. If it is still stuck, vmware.log in the VM folder names the exact file the merge is choking on.
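The free-space rule can be checked before you start the merge. A rough Python sketch, assuming the snapshot deltas sit in the VM folder as files ending in -delta.vmdk or -sesparse.vmdk and that you already know the datastore's free bytes. Both helper names are hypothetical, and layouts where disks span datastores or live on vSAN or vVols will not fit this simple picture.

```python
from pathlib import Path

# Assumed naming for snapshot delta files in a VMFS or NFS VM folder.
DELTA_SUFFIXES = ("-delta.vmdk", "-sesparse.vmdk")


def delta_chain_bytes(vm_dir: str) -> int:
    """Sum the sizes of the snapshot delta files found directly in vm_dir."""
    total = 0
    for path in Path(vm_dir).iterdir():
        if path.is_file() and path.name.endswith(DELTA_SUFFIXES):
            total += path.stat().st_size
    return total


def enough_space_to_consolidate(vm_dir: str, free_bytes: int) -> bool:
    """Apply the rule of thumb: free space must cover the whole delta chain."""
    return free_bytes >= delta_chain_bytes(vm_dir)
```

Treat a True result as the bare minimum. Extra headroom is sensible because the VM may keep writing while the merge runs.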