ESXi Boot and Log Guide

This file contains information about ESXi file system partitions and logs, which will help administrators find the cause of a failure. It also includes a few performance-related Q&A items to help administrators fix issues when a failure occurs.

Uploaded by

jay_tiwariSIT

[root@ukstr1vmp0101:~] grep booted /var/log/vmksummary.log
2016-03-15T18:21:35Z bootstop: Host has booted
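
The same lookup can be wrapped in a small helper that returns only the timestamp of the most recent boot. This is a minimal sketch, not an official tool: the function name last_boot is made up for illustration, and it assumes the timestamp is the first whitespace-separated field, as in the sample line above.

```shell
#!/bin/sh
# last_boot (hypothetical helper): print the timestamp of the most recent
# "Host has booted" entry. The log path is a parameter so the function can
# also be run against a copy of the log on another machine.
last_boot() {
    log="${1:-/var/log/vmksummary.log}"
    grep 'Host has booted' "$log" | tail -n 1 | awk '{print $1}'
}
```

Calling last_boot with no argument reads /var/log/vmksummary.log; passing a path reads that file instead.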

ESXi maintains two copies of the boot partition: /bootbank and /altbootbank

/bootbank: The hypervisor image is located on this partition (250 MB), which is formatted with the older FAT file system. The image file is
called s.v00 (124 MB compressed); it is decompressed during the boot process and loaded into system memory.

/altbootbank: /altbootbank is more or less a backup copy of /bootbank, which holds the running copy. The partition is
empty after a fresh installation of an ESXi host.

When you upgrade an ESXi host, the current image is copied from the /bootbank partition. If an issue occurs during
the upgrade, this makes it possible to return to the previous version of ESXi by pressing Shift+R while booting. The
directory contains around 138 files.

/bin: /bin is a standard subdirectory of the root directory in Unix-like operating systems that contains the executable
(i.e., ready to run) programs that must be available in order to attain minimal functionality for the purposes of
booting (i.e., starting) and repairing a system.

/dev: /dev is the location of special or device files.

/etc: Usually contains the configuration files for all the programs that run on your Linux/Unix system.

/lib: A directory that contains the library files used by the system. In simple terms, these are helper files that an
application, command, or process needs in order to run properly. The dynamic libraries used by the commands
in /bin and /sbin are located in this directory.

/locker: This partition contains VMware Tools information under /locker/var/vmtools.

/mbr: The master boot record (MBR) is a small program that is executed when a computer is booting (i.e., starting
up) in order to find the operating system and load it into memory.

/opt: This directory is reserved for all the software and add-on packages that are not part of the default installation.

/productlocker: This directory is subsidiary to the /locker directory and contains VMware Tools.


/sbin: /sbin is a standard subdirectory of the root directory in Linux and other Unix-like operating systems that
contains executable (i.e., ready to run) programs. They are mostly administrative tools that should be made
available only to the root (i.e., administrative) user.

/scratch: When ESXi boots, the system tries to find a suitable partition on a local disk to create a scratch partition.
The scratch partition is not required. It is used to store vm-support output, which you need when you create a
support bundle. If the scratch partition is not present, vm-support output is stored in a ramdisk. In low-memory
situations, you might want to create a scratch partition if one is not present.
For the installable version of ESXi, the partition is created during installation and is selected.

/store: /store is subsidiary to /locker and /productlocker

/tardisk: VIBs (VMware Installation Bundles) are always compressed files, mainly TGZ and V0n files, called tardisks.
They are not decompressed into memory and then used; instead, they are mounted as mount points in the file system,
like any physical partition of the boot disk. This makes tardisks read-only, and they are always read from
memory. Tardisks also cannot be changed. Each new bundle/tardisk overlays the old one, and the file system
sees only the most recent one. When the last bundle/tardisk is removed, the previous layer reappears and is used by
the file system. To handle a modification to a file on a tardisk, the ESXi VMkernel uses a “branching
technique”.

When a modification is required, a read/write copy of the tardisk is created on a ramdisk. A ramdisk is a container
inside memory (RAM) for that working read/write copy. Using the branching technique, the ESXi VMkernel uses the
read/write copy instead of the original read-only one, and every modification is made to this read/write copy.
Unfortunately, these modifications do not persist across reboots because they are written to RAM. To overcome this, an
internal backup process creates a TGZ file named state.tgz that contains all of these modified files as well as all
last-laid bundles (confirmation needed for the last-laid bundles part). This state.tgz file is called the ESXi State Archive.

ESXi Logs
• /var/log/auth.log: ESXi Shell authentication success and failure.

• /var/log/dhclient.log: DHCP client service, including discovery, address lease requests and renewals.

• /var/log/esxupdate.log: ESXi patch and update installation logs.

• /var/log/lacp.log: Link Aggregation Control Protocol logs.

• /var/log/hostd.log: Host management service logs, including virtual machine and host tasks and events, communication
with the vSphere Client and vCenter Server vpxa agent, and SDK connections.

• /var/log/hostd-probe.log: Host management service responsiveness checker.

• /var/log/rhttpproxy.log: HTTP connections proxied on behalf of other ESXi host webservices.

• /var/log/shell.log: ESXi Shell usage logs, including enable/disable and every command entered. For more information,
see vSphere 5.5 Command-Line Documentation and Auditing ESXi Shell logins and commands in ESXi 5.x (2004810).

• /var/log/sysboot.log: Early VMkernel startup and module loading.

• /var/log/boot.gz: A compressed file that contains boot log information and can be read using
zcat /var/log/boot.gz | more.

• /var/log/syslog.log: Management service initialization, watchdogs, scheduled tasks and DCUI use.

• /var/log/usb.log: USB device arbitration events, such as discovery and pass-through to virtual machines.

• /var/log/vobd.log: VMkernel Observation events, similar to vob.component.event.

• /var/log/vmkernel.log: Core VMkernel logs, including device discovery, storage and networking device and driver events,
and virtual machine startup.

• /var/log/vmkwarning.log: A summary of Warning and Alert log messages excerpted from the VMkernel logs.

• /var/log/vmksummary.log: A summary of ESXi host startup and shutdown, and an hourly heartbeat with uptime, number
of virtual machines running, and service resource consumption. For more information, see Format of the ESXi 5.0 vmksummary
log file (2004566).

• /var/log/Xorg.log: Video acceleration.

Note: For information on sending logs to another location (such as a datastore or remote syslog server), see Configuring syslog on
ESXi 5.0 (2003322).

Logs from vCenter Server Components on ESXi 5.1 and 5.5


When an ESXi 5.1 / 5.5 host is managed by vCenter Server 5.1 and 5.5, two components are installed, each with its own logs:

• /var/log/vpxa.log: vCenter Server vpxa agent logs, including communication with vCenter Server and the Host
Management hostd agent.

• /var/log/fdm.log: vSphere High Availability logs, produced by the fdm service. For more information, see the vSphere HA
Security section of the vSphere Availability Guide.

Start-Up Operation
When the system boots for the first time, the VMkernel discovers devices and selects appropriate drivers for them.
It also discovers local disk drives and, if the disks are empty, formats them so they can be used to store virtual
machines.
During this initial boot, the VMkernel automatically creates the configuration files using reasonable default values
(for example, using DHCP to obtain network identity information). Users can adjust the defaults with the direct
console user interface or with the standard VMware management tools: VMware VirtualCenter and the VI Client. In
the embedded version of ESXi, the configuration is stored in a specific part of the memory module that is both
readable and writable. On subsequent reboots, the system reads the configuration from this persistent memory. In
the rest of the boot process, the system is initialized and the resident file system is built in memory. The hardware
drivers are loaded, the various agents are started, and finally the DCUI process is started.
Once the system is up and running, all further routine operations occur in much the same way as they do in ESX 3.
Because ESXi no longer includes a service console, many of the management activities performed on the ESX
platform are no longer necessary; they were required only to configure and manage the service console itself. Other
management tasks previously done in the service console are now performed in one of the following ways:
• Using the VI Client, which provides a Windows-based graphical user interface for interactive configuration of the
platform. The VI Client has been enhanced to provide capabilities that were previously available only in the service
console.
• Using the remote command line interfaces, new interfaces that enable scripting and command-line–based
configuration of the platform from a Linux or Windows-based server, via an encrypted and authenticated
communication channel.
• Using external agents that leverage well-defined APIs, such as the VI API and the CIM management standard.

Troubleshooting
1. PSOD
Ans: A PSOD (Purple Screen of Death) is the VMware ESX version of a Windows BSOD (Blue Screen of Death). It
occurs when the kernel panics and can no longer function. The most common causes of a PSOD are:
- Hardware Failure
- Out of Memory
- Hung CPU Condition
- Misbehaving drivers (null pointers, invalid memory access, etc)

Breakdown of each section of an example purple diagnostic screen:

The Product and Build: VMware ESX Server [Releasebuild-98103]
The Error Message: PCPU 1 locked up. Failed to ack TLB invalidate
The Stack Trace: The stack represents what the VMkernel was doing at the time of the error. In this example, it was
trying to clear memory page tables (TLB). Evaluating the actions of the kernel at the time of the error makes this
information a vital tool in the diagnosis of purple screen errors.
0x3a37ef4:[0x625e94]Panic+0x17 stack: 0x833ab4, 0x3a37f10, 0x3a37f48
0x3a37f04:[0x625e94]Panic+0x17 stack: 0x833ab4, 0x1, 0x14a03a0
0x3a37f48:[0x64bfa4]TLBDoInvalidate+0x38f stack: 0x3a37f54, 0x40, 0x2
0x3a37f70:[0x66da4d]XMapForceFlush+0x64 stack: 0x0, 0x4d3a, 0x0
0x3a37fac:[0x652b8b]helpFunc+0x2d2 stack: 0x1, 0x14a4580, 0x0
0x3a37ffc:[0x750902]CpuSched_StartWorld+0x109 stack: 0x0, 0x0, 0x0
0x3a38000:[0x0]blk_dev+0xfd76461f stack: 0x0, 0x0, 0x0
The Uptime: VMK uptime: 7:05:43:45.014 TSC: 1751259712918392
The Core Dump: Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1...using slot 1 of 1... log
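
The uptime field shows how long the host had been running when it crashed, which can be compared with the last boot time recorded in vmksummary.log. The following is a minimal sketch that converts the uptime value to seconds. It assumes the layout is days:hours:minutes:seconds.milliseconds, which is an interpretation of the sample above rather than a documented format, and the function name uptime_to_seconds is hypothetical.

```shell
#!/bin/sh
# uptime_to_seconds (hypothetical helper): convert a "VMK uptime" value to
# whole seconds. Assumed layout: days:hours:minutes:seconds.milliseconds.
uptime_to_seconds() {
    echo "$1" | awk -F: '{ printf "%d\n", $1 * 86400 + $2 * 3600 + $3 * 60 + int($4) }'
}

# Value taken from the example screen above.
uptime_to_seconds "7:05:43:45.014"
```

For the example value, this prints 625425 (a little over seven days).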
When a PSOD occurs, one should collect the following:
- Screenshot of PSOD kernel stack trace screen (if possible)
- Support logs from the vm-support command
- Kernel log (should be included in vm-support, but better safe than sorry)
- Kernel core dump (only needed if a developer asks for it)
If the cause of the PSOD isn't obvious from the PSOD kernel stack trace screen, then the kernel log is the second best
place to look for the cause of a kernel panic.
To manually collect the kernel log:
## Kernel log - will output: vmkernel-log.1
# esxcfg-dumppart -L /vmfs/devices/disks/$( \
esxcfg-dumppart --get-active | awk '{print $1}' )

To manually collect the kernel core dump: (if developer asks for it)
## Kernel core dump - will output: vmkernel-zdump.1
## Note: ESXi 5.x will put kernel dump here:
## /scratch/core/vmkernel-zdump.*
# esxcfg-dumppart -C -D /vmfs/devices/disks/$( \
esxcfg-dumppart --get-active | awk '{print $1}' )

For testing purposes, one can manually trigger a PSOD:

# vsish -e set /reliability/crashMe/Panic

2. Disconnected State
Equivalent questions:
- An ESXi/ESX host shows as Not Responding in VirtualCenter or vCenter Server
- An ESXi/ESX host shows as Disconnected in vCenter Server
- Cannot connect an ESXi/ESX host to vCenter Server
- Virtual machines on an ESXi/ESX host show as grayed out in vCenter Server
- When attempting to add an ESXi/ESX host to vCenter Server, you see an error
Ans:
1. Verify that the ESXi host is powered on.
- To verify the ESXi host uptime and time stamp: cat /var/log/vmksummary.log
2. Verify that the ESXi host can be reconnected, or if reconnecting the ESXi host resolves the issue.
- Right click on the ESXi or ESX host in the Inventory.
- Click Connect.
Note: A Reconnect Host task appears in the Recent Tasks pane.
- Wait until the task status changes to Complete.
3. Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address. If vCenter Server does
not receive heartbeats from the ESXi host, it goes into a not responding state.
- Log into the vSphere Web Client. The default URL is
https://vCenter_Server_FQDN:9443/vsphere-client
- Navigate to vCenter > Inventory Lists > vCenter Server.
- Click the vCenter Server needing to be verified.
- Select Manage > Settings > General
- Expand the Runtime settings field by clicking on the arrow next to it.

To modify the Runtime Settings:


- Click Edit in the right-hand corner of the panel
- Click Runtime settings on the left-hand side of the window
- Modify the vCenter Server managed address and vCenter Server name accordingly.
- Click OK

To check if a Managed IP has been set on VirtualCenter 2.5.x:


- From the VMware Infrastructure (VI) Client, log in to the vCenter Server, navigate to Administration >
VirtualCenter Management Server Configuration > Runtime Settings.
- Review the Virtual Center Server Managed IP address setting.
- Verify that the address is correct. If not, correct the entry, and click OK.
- Remove the ESXi/ESX host from the VirtualCenter Inventory.
- Add the ESXi/ESX host back to the VirtualCenter Inventory.

To check if a Managed IP has been set on VirtualCenter 2.0.x:


- Review the serverIP configuration section of the /etc/opt/vmware/vpxa.cfg file to see if the IP address is
correct
- Run the command: cat /etc/opt/vmware/vpxa.cfg | more
4. Verify that network connectivity exists from vCenter Server to the ESXi host with the IP and FQDN.
ping server
5. Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP port 902. If the host was upgraded
from version 2.x and you cannot connect on port 902, then verify that you can connect on port 905.
telnet server port
6. Verify if restarting the ESXi Management Agents resolves the issue.
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
7. Verify if the ESXi host has experienced a Purple Diagnostic Screen.
8. ESXi hosts can disconnect from vCenter Server due to underlying storage issues.
3. ESX/ESXi virtual machine performance issues
Symptoms
• Services running in guest virtual machines respond slowly.
• Applications running in the guest virtual machines respond intermittently.
• The guest virtual machine may seem slow or unresponsive.
Cause
Performance issues may be caused by several different areas such as CPU constraints, memory overcommitment,
storage latency, or network latency. If one or more of your virtual machines has a bad response time, consider each
of these areas to find the bottleneck.
Resolution
CPU constraints
1. Use the esxtop command to determine if the ESXi/ESX server is being overloaded.
2. Examine the load average on the first line of the command output.
Load average of 1.00 - ESXi/ESX Server machine’s physical CPUs are fully utilized
Load average of 0.5 - ESXi/ESX Server machine’s physical CPUs are half utilized
Load average of 2.00 - the system as a whole is overloaded
3. Examine the %READY field for the percentage of time that the virtual machine was ready but could not be
scheduled to run on a physical CPU.
Under normal operating conditions, this value should remain under 5%. If the ready time values are high on the
virtual machines that experience bad performance, then check for CPU limiting:
- Make sure the virtual machine is not constrained by a CPU limit set on itself
- Make sure that the virtual machine is not constrained by its resource pool.
If the load average is too high, and the ready time is not caused by CPU limiting, adjust the CPU load on the host.
To adjust the CPU load on the host, either:
- Increase the number of physical CPUs on the host
- Decrease the number of virtual CPUs allocated on the host by migrating VMs to a different host
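
The thresholds above can be turned into a quick check when reading values off the esxtop screen. This is only a sketch: the function names and the labels it prints are made up for illustration, and treating every load average between 1.00 and 2.00 as "fully utilized" is an assumption, since the text only gives the three sample points.

```shell
#!/bin/sh
# cpu_load_state (hypothetical helper): classify an esxtop load average
# using the sample points quoted above (0.5, 1.00, 2.00).
cpu_load_state() {
    awk -v load="$1" 'BEGIN {
        if (load + 0 >= 2.0)      print "overloaded"
        else if (load + 0 >= 1.0) print "fully utilized"
        else                      print "headroom available"
    }'
}

# ready_is_high (hypothetical helper): succeed (exit 0) when a %READY value
# is above the 5% guideline mentioned above.
ready_is_high() {
    awk -v r="$1" 'BEGIN { exit !(r + 0 > 5) }'
}
```

For example, cpu_load_state 2.3 prints "overloaded", and ready_is_high 7.5 succeeds, which suggests checking for CPU limits before adjusting the load on the host.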

Memory overcommitment
1. Use the esxtop command to determine whether the ESXi/ESX server's memory is overcommitted.
2. Examine the MEM overcommit avg on the first line of the command output. This value reflects the ratio of
the requested memory to the available memory, minus 1.
Examples
• If the virtual machines require 4 GB of RAM, and the host has 4 GB of RAM, then there is a 1:1 ratio.
After subtracting 1 (from 1/1), the MEM overcommit avg field reads 0. There is no overcommitment and
no extra RAM is required.
• If the virtual machines require 6 GB of RAM, and the host has 4 GB of RAM, then there is a 1.5:1 ratio.
After subtracting 1 (from 1.5/1), the MEM overcommit avg field reads 0.5. The RAM is overcommitted by
50%, meaning that 50% more than the available RAM is required.
If the memory is being overcommitted, adjust the memory load on the host. To adjust the memory load, either:
• Increase the amount of physical RAM on the host
• Migrate the VM to another host

3. Determine whether the virtual machines are ballooning and/or swapping.


- Run esxtop.
- Type m for memory
- Type f for fields
- Select the letter J for Memory Ballooning Statistics (MCTL)
- Look at the MCTLSZ value.
MCTLSZ (MB) displays the amount of guest physical memory reclaimed by the balloon driver.
- Type f for Field
- Select the letter for Memory Swap Statistics (SWAP STATS).
- Look at the SWCUR value.
SWCUR (MB) displays the current Swap Usage.
To resolve this issue, ensure that the ballooning and/or swapping is not caused by the memory limit being incorrectly
set. If the memory limit is incorrectly set, reset it correctly.
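
The MEM overcommit avg arithmetic described earlier in this section (requested memory divided by available memory, minus 1) can be reproduced with a one-line calculation. This is a sketch for checking the numbers by hand; the function name mem_overcommit is hypothetical, and both arguments must be in the same unit.

```shell
#!/bin/sh
# mem_overcommit (hypothetical helper): requested / available - 1,
# printed with two decimals. Both arguments use the same unit (for example GB).
mem_overcommit() {
    awk -v req="$1" -v avail="$2" 'BEGIN { printf "%.2f\n", req / avail - 1 }'
}
```

With the figures from the examples, mem_overcommit 4 4 prints 0.00 and mem_overcommit 6 4 prints 0.50.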

By default, a virtual machine's Resources > Memory > Limit > Unlimited box is selected, allowing the virtual machine
to use full allocation.
The variables for comparison in the .vmx files are memsize and sched.mem.max

To check the .vmx values for the Virtual Machines:


1. SSH to the host(s)
2. Run this command to list the variables for comparison:
egrep "memSize|sched.mem.max" /vmfs/volumes/*/*/*.vmx | awk -F/ '{print $6}' | more

Note: This command can be run on any version, but you cannot access a Virtual Machine's .vmx file if it is powered
on and running on another host.
or
vm-support -V|sed 's/(Running)\|\|(Registered)//g'|xargs egrep "memsize|sched.mem.max"|more

Note: This is useful to limit the list of virtual machines to the one host you are connected to using SSH.

If sched.mem.max is smaller than memsize, the balloon driver can start consuming memory (especially if the guest
operating system application has periodic bursts of memory usage). However, this setting can cause the balloon
driver to retain its hold on memory. If the guest operating system requires memory that is unavailable for the
balloon driver, the guest operating system starts using swap memory instead, which slows it down considerably.
To force the balloon driver to release its hold on memory and prevent the guest operating system from using swap
space, use one of these options:
- Set the value of sched.mem.max to the allocated memory or greater.
- Select the virtual machine's Resources > Memory > Limit > Unlimited box.
- Migrate the virtual machine to another host.
Note: These changes do not require a restart of the virtual machine.
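
The comparison between memsize and sched.mem.max can also be scripted for a single .vmx file. The following is a minimal sketch under stated assumptions: both values are taken to be quoted integers in the same unit, a missing or non-numeric sched.mem.max is treated as "no numeric limit", and the function name and output labels are made up for illustration.

```shell
#!/bin/sh
# vmx_limit_check (hypothetical helper): report whether sched.mem.max is set
# below memsize in one .vmx file. Keys are matched case-insensitively because
# the key appears as both memSize and memsize in the text above.
vmx_limit_check() {
    awk -F'"' '
        tolower($1) ~ /^memsize[ \t]*=/         { size = $2 }
        tolower($1) ~ /^sched\.mem\.max[ \t]*=/ { max = $2 }
        END {
            # Assumption: an absent or non-numeric limit means no numeric cap.
            if (size == "" || max !~ /^[0-9]+$/) print "no numeric limit"
            else if (max + 0 < size + 0)         print "limit below memsize"
            else                                 print "limit ok"
        }' "$1"
}
```

A result of "limit below memsize" points to the situation described above, where the balloon driver can start consuming memory.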

Network latency
- Verify that the latest version of VMware Tools is installed in the virtual machines
- Verify that the portgroup and virtual switch are not configured for promiscuous mode.
Promiscuous mode causes it to detect all frames passed on the virtual switch that are allowed under the
VLAN policy for the associated portgroup.
1. Select the ESXi/ESX host in the inventory.
2. Click the Configuration tab.
3. In the Hardware section, click Networking.
4. Click Properties of the virtual switch for which you want to verify promiscuous mode.
5. Select the virtual switch or portgroup you wish to modify and click Edit.
6. Click the Security tab.
7. From the Promiscuous Mode drop down menu, verify whether it is configured or not.
- Verify that your host is not overloaded. Networking relies on available processor resources. If the CPUs on
the host are being used at capacity, network performance suffers.

Storage Latency
1. Migrate the virtual machines to a different storage location.
2. Using esxtop
To monitor storage performance on a per-virtual machine basis:
- Start esxtop by typing esxtop at the command line.
- Type v to switch to disk view (virtual machine mode).
- Press f to modify the fields that are displayed.
- Press b, d, e, h, and j to toggle the fields and press Enter.
- Press s and then 2 to alter the update time to every 2 seconds and press Enter.
Column: Description

CMDS/s: The total number of commands per second. This includes IOPS (Input/Output Operations Per Second) and other SCSI commands,
such as SCSI reservations, locks, vendor string requests, and unit attention commands, being sent to or coming from the device or virtual
machine being monitored. In most cases, CMDS/s = IOPS unless there are a lot of metadata operations (such as SCSI reservations).

DAVG/cmd: The average response time in milliseconds per command being sent to the device.

KAVG/cmd: The amount of time the command spends in the VMkernel.

GAVG/cmd: The response time as it is perceived by the guest operating system. This number is calculated with the formula:
DAVG + KAVG = GAVG
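
The relationship between the three latency columns can be checked by hand with a small calculation. This is a sketch only; the function name gavg is hypothetical, and the input values in the usage note are invented for illustration rather than measured.

```shell
#!/bin/sh
# gavg (hypothetical helper): guest-perceived latency as DAVG + KAVG,
# both in milliseconds per command, printed with two decimals.
gavg() {
    awk -v d="$1" -v k="$2" 'BEGIN { printf "%.2f\n", d + k }'
}
```

For example, gavg 12.5 0.3 prints 12.80. Comparing the two inputs shows whether most of the latency comes from the device (DAVG) or from time spent in the VMkernel (KAVG).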

3. Check the VMkernel log on the host:

/var/log/vmkernel.log
