Posts Tagged ‘vmware’

VMware ESX Thin Provisioned Disks And Credit Cards…….

Friday, March 11th, 2011

In my youth, I was offered credit cards. I thought they were great, allowing me lots of instant retail gratification, I was on cloud 9 on the high street…….then I had to repay them. My cloud evaporated and I came back down to earth with a bump right on my wallet !

The principle behind thinly provisioned disks on VMware ESX storage LUNs would appear to be along the same lines as that of credit cards: utilising more than you actually have at your disposal.

With credit cards the impact is that later down the line you have to repay the money you spent that you don’t actually have (ouch !). With thinly provisioned disks the pain is that of not being able to reclaim free disk space without a lot of work.

With normal *fat* (thick) disk allocation, when you create a volume, all the space is allocated at creation time. So if you have a 100GB storage LUN and you create a 50GB volume on it, that 50GB is immediately deducted from the 100GB, leaving you with only 50GB free space on the storage LUN.

But with thin provisioned disks, if you allocate a 50GB *thin* volume, the space is only deducted from the storage LUN as it gets used. So if you only write 20GB of files to the 50GB thin volume, the storage LUN will report 80GB of free space out of the total 100GB.

But here’s the kicker: if you then delete 10GB of those files, the free space remains at only 80GB. The space does not get returned to the storage LUN. This is because Windows/Linux does not actually clear the blocks when a file is deleted, it simply marks them in the file table as being available for use. As the blocks do not actually get cleared, VMware does not pick up on this and remains ignorant of the freed-up file space.

In order to shrink the disk and reclaim the space, you have to zero out the unused blocks so that they really are empty. You can do this using the free tool SDelete from Sysinternals. You use the -c switch (which tells it to zero out the free space). Check the usage text of the version you have, as the meaning of the -c and -z switches has differed between SDelete releases. This is a rather I/O demanding task best done outside peak usage hours.

Once sdelete has completed you will need to migrate the VM files to another datastore for VMware to re-read the free blocks and give back the cleared space. You may then have to migrate the VM files back to where you actually want them to run from, if you’re fussy about the location of your VMs on your datastores.
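To make the arithmetic above concrete, here is a minimal Python sketch of the bookkeeping. It is a toy model and not how ESX works internally: the ThinDisk class, its fields and the zero_and_migrate step are names made up for illustration. The only figures taken from the example above are the 100GB LUN, the 50GB thin volume, the 20GB written and the 10GB deleted.

```python
from dataclasses import dataclass


@dataclass
class ThinDisk:
    """Toy model of a thin-provisioned virtual disk (all sizes in GB)."""
    provisioned: int   # size the guest OS believes it has
    allocated: int = 0  # blocks actually claimed from the datastore
    in_use: int = 0     # data the guest file system considers live

    def write(self, gb: int) -> None:
        if self.in_use + gb > self.provisioned:
            raise ValueError("guest disk is full")
        self.in_use += gb
        # Fresh blocks are only claimed once already-touched blocks run out.
        self.allocated = max(self.allocated, self.in_use)

    def delete(self, gb: int) -> None:
        # The guest only updates its file table; the datastore sees no change.
        self.in_use -= gb

    def zero_and_migrate(self) -> None:
        # Zeroing the free space and then migrating lets the copy drop empty blocks.
        self.allocated = self.in_use


def datastore_free(capacity: int, disk: ThinDisk) -> int:
    """Free space the storage LUN reports with one thin disk on it."""
    return capacity - disk.allocated


disk = ThinDisk(provisioned=50)
disk.write(20)
free_after_write = datastore_free(100, disk)
disk.delete(10)
free_after_delete = datastore_free(100, disk)
disk.zero_and_migrate()
free_after_reclaim = datastore_free(100, disk)
print(free_after_write, free_after_delete, free_after_reclaim)
```

Run as-is it prints 80 80 90: deleting files changes nothing from the datastore's point of view, and only the zero-and-migrate step hands the 10GB back.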

I’m not saying thin disk provisioning has no place, it’s great for R&D, labs and proof of concept type setups where you will be setting up and ripping down and don’t really care too much about long term storage levels. But for production systems, the administration overhead is just too great for my liking.


Correct ESX NTP Time Periodically

Friday, May 14th, 2010

Just had an odd one. Every time I rebooted any one of my Windows VMs, when it came back up the clock would be out by a varying amount.

VMs on the same physical host would be out by identical amounts, but VMs on different physical hosts would be out by different amounts.

The physical VM host servers are all running ntpd, and are configured to sync to the pool.ntp.org server lists, so I thought this was all sorted. Seems I was wrong.

Each physical box, even though it was syncing with NTP peers in the outside world, was experiencing varying degrees of clock skew. The amount was up to 20 minutes across the physical nodes.

To correct this, I have created a cron job on each server with the following entry:

0 9 * * * /usr/sbin/ntpdate -s -b -p 4 -u 0.pool.ntp.org

Now the server will correct the clock skew once each day at 9am, and hopefully now I can forget all about this :-?
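For anyone not fluent in crontab, the small Python sketch below pulls that entry apart into its five time fields and the command. The function name split_cron_line and the FIELD_NAMES tuple are just labels I made up for illustration, and the flag descriptions in the comments are my reading of the ntpdate man page, so check them against the version on your own hosts.

```python
from typing import Dict, Tuple

CRON_LINE = "0 9 * * * /usr/sbin/ntpdate -s -b -p 4 -u 0.pool.ntp.org"

# Order of the five time fields in a crontab entry.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line: str) -> Tuple[Dict[str, str], str]:
    """Split a crontab entry into its time fields and the command to run."""
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five time fields followed by a command")
    return dict(zip(FIELD_NAMES, parts[:5])), parts[5]


schedule, command = split_cron_line(CRON_LINE)
print(schedule)  # minute 0, hour 9, every day, every month, every weekday
print(command)

# ntpdate flags used in the command:
#   -s    log through syslog instead of standard output (useful under cron)
#   -b    step the clock in one go rather than slewing it gradually
#   -p 4  take four samples from each server
#   -u    send from an unprivileged port, which helps behind some firewalls
```

The "0 9 * * *" part is what gives the once-a-day 9am run; the rest of the line is the command cron executes.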

Apache2: No Listening Sockets Available…….

Friday, January 8th, 2010

Following on from the issue(s) I had with my OpenVPN server, I was still not happy/confident that in the event of a reboot or restart for any reason (whether deliberate or unintentional) all the necessary processes and services would start up successfully without some post boot intervention.

With this in mind, I decided to create another server to transfer the live service(s) onto so I could get some much needed downtime on the existing server. Owing to the lack of another physical machine to do this with, I decided to create a virtual machine on our ESX cluster.

The initial steps were pretty easy: create a VM with 1 vCPU, 1GB RAM, a 30GB vdisk and 2 network interfaces. I installed Ubuntu Server 9.04 i386 from the .iso and enabled LAMP and SSH. Installation completed and the system rebooted. Watching the console I saw that everything started at boot time as it should.

Next step was to copy the websites across from the live server to this one. I installed NFS and mounted /var/www from the live server and copied all the sites across along with the relevant config files. I modified the config files to allow for the change of ip address and then restarted the system.

And that was when it started to go wrong. I only caught a glimpse of the error the first time I restarted the system. After the reboot, I logged in and checked, and Apache was not running. Looking in /var/log/syslog did not give any clues as to why; even the error message itself did not seem to have been captured.

So I rebooted again and watched the console carefully, and this time saw the error :

apache2: no listening sockets available

along with

could not bind to address x.x.x.x:80 (where x was the ip address of the server)

Googling this turned up several mentions of other processes or programs perhaps using and blocking the socket/port in question, but this was happening at boot time, when nothing else had really had a chance to be up and running yet. To test, I tried starting Apache from the command prompt after bootup and it started fine, so what was going on ?

The main difference between this server and the live one was that this one was in a VM. Looking at the runlevel start scripts I noticed Apache gets in there really early with S02apache2. Given my previous post where OpenVPN was trying to start before bridging on the live server, I wondered if the interface that Apache was trying to bind to was not quite ready at the point it tried during the boot process.
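The two theories (something else is holding the port, or the address is not on an interface yet) show up as different socket errors, and a few lines of Python can tell them apart. This is only a rough sketch: bind_error is a helper name I have made up, and it is a diagnostic aid rather than anything Apache does itself. EADDRINUSE ("Address already in use") points at another process, while EADDRNOTAVAIL ("Cannot assign requested address") means the address is not configured on any interface at that moment.

```python
import errno
import socket


def bind_error(address, port, kind=socket.SOCK_STREAM):
    """Try to bind a socket; return None on success or the errno name on failure."""
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            # e.g. "EADDRINUSE" or "EADDRNOTAVAIL"; fall back to the message text.
            return errno.errorcode.get(exc.errno, str(exc))
    return None


# Port 0 asks the kernel for any free port, so this should always succeed.
print(bind_error("127.0.0.1", 0))
```

To check a real service you would pass the server's own address and port instead, for example bind_error("x.x.x.x", 80) with the real IP filled in, run as root because ports below 1024 are privileged.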

So I moved S02apache2 to S09apache2 for all runlevels and rebooted the VM again. Result: Apache was now loading as part of the boot process with no errors or manual intervention required.

So if you are also having issues with processes that do not start at boot time, but start fine after boot when you initiate them from the command prompt, you may just need to move them a little later in the boot process to give other things time to start up beforehand.

I don’t profess to be the best system admin in the world, but I always get to the cause eventually :o)

OpenVPN TCP/UDP: Socket Bind Failed…….

Tuesday, January 5th, 2010

My faithful office OpenVPN server required a reboot before the start of the Christmas holidays to install some updates and patches.

The server came back ok and seemed to be fine, so I thought nothing much of it and went home for a few days off……until the emails started arriving from users stating they could not connect to the vpn from their homes !

So Boxing day I trudged through the freezing cold to the office to logon to the box locally to find out what was going on (was obviously something big as I could not connect in either).

Initial findings were that the OpenVPN process did not seem to be running….? So I issued ‘/etc/init.d/openvpn start’ and it started fine. So what caused it to stop running ? Peeking into /var/log/messages.log I found the following lines:


TCP/UDP: Socket bind failed on local address x.x.x.x:1194: Cannot assign requested address

Exiting

Googling this error revealed a few other people had also had this issue, but there was nothing definitive as to the cause.

Was another process grabbing port 1194 and preventing openvpn from starting up ? I decided to reboot the server to check, and there it was again: the openvpn process failed to start with the same error message. But nothing else was using port 1194 when I checked, and when I started openvpn manually after the reboot it came up fine. What was going on ?

Going back over the steps I took to install and set up openvpn, I remembered that it requires the use of the bridge-utils app for bridging the ethernet interfaces on the server. I wondered if there was some kind of race condition happening whereby bridge-utils had not started in time for openvpn to bind to the virtual tap interface that gets created.

So I stopped openvpn with ‘/etc/init.d/openvpn stop’ and then stopped bridging using ‘/etc/openvpn/scripts/bridge-stop’

I then tried to start openvpn without bridge-utils running and got the same error that I was seeing in the syslog when I rebooted the system. So that was the problem, but how to fix ?

First off I needed to check which run levels openvpn and bridge-utils were being loaded at. ‘chkconfig -l | grep -E "openvpn|bridge"’ showed both loading at runlevels 2, 3, 4 and 5.

Looking into run level 5 in /etc/rc5.d I could see the two scripts used for starting up these processes at boot time, S01openvpn and S06bridge-start. As the startup scripts execute in numerical order, openvpn was being started before bridge-start. Simply moving S01openvpn to S10openvpn was all that was required. A subsequent reboot of the server showed that the openvpn process was already running when I logged on to the server post boot.
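If the ordering is hard to picture, this short Python sketch mimics it. It assumes the start scripts are simply run in sorted name order, which with two-digit prefixes is the same as numerical order; boot_order is a made-up helper for illustration, not part of any init system.

```python
def boot_order(script_names):
    """Return the S (start) scripts in the order they would be run."""
    starts = [name for name in script_names if name.startswith("S")]
    # Plain name sort: the two-digit prefix decides who goes first.
    return sorted(starts)


before = boot_order(["S06bridge-start", "S01openvpn"])
after = boot_order(["S06bridge-start", "S10openvpn"])

print(before)  # openvpn first, so the bridge does not exist yet when it binds
print(after)   # bridge-start first, then openvpn
```

Any prefix higher than 06 would have done the job; 10 just leaves a little room in between.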

Then the trek back home again in the freezing cold :o(