Disk IO slow on ESXi, even slower on a VM (freeNAS + iSCSI)


I have a server with ESXi 5 and iSCSI-attached network storage. The storage server has 4x 1TB SATA II disks in RAID-Z on FreeNAS 8.0.4. The two machines are connected to each other with Gigabit Ethernet, isolated from everything else; there is no switch in between. The SAN box itself is a 1U Supermicro server with an Intel Pentium D at 3 GHz and 2 GB of memory. The disks are connected to an integrated controller (Intel something?).
The RAID-Z volume is divided into three parts: two zvols, shared with iSCSI, and one directly on top of ZFS, shared with NFS and the like.
I SSH'd into the FreeNAS box and did some testing on the disks. I used dd to test the third part of the disks (straight on top of ZFS). I copied a 4GB block (2x the amount of RAM) from /dev/zero to the disk, and the speed was 80MB/s.
One of the iSCSI-shared zvols is a datastore for the ESXi host. I did a similar test with time dd .. there. Since dd there did not report the speed, I divided the amount of data transferred by the time shown by time. The result was around 30-40 MB/s. That's about half of the speed from the FreeNAS host!
Then I tested the IO on a VM running on the same ESXi host. The VM was a light CentOS 6.0 machine which was not really doing anything else at that time. There were no other VMs running on the server, and the other two "parts" of the disk array were not in use. A similar dd test gave me a result of about 15-20 MB/s. That is again about half of the previous result!
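For reference, the test I ran looks roughly like this (a minimal sketch; the output path and block count are assumptions, sized to about 2x RAM so caching can't hide the disks):

time dd if=/dev/zero of=/mnt/tank/ddtest bs=1M count=4096

Where dd itself reports a rate (FreeBSD, CentOS) I used that; where it does not (the ESXi shell), I divided the 4096 MB written by the "real" time reported by time.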
Of course there is some overhead in RAID-Z -> ZFS -> zvol -> iSCSI -> VMFS -> VM, but I don't expect it to be that big. I believe there must be something wrong in my system.
I have heard about poor performance of FreeNAS's iSCSI, is that it? I have not managed to get any other "big" SAN OS to run on the box (NexentaSTOR, OpenFiler).
Can you see any obvious problems with my setup?

Solutions/Answers:

Answer 1:

To speed this up you’re going to need more RAM. I’d start with these incremental improvements.

Firstly, speed up the filesystem:
1) ZFS needs much more RAM than you have in order to make use of the ARC cache. The more the better. If you can increase it to at least 8GB you should see quite an improvement.
Ours have 64GB in them.

2) Next, I would add a ZIL Log disk, i.e. a small SSD drive of around 20GB. Use an SLC type rather than MLC. The recommendation is to use 2 ZIL disks for redundancy. This will speed up writes tremendously.

3) Add an L2ARC disk. This can be a good-sized SSD; e.g. a 250GB MLC drive would be suitable. Technically speaking, an L2ARC is not required, but it’s usually cheaper to add a large amount of fast SSD storage than more primary RAM. Still, start with as much RAM as you can fit/afford first.
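For points 2 and 3, adding the log and cache devices to an existing pool is a one-liner each from the shell (a sketch only; the pool name tank and the da* device names are assumptions, substitute your actual pool and SSD device names):

zpool add tank log mirror da4 da5    # mirrored SLC SSDs as the ZIL/SLOG
zpool add tank cache da6             # MLC SSD as L2ARC
zpool status tank                    # verify the log and cache vdevs appear

On FreeNAS it is generally safer to do this through the GUI volume manager so its configuration database stays in sync with the pool.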

There are a number of websites around that claim to help with zfs tuning in general and these parameters/variables may be set through the GUI. Worth looking into/trying.

Also, consult the freenas forums. You may receive better support there than you will here.

Secondly: You can speed up the network.
If you happen to have multiple NIC interfaces in your Supermicro server, you can channel-bond them to give you almost double the network throughput and some redundancy.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004088

Answer 2:

Some suggestions.

  • RAID 1+0 or ZFS mirrors typically perform better than RAIDZ.
  • You don’t mention the actual specifications of your storage server, but what is your CPU type/speed, RAM amount and storage controller?
  • Is there a network switch involved? Is the storage on its own network, isolated from VM traffic?

I’d argue that 80 Megabytes/second is slow for a direct test on the FreeNAS system. You may have a disk problem. Are you using “Advanced Format” or 4K-sector disks? If so, there could be partition alignment issues that will affect your performance.
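One way to check for alignment trouble on the FreeNAS (FreeBSD) side is to compare the disks’ reported sector size with the partition offsets (a sketch; the ada0 device name is an assumption, use whatever devices back your pool):

diskinfo -v /dev/ada0    # look at sectorsize (and stripesize, if your FreeBSD version reports it)
gpart show ada0          # partition start offsets should be multiples of 8 (512-byte) sectors on a 4K disk

If the partitions backing the pool start on odd 512-byte boundaries on a 4K-sector disk, every write straddles two physical sectors and throughput suffers.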

Answer 3:

What you are probably seeing is not a translation overhead but a performance hit due to a different access pattern. Sequential writes to a ZFS volume would simply create a nearly-sequential data stream to be written to your underlying physical disks. Sequential writes to a VMFS datastore on top of a ZFS volume would create a data stream which is “pierced” by metadata updates of the VMFS filesystem structure and frequent sync / cache flush requests for this very metadata. Sequential writes to a virtual disk from within a client again would add more “piercing” of your sequential stream due to the guest’s file system metadata.

The cure usually prescribed in these situations would be enabling of a write cache which would ignore cache flush requests. It would alleviate the random-write and sync issues and improve the performance you see in your VM guests. Keep in mind however that your data integrity would be at risk if the cache would not be capable of persisting across power outages / sudden reboots.

You could easily test whether you are hitting your disks’ limits by issuing something like iostat -xd 5 on your FreeBSD box and looking at the queue sizes and utilization statistics of your underlying physical devices. Running esxtop in disk device mode should also help you get an idea of what is going on by showing disk utilization statistics from the ESX side.

Answer 4:

I currently use FreeNAS 8 with two RAID 5 SATA arrays attached to the server.
The server has 8GB of ram and two single core Intel Xeon processors.

My performance has been substantially different to what others have experienced.

I am not using MPIO or any load balancing on NICs.
Just a single Intel GIGE 10/100/1000 server NIC.

Both arrays have five 2.0TB drives, equating to roughly 7.5 TB of usable space in RAID 5.

I utilize these two arrays for two different functions:

1) Array #1 is attached to an Intel HPC server running Centos 5.8 and PostGres.
The file system is ext4.
I have been able to get a peak of 800 Mbps to this array.

2) Array #2 is being used for Citrix Xenserver 6 Centos VMs.
These 200GB drive partitions are providing outstanding performance.
Each of the VMs are running real-time SIP signaling servers that are supporting 5-10K concurrent calls at 500-1000 CPS.
The local database writes the CDRs to these partitions before the main database server copies them into its tables.
I have also been able to get a peak of 800 Mbps to this array.

Now, I would not suggest using a FreeNas iSCSI array as my mainstay solution for large database partitions. I have that running on a 10K RPM SAS Raid 10 partition on the database server.

But, there is absolutely no reason that you cannot send your data traffic across a simple switched Ethernet network to a reasonably configured server running FreeNAS and send it at the theoretical peak of GIGE.

I have yet to test the read throughput, but since RAID 5 is slower on writes, reads should be as good or better.

FreeNAS consistently scales well as more traffic demands are made of it.
Any CentOS 5.8 server is going to use its own cache to buffer the data before sending it to the iSCSI arrays, so make sure you have ample memory on your VM hosts and you will be happy with your performance.

Nothing tests a technology better than database applications and real-time traffic applications in my opinion.

I too think that adding a system memory write-through cache feature would be beneficial, but my performance numbers show that FreeNAS and iSCSI are performing stellar!

It can only get better.

Answer 5:

First – VMware performance is not really an issue of the iSCSI (on FreeNAS), NFS 3 or CIFS (Windows) protocol; it’s an issue of ZFS filesystem writes and the ‘sync’ status.

FreeNAS exposes the ZFS “sync” property, which can be set to always, standard or disabled. With “sync=always” every write is flushed to stable storage; this dramatically slows performance but guarantees disk writes. For example, running VMware ESXi 5.5 and FreeNAS on modern equipment (3.x GHz CPU, Seagate 7200 RPM HD, 1 GigE network) without strain typically results in 4-5 MB/s on a VMware clone, Windows robocopy or other ‘write’ operation. By setting “sync=disabled” the write performance easily goes to 40 MB/s and as high as 80 MB/s (that’s megabytes per second). It’s 10x-20x faster with sync disabled, and is what you would expect…. BUT the writes are not as safe.

So, I set sync=disabled ‘temporarily’ when I want to do a batch of clones or a significant robocopy run, etc. Then I reset sync=always for ‘standard’ VM operation.
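Toggling the property is a one-liner (a sketch; the pool/dataset name tank/vmstore is an assumption, substitute the dataset backing your datastore):

zfs set sync=disabled tank/vmstore   # fast but unsafe: sync requests are ignored
zfs get sync tank/vmstore            # confirm the current value
zfs set sync=always tank/vmstore     # back to safe behaviour once the bulk copy is done

The change takes effect immediately; no remount or reboot is needed.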

FreeNAS has a ‘scrub’ that will verify all the bytes on the disks… it takes about 8 hours for 12TB, and I run it once a week as a follow-up to make sure the bytes written while sync=disabled are OK.


Mysterious “fragmentation required” rejections from gateway VM


I've been troubleshooting a severe WAN speed issue.  I fixed it, but for the benefit of others:
Via WireShark, logging, and simplifying the config I narrowed it down to some strange behaviour from a gateway doing DNAT to servers on the internal network.  The gateway (a CentOS box) and servers are both running in the same VMware ESXi 5 host (and this turns out to be significant). 
Here is the sequence of events that happened - quite consistently - when I attempted to download a file from an HTTP server behind the DNAT, using a test client connected directly to the WAN side of the gateway (bypassing the actual Internet connection normally used here):

  1. The usual TCP connection establishment (SYN, SYN ACK, ACK) proceeds normally; the gateway remaps the server's IP correctly both ways.
  2. The client sends a single TCP segment with the HTTP GET, and this is also DNATted correctly to the target server.
  3. The server sends a 1460-byte TCP segment with the 200 response and part of the file, via the gateway. The size of the frame on the wire is 1514 bytes - 1500 in payload. This segment should cross the gateway but doesn't.
  4. The server sends a second 1460-byte TCP segment, continuing the file, via the gateway. Again, the link payload is 1500 bytes. This segment doesn't cross the gateway either and is never accounted for.
  5. The gateway sends an ICMP Type 3 Code 4 (destination unreachable - fragmentation needed) packet back to the server, citing the packet sent in Event 3. The ICMP packet indicates the next-hop MTU is 1500. This appears to be nonsensical, as the network is 1500-byte clean and the link payloads in Events 3 and 4 were already within the stated 1500-byte limit. The server understandably ignores this response. (Originally, ICMP had been dropped by an overzealous firewall, but this was fixed.)
  6. After a considerable delay (and in some configurations, duplicate ACKs from the server), the server decides to resend the segment from Event 3, this time alone. Apart from the IP identification field and checksum, the frame is identical to the one in Event 3. They are the same length, and the new one still has the Don't Fragment flag set. However, this time the gateway happily passes the segment on to the client - in one piece - instead of sending an ICMP reject.
  7. The client ACKs this, and the transfer continues, albeit excruciatingly slowly, since subsequent segments go through roughly the same pattern of being rejected, timing out, being resent and then getting through.

The client and server work together normally if the client is moved to the LAN so as to access the server directly.
This strange behaviour varies unpredictably based on seemingly irrelevant details of the target server.
For instance, on Server 2003 R2, the 7MB test file would take over 7 hours to transmit if Windows Firewall was enabled (even if it allowed HTTP and all ICMP), while if Windows Firewall was disabled the issue would not appear at all; paradoxically, the gateway would never send the rejection in the first place. On the other hand, on Server 2008 R2, disabling Windows Firewall had no effect whatsoever, but the transfer, while still impaired, would occur much faster than on Server 2003 R2 with the firewall enabled. (I think this is because 2008 R2 uses smarter timeout heuristics and TCP fast retransmission.)
Even more strangely, the problem would disappear if WireShark were installed on the target server. As such, to diagnose the issue I had to install WireShark on a separate VM to watch the LAN-side network traffic (probably a better idea anyway, for other reasons).
The ESXi host is version 5.0 U2.

Solutions/Answers:

Answer 1:

You can’t drop ICMP fragmentation required messages. They’re required for pMTU discovery, which is required for TCP to work properly. Please LART the firewall administrator.

By the transparency rule, a packet-filtering router acting as a
firewall which permits outgoing IP packets with the Don’t Fragment
(DF) bit set MUST NOT block incoming ICMP Destination Unreachable /
Fragmentation Needed errors sent in response to the outbound packets
from reaching hosts inside the firewall, as this would break the
standards-compliant usage of Path MTU discovery by hosts generating
legitimate traffic. — Firewall Requirements – RFC2979 (emphasis in original)

This is a configuration that has been recognized as fundamentally broken for more than a decade. ICMP is not optional.

Answer 2:

I finally got to the bottom of this. It turned out to be an issue with VMware’s implementation of TCP segmentation offloading in the virtual NIC of the target server.

The server’s TCP/IP stack would send one large block along to the NIC, with the expectation that the NIC would break this into TCP segments restricted to the link’s MTU. However, VMware decided to leave this in one large segment until – well, I’m not sure when.

It seems it actually stayed one large segment when it reached the gateway VM’s TCP/IP stack, which elicited the rejection.

An important clue was buried in the resulting ICMP packet: the IP header of the rejected packet indicated a size of 2960 bytes – way larger than the actual packet it appeared to be rejecting. This is also exactly the size a TCP segment would be on the wire if it had combined the data from both of the segments sent thus far.

One thing that made the issue very hard to diagnose was that the transmitted data actually was split into 1500-byte frames as far as WireShark running on another VM (connected to the same vSwitch on a separate, promiscuous port group) could see. I’m really not sure why the gateway VM saw one packet while the WireShark VM saw two. FWIW, the gateway doesn’t have large receive offload enabled – I could understand if it did. The WireShark VM is running Windows 7.

I think VMware’s logic in delaying the segmentation is so that if the data is to go out a physical NIC, the NIC’s actual hardware offload can be leveraged. It does seem buggy, however, that it would fail to segment before sending into another VM, and inconsistently, for that matter. I’ve seen this behaviour mentioned elsewhere as a VMware bug.

The solution was simply to turn off TCP segmentation offloading in the target server. The procedure varies by OS but fwiw:

In Windows, on the connection’s properties, General tab or Networking tab, click “Configure…” beside the adapter, and look on the Advanced tab. For Server 2003 R2 it’s given as “IPv4 TCP Segmentation Offload.” For Server 2008 R2 it’s “Large Send Offload (IPv4).”
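For Linux guests, the rough equivalent (a sketch only; eth0 is an assumption, and the exact offload feature names vary by driver and kernel) would be:

ethtool -k eth0 | grep -i segmentation    # check whether TSO/GSO are currently enabled
ethtool -K eth0 tso off gso off           # disable segmentation offload until the next reboot

To make it persistent, hook the command into the distribution’s interface configuration.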

This solution is a bit of a kludge and could conceivably impact performance in some environments, so I’ll still accept any better answer.

Answer 3:

I had the same symptoms and the problem turned out to be this kernel bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754294

Answer 4:

I have seen the same issue on Linux hosts.

The solution was to deactivate Large Receive Offload (LRO) on the network driver (vmxnet) of the gateway machine.

To quote the VMware KB:

LRO reassembles incoming network packets into larger buffers and transfers the resulting larger but fewer packets to the network stack of the host or virtual machine. The CPU has to process fewer packets than when LRO is disabled, which reduces its utilization for networking.

See http://kb.vmware.com/kb/2055140

Thus, packets arriving at the gateway machine were merged by the network driver and handed to the network stack, which dropped them as bigger than the MTU…
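On a Linux gateway using vmxnet/vmxnet3, checking and disabling LRO might look like this (a sketch; eth0 is an assumption, and depending on the driver version the setting may only be changeable via module parameters):

ethtool -k eth0 | grep -i large      # is large receive offload currently enabled?
ethtool -K eth0 lro off              # disable LRO on the receive path

The VMware KB linked above covers the vmxnet3-specific ways of doing the same thing.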


Networking with VMware, physical adapters and iSCSI


New VMware user here, setting up my first environment. I have an ESXi host that has four physical NICs (NIC0, NIC1, NIC2, NIC3).
I have installed ESXi, the VCVA appliance, and several VMs successfully. They are connected to local storage currently, but I want to connect to our iSCSI SAN.
Physically, NIC0 and NIC1 are connected to our regular network switch. NIC2 and NIC3 are connected to our iSCSI network, which is a separate network.
So what I have done is set NIC0 and NIC1 as active for the management network on the ESXi host. I left NIC2 and NIC3 unchecked.
When I use the vSphere client to create the iSCSI connection, it can't see NIC2 or NIC3. Do I need to enable all four NICs to be able to use them in vSphere?
If I enable NIC2 and NIC3, they say they are disconnected, because they are connected to our iSCSI network and have no regular network connection.
Am I way off track here?

Solutions/Answers:

Answer 1:

You need to create another vSwitch to contain your iSCSI ports. This is separate from the vSwitch you created that holds your management and virtual machine traffic:

Something like:

[screenshot of a separate standard vSwitch containing the iSCSI VMkernel ports]

Once you get past this point, the process is straightforward.
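If you prefer the command line over the vSphere client, the same setup can be sketched with esxcli on ESXi 5.x (the vSwitch/port-group names, vmnic2, vmk1, the IP address and the vmhba33 software iSCSI adapter are all assumptions; adjust to your environment):

esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --uplink-name=vmnic2 --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-1 --vswitch-name=vSwitch1
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-1
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.10.10.11 --netmask=255.255.255.0 --type=static
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1     # bind the VMkernel port to the software iSCSI adapter

A second port group and vmk on vmnic3 can be added the same way if you want multipathing later.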

Answer 2:

It sounds like you didn’t set up the vSwitches for iSCSI using NIC2 and NIC3.

If not…

See here for information on the basics to get you started:

Configuring software iSCSI setup

and their Youtube video:

http://www.youtube.com/watch?feature=player_embedded&v=FzXYUzYTJVE

The information is for v4 but will point you in the right direction.

EDIT: better walkthrough video for v5 here: http://www.youtube.com/watch?v=Mu-HyD3E3cw


Migrating virtual (VMWare) workstations from AMD to Intel hardware


We're a small dev team that runs VMware-hosted Ubuntu instances on AMD x64 machines. Our hardware will be upgraded to Intel i7, but we want to continue to use the virtual images we've built. However, according to the VMware page, there are

problems when moving virtual machine guests between hardware hosts using different instruction-sets (such as found in 64-bit Intel and AMD CPU

How do you run the same virtual guest image on different hardware if the instruction sets are different? Is there an additional VMware product or tool to use for this purpose?

Solutions/Answers:

Answer 1:

You would only theoretically have problems if you live migrated your VMs from an AMD processor to an Intel processor (i.e. vMotion), so vSphere just won’t allow a vMotion in this scenario. If you shut down the VM and then start it up again on the new processor, you will be fine, provided the guest OS isn’t particularly processor-dependent. (For the most part this is Windows 2000 and older operating systems.)

Answer 2:

Can you provide more information on the VMware infrastructure?

It sounds like you’re running some variant of VMware vSphere rather than the VMware Workstation product.

  • Is your hardware a collection of desktops or are they physical servers?
  • Do you know the makes/models involved?
  • Will there be any change in the versions of VMware involved as well?

In either case, you’d be looking at cold migrations in order to make the existing guests work. That’s essentially the process of shutting the VM down, moving its physical location (or the server it’s running on), then powering up.

AMD -> Intel is not a problem in this case.

Also note that if there’s going to be a version change in the VMware product being used, there’s also the matter of upgrading the actual virtual machine’s hardware version and its guest tools.


Migrate VMware Server to ESXi


What's the recommended approach/tool to migrate VMs from VMware Server 1.0.7 (under Win2003 x64 Enterprise) to ESXi 4 ? 
Some of the VMs are running Win2003 Enterprise x86, others run Ubuntu JeOS 8.04.

Solutions/Answers:

Answer 1:

VMware Converter will be the simplest way to do it. Once you have ESXi built and the VMs copied to a location, just launch Converter; you will be asked for the source machine and then for a destination, which will be your ESXi server. Then follow the steps. Even if the import fails, it will not affect your source VM.

Answer 2:

There is also a standalone, commercial version of VMware Converter. It is a boot CD (coldclone.iso) that you can use to convert a machine while it is off. The disc contains a pre-installed Windows 2003 image that runs a built-in version of Converter. Since you boot from the CD, none of the server’s processes start up, so you can get a good clone of the hard drive.

To be able to download this CD image you need to be properly licensed.

Answer 3:

VMware Converter is the simple and easy way. If you have issues with a straight conversion, I would suggest two other approaches:

Point VMware Converter at the live system and select “convert physical machine”.

Boot Ghost for Unix (or the cloner of your choice) inside the VM, store the output to an external server, create a VM in ESXi, boot it from the cloner disk and download the image.

When I tried to convert using the VMware Server files, the Converter kept crashing because it didn’t like the linked-clone VMs with multiple drives. I used the second method to do my first round of migrations, and then when VMware Converter 4.0 came out, I found that the first method started working for me.

Answer 4:

Just an FYI: the converted VM would not run at first. I got an error about a line in the VMX file, something about a variable having already been defined. I had to delete the offending line to get the VM to run.

Just a heads up for those who might run into a similar issue.

Cheers,
Dave.


ESXi 4.0 resize a VM hard drive size


I am using ESXi 4.0 and have a few VMs on the machine. I have just copied the .vmx and .vmdk to create a new copy of a VM. They are normally 100GB, but I want to create a few of these that are 50GB. Is there a way of resizing the copied VM, or do I need to create a new one and go through the whole install process?
I tried just going into the edit settings section when the VM was off and changing the size, but it just reverts back to 100GB.
Thanks.

Solutions/Answers:

Answer 1:

You can use the VMWare Converter to clone the machine, but change the size of the virtual disk in the process. When configuring the Source Data, indicate that you want to “Select volumes and resize to save or add space”. You will then be able to specify the size of the destination drive in the “New Disk Space” column below.
If you’re simply trying to resize rather than clone and resize at this point, you can then remove the original once you verify the converted version works.

Answer 2:

You can expand virtual hard disks, but you can’t shrink them natively in ESX(i).

One popular strategy is to attach a second smaller virtual HD to the VM, run a cloning tool (Ghost, Acronis…) in the VM to clone the first disk to the second one and then replace the first disk with it.
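For the growing case, this can also be done from the ESXi command line with vmkfstools (a sketch; the datastore path and new size are assumptions, and the partitions inside the guest still have to be grown afterwards):

vmkfstools -X 100G /vmfs/volumes/datastore1/myvm/myvm.vmdk   # extend the virtual disk to 100GB

Shrinking is the part that needs the clone-to-a-smaller-disk workaround described above.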

Answer 3:

Another solution consists of:
– stopping the VM
– expanding the disk size in the VM settings
– booting the VM with a GParted live CD
– moving and resizing the partitions
– and rebooting the VM from the hard disk

Answer 4:

The method discussed by joe is actually the one recommended by VMware; however, Vizioncore makes a product which will actually do a disk shrink as well as other things. This might be something you want to consider, as it is a very nice product with quite a few features.


ESXi network setup for isolated internal virtual machines


Using ESXi v5.1 and vSphere, my networking is setup like so:

One standard Switch: vSwitch0
vSwitch0 has one uplink physical adapter (Internet connected)
vSwitch0 VM Network has 3 virtual machines (Web Accessible)

I'd like to add several "internal" VMs that are accessible only to the 3 public-facing VMs that are currently on vSwitch0.  How should I do this?  I know I could add another "internal" vSwitch that is not bound to a physical uplink, then employ a "gateway" VM that is dual-homed, but it seems there should be an easier way.  Can I accomplish this strictly with a networking setup?  If so, how?
(Please feel free to use whatever IP scheme you need to illustrate your answer)
Thanks much!

Solutions/Answers:

Answer 1:

  1. ESXi 5.1 and vSphere are synonymous. They are the same thing. I prefer to call it vSphere since that’s what VMware calls it.

  2. Create a new vSwitch for the internal VMs. Do not bind this vSwitch to a physical NIC.

  3. Connect the internal VMs to this internal vSwitch.

  4. Add a new vNIC to each external VM and connect it to the “internal” vSwitch.

  5. Configure the internal vNIC appropriately on each VM so that they’re all on the same internal subnet (whatever RFC1918 address range you choose to use).

Now each external VM is multihomed and will have a connection to both the external and the internal network, and should be able to communicate on the internet as well as with the internal VMs.

Of course, this is just one of the possible ways to do this.
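From the ESXi shell, step 2 is just a vSwitch with no uplink attached (a sketch; the names vSwitchInternal and Internal-LAN are assumptions):

esxcli network vswitch standard add --vswitch-name=vSwitchInternal        # note: no uplink is added
esxcli network vswitch standard portgroup add --portgroup-name=Internal-LAN --vswitch-name=vSwitchInternal

Then attach the internal VMs’ network adapters (and the extra vNICs of the external VMs) to the Internal-LAN port group.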

Answer 2:

Associating a port group with a vSwitch does not necessarily give it upstream network access. The upstream device has to have a corresponding interface with a matching IP/subnet configuration for those devices to talk to.

Create an “internal only” port group. Give each virtual machine needing access to the uplink network and the internal network a second NIC that faces internal.

How it would look:

Router: 192.168.0.1/24

ESXi Host: 192.168.0.2/24

Public Server 1:

NIC1: Assign to existing “VM Network” port group. IP 192.168.0.11/24, Default Gateway 192.168.0.1

NIC2: Assign to “Internal Only” port group 172.16.0.11/24, No Default Gateway

Public Server 2:

NIC1: Assign to existing “VM Network” port group. IP 192.168.0.12/24, Default Gateway 192.168.0.1

NIC2: Assign to “Internal Only” port group 172.16.0.12/24, No Default Gateway

Public Server 3:

NIC1: Assign to existing “VM Network” port group. IP 192.168.0.13/24, Default Gateway 192.168.0.1

NIC2: Assign to “Internal Only” port group 172.16.0.13/24, No Default Gateway

Internal Only Server 1:

NIC1: Assign to “Internal Only” port group 172.16.0.21/24, No Default Gateway

Internal Only Server 2:

NIC1: Assign to “Internal Only” port group 172.16.0.22/24, No Default Gateway

Internal Only Server 3:

NIC1: Assign to “Internal Only” port group 172.16.0.23/24, No Default Gateway
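Inside the guests, the internal-only NIC simply gets a static RFC1918 address and no gateway. On a CentOS/RHEL-style VM that might look like this (a sketch; the device name eth1 and the 172.16.0.x addressing are taken from the example above):

# /etc/sysconfig/network-scripts/ifcfg-eth1  -- internal-only interface, no default gateway
DEVICE=eth1
BOOTPROTO=none
IPADDR=172.16.0.11
NETMASK=255.255.255.0
ONBOOT=yes

The public-facing NIC (eth0) keeps the 192.168.0.x address and the default gateway, so Internet-bound traffic still leaves through the uplinked vSwitch.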

Answer 3:

Just add port groups with specific VLANs on the vSwitch, then add NICs in those VLANs pointing to the specific networks… et voilà!

Note that if the physical interface is a trunk port, it will not pass traffic between VLANs; packets are only delivered to their specific tagged VLAN (segregation within the vSwitch).

In your case you can also add physical interfaces to specific networks (for example a VLAN 10 with Internet access), then add a NIC in the VM so that it connects to an internal VLAN, and add another physical interface in that VLAN for the internal network (you can of course expand on that).

Also, vCenter/vSphere != ESXi.

ESXi is the OS the bare metal runs; vCenter/vSphere are the management applications.


Virtualization for hardware resiliency?


Can anyone tell me if it is possible to pool several physical servers to run a resilient virtualization environment? Our servers are getting more and more critical to our clients and we want to do everything we can to improve resiliency in the event of a hardware failure. I have used desktop VMs but I am not familiar with what is possible in enterprise-level virtualization.
The ideal would be to have a few physical servers in our datacenter. A few VMs would be shared among these to run a web server, application server, and database server. If one physical server failed, the VMs should switch to one of the other servers and continue running without any interruption.
Can this be accomplished? I realise that even Google goes down from time to time, so I am not looking for perfection; just an optimal solution.

Solutions/Answers:

Answer 1:

It’s doable, and we do something similar, just without the automatic part.

As @ewwhite pointed out, the key is having a shared storage pool that is visible to multiple host servers, so if one host goes down it doesn’t matter much, because another host can take over. Setting up the kind of unnoticeable, interruption-free automatic failover you’re asking about is not easy (or cheap), and frankly a lot more trouble than it’s worth, at least for the vast majority of use cases out there. Modern hardware doesn’t fail a lot unless it’s set up really badly, so you’ll get more mileage out of making sure it’s set up right and in an environment that’s within the operational ranges of the equipment.

We use the fail-over and high availability functions of our systems for only two things, really. The first is disaster recovery (if our main site loses power or explodes, or what have you, we have the critical parts mirrored at a second facility) and the second is avoiding maintenance windows. We use blade servers and ESX/vSphere, and between the ability to fail over to a secondary site and the ease of using vMotion to move VMs between hosts, there’s very little that we can’t do without a service interruption.

I would focus on getting that set up first – once you’re able to (manually) fail things over to wherever, you may decide that getting it to work automatically is more expensive and difficult than it’s worth. It sounds easy enough and great in theory, but in practice it can be a real pain to get everything working properly in clusters or in a distributed-guest setup.

Answer 2:

This is an excellent reason to virtualize. As application availability, rather than individual (physical) server uptime, becomes more important to businesses, many organizations find that they can attain a higher level of reliability through virtualization.

I’ll use VMware and Xen as examples, but with some form of shared storage that’s visible to two or more host systems, virtualized guests can be distributed and load-balanced across physical servers. The focus then becomes the quality of the shared storage solution, its management, and the networking/interconnects in the environment.

However, one bit of caution… You should evaluate what types of hardware and environmental situations pose a threat. Quality server-class equipment includes many redundancies (fans, power supplies, RAID, even RAM)… Modern hardware simply does not fail often. So avoid overreacting and building an unnecessarily complex environment if spec’ing higher-end servers could eliminate 90% of the potential issues.

Answer 3:

It sounds like VMware FT might be what you’re looking for. It keeps a “shadow instance” of each virtual machine in lockstep with each source VM and allows for instantaneous failover between the two instances. More here:

http://www.vmware.com/products/fault-tolerance/overview.html

Answer 4:

The “without any interruption” part is quite an ask, especially since today you’re coming from what appear to be standard servers with no resiliency.

Virtualisation is an option, but for the sake of full disclosure you should make an informed decision between the following:

  1. A small interruption, in the order of a few minutes.
  2. No interruption (we’re talking milliseconds).

(2) is normally very:

  1. Expensive – you need N+N hardware capacity, i.e. for every server you’re running, you have a full standby server running the exact same software, ready to take over in case of a hardware failure.
  2. Restrictive – the software you use for this ensures that the machines are “in sync”, normally over Ethernet. That means that if your network slows down, it will slow your application down to ensure things remain in lockstep. To ensure that doesn’t happen, those machines have to be in the same datacentre to get any kind of performance.

Virtualisation with VMware FT is one solution. Xen has its equivalent with everRun, and there is also a bare-metal equivalent (no hypervisor).

(1) may well be all you need (Clustering)

  1. Depending on the application, this can offer failover equal to (2). E.g. NFS servers like NetApp can offer a seamless failover, and clients continue with no failures and only a brief interruption.
  2. It is “slightly” more tolerant of software failures. Because non-deterministic CPU instructions are not kept in lockstep, a number of bugs like race conditions won’t be triggered on both nodes.
  3. It could allow you to run different versions of the software. For example, upgrade Node 1 of the cluster to Service Pack 1 of Windows Server 2008, confirm it’s OK, then upgrade Node 2.

I don’t mean to sell clustering vs fault tolerance, or bare metal vs hypervisor, but when it comes to high availability, hopefully the above illustrates the large number of questions you need to answer before implementing it.

  1. What is the maximum downtime tolerated by users? (Be realistic.)
  2. What outage domains do you want to tolerate? Physical server? Software? Layer 2 network? Layer 3? Datacentre?
  3. What are the performance requirements of the application? Virtualisation is not for everything, and only very recently have clock-sensitive applications like Active Directory been accepted on virtual machines (and it is certainly not common practice). Regardless of whether you use the latest hypervisor and chipsets, virtualisation will still mean a hit on performance, throughput, and latency.
  4. What is the budget that you need to work within?

These requirements can be translated into things like MTTF, and depending on budget and the skill sets of your team, some solutions will just be a no-go.


Running All Microsoft products vs VMWare/backup/AV for small network


Recently a company that I work for has been getting IT advice on upgrading their infrastructure (for approximately 30 on-prem users). We are a not-for-profit and get deep discounts on many products (primarily MS products). So our options are: pay for VMware, a backup suite and AV (annual cost indeterminate); or use the already paid-for Hyper-V and SCCM (annual cost $0).
So the original end goal is to have 3 VMs, all in Hyper-V:
1) a DC (DHCP, DNS, File server, print server)
2) finance server with SQL Server instance
3) System Center - all modules (for backup, update services, and AV)
Here's the questions: 
1) For a network this small (3 servers, 30 on-prem users), is there enough differentiation in the products to justify a paid product (VMware, different backup, different AV) vs an already paid-for product (Hyper-V, SCCM)?
2) Is there any appreciable risk in going with the pure Microsoft options? 

Solutions/Answers:

Answer 1:

If you are a 501(c)3 there is no reason to buy VMware. In fact, you should use TechSoup.org to get all the Microsoft and Cisco you can. Ideally you would be keeping track of donation cycles and timelines and planning at least a year ahead on what donation requests you are going to make from Tech Soup next time the cycle comes around.

This is especially true for Cisco products, which are often unavailable outside of a two-month window or so (which I believe is somewhere in the October to December range).

I came from a 501(c)3 and we were able to build an excellent network with Cisco and Microsoft. I’m now at a much larger semi-governmental organization with much larger operating and capital budgets and we are moving away from VMware to 100% Hyper-V. At your size, I can’t think of a single thing that VMware offers that would be worth the investment. VMware is absolutely more expensive – even if you throw in SC-VMM on the Hyper-V side. And that’s not even counting whatever discounts and/or donation consideration you may be getting for Microsoft.

If you’re at a non-profit and an organization comes in the door to consult for you and they are not recommending you take the fullest advantage of all the resources that are specifically offered to non-profits, then you should keep looking. Many “IT experts” and firms just don’t pay enough attention to different types of businesses and how their needs are different, and instead want to deploy the exact same thing to every client so they can keep their costs down. Look for someone who has experience with non-profit and will partner with you to get you the necessary IT resources in the most cost-effective way.

PS: You should also be able to get discounts/donations for backup and anti-virus software. If you’re 501(c)3 then Symantec participates in Tech Soup and can provide both backup software and anti-virus.


How do I determine which virtual disk is which in Linux?


I have a Linux server running on a VMware virtual machine, with 4 virtual hard drives. After the box had run for a month, I added 2 of the 4 hard drives in the vSphere client, because I need more space. I did this step a few weeks ago, then was pulled into another project before creating the file systems and setting up mounts. Now I do not know which drive is which within Linux. I have /dev/sda, /dev/sda1, /dev/sda2, and /dev/sdb.
How do I determine which drives have existing data and which are new? Or, how do I remove the drives and start over? (I know how to remove the drives in the vSphere client, but not how to remove the references to them in Linux.)
Here are the results of dmesg | grep sd:
[    1.361162] sd 2:0:0:0: [sda] 16777216 512-byte logical blocks: (8.58 GB/8.00 GiB)
[    1.361205] sd 2:0:0:0: [sda] Write Protect is off
[    1.361210] sd 2:0:0:0: [sda] Mode Sense: 61 00 00 00
[    1.361253] sd 2:0:0:0: [sda] Cache data unavailable
[    1.361257] sd 2:0:0:0: [sda] Assuming drive cache: write through
[    1.363223] sd 2:0:0:0: Attached scsi generic sg1 type 0
[    1.363398]  sda: sda1 sda2
[    1.363788] sd 2:0:0:0: [sda] Attached SCSI disk
[    1.364425] sd 2:0:1:0: [sdb] 1572864000 512-byte logical blocks: (805 GB/750 GiB)
[    1.364466] sd 2:0:1:0: [sdb] Write Protect is off
[    1.364471] sd 2:0:1:0: [sdb] Mode Sense: 61 00 00 00
[    1.364512] sd 2:0:1:0: [sdb] Cache data unavailable
[    1.364515] sd 2:0:1:0: [sdb] Assuming drive cache: write through
[    1.370673] sd 2:0:1:0: Attached scsi generic sg2 type 0
[    1.405886]  sdb: unknown partition table
[    1.406228] sd 2:0:1:0: [sdb] Attached SCSI disk
[    4.493214] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    4.493849] SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
[    5.933636] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: (null)
[    5.933649] SELinux: initialized (dev sdb, type ext4), uses xattr
[    6.099670] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[    6.108488] SELinux: initialized (dev sda1, type ext4), uses xattr

Output from fdisk -l
Disk /dev/sda: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000dfc09

Device    Boot     Start       End  Blocks  Id System
/dev/sda1 *         2048   1026047  512000  83 Linux
/dev/sda2        1026048  16777215 7875584  8e Linux LVM


Disk /dev/sdb: 750 GiB, 805306368000 bytes, 1572864000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/fedora_dataserv-swap: 820 MiB, 859832320 bytes, 1679360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/fedora_dataserv-root: 6.7 GiB, 7201619968 bytes, 14065664 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Solutions/Answers:

Answer 1:

From the information you provide, you have two VM disks:

  • /dev/sda: 8GB with two partitions /dev/sda1 and /dev/sda2

  • /dev/sdb: 750GB with no partition, which should be the one you newly added.

Your fdisk -l output shows that you have created an LVM volume group called fedora_dataserv and, judging from the used disk space, you are using the /dev/sda disk only.

You can refer to the answer I posted before, changing the value deb-web138 to fedora_dataserv. For example:

# vgextend deb-web138 /dev/sdb1
# lvresize -L+70G /dev/deb-web138/root
# resize2fs /dev/deb-web138/root

are changed to:

# vgextend fedora_dataserv /dev/sdb1
# lvresize -L+70G /dev/fedora_dataserv/root
# resize2fs /dev/fedora_dataserv/root

in order to increase the space you can use.
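Note that /dev/sdb currently has no partition table, so before the vgextend step you would (a sketch; adjust device names, and this could also be done on the whole disk without a partition) first create an LVM partition and initialize it:

# fdisk /dev/sdb        # create one partition spanning the disk, type 8e (Linux LVM)
# pvcreate /dev/sdb1    # initialize it as an LVM physical volume

and then run the vgextend/lvresize/resize2fs commands above.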

Answer 2:

If you simply type

mount 

you will see which disk (or partition) is mounted on which folder.

Answer 3:

lsscsi                   # list SCSI devices with their [host:channel:target:lun] addresses

dmesg | grep sd          # kernel messages for the sd* disks (sizes and partitions seen at boot)

cat /proc/scsi/scsi      # the kernel's SCSI device inventory

fdisk -l                 # partition tables and sizes for every disk

Answer 4:

sda is the drive connected to the first logical port in your VM’s configuration; sdb is the drive connected to the second. sda1 and sda2 are two partitions on the first drive, and sdb appears to have no partitions (i.e. it is one of the ones you added). You can use gparted or, if formatted as such, the LVM tools to see how your partitions are laid out.

Answer 5:

blkid will list the drives’ filesystems. You should be able to identify them based on their sizes, partitions, UUIDs, filesystem types, and so on. lsblk is also quite useful for a tree overview of the devices, though plain lsblk doesn’t show the filesystem type (lsblk -f does).

Answer 6:

Thanks to everyone who answered. Everyone who did so, helped me track down the issue, and taught me much!

For some reason, Linux was not recognizing the 2 new drives. (I did not know that until I learned it from the others’ answers.)

The final solution was:

  1. shut down the vm
  2. Remove the 2 new drives from in the vm in the vSphere client, without deleting them from the datastore
  3. Reboot the vm
  4. Shut down the vm
  5. add one drive in vSphere
  6. reboot the vm
  7. Confirm the os recognizes the new drive (fdisk -l), which it did
  8. Shut down the vm
  9. add the other drive in vSphere
  10. reboot the vm

fdisk -l now shows /dev/sdc and /dev/sdd

Thanks again to everyone for the help!

Answer 7:

/dev/mapper is where mounted LUNs and LVM volumes are automounted, usually with friendly names.

If your system uses LVM, see man lvm. If you’re using mounted LUNs, check out dm-multipath.

References