VM Junkie

December 3, 2010

Passed the VCAP-DCA exam!

Filed under: vmware, vSphere — ermac318 @ 1:18 pm

If you follow my twitter, you’ll have noticed that I finally received word that I passed the VCAP-DCA exam! I took the test back on Nov 4th, and had to wait almost a month for my results. But all is forgiven!

Some experiences from my exam:

  • Manage your time well. I actually spent too much time hunting through the provided documentation to answer a couple questions and ended up running out of time. If I had just given up on one particular question and moved on, I might have gotten 2 more right at the end that I couldn’t get to.
  • The blueprint is very important! Make sure you know everything in it.
  • When they say you need to know all the command lines, they’re not kidding.
  • Know your Distributed Virtual Switch stuff well.

Also, I can’t recommend the guides over at vFail enough – they are excellent. Make sure you go there and study before the exam.
Good luck everyone!

October 13, 2010

vSphere Network I/O Control vs. HP VirtualConnect Flex-10 or FlexFabric

Filed under: bladesystem, hp, vmware, vSphere — ermac318 @ 8:48 am

As of vSphere 4.1, VMware introduced a new feature called Network I/O Control. Many of the features of Network I/O Control overlap with features of HP VirtualConnect Flex-10 (and subsequently FlexFabric as well). This article compares and contrasts the two systems and their pros and cons.

HP Flex-10

With HP Flex-10 onboard NICs, you can take a single 10Gb pipe and carve it up into 4 distinct FlexNICs, each of which appears as its own PCI function in hardware. Using VirtualConnect Server Profiles, you can then specify how much bandwidth you want each FlexNIC to have.

This allows customers in vSphere environments to partition bandwidth between different logical functions in hardware. For example, on a single 10Gb port we could give 500Mb of bandwidth to management traffic, 2Gb to vMotion, 4Gb to iSCSI traffic, and 3.5Gb to Virtual Machine traffic, each on its own FlexNIC. In a FlexFabric environment, one of your four FlexNICs can assume the personality of a FlexHBA, which can act as a Fibre Channel HBA or a hardware iSCSI initiator.
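From the ESX side, the FlexNICs simply show up as separate physical NICs running at whatever speeds the VC profile set. A quick, hedged PowerCLI sketch to see what the host sees (the host name is a placeholder and the property names are from memory, so check them against your PowerCLI build):

    # List the physical NICs (vmnics) a host sees, with their link speeds.
    # On a Flex-10 blade, each FlexNIC shows up here at the speed set in the VC profile.
    Get-VMHost "esx01.example.com" | Get-VMHostNetworkAdapter -Physical |
        Select-Object Name, Mac, BitRatePerSec, FullDuplex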

Pros:

  • Bandwidth shaping occurs in hardware and is stored in the VC Profile, and therefore is OS independent. For example, FlexNICs can be used by a physical Windows blade.
  • Since the physical NIC function is capped, both ingress and egress traffic is limited by the speed of the FlexNIC you set in hardware.

Cons:

  • Requires Flex-10 or FlexFabric capable blades and interconnect modules.
  • Can only dial up or dial down FlexNIC speeds while blade is powered off.
  • When bandwidth utilization on one FlexNIC is low, another FlexNIC cannot utilize its unused bandwidth.

vSphere Network I/O Control

Introduced in vSphere 4.1, Network I/O Control (or NIOC) is designed to solve many of the same problems as Flex-10. How can I make sure all types of traffic have an appropriate amount of bandwidth allocated, without letting any single network function rob the others of throughput?

By enabling Network I/O Control on a vDistributed Switch (vDS), you can specify limits and shares for particular port groups or host functions. You can specify that vMotion traffic has a limit of 5Gbps and a share value of 100, that your VM traffic has a share value of 50, and that your iSCSI traffic has a share value of 50. If all three functions were attempting to push maximum throughput, the vMotion traffic would push 5Gbps (since vMotion holds 100 of the 200 total shares), while VM and iSCSI traffic would each get 2.5Gbps.

(Screenshot omitted: an example of these settings, taken with a 1Gb rather than a 10Gb NIC.)
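To make the shares arithmetic concrete, here is a tiny, illustrative PowerShell snippet. This is just the proportional-share math from the example above, not a VMware cmdlet or API call:

    # How NIOC-style shares divide a saturated 10Gb uplink.
    $linkGbps = 10
    $pools = @{ vMotion = 100; VM = 50; iSCSI = 50 }     # share values from the example above
    $totalShares = ($pools.Values | Measure-Object -Sum).Sum
    $pools.GetEnumerator() | ForEach-Object {
        '{0,-8} gets {1:N1} Gbps' -f $_.Key, ($linkGbps * $_.Value / $totalShares)
    }
    # vMotion gets 5.0 Gbps, VM and iSCSI get 2.5 Gbps each (vMotion's 5Gbps
    # limit isn't exceeded here, so the limit doesn't change the result).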

Pros:

  • Shares allow a single function to utilize the entire 10Gb pipe if the link is not oversubscribed.
  • You can change the speed of a function while the vDS is online and servicing traffic.
  • No special hardware required – can be utilized on rack-mount servers with standard 1Gb or 10Gb NIC interfaces.

Cons:

  • Requires vSphere Enterprise Plus, and requires use of the vDS – NIOC is not available with traditional vSwitches.
  • NIOC can only regulate egress traffic. Ingress traffic will not be affected by NIOC settings.

Conclusions

Both options provide similar capabilities but approach the problem in different ways. While a FlexNIC cannot dial itself up dynamically based on load, it can prevent ingress traffic from overwhelming other functions, whereas NIOC cannot.

The biggest problem with NIOC is that it is only available with the vDistributed Switch, making it challenging for many customers to implement. Not only do they need to be on the most expensive version of vSphere, they must also implement the vDS, which many customers are not doing, or are intentionally avoiding due to the added complexity. However, VMware will most likely be targeting only the vDS for future feature enhancements.

In HP Blade environments, it makes sense to utilize the HP VirtualConnect technology as it provides other benefits (MAC address virtualization, server profile migration, and now FlexFabric) beyond just the FlexNIC capability. However, if customers are utilizing competing Blade solutions, or traditional rack-mount servers, then NIOC provides new capabilities to them that they cannot get in hardware.

It is also possible to utilize both solutions in tandem. One could conceivably use FlexNICs to segregate certain types of traffic for security purposes (say, if your organization doesn’t allow traffic from different security zones on the same vSwitch) and then use NIOC to do bandwidth shaping. Another use case: if you want your management traffic to stay on a standard vSwitch but move all VM/vMotion/etc. traffic to a vDS, you can carve out two FlexNICs per 10Gb port and use NIOC on the larger of the two.

December 10, 2009

View 4 & PCoIP – a few problems

Filed under: view, vmware, vSphere — ermac318 @ 9:15 pm

So I’ve been doing my first customer deployment of View 4 lately, and there have been quite a few gotchas that I’ve noticed so far…

The first, and biggest, is that I just can’t get View Composer to reliably clone machines. About 50% of the time, the VM will show an error in View Manager that says “View Composer agent initialization state error (6): Unknown failure (waited 0 seconds)” and the only way to fix it is to reset the VM manually or refresh it. I ran into similar issues during the Beta, and what I’m seeing here is pretty nasty. It’s very similar to the issue outlined in this VMware KB article. However, the resolution in the KB article has so far not fixed the issue for this customer. The error will not clear on its own, and can even reoccur after a reset.

We’ve also had several issues with PCoIP so far. The most critical is that we’ve had virtual machines lock up so badly after trying to connect to them with PCoIP that I had to kill the VM from the service console with kill -9. Other issues: PCoIP doesn’t work with ThinPrint (so you get no printer redirection except through USB redirection), there’s no folder redirection (which was somewhat expected, since that’s provided by RDP), and the biggest issue is that there are no tunneled connections. Much like RGS connections, PCoIP connections go directly to the VM. This has important implications, such as the Security Server now being useless for anything except RDP.

While I definitely think View 4 is a HUGE step forward from View 3, it still has a lot of pretty rough edges. Add that to a lot of the issues with vSphere Update 1 (well documented by Mike Laverick) and it looks to me like View 4 really was released too soon. That said, I think it’s a great start and bodes really well for the direction VMware is headed.


November 28, 2009

View 4 issue with LSI Logic controller

Filed under: view, vmware, vSphere — ermac318 @ 9:27 pm

So this came out during my View 4 beta testing, but I chalked it up to the drivers in the Beta being unsigned (as they often are); however, this is still an issue in the final bits of vSphere 4 Update 1 and View 4.

When creating a template from scratch (and sometimes after upgrading templates) using the (Parallel) LSI Logic SCSI controller (as you are instructed to do in the XP Deployment guide), you will run into an issue after creating a linked clone pool.

When a linked clone is created in View 4, a second controller is added for the additional disks (unlike in View 3). With IDE, this isn’t an issue (as the disks are added to the second IDE controller). With BusLogic SCSI, the additional controller is auto-detected and doesn’t require any user interaction. However, if the template is made with the LSI Logic controller, non-admin users will be presented with a dialog box after every View Composer operation (newly created clone, refresh, or recompose) asking for admin credentials to install the driver for the second LSI Logic controller.

This is a big issue, as the Windows XP Deployment Guide recommends the LSI Logic adapter for VDI-based deployments, and later elaborates that this is for performance reasons.

For now I recommend people change their templates to use a different controller.
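For what it’s worth, here is a hedged PowerCLI sketch of that workflow, assuming you want to move a template from LSI Logic to BusLogic. The Get-/Set-ScsiController cmdlets come from PowerCLI builds newer than the ones current when this was written, and the template name is a placeholder:

    # Convert the template to a VM so its virtual hardware can be edited
    $vm = Get-Template "XP-Base-Template" | Set-Template -ToVM

    # See which controller type it currently uses
    $vm | Get-ScsiController | Select-Object Name, Type

    # Switch it to BusLogic (make sure the guest has the matching driver first)
    $vm | Get-ScsiController | Set-ScsiController -Type VirtualBusLogic -Confirm:$false

    # Convert it back to a template
    Set-VM -VM $vm -ToTemplate -Confirm:$false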

November 20, 2009

Best new feature of vSphere Update 1: PVSCSI Boot

Filed under: esx, vmware, vSphere — ermac318 @ 12:03 pm

So a while back I made a post about some new best practices for vSphere based on some new features that were available. One of the ugly points at the time was that while the new PVSCSI controller was awesome, you could only use it for additional disks.

Well, I’m happy to report that as of vSphere 4.0 Update 1 you can boot from a PVSCSI controller, so there’s no longer any need to add a second controller just to take advantage of this new I/O device.
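If you want to flip an existing VM over to PVSCSI, a minimal hedged sketch (again using the Set-ScsiController cmdlet from later PowerCLI builds; “MyVM” is a placeholder, and the guest needs the pvscsi driver from VMware Tools before it will boot this way):

    # Change the VM's existing SCSI controller to the paravirtual (PVSCSI) type
    Get-VM "MyVM" | Get-ScsiController |
        Set-ScsiController -Type ParaVirtual -Confirm:$false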

Coincidentally, one of the other points in that article was about Thin Provisioning, but a recent whitepaper from VMware has pretty much alleviated all my fears on that front.

New EVC Mode in vSphere 4.0 Update 1: Westmere

Filed under: esx, vmware, vSphere — ermac318 @ 11:54 am

Looks like VMware is already preparing for the launch of Intel’s newest chip. I noticed a new Westmere entry in vCenter 4 Update 1’s EVC cluster options (screenshot omitted).

Good to note: this means there are sufficient changes ahead to require a different EVC mode. For those of you installing Nehalem-based clusters today, if you plan on adding Westmere-based hosts in the future, you’ll have to turn EVC on in Core i7 mode.
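As an aside, later PowerCLI releases added an -EvcMode parameter to Set-Cluster, so something like the following hedged one-liner should do it; the cluster name and the exact mode key string are assumptions, so check the values your vCenter exposes:

    # Enable EVC at the Nehalem ("Core i7") baseline on an existing cluster
    Get-Cluster "Prod-Cluster" | Set-Cluster -EvcMode "intel-nehalem" -Confirm:$false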

September 2, 2009

VMworld session TA3438 – Top 10 Performance improvements in vSphere 4

Filed under: vmware, vmworld, vSphere — ermac318 @ 10:00 pm

This was a really interesting session that broke down a lot of the stuff that was improved in vSphere. VMware likes to talk about how vSphere has however many hundred new features; here’s an interesting list of the highlights:

  • I/O overhead has been cut in half. Also, I/O for a VM can execute on a different core than the one the VM monitor is running on. This means a single-vCPU VM can actually make use of two physical cores.
  • The CPU scheduler is much better at scheduling SMP workloads. 4-way SMP VMs perform 20% better, and 8-way is about 2x the performance of a 4-way with an Oracle OLTP workload, so performance scales well.
  • EPT improves performance a LOT. Turning it on also enables Large Pages by default (which can negatively affect TPS). Applications need to have Large Pages turned on to benefit, like SQL (which gains 7% performance).
  • Hardware iSCSI has 30% less overhead across the board; software iSCSI is 30% better on reads and 60% better on writes!
  • Storage VMotion is significantly faster because of block change tracking and no need to do a self-VMotion (which also means it doesn’t need 2x the RAM).
  • In vSphere the performance difference between RDM and VMFS is less than 5%, and while this is the same as ESX 3.5, performance of a VM on a VMFS volume where another operation (like a VM being cloned) is in progress has improved.
  • Big improvement in VDI workloads – a boot storm of 512 VMs is five times faster in vSphere. 20 minutes reduced to 4.
  • PVSCSI does some very clever things like sharing the I/O queue depth with the underlying hypervisor, so you have one less queue.
  • The vSphere TCP stack is improved (I know from other sessions they’re using the new tcpip2 stack end-to-end).
  • VMXNET3 gives big network I/O improvements, especially in Windows SMP VMs.
  • Network throughput scales much better, 80% performance improvement with 16 VMs running full blast.
  • VMotion 5x faster on active workloads, 2x faster at idle.
  • 350K IOPS per ESX Host, 120K IOPS per VM.

All reasons to be running vSphere on your infrastructure today.

September 1, 2009

VMworld Session VM2241 – PowerCLI (4.0 Update 1 and Onyx)

Filed under: powershell, vmware, vmworld, vSphere — ermac318 @ 1:54 pm

One of the sessions I was most looking forward to today was the PowerCLI session from Carter and Friends. After they teased Project Onyx on their blog a few days back, the anticipation was at a fever pitch.

Some great info in the session for PowerCLI newbies, but the good stuff was the info on PowerCLI 4.0 Update 1, which is scheduled to be out “before Christmas” according to Carter (to which someone in the audience quipped, “what year?”).

  • PowerCLI 4u1 has 229 cmdlets in the current internal beta build
  • New cmdlets for vApps (get, new, start, stop, import, export)
  • Better Storage Management:
    • iSCSI improvements: Get-/Set-VMHostHba.
    • You can now turn on the SW iSCSI initiator, add a Send Targets IP, rescan, and format LUNs all from PowerCLI (see the sketch after this list).
  • Huge improvements to Guest operations.
    • Set-VMGuestNetwork (name approximate) allows you to set the networking information of Virtual Machines (Windows OR Linux with the same syntax). Will be great for post-SRM failover scripting!
    • Copy files in and out of guests (Win or Linux)
    • Invoke-VMScript can run arbitrary commands and batch files (no longer requires PowerShell in the VM, and can run BASH scripts in Linux VMs, too). Still requires host and guest credentials.
  • NIC Teaming and Load Balancing policies
    • Set standby NICs and unused NICs, change the load balancing policy to IP Hash, etc.
    • Forgot to ask if you can remove a VMNIC from a vSwitch yet…
  • vCenter Permissions and Role cmdlets.

Stuff I wish I had seen:

  • License management cmdlets (adding licenses to vCenter’s license database, assigning licenses to servers)
  • DPM Cluster- and Host-based cmdlets

Carter also said that performance in large vCenter servers will improve a lot – getting a single VM won’t take as long as getting them all.

The last thing was a demonstration of Project Onyx. Onyx is a proxy that sits in between your vSphere client and your vCenter server. It functions a lot like a Sysinternals tool – you turn on “capture” and whenever you issue a command in the client, it spits out PowerShell code that calls the vSphere API to do what you just did. You can then save this to a PS1 file, edit out the stuff you don’t want in the script, generalize it (so it runs on a variable), and wrap it in a function.

The example they used was turning on DPM for a cluster. We ran Onyx and made the change to one cluster manually. Onyx spat out a giant chunk of code that creates a VMware.Vim.ClusterConfigSpecEx object. We checked the API documentation to confirm that all the stuff related to HA and DRS (which we don’t want to modify) is optional, then removed all that junk from the code. We then wrapped it in a function and called it with ForEach-Object after grabbing all the clusters using Get-Cluster.
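Roughly, the cleaned-up function ends up looking something like this – a minimal sketch rather than the exact Onyx output, with the property names taken from the ClusterConfigSpecEx / ClusterDpmConfigInfo objects in the vSphere API reference (the DpmBehavior value is an assumption you may want to change):

    function Enable-DPM {
        param($Cluster)
        $spec = New-Object VMware.Vim.ClusterConfigSpecEx
        $spec.dpmConfig = New-Object VMware.Vim.ClusterDpmConfigInfo
        $spec.dpmConfig.enabled = $true
        $spec.dpmConfig.defaultDpmBehavior = [VMware.Vim.DpmBehavior]::automated
        # $true = incremental reconfigure: only the properties we set are changed,
        # so the cluster's HA/DRS settings are left alone
        (Get-View -VIObject $Cluster).ReconfigureComputeResource_Task($spec, $true)
    }

    # Turn DPM on for every cluster in vCenter
    Get-Cluster | ForEach-Object { Enable-DPM $_ }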

It’s a pretty slick process. I dropped off my business card to hopefully get into the Beta. Carter, hook a brotha up!

Lastly, Scott Herold from Quest/Vizioncore showed off the Virtualization EcoShell, which I guess I’m way late to the party on, but I’m totally switching to it from PowerGUI, which I use now.

Overall GREAT session.

Why VMware’s ESX3 to ESX4 Upgrade Sucks

Filed under: vmware, vSphere — ermac318 @ 6:00 am

Working for a VMware partner, I’m involved in a lot of design and service engagements. Recently, we’ve gone through two major vSphere upgrades at a couple clients, and both times we’ve done what VMware refers to as a “Migration” upgrade, where we are moving VMs off of old servers onto new servers. This is great – it gives us a chance to get the customer on ESXi (for reasons I’ve spoken of previously) and it means we don’t have to go through the ESX3.x to ESX4 upgrade process, which sucks. The sad thing is, the reason it sucks is entirely VMware’s fault and could have been avoided.

Let us begin with a little background about how the new Service Console works in ESX4: not only does it do less, it takes up more space! That’s exciting… But the real problem is that the Service Console now lives inside a VMDK file. Or at least, that’s what VMware says. In reality, it doesn’t completely live inside a VMDK; only most of its partitions do. There are still two key partitions (/boot and the vmkcore partition) which live on regular, standard partitions. This means you have the following possible scenarios:

  • System has local storage or a private SAN LUN with three partitions: /boot, vmkcore, and a local VMFS3 partition where the COS VMDK lives.
  • System has local storage or a private SAN LUN with two partitions, /boot and vmkcore, plus a SAN-based VMFS3 datastore where the COS VMDK lives (for that server and maybe others as well).

I don’t like either of these. Why? Because beforehand, our best practices were to remove all the local VMFS partitions. This was to avoid confusion and to prevent people from accidentally placing a VM on a local volume and then wondering why VMotion didn’t work. Now, having that local VMFS volume is a necessity – unless you want to split the boot of your ESX server between one LUN with the /boot and vmkcore, and another LUN with the actual COS. This makes no sense.

Why did VMware place the Service Console disks inside a VMDK file? In the various training materials, this is always touted as a feature, but for the life of me I can’t think of a reason why this is a good thing. It’s not like we can Storage VMotion our Service Console VMDK (as far as I know), and it’s not like expanding our partitions and such gets any easier by virtue of it being a VMDK. From where I see it, this was a dumb architectural decision, and ever since the Beta, when I complained about it, I’ve always gotten the same shrug. The best answer I ever got out of VMware related to this was “well, if you don’t like it, use ESXi.”

That’s great, we’re already doing that. But what about customers who want a real upgrade path? VMware doesn’t provide one for ESX -> ESXi, other than blowing the system away and starting from scratch. I can’t use Update Manager to do a clean, managed upgrade from ESX to ESXi, and from where I see it VMware should’ve had that ready to go out of the gate, because from what they’re telling partners and customers, ESX4 Classic is the end of the road for the Service Console. Instead, it’s write down the configuration, rip out the hard disks, and replace them with a flash device.

And that sucks.

July 20, 2009

VMware Plea Part Deux: ESXi Boot from SAN

Filed under: esx, Uncategorized, vmware, vSphere — ermac318 @ 10:08 am

As some of you are aware, ESXi can officially boot from all the following sources:

  • USB Flash Device
  • SD Card
  • Local Hard Drive
  • PXE Boot (experimental)

There’s one big missing piece here: Boot from SAN. Why is this a big deal? It means customers who want to BFS (like HP VirtualConnect customers or, in the future, Cisco UCS customers) in order to get the most out of their dynamic datacenter cannot use the next-generation hypervisor architecture; they must stick to ESX “Classic.” We have been going back and forth with VMware support on this, but even in ESXi 4.0, Boot from SAN is still not officially supported.

This is despite the fact that on the ESXi Features Page, one of the features listed is Boot from SAN! Instead, you need to dig into the Install and Setup guide to find this gem:

You use the ESXi 4.0 CD to install the ESXi 4.0 software onto a SAS, SATA, or SCSI hard drive.
Installing on a Fibre Channel SAN is supported experimentally. Do not attempt to install ESXi with a SAN attached, unless you want to try this experimental feature.

That said, it works fine. I haven’t had any problems. But we can’t deploy it for customers that way if it’s not supported.

VMware: Why is your next-generation hypervisor crippled in this way?

