VMware vCenter Operations Manager in Real Life – Capacity Planning Models

Lately, I’ve been working a lot with vCops. Used the right way, vCops can be a huge help with the most common tasks VMware architects and administrators face.

If you are looking to simplify your organization’s capacity planning process, discover critical issues in your virtual environment, or just check which VMs are the most oversized in the datacenter, vCops is your guy.

The smartest part of vCops is its analytics engine; the tons of mathematical calculations it runs behind the scenes give you the closer, deeper look you need at the environment.

Whether you use it for real-time monitoring or for predicting when you’re running out of resources, at the end of the day vCops’ most common use cases are monitoring and right-sizing your environment. Like any other monitoring system, getting the implementation right is crucial.

Setting the vCops policy should be the first step after login. Without it, there is a good chance the end user will be bombarded with alarms, will not receive meaningful information for his capacity planning requirements, and will eventually lose his trust in the system.

Can you answer some capacity planning questions for me?

As I mentioned before, one of the most common vCops use cases is the capacity planning side of things. As with any good VMware design, defining the project requirements is the first step towards a successful implementation.

Performing capacity planning and environment assessment with vCops is a lot like any common capacity planning process, but to get the most realistic outcomes we should configure the capacity and time policies in vCops. The following are some questions I have found very useful when engaging with customers:

  • Which environment has the need for capacity planning?
  • Are we planning for a performance sensitive production environment or a dynamic Dev environment with greater VM density needs?
  • What percentage of future growth do you anticipate? 
  • In case of resource contention, how long will it take to get more hardware?
  • Do you have any idea what your environment’s over-commitment ratios look like? (a quick way to check is sketched after this list)
  • What kind of workloads are you running?
    • Are they CPU or more memory intensive?
    • What do you care more about, disk capacity or performance? Maybe both?!
    • Do you see memory ballooning or swapping for your VMs?
    • Any high network utilization VMs?
  • Any monster VMs in-place?
  • How do you size your VMs?
  • What does your HA admission control configuration look like?
  • Should we take weekends into consideration? What about specific business hours?
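
On the over-commitment question: the math behind the ratios is simple. The snippet below is not vCops output, just a minimal Python sketch with made-up cluster numbers to show what the ratios actually mean.

```python
# Minimal sketch: CPU and memory over-commitment ratios for a cluster.
# All inventory numbers below are made up for illustration only.

physical_cores = 64          # total pCPU cores in the cluster
physical_memory_gb = 512     # total physical RAM in the cluster

allocated_vcpus = 160        # sum of vCPUs configured across all VMs
allocated_memory_gb = 768    # sum of configured VM memory

cpu_overcommit = allocated_vcpus / physical_cores          # 2.5:1
mem_overcommit = allocated_memory_gb / physical_memory_gb  # 1.5:1

print(f"CPU over-commitment:    {cpu_overcommit:.1f}:1")
print(f"Memory over-commitment: {mem_overcommit:.1f}:1")
```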

Tip #1 – Consider using both “Allocation based model” and “Demand based model”

Wouldn’t it be great to actually know how much time my clusters still have until they run out of resources?! First, let me explain a bit more about the two models.

Allocation based model

The good:

  • Can reduce the risk of capacity shortfalls by over-provisioning
  • Fits production policies

The bad:

  • Does not minimize costs; it is very static and cannot account for resource over-commitment
  • May not handle all resource bursts and peaks
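
To make the allocation-based math concrete, here is a rough Python sketch. The per-VM allocation and over-commitment ratio are policy inputs you define, not values vCops discovers, and all numbers are illustrative.

```python
# Allocation-based model (sketch): remaining VM "slots" come from configured
# allocations and a policy over-commitment ratio, not from what VMs actually
# use. Numbers are illustrative only.

cluster_cores = 64
policy_cpu_overcommit = 4.0      # policy allows 4 vCPUs per physical core
allocated_vcpus = 160            # vCPUs already configured on existing VMs
avg_vcpus_per_vm = 2             # your standard VM size

usable_vcpus = cluster_cores * policy_cpu_overcommit
remaining_vm_slots = (usable_vcpus - allocated_vcpus) / avg_vcpus_per_vm
print(f"Allocation model: room for ~{remaining_vm_slots:.0f} more VMs")
```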

Demand based model

The good:

  • Increases density > minimizes waste > reduces costs
  • Accounts for CPU/Memory over-commitment
  • Accounts for resource bursts and peaks and takes waste into consideration in the forecasting model

The bad:

  • If you are allocating resources to VMs according to vendor best practices and white papers, this model will not take that into consideration
  • The same goes for your company-specific VM allocation policies
  • Considered a more aggressive model, and conservative consumers may be slower to adopt it
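
And the demand-based counterpart, again as an illustrative sketch only: capacity is measured against observed peak demand rather than configured sizes, which is why the same cluster usually shows far more headroom under this model.

```python
# Demand-based model (sketch): remaining capacity is measured against
# observed peak demand, not configured allocations. Illustrative numbers.

cluster_cpu_mhz = 64 * 2600          # 64 cores at 2.6 GHz
peak_cpu_demand_mhz = 48000          # observed busy-hour demand across all VMs
avg_vm_cpu_demand_mhz = 900          # typical per-VM demand at peak

headroom_mhz = cluster_cpu_mhz - peak_cpu_demand_mhz
remaining_vms_by_demand = headroom_mhz / avg_vm_cpu_demand_mhz
print(f"Demand model: room for ~{remaining_vms_by_demand:.0f} more VMs")
```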

Mixing the two

The best way for me to explain how to use both models in vCops is with the following example:

  • In my production cluster I would take an aggressive approach for CPU and disk I/O. Because I know I have a lot of CPU over-commitment going on and my disk I/O activity is mostly random, the policy will use the demand model for CPU and disk I/O. That way I will get realistic numbers when looking at how much time is left until my CPU and disk I/O resources run out.
  • Although I may take the aggressive approach for CPU and disk I/O, I would like to prevent memory over-provisioning as much as possible and also be more conservative on disk capacity. Using the allocation based model for memory and disk space will help me avoid running short on memory and minimize the risks that come with over-provisioned datastores.
  • Because I have a 10Gb network infrastructure and I don’t see any network capacity risks in my future, I’ve decided to clear my selection for network I/O so I won’t get any unnecessary alerts from vCops when it comes to networking resources.

As you can see, using both models in this case helps me reduce compute resource costs, avoid the risks that come with storage over-provisioning, and avoid unwanted alerts. Your job is to find the “sweet spot” for your environment.
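
If it helps to see the “sweet spot” idea in one place, here is a hedged sketch of mixing the models per resource the way the example above does. The per-resource model choices, headroom figures and growth rates are assumptions made up for illustration; in vCops you express the same thing through the capacity and time policy, not through code.

```python
# Sketch of the mixed policy from the example above: demand model for CPU
# and disk I/O, allocation model for memory and disk space, network I/O
# excluded. All figures are illustrative assumptions.

resources = {
    # resource      (remaining headroom, average daily growth, model used)
    "cpu":          (16000, 400, "demand"),      # MHz of demand headroom
    "disk_io":      (9000, 250, "demand"),       # IOPS of demand headroom
    "memory":       (128, 4, "allocation"),      # GB still allocatable
    "disk_space":   (6000, 150, "allocation"),   # GB still allocatable
    # "network_io" deliberately left out of the policy
}

# Time remaining for the cluster is set by its most constrained resource.
days_remaining = {
    name: remaining / growth
    for name, (remaining, growth, _model) in resources.items()
}
bottleneck = min(days_remaining, key=days_remaining.get)
print(f"Most constrained resource: {bottleneck} "
      f"(~{days_remaining[bottleneck]:.0f} days of capacity left)")
```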

vCops is an awesome tool. Sure, it has its bad moments, but most of the time they come down to misconfiguration. After asking the right questions, like the ones shown above, configuring the capacity and time policies for any given environment becomes easier and much more reasonable, as I will try to show you in the next part of this blog series.

7 Comments

  1. Great article. How do you address monitoring thick provisioned disks when you are assigning space as needed from the SAN? I assign the LUNs as needed from the SAN, so my storage is usually around 90% full. It seems for this I should just not monitor storage? Or can I selectively choose which datastores are monitored? For instance, I have several datastores where snaps are directed. I would like to be alerted on those, but not on the LUNs where all of the datastore is provisioned.

    Any thoughts?
