Tag Archives: Design

Analyze and Replay IO Workloads with VMware

One of the most common problems in VMware migration projects is undersized storage. When you size a new platform, it is important to know the IO requirements. Unfortunately, storage is often sized for capacity only, which is a serious mistake. VMware offers the tools you need to analyze and record workloads and replay them on the new platform. This post explains how to capture workloads with vscsiStats trace mode and replay them with I/O Analyzer.


vCenter and the physical or virtual discussion - vSphere 5.5

Since vSphere 5.1 I have often heard the advice that a physical vCenter is the better choice because of its enormous resource requirements. Over the last few years I have had many discussions on that topic, and it turned out that most people are simply afraid of the "chicken and egg" situation. Technical arguments are mostly related to dvSwitch issues or resource worries. In the following post I'll cover both sides of the argument.


VMware vSphere 5.5 Configuration Maximums

The upcoming release of VMware vSphere 5.5 brings a few neat platform scalability enhancements. Support for physical hardware has been doubled, which makes VMware vSphere suitable for large enterprise-class hardware:

Enhancements

  • 320 logical CPUs per Host (Up from 160)
  • 4TB Memory (Up from 2TB)
  • 16 NUMA Nodes per host (Up from 8)
  • 4096 vCPUs per host (Up from 2048)
  • 62TB VMDK Size (Up from 2TB)
  • ...more to come

VMware vSphere ESX and vCenter Configuration Maximums

Howto: vCenter 5.1 SSO with trusted Active Directory

There are a lot of pitfalls when you want to deploy or update to VMware vSphere 5.1. Besides the vSphere Web Client, the most discussed new component is the new authentication engine called Single Sign On (SSO), which is mandatory for the vCenter Server. I've already written about a simple deployment scenario where a vCenter Server (Appliance or Installable) authenticates against a single Active Directory domain. In this post I am going to explain the changes and difficulties when using multiple trusted Active Directory domains.


VDCD510 Objective 2.6 – Build Recoverability Requirements into the Logical Design

RTO & RPO
When creating the business continuity plan, it is necessary to define two key metrics: the recovery point objective (RPO) and the recovery time objective (RTO). These two terms define how quickly a service has to be restored and how much data loss is acceptable.

Recovery Time Objective (RTO): The time it takes to recover from a data loss event. This is the amount of time during which the system or service is unavailable.

Recovery Point Objective (RPO): The amount of time between backups. It determines the maximum possible amount of data loss, because everything written since the last backup is lost.
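
To make the two metrics more tangible, here is a minimal Python sketch that checks a backup setup against given objectives. The function names and all numbers are assumptions I made up for this example; they do not come from a VMware tool.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """A failure right before the next backup loses almost one full interval."""
    return backup_interval


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Return True when both the RPO and the RTO are satisfied."""
    return worst_case_data_loss(backup_interval) <= rpo and restore_duration <= rto


# Hypothetical requirements: lose at most 4 hours of data, be back within 2 hours.
rpo = timedelta(hours=4)
rto = timedelta(hours=2)

# Nightly backups with a 90 minute restore meet the RTO but miss the RPO.
print(meets_objectives(timedelta(hours=24), timedelta(minutes=90), rpo, rto))  # False
# Backups every 2 hours with the same restore time meet both objectives.
print(meets_objectives(timedelta(hours=2), timedelta(minutes=90), rpo, rto))   # True
```

The sketch treats the backup interval as the worst-case data loss, which is the simplification used in the definition above.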

Disaster Recovery and Business Continuity

Disaster recovery refers to previously defined steps taken by administrators to resume services or systems after a disaster event.

Business continuity is a high-level overview of the processes that ensure an organization can resume its business after a disaster.

VMware offers a great free course about this topic: DRBC Design - Disaster Recovery and Business Continuity Fundamentals

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Understand what recoverability services are provided by VMware solutions.
  • Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)
  • Differentiate Business Continuity and Disaster Recovery concepts.
  • Describe and differentiate between RTO and RPO

Skills and Abilities

  • Given specific RTO and RPO requirements, build these requirements into the logical design.
  • Given recoverability requirements, identify the services that will be impacted and provide a recovery plan for impacted services.
  • Given specific regulatory compliance requirements, build these requirements into the logical design.
  • Based on customer requirements, identify applicable site failure / site recovery use cases.
  • Determine recoverability component of SLAs and service level management processes.
  • Based on customer requirements, create a data retention policy.

Tools


VDCD510 Objective 2.5 – Build Performance Requirements into the Logical Design

Identify Infrastructure Qualities
We have already covered infrastructure qualities in objective 2.3. To recall them, here is a short overview:

Availability is the ability of a system or service to perform its function when required. It is usually expressed as a percentage like 99.9%.
Manageability describes the expense of running the system. If you have a huge platform that is managed by a tiny team, the operational costs are very low.
Performance is the measure of what is delivered by a system. This accomplishment is usually measured against known standards of accuracy, completeness and speed.
Recoverability describes the ability to return a system or service to a working state. This is usually required after a system failure and repair.
Security is the process of ensuring that services are used in an appropriate way.

 

Key Performance Indicators
According to ITIL, a Key Performance Indicator (KPI) is used to assess if a defined service is running according to expectations. The exact definition of the KPIs differs depending on the area. This objective is about server performance which is typically assessed using the following KPIs: Processor, Memory, Disk, and Network. VMware offers several concepts for managing resources.

Processor
To manage CPU resources VMware relies on the CPU scheduler. The CPU scheduler shares the same logical processor among multiple virtual machines. It defines the following terms:

  • Processor Socket: A physical CPU
  • Core: An independent processing unit within a physical CPU
  • Logical Processor: With hyperthreading enabled, each core presents itself as multiple logical processors
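
The relationship between the three terms can be sketched in a few lines of Python. The function name is made up for this example, and the factor of two threads per core is an assumption that matches common hyperthreading implementations.

```python
def logical_processors(sockets: int, cores_per_socket: int, hyperthreading: bool) -> int:
    """Count the logical processors that the CPU scheduler can use."""
    # Assumption: hyperthreading exposes two logical processors per core.
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core


# Hypothetical host with 2 sockets and 8 cores per socket.
print(logical_processors(2, 8, hyperthreading=True))   # 32
print(logical_processors(2, 8, hyperthreading=False))  # 16
```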

Memory
VMware offers the following features to manage memory efficiently:

  • Transparent Page Sharing: Shares identical memory pages among multiple virtual machines. This feature is active by default and does not impact the performance of the virtual machine.
  • Ballooning: A balloon driver runs inside each virtual machine. When the physical host runs low on memory, the hypervisor instructs the driver to inflate, which makes the guest give up inactive memory pages. The ESX host can then use these pages to fulfill the demand from other virtual machines.
  • Memory Compression: Before swapping memory pages out to physical disks, the ESX server tries to compress them. Compared to swapping, compression can improve the overall performance in a memory overcommitment scenario.
  • Swapping: As a last resort, the ESX hypervisor starts to swap pages out to physical disks. This is definitely a bad situation, as disks are much slower than memory.
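
All of these techniques only matter once memory is overcommitted. As a rough illustration, the overcommitment ratio of a host can be estimated like this. It is a simplified sketch with made-up VM sizes that ignores the hypervisor's own memory overhead.

```python
def overcommit_ratio(vm_memory_gb, host_memory_gb):
    """Configured VM memory divided by the physical memory of the host."""
    return sum(vm_memory_gb) / host_memory_gb


# Hypothetical host with 128 GB of RAM running five virtual machines.
vm_memory_gb = [16, 16, 32, 64, 64]  # 192 GB configured in total
ratio = overcommit_ratio(vm_memory_gb, 128)
print(f"{ratio:.2f}")  # 1.50
```

A ratio above 1.0 means the virtual machines could ask for more memory than the host physically has, which is when the reclamation techniques listed above come into play.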

Disk
Storage I/O Control (SIOC) allows cluster-wide control of disk resources. The main goal is to prevent a single VM from using all available disk performance of a shared storage system. With SIOC, a virtual machine can be assigned a priority that takes effect when contention arises on a defined datastore.
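
The idea of prioritizing virtual machines during contention can be illustrated with a simple proportional-share calculation. This is only a sketch of the concept and not how SIOC works internally; the VM names, share values and the IOPS figure are invented for this example.

```python
def divide_by_shares(total_iops, shares):
    """Split a contended resource in proportion to each VM's shares."""
    total_shares = sum(shares.values())
    return {vm: total_iops * value / total_shares for vm, value in shares.items()}


# Hypothetical datastore that can deliver 7000 IOPS while under contention.
shares = {"db01": 2000, "web01": 1000, "test01": 500}
allocation = divide_by_shares(7000, shares)
print(allocation)  # {'db01': 4000.0, 'web01': 2000.0, 'test01': 1000.0}
```

Without contention no limit applies, so the shares only decide who gets how much when the datastore cannot satisfy everyone.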

Network
Network I/O Control (NetIOC) enables traffic prioritization by partitioning the network bandwidth across the entire cluster.

 

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Understand what logical performance services are provided by VMware solutions.
  • Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)
  • List the key performance indicators for resource utilization.

Skills and Abilities

  • Analyze current performance, identify and address gaps when building the logical design.
  • Using a conceptual design, create a logical design that meets performance requirements.
  • Identify performance-related functional requirements based on given non-functional requirements and service dependencies.
  • Define capacity management practices and create a capacity plan.
  • Incorporate scalability requirements into the logical design.
  • Determine performance component of SLAs and service level management processes.

Tools


VDCD510 Objective 2.4 – Build Manageability Requirements into the Logical Design

Identify Infrastructure Qualities
We have already covered infrastructure qualities in objective 2.3. To recall them, here is a short overview:

Availability is the ability of a system or service to perform its function when required. It is usually expressed as a percentage like 99.9%.
Manageability describes the expense of running the system. If you have a huge platform that is managed by a tiny team, the operational costs are very low.
Performance is the measure of what is delivered by a system. This accomplishment is usually measured against known standards of accuracy, completeness and speed.
Recoverability describes the ability to return a system or service to a working state. This is usually required after a system failure and repair.
Security is the process of ensuring that services are used in an appropriate way.

Event, Incident and Problem Management
This concept is related to the well-known ITIL standard.

Event: A change of state which might be significant for the management of a service or system.
Incident: An event which is not part of the standard operation. It might cause a service disruption or reduce productivity.
Problem: The cause of one or more incidents. Problems are usually identified because of multiple incidents.

Please note that an incident might give a hint for the investigation of a Problem, but it never becomes a Problem. Even if the incident is escalated to 2nd level support, it remains an incident. Problem management might take over the resolution of the incident when the incident can only be closed by solving the Problem.

Change Management
The change management process is responsible for controlling the lifecycle of all changes. Changes are defined as the addition, modification or removal of anything that could have an effect on services or systems. The primary objective of change management is to enable beneficial changes to be made with minimum disruption.

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Understand what management services are provided by VMware solutions.
  • Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

Skills and Abilities

  • Build interfaces to existing operations practices into the logical design
  • Address identified operational readiness deficiencies
  • Define Event, Incident and Problem Management practices
  • Define Release Management practices
  • Determine Request Fulfillment processes
  • Design Service Asset and Configuration Management (CMDB) systems
  • Define Change Management processes
  • Based on customer requirements, identify required reporting assets and processes

Tools


VDCD510 Objective 2.3 – Build Availability Requirements into the Logical Design

Identify Infrastructure Qualities
VMware names five infrastructure qualities: Availability, Manageability, Performance, Recoverability and Security. If you are familiar with other design methodologies you might have encountered other terms, but here I want to describe the VMware terms.

Availability is the ability of a system or service to perform its function when required. It is usually expressed as a percentage like 99.9%.

Manageability describes the expense of running the system. If you have a huge platform that is managed by a tiny team, the operational costs are very low.

Performance is the measure of what is delivered by a system. This accomplishment is usually measured against known standards of accuracy, completeness and speed.

Recoverability describes the ability to return a system or service to a working state. This is usually required after a system failure and repair.

Security is the process of ensuring that services are used in an appropriate way.

Let's take a website as an example. To compare your design against these requirements, you could ask yourself:

  • Availability: Is the website up and running?
  • Manageability: How expensive is it to keep the system up?
  • Performance: How fast does the webserver respond?
  • Recoverability: If the server crashes, how fast can it be restored?
  • Security: Is the customers' data safe?
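
An availability percentage is easier to judge when it is translated into allowed downtime. The following Python sketch does this conversion. The function name is made up for this example and a year is assumed to have 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years are ignored


def allowed_downtime_hours(availability_percent):
    """Convert an availability target into hours of downtime per year."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% allows {allowed_downtime_hours(target):.2f} hours of downtime per year")
```

With these assumptions, 99.9% corresponds to roughly 8.76 hours per year, while 99.99% leaves less than one hour.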

Redundancy and Single Point of Failure
A single point of failure is a component of a system that, if it fails, will cause the entire system to fail. Systems can be made robust by adding redundancy. A server usually attains internal component redundancy by having multiple hard drives, network connections or power supplies. By combining multiple servers into a cluster you can achieve server hardware redundancy.
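
The effect of redundancy can be estimated with basic probability. The sketch below is a textbook approximation that assumes all components fail independently of each other, which is rarely completely true in practice. The function names and availability values are chosen for illustration only.

```python
def serial_availability(components):
    """Every component is required, so each one is a single point of failure."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(single, copies):
    """The service stays up as long as at least one of the copies works."""
    return 1 - (1 - single) ** copies


# Hypothetical chain of storage, server and network, each 99% available.
print(round(serial_availability([0.99, 0.99, 0.99]), 4))  # 0.9703
# Two redundant servers with 99% availability each.
print(round(redundant_availability(0.99, 2), 4))           # 0.9999
```

Chaining components lowers the overall availability, while adding a second copy of a component raises it considerably.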

VMware Availability Services
vSphere High Availability (HA) minimizes your downtime by restarting virtual machines on the remaining hosts in case of hardware failures.

vSphere Fault Tolerance (FT) provides continuous availability for virtual machines by creating a live copy of a virtual machine to another physical host.

Differentiate Business Continuity and Disaster Recovery
Business Continuity is focused on avoiding or mitigating the impact of a risk. (Proactive)
Disaster Recovery is focused on how to restore the services after an outage occurs. (Reactive)

VMware offers a course about disaster recovery and business continuity. It is free at the moment, so check it out:
DRBC Design - Disaster Recovery and Business Continuity Fundamentals

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Understand what logical availability services are provided by VMware solutions.
  • Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)
  • Describe the concept of redundancy and the risks associated with single points of failure.
  • Differentiate Business Continuity and Disaster Recovery concepts.

Skills and Abilities

  • Determine availability component of service level agreements (SLAs) and service level management processes.
  • Explain availability solutions for a logical design based on customer requirements.
  • Define an availability plan, including maintenance processes.
  • Prioritize each service in the Service Catalog according to availability requirements.
  • Balance availability requirements with other infrastructure qualities

Tools


VDCD510 Objective 2.2 – Map Service Dependencies

This objective talks about service dependencies in your design and how to document them. A service can be anything that matters to the design, for example DNS, databases or NTP. I will not go deeper into these services; instead, I will explain the terminology.

Application Dependency Diagram
An application dependency diagram shows which entities are related to one another. While discovering running services during the current state analysis, you can use this information to map out the upstream and downstream relationships. Relationships can be described with the following terms:

  • runs on / runs
  • depends on / used by
  • contains / contained by
  • hosts / hosted by

Take a website as an example: the website runs on a webserver, which runs on a Linux server, which is hosted by a VMware cluster. The resulting dependency map is: website → webserver → Linux server → VMware cluster.

Upstream and Downstream Relationships
Everything that happens downstream can have an effect on upstream items. For example, if the webserver crashes, the website upstream is affected and goes down. Neither the operating system nor the cluster is affected, as these are downstream relationships. To memorize this, you could think of a house. The roof is “up” while the basement is “down”. If you break down the basement, the roof “upstream” also collapses.

VMware offers a product called VMware vCenter Application Discovery Manager which can help to discover and map these relationships.
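
The website example can be modeled as a small graph to find out which upstream items are affected by a failure. This is a minimal Python sketch and all names in it are my own choice for this example; it has nothing to do with how Application Discovery Manager works.

```python
from collections import deque

# Each entry points downstream: the website depends on the webserver, and so on.
DEPENDS_ON = {
    "website": ["webserver"],
    "webserver": ["linux server"],
    "linux server": ["vmware cluster"],
    "vmware cluster": [],
}


def affected_by_failure(failed, depends_on):
    """Return all upstream items that directly or indirectly depend on `failed`."""
    # Invert the map so we can walk from a failed item to the items using it.
    used_by = {name: [] for name in depends_on}
    for item, dependencies in depends_on.items():
        for dependency in dependencies:
            used_by.setdefault(dependency, []).append(item)

    affected, seen, queue = [], {failed}, deque([failed])
    while queue:
        current = queue.popleft()
        for parent in used_by.get(current, []):
            if parent not in seen:
                seen.add(parent)
                affected.append(parent)
                queue.append(parent)
    return affected


print(affected_by_failure("webserver", DEPENDS_ON))       # ['website']
print(affected_by_failure("vmware cluster", DEPENDS_ON))  # ['linux server', 'webserver', 'website']
```

A crash of the webserver only takes the website down, while a failure of the cluster affects everything above it.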

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Identify basic service dependencies for infrastructure and application services.

Skills and Abilities

  • Document service relationships and dependencies (Entity Relationship Diagrams)
  • Identify interfaces to existing business processes and define new business processes
  • Given a scenario, identify logical components that have dependencies on certain services.
  • Include service dependencies in a vSphere 5 logical design.
  • Analyze services to identify upstream and downstream service dependencies.
  • Having navigated logical components and their interdependencies, make decisions based upon all service relationships.

Tools

VDCD510 Objective 2.1 – Map Business Requirements to the Logical Design

The second objective talks about the logical design. When we talk about the logical design at this point in the project, we are talking about a lower-level design compared to the conceptual design. A network diagram, for example, is usually also a logical design, but it contains much more information, which you do not have at this point. So the purpose of a logical design is to dig deeper into the conceptual design and evaluate the design without getting lost in explicit connection or configuration details.

It is often not easy to understand the difference between physical, logical and conceptual design. To understand the difference, always remember the timeline. First you create a concept, something like “The customer wants to have a protected cluster with physically separated hardware”. The conceptual design is always the part the customer wants. The second step is the logical design. This is where the designer creates his logical design to fulfill the given requirements. This design should sound like “To fit the needs we need servers, connected to switches, connected to our storage.” Does the designer care about IP addresses, hostnames or hardware vendors at this point? No!

What should be shown in the logical design? Here are a couple of questions which might be part of the logical design:

  • Should the Cluster use HA and DRS?
  • Is Storage DRS a valid solution?
  • Does the customer need storage tiering?
  • Could Site Recovery Manager fit the needs?

If you are looking at a logical design, you should usually see ESX hosts, physical switches, virtual switches, storage systems and the dependencies between all components. Another hint: If the logical design can be reused for another customer without modification, it is a valid logical design.

Service Catalog
A new part in this objective and derived from ITIL is the service catalog. A service catalog is a list of services that a company provides to its customers. The catalog should provide the following information:

  • Service name (Extended Support)
  • Service description (Maintenance and support of servers and components)
  • Services included (Patch management, upgrades, incident support)
  • Services not included (Non-standard changes)
  • Service availability (24x7x365)
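
If you want to keep a service catalog in a machine-readable form, an entry with the fields listed above could look like the following Python sketch. The class and field names are my own choice for this example and are not defined by ITIL or VMware.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceCatalogEntry:
    name: str
    description: str
    included: list = field(default_factory=list)
    not_included: list = field(default_factory=list)
    availability: str = ""


def covers(entry, service):
    """Check whether a requested service is part of the catalog entry."""
    return service in entry.included


entry = ServiceCatalogEntry(
    name="Extended Support",
    description="Maintenance and support of servers and components",
    included=["Patch management", "Upgrades", "Incident support"],
    not_included=["Non-standard changes"],
    availability="24x7x365",
)

print(covers(entry, "Upgrades"))              # True
print(covers(entry, "Non-standard changes"))  # False
```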

VCAP5-DCD Exam Blueprint v1.1

Knowledge

  • Explain the common components of logical design.
  • List the detailed steps that go into the makeup of a common logical design.
  • Differentiate functional and non-functional requirements for the design.

Skills and Abilities

  • Build non-functional requirements into a specific logical design.
  • Translate given business requirements and the current state of a customer environment into a logical design.
  • Create a Service Catalog

Tools