vSphere Design Best Practices

Apply industry-accepted best practices to design reliable, high-performance datacenters for your virtual infrastructure
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2014
Valentina Dsilva
Abhinash Sahu

Production Coordinator
Alwin Roy

Cover Work
Alwin Roy
About the Authors
Brian Bolander spent 13 years on active duty in the United States Air Force. A veteran of Operation Enduring Freedom, he was honorably discharged in 2005. He immediately returned to Afghanistan and worked on datacenter operations for the US Department of Defense (DoD) at various locations in southwest Asia. Invited to join a select team of internal consultants and troubleshooters responsible for operations in five countries, he was also the project manager for what was then the largest datacenter built in Afghanistan.

After leaving Afghanistan in 2011, he managed IT operations in a DoD datacenter in the San Francisco Bay Area. His team was responsible for dozens of multimillion dollar programs running on VMware and supporting users across the globe.

He scratched his adrenaline itch in 2011 when he went back "downrange", this time directing the premier engineering and installation team for the DoD in Afghanistan. It was his privilege to lead this talented group of engineers, who were responsible for architecting virtual datacenter installations, IT project management, infrastructure upgrades, and technology deployments on VMware for the entire country from 2011 to 2013.

Selected as a vExpert in 2014, he is currently supporting Operation Enduring Freedom as a senior virtualization and storage engineer.

He loves his family, friends, and rescue mutts. He digs science, technology, and geek gear. He's an audiophile, a horologist, an avid collector, a woodworker, an amateur photographer, and a virtualization nerd.
Christopher Kusek had a unique opportunity presented to him in 2013: to take the leadership position responsible for theater-wide infrastructure operations for the war effort in Afghanistan. Leveraging his leadership skills and expertise in virtualization, storage, applications, and security, he's been able to provide enterprise-quality service while operating in an environment that includes the real and regular challenges of heat, dust, rockets, and earthquakes.

He has over 20 years' experience in the industry, with virtualization experience running back to the pre-1.0 days of VMware. He has shared his expertise with many far and wide through conferences, presentations, #CXIParty, and sponsoring or presenting at community events and outings, whether focused on storage, VMworld, or cloud.

He is the author of VMware vSphere 5 Administration Instant Reference, Sybex, 2012, and VMware vSphere Performance: Designing CPU, Memory, Storage, and Networking for Performance-Intensive Workloads, Sybex, 2014. He is a frequent contributor to VMware Communities' Podcasts and vBrownbag, and has been an active blogger for over a decade.

A proud VMware vExpert and huge supporter of the program and the growth of the virtualization community, Christopher continues to find new ways to reach out and spread the joys of virtualization and the transformative properties it has on individuals and businesses alike. He was named an EMC Elect in 2013 and 2014, and continues to contribute to the storage community, whether directly or indirectly, with analysis and regular review.

He continues to update his blog with useful stories of virtualization and storage and his adventures throughout the world, which currently include stories of his time in Afghanistan. You can read his blog at http://pkguild.com or, more likely, catch him on Twitter; his Twitter handle is @cxi.

When he is not busy changing the world one virtual machine at a time, or FaceTiming with his family on the other side of the world, he's trying to find awesome vegan food in the world at large, or somewhat edible food for a vegan in a war zone.
About the Reviewers
Andy Grant works as a technical consultant for HP Enterprise Services. Andy's primary focus is datacenter infrastructure and virtualization projects across a number of industries, including government, healthcare, forestry, financial, gas and oil, and international contracting. He currently holds a number of technical certifications, including VCAP4/5-DCA/DCD, VCP4/5, MCITP:EA, MCSE, CCNA, Security+, A+, and HP ASE BladeSystem.

Outside of work, he enjoys backcountry camping, playing action pistol sports (IPSC), and spending time being a goof with his son.
Muhammad Zeeshan Munir is a freelance ICT consultant and solution architect. He established his career as a system administrator in 2004 and has since acquired and executed many successful projects in multimillion ICT industries. With more than 10 years of experience, he now provides ICT consultancy services to different clients in Europe. He also works as a system consultant for Qatar Computing Research Institute. He regularly contributes to different wikis and produces various video tutorials, mostly about different technologies, including VMware products, Zimbra e-mail services, OpenStack, and Red Hat Linux, which can be found at http://zee.linxsol.com/system-administration. When he is doing nothing, he likes to travel around, and he speaks English, Urdu, Punjabi, and Italian.
Prasenjit Sarkar is a senior member of the technical staff in VMware Service Provider Cloud R&D, where he provides architectural oversight and technical guidance to design, implement, and test VMware's Cloud datacenters. You can follow him on Twitter at @stretchcloud.

He is an author, R&D guy, and a blogger focusing on virtualization, cloud computing, storage, networking, and other enterprise technologies. He has more than 10 years' expert knowledge in R&D, professional services, alliances, solution engineering, consulting, and technical sales, with expertise in architecting and deploying virtualization solutions and rolling out new technology and solution initiatives. His primary focus is on the VMware vSphere infrastructure and public cloud using VMware vCloud Suite. Another of his areas of focus is owning the entire life cycle of a VMware-based IaaS (SDDC), especially vSphere, vCloud Director, vShield Manager, and vCenter Operations.

He was one of the VMware vExperts in 2012 and 2013 and is well known for his acclaimed virtualization blog, http://stretch-cloud.info. He holds certifications from VMware, Cisco, Citrix, Red Hat, Microsoft, IBM, HP, and Exin. Prior to joining VMware, he served other fine organizations, such as Capgemini, HP, and GE, as a solution architect and infrastructure architect.
I would like to thank and dedicate this book to my family. Without their endless and untiring support, this book would not have been possible.
Support files, eBooks, discount offers, and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Why subscribe?

• Fully searchable across every book published by Packt
• Copy and paste, print and bookmark content
• On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Instant Updates on New Packt Books
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.
Table of Contents

Preface
Chapter 1: Virtual Data Center Design
Chapter 2: Hypervisor Design
    Summary
Chapter 3: Storage Design
    SIOC
    Provisioning
Chapter 4: Network Design
    Summary
Chapter 5: Virtual Machine Design
    Limits
    Reservations
    Summary
Chapter 6: Business Critical Applications
Chapter 7: Disaster Recovery and Business Continuity
Appendix A: vSphere Automation Tools
Appendix B: Certification Resources
Index
Preface

Welcome to vSphere Design Best Practices, an easy-to-read guide full of hands-on examples of real-world design best practices. Each topic is explained and placed in the context of virtual datacenter design.
This book is all about design principles. It isn't about operations or administration; instead, we focus on designing virtual datacenters to meet business requirements, leveraging vSphere to arrive at robust and highly available solutions.

In this book, you'll learn how to utilize the features of VMware to design, architect, and operate a virtual infrastructure using the VMware vSphere platform. Readers will walk away with a sense of all the details and parameters that go into a well-thought-out design.

We'll examine how to customize your vSphere infrastructure to fit business needs and look at specific use cases for live production environments. Readers will become familiar with the new features in Version 5.5 of the vSphere suite and how these features can be leveraged in a design.

Readers will walk away with confidence, knowing what their next steps are towards accomplishing their goals, whether that be a VCAP-DCD certification or a sound design for an upcoming project.
What this book covers
Chapter 1, Virtual Data Center Design, gives you an insight into the design and architecture of the overall datacenter, including the core components of vCenter and ESXi.

Chapter 2, Hypervisor Design, dives into the critical components of the datacenter by providing the best practices for hypervisor design.
Chapter 3, Storage Design, explains one of the most important design points of the datacenter: storage. This chapter focuses attention on the design principles related to storage and storage protocols.

Chapter 4, Network Design, peels back the layers of networking by focusing on designing flexible and scalable network architectures to support your virtual datacenter.

Chapter 5, Virtual Machine Design, covers what it takes to inspect your existing virtual machines and provides guidance and design principles to correctly size and deploy VMs.

Chapter 6, Business Critical Applications, breaks through the barriers and concerns associated with business critical applications, allowing business requirements to translate into a successful and stable deployment.

Chapter 7, Disaster Recovery and Business Continuity, considers the many parameters that make up a DR/BC design, providing guidance to make sound decisions that ensure a well-documented and tested disaster recovery policy.

Appendix A, vSphere Automation Tools, briefly discusses the value of automation tools such as vCenter Orchestrator (vCO) and vCloud Automation Center (vCAC), their differences, and use cases for your design and implementation.

Appendix B, Certification Resources, gives guidance on the VMware certification roadmap and lists resources for the Data Center Virtualization (DCV) track.
What you need for this book
As this book is technical in nature, the reader should possess an understanding of the following concepts:
• Storage technologies (SAN, NAS, Fibre Channel, and iSCSI)
• Networking—both physical and virtual
• VMware vSphere products and technologies such as:
° Hypervisor basics
° vMotion and Storage vMotion
° Cluster capabilities such as High Availability (HA) and Distributed Resource Scheduler (DRS)
Who this book is for
This book is ideal for those who desire a better understanding of how to design virtual datacenters leveraging VMware vSphere and associated technologies. The typical reader will have a sound understanding of VMware vSphere fundamentals and will have been involved in the installation and administration of a VMware environment for more than two years.
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "VMware Communities Roundtable podcast hosted by John Troyer (@jtroyer)."

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "You can adjust your guests' vNUMA configuration on a per-VM basis via Advanced Settings in order to adjust vNUMA to meet your needs."
Warnings or important notes appear in a box like this.

Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and mention the book title via the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.
Questions
You can contact us at questions@packtpub.com if you are having a problem with any aspect of the book, and we will do our best to address it.
Virtual Data Center Design
The datacenter has quite simply become one of the most critical components of organizations today. Some datacenters, either owned by large enterprises or leased by large service providers, can contain great amounts of bleeding-edge technologies, tens of thousands of servers, huge network pipes from multiple carriers, and enough contingency mechanisms to keep their systems running for months in case of a disaster. Other datacenters can consist of just a small handful of servers and off-the-shelf networking gear stuffed in a closet. Every organization is a bit different and has its own unique set of requirements in order to provide IT services to its users. As system administrators, engineers, and architects, we need to be able to take those requirements, sometimes even search for or define them, and design a solution to meet or exceed those requirements.

As datacenters have evolved, we've seen a paradigm shift towards virtualization. This is due to a number of factors, including performance, availability, management, and recoverability. This book aims to call attention to these factors and explains how to address them in your Virtual Data Center designs.

The Virtual Data Center has several key components, such as compute, storage, networking, and management.
We will be covering the following topics in this chapter:
• Core design principles for the Virtual Data Center
• Best Practices for Virtual Data Center design
• How best practices change over time
• Virtual Data Center design scenarios
• vCenter design including the Linux-based vCenter Server Appliance (vCSA)
• vSphere clustering, HA, and DRS
• A consideration of the other components of the vCloud Suite
Virtual Data Center design principles
VMware vSphere is a leading platform for virtualization, with many components that make up the VMware vCloud Suite, including vCenter Server, the ESXi hypervisor, and Site Recovery Manager (SRM). Through these products, the vSphere platform, with proper design characteristics, enables an IT infrastructure to provide the services, availability, flexibility, recoverability, and performance that its customers require.

Apart from knowledge of the aforementioned VMware products (or perhaps any third-party solutions being considered), there are a few key principles to get started with our designs. While there are numerous methodologies out there, this is a framework on which you can base your process, consisting of three basic phases—Conceptual Design, Logical Design, and Physical Design, as shown in the following diagram:

Conceptual Design → Logical Design → Physical Design
It is important to remember that while the process is important, the focus of this book is the design decisions. If you typically subscribe to a different framework or methodology, that is OK; in most cases, the design decisions discussed throughout the book will hold true regardless of your process and methodology.

First, create a Conceptual Design by gathering and analyzing business and application requirements, and then document the risks, constraints, and assumptions. Once all of the information is gathered and compiled, you should be able to formulate a high-level end goal, which is ultimately a vision of the solution. For example, you could have requirements to provide 99.99 percent uptime to an application and to guarantee 2,500 IOPS via shared storage, 25 GHz of CPU, and 48 GB of RAM, while being constrained by a budget of $100,000, among various risks and assumptions. So, conceptually, you'll have a high-level idea that you'll need a certain class of compute, storage, and network to support that type of application.
Next, we take the Conceptual Design and develop a Logical Design. At this stage, we aren't yet choosing vendors or specific hardware; we just know that we need shared storage, and so on. We need to map each of the requirements to a logical solution within the complete design. This also begins to flesh out dependencies between components and services. While it would be great if we could jump straight from the Conceptual Design right to the complete solution, we should be taking smaller steps towards our goal. This step is all about putting ideas down on paper and thinking through the Logical Design before we start procuring hardware and building up the datacenter by trial and error.
Finally, we transition our Logical Design into a Physical Design. While the previous steps allow us to be somewhat theoretical in our work, this phase absolutely requires us to do our homework and put specific parameters around our design. For example, we determine that we need six servers with dual 8-core CPUs, 128 GB of RAM, 15 TB of array-based Fibre Channel attached storage, 4 x 10 GbE NICs, 2 x 8 Gb Fibre Channel HBAs, and so on. In most cases, you'll have this detail down to the manufacturer of each component as well. The Physical Design should consist of all the infrastructure components that will be required to support the requirements gathered in the Conceptual Design phase. For example, a complete Physical Design might include a vSphere Network Design, Storage Design, Compute Design, Virtual Machine (VM) Design, and Management Design. As you can imagine, the documentation for a Physical Design can get quite large and time consuming, but this is a critical step.

In summary, remember that the three design phases are additive and dependent upon one another. In order to have a successful Physical Design, both the Conceptual and Logical Designs must be solid. Document as much as is reasonable in your designs and even include the reasoning behind choices. Many choices seem obvious while making them; however, when it is time to review and deploy a design months later, it may not be as obvious why certain choices were made.
Best practices
We hear the term best practices everywhere, but what does it really mean? Do we always adhere to best practices? What happens to best practices as technology evolves, or as your business evolves? These are great questions, and ones that many people don't seem to ask. There are three scenarios when addressing a design.
You may ignore best practices, strictly follow best practices, or employ a combination of the two. The third scenario is the most common and should be your target. Why? The simplified answer is that there are certainly times when best practices aren't in fact best for your design. A great example is the general best practice for vSphere that the defaults for just about everything in the environment are acceptable. Take storage multipathing: VMware's default path selection policy is Fixed, and if the default is the best practice, you might consider leaving it alone; experience shows, however, that you'll often see performance gains by switching to Round Robin (see VMware KB 1011340 for details). While changing this default may not be "best practice", a best practice might not fit your needs in the real world. Choose what works for you and the constraints of your environment.
Best practices also seem to have a shelf life much longer than intended. A favorite example is that the best size for a vSphere datastore is 500 GB. While this sizing best practice was certainly true at one time, vSphere has improved how it handles datastore locking (reducing SCSI reservation contention), which enables much higher VM-per-datastore density. Since vSphere 5.0, we've been able to create 64 TB datastores, and the best practice is now more about evaluating parameters such as the number of VMs, the number of VMDKs per VM, required IOPS, SLAs, fault domains, and restore capabilities. Many times, the backend disk capabilities will determine the ideal datastore size. Yet many customers continue to use 500 GB as their standard datastore size. There are many more examples, but the bottom line is that we should use best practices as a guideline and then do our homework to determine how to apply them to our designs.
Along with the best practices and design framework we've already discussed, it is also important to define and adhere to standards throughout your Virtual Data Center. This includes items such as naming conventions for ESXi hosts and VMs, as well as IP address schemes, storage layout, network configurations, and security policies. Adhering to these standards will make the design much easier to understand and will also help as changes are introduced into the environment after deployment.
Designing the Virtual Data Center

As previously mentioned, VMware's Virtual Data Center (vDC) is composed of several key components: vCenter, the ESXi hypervisor, CPU, storage, networking, and virtual machines. Many datacenters will have additional components, but these form the foundation of a typical vDC. Note that it is possible to operate, albeit in a limited capacity, without vCenter in the management layer; however, vCenter is really the core of the vDC and enables many features we seem to take for granted, such as vMotion, DRS (Distributed Resource Scheduler), and HA (High Availability).
Our vDCs will also consist of several constructs that we will leverage in our designs to provide organization and operability, such as datacenters, clusters, resource pools, folders, and tags.

By using these constructs, we can bring consistency into complex designs. For example, by using folders and tags, we can organize VMs, hosts, and other objects so that we can easily find them, report on them, or even perform operations, such as backup, based on folders or tags.
Tags are a new feature in vSphere 5.1 and above that work much like tagging in other applications. They allow us to put arbitrary tags on objects in our vDC for organizational purposes. Folders present a challenge when we have objects that could belong in multiple folders (which isn't possible). Many administrators use a combination of folders and tags to both organize their inventory and manage it.
It is important to define a standard folder hierarchy as part of your design, and to include tags in that standard, to help organize and manage your infrastructure.
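As an illustration of reporting against such a hierarchy, here is a minimal sketch using pyVmomi (the Python bindings for the vSphere API). The hostname, credentials, and unverified SSL context are placeholders for a lab environment, and because tags are managed through the separate vSphere Automation API, this sketch groups VMs by folder only:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; use CA-signed certificates in production.
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.com',
                  user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
try:
    content = si.RetrieveContent()
    # Walk the inventory and group VMs by their parent folder.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    by_folder = {}
    for vm in view.view:
        folder = vm.parent.name if vm.parent else '(no folder / vApp)'
        by_folder.setdefault(folder, []).append(vm.name)
    view.DestroyView()
    for folder, vms in sorted(by_folder.items()):
        print(f"{folder}: {len(vms)} VM(s)")
finally:
    Disconnect(si)
```

A consistent hierarchy is what makes this kind of scripted reporting (or folder-based backup selection) trivial rather than fragile.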
Another facet of our designs will be how we might employ multiple vCenters and how we manage them. Many organizations utilize multiple vCenters because they want to manage Production, QA, and Test/Dev separately. Others have vCenters distributed geographically, with a vCenter managing each local infrastructure. In any case, with vSphere 5.5, multiple vCenter management has become much easier.

One option is what is called Linked Mode. This allows us to provide a single pane of access to resources connected to multiple vCenters—think shared user roles/groups and being able to access objects from any vCenters that are linked through a single console. However, with the vSphere Web Client included with vSphere 5.1 and above, we are able to add multiple vCenters to our view in the client. So, if your requirement is to have fewer roles and groups to manage, then Linked Mode may be the best route. If your requirement is to be able to manage multiple vCenters from a single interface, then the standard vSphere Web Client should meet your needs without adding the complexity of Linked Mode.
Next, we have Single Sign-On (SSO). This is a new feature introduced in vSphere 5.1 and improved in vSphere 5.5. Previously, SSO had a difficult installation process and some difficult design decisions with respect to HA and multisite configurations. With SSO in vSphere 5.5, VMware has consolidated down to a single deployment model, so we're able to design HA and multisite configurations much more easily because they are integrated by default.

Finally, there is the subject of SSL certificates. Prior to vSphere 5.0, it was a best practice (and a difficult task) to replace the default self-signed certificates of the vSphere components (vCenter, ESXi, and so on) with CA-signed certificates. That remains the case, but fortunately, VMware has developed and released the vCenter Certificate Automation tool, which greatly reduces the headaches caused when attempting to manually generate and replace all of those SSL certificates. Although not a steadfast requirement, it is certainly recommended to include CA-signed certificates in your designs.
VMware vCenter components

There are four primary components of vCenter 5.1 and 5.5:

• Single Sign-On: This provides identity management for administrators and applications that interact with the vSphere platform.
• vSphere Web Client: This is the service that provides the web-based GUI for administrators to interact with vCenter and the objects it manages.
• vCenter Inventory Service: This acts as a cache for vCenter managed objects when accessed via the Web Client, to provide better performance and reduce lookups against the vCenter database.
• vCenter Server: This is the core service of vCenter, which is required by all other components.

There are also several optional components and support tools, which are as follows:

• vSphere Client: This is the "legacy" C#-based client that will no longer be available in future versions of vSphere (although it is still required to interact with vSphere Update Manager).
• vSphere Update Manager (VUM): This is a tool used to deploy upgrades and patches to ESXi hosts, as well as VMware Tools and virtual hardware version updates to VMs.
• vSphere ESXi Dump Collector: This is a tool used mostly with stateless ESXi hosts (that is, hosts with no local storage) to dump VMkernel memory to a network location; those dumps can then be pulled back into vCenter.
• vSphere Syslog Collector: This is a tool, similar to the Dump Collector, that allows ESXi system logs to be redirected to a network location and accessed via vCenter.
• vSphere Auto Deploy: This is a tool that can provision ESXi hosts over the network and load ESXi directly into memory, allowing for stateless hosts and efficient provisioning.
• vSphere Authentication Proxy: This is a service that is typically used with Auto Deploy so that hosts can be joined to a directory service, such as Microsoft Active Directory, without the need to store credentials in a configuration file.
At the onset of your design, these components should be accounted for (if used) and integrated into the design. For small environments, it is generally acceptable to combine these services and roles on the vCenter Server, while larger environments will generally want to separate these roles as much as possible to increase reliability by reducing the fault domain. This ensures that the core vCenter services have the resources they need to operate effectively.
Choosing a platform for your vCenter server

Now that we've reviewed the components of vCenter, we can start making some design decisions. The first design decision we'll need to make relative to vCenter is whether you'll have a physical or virtual vCenter. VMware's own recommendation is to run vCenter as a VM. However, just as with best practices, we should examine and understand why before making a final decision. There are valid reasons for having a physical vCenter, and it is acceptable to do so. However, if we make the appropriate design decisions, a virtual vCenter can benefit from all the flexibility and manageability of being a VM without a high amount of risk.
One way to mitigate risk is to utilize a Management Cluster (sometimes called a management pod) with dedicated hosts for vCenter, DNS, Active Directory, monitoring systems, and other management infrastructure. This benefits us in a few ways, such as being able to more easily locate these types of VMs when vCenter is down. Rather than connecting to multiple ESXi hosts via Secure Shell (SSH), or using the vSphere Client to connect to multiple hosts in order to locate a domain controller to be manually powered on, we're able to identify its exact location within our management pod. Using a management cluster is sometimes not an option due to the costs involved: the dedicated hosts, a separate vCenter server for management, and the overhead of managing a separate cluster. In smaller environments, it isn't such a big deal, because we don't have many hosts to search through for VMs. But, in general, a dedicated cluster for management infrastructure is recommended.

A second way to mitigate risk for vCenter is to use a product called vCenter Heartbeat. VMware vCenter Heartbeat provides automated and manual failover and failback of your vCenter Server and ensures availability at the application and service layer. It can restart or restore individual services while monitoring and protecting the Microsoft SQL Server database associated with vCenter Server, even if it's installed on a separate server. For more information on vCenter Heartbeat, you can go to the product page at http://www.vmware.com/products/vcenter-server-heartbeat/.

In general, if choosing a virtual vCenter, do everything possible to ensure that the vCenter VM has a high priority for resources and HA restarts. One option is to set the CPU and memory resource shares to High for the vCenter VM. Also, do not disable DRS or use host affinity rules for the vCenter VM to try to reduce its movement within the vDC; this will negate many of the benefits of having a virtual vCenter.
Using the vCenter Server Appliance

Traditionally, this decision hasn't really required much thought on our part. The vCenter Server Appliance (vCSA, sometimes called vCVA or the vCenter Virtual Appliance) prior to vSphere 5.5 was not supported for production use. It also had a limit of managing five hosts and 50 VMs. However, with vSphere 5.5, the vCSA is fully supported in production and can manage 1,000 hosts and 10,000 VMs. See the vSphere 5.5 Configuration Maximums at http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf.
So why would we want to use an appliance versus a full Windows VM? The appliance is Linux-based and comes with an embedded vPostgres database, which will support up to 100 hosts and 3,000 VMs; in order to scale higher, you need to use an external Oracle database server. Environments can save on Windows and MS SQL licensing by leveraging the vCSA, and it is also a much simpler deployment. There are a few caveats, however: VMware Update Manager (VUM) still requires a separate Windows-based host with an MS SQL database as of vSphere 5.5. If you are running Horizon View, then you'll need a Windows server for the View Composer service, and a similar situation exists if you are running vCloud Director. The general idea, though, is that the vCSA is easier to manage and is a purpose-built appliance. The vCSA may not be for everyone, but I would encourage you to at least consider it for your designs.
Sizing your vCenter server

If you're using the vCSA, sizing is less of an issue, but there are still guidelines for disk space and memory, as shown in the following table. Note that this only applies to vCSA 5.5 and above, as the vCSA prior to 5.5 is only supported with a maximum of five hosts and 50 VMs.
VMware vCenter Server Appliance hardware requirements:

Disk storage on the host machine: The vCenter Server Appliance requires at least 7 GB of disk space and is limited to a maximum size of 80 GB. The appliance can be deployed with thin-provisioned virtual disks that can grow to the maximum size of 80 GB. If the host machine does not have enough free disk space to accommodate the growth of the appliance's virtual disks, vCenter Server might cease operation, and you will not be able to manage your vSphere environment.

Memory in the VMware vCenter Server Appliance:

• Very small (≤ 10 hosts, ≤ 100 virtual machines): at least 4 GB
• Small (10-100 hosts or 100-1,000 virtual machines): at least 8 GB
• Medium (100-400 hosts or 1,000-4,000 virtual machines): at least 16 GB
• Large (≥ 400 hosts or ≥ 4,000 virtual machines): at least 24 GB

Source: VMware KB 2052334
If you are choosing a Windows-based vCenter, you need to size your machine appropriately based on the size of your environment. This includes CPU, memory, disk space, and network speed/connectivity.
The number of clusters and VMs within a vCenter can absolutely affect performance. This is due to the DRS calculations that must be made; more objects mean more calculations. Take this into account when estimating resource requirements for your vCenter to ensure performance and reliability.

It is also recommended to adjust the JVM (Java Virtual Machine) heap settings for the vCenter Management Web service (tc Server), Inventory Service, and Profile-Driven Storage Service based on the size of the deployment. Note that this only affects vCenter 5.1 and above.
The following VMware KB articles have up-to-date information for your specific vCenter version and should be referenced for vCenter sizing:

vCenter version | VMware KB article number
5.0 | 2003790
5.1 | 2021202
5.5 | 2052334
Choosing your vCenter database

The vCenter database is the location where all configuration information, along with some performance, task, and event data, is stored. As you might imagine, the vCenter DB can be a critical component of your virtual infrastructure. But it can also be disposable, although this is a much less adopted strategy than it was a couple of years ago. In some environments that weren't utilizing features like the vSphere Distributed Switch (VDS) and didn't have a high reliance on the stored event data, we saw that vCenter could be quickly and easily reinstalled from scratch without much of an impact. In today's vDCs, however, we see more widespread use of the VDS and peripheral VMware technologies that make the vCenter DB much more important and critical to keep backed up. For example, the VDS configuration is housed in the vCenter DB, so if we were to lose the vCenter DB, it would become difficult and disruptive to migrate those hosts and their VMs to a new vCenter.
There are two other components of vCenter that require a DB: VUM and SSO. SSO is a component introduced in vSphere 5.1 and then completely rewritten in vSphere 5.5. If you haven't deployed 5.1 or 5.5 yet, I'd strongly encourage you to go to 5.5 when you're ready to upgrade, due to this re-architecture of SSO. In vSphere 5.1, the SSO setup requires manual DB configuration with SQL authentication and JDBC SSL, which caused many headaches and many support tickets with VMware support during the upgrade process.
We have the following choices for our vCenter DB:

• MS SQL Express
• MS SQL
• IBM DB2
• Oracle

Different versions of these DBs are supported depending on the version of vCenter you are deploying, so check the VMware Product Interoperability Matrixes located at http://partnerweb.vmware.com/comp_guide/sim/interop_matrix.php.
The MS SQL Express DB is used for small deployments—a maximum of five hosts and 50 VMs—and is the easiest to set up and configure. It is sometimes an option for dev/test environments as well.
Next, we have probably the most popular option: MS SQL. MS SQL can be installed on the vCenter server itself, but it is recommended to have a separate server dedicated to SQL. Again, in smaller deployments it is sometimes OK to install the SQL DB on the vCenter server, but the general best practice is to have a dedicated database server for better protection, flexibility, and availability. With vCenter 5.5, we now have broader support for Windows Failover Clustering (formerly Microsoft Cluster Service, or MSCS), which enhances our options for the availability of the vCenter DB.
Last, we have Oracle and IBM DB2. These two options are common in organizations that already have these databases in use for other applications. Each database type has its own benefits and features. Choose the database that will be the easiest to manage while providing the features that are required. In general, any of the database choices is completely capable of providing the features and performance vCenter needs.
vSphere clustering – HA and DRS

Aside from vCenter itself, HA and DRS are the two features that help unlock the true potential of virtualization. This combination of automatic failover and workload balancing results in more effective and efficient use of resources. The mechanics of HA and DRS are a full topic in and of themselves, so we'll focus on how to design with HA and DRS in mind.
The following components will be covered in this section:
• Host considerations
• Networking design considerations
• Storage design considerations
• Cluster configuration considerations
• Admission control
Host considerations

Providing high availability to your vSphere environment means making every reasonable effort to eliminate any single point of failure (SPoF). Within our hosts, we can make sure we're using redundant power supplies, ECC memory, and multiple I/O adapters (such as NICs and HBAs), and that we're using appropriate remote monitoring and alerting tools. We can also distribute our hosts across multiple racks or blade chassis to guard against the failure of a rack or chassis bringing down an entire cluster. So not only is component redundancy important, but it is also important to look at physical location.
Another item related to our hosts is using identical hardware. Although this is not always possible, such as when hosts have different lifecycles, we should make an effort to be consistent in our host hardware within a cluster. Aside from simplifying configuration and management, hardware consistency reduces resource fragmentation. For example, if we have hosts with Intel E5-2640 CPUs and hosts with E5-2690 CPUs, the cluster will become unbalanced, resulting in fragmentation. HA prepares for a worst-case scenario and plans for the largest host in a cluster to fail, resulting in more resources being reserved for that HA event. This results in lower efficiency of resource utilization.
The next consideration with HA is the number of hosts in your design. We typically look for an N+1 configuration for our cluster. N+1 represents the number of hosts we need to service our workload (N) plus one additional host in case of a failure. Some describe this as having a "hot spare", but in the case of vSphere, we're typically using all of the hosts to actively serve workload.
Trang 30Chapter 1
[ 17 ]
In some cases, it can be acceptable to actually reserve a host or hosts for failover. This is a true hot spare scenario, where the designated hosts sit idle and are used only in the case of an HA event. This is an acceptable practice, but it isn't the most efficient use of resources.
VMware vSphere 5.5 supports clusters of up to 32 hosts, and it is important to understand how HA reserves resources based on cluster size. HA essentially reserves an entire host's worth of resources and spreads that reservation across all hosts in the cluster. For example, if you have two hosts in a cluster, 50 percent of the total cluster resources are reserved for HA. To put it another way, if one host fails, the other host needs to have enough resource capacity to service the entire workload. If you have three hosts in a cluster, then 33 percent of the cluster's resources are reserved for HA. You may have picked up on the pattern: HA reserves 1/N of the resources in the cluster, where N is equal to the number of hosts in the cluster. As we increase the number of hosts in a cluster, we see a smaller percentage of resources reserved for HA, which results in higher resource utilization. One could argue that larger clusters increase complexity, but that is offset by the benefits of higher utilization. The important takeaway here is to remember that these reserved resources need to be available to the hosts. If they are being used up by VMs and an HA event occurs, it is possible that VMs will not be restarted automatically and you'll experience downtime.
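The arithmetic behind this pattern is simple enough to sketch in a few lines of Python (assuming roughly identical hosts):

```python
def ha_reserved_fraction(n_hosts, failures_tolerated=1):
    """Fraction of cluster capacity HA reserves to tolerate host failures."""
    return failures_tolerated / n_hosts

for n in (2, 3, 4, 8, 16, 32):
    print(f"{n:2d} hosts -> {ha_reserved_fraction(n):.1%} reserved for HA")
```

As the cluster grows from 2 to 32 hosts, the fraction reserved for a single host failure falls from 50 percent to roughly 3 percent, which is exactly the utilization benefit of larger clusters described above.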
Last, with respect to hosts, is versioning. It is absolutely recommended to have all hosts within a cluster at the same major and minor version of ESXi, along with the same patch level. Mixed-host clusters are supported, but because there are differences in HA, DRS, and other features, functional and performance issues can result. If different host versions are required, it is recommended to separate those hosts into different clusters, that is, a cluster for 5.0 hosts and a cluster for 5.5 hosts.
Networking considerations

HA is sensitive to the behavior of the physical network. For example, while Spanning Tree Protocol (STP) converges, a switch port can block traffic long enough that a host's network heartbeats are lost; the host can then be declared as dead, and VMs will not start up on that host. This can also trigger an isolation response due to the datastore heartbeat. In either case, this can be prevented by enabling PortFast on those ports on the connected switches. Some switch vendors may call the feature something different, but the idea of PortFast is that STP will allow traffic to pass on those ports while it completes its calculations.

HA relies on the management network of the hosts, which brings with it several design considerations. Each host should be able to communicate with all other hosts' management interfaces within a cluster. Most of the time, this is not an issue, because our network designs typically have all of our hosts' management VMkernel ports on the same layer 2 subnet, but there are scenarios where this is not the case. Therefore, it is important to make sure all hosts within a cluster can communicate properly to ensure proper HA functionality and prevent network partitions. Along these same lines, we should design our networks so that we have the fewest possible number of hops between hosts. Additional hops, or hardware segments, can cause higher latency and increase the points of failure.
Because the management VMkernel port is critical for proper HA functionality, it is imperative that we design in redundancy for this network. This means having more than one physical NIC (pNIC) in our hosts assigned to this network. If there is a failure in the management network, we may still get host isolation, since HA won't have a means to test connectivity between vSphere servers.

When standardizing your network configuration, you should use consistent names for objects such as port groups, VLANs, and VMkernel ports. Inconsistent use of names across our hosts can result in VMs losing network connectivity after a vMotion, or can even prevent a vMotion or an HA restart. This, of course, results in increased downtime.
Storage considerations

HA, as previously mentioned, also relies on datastore heartbeats to determine whether an HA event has occurred. Prior to vSphere 5.0, we only had the network to determine whether a host had failed or become isolated. Now we can use datastores to aid in that determination, enhancing HA functionality. Datastore connectivity is inherently important—after all, our VMs can't run if they can't connect to their storage, right? So, in the case where we are utilizing a storage array, practices such as multipathing continue to be important, and we need to make sure our hosts have multiple paths to our storage array, whether that is iSCSI, NFS, FC, or FCoE.

vCenter will automatically choose two datastores to use for heartbeats. However, we do have the capability to override that choice. A scenario where this makes sense is if we have multiple storage arrays; it would then be recommended to have a heartbeat datastore on two different arrays. Note that we can have up to five heartbeat datastores. Otherwise, the algorithm for choosing the heartbeat datastores is fairly robust and normally does not need to be tampered with. Another consideration is that if we are using all IP-based storage, we should try to have separate physical network infrastructure to connect to that storage, to minimize the risk of the management network and storage network being disrupted at the same time. Last, wherever possible, all hosts in a cluster should have access to the same datastores, to maximize the ability of isolated hosts to communicate via the datastore heartbeat.
Cluster considerations

Of the many configurations we have within a cluster for HA, Host Isolation Response seems to be the one most often neglected. While we previously talked about how to make our management networks robust and resilient, it is still important to decide how we want our cluster to respond in the event of a failure. We have the following three options to choose from for the Host Isolation Response:

• Leave Powered On: VMs will remain running on their host even though the host has become isolated.
• Power Off: VMs will be forcibly shut down, as if the power plugs were pulled from a host.
• Shut Down: Using VMware Tools, this option attempts to gracefully shut down the VM; after a default timeout period of 300 seconds, it will then force the power off.
In many cases, the management network may be disrupted while VMs are not affected. In a properly architected environment, host isolation is a rare occurrence, especially with a Fibre Channel SAN fabric, as datastore heartbeats will be out-of-band of the network. If the design requires IP storage that uses the same network infrastructure as the management network, then the recommended option is Shut Down during isolation, as it is likely that a disruption in the management network will also affect access to the VMs' datastore(s). This ensures that another host can power on the affected VMs.

Another HA feature is VM and application monitoring. This feature uses VMware Tools or an application agent to send heartbeats from the guest OS or application to the HA agent running on the ESXi host. If a consecutive number of heartbeats are lost, HA can be configured to restart the VM. The recommendation here is to utilize this feature and decide ahead of time how tolerant of heartbeat failures you wish to be.
Admission control

Admission Control (AC) is another great feature of HA. Because AC can prevent VMs from being powered on during normal operation, or restarted during an HA event, many administrators tend to turn AC off. HA uses AC to ensure that sufficient resources exist in a cluster and are reserved for VM recovery during an HA event. In other words, without AC enabled, you can overcommit your hosts to the point that, if there were to be an HA event, you may find yourself with some VMs that are unable to power on. Let's take another look at a previous example where we have three hosts in a cluster; HA reserves 33 percent of that cluster for failover. Without AC enabled, we can still utilize that 33 percent reserve, which puts us at risk. With AC, we are unable to utilize those reserved resources, which mitigates the risk of overprovisioning and ensures we'll have enough resources to power on our VMs during an HA event. When Admission Control is enabled, we can choose from the following three policies:

• Host Failures Cluster Tolerates (default): This reserves resources based on the number of host failures we want to tolerate. For example, in a 10-host cluster, we might want to tolerate two host failures, so instead of the default of one host failure tolerated (remember the 1/N reservation pattern), we would allow for two host failures in the cluster.
HA actually uses something called a "slot size" when determining resource reservations. Slot sizes are calculated on a worst-case basis; thus, if you have VMs with differing CPU and memory reservations, the slot size will be calculated from the VM with the highest reservations. This can greatly skew the slot size and result in more resources being reserved for failover than necessary.
• Percentage of Cluster Resources Reserved: This reserves the specified percentage of CPU and memory resources for failover. If you have a cluster where VMs have significantly different CPU and memory reservations, or hosts with different CPU and memory capacities, this policy is recommended. This setting must be revisited as the cluster grows or shrinks.
• Specify a Failover Host: This designates a specific host or hosts as a hot spare.
The general recommendation is to use the Percentage of Cluster Resources Reserved policy for Admission Control, because it offers maximum flexibility. Design the percentage reserved to represent the number of host failures to be tolerated. For example, in a four-host cluster, if you want to tolerate one host failure, your percentage reserved is 25 percent; if you want to tolerate two host failures, it is 50 percent. If there will be hosts of different sizes in the cluster, be sure to use the largest host(s) as your reference for determining the percentage to reserve. Similarly, if the Specify a Failover Host policy is used, we must specify hosts that have CPU and memory capacity equal to or larger than the largest non-failover host in the cluster.
If the Host Failures Cluster Tolerates policy is to be used, then make every reasonable attempt to use similar VM resource reservations across the cluster. This is due to the slot size calculation mentioned earlier.
Summary

In this chapter, we learned about the design characteristics of the Virtual Data Center and the core components of the VMware vSphere platform. We also discussed best practices and design considerations, specifically around vCenter, clustering, and HA. In the next chapter, we'll dive into best practices for the design of the ESXi hypervisor.
Hypervisor Design
This chapter begins a series of chapters regarding the architecture of specific virtual datacenter components. The hypervisor is a critical component, and this chapter will examine various best practices for hypervisor design.

We will cover the following topics in this chapter:
• ESXi hardware design including CPU, memory, and NICs
• NUMA, vNUMA, and JVM considerations
• Hypervisor storage components
• Stateless host design
• Scale-Up and Scale-Out designs
ESXi hardware design

Over the past 15 years, we've seen a paradigm shift in server technology from mainframe and proprietary systems, such as PowerPC, Alpha, and Itanium, to commodity x86 architecture. This shift, in conjunction with x86 virtualization, has yielded tremendous benefits and has allowed us both flexibility and standardization in our hardware designs. If we take a look at the vSphere Hardware Compatibility List (HCL), we only see two CPU manufacturers: Intel and AMD (Advanced Micro Devices). The VMware HCL can be found at http://www.vmware.com/resources/compatibility/search.php.

Between these two manufacturers, however, there is a wide range of CPUs to choose from, and VMware does a great job of supporting new CPUs as they are released and of developing new technologies that further enhance the capabilities of x86 virtualization.
Trang 37mentioned in Chapter 1, Virtual Data Center Design, standardization goes a long
way and our host designs are no different However, there are certainly instances
in which we may choose different CPUs for our hosts It remains a best practice
to keep the same CPU model consistent throughout a cluster If we find ourselves
in a situation where we need different CPU models within a cluster, such as
adding new hosts to an existing cluster, we have a great feature called Enhanced
vMotion Compatibility (EVC) As long as we're running CPUs from the same chip
manufacturer (that is all Intel or all AMD), we can mask CPU features to a lowest common denominator to ensure vMotions can occur between different chip models For example, without EVC, we could not vMotion a VM from a host with an Intel Xeon 5500-series CPU to a host with the latest Intel Xeon E5-2600 CPU With EVC this is completely possible and works very well For more information on the EVC requirements, see VMware KB 1003212
CPUs have many parameters to look at when we're choosing the best one(s) for our host design. The following parameters are the most common factors in determining the appropriate CPU for a host:

• Cores/threads
• Clock speed
• Cache
• Power draw (watts)
• vSphere feature support
• Memory support
• Maximum CPUs supported (2-way, 4-way, 6-way, and so on)

Some of these parameters may be integral to a design while others go unused, but we still need to consider the effects each one may have on our designs. Provided we have a good conceptual design, and thus proper requirements, we should be able to determine how many CPU resources we will need; perhaps a requirement of eight cores per socket or 12 GHz of CPU per host is specified, or, more commonly, a density of 60 VMs per host. We can then profile the workload to determine the proper range of CPUs to use. A range is used because cost is almost always an influencing factor; initially, we may come up with a range and then research costs to narrow down our selection.
Continuing with the example where we require 60 VMs per host, there is an important metric to look at when considering consolidation ratios: the ratio of vCPUs to pCPUs, denoted as vCPU:pCPU. A vCPU is a virtual CPU that is presented to a VM, while a pCPU represents a physical CPU in the host system. Furthermore, with technologies such as hyperthreading, a pCPU can represent a logical processor. For instance, a CPU with eight cores and hyperthreading enabled yields 16 pCPUs, because each core can execute two threads. Each thread represents a logical processor, and thus 8 x 2 = 16 logical processors, or 16 pCPUs. VMware's best practices for performance with hyperthreading can be read about in more detail at http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf.
Sometimes we have project requirements that allow us to start with a specific goal for the overall vCPU:pCPU ratio, or specific ratios for different parts of the infrastructure. From there, we can choose a CPU and server platform that can accommodate those requirements. Let's get back to that 60 VMs per host example. We will assume that these are all general purpose VMs with no special requirements, averaging 1.5 vCPUs per VM because of a mixture of one-vCPU and two-vCPU machines, and that our design targets a vCPU:pCPU ratio of 2:1. The math is as follows: 1.5 x 60 = 90 vCPUs. Therefore, we'll require at least 45 pCPUs per host to achieve our 2:1 target (90:45). An eight-socket server with six-core CPUs would yield 48 pCPUs and would accommodate the needs of this design; a quad-socket configuration with 12-core CPUs would also yield 48 pCPUs. There are many ways to make the math work, and there are other considerations, such as high availability configurations and additional capacity for growth, that we will discuss as we continue our design.
For general purpose, predictable workloads that don't have special requirements (such as SQL, Exchange, SharePoint, and VDI), we typically see a vCPU:pCPU ratio of 2:1 to 4:1 inclusive, with higher ratios for test and development environments. But this is certainly a situation where your mileage may vary, so you need to know your applications and workload in order to intelligently choose the appropriate ratio for your environment.
Some of the other factors, such as cache and clock speed, are a bit more straightforward: more cache and more speed equal higher performance in most cases. If you are power conscious, then you'll want to take into account the power draw (wattage) of each CPU. Power consumption modeling is often used to determine the TCO (total cost of ownership) of systems in a datacenter, especially when using co-location facilities where you are charged for power consumption. Unless we're specifying refurbished or otherwise dated CPUs, new ones are typically going to support all of VMware's feature set. If you are using older equipment, it is vital to review the HCL to validate that your infrastructure is supported.
Memory and NUMA considerations

While CPUs provide the raw processing power for our VMs, memory provides the capacity for our applications and workloads to run.

High Availability (HA) admission controls are also important here. If strict admission controls are enabled, then guests that violate HA controls won't come back online after a failure. Instead, by employing relaxed admission controls, you'll see performance degradation, but you won't have to deal with the pain of HA causing VMs to power off and not come back. By considering the speed, quantity, and density of our host-based RAM, we're able to avoid those issues and provide predictable performance to our virtual infrastructure. This means that your design depends on sizing memory and calculating HA minimums to ensure you don't cause problems.
At the highest level, the total amount of host memory is pretty straightforward. As we take into consideration our workload and application requirements, we can start by simply dividing the total amount of RAM required by the number of hosts to get the amount of memory we want in each host. Just as before, there are tools to help us determine the amount of memory we need. VMware Capacity Planner is one of those tools and is available to VMware partners. VMware also has some free flings, such as vBenchmark and ESX System Analyzer. Other tools, such as Dell Quest's vFoglight and SolarWinds Virtualization Manager, are commercially available and can provide detailed reporting and recommendations for properly sizing your virtual environment.

In most cases, we're designing an N+1 architecture, so we'll add an extra host with that amount of RAM into the cluster. For example, if our overall workload requires 1 TB of RAM, we may design for four hosts, each with 256 GB of RAM. To achieve the proper failover capacity required for HA and our N+1 architecture, we'll add a fifth host, for a total of 1.25 TB.
NUMA (non-uniform memory access) is an architecture in which the system memory (RAM) is divided up into NUMA "nodes". Each node is tied to a pCPU (remember, this can also be a CPU core) and represents the set of RAM (or address space) that can be accessed quickest by a given pCPU.
Each NUMA node is situated closest to its associated pCPU, which reduces the time needed to access that memory. Memory access contention is reduced because only a single pCPU can access a given memory space at a time, and pCPUs may simultaneously access their own NUMA nodes, resulting in dramatically increased performance over non-NUMA (UMA) systems. The following diagram shows the NUMA architecture:

(Diagram: NUMA Architecture)
Some server manufacturers include settings in the server's BIOS that will impact NUMA. One such setting is called Node Interleaving, and it should be disabled if we want to take advantage of NUMA optimizations. For more information, have a look at the white paper from VMware at https://www.vmware.com/files/pdf/
Sizing a VM with more vCPUs or memory than a single NUMA node provides results in an increase in memory access times, because the VM will span NUMA nodes and will need to access "remote memory", or memory outside of its NUMA node.
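A quick way to reason about this is to check a VM's size against a single node. The host figures below are hypothetical, and the sketch assumes one NUMA node per socket with RAM divided evenly, which is typical but should be confirmed for your hardware:

```python
# Hypothetical two-socket host: 8 cores per socket, 256 GB RAM.
sockets, cores_per_socket, host_ram_gb = 2, 8, 256
node_cores = cores_per_socket
node_ram_gb = host_ram_gb / sockets       # 128 GB per NUMA node

def fits_in_node(vm_vcpus, vm_ram_gb):
    """True if the VM can stay local to one NUMA node."""
    return vm_vcpus <= node_cores and vm_ram_gb <= node_ram_gb

print(fits_in_node(8, 96))    # True: node-local memory access
print(fits_in_node(12, 96))   # False: spans nodes, remote memory access
```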