vSphere Design Best Practices

Apply industry-accepted best practices to design reliable, high-performance datacenters for your virtual infrastructure
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2014
Valentina Dsilva
Abhinash Sahu

Production Coordinator
Alwin Roy

Cover Work
Alwin Roy
About the Authors
Brian Bolander spent 13 years on active duty in the United States Air Force. A veteran of Operation Enduring Freedom, he was honorably discharged in 2005. He immediately returned to Afghanistan and worked on datacenter operations for the US Department of Defense (DoD) at various locations in southwest Asia. Invited to join a select team of internal consultants and troubleshooters responsible for operations in five countries, he was also the project manager for what was then the largest datacenter built in Afghanistan.

After leaving Afghanistan in 2011, he managed IT operations in a DoD datacenter in the San Francisco Bay Area. His team was responsible for dozens of multimillion dollar programs running on VMware and supporting users across the globe.

He scratched his adrenaline itch in 2011 when he went back "downrange", this time directing the premier engineering and installation team for the DoD in Afghanistan. It was his privilege to lead this talented group of engineers, who were responsible for architecting virtual datacenter installations, IT project management, infrastructure upgrades, and technology deployments on VMware for the entire country from 2011 to 2013.

Selected as a vExpert in 2014, he is currently supporting Operation Enduring Freedom as a senior virtualization and storage engineer.

He loves his family, friends, and rescue mutts. He digs science, technology, and geek gear. He's an audiophile, a horologist, an avid collector, a woodworker, an amateur photographer, and a virtualization nerd.
Christopher Kusek had a unique opportunity presented to him in 2013: to take the leadership position responsible for theater-wide infrastructure operations for the war effort in Afghanistan. Leveraging his leadership skills and expertise in virtualization, storage, applications, and security, he's been able to provide enterprise-quality service while operating in an environment that includes the real and regular challenges of heat, dust, rockets, and earthquakes.

He has over 20 years' experience in the industry, with virtualization experience running back to the pre-1.0 days of VMware. He has shared his expertise with many far and wide through conferences, presentations, #CXIParty, and sponsoring or presenting at community events and outings, whether focused on storage, VMworld, or cloud.

He is the author of VMware vSphere 5 Administration Instant Reference, Sybex, 2012, and VMware vSphere Performance: Designing CPU, Memory, Storage, and Networking for Performance-Intensive Workloads, Sybex, 2014. He is a frequent contributor to VMware Communities' Podcasts and vBrownbag, and has been an active blogger for over a decade.

A proud VMware vExpert and huge supporter of the program and the growth of the virtualization community, Christopher continues to find new ways to reach out and spread the joys of virtualization and the transformative properties it has on individuals and businesses alike. He was named an EMC Elect in 2013 and 2014, and continues to contribute to the storage community, whether directly or indirectly, with analysis and regular review.

He continues to update his blog with useful stories of virtualization and storage and his adventures throughout the world, which currently include stories of his time in Afghanistan. You can read his blog at http://pkguild.com or, more likely, catch him on Twitter; his Twitter handle is @cxi.

When he is not busy changing the world one virtual machine at a time, or FaceTiming with his family on the other side of the world, he's trying to find awesome vegan food in the world at large, or somewhat edible food for a vegan in a war zone.
About the Reviewers
Andy Grant works as a technical consultant for HP Enterprise Services. Andy's primary focus is datacenter infrastructure and virtualization projects across a number of industries, including government, healthcare, forestry, financial, gas and oil, and international contracting. He currently holds a number of technical certifications, including VCAP4/5-DCA/DCD, VCP4/5, MCITP:EA, MCSE, CCNA, Security+, A+, and HP ASE BladeSystem.

Outside of work, he enjoys backcountry camping, playing action pistol sports (IPSC), and spending time being a goof with his son.
Muhammad Zeeshan Munir is a freelance ICT consultant and solution architect. He established his career as a system administrator in 2004 and has since acquired and executed many successful projects in multimillion ICT industries. With more than 10 years of experience, he now provides ICT consultancy services to different clients in Europe. He also works as a system consultant for Qatar Computing Research Institute. He regularly contributes to different wikis and produces various video tutorials, mostly about different technologies, including VMware products, Zimbra e-mail services, OpenStack, and Red Hat Linux, which can be found at http://zee.linxsol.com/system-administration. When he is doing nothing, he likes to travel around, and he speaks English, Urdu, Punjabi, and Italian.
Prasenjit Sarkar is a senior member of the technical staff in VMware Service Provider Cloud R&D, where he provides architectural oversight and technical guidance to design, implement, and test VMware's Cloud datacenters. You can follow him on Twitter at @stretchcloud.

He is an author, R&D guy, and a blogger focusing on virtualization, cloud computing, storage, networking, and other enterprise technologies. He has more than 10 years' expert knowledge in R&D, professional services, alliances, solution engineering, consulting, and technical sales, with expertise in architecting and deploying virtualization solutions and rolling out new technology and solution initiatives. His primary focus is on the VMware vSphere infrastructure and public cloud using VMware vCloud Suite. Another of his areas of focus is owning the entire life cycle of a VMware-based IaaS (SDDC), especially vSphere, vCloud Director, vShield Manager, and vCenter Operations.

He was one of the VMware vExperts in 2012 and 2013 and is well known for his acclaimed virtualization blog, http://stretch-cloud.info. He holds certifications from VMware, Cisco, Citrix, Red Hat, Microsoft, IBM, HP, and Exin. Prior to joining VMware, he served other fine organizations, such as Capgemini, HP, and GE, as a solution architect and infrastructure architect.
I would like to thank and dedicate this book to my family. Without their endless and untiring support, this book would not have been possible.
Support files, eBooks, discount offers, and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Why subscribe?

• Fully searchable across every book published by Packt
• Copy and paste, print and bookmark content
• On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Instant Updates on New Packt Books
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.
Table of Contents

Preface
Chapter 1: Virtual Data Center Design
Chapter 2: Hypervisor Design
    Summary
Chapter 3: Storage Design
    SIOC
    Provisioning
Chapter 4: Network Design
    Summary
Chapter 5: Virtual Machine Design
    Limits
    Reservations
    Summary
Chapter 6: Business Critical Applications
Chapter 7: Disaster Recovery and Business Continuity
Appendix A: vSphere Automation Tools
Appendix B: Certification Resources
Index
Preface

Welcome to vSphere Design Best Practices, an easy-to-read guide full of hands-on examples of real-world design best practices. Each topic is explained and placed in the context of virtual datacenter design.
This book is all about design principles. It isn't about operations or administration; instead, we focus on designing virtual datacenters to meet business requirements, leveraging vSphere to arrive at robust and highly available solutions.

In this book, you'll learn how to utilize the features of VMware to design, architect, and operate a virtual infrastructure using the VMware vSphere platform. Readers will walk away with a sense of all the details and parameters that go into a well-thought-out design.

We'll examine how to customize your vSphere infrastructure to fit business needs and look at specific use cases for live production environments. Readers will become familiar with the new features in Version 5.5 of the vSphere suite and how these features can be leveraged in a design.

Readers will walk away with confidence, knowing what their next steps are towards accomplishing their goals, whether that be a VCAP-DCD certification or a sound design for an upcoming project.
What this book covers
Chapter 1, Virtual Data Center Design, gives you an insight into the design and architecture of the overall datacenter, including the core components of vCenter and ESXi.

Chapter 2, Hypervisor Design, dives into the critical components of the datacenter by providing the best practices for hypervisor design.
Chapter 3, Storage Design, explains one of the most important design points of the datacenter: storage. This chapter focuses attention on the design principles related to storage and storage protocols.

Chapter 4, Network Design, peels back the layers of networking by focusing on designing flexible and scalable network architectures to support your virtual datacenter.

Chapter 5, Virtual Machine Design, covers what it takes to inspect your existing virtual machines and provides guidance and design principles to correctly size and deploy VMs.

Chapter 6, Business Critical Applications, breaks through the barriers and concerns associated with business critical applications, allowing business requirements to translate into a successful and stable deployment.

Chapter 7, Disaster Recovery and Business Continuity, considers the many parameters that make up a DR/BC design, providing guidance to make sound decisions that ensure a well-documented and tested disaster recovery policy.

Appendix A, vSphere Automation Tools, briefly discusses the value of automation tools such as vCenter Orchestrator (vCO) and vCloud Automation Center (vCAC), their differences, and use cases for your design and implementation.

Appendix B, Certification Resources, gives guidance on the VMware certification roadmap and lists resources for the Data Center Virtualization (DCV) track.
What you need for this book
As this book is technical in nature, the reader should possess an understanding of the following concepts:
• Storage technologies (SAN, NAS, Fibre Channel, and iSCSI)
• Networking—both physical and virtual
• VMware vSphere products and technologies such as:
° Hypervisor basics
° vMotion and Storage vMotion
° Cluster capabilities such as High Availability (HA) and Distributed Resource Scheduler (DRS)
Who this book is for
This book is ideal for those who desire a better understanding of how to design virtual datacenters leveraging VMware vSphere and associated technologies. The typical reader will have a sound understanding of VMware vSphere fundamentals and will have been involved in the installation and administration of a VMware environment for more than two years.
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "VMware Communities Roundtable podcast hosted by John Troyer (@jtroyer)."

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "You can adjust your guests' vNUMA configuration on a per-VM basis via Advanced Settings in order to adjust vNUMA to meet your needs."
Warnings or important notes appear in a box like this.

Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and mention the book title via the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.
Questions
You can contact us at questions@packtpub.com if you are having a problem with any aspect of the book, and we will do our best to address it.
Virtual Data Center Design
The datacenter has quite simply become one of the most critical components of organizations today. Some datacenters, either owned by large enterprises or leased by large service providers, can contain great amounts of bleeding-edge technologies, tens of thousands of servers, huge network pipes from multiple carriers, and enough contingency mechanisms to keep their systems running for months in case of a disaster. Other datacenters can consist of just a small handful of servers and off-the-shelf networking gear stuffed in a closet. Every organization is a bit different and has its own unique set of requirements in order to provide IT services to its users. As system administrators, engineers, and architects, we need to be able to take those requirements, sometimes even search for or define them, and design a solution to meet or exceed those requirements.

As datacenters have evolved, we've seen a paradigm shift towards virtualization. This is due to a number of factors, including performance, availability, management, and recoverability. This book aims to call attention to these factors and explains how to address them in your Virtual Data Center designs.

The Virtual Data Center has several key components, such as compute, storage, networking, and management.
We will be covering the following topics in this chapter:
• Core design principles for the Virtual Data Center
• Best Practices for Virtual Data Center design
• How best practices change over time
• Virtual Data Center design scenarios
• vCenter design including the Linux-based vCenter Server Appliance (vCSA)
• vSphere clustering, HA, and DRS
• A consideration of the other components of the vCloud Suite
Virtual Data Center design principles
VMware vSphere is a leading platform for virtualization, with many components that make up the VMware vCloud Suite, including vCenter Server, the ESXi hypervisor, and Site Recovery Manager (SRM). Through these products, the vSphere platform, with proper design characteristics, enables an IT infrastructure to provide the services, availability, flexibility, recoverability, and performance that its customers require.

Apart from knowledge of the aforementioned VMware products (or perhaps any third-party solutions being considered), there are a few key principles to get started with our designs. While there are numerous methodologies out there, this is a framework on which you can base your process, consisting of three basic phases—Conceptual Design, Logical Design, and Physical Design, as shown in the following diagram:

Conceptual Design → Logical Design → Physical Design
It is important to remember that while the process is important, the focus of this book is the design decisions. If you typically subscribe to a different framework or methodology, that is OK; in most cases, the design decisions discussed throughout the book will hold true regardless of your process and methodology.

First, create a Conceptual Design by gathering and analyzing business and application requirements, and then document the risks, constraints, and assumptions. Once all of the information is gathered and compiled, you should be able to formulate a high-level end goal, which is ultimately a vision of the solution. For example, you could have requirements to provide 99.99 percent uptime to an application and to guarantee 2,500 IOPS via shared storage, 25 GHz of CPU, and 48 GB of RAM, while being constrained by a budget of $100,000, among various risks and assumptions. So, conceptually, you'll have a high-level idea that you'll need a certain class of compute, storage, and network to support that type of application.
Next, we take the Conceptual Design and develop a Logical Design. At this stage, we aren't yet choosing vendors or specific hardware; we just know that we need shared storage, and so on. We need to map each of the requirements to a logical solution within the complete design. This also begins to flesh out dependencies between components and services. While it would be great if we could jump straight from the Conceptual Design right to the complete solution, we should be taking smaller steps towards our goal. This step is all about putting ideas down on paper and thinking through the Logical Design before we start procuring hardware and building up the datacenter by trial and error.
Finally, we transition our Logical Design into a Physical Design. While the previous steps allow us to be somewhat theoretical in our work, this phase absolutely requires us to do our homework and put specific parameters around our design. For example, we determine that we need six servers with dual 8-core CPUs, 128 GB of RAM, 15 TB of array-based Fibre Channel attached storage, 4 x 10 GbE NICs, 2 x 8 Gb Fibre Channel HBAs, and so on. In most cases, you'll have this detail down to the manufacturer of each component as well. The Physical Design should consist of all the infrastructure components that will be required to support the requirements gathered in the Conceptual Design phase. For example, a complete Physical Design might include a vSphere Network Design, Storage Design, Compute Design, Virtual Machine (VM) Design, and Management Design. As you can imagine, the documentation for a Physical Design can get quite large and time consuming, but this is a critical step.

In summary, remember that the three design phases are additive and dependent upon one another. In order to have a successful Physical Design, both the Conceptual and Logical Designs must be solid. Document as much as is reasonable in your designs and even include the reasoning behind choices. Many choices seem obvious while making them; however, when it is time to review and deploy a design months later, it may not be as obvious why certain choices were made.
Best practices
We hear the term best practices everywhere, but what does it really mean? Do we always adhere to best practices? What happens to best practices as technology evolves, or as your business evolves? These are great questions, and ones that many people don't seem to ask. There are three scenarios when addressing a design.
You may ignore best practices, strictly follow best practices, or employ a combination of the two. The third scenario is the most common and should be your target. Why? The simplified answer is that there are certainly times when best practices aren't in fact best for your design. A great example is the general best practice for vSphere that the defaults for just about everything in the environment are acceptable. Take storage multipathing: VMware's default path selection policy is Fixed, and if the default is the best practice, you might consider leaving it alone; experience shows, however, that you'll often see performance gains by switching to Round Robin (see VMware KB 1011340 for details). While changing this default may not be "best practice", a best practice might not fit your needs in the real world. Choose what works for you and the constraints of your environment.
Best practices also seem to have a shelf life much longer than intended. A favorite example is that the best size for a vSphere datastore is 500 GB. While this sizing best practice was certainly true at one time, vSphere has improved how it handles datastore locking (reducing SCSI reservation contention), which enables much higher VM-per-datastore density. Since vSphere 5.0, we've been able to create 64 TB datastores, and the best practice is now more about evaluating parameters such as the number of VMs, the number of VMDKs per VM, required IOPS, SLAs, fault domains, and restore capabilities. Many times, the backend disk capabilities will determine the ideal datastore size. Yet many customers continue to use 500 GB as their standard datastore size. There are many more examples, but the bottom line is that we should use best practices as a guideline and then do our homework to determine how to apply them to our designs.
Along with the best practices and design framework we've already discussed, it is also important to define and adhere to standards throughout your Virtual Data Center. This includes items such as naming conventions for ESXi hosts and VMs, as well as IP address schemes, storage layout, network configurations, and security policies. Adhering to these standards will make the design much easier to understand and will also help as changes are introduced into the environment after deployment.
Designing the Virtual Data Center

As previously mentioned, VMware's Virtual Data Center (vDC) is composed of several key components: vCenter, the ESXi hypervisor, CPU, storage, networking, and virtual machines. Many datacenters will have additional components, but these form the foundation of a typical vDC. Note that it is possible to operate, albeit in a limited capacity, without vCenter in the management layer; however, vCenter is really the core of the vDC and enables many features we seem to take for granted, such as vMotion, DRS (Distributed Resource Scheduler), and HA (High Availability).
Our vDCs will also consist of several constructs that we will leverage in our designs to provide organization and operability, such as datacenters, clusters, resource pools, folders, and tags.

By using these constructs, we can bring consistency into complex designs. For example, by using folders and tags, we can organize VMs, hosts, and other objects so that we can easily find them, report on them, or even perform operations, such as backup, based on folders or tags.
Tags are a new feature in vSphere 5.1 and above that work much like tagging in other applications. They allow us to put arbitrary tags on objects in our vDC for organizational purposes. Folders present a challenge when we have objects that could belong in multiple folders (which isn't possible). Many administrators use a combination of folders and tags to both organize their inventory and manage it.
It is important to define a standard folder hierarchy as part of your design, and to include tags in that standard, to help organize and manage your infrastructure.
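As an illustration of reporting against such a hierarchy, here is a minimal sketch using pyVmomi (the Python bindings for the vSphere API). The hostname, credentials, and unverified SSL context are placeholders for a lab environment, and because tags are managed through the separate vSphere Automation API, this sketch groups VMs by folder only:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; use CA-signed certificates in production.
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.com',
                  user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
try:
    content = si.RetrieveContent()
    # Walk the inventory and group VMs by their parent folder.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    by_folder = {}
    for vm in view.view:
        folder = vm.parent.name if vm.parent else '(no folder / vApp)'
        by_folder.setdefault(folder, []).append(vm.name)
    view.DestroyView()
    for folder, vms in sorted(by_folder.items()):
        print(f"{folder}: {len(vms)} VM(s)")
finally:
    Disconnect(si)
```

A consistent hierarchy is what makes this kind of scripted reporting (or folder-based backup selection) trivial rather than fragile.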
Another facet of our designs will be how we might employ multiple vCenters and how we manage them. Many organizations utilize multiple vCenters because they want to manage Production, QA, and Test/Dev separately. Others have vCenters distributed geographically, with a vCenter managing each local infrastructure. In any case, with vSphere 5.5, multiple vCenter management has become much easier.

One option is what is called Linked Mode. This allows us to provide a single pane of access to resources connected to multiple vCenters—think shared user roles/groups and being able to access objects from any vCenters that are linked through a single console. However, with the vSphere Web Client included with vSphere 5.1 and above, we are able to add multiple vCenters to our view in the client. So, if your requirement is to have fewer roles and groups to manage, then Linked Mode may be the best route. If your requirement is to be able to manage multiple vCenters from a single interface, then the standard vSphere Web Client should meet your needs without adding the complexity of Linked Mode.
Next, we have Single Sign-On (SSO). This is a new feature introduced in vSphere 5.1 and improved in vSphere 5.5. Previously, SSO had a difficult installation process and some difficult design decisions with respect to HA and multisite configurations. With SSO in vSphere 5.5, VMware has consolidated down to a single deployment model, so we're able to design HA and multisite configurations much more easily because they are integrated by default.

Finally, there is the subject of SSL certificates. Prior to vSphere 5.0, it was a best practice (and a difficult task) to replace the default self-signed certificates of the vSphere components (vCenter, ESXi, and so on) with CA-signed certificates. That remains the case, but fortunately, VMware has developed and released the vCenter Certificate Automation tool, which greatly reduces the headaches caused when attempting to manually generate and replace all of those SSL certificates. Although not a steadfast requirement, it is certainly recommended to include CA-signed certificates in your designs.
VMware vCenter components

There are four primary components of vCenter 5.1 and 5.5:

• Single Sign-On: This provides identity management for administrators and applications that interact with the vSphere platform.
• vSphere Web Client: This is the service that provides the web-based GUI for administrators to interact with vCenter and the objects it manages.
• vCenter Inventory Service: This acts as a cache for vCenter managed objects when accessed via the Web Client, to provide better performance and reduce lookups against the vCenter database.
• vCenter Server: This is the core service of vCenter, which is required by all other components.

There are also several optional components and support tools, which are as follows:

• vSphere Client: This is the "legacy" C#-based client that will no longer be available in future versions of vSphere (although it is still required to interact with vSphere Update Manager).
• vSphere Update Manager (VUM): This is a tool used to deploy upgrades and patches to ESXi hosts, as well as VMware Tools and virtual hardware version updates to VMs.
• vSphere ESXi Dump Collector: This is a tool used mostly with stateless ESXi hosts (that is, hosts with no local storage) to dump VMkernel memory to a network location; those dumps can then be pulled back into vCenter.
• vSphere Syslog Collector: This is a tool, similar to the Dump Collector, that allows ESXi system logs to be redirected to a network location and accessed via vCenter.
• vSphere Auto Deploy: This is a tool that can provision ESXi hosts over the network and load ESXi directly into memory, allowing for stateless hosts and efficient provisioning.
• vSphere Authentication Proxy: This is a service that is typically used with Auto Deploy so that hosts can be joined to a directory service, such as Microsoft Active Directory, without the need to store credentials in a configuration file.
At the onset of your design, these components should be accounted for (if used) and integrated into the design. For small environments, it is generally acceptable to combine these services and roles on the vCenter Server, while larger environments will generally want to separate these roles as much as possible to increase reliability by reducing the fault domain. This ensures that the core vCenter services have the resources they need to operate effectively.
Choosing a platform for your vCenter server

Now that we've reviewed the components of vCenter, we can start making some design decisions. The first design decision we'll need to make relative to vCenter is whether you'll have a physical or virtual vCenter. VMware's own recommendation is to run vCenter as a VM. However, just as with best practices, we should examine and understand why before making a final decision. There are valid reasons for having a physical vCenter, and it is acceptable to do so. However, if we make the appropriate design decisions, a virtual vCenter can benefit from all the flexibility and manageability of being a VM without a high amount of risk.
One way to mitigate risk is to utilize a Management Cluster (sometimes called a management pod) with dedicated hosts for vCenter, DNS, Active Directory, monitoring systems, and other management infrastructure. This benefits us in a few ways, such as being able to more easily locate these types of VMs when vCenter is down. Rather than connecting to multiple ESXi hosts via Secure Shell (SSH), or using the vSphere Client to connect to multiple hosts in order to locate a domain controller to be manually powered on, we're able to identify its exact location within our management pod. Using a management cluster is sometimes not an option due to the costs involved: the dedicated hosts, a separate vCenter server for management, and the overhead of managing a separate cluster. In smaller environments, it isn't such a big deal, because we don't have many hosts to search through for VMs. But, in general, a dedicated cluster for management infrastructure is recommended.

A second way to mitigate risk for vCenter is to use a product called vCenter Heartbeat. VMware vCenter Heartbeat provides automated and manual failover and failback of your vCenter Server and ensures availability at the application and service layer. It can restart or restore individual services while monitoring and protecting the Microsoft SQL Server database associated with vCenter Server, even if it's installed on a separate server. For more information on vCenter Heartbeat, you can go to the product page at http://www.vmware.com/products/vcenter-server-heartbeat/.

In general, if choosing a virtual vCenter, do everything possible to ensure that the vCenter VM has a high priority for resources and HA restarts. One option is to set the CPU and memory resource shares to High for the vCenter VM. Also, do not disable DRS or use host affinity rules for the vCenter VM to try to reduce its movement within the vDC; this will negate many of the benefits of having a virtual vCenter.
Using the vCenter Server Appliance

Traditionally, this decision hasn't really required much thought on our part. The vCenter Server Appliance (vCSA, sometimes called vCVA or the vCenter Virtual Appliance) prior to vSphere 5.5 was not supported for production use. It also had a limit of managing five hosts and 50 VMs. However, with vSphere 5.5, the vCSA is fully supported in production and can manage 1,000 hosts and 10,000 VMs. See the vSphere 5.5 Configuration Maximums at http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf.
So why would we want to use an appliance versus a full Windows VM? The appliance is Linux-based and comes with an embedded vPostgres database, which will support up to 100 hosts and 3,000 VMs; in order to scale higher, you need to use an external Oracle database server. Environments can save on Windows and MS SQL licensing by leveraging the vCSA, and it is also a much simpler deployment. There are a few caveats, however: VMware Update Manager (VUM) still requires a separate Windows-based host with an MS SQL database as of vSphere 5.5. If you are running Horizon View, then you'll need a Windows server for the View Composer service, and a similar situation exists if you are running vCloud Director. The general idea, though, is that the vCSA is easier to manage and is a purpose-built appliance. The vCSA may not be for everyone, but I would encourage you to at least consider it for your designs.
Sizing your vCenter server

If you're using the vCSA, sizing is less of an issue, but there are still guidelines for disk space and memory, as shown in the following table. Note that this only applies to vCSA 5.5 and above, as the vCSA prior to 5.5 is only supported with a maximum of five hosts and 50 VMs.
VMware vCenter Server Appliance hardware requirements:

Disk storage on the host machine: The vCenter Server Appliance requires at least 7 GB of disk space and is limited to a maximum size of 80 GB. The appliance can be deployed with thin-provisioned virtual disks that can grow to the maximum size of 80 GB. If the host machine does not have enough free disk space to accommodate the growth of the appliance's virtual disks, vCenter Server might cease operation, and you will not be able to manage your vSphere environment.

Memory in the VMware vCenter Server Appliance:

• Very small (≤ 10 hosts, ≤ 100 virtual machines): at least 4 GB
• Small (10-100 hosts or 100-1,000 virtual machines): at least 8 GB
• Medium (100-400 hosts or 1,000-4,000 virtual machines): at least 16 GB
• Large (≥ 400 hosts or ≥ 4,000 virtual machines): at least 24 GB

Source: VMware KB 2052334
If you are choosing a Windows-based vCenter, you need to size your machine appropriately based on the size of your environment. This includes CPU, memory, disk space, and network speed/connectivity.
The number of clusters and VMs within a vCenter can absolutely affect performance. This is due to the DRS calculations that must be made; more objects mean more calculations. Take this into account when estimating resource requirements for your vCenter to ensure performance and reliability.

It is also recommended to adjust the JVM (Java Virtual Machine) heap settings for the vCenter Management Web service (tc Server), Inventory Service, and Profile-Driven Storage Service based on the size of the deployment. Note that this only affects vCenter 5.1 and above.
The following VMware KB articles have up-to-date information for your specific vCenter version and should be referenced for vCenter sizing:

vCenter version | VMware KB article number
5.0 | 2003790
5.1 | 2021202
5.5 | 2052334
Choosing your vCenter database

The vCenter database is the location where all configuration information, along with some performance, task, and event data, is stored. As you might imagine, the vCenter DB can be a critical component of your virtual infrastructure. But it can also be disposable, although this is a much less adopted strategy than it was a couple of years ago. In some environments that weren't utilizing features like the vSphere Distributed Switch (VDS) and didn't have a high reliance on the stored event data, we saw that vCenter could be quickly and easily reinstalled from scratch without much of an impact. In today's vDCs, however, we see more widespread use of the VDS and peripheral VMware technologies that make the vCenter DB much more important and critical to keep backed up. For example, the VDS configuration is housed in the vCenter DB, so if we were to lose the vCenter DB, it would become difficult and disruptive to migrate those hosts and their VMs to a new vCenter.
There are two other components of vCenter that require a DB: VUM and SSO. SSO is a component introduced in vSphere 5.1 and then completely rewritten in vSphere 5.5. If you haven't deployed 5.1 or 5.5 yet, I'd strongly encourage you to go to 5.5 when you're ready to upgrade, due to this re-architecture of SSO. In vSphere 5.1, the SSO setup requires manual DB configuration with SQL authentication and JDBC SSL, which caused many headaches and many support tickets with VMware support during the upgrade process.
We have the following choices for our vCenter DB:

• MS SQL Express
• MS SQL
• IBM DB2
• Oracle

Different versions of these DBs are supported depending on the version of vCenter you are deploying, so check the VMware Product Interoperability Matrixes located at http://partnerweb.vmware.com/comp_guide/sim/interop_matrix.php.
The MS SQL Express DB is used for small deployments—a maximum of five hosts and 50 VMs—and is the easiest to set up and configure. It is sometimes an option for dev/test environments as well.
Next, we have probably the most popular option: MS SQL. MS SQL can be installed on the vCenter server itself, but it is recommended to have a separate server dedicated to SQL. Again, in smaller deployments it is sometimes OK to install the SQL DB on the vCenter server, but the general best practice is to have a dedicated database server for better protection, flexibility, and availability. With vCenter 5.5, we now have broader support for Windows Failover Clustering (formerly Microsoft Cluster Service, or MSCS), which enhances our options for the availability of the vCenter DB.
Last, we have Oracle and IBM DB2. These two options are common in organizations that already have these databases in use for other applications. Each database type has its own benefits and features. Choose the database that will be the easiest to manage while providing the features that are required. In general, any of the database choices is completely capable of providing the features and performance vCenter needs.
vSphere clustering – HA and DRS

Aside from vCenter itself, HA and DRS are the two features that help unlock the true potential of virtualization. This combination of automatic failover and workload balancing results in more effective and efficient use of resources. The mechanics of HA and DRS are a full topic in and of themselves, so we'll focus on how to design with HA and DRS in mind.
The following components will be covered in this section:
• Host considerations
• Networking design considerations
• Storage design considerations
• Cluster configuration considerations
• Admission control
Host considerations

Providing high availability to your vSphere environment means making every reasonable effort to eliminate any single point of failure (SPoF). Within our hosts, we can make sure we're using redundant power supplies, ECC memory, and multiple I/O adapters (such as NICs and HBAs), and that we're using appropriate remote monitoring and alerting tools. We can also distribute our hosts across multiple racks or blade chassis to guard against the failure of a rack or chassis bringing down an entire cluster. So not only is component redundancy important, but it is also important to look at physical location.
Another item related to our hosts is using identical hardware. Although this is not always possible, such as when hosts have different lifecycles, we should make an effort to be consistent in our host hardware within a cluster. Aside from simplifying configuration and management, hardware consistency reduces resource fragmentation. For example, if we have hosts with Intel E5-2640 CPUs and hosts with E5-2690 CPUs, the cluster will become unbalanced, resulting in fragmentation. HA prepares for a worst-case scenario and plans for the largest host in a cluster to fail, resulting in more resources being reserved for that HA event. This results in lower efficiency of resource utilization.
The next consideration with HA is the number of hosts in your design. We typically look for an N+1 configuration for our cluster. N+1 represents the number of hosts we need to service our workload (N) plus one additional host in case of a failure. Some describe this as having a "hot spare", but in the case of vSphere, we're typically using all of the hosts to actively serve workload.
Trang 30Chapter 1
[ 17 ]
In some cases, it can be acceptable to actually reserve a host or hosts for failover. This is a true hot spare scenario, where the designated hosts sit idle and are used only in the case of an HA event. This is an acceptable practice, but it isn't the most efficient use of resources.
VMware vSphere 5.5 supports clusters of up to 32 hosts, and it is important to understand how HA reserves resources based on cluster size. HA essentially reserves an entire host's worth of resources and spreads that reservation across all hosts in the cluster. For example, if you have two hosts in a cluster, 50 percent of the total cluster resources are reserved for HA. To put it another way, if one host fails, the other host needs to have enough resource capacity to service the entire workload. If you have three hosts in a cluster, then 33 percent of the cluster's resources are reserved for HA. You may have picked up on the pattern: HA reserves 1/N of the resources in the cluster, where N is equal to the number of hosts in the cluster. As we increase the number of hosts in a cluster, we see a smaller percentage of resources reserved for HA, which results in higher resource utilization. One could argue that larger clusters increase complexity, but that is offset by the benefits of higher utilization. The important takeaway here is to remember that these reserved resources need to be available to the hosts. If they are being used up by VMs and an HA event occurs, it is possible that VMs will not be restarted automatically and you'll experience downtime.
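The arithmetic behind this pattern is simple enough to sketch in a few lines of Python (assuming roughly identical hosts):

```python
def ha_reserved_fraction(n_hosts, failures_tolerated=1):
    """Fraction of cluster capacity HA reserves to tolerate host failures."""
    return failures_tolerated / n_hosts

for n in (2, 3, 4, 8, 16, 32):
    print(f"{n:2d} hosts -> {ha_reserved_fraction(n):.1%} reserved for HA")
```

As the cluster grows from 2 to 32 hosts, the fraction reserved for a single host failure falls from 50 percent to roughly 3 percent, which is exactly the utilization benefit of larger clusters described above.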
Last, with respect to hosts, is versioning. It is absolutely recommended to have all hosts within a cluster at the same major and minor version of ESXi, along with the same patch level. Mixed-host clusters are supported, but because there are differences in HA, DRS, and other features, functional and performance issues can result. If different host versions are required, it is recommended to separate those hosts into different clusters, that is, a cluster for 5.0 hosts and a cluster for 5.5 hosts.
Networking considerations

HA is sensitive to the behavior of the physical network. For example, while Spanning Tree Protocol (STP) converges, a switch port can block traffic long enough that a host's network heartbeats are lost; the host can then be declared as dead, and VMs will not start up on that host. This can also trigger an isolation response due to the datastore heartbeat. In either case, this can be prevented by enabling PortFast on those ports on the connected switches. Some switch vendors may call the feature something different, but the idea of PortFast is that STP will allow traffic to pass on those ports while it completes its calculations.

HA relies on the management network of the hosts, which brings with it several design considerations. Each host should be able to communicate with all other hosts' management interfaces within a cluster. Most of the time, this is not an issue, because our network designs typically have all of our hosts' management VMkernel ports on the same layer 2 subnet, but there are scenarios where this is not the case. Therefore, it is important to make sure all hosts within a cluster can communicate properly to ensure proper HA functionality and prevent network partitions. Along these same lines, we should design our networks so that we have the fewest possible number of hops between hosts. Additional hops, or hardware segments, can cause higher latency and increase the points of failure.
Because the management VMkernel port is critical for proper HA functionality, it is imperative that we design in redundancy for this network. This means having more than one physical NIC (pNIC) in our hosts assigned to this network. If there is a failure in the management network, we may still get host isolation, since HA won't have a means to test connectivity between vSphere servers.

When standardizing your network configuration, you should use consistent names for objects such as port groups, VLANs, and VMkernel ports. Inconsistent use of names across our hosts can result in VMs losing network connectivity after a vMotion, or can even prevent a vMotion or an HA restart. This, of course, results in increased downtime.
Storage considerations

HA, as previously mentioned, also relies on datastore heartbeats to determine whether an HA event has occurred. Prior to vSphere 5.0, we only had the network to determine whether a host had failed or become isolated. Now we can use datastores to aid in that determination, enhancing HA functionality. Datastore connectivity is inherently important—after all, our VMs can't run if they can't connect to their storage, right? So, in the case where we are utilizing a storage array, practices such as multipathing continue to be important, and we need to make sure our hosts have multiple paths to our storage array, whether that is iSCSI, NFS, FC, or FCoE.

vCenter will automatically choose two datastores to use for heartbeats. However, we do have the capability to override that choice. A scenario where this makes sense is if we have multiple storage arrays; it would then be recommended to have a heartbeat datastore on two different arrays. Note that we can have up to five heartbeat datastores. Otherwise, the algorithm for choosing the heartbeat datastores is fairly robust and normally does not need to be tampered with. Another consideration is that if we are using all IP-based storage, we should try to have separate physical network infrastructure to connect to that storage, to minimize the risk of the management network and storage network being disrupted at the same time. Last, wherever possible, all hosts in a cluster should have access to the same datastores, to maximize the ability of isolated hosts to communicate via the datastore heartbeat.
Cluster considerations

Of the many configurations we have within a cluster for HA, Host Isolation Response seems to be the one most often neglected. While we previously talked about how to make our management networks robust and resilient, it is still important to decide how we want our cluster to respond in the event of a failure. We have the following three options to choose from for the Host Isolation Response:

• Leave Powered On: VMs will remain running on their host even though the host has become isolated.
• Power Off: VMs will be forcibly shut down, as if the power plugs were pulled from a host.
• Shut Down: Using VMware Tools, this option attempts to gracefully shut down the VM; after a default timeout period of 300 seconds, it will then force the power off.
In many cases, the management network may be disrupted while VMs are not affected. In a properly architected environment, host isolation is a rare occurrence, especially with a Fibre Channel SAN fabric, as datastore heartbeats will be out-of-band of the network. If the design requires IP storage that uses the same network infrastructure as the management network, then the recommended option is Shut Down during isolation, as it is likely that a disruption in the management network will also affect access to the VMs' datastore(s). This ensures that another host can power on the affected VMs.

Another HA feature is VM and application monitoring. This feature uses VMware Tools or an application agent to send heartbeats from the guest OS or application to the HA agent running on the ESXi host. If a consecutive number of heartbeats are lost, HA can be configured to restart the VM. The recommendation here is to utilize this feature and decide ahead of time how tolerant of heartbeat failures you wish to be.
Admission control

Admission Control (AC) is another great feature of HA. Because AC can prevent VMs from being powered on during normal operation, or restarted during an HA event, many administrators tend to turn AC off. HA uses AC to ensure that sufficient resources exist in a cluster and are reserved for VM recovery during an HA event. In other words, without AC enabled, you can overcommit your hosts to the point that, if there were to be an HA event, you may find yourself with some VMs that are unable to power on. Let's take another look at a previous example where we have three hosts in a cluster; HA reserves 33 percent of that cluster for failover. Without AC enabled, we can still utilize that 33 percent reserve, which puts us at risk. With AC, we are unable to utilize those reserved resources, which mitigates the risk of overprovisioning and ensures we'll have enough resources to power on our VMs during an HA event. When Admission Control is enabled, we can choose from the following three policies:

• Host Failures Cluster Tolerates (default): This reserves resources based on the number of host failures we want to tolerate. For example, in a 10-host cluster, we might want to tolerate two host failures, so instead of the default of one host failure tolerated (remember the 1/N reservation pattern), we would allow for two host failures in the cluster.
HA actually uses something called a "slot size" when determining resource reservations. Slot sizes are calculated on a worst-case basis; thus, if you have VMs with differing CPU and memory reservations, the slot size will be calculated from the VM with the highest reservations. This can greatly skew the slot size and result in more resources being reserved for failover than necessary.
• Percentage of Cluster Resources Reserved: This reserves the specified percentage of CPU and memory resources for failover. If you have a cluster where VMs have significantly different CPU and memory reservations, or hosts with different CPU and memory capacities, this policy is recommended. This setting must be revisited as the cluster grows or shrinks.
• Specify a Failover Host: This designates a specific host or hosts as a hot spare.
The general recommendation is to use the Percentage of Cluster Resources Reserved policy for Admission Control, because it offers maximum flexibility. Design the percentage reserved to represent the number of host failures to be tolerated. For example, in a four-host cluster, if you want to tolerate one host failure, your percentage reserved is 25 percent; if you want to tolerate two host failures, it is 50 percent. If there will be hosts of different sizes in the cluster, be sure to use the largest host(s) as your reference for determining the percentage to reserve. Similarly, if the Specify a Failover Host policy is used, we must specify hosts that have CPU and memory capacity equal to or larger than the largest non-failover host in the cluster.
If the Host Failures Cluster Tolerates policy is to be used, then make every reasonable attempt to use similar VM resource reservations across the cluster. This is due to the slot size calculation mentioned earlier.
Summary

In this chapter, we learned about the design characteristics of the Virtual Data Center and the core components of the VMware vSphere platform. We also discussed best practices and design considerations, specifically around vCenter, clustering, and HA. In the next chapter, we'll dive into best practices for the design of the ESXi hypervisor.
Hypervisor Design
This chapter begins a series of chapters regarding the architecture of specific virtual datacenter components. The hypervisor is a critical component, and this chapter will examine various best practices for hypervisor design.

We will cover the following topics in this chapter:
• ESXi hardware design including CPU, memory, and NICs
• NUMA, vNUMA, and JVM considerations
• Hypervisor storage components
• Stateless host design
• Scale-Up and Scale-Out designs
ESXi hardware design

Over the past 15 years, we've seen a paradigm shift in server technology from mainframe and proprietary systems, such as PowerPC, Alpha, and Itanium, to commodity x86 architecture. This shift, in conjunction with x86 virtualization, has yielded tremendous benefits and has allowed us both flexibility and standardization in our hardware designs. If we take a look at the vSphere Hardware Compatibility List (HCL), we only see two CPU manufacturers: Intel and AMD (Advanced Micro Devices). The VMware HCL can be found at http://www.vmware.com/resources/compatibility/search.php.

Between these two manufacturers, however, there is a wide range of CPUs to choose from, and VMware does a great job of supporting new CPUs as they are released and of developing new technologies that further enhance the capabilities of x86 virtualization.
Trang 37mentioned in Chapter 1, Virtual Data Center Design, standardization goes a long
way and our host designs are no different However, there are certainly instances
in which we may choose different CPUs for our hosts It remains a best practice
to keep the same CPU model consistent throughout a cluster If we find ourselves
in a situation where we need different CPU models within a cluster, such as
adding new hosts to an existing cluster, we have a great feature called Enhanced
vMotion Compatibility (EVC) As long as we're running CPUs from the same chip
manufacturer (that is all Intel or all AMD), we can mask CPU features to a lowest common denominator to ensure vMotions can occur between different chip models For example, without EVC, we could not vMotion a VM from a host with an Intel Xeon 5500-series CPU to a host with the latest Intel Xeon E5-2600 CPU With EVC this is completely possible and works very well For more information on the EVC requirements, see VMware KB 1003212
CPUs have many parameters to look at when we're choosing the best one(s) for our host design. The following parameters are the most common factors in determining the appropriate CPU for a host:

• Cores/threads
• Clock speed
• Cache
• Power draw (watts)
• vSphere feature support
• Memory support
• Maximum CPUs supported (2-way, 4-way, 6-way, and so on)

Some of these parameters may be integral to a design while others go unused, but we still need to consider the effects each one may have on our designs. Provided we have a good conceptual design, and thus proper requirements, we should be able to determine how many CPU resources we will need; perhaps a requirement of eight cores per socket or 12 GHz of CPU per host is specified, or, more commonly, a density of 60 VMs per host. We can then profile the workload to determine the proper range of CPUs to use. A range is used because cost is almost always an influencing factor; initially, we may come up with a range and then research costs to narrow down our selection.
Continuing with the example where we require 60 VMs per host, there is an important metric to look at when considering consolidation ratios: the ratio of vCPUs to pCPUs, denoted as vCPU:pCPU. A vCPU is a virtual CPU that is presented to a VM, while a pCPU represents a physical CPU in the host system. Furthermore, with technologies such as hyperthreading, a pCPU can represent a logical processor. For instance, a CPU with eight cores and hyperthreading enabled yields 16 pCPUs, because each core can execute two threads. Each thread represents a logical processor, and thus 8 x 2 = 16 logical processors, or 16 pCPUs. VMware's best practices for performance with hyperthreading can be read about in more detail at http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf.
Sometimes we have project requirements that allow us to start with a specific goal for the overall vCPU:pCPU ratio, or specific ratios for different parts of the infrastructure. From there, we can choose a CPU and server platform that can accommodate those requirements. Let's get back to that 60 VMs per host example. We will assume that these are all general purpose VMs with no special requirements, averaging 1.5 vCPUs per VM because of a mixture of one-vCPU and two-vCPU machines, and that our design targets a vCPU:pCPU ratio of 2:1. The math is as follows: 1.5 x 60 = 90 vCPUs. Therefore, we'll require at least 45 pCPUs per host to achieve our 2:1 target (90:45). An eight-socket server with six-core CPUs would yield 48 pCPUs and would accommodate the needs of this design; a quad-socket configuration with 12-core CPUs would also yield 48 pCPUs. There are many ways to make the math work, and there are other considerations, such as high availability configurations and additional capacity for growth, that we will discuss as we continue our design.
For general purpose, predictable workloads that don't have special requirements (such as SQL, Exchange, SharePoint, and VDI), we typically see a vCPU:pCPU ratio of 2:1 to 4:1 inclusive, with higher ratios for test and development environments. But this is certainly a situation where your mileage may vary, so you need to know your applications and workload in order to intelligently choose the appropriate ratio for your environment.
Some of the other factors, such as cache and clock speed, are a bit more straightforward: more cache and more speed equal higher performance in most cases. If you are power conscious, then you'll want to take into account the power draw (wattage) of each CPU. Power consumption modeling is often used to determine the TCO (total cost of ownership) of systems in a datacenter, especially when using co-location facilities where you are charged for power consumption. Unless we're specifying refurbished or otherwise dated CPUs, new ones are typically going to support all of VMware's feature set. If you are using older equipment, it is vital to review the HCL to validate that your infrastructure is supported.
Memory and NUMA considerations

While CPUs provide the raw processing power for our VMs, memory provides the capacity for our applications and workloads to run.

High Availability (HA) admission controls are also important here. If strict admission controls are enabled, then guests that violate HA controls won't come back online after a failure. Instead, by employing relaxed admission controls, you'll see performance degradation, but you won't have to deal with the pain of HA causing VMs to power off and not come back. By considering the speed, quantity, and density of our host-based RAM, we're able to avoid those issues and provide predictable performance to our virtual infrastructure. This means that your design depends on sizing memory and calculating HA minimums to ensure you don't cause problems.
At the highest level, the total amount of host memory is pretty straightforward. As we take into consideration our workload and application requirements, we can start by simply dividing the total amount of RAM required by the number of hosts to get the amount of memory we want in each host. Just as before, there are tools to help us determine the amount of memory we need. VMware Capacity Planner is one of those tools and is available to VMware partners. VMware also has some free flings, such as vBenchmark and ESX System Analyzer. Other tools, such as Dell Quest's vFoglight and SolarWinds Virtualization Manager, are commercially available and can provide detailed reporting and recommendations for properly sizing your virtual environment.

In most cases, we're designing an N+1 architecture, so we'll add an extra host with that amount of RAM into the cluster. For example, if our overall workload requires 1 TB of RAM, we may design for four hosts, each with 256 GB of RAM. To achieve the proper failover capacity required for HA and our N+1 architecture, we'll add a fifth host, for a total of 1.25 TB.
NUMA (non-uniform memory access) is an architecture in which the system memory (RAM) is divided up into NUMA "nodes". Each node is tied to a pCPU (remember, this can also be a CPU core) and represents the set of RAM (or address space) that can be accessed quickest by a given pCPU.
Each NUMA node is situated closest to its associated pCPU, which reduces the time needed to access that memory. Memory access contention is reduced because only a single pCPU can access a given memory space at a time, and pCPUs may simultaneously access their own NUMA nodes, resulting in dramatically increased performance over non-NUMA (UMA) systems. The following diagram shows the NUMA architecture:

(Diagram: NUMA Architecture)
Some server manufacturers include settings in the server's BIOS that will impact NUMA. One such setting is called Node Interleaving, and it should be disabled if we want to take advantage of NUMA optimizations. For more information, have a look at the white paper from VMware at https://www.vmware.com/files/pdf/
Sizing a VM with more vCPUs or memory than a single NUMA node provides results in an increase in memory access times, because the VM will span NUMA nodes and will need to access "remote memory", or memory outside of its NUMA node.
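A quick way to reason about this is to check a VM's size against a single node. The host figures below are hypothetical, and the sketch assumes one NUMA node per socket with RAM divided evenly, which is typical but should be confirmed for your hardware:

```python
# Hypothetical two-socket host: 8 cores per socket, 256 GB RAM.
sockets, cores_per_socket, host_ram_gb = 2, 8, 256
node_cores = cores_per_socket
node_ram_gb = host_ram_gb / sockets       # 128 GB per NUMA node

def fits_in_node(vm_vcpus, vm_ram_gb):
    """True if the VM can stay local to one NUMA node."""
    return vm_vcpus <= node_cores and vm_ram_gb <= node_ram_gb

print(fits_in_node(8, 96))    # True: node-local memory access
print(fits_in_node(12, 96))   # False: spans nodes, remote memory access
```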