The fourth task is to successfully move a system board between domains without crashing the Solaris OEs running on each domain.
Note – Do not perform this task if you were directed by your instructor to combine your domain with another student group in Module 5, “Configuring Sun Fire HES Domains.” Instead, perform the steps in “Task 4a – Moving Physical Resources Between Domains” on page 6-55.
Perform the following steps:
1. Log in to the system controller.
2. Open an additional window on the administration workstation and log in to the domain that will receive the system board.
3. Boot the Solaris OS on each domain.
4. Verify that the ACL for each domain contains the board (slot) to be moved.
If the board is not listed in the domain’s ACL, enter it at this time.
5. Logically disconnect the system board from its current domain.
6. Verify that the disconnect operation completed successfully.
7. Configure the newly available system board into the destination domain.
8. Verify that the configure operation completed successfully.
9. Use the appropriate command in the domain to verify that the processors are available to the Solaris OS running on the destination domain.
Task 4a – Moving Physical Resources Between Domains
This task involves reconfiguring your domain by returning (moving) the two system board sets you added in Module 5, “Configuring Sun Fire HES Domains,” to their original domain without crashing the Solaris OEs running on the current domain.
Note – Perform this lab only if you were directed by your instructor to combine your domain with another student group in Module 5, “Configuring Sun Fire HES Domains.”
Perform the following steps:
1. Log in to the system controller.
2. Open an additional window on the administration workstation and log in to the domain that will receive the system board.
3. If necessary, boot the Solaris OS in your domain.
4. Verify that the ACL for each domain contains the board (slot) to be moved.
If the board is not listed in the domain’s ACL, enter it at this time.
5. Logically disconnect the system board from its current domain.
6. Verify that the disconnect operation completed successfully.
7. Assign the newly available system board to the original domain as specified in Module 5, “Configuring Sun Fire HES Domains.”
8. Initialize and boot the new domain.
9. Verify that the new domain’s devices are functioning properly.
Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercise.
Manage the discussion based on the time allowed for this module, which was provided in the “About This Course” module. If you do not have time to spend on discussion, highlight just the key concepts students should have learned from the lab exercise.
● Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or especially confusing areas at this time.
● Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
● Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
● Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
Following are task solutions for this exercise.
Task 1 – Replacing a System Board
The first task is to successfully remove and install a system board without halting the Solaris OS.
Perform the following steps:
1. Log in to the system controller on which you will perform DR operations.
2. Boot the Solaris OS in your domain.
3. Using the appropriate DR command, check the status of the system board identified by your instructor at the beginning of this exercise.
Write your findings in the following spaces.
Ap_Id: SB0
Receptacle: Connected
Occupant: Configured
Condition: OK
Information: Powered
When: Feb 24 10:30
Type: CPU
Busy: n
Phys_Id: /devices/pseudo/dr@0:SB0
4. List the issues that must be considered before performing DR operations on the system board.
Quiescence
Permanent memory
Vital system resources
Status LEDs
How can you determine if the system board contains permanent memory?
Run the rcfgadm -d Dom_ID -av Ap_ID command. Look for the system board that has permanent in the information field.
5. Using the appropriate commands, logically disconnect the system board from the domain.
Did the DR operation complete successfully? Yes
If not, contact your instructor if you need help.
6. Recheck the status of the system board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: SB0
Receptacle: Disconnected
Occupant: Unconfigured
Condition: Unknown
Based on the status, can the system board be physically removed from the domain? Yes
7. Visually observe the status LEDs on the system board.
Based on this observation, can the system board be physically removed from the domain? Yes
The LEDs should be in the following state:
Activated: Off
Fault: Off
Removal OK: On
8. If the status from Steps 6 and 7 indicates that the system board can be physically extracted, remove the board from the system.
Did the system board removal crash the Solaris OS? No
If so, contact your instructor if you need help.
9. Reinstall the system board into the system.
10. Check the status of the newly installed system board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: SB0
Receptacle: Disconnected
Occupant: Unconfigured
Condition: Unknown
11. Using the appropriate commands, logically configure the system board in the domain.
rcfgadm -d Dom_ID -c connect SB0
rcfgadm -d Dom_ID -c configure SB0
Did the DR operation complete successfully? Yes
12. Recheck the status of the newly configured system board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: SB0
Receptacle: Connected
Occupant: Configured
Condition: OK
13. Use the appropriate command in the domain to verify that the processors are available to the Solaris OS.
# psrinfo
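The check in Step 13 can also be scripted. The following Python sketch counts the processors that psrinfo reports as on-line. It assumes the default psrinfo output, in which each line begins with a processor ID followed by its state. The helper name count_online is hypothetical, and the sample text is illustrative only; it was not captured from a lab system.

```python
def count_online(psrinfo_output):
    """Count processors whose state column reads 'on-line'."""
    online = 0
    for line in psrinfo_output.splitlines():
        fields = line.split()
        # Default psrinfo lines look like: <id> <state> since <date> <time>
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "on-line":
            online += 1
    return online


# Illustrative sample only; processor IDs and timestamps are made up.
SAMPLE = """\
0       on-line   since 02/24/2004 10:30:12
1       on-line   since 02/24/2004 10:30:14
2       off-line  since 02/24/2004 10:45:01
3       on-line   since 02/24/2004 10:30:14
"""

print(count_online(SAMPLE))  # prints 3
```

Comparing the count before the board is removed and after it is reconfigured is one way to confirm that all of the board's processors returned to service.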
Task 2 – Replacing an I/O Board
The second task is to successfully remove and install an I/O board without halting the Solaris OS.
Perform the following steps:
1. Log in to the system controller on which you will perform DR operations.
2. If necessary, boot the Solaris OS in your domain.
3. Using the appropriate DR command, check the status of the I/O board identified by your instructor at the beginning of this exercise.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: IO0
Receptacle: Connected
Occupant: Configured
Condition: OK
Information: Powered
When: Feb 24 10:41
Type: io
Busy: n
Phys_Id: /devices/pseudo/dr@0:IO0
4. List the issues that must be considered before performing DR operations on the I/O board.
Suspend-safe and suspend-unsafe devices
Quiescence
Hot-plug hardware
Vital system resources
Multipathed I/O
I/O board test
Status LEDs
5. Using the appropriate commands, logically disconnect the I/O board from the domain.
rcfgadm -d Dom_ID -c unconfigure IO0
rcfgadm -d Dom_ID -c disconnect IO0
Did the DR operation complete successfully? Yes
If not, contact your instructor if you need help.
6. Recheck the status of the I/O board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: IO0
Receptacle: Disconnected
Occupant: Unconfigured
Condition: OK
Based on the status, can the I/O board be physically removed from the domain? Yes
7. Visually observe the status LEDs on the I/O board.
Based on this observation, can the I/O board be physically removed from the domain? Yes
The LEDs should be in the following state:
Activated: Off
Fault: Off
Removal OK: On
This is the proper state for I/O board removal.
8. If the status from Steps 6 and 7 indicates that the I/O board can be physically extracted, remove the board from the system.
Did the I/O board removal crash the Solaris OS? No
If so, contact your instructor if you need help.
9. Reinstall the I/O board into the system.
10. Check the status of the newly installed I/O board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: IO0
Receptacle: Disconnected
Occupant: Unconfigured
Condition: Unknown
11. Using the appropriate commands, logically configure the I/O board in the domain.
Did the DR operation complete successfully? Yes
12. Recheck the status of the newly configured I/O board.
Write your findings in the following spaces.
The answers should be similar to the following:
Ap_Id: IO0
Receptacle: Connected
Occupant: Configured
Condition: OK
Task 3 – Replacing an I/O Card
The third task is to successfully remove and install an I/O card without halting the Solaris OS.
Perform the following steps:
1. Log in to the system controller on which you will perform DR operations.
2. If necessary, boot the Solaris OS in your domain.
3. Verify that you have a disk controller card under Sun StorEdge Traffic Manager software or DMP control.
Use the format and luxadm display commands. For DMP, also try the ls -l /dev/vx/dmp command.
4. If the disk controller card is under DMP control, issue the command to disable physical traffic through this path.
vxdmpadm disable ctlr=c1
5. Using the appropriate DR command, check the status of the I/O card. Write your findings in the following spaces.
Ap_Id: The answer varies
Receptacle: The answer varies
Occupant: The answer varies
Condition: The answer varies
Information: The answer varies
When: The answer varies
Type: The answer varies
Busy: The answer varies
Phys_Id: The answer varies
Look for entries like e05b1slot3 in the output of the cfgadm or rcfgadm command.
6. Using the appropriate commands, logically disconnect the I/O card from the domain.
cfgadm -c disconnect AP_ID (as identified previously).
Did the DR operation complete successfully? Yes
If not, contact your instructor if you need help.
7. Recheck the status of the I/O card.
Write your findings in the following spaces.
Ap_Id: The answer varies
Receptacle: The answer varies
Occupant: The answer varies
Condition: The answer varies
Based on the status, can the I/O card be physically removed from the domain? Yes
8. Visually observe the status LEDs on the I/O card.
Based on this observation, can the I/O card be physically removed from the domain? Yes
9. If the status from Steps 7 and 8 indicates that the I/O card can be physically extracted, remove the card from the system.
Did the I/O card removal crash the Solaris OS? No
If so, contact your instructor if you need help.
10. Reinstall the I/O card into the system.
11. Check the status of the newly installed I/O card.
Write your findings in the following spaces.
Ap_Id: The answer varies
Receptacle: The answer varies
Occupant: The answer varies
Condition: The answer varies
12. Using the appropriate commands, logically configure the I/O card in the domain.
cfgadm -c configure e05b1slot3 (for example).
Did the DR operation complete successfully? Yes
13. Recheck the status of the newly configured I/O card.
Write your findings in the following spaces.
Ap_Id: The answer varies
Receptacle: The answer varies
Occupant: The answer varies
Condition: The answer varies
14. If necessary, re-enable DMP traffic on the disk controller card.
vxdmpadm enable ctlr=c1
Exploring Sun Fire HES Capacity-on-Demand Version 2.0
Objectives
Upon completion of this module, you should be able to:
● Configure the platform for Capacity-on-Demand (COD) version 2.0 software
● Identify COD systems
● Identify COD part number and license requirements
● Manage COD right-to-use (RTU) licenses
● Identify COD SMS commands
● Install and remove a COD RTU license key
● Enable COD resources
● Monitor COD resources
● Service COD V2.0 FRUs
Relevance
Present the following questions to inspire the students and get them thinking about the issues and topics presented in this module. While they are not expected to know the answers to these questions, the answers should be of interest to them and inspire them to learn the material described in this module.
Discussion – In this module, you learn how to configure and manage Sun Fire HES to support COD. The following questions are relevant to understanding what this module is all about:
● What is COD?
● What commands are available to configure and support COD?
● How is each platform and domain command used to configure COD?
Additional Resources
Additional resources – The following references provide additional details on the topics described in this module:
● Sun Microsystems, Inc. Sun Fire™ 15K/12K Software Overview Guide, part number 817-3075-10.
● Sun Microsystems, Inc. System Management Services 1.4 Reference Manual, part number 817-3057-10.
● Sun Microsystems, Inc. System Management Services 1.4 Administrator Guide, part number 817-3056-10.
● Sun Microsystems, Inc. System Management Services 1.4 Installation Guide and Release Notes, part number 817-3055-10.
Note – The examples used in the course are for training purposes only, and while accurate at the time of the development of the course, they might become outdated. Always reference the documents listed in the
“Additional Resources” section of this course for the most current information.
Exploring COD Version 2.0
The capacity-on-demand (COD) feature provides CPU/memory licensing for Sun Fire HES. The COD option provides additional CPU resources on COD-enabled CPU/memory boards that are installed in your system.
Although your Sun Fire HES COD system comes configured with a minimum number of standard (active) CPU/memory boards, your system can have a mix of both standard and COD-enabled CPU/memory boards installed, up to the maximum capacity allowed for the system. At least one active CPU is required for each domain in the system.
Identifying COD Systems
Sun Fire servers can be purchased as a non-COD system with full purchased resource capacity or as a COD system. COD systems are configured with licensed (enabled) and unlicensed (disabled) CPUs and memory capacity. The customer purchases a quantity of right-to-use (RTU) licenses for these systems. As system load requirements increase, additional licenses can be purchased to enable additional CPU/memory resources in single CPU increments.
COD also provides the ability to instantly turn on CPUs and associated memory when more CPUs are needed, thereby dynamically adding capacity to the system. If a CPU fails, the COD system allows you to allocate an available CPU, which reduces the loss of capacity and performance.
Understanding COD Part Number and License Requirements
A COD system has a different part number from a non-COD system. The system controller software incorporates a set of commands that enables you to install and monitor RTU licenses.
URL Resources – For instructions on contacting the Sun License Center, refer to the COD RTU License Certificate that you received, or check the Sun License Center Web site: http://www.sun.com/licensing
The Sun License Center sends an email message containing the RTU license key for the COD resources that were purchased.
Additional COD system requirements are:
● Purchasing a service contract for every COD system.
● Having an existing CPU RTU license for all of the CPUs on a COD system. This means all the CPUs in use must be licensed.
● Installing the same version of the Sun Fire HES SMS firmware (1.3) on both the main and spare SC.
Caution – Versions of the Sun Fire HES SMS firmware earlier than 1.3 do not recognize COD CPU/memory boards.
Managing COD RTU Licenses
COD license key information is always associated with a particular system. You may encounter invalid COD RTU licenses if you do any of the following:
● Move an SC’s boot disk from one system to another.
● Copy the platform and domain configuration files, generated by the smsbackup command, from one system to another, and restore the configuration files on the second system by running the smsrestore command.
● Migrate RTU licenses issued from one COD system to another COD system. This is because RTU licenses are generated from the system serial number located on the centerplane.
● Move COD-enabled CPU/memory boards from a COD system to a non-COD system.
● Enable a CPU/memory board to a COD CPU/memory board in a non-COD system.
How COD Allocates Resources
The COD RTU licenses are allocated to the CPUs on a first-come, first-served basis. However, you can allocate a specific quantity of RTU licenses to a particular domain by using the setupplatform command.
If there is an insufficient number of COD RTU licenses and a license cannot be allocated to a COD CPU, the COD CPU is not configured into the domain and is considered unlicensed. A COD CPU is considered to be unused when it is assigned to a domain, but the CPU is not active.
When you activate a domain containing a COD CPU/memory board or when a COD CPU/memory board is connected to a domain through a DR operation, the following occurs automatically:
● The system checks the current COD RTU licenses installed.
● The system obtains a COD RTU license (from the license pool) for each CPU on the COD board.
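The allocation behavior described above can be modeled in a few lines. The following Python sketch is a teaching aid only, not the codd implementation. The LicensePool class and its method names are hypothetical, and it assumes that licenses reserved for a domain are withheld from other domains until that domain uses them, a detail the text does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    total: int                                    # installed RTU licenses
    reserved: dict = field(default_factory=dict)  # domain -> reserved count
    in_use: dict = field(default_factory=dict)    # domain -> licenses held

    def available_to(self, domain):
        """Licenses this domain could still obtain from the pool."""
        used = sum(self.in_use.values())
        # Assumption: reservations that other domains have not yet
        # consumed are withheld from this domain.
        held_back = sum(
            max(count - self.in_use.get(d, 0), 0)
            for d, count in self.reserved.items()
            if d != domain
        )
        return max(self.total - used - held_back, 0)

    def request(self, domain, cpus):
        """Try to license `cpus` COD CPUs; return how many were licensed."""
        granted = min(cpus, self.available_to(domain))
        self.in_use[domain] = self.in_use.get(domain, 0) + granted
        return granted


# Six licenses installed, two of them reserved for domain B.
pool = LicensePool(total=6, reserved={"B": 2})
print(pool.request("A", 4))  # prints 4 (first come, first served)
print(pool.request("C", 2))  # prints 0 (these CPUs stay unlicensed)
print(pool.request("B", 4))  # prints 2 (only the reserved licenses remain)
```

In this model, the CPUs that are not granted a license correspond to the COD CPUs that are not configured into the domain.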
Understanding COD RTU Licenses and UltraSPARC IV
For UltraSPARC IV system boards, each COD license covers an entire UltraSPARC IV CPU module. This includes both cores on the module, so from the point of view of the Solaris OS, each COD license gives you two processors.
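As a quick arithmetic check, the following Python sketch converts a number of licensed UltraSPARC IV modules into the number of processors the Solaris OS reports. The function name solaris_processors is hypothetical and is used only for illustration.

```python
CORES_PER_ULTRASPARC_IV_MODULE = 2  # one license covers both cores


def solaris_processors(licensed_modules):
    """Processors the Solaris OS sees for the given number of licensed modules."""
    if licensed_modules < 0:
        raise ValueError("licensed_modules must not be negative")
    return licensed_modules * CORES_PER_ULTRASPARC_IV_MODULE


print(solaris_processors(4))  # prints 8
```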
Understanding Instant Access CPUs and Headroom
Instant Access is a feature that allows you to temporarily use up to eight CPUs on COD boards without having a license. The purpose of instant access is to:
● Allow you to immediately use CPUs for which a COD license has been requested but not yet received. Once the licenses are received, you can enter the licenses and keep using the same CPUs as normal, licensed CPUs rather than instant access CPUs, without any interruption.
● Allow you to use unlicensed CPUs on a COD board as hot spares for CPUs on non-COD boards.
The number of instant access CPUs that are allowed is also known as headroom.
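The relationship between licenses and headroom can be sketched as follows. This Python model is a simplified illustration, not the way the SC software computes usage. The function name usable_cod_cpus is hypothetical, and the limit of eight is the Sun Fire HES maximum for instant access CPUs.

```python
MAX_HEADROOM_HES = 8  # upper limit for instant access CPUs on Sun Fire HES


def usable_cod_cpus(requested, licenses, headroom):
    """Return (usable, instant_access) for a request of COD CPUs.

    usable         -- COD CPUs that can run
    instant_access -- how many of those run without a license
    """
    if not 0 <= headroom <= MAX_HEADROOM_HES:
        raise ValueError("headroom must be between 0 and 8")
    usable = min(requested, licenses + headroom)
    instant_access = max(usable - licenses, 0)
    return usable, instant_access


print(usable_cod_cpus(requested=10, licenses=4, headroom=8))   # prints (10, 6)
print(usable_cod_cpus(requested=10, licenses=4, headroom=2))   # prints (6, 2)
print(usable_cod_cpus(requested=10, licenses=10, headroom=8))  # prints (10, 0)
```

The last call shows the case in which the requested licenses have arrived: the same ten CPUs keep running, but none of them count as instant access CPUs.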
Identifying COD SMS Commands
The COD software consists of the following system controller commands:
● codd – The COD daemon is a process that runs on the main SC. (See “Managing the codd Daemon on the Sun Fire HES SC” on page 7-8 for more details.)
● addcodlicense – Allows you to add COD licenses.
● deletecodlicense – Allows you to remove (delete) COD licenses.
● showcodlicense – Displays all COD licenses stored in the license database for the system.
● showcodusage – Allows you to view the usage of current COD licensed resources.
● setupplatform -p cod – Enables or disables instant access CPUs (headroom) and allocates domain COD RTU licenses.
● setupplatform -p cod headroom-number – Enables or disables instant access CPUs (headroom).
Note – Headroom is limited to four processors on the Sun Fire 6800/4810/4800/3800 servers and eight processors on a Sun Fire HES.
● setupplatform -p cod -d domain-id RTU-number – Reserves a specific quantity of COD RTU licenses for a particular domain.
● showplatform -p cod – Displays the Chassis HostID, Headroom Quantity, and allocated domain COD RTU licenses.
Managing the codd Daemon on the Sun Fire HES SC
The COD daemon (codd) is started automatically by the ssd daemon. If the codd daemon terminates, it is restarted automatically. Do not manually start this daemon from the command line.
The codd daemon does the following:
● Monitors the COD resources being used and verifies that the resources used are in agreement with the COD RTU licenses in the COD license database file. It also logs any warning messages.
● Provides information on installed licenses, resource use, and board status.
● Handles the requests to add or delete COD RTU license keys.
● Configures headroom and COD RTU licenses reserved for domains.
The codd daemon releases COD RTU licenses when the following events occur:
● A COD CPU board is powered off
● A domain virtual keyswitch state changes from on/secure to standby/off