CHAPTER 4
Understanding Reference Types and
Value Types
IN THIS CHAPTER
A Quick Introduction to Reference Types and Value Types
The Unified Type System
Reference Type and Value Type Memory Allocation
Reference Type and Value Type Assignment
More Differences Between Reference Types and Value Types
C# and .NET Framework Types
Nullable Types
Deep in the course of coding, you’re often immersed in logic, solving the problem at hand. Simple actions, such as assignment and instantiation, are tasks you perform regularly, without much thought. However, when writing C# programs, or using any language that targets the Common Language Runtime (CLR), you might want to take a second look. What appears to be simple can sometimes result in hard-to-find bugs. This chapter goes into greater depth on CLR types and shows you a few things about coding in C# that often catch developers off guard. More specifically, you learn about the differences between reference types and value types.
The .NET type system, which C# is built upon, is divided into reference types and value types. You’ll work with each of these types all the time, and it’s important to know the differences between them. This chapter shows you the differences via memory allocation and assignment behaviors. This understanding should help you make smart design decisions that improve application performance and reduce errors.
A Quick Introduction to Reference
Types and Value Types
There is much to be said about reference types and value types, but this section gives a quick introduction to the essentials. You learn a little about their behaviors and what they look like in code.
As its name suggests, a reference type has a value that is a reference to an object in memory. A value type, however, has a value that contains the object itself.
Up until now, you’ve been creating custom reference types, which are defined with the class keyword; a Customer class with a public Name field is one example. Value types are defined with the struct keyword instead: declaring a Money type that has a public Amount field with struct classifies Money as a value type.
In both of these examples, I used the public modifier on the Name and Amount fields. This allows code using the Customer and Money types to access the Name and Amount fields, respectively. You can learn more about the different access modifiers in Chapter 9, “Implementing Object-Oriented Principles.”
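The two declarations discussed above can be sketched as follows. This is a minimal sketch rather than the original listing: the field types (string for Name, decimal for Amount) and the TypeKindDemo class are assumptions based on how the fields are used later in the chapter.

```csharp
using System;

// Reference type: declared with the class keyword.
class Customer
{
    public string Name;      // string is an assumed field type
}

// Value type: declared with the struct keyword.
struct Money
{
    public decimal Amount;   // decimal is an assumed field type
}

// Hypothetical demo class, not part of the chapter.
static class TypeKindDemo
{
    static void Main()
    {
        // IsValueType reports which side of the type system a type is on.
        Console.WriteLine(typeof(Customer).IsValueType);   // False
        Console.WriteLine(typeof(Money).IsValueType);      // True
    }
}
```

The only syntactic difference between the two declarations is the keyword, yet it changes how variables of each type are stored and copied.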
Later sections of this chapter go into even greater depth on the differences between these types, but at least you now know the bare minimum to move forward. The next section starts your journey into understanding the differences between reference types and value types and how these differences affect you.
The Unified Type System
Before looking at the specific behaviors of reference types and value types, you should understand the relationship between them and how the C# type system, the Unified Type System, works. The details described here help you understand the coding practices and performance issues that you learn later in the chapter.
How the Unified Type System Works
Essentially, the Unified Type System ensures that all C# types derive from a common ancestor, System.Object. The C# object type is an alias for System.Object, and further discussion will use a C# perspective and refer to System.Object as object. Figure 4.1 illustrates this relationship.
FIGURE 4.1 In the Unified Type System, all objects derive from the object type. (Diagram: System.Object is the root; reference types and System.ValueType derive from it, and value types derive from System.ValueType.)
WHAT IS INHERITANCE?
Inheritance is an object-oriented principle that promotes reuse and helps build hierarchical frameworks of objects. In the context of this chapter, you learn that all types derive from object. This gives you the ability to assign a derived object to a variable of type object. Also, whatever belongs to object is also a member of a derived class.
In Chapter 8, “Designing Objects,” and Chapter 9, “Implementing Object-Oriented Principles,” you can learn a lot more about C# syntax supporting inheritance and how to use it. Throughout the rest of the book, too, you’ll see many examples of how to use inheritance.
In Figure 4.1, the arrows are Unified Modeling Language (UML) generalization symbols, showing how one type, a box, derives from the type being pointed to. The direction of inheritance shows that all types derive either directly or indirectly from object.
Reference types can either derive directly from System.Object or from another reference type. However, the relationship between value type objects and object is indirect. All value types implicitly derive from the System.ValueType class, a reference type object, which inherits from object. For simplicity, further discussion does not distinguish between explicit and implicit inheritance relationships.
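The derivation chain described above can be observed at runtime through reflection. The following is a small sketch; the TypeHierarchyDemo class and its Chain method are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class TypeHierarchyDemo
{
    // Builds the chain of base types from a starting type up to object.
    public static string Chain(Type type)
    {
        string result = type.FullName;
        for (Type t = type.BaseType; t != null; t = t.BaseType)
        {
            result += " -> " + t.FullName;
        }
        return result;
    }

    static void Main()
    {
        // A value type reaches object through System.ValueType.
        Console.WriteLine(Chain(typeof(decimal)));
        // prints: System.Decimal -> System.ValueType -> System.Object

        // A reference type can derive from object directly.
        Console.WriteLine(Chain(typeof(string)));
        // prints: System.String -> System.Object
    }
}
```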
At this point, you might be scratching your head and wondering why you should care (a natural reaction). The big deal is that your coding experience with treating types in a generic manner is simplified (the good news), but you must also be aware of performance penalties that are possible when treating types in a generic manner. In Chapter 17, “Parameterizing Type with Generics and Writing Iterators,” you can learn about the best way to manage generic code, but the next two sections explain the implications of the Unified Type System and how it affects you.
Using object for Generic Programming
Because both reference types and value types inherit from object, you can assign any type to a variable of type object, as shown here:
decimal amount = 3.50m;
object obj1 = amount;
Customer cust = new Customer();
object obj2 = cust;
The amount variable is a decimal, a value type, and the cust variable is a Customer class, a reference type.
Any assignment to object is an implicit conversion, which is always safe. However, doing an assignment from type object to a derived type may or may not be safe. C# forces you to state your intention with a cast operator, as shown here:
Customer cust2 = (Customer)obj2;
The cast operator is necessary because the C# compiler can’t tell whether obj2 is actually a Customer type. Chapter 10, “Coding Methods and Custom Operators,” goes into greater depth on conversions, but the basic idea is that C# is type-safe and has features that ensure safe assignments of one object to another.
A more concrete example of a variable being assigned to another variable of type object is the standard collection classes. The first version of the .NET Framework Class Library (FCL) included a library of collection classes, one of them being ArrayList. These collections offered many conveniences that you don’t have in C# arrays or would have to create yourself.
One of the features of these collections, including ArrayList, was that they could work generically with any type. The Unified Type System makes this possible because the collections operate on the object type, meaning that you can use them with any .NET type. Here’s an example that uses an ArrayList collection:
ArrayList customers = new ArrayList();
Customer cust1 = new Customer();
cust1.Name = "John Smith";
Customer cust2 = new Customer();
cust2.Name = "Jane Doe";
The example goes on to add both Customer objects to the customers ArrayList and to iterate over it with a foreach loop. Notice that the foreach loop works as seamlessly with collections as it does with arrays.
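A complete version of the ArrayList scenario might look like the following sketch. The Add calls and the body of the foreach loop are assumptions reconstructed from the surrounding description, and ArrayListDemo and BuildCustomers are hypothetical names.

```csharp
using System;
using System.Collections;

class Customer
{
    public string Name;
}

// Hypothetical demo class, not part of the chapter.
static class ArrayListDemo
{
    // Fills an ArrayList with two customers; Add accepts any object.
    public static ArrayList BuildCustomers()
    {
        ArrayList customers = new ArrayList();

        Customer cust1 = new Customer();
        cust1.Name = "John Smith";
        Customer cust2 = new Customer();
        cust2.Name = "Jane Doe";

        customers.Add(cust1);   // implicit conversion from Customer to object
        customers.Add(cust2);
        return customers;
    }

    static void Main()
    {
        // foreach casts each object element back to Customer.
        foreach (Customer cust in BuildCustomers())
        {
            Console.WriteLine(cust.Name);
        }
    }
}
```

Because the elements are stored as object, the compiler cannot stop you from adding something that is not a Customer; the cast inside the foreach loop would then fail at runtime.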
Again, because the ArrayList operates on type object, it is convenient to use with any type, whether it is a reference type or value type. The preceding example showed you how to assign a reference type, the Customer class, to an ArrayList, which is convenient. However, there is a hidden cost when assigning value types to object type variables, such as the elements of an ArrayList. The next section explains this phenomenon, which is known as boxing and unboxing.
Performance Implications of Boxing and Unboxing
Boxing occurs when you assign a value type variable to a variable of type object. Unboxing occurs when you assign a variable of type object to a variable with the same type as the true type of the object. The following code is a minimal example that causes boxing and unboxing to occur:
decimal amountIn = 3.50m;
object obj = amountIn; // box
decimal amountOut = (decimal)obj; // unbox
Figures 4.2 to 4.4 illustrate what is happening in the preceding code. Figure 4.2 shows the first line.
FIGURE 4.2 A value type variable before boxing
Before boxing, as in the declaration of amountIn, the variable is just a value type that contains the data directly. However, as soon as you assign that value type variable to an object, as in Figure 4.3, the value is boxed.
FIGURE 4.4 Unboxing a value
As shown in Figure 4.3, boxing causes a new object to be allocated on the heap and a copy of the original value to be put into the boxed object. Now, you have two copies of the original variable: one in amountIn and another in the boxed decimal, obj, on the heap. You can pull that value out of the boxed decimal, as shown in Figure 4.4.
In Figure 4.4, the boxed value in obj is copied into the decimal variable, amountOut. Now, you have three copies of the original value that was assigned to amountIn.
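The copy semantics of boxing can be checked directly. In this sketch (BoxingDemo and its method names are hypothetical), changing the original variable after boxing leaves the boxed copy untouched, and unboxing to anything other than the exact boxed type fails.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class BoxingDemo
{
    // Returns the boxed value after the original variable has changed.
    public static decimal BoxedValueAfterChange()
    {
        decimal amountIn = 3.50m;
        object obj = amountIn;   // box: copies 3.50 into a heap object
        amountIn = 9.99m;        // changes only the original variable
        return (decimal)obj;     // unbox: copies the boxed 3.50 back out
    }

    // Unboxing must name the exact boxed type, even if a conversion exists.
    public static bool UnboxToWrongTypeFails()
    {
        object obj = 3.50m;
        try
        {
            double wrong = (double)obj;   // the box holds a decimal, not a double
            return wrong < 0;             // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange() == 3.50m);   // True
        Console.WriteLine(UnboxToWrongTypeFails());            // True
    }
}
```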
Writing code as shown here is pointless because the specific example doesn’t do anything useful. However, the point of this boxing and unboxing exercise is so that you can see the mechanics of what is happening and understand the overhead associated with it. Without that understanding, you could easily write a lot of code similar to the ArrayList example in the previous section. A typical case is a prices ArrayList that holds decimal values, each of which is boxed as it is added.
Because of the Unified Type System, this code is as convenient as the code written for the Customer class, but beware. If the prices ArrayList held 10, 20, or 100 decimal type variables, you probably wouldn’t care. However, what if it contains 10,000 or 100,000? In that case, you should be concerned because this could have a serious impact on the performance of your application.
Generally, any time you assign a value type to any object variable, whether a collection or a method parameter, take a second look to see whether there is potential for performance problems. In development, you might not notice any performance problem; after deployment to production, however, you could get slammed by a slow application with a hard-to-find bug.
From the perspective of collections, you have two choices: arrays or generics. You can learn more about arrays in Chapter 6, “Using Arrays and Enums.” If you are programming in C# 1.0, your only choices will be arrays or collections, and you’ll have to design with tradeoffs between convenience and performance, or type safety and no type safety. If you’re using C# 2.0 or later, you can have the best of both worlds, performance and type safety, by using generics, which you can learn more about in Chapter 17.
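To make the tradeoff concrete, the sketch below sums the same prices from an ArrayList and from a generic List<decimal>. The class name PriceTotals, the method names, and the price value are illustrative assumptions; no timing figures are claimed here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class, not part of the chapter.
static class PriceTotals
{
    // ArrayList stores object, so each decimal is boxed by Add
    // and unboxed by the cast inside the loop.
    public static decimal SumBoxed(int count)
    {
        ArrayList prices = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            prices.Add(1.25m);          // box
        }

        decimal total = 0m;
        foreach (object price in prices)
        {
            total += (decimal)price;    // unbox
        }
        return total;
    }

    // List<decimal> stores the decimals directly: no boxing, and the
    // compiler rejects attempts to add anything that is not a decimal.
    public static decimal SumGeneric(int count)
    {
        List<decimal> prices = new List<decimal>();
        for (int i = 0; i < count; i++)
        {
            prices.Add(1.25m);
        }

        decimal total = 0m;
        foreach (decimal price in prices)
        {
            total += price;
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumBoxed(100000) == SumGeneric(100000));   // True
    }
}
```

Both methods produce the same total; the difference is that the ArrayList version allocates one heap object per price.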
Now that you know the performance characteristics of boxing and unboxing, let’s dig a little deeper. The next sections tell you more about what reference types and value types are, their differences, and what you need to know.
Reference Type and Value Type Memory Allocation
Reference type and value type objects are allocated differently in memory. This can affect your code in the area of method call parameters and is the basis for understanding assignment behavior in the next section. This section takes a quick look at memory allocation and the differences between reference types and value types.
FIGURE 4.5 Reference type declaration
Reference Type Memory Allocation
Reference type objects are always allocated on the heap. The following code is a typical reference type object declaration and instantiation:
Customer cust = new Customer();
In earlier chapters, I explained that this was how you declare and instantiate a reference type, but there is much more to the preceding line. By declaring cust as type Customer, the variable cust is strongly typed, meaning that only compatible objects can be assigned to it. Figure 4.5 shows the declaration of cust, from the left side of the statement.
In Figure 4.5, the cust box is in your code, representing the declaration of cust as Customer. On the right side of the preceding code, the new Customer() is what creates the new instance of a Customer object. The assignment puts a reference into cust that refers to the new Customer object on the heap, as shown in Figure 4.6.
Figure 4.6 shows how the cust variable holds a reference to the new instance of a Customer object on the heap. The heap is a portion of memory that the CLR uses to allocate objects. This is what you should remember: A reference type variable will either hold a reference to an object on the heap or it will be set to the C# value null.
Next, you learn about value type memory allocation and how it is different from reference type memory allocation.
Value Type Memory Allocation
The answer to where value type variables are allocated is “It depends.” The two places that a value type variable can be allocated are the stack or, along with a reference type, the heap. See the sidebar “What Is the Stack?” if you’re curious about what the stack is.
FIGURE 4.6 Reference type object allocated on the heap
WHAT IS THE STACK?
The CLR has a stack for keeping track of the path from the entry point to the currently executing method in an application. Just like any other stack, the CLR stack works in a last-in, first-out fashion. When your program runs, Main (the entry point) is pushed onto the stack. Any method that Main calls is then pushed onto the top of the stack. Method parameter arguments and local variables are pushed onto the stack, too. When a method completes, it is popped off the top of the stack, and control returns to the next method in the stack, which was the caller of the method just popped.
Value type variables passed as arguments to methods, as well as local variables defined inside a method, are pushed onto the stack with the method. However, if the value type variable is a field of a reference type object, it will be stored on the heap along with the reference type object.
Regardless of memory allocation, a value type variable will always hold the object that is assigned to it. An uninitialized value type field will have a value that defaults to some form of zero (bool defaults to false), as described in Chapter 2.
C# 2.0 and later versions have a feature known as nullable types, which also allow value types to contain the value null. A later section of this chapter explains how to use nullable types.
Now you know where reference type and value type variables are allocated in memory, but more important, you understand the type of data they can hold and why. This opens the door to understanding the assignment differences between reference types and value types, which is discussed next.
Reference Type and Value Type Assignment
Based on what you know so far about reference types and value types (their relationship through the Unified Type System and memory allocation), the step to understanding assignment behavior is easier. This section examines assignment among reference types and assignment among value types. You’ll see how reference type and value type assignment differs and what happens when assigned values are subsequently modified. We look at reference type assignment first.
Reference Type Assignment
To understand reference type assignment, it’s helpful to look at previous sections of this chapter, focusing on reference type features. Because the value of a reference type resides on the heap, the reference type variable holds a reference (to the object on the heap). Keeping this in mind, here’s an example of reference type assignment:
Customer cust5 = new Customer();
cust5.Name = "John Smith";
Customer cust6 = new Customer();
cust6.Name = "Jane Doe";
Console.WriteLine("Before Reference Assignment:");
In the preceding example, you can see there are two variables, cust5 and cust6, of type Customer that are declared and initialized. Between sets of Console.WriteLine statements, there is an assignment of cust6 to cust5. The Console.WriteLine statements show the effect of the assignment, and here’s what they show when the program runs:
Before Reference Assignment:
cust5: John Smith
cust6: Jane Doe
After Reference Assignment:
cust5: Jane Doe
cust6: Jane Doe
You can see from the preceding output that, before the assignment, the value of the Name field in cust5 and cust6 is different. There are no surprises here because that is what the code explicitly did when declaring and instantiating the variables. What could be misleading is the result after assignment, where both cust5.Name and cust6.Name produce the same results. The following statement and results show why the preceding results could be misleading:
cust6.Name = "John Smith";
Console.WriteLine("After modifying the contents of a Reference type object:");
Console.WriteLine("cust5: {0}", cust5.Name);
Console.WriteLine("cust6: {0}", cust6.Name);
And here’s the output:
After modifying the contents of a Reference type object:
cust5: John Smith
cust6: John Smith
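The whole reference assignment sequence can be sketched as one program. The statements between the surviving fragments (the WriteLine calls and the cust5 = cust6 assignment) are reconstructed from the output shown above, so treat them as an assumption; ReferenceAssignmentDemo and Run are hypothetical names.

```csharp
using System;

class Customer
{
    public string Name;
}

// Hypothetical demo class, not part of the chapter.
static class ReferenceAssignmentDemo
{
    // Returns cust5.Name after cust6 is assigned to cust5 and then modified.
    public static string Run()
    {
        Customer cust5 = new Customer();
        cust5.Name = "John Smith";
        Customer cust6 = new Customer();
        cust6.Name = "Jane Doe";

        Console.WriteLine("Before Reference Assignment:");
        Console.WriteLine("cust5: {0}", cust5.Name);
        Console.WriteLine("cust6: {0}", cust6.Name);

        cust5 = cust6;   // copies the reference, not the Customer object

        Console.WriteLine("After Reference Assignment:");
        Console.WriteLine("cust5: {0}", cust5.Name);
        Console.WriteLine("cust6: {0}", cust6.Name);

        cust6.Name = "John Smith";   // cust5 is never touched here

        // Both variables now refer to the very same object.
        Console.WriteLine("Same object: {0}", object.ReferenceEquals(cust5, cust6));
        return cust5.Name;
    }

    static void Main()
    {
        Console.WriteLine("cust5: {0}", Run());
    }
}
```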
In the preceding code, the only assignment was to change the Name field of cust6 to “John Smith”, but look at the results. The Name fields of both cust5 and cust6 are set to the same value. What’s tricky is that the code in this last example didn’t use the cust5 variable at all.
Now, let’s see what happened. To start off, look at Figure 4.7, which shows what the memory layout is right after cust5 and cust6 were declared and initialized.
FIGURE 4.7 Two reference type variables declared and initialized separately
Because cust5 and cust6 are reference type variables, they hold a reference (address) to an object on the heap. Figure 4.7 represents the reference as an arrow coming from the variables cust5 and cust6 to Customer objects. These Customer objects were allocated during runtime when the new Customer expression ran. Each object contains a different value in its Name field. Next, you see the effects of assigning the cust6 variable to cust5, shown in Figure 4.8.
FIGURE 4.8 Assigning one reference type variable to another
The assignment of cust6 to cust5 didn’t copy the object; it actually copied the contents of the cust6 variable, which was the reference to the object. So, as shown in Figure 4.8, both cust5 and cust6 now refer to the same object. Figure 4.9 shows what happens after modifying the Name field of cust6.
FIGURE 4.9 Effects of modifying the contents of an object that has multiple references to it
Figure 4.9 shows that changing the Name field of cust6 actually modified the object referred to by cust6. This is important because both cust5 and cust6 refer to the same object, and any modification to that object will be seen through both the cust5 and cust6 references. That’s why the output, after we modified cust6, showed that the Name fields of cust5 and cust6 were the same.
Looking at reference type assignment from a more general perspective, you can assume that modifications to the contents of an object are visible through any reference to the same object.
Assignment behavior isn’t the same for reference types and value types. The next section shows how value type assignment works and how reference type and value type assignment differs.
Value Type Assignment
Value type assignment is a whole lot simpler than reference type assignment. Because a value type variable holds the entire object, rather than only a reference to the object, no special actions can be occurring behind the scenes to affect that value. Here’s an example of a value type assignment, using the Money struct that was created earlier in the chapter:
After the assignment, the Amount field of both cash1 and cash2 is the same, as shown in the following output:
Before Value Assignment:
At this point, value type assignment looks the same as reference type assignment. The next example demonstrates how value types are separate entities that hold their own values:
After modifying contents of Value type object:
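The value type assignment sequence described in this section can be sketched as follows. The amounts, the printed messages, and the ValueAssignmentDemo class are assumptions made for illustration; only the cash1 and cash2 variable names and the Money struct come from the chapter.

```csharp
using System;

struct Money
{
    public decimal Amount;
}

// Hypothetical demo class, not part of the chapter.
static class ValueAssignmentDemo
{
    // Returns cash1.Amount after cash2 is assigned to cash1 and then modified.
    public static decimal Run()
    {
        Money cash1;
        cash1.Amount = 50.00m;    // hypothetical amounts
        Money cash2;
        cash2.Amount = 75.00m;

        cash1 = cash2;            // copies the whole Money value
        bool sameAfterAssignment = cash1.Amount == cash2.Amount;

        cash2.Amount = 100.00m;   // changes cash2's own copy only

        Console.WriteLine("Equal right after assignment: {0}", sameAfterAssignment);
        Console.WriteLine("Still equal after modifying cash2: {0}",
            cash1.Amount == cash2.Amount);
        return cash1.Amount;
    }

    static void Main()
    {
        Run();
    }
}
```

Unlike the Customer example, the later change to cash2 is not visible through cash1, because the assignment copied the value rather than a reference.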
In the next section, you learn a few more of the differences between reference types and value types.
More Differences Between Reference Types and Value Types
In addition to memory allocation, variable contents, and assignment behavior, there are other differences between reference types and value types. These differences can be categorized as inheritance, construction, finalization, and size recommendations. These issues are covered thoroughly in later chapters, so I just give you a quick overview here of what they mean and let you know where in this book you can get more in-depth information.
Inheritance Differences Between Reference Types and Value Types
Reference types support implementation and interface inheritance. They can derive from another reference type or have a reference type derive from them. However, value types can’t derive from other value types. Chapter 9 goes into detail about how implementation inheritance works in C#, but here’s a quick example:
class Customer
In the preceding example, Customer is the base class. PotentialCustomer and RegularCustomer derive from Customer; that is, they are types of Customer, as indicated by the : (colon) on the right side of the class identifier.
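A sketch of the hierarchy described above follows. The class bodies are assumptions (the derived classes are left empty), and the IComparable<Money> implementation and the InheritanceDemo class are added only to illustrate that a struct can implement interfaces even though it cannot name a base type.

```csharp
using System;

class Customer
{
    public string Name;
}

// The colon means "derives from": both classes are kinds of Customer.
class PotentialCustomer : Customer
{
}

class RegularCustomer : Customer
{
}

// A struct can implement interfaces but cannot derive from another type.
struct Money : IComparable<Money>
{
    public decimal Amount;

    public int CompareTo(Money other)
    {
        return Amount.CompareTo(other.Amount);
    }
}

// Hypothetical demo class, not part of the chapter.
static class InheritanceDemo
{
    static void Main()
    {
        // A derived object can be assigned to a base-type variable.
        Customer cust = new RegularCustomer();
        cust.Name = "Jane Doe";   // Name is inherited from Customer

        Console.WriteLine(cust is RegularCustomer);     // True
        Console.WriteLine(cust is PotentialCustomer);   // False
    }
}
```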
SINGLE-IMPLEMENTATION INHERITANCE
Reference types support single-implementation inheritance, meaning that they can derive only from a single class. However, both reference types and value types support multiple-interface inheritance, where they can implement many interfaces.
Construction and Finalization Differences Between Reference Types and Value Types
Construction is the process that occurs during instantiation to help ensure an object has the information you want it to have when it starts up. Chapter 15, “Managing Object Lifetime,” discusses this in detail, but for reference, you should know that you can’t create a default constructor for a value type. Chapter 3, “Writing C# Expressions and Statements,” contains the default values of built-in types, which are some form of zero. A value type can be used without calling a constructor: its fields default to zero when it is itself a field, and a local variable becomes usable once all of its fields have been assigned. That is why the cash1 and cash2 variables in the previous section didn’t need to be instantiated with new Money().
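The construction behavior of value types can be sketched as follows. The Account class and the DefaultValueDemo class are hypothetical and exist only to show a value type field inside a reference type.

```csharp
using System;

struct Money
{
    public decimal Amount;
}

class Account
{
    public Money Balance;   // hypothetical class with a value type field
}

// Hypothetical demo class, not part of the chapter.
static class DefaultValueDemo
{
    // A value type field inside a new object starts out as zero.
    public static decimal FieldDefault()
    {
        Account account = new Account();
        return account.Balance.Amount;
    }

    // new Money() uses the built-in default constructor, which zeroes the fields.
    public static decimal ConstructedDefault()
    {
        Money cash = new Money();
        return cash.Amount;
    }

    static void Main()
    {
        // A local struct needs no new: assigning its only field is enough.
        Money cash1;
        cash1.Amount = 3.50m;

        Console.WriteLine(FieldDefault() == 0m);         // True
        Console.WriteLine(ConstructedDefault() == 0m);   // True
        Console.WriteLine(cash1.Amount == 3.50m);        // True
    }
}
```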
Finalization is a process that occurs when the CLR is performing garbage collection, cleaning up objects from memory. Value type objects don’t have a finalizer, but reference types do. A finalizer is a special class member that could be called by the CLR garbage collector during cleanup. Value types are either reclaimed with their containing object or released when the method they are associated with ends. Therefore, value types don’t need a finalizer. Because garbage collection is a process that operates on heap objects, it is possible for a reference type to have a finalizer. Chapter 15 goes into the garbage collection process in detail, giving you the information you need to make effective design decisions on implementing finalizers, and other techniques for managing object lifetime effectively.
Object Size Considerations for Reference Types and Value Types
Because of the way an object is allocated differently for reference types and value types, you might need to consider the impact of the size of the object on resources and performance. Reference type objects can generally be whatever size you need because the variable just holds a reference, which is 32 bits on a 32-bit CLR and 64 bits on a 64-bit CLR. However, value type size might need more thought.
If a value type is a field inside of a class or at some level of containment that puts it into a class, its size shouldn’t be much concern. However, think about scenarios where you might need to pass a value type to a method. In the case of a reference type argument, it is simply the reference being passed, but for a value type argument, the entire object is passed. With local variables and parameters that are value types, the CLR pushes the entire object onto the stack when calling the associated method. Now, instead of the 4 or 8 bytes that would have been pushed with a reference type, you have potentially much more information to push, which represents overhead. A recommended rule of thumb for value type size is 16 bytes or less. I’ve benchmarked this by calling methods that have value type parameters of differing sizes and verified that performance does tend to deteriorate faster as the size of the value type increases above 16 bytes. That said, you should also look at how many times you’ll call the method before the performance implications matter to you; that is, consider whether the method is called frequently or in a loop.
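The difference between passing a whole value and passing a reference can be sketched as follows. ParameterPassingDemo and its methods are hypothetical names; the sketch shows the copy semantics only and makes no timing claims.

```csharp
using System;

struct Money { public decimal Amount; }
class Customer { public string Name; }

// Hypothetical demo class, not part of the chapter.
static class ParameterPassingDemo
{
    // Receives a copy of the whole struct; the caller's value is untouched.
    static void Change(Money cash)
    {
        cash.Amount = 0m;
    }

    // Receives a copy of the reference; both refer to the same object.
    static void Change(Customer cust)
    {
        cust.Name = "Changed";
    }

    public static decimal AmountAfterCall()
    {
        Money cash;
        cash.Amount = 3.50m;
        Change(cash);
        return cash.Amount;
    }

    public static string NameAfterCall()
    {
        Customer cust = new Customer();
        cust.Name = "John Smith";
        Change(cust);
        return cust.Name;
    }

    static void Main()
    {
        Console.WriteLine(AmountAfterCall() == 3.50m);   // True
        Console.WriteLine(NameAfterCall());              // Changed

        // sizeof works on built-in value types; IntPtr.Size is the reference size.
        Console.WriteLine("decimal: {0} bytes, reference: {1} bytes",
            sizeof(decimal), IntPtr.Size);
    }
}
```

A Money value with a single decimal field sits right at the 16-byte guideline; adding more fields means more bytes copied on every call.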
REFERENCE TYPE OR VALUE TYPE: WHICH TO CHOOSE?
As a rule of thumb, I typically create new types as classes, reference types. The exception is when I have a type that should behave more like a value type. For example, a ComplexNumber would probably be better as a struct value type, because of its memory allocation, assignment behavior, and other capabilities such as math operations that are similar to built-in value types such as int and double.
Among the many tips you get from this chapter for working with both reference types and value types, you also have a lot of facts related to tradeoffs. Look at the differences to see what matters the most in your situation and choose the tradeoffs that are best for you. The next section looks at specific .NET types, building on what you learned so far in this chapter.
C# and .NET Framework Types
In Chapter 1, “Introducing the .NET Platform,” you learned about the CTS and in Chapter 3, you learned how to use the C# built-in types. This section melds these two features together and builds upon them so that you can see how C# types support the CTS. We also look at a couple of .NET Framework types (specifically, DateTime and Guid) that are important but don’t have C# keyword aliases.
C# Aliases and the CTS
C# types are specified with keywords that alias .NET CLR types. Table 4.1 shows all the .NET types and which C# keywords alias them.
Some of the types in Table 4.1 are marked as “No alias” because C# doesn’t have a keyword that aliases that type, but the type is still important. For example, DBNull is a value that comes from a database field that is set to NULL but is not equal to the C# null value. The following sections show you how to work with the System.Guid and System.DateTime types, which don’t have C# aliases either.
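The aliasing relationship can be verified in code. This sketch uses a handful of well-known keyword and .NET type pairs (it is not a reproduction of Table 4.1), and AliasDemo and NameOf are hypothetical names.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class AliasDemo
{
    // Returns the .NET type name behind a value declared with a C# keyword.
    public static string NameOf(object value)
    {
        return value.GetType().FullName;
    }

    static void Main()
    {
        int count = 1;
        decimal price = 3.50m;
        string name = "John Smith";
        bool active = true;

        Console.WriteLine(NameOf(count));    // System.Int32
        Console.WriteLine(NameOf(price));    // System.Decimal
        Console.WriteLine(NameOf(name));     // System.String
        Console.WriteLine(NameOf(active));   // System.Boolean

        // Keyword and .NET name are interchangeable in declarations.
        Int32 sameAsInt = count;
        Console.WriteLine(typeof(int) == typeof(Int32));   // True
        Console.WriteLine(sameAsInt);                      // 1
    }
}
```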
TABLE 4.1 .NET Types with Matching C# Aliases
.NET Type    C# Alias
Using System.Guid
A globally unique identifier (GUID) is a 128-bit value, usually written as a string of hexadecimal characters, used whenever there is a need for a unique way to identify something. You can see GUIDs used throughout the Microsoft operating system. Just look at the registry; all of those long strings of characters are GUIDs. Another place GUIDs are used is as unique columns in SQL Server for when you need unique IDs across separate databases. Generally, any time you need a unique value, you can reliably use a GUID.
GUIDs are Microsoft’s implementation of universally unique identifiers (UUID), an Open Software Foundation (OSF) standard. You can find more information about UUIDs at Wikipedia, http://en.wikipedia.org/wiki/Universally_Unique_Identifier
.NET implements the GUID as the System.Guid (Guid) struct. You can use the Guid type to generate new GUIDs or work with an existing Guid value. Here’s an example of how you don’t want to create a new GUID:
Guid uniqueVal1 = new Guid();
Console.WriteLine("uniqueVal1: {0}", uniqueVal1.ToString());
The problem here is that Guid is a value type and it is immutable (can’t be modified). If you recall from a previous section, value types have a default (no parameter) constructor that you can’t override, and the default value is some form of zero. Therefore, the following output from the preceding code makes sense:
uniqueVal1: 00000000-0000-0000-0000-000000000000
Because Guid is immutable, you can’t change this value. Fortunately, if you have a Guid value already defined, you are still able to work with it because Guid has several constructor overloads for specifying an existing GUID. Here’s the Guid constructor overload that takes a string:
uniqueVal1 = new Guid("89e9f11b-00ee-47dc-be15-01f70eeac3f9");
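For contrast with the default constructor, the following sketch shows the static Guid.NewGuid method, which is the usual way to obtain a fresh GUID in .NET. The GuidDemo class is a hypothetical name, and the sketch is an illustration rather than the chapter's own listing.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class GuidDemo
{
    static void Main()
    {
        // The default constructor always produces the all-zero GUID.
        Guid empty = new Guid();
        Console.WriteLine(empty == Guid.Empty);   // True

        // Guid.NewGuid generates a fresh value on every call.
        Guid first = Guid.NewGuid();
        Guid second = Guid.NewGuid();
        Console.WriteLine(first != second);
        Console.WriteLine(first != Guid.Empty);

        // An existing GUID can be rebuilt from its string form.
        Guid parsed = new Guid("89e9f11b-00ee-47dc-be15-01f70eeac3f9");
        Console.WriteLine(parsed);   // 89e9f11b-00ee-47dc-be15-01f70eeac3f9
    }
}
```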
Working with System.DateTime
Many programs need to work with dates and times. Fortunately, .NET has the System.DateTime (DateTime) type to help out. You can use DateTime to hold date and time values, extract portions such as the day of a DateTime, and perform arithmetic calculations. You can also parse strings into DateTime instances and emit DateTime instances as a string in the format of your choice.
Creating New DateTime Objects
The default value of DateTime is January 1, 0001 at 12:00 midnight, which is the value that the default (no parameter) constructor produces.
You can initialize the DateTime through the constructor, which has several overloads.
Here’s an example of how to use one of the more detailed overloads:
date = new DateTime(2008, 7, 4, 21, 35, 15, 777);
This section worked with the entire DateTime, but sometimes you only want to have access to parts of a DateTime. The next section shows you how to extract different parts of a DateTime.
Extracting Parts of a DateTime
You can access any part of a DateTime instance, including parts of the date, day of the week, or day of the year. Here’s an example:
Console.WriteLine(
    "{0} day {1} of the month is day {2} of the year",
    date.DayOfWeek, date.Day, date.DayOfYear);
And here’s the output:
Friday day 4 of the month is day 186 of the year
You can also extract other parts of the date (for example, month and hour). If you’re using VS2008, you can see all of them in IntelliSense.
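Creation and extraction can be combined in one small sketch. It uses the same date as the constructor example above; DateTimePartsDemo and SampleDate are hypothetical names.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class DateTimePartsDemo
{
    public static DateTime SampleDate()
    {
        // Arguments: year, month, day, hour, minute, second, millisecond.
        return new DateTime(2008, 7, 4, 21, 35, 15, 777);
    }

    static void Main()
    {
        // The default constructor yields the minimum DateTime value.
        DateTime unset = new DateTime();
        Console.WriteLine(unset == DateTime.MinValue);   // True

        DateTime date = SampleDate();
        Console.WriteLine(date.Month);         // 7
        Console.WriteLine(date.Hour);          // 21
        Console.WriteLine(date.Millisecond);   // 777
        Console.WriteLine(date.DayOfWeek);     // Friday
        Console.WriteLine(date.DayOfYear);     // 186
    }
}
```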
Next, you learn how to manipulate DateTime objects.
DateTime Math and TimeSpan
You often need to manipulate DateTime objects. However, because they are immutable, you need to create a new instance with a modified value. Here’s an example:
Console.WriteLine("date before AddDays(1): {0}", date);
date.AddDays(1);
Console.WriteLine("date after AddDays(1): {0}", date);
The preceding code calls the AddDays method, trying to add a day, but the original value, date, doesn’t change, as shown by the following output:
date before AddDays(1): 11/4/2007 8:48:26 PM
date after AddDays(1): 11/4/2007 8:48:26 PM
This just proves that DateTime is immutable and hopefully saves you from making this common mistake yourself. Here’s how you can change the date variable:
Console.WriteLine("date before AddDays(1): {0}", date);
date = date.AddDays(1);
Console.WriteLine("date after AddDays(1): {0}", date);
If you look at the documentation for AddDays and other methods of DateTime that manipulate dates, you see that they return a DateTime. Just reassign the return value to the original variable, as in the preceding example, and it will work fine. Here’s the output:
date before AddDays(1): 11/4/2007 8:52:08 PM
date after AddDays(1): 11/5/2007 8:52:08 PM
The preceding output shows that the date variable now holds a new value, with the day incremented by one as intended.
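The two versions can be compared side by side with a fixed date, which keeps the result independent of when the code runs. AddDaysDemo and its method names are hypothetical.

```csharp
using System;

// Hypothetical demo class, not part of the chapter.
static class AddDaysDemo
{
    // Calls AddDays but ignores the result, as in the mistaken version.
    public static DateTime Discarded(DateTime date)
    {
        date.AddDays(1);
        return date;
    }

    // Keeps the new DateTime that AddDays returns.
    public static DateTime Reassigned(DateTime date)
    {
        date = date.AddDays(1);
        return date;
    }

    static void Main()
    {
        DateTime date = new DateTime(2007, 11, 4);
        Console.WriteLine(Discarded(date).Day);    // 4
        Console.WriteLine(Reassigned(date).Day);   // 5
    }
}
```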
You can also use the DateTime type for quick-and-dirty performance benchmarks. Here’s some code that does DateTime math and produces a TimeSpan object to tell how long an algorithm took:
int testIterations = int.MaxValue/4;
DateTime start = DateTime.Now;
for (int i = 0; i < testIterations; i++)
{
    // Place the code to benchmark here (loop shape is an assumption).
}
DateTime finish = DateTime.Now;
TimeSpan elapsedTime = finish - start;
Console.WriteLine("Elapsed Time: {0}", elapsedTime);
The for loop is code that you might change to hold whatever algorithm you need to benchmark. The example gets the current time before and after the for loop. Notice how the mathematical operation, subtracting start from finish, produced a TimeSpan. A TimeSpan is used to represent an amount of time, as opposed to an exact time as held by DateTime. Subtracting one DateTime from another always returns a TimeSpan. Here’s the output:
Elapsed Time: 00:00:16.2834144
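To make the relationship between the two types concrete, here is a minimal sketch using fixed values instead of DateTime.Now so that the results are repeatable. The class DateMathSketch and the helper Between are hypothetical names used only for this illustration.

```csharp
using System;

static class DateMathSketch
{
    // Hypothetical helper: the interval between two points in time.
    public static TimeSpan Between(DateTime start, DateTime finish)
    {
        return finish - start;   // DateTime - DateTime yields a TimeSpan
    }

    static void Main()
    {
        DateTime start = new DateTime(2007, 11, 4, 20, 0, 0);
        DateTime finish = new DateTime(2007, 11, 4, 20, 0, 16);

        TimeSpan elapsed = Between(start, finish);
        Console.WriteLine(elapsed.TotalSeconds);   // 16

        // DateTime + TimeSpan yields another DateTime, not a TimeSpan.
        DateTime later = finish + TimeSpan.FromHours(1);
        Console.WriteLine(later.Hour);             // 21
    }
}
```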
Here’s an exercise that you might find fun. Create a few methods that take value type
parameters of varying sizes. For example, you could create multiple versions of Money and add more decimal fields to make them bigger. Then use the preceding benchmark code to call each method a specified number of times and compare the TimeSpan results of each.
Do the same with a reference type. This exercise will let you know at what point the size
of the value type affects performance.
Converting Between DateTime and string Types
If the user inputs a date and/or time, it will often reach the code in the form of a string. Alternatively, sometimes a DateTime needs to be formatted and presented in the form of a string. This section shows you how to read string types into a DateTime and how to format the output of a DateTime.
The DateTime type has a Parse method you can use to get the value of a string. Here’s
how you can use it:
Console.Write("Please enter a date (mm/dd/yyyy): ");
string dateStr = Console.ReadLine();
date = DateTime.Parse(dateStr);
Console.WriteLine("You entered '{0}'", date);
The user’s input, retrieved by the call to Console.ReadLine, came back in the form of a
string, dateStr. The call to DateTime.Parse converted the string to a DateTime, which can now be manipulated with DateTime members as described in previous sections.
You could see an exception message after typing in the date on the command line. This would be because the date was not typed in the correct format. In Chapter 11, “Error and Exception Handling,” you’ll learn how to handle errors like this, and Chapter 10 shows you how to use the TryParse method, which is effective for handling user input. Here’s the output for valid input:
Please enter a date (mm/dd/yyyy): 11/04/2007
You entered ‘11/4/2007 12:00:00 AM’
Notice from the output that the response to the Please enter a date (mm/dd/yyyy)
prompt was 11/04/2007. However, the output included the time, which you may or may not want. In case you don’t want the time to show or you want the output to appear differently, you have the option to specify the output format. Here’s what you could do to remove the time from the preceding output:
Console.WriteLine("Date Only: {0:d}", date);
The preceding example used the format specifier in the placeholder of
Console.WriteLine’s format string parameter. You could have also used the DateTime ToString method like this:
Console.WriteLine("Date Only: {0}", date.ToString("d"));
A lowercase d means to print a short date. Here’s what it looks like:
Date Only: 11/4/2007
In addition to the d, there are several other predefined format specifiers, shown in Table 4.2.
TABLE 4.2 Standard DateTime Format Strings
Format String Output for date = new DateTime(2008, 7, 4, 21, 35, 15, 777);
Table 4.2 shows a predefined set of strings for formatting dates, but you aren’t limited by this list. You can also customize DateTime strings. Here’s an example that ensures two
characters for each part of the date:
Console.WriteLine("MM/dd/yy: {0:MM/dd/yy}", date);
Based on the input used for Table 4.2, the output would be this:
MM/dd/yy: 07/04/08
Table 4.3 lists custom format strings that you might find to be useful; it also includes the ones used in the preceding example.
As with DateTime, most of the other built-in types are value types whose value is always defined. The next section discusses nullable types and helps you deal with those situations where the value you have to work with is not defined, but is null.
TABLE 4.3 Common Custom DateTime Format Strings
Format String   Purpose
ddd             Abbreviated name of day (for example, Fri)
dddd            Full name of day (for example, Friday)
MMM             Abbreviated name of month (for example, Jul)
MMMM            Full month name (for example, July)
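A short sketch shows these custom format strings applied to the date used for Table 4.2. It assumes the invariant culture so that day and month names do not depend on the machine’s regional settings; the class DateFormatSketch and its Format helper are hypothetical names for this illustration.

```csharp
using System;
using System.Globalization;

static class DateFormatSketch
{
    // Hypothetical helper: formats with the invariant culture so the
    // result does not depend on the machine's regional settings.
    public static string Format(DateTime date, string format)
    {
        return date.ToString(format, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        DateTime date = new DateTime(2008, 7, 4, 21, 35, 15, 777);

        Console.WriteLine(Format(date, "MM/dd/yy"));   // 07/04/08
        Console.WriteLine(Format(date, "ddd"));        // Fri
        Console.WriteLine(Format(date, "dddd"));       // Friday
        Console.WriteLine(Format(date, "MMM"));        // Jul
        Console.WriteLine(Format(date, "MMMM"));       // July
    }
}
```

If you omit the culture argument, the current culture is used instead, so the names and separators might differ from the comments shown here.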
Nullable Types
As you’ve learned previously, the default value for reference types is null, and the default value for value types is some form of zero. Sometimes, you receive values from external sources, such as XML files or databases, that don’t have a value; they could be nil or
null, respectively. For reference types, this is no problem. However, for value types, you have to find your own solution for working with null values.
This problem, not being able to assign null to value types, was alleviated in C# 2.0 with the introduction of a feature called nullable types. It essentially allows you to declare nullable value types to which, as the name suggests, you can assign the value null.
Think about how useful this is. SQL Server has column types that map to the C# built-in types. For example, SQL Server money and datetime column types map to C# decimal and
DateTime types. In SQL Server, these values can be null. However, that is particularly problematic when dealing with an application that interfaces with multiple databases. You could be working with FoxPro, SQL Server, and another database, and they all return a different default DateTime value, which makes a mapping between DBNull and the default
DateTime value impractical. This is just one element of complexity you have to deal with for null database values, and there are many more. By having nullable types, we can more quickly write easier-to-maintain code.
An entire part of this book, Chapters 19 to 23, provides extensive coverage of working with data in .NET, and this material is applicable in the context of those chapters.
However, the examples here assume that there is code that has extracted data from a data source that contains null values. The following example assumes there is a value from a database for the creation date of a record:
DateTime? createDate = null;
The most noticeable part of the preceding statement is the question mark suffix, ?, on the
DateTime type. The proper terminology for this is that the type of createDate is a nullable
DateTime. It is explicitly set to the value null, which is not possible in non-nullable value type objects.
There are a couple ways to check a nullable type to see whether it has the value null. Here’s an example:
bool isNull;
isNull = createDate == null;
isNull = !createDate.HasValue;
Using the C# equals operator, you can learn whether createDate is set to null. Calling
HasValue will return true if createDate is not null, so the example negates it to get the same answer. Comparing with the C# not equals operator, createDate != null, is equivalent in behavior to HasValue.
createDate = createDate ?? DateTime.Now;
As you can see, the ?? operator is quicker to code for such a simple task. Here’s another example:
DateTime? defaultDate = null;
createDate = createDate ?? defaultDate ?? DateTime.Now;
In the preceding code, if createDate is null, defaultDate is evaluated. If defaultDate is not null, defaultDate is assigned to createDate. Otherwise, the next expression,
DateTime.Now, is evaluated. If none of the expressions are non-null, the last expression in the chain of ?? operators is returned, even if it’s null, too.
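The null checks and the chained ?? operator can be tried together in a small sketch. It uses a fixed fallback date rather than DateTime.Now so that the results are repeatable; NullableSketch, Resolve, and fallback are hypothetical names for this illustration.

```csharp
using System;

static class NullableSketch
{
    // Hypothetical helper: picks the first non-null date in the chain.
    public static DateTime Resolve(
        DateTime? createDate, DateTime? defaultDate, DateTime fallback)
    {
        return createDate ?? defaultDate ?? fallback;
    }

    static void Main()
    {
        DateTime? createDate = null;
        DateTime? defaultDate = new DateTime(2008, 1, 1);
        DateTime fallback = new DateTime(2000, 1, 1);

        Console.WriteLine(createDate.HasValue);   // False
        Console.WriteLine(createDate == null);    // True

        DateTime resolved = Resolve(createDate, defaultDate, fallback);
        Console.WriteLine(resolved.Year);         // 2008

        resolved = Resolve(null, null, fallback);
        Console.WriteLine(resolved.Year);         // 2000
    }
}
```

Because the last operand in this sketch is a non-nullable DateTime, the result of the whole chain is a plain DateTime that can never be null.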
mater-back to this chapter for clarification
A related subject is the C# built-in types and how they relate to .NET types. Some .NET types don’t have a C# keyword equivalent, such as Guid and DateTime. Whereas you
might use Guid just occasionally, you will probably use DateTime a lot, and this chapter showed you much of the common usage you’ll need.
This chapter discussed nullable types, which are very applicable for working with value type data. In later chapters, 19 through 23 to be specific, you’ll see extensive discussion
of C# and .NET data capabilities, which demonstrates effective implementation of
nullable types.
Up until now, we’ve mostly discussed the built-in value types. However, there is one
built-in reference type, string, that is pervasive for most programs. The next chapter goes into depth about the string type and how to use it.
CHAPTER 5
Manipulating Strings
IN THIS CHAPTER
The C# String Type
The StringBuilder Class
Regular Expressions
Strings are ubiquitous in programming, so much so that
the .NET Framework Class Library (FCL) has extensive
support for strings. Besides the string type with numerous
methods, there is a special class called StringBuilder for
manipulating strings efficiently. In this chapter, you’ll read
about the string and StringBuilder types.
A related feature, regular expressions, has FCL APIs that
offer even greater flexibility for working with strings. In this
chapter, you’ll learn how to build a regular expression and
use it for pattern matching on blocks of text. Let’s look at
the C# string type first.
The C# String Type
Among the C# built-in types, string is the only reference
type other than object. This surprises people sometimes because
it is a built-in type and has behavior similar to value
types. If you are a little fuzzy on the differences between
reference types and value types, you might want to refer to
Chapter 4, “Understanding Reference Types and Value
Types,” for a quick refresher. The features of the string type
that make it behave like a value type are immutability and
being sealed.
The string type is immutable, meaning that a string can’t be
modified once created. All methods that appear to modify a
string really don’t; they create a new string object on the
heap and return a reference to the new string object.
The string type is also sealed, meaning that it can’t be
derived from.
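A minimal sketch makes the immutability visible. It uses ToUpperInvariant so that the result does not depend on the current culture; ImmutableStringSketch and Upper are hypothetical names for this illustration.

```csharp
using System;

static class ImmutableStringSketch
{
    // Hypothetical helper: returns an uppercase version of the input.
    public static string Upper(string value)
    {
        return value.ToUpperInvariant();   // creates a new string object
    }

    static void Main()
    {
        string original = "string 1";
        string upper = Upper(original);

        Console.WriteLine(original);   // string 1
        Console.WriteLine(upper);      // STRING 1

        // The two variables refer to different objects.
        Console.WriteLine(object.ReferenceEquals(original, upper));   // False
    }
}
```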
Being immutable and sealed makes the string type more efficient and secure. The efficiency comes from the way the Common Language Runtime (CLR) manages strings in memory with an intern pool and limits the overhead of changing string content. From a security perspective, sealing a string keeps derived classes from manipulating string content. Sealing also supports CLR memory efficiencies and eliminates the overhead of virtual type member management.
Now, let’s check out what you can do with string types. The following sections describe members of the string class. Remember that members called on the string type (for example, string.Format) are static methods, and those called on a string instance are instance methods.
YOU CAN FIND OVERLOADS WITH INTELLISENSE
Many string methods have overloads, allowing you to use the method with different
types or numbers of parameters. A quick way to see the overloads is to take advantage
of IntelliSense in the editor.
For example, if you type str, type . (dot) to fill in the string, type C, and type ( (left
parenthesis) to fill in the Compare, you’ll see IntelliSense pop up. On the left side of
the IntelliSense pop-up, you’ll find up and down arrows labeled 1 to 8. You can press the up and down arrows on the keyboard to traverse the available overloads.
to implement the Format method:
string.Format({0,15}, string 2) = [       string 2]
You might want to take a closer look at how this output occurred by noticing that the
formatString variable itself was used as input to the first index, {0}, which is part of
Console.WriteLine’s format string parameter. Used with Format, the formatString variable becomes a format item; otherwise, it is a normal string.
As you can see, the result is a 15-character string, between brackets, with the text right-aligned
and padded to the left with spaces. The comma between the 0 and 15 in {0,15}
separates the index from the alignment (which specifies both alignment and character width).
If you don’t want the result to be right-aligned, make the alignment negative so that it
reads as {0,-15}, which will look like this:
string.Format({0,-15}, string 2) = [string 2       ]
In addition to alignment, you can control the output format of the parameter matching
an index with a format string. The following example applies a numeric parameter, 10, to two different format items, currency and hex:
Currency: $10.00, Hex: ' A'
You can see that currency used a U.S. dollar sign and a period to separate dollars from
cents. If your machine were set to another locale, the output would have matched your currency symbol and other punctuation. Table 5.1 shows several other standard numeric format strings.
R or r   Round trip (guarantees conversion from floating point to string and back again)
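The alignment and format-string parts of a format item can be tried in a short sketch. To keep the result independent of the machine’s locale, this sketch substitutes the fixed-point specifier F2 for the currency specifier and passes the invariant culture; FormatItemSketch and Bracket are hypothetical names for this illustration.

```csharp
using System;
using System.Globalization;

static class FormatItemSketch
{
    // Hypothetical helper: pads a value to 15 characters inside brackets.
    public static string Bracket(string value, bool rightAlign)
    {
        return rightAlign
            ? string.Format("[{0,15}]", value)
            : string.Format("[{0,-15}]", value);
    }

    static void Main()
    {
        Console.WriteLine(Bracket("string 2", true));    // [       string 2]
        Console.WriteLine(Bracket("string 2", false));   // [string 2       ]

        // The same argument can feed several format items.
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture,
            "Fixed: {0:F2}, Hex: '{0,8:X}'", 10));
        // Fixed: 10.00, Hex: '       A'
    }
}
```

Notice that a single format item can combine an alignment and a format string, as {0,8:X} does here.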
TABLE 5.2 Custom Numeric Format Strings
Custom Numeric Format Character Meaning
The standard numeric format strings are useful because they are quick to use for common scenarios. Sometimes, however, you need more control over the output, building your own custom format strings. Table 5.2 has a list of custom numeric format strings.
Using Table 5.2, we can re-create the currency format like this:
The first format string matches a positive number, the second format string matches a
negative number, and the third format string matches zero.
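As a minimal sketch of a three-section custom format string, the following separates the positive, negative, and zero sections with semicolons. The pattern itself is an assumption made for this illustration, as are the names NumericSectionSketch, Pattern, and Money; the invariant culture is used so that the separators are predictable.

```csharp
using System;
using System.Globalization;

static class NumericSectionSketch
{
    // Assumed pattern: positive;negative;zero sections.
    const string Pattern = "$#,##0.00;($#,##0.00);'Zero'";

    // Hypothetical helper: formats an amount with the pattern above.
    public static string Money(decimal amount)
    {
        return amount.ToString(Pattern, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(Money(1234.5m));    // $1,234.50
        Console.WriteLine(Money(-1234.5m));   // ($1,234.50)
        Console.WriteLine(Money(0m));         // Zero
    }
}
```

The dollar sign here is a literal character, so unlike the standard currency specifier, this pattern does not adapt to the machine’s locale.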
Comparing Strings
When comparing strings for equality, it’s often easiest to use the == and != operators. C# does not define the < and > operators for strings, however, so the Compare and CompareOrdinal methods are available to retrieve a single int
value for the results of the comparison. Some types in the FCL actually require an int
value specifying less than, equal, or greater than, so having this available to call is convenient. The following paragraphs discuss the Compare and CompareOrdinal methods. To
keep from repeating code, you can assume the values being used are as follows:
The variable, intResult, is –1.
The CompareOrdinal method compares two strings, independent of localization. It
produces the following integer results:
Notice how I switched the order of strings from Compare to CompareOrdinal. The
intResult from the call to CompareOrdinal is 1.
The CompareTo method compares the value of the this instance with a parameter string. It produces the following integer results:
this < string = negative
this == string = zero
this > string = positive
The result, intResult, is –1.
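The three comparison methods can be sketched together as follows. The values of str1 and str2 are assumptions inferred from the outputs shown elsewhere in this chapter, and CompareSketch and Order are hypothetical names. The sketch checks only the sign of each result, because the documented contract is negative, zero, or positive rather than a specific number.

```csharp
using System;

static class CompareSketch
{
    // Hypothetical helper: reduces an ordinal comparison to -1, 0, or 1.
    public static int Order(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    static void Main()
    {
        string str1 = "string 1";   // assumed value
        string str2 = "string 2";   // assumed value

        // Culture-sensitive comparison: negative because str1 sorts first.
        int intResult = string.Compare(str1, str2);
        Console.WriteLine(intResult < 0);    // True

        // Ordinal comparison with the arguments switched: positive.
        Console.WriteLine(Order(str2, str1));   // 1

        // Instance comparison: negative because str1 precedes str2.
        intResult = str1.CompareTo(str2);
        Console.WriteLine(intResult < 0);    // True
    }
}
```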
Checking for String Equality
Compare methods, as you learned about earlier, are good for sorting algorithms because they help figure out which value comes before another. However, sometimes you just need to know whether two strings are equal (for example, in a search algorithm).
A quick and common way to check for string equality is to use the == operator.
The instance Equals method also determines whether two strings are equal, returning a
bool value of true when they are equal and a bool value of false when they’re not.
Here’s an example:
boolResult = str1.Equals(str2);
Console.WriteLine("{0}.Equals({1}) = {2}\n",
    str1, str2, boolResult);
In this example, the Equals method accepts one string parameter. Because str1 has a
different value than str2, the return value is false.
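Here is a minimal sketch of the equality checks side by side. The values of str1 and str2 are assumptions, and EqualitySketch and SameIgnoringCase are hypothetical names; the case-insensitive comparison uses the static string.Equals overload that takes a StringComparison value.

```csharp
using System;

static class EqualitySketch
{
    // Hypothetical helper: ordinal comparison that ignores case.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        string str1 = "string 1";   // assumed value
        string str2 = "string 2";   // assumed value

        Console.WriteLine(str1 == str2);        // False
        Console.WriteLine(str1.Equals(str2));   // False

        // == compares string contents, not references.
        string built = new string("string 1".ToCharArray());
        Console.WriteLine(str1 == built);                         // True
        Console.WriteLine(object.ReferenceEquals(str1, built));   // False

        Console.WriteLine(SameIgnoringCase("STRING 1", str1));    // True
    }
}
```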
Concatenating Strings
C# has a concatenation operator, +, that makes it easy to concatenate strings. Here’s how you can use it:
strResult = str1 + ", " + str2;
The strResult variable will equal “string 1, string 2” after the preceding statement
executes. This is equivalent to calling the Concat method, but with shorter syntax.
As you’ve seen previously, most of the Console.WriteLine statements have used
placeholders to define where a parameter should go. You can also use the concatenation operator instead, like this:
The first string uses the results of the previous statement, the second uses the format
string technique you’ve seen in all earlier examples, and the third uses concatenation to produce a single parameter for the Console.WriteLine call. You have several choices, and all are valid.
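The choices described above can be sketched as follows, with string.Concat added for comparison. The values of str1 and str2 are assumptions, and ConcatSketch and Join are hypothetical names; every line prints the same text.

```csharp
using System;

static class ConcatSketch
{
    // Hypothetical helper: joins two strings with a comma and a space.
    public static string Join(string a, string b)
    {
        return a + ", " + b;
    }

    static void Main()
    {
        string str1 = "string 1";   // assumed value
        string str2 = "string 2";   // assumed value

        string strResult = Join(str1, str2);

        Console.WriteLine(strResult);                          // stored result
        Console.WriteLine("{0}, {1}", str1, str2);             // placeholders
        Console.WriteLine(str1 + ", " + str2);                 // + operator
        Console.WriteLine(string.Concat(str1, ", ", str2));    // Concat method
    }
}
```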
Yet another concatenation method is the Concat method, which creates a new string from one or more input strings or objects. Here’s an example of how to implement the Concat
method using two strings:
The Copy method makes a copy of str1. The result is a copy of str1 placed in
stringResult. This is not the same as assignment, shown here:
strResult = str1;
After executing the preceding line, both strResult and str1 hold identical references to the same string in memory. However, Copy created a new instance of the string. Remember that string is a reference type. (Chapter 4 has more info if you need a refresher.)
If you don’t want to copy an entire string, perhaps just a subset, you can use the CopyTo
method, which copies a specified number of characters from one string to an array of characters. Here’s an example of how to implement the CopyTo method:
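char[] charArr = new char[str1.Length];
str1.CopyTo(0, charArr, 0, str1.Length);
Console.WriteLine(
    "{0}.CopyTo(0, charArr, 0, str1.Length) = ",
    str1);
// Print each copied character, separated by spaces.
foreach (char character in charArr)
{
    Console.Write("{0} ", character);
}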
And here’s the output:
string 1.CopyTo(0, charArr, 0, str1.Length) =
s t r i n g 1
This example shows the CopyTo method filling a character array. It copies each character from str1 into charArr, beginning at position 0 and continuing for the length of str1. The foreach loop iterates through each element of charArr, printing the results.
The Clone method returns a reference to the existing string rather than a new copy, which is safe because strings are immutable. Clone is declared to return object, so the
return value must be cast to a string before assignment to stringResult.
Inspecting String Content
Sometimes you need to search a string to see whether it begins with, ends with, or contains a
substring anywhere in between. For these tasks, you can use the StartsWith, EndsWith,
and Contains string methods.
The StartsWith method determines whether a string prefix matches a specified string.
Here’s an example of how to implement the StartsWith method:
an example of how to implement the EndsWith method:
The results of the call to Contains are true because the value of str1 does contain “ring”.
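A small sketch shows the three inspection methods together. The value of str1 is an assumption, and InspectSketch and HasAffixes are hypothetical names; the sketch passes StringComparison.Ordinal to StartsWith and EndsWith so that the checks do not depend on the current culture.

```csharp
using System;

static class InspectSketch
{
    // Hypothetical helper: true when value has both the prefix and suffix.
    public static bool HasAffixes(string value, string prefix, string suffix)
    {
        return value.StartsWith(prefix, StringComparison.Ordinal)
            && value.EndsWith(suffix, StringComparison.Ordinal);
    }

    static void Main()
    {
        string str1 = "string 1";   // assumed value

        Console.WriteLine(HasAffixes(str1, "str", "1"));   // True
        Console.WriteLine(HasAffixes(str1, "ring", "1"));  // False
        Console.WriteLine(str1.Contains("ring"));          // True
        Console.WriteLine(str1.Contains("xyz"));           // False
    }
}
```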
Extracting String Information
Beyond just checking to see whether a string contains a value, you can find out information about where the string is located by using the IndexOf and LastIndexOf methods. You can use these results in the CopyTo method, shown earlier, or for explicitly extracting the contents of a substring with the Substring method.
The IndexOf method returns the position of a string. IndexOf returns –1 if the string isn’t found. Here’s an example of how to implement the IndexOf method:
intResult = str1.IndexOf('1');
Console.WriteLine("str1.IndexOf('1'): {0}", intResult);
The return value of this operation is 7 because that’s the zero-based position within str1
where the character '1' occurs.
The LastIndexOf method returns the position of the last occurrence of a string or characters within a string. Here’s an example of how to implement the LastIndexOf method:
string filePath = @"c:\Windows\Microsoft.NET\Framework\v3.5.x.x\csc.exe";
You can use the IndexOf and LastIndexOf methods to extract substrings from a string
using the Substring method. The Substring method retrieves a substring at a specified location of a string. Here’s an example of how to implement the Substring method:
strResult = str1.Substring(str1.IndexOf("ring"), 4);
Console.WriteLine("str1.Substring(str1.IndexOf(\"ring\"), 4) : {0}", strResult);
Here’s the output:
str1.Substring(str1.IndexOf("ring"), 4) : ring
The first parameter was the position in str1 to begin, which was returned by the call to
IndexOf. The second parameter was the length of the substring.
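One way LastIndexOf and Substring might work together is to pull the file name out of the filePath value shown earlier. This is only a sketch; SubstringSketch and FileNameOf are hypothetical names, and the single-parameter Substring overload returns everything from the given position to the end of the string.

```csharp
using System;

static class SubstringSketch
{
    // Hypothetical helper: returns the text after the last backslash.
    public static string FileNameOf(string path)
    {
        int lastSlash = path.LastIndexOf('\\');   // -1 when not found
        return path.Substring(lastSlash + 1);
    }

    static void Main()
    {
        string filePath =
            @"c:\Windows\Microsoft.NET\Framework\v3.5.x.x\csc.exe";

        Console.WriteLine(FileNameOf(filePath));         // csc.exe
        Console.WriteLine(FileNameOf("csc.exe"));        // csc.exe
        Console.WriteLine("string 1".IndexOf('1'));      // 7
        Console.WriteLine("string 1".IndexOf("ring"));   // 2
    }
}
```

When no backslash is present, LastIndexOf returns –1, so the helper starts at position 0 and returns the whole string.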
Padding and Trimming String Output
When displaying strings, you’ll often want to control the spacing or characters surrounding each side of the string. For example, you might want to apply spacing or zero padding on one side or the other of a string to get it to line up properly, perhaps in a column, in the output. Other times, you’ll receive strings with spaces or some other character on the beginning, end,
or both sides of a string (things you would rather not see). This section introduces you to
padding methods for adding characters and trimming methods for removing characters.
The PadLeft method right-aligns the characters of a string and pads the left with spaces (by default) or a specified character. Here’s an example of how to implement the PadLeft method:
Opposite to the PadLeft method, the PadRight method left-aligns the characters of a
string and pads on the right with spaces (by default) or a specified character. Here’s an
example of how to implement the PadRight method:
strResult = str1.PadRight(15, '*');
Console.WriteLine("str1.PadRight(15, '*'): {0}", strResult);
The example shows the PadRight method creating a 15-character string with the original string left-aligned and filled to the right with * characters, as shown here:
The TrimEnd method removes a specified set of characters from the end of a string. Here’s
an example of how to implement the TrimEnd method:
strResult = trimString.TrimEnd(new char[] {' '});
Console.WriteLine("trimString.TrimEnd(): {0}",
    strResult);
In this example, the TrimEnd method removes all the spaces from the end of
trimString. The result is “nonwhitespace”, with no spaces on the right side.
The TrimStart method removes whitespace or a specified set of characters from the beginning of a string. Here’s an example of how to implement the TrimStart method:
strResult = trimString.TrimStart(new char[] {' '});
Console.WriteLine("trimString.TrimStart(): {0}",
    strResult);
Here, the TrimStart method removes all the spaces from the beginning of
trimString. The result is “nonwhitespace”, with no spaces on the left side.
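The padding and trimming methods are easier to compare when the results are wrapped in brackets, as in this minimal sketch. The values of str1 and trimString are assumptions made for the illustration, and PadTrimSketch and Bracket are hypothetical names.

```csharp
using System;

static class PadTrimSketch
{
    // Hypothetical helper: makes leading and trailing spaces visible.
    public static string Bracket(string value)
    {
        return "[" + value + "]";
    }

    static void Main()
    {
        string str1 = "string 1";                    // assumed value
        string trimString = "   nonwhitespace   ";   // assumed value

        Console.WriteLine(Bracket(str1.PadLeft(15)));         // [       string 1]
        Console.WriteLine(Bracket(str1.PadRight(15, '*')));   // [string 1*******]

        Console.WriteLine(Bracket(trimString.TrimEnd(' ')));    // [   nonwhitespace]
        Console.WriteLine(Bracket(trimString.TrimStart(' ')));  // [nonwhitespace   ]
        Console.WriteLine(Bracket(trimString.Trim()));          // [nonwhitespace]
    }
}
```

Each call trims only the side it names, so you need Trim, or both TrimStart and TrimEnd, to clean up both ends.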
Modifying String Content
A few string methods return a modified version of a string. You can insert, remove, or
replace the content of a string by using the Insert, Remove, and Replace methods, respectively. Other modification methods include ToLower and ToUpper, which convert all string characters to lowercase and uppercase, respectively.
The Insert method returns a string where a specified string is placed in a specified position of an original string. All characters at and to the right of the insertion point are
pushed right to make room for the inserted string. Here’s an example of how to
implement the Insert method:
Strictly speaking, you never really modify a string. A string is immutable, meaning that
it can’t change. What really happens when calling a method such as Insert, Remove,
or Replace is that the CLR creates a new string object and returns a reference to that
new string object. The original string never changed.
This is a common mistake by people just getting started with C# programming, so
remember this any time you look at a string after one of these operations, thinking that
it should be changed. Instead, assign the results of the operation to a new string
variable. Assigning the result of the string manipulation to the same variable will work, too;
it just assigns the new string object reference to the same variable.
The Remove method deletes a specified number of characters from a position in a string. Here’s an example of how to implement the Remove method:
strResult = str2.Remove(3, 3);
Console.WriteLine("str2.Remove(3, 3): {0}",
    strResult);
This example shows the Remove method deleting the fourth, fifth, and sixth characters
from str2. The first parameter is the zero-based starting position to begin deleting, and the second parameter is the number of characters to delete. The result is “str 2”, where the “ing” was removed from the original string.
The Replace method replaces all occurrences of a character or string with a new character
or string, respectively. Here’s an example of how to implement the Replace method:
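As a minimal sketch of Replace alongside Insert and Remove, the following assumes that str2 holds “string 2”; ModifySketch and Renumber are hypothetical names used only for this illustration. The last line confirms that none of the calls changed the original string.

```csharp
using System;

static class ModifySketch
{
    // Hypothetical helper: swaps every '2' character for a '3'.
    public static string Renumber(string value)
    {
        return value.Replace('2', '3');
    }

    static void Main()
    {
        string str2 = "string 2";   // assumed value

        Console.WriteLine(Renumber(str2));                   // string 3
        Console.WriteLine(str2.Replace("string", "text"));   // text 2
        Console.WriteLine(str2.Insert(6, " number"));        // string number 2
        Console.WriteLine(str2.Remove(3, 3));                // str 2

        // Every call returned a new string; the original is intact.
        Console.WriteLine(str2);                             // string 2
    }
}
```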