Chapter 12 Working With Strings
Hoang Anh Viet
VietHA@it-hut.edu.vn
HaNoi University of Technology
“Describes how strings are a first-class type in the CLR and how to use them effectively in C#. A large portion of the chapter covers the string-formatting capabilities of various types in the .NET Framework and how to make your defined types behave similarly by implementing IFormattable. Additionally, I introduce you to the globalization capabilities of the framework and how to create custom CultureInfo for cultures and regions that the .NET Framework doesn’t already know about.”
Roadmap
12.1 String Overview
12.2 String Literals
12.3 Format Specifiers and Globalization
12.4 Working with Strings from Outside Sources
12.5 StringBuilder
12.6 Searching Strings with Regular Expressions
12.1 String Overview
In C#, String is a built-in type.
Among the built-in types, String is a reference type, whereas most of the other built-in types are value types.
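The difference can be observed through reflection. This is a minimal sketch, assuming C# 9 top-level statements; the variable names are illustrative. It asks `Type.IsValueType` about a few built-in types and shows that a string variable can hold null, which only a reference type can.

```csharp
using System;

// Ask the runtime whether each built-in type is a value type.
bool stringIsValueType = typeof(string).IsValueType; // false: String is a reference type
bool intIsValueType = typeof(int).IsValueType;       // true: Int32 is a value type
bool doubleIsValueType = typeof(double).IsValueType; // true: Double is a value type

// Because String is a reference type, a string variable can hold null.
string nothing = null;

Console.WriteLine($"string: {stringIsValueType}, int: {intIsValueType}, double: {doubleIsValueType}");
Console.WriteLine(nothing is null);
```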
String Basics
A string is an object of type String whose value is text.
Internally, the text is stored as a read-only collection of Char objects, each of which is one UTF-16 code unit.
There is no null-terminating character at the end of a C# string (unlike C and C++), so a C# string can contain any number of embedded null characters ('\0').
The Length property of a string is the number of Char objects it contains, not the number of Unicode characters: a character formed from a surrogate pair counts as two.
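Both points can be checked directly. The sketch below assumes C# 9 top-level statements; the sample strings are arbitrary. `char.ConvertFromUtf32` builds a character outside the Basic Multilingual Plane, and `StringInfo.LengthInTextElements` is used only as a comparison that counts user-perceived characters.

```csharp
using System;
using System.Globalization;

// An embedded null character is just another Char; it does not end the string.
string withNull = "abc\0def";
int lengthWithNull = withNull.Length;      // 7

// U+1F600 lies outside the Basic Multilingual Plane, so in UTF-16
// it is stored as a surrogate pair: two Char objects.
string smiley = char.ConvertFromUtf32(0x1F600);
int smileyLength = smiley.Length;          // 2
bool isPair = char.IsSurrogatePair(smiley[0], smiley[1]);

// StringInfo counts text elements instead of Char objects.
int smileyTextElements = new StringInfo(smiley).LengthInTextElements; // 1

Console.WriteLine($"{lengthWithNull} {smileyLength} {isPair} {smileyTextElements}");
```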
Alias and String Class
Alias
• In C#, the string keyword is an alias for System.String, so string and String are equivalent.
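A short sketch (C# 9 top-level statements, illustrative names) confirming that the keyword and the class name denote one and the same type, so values and static members move freely between them:

```csharp
using System;

// string (keyword) and System.String (class) name exactly the same type.
bool sameType = typeof(string) == typeof(String);

string a = "alias";
String b = a;                    // no conversion: the two names are interchangeable
bool sameObject = object.ReferenceEquals(a, b);

// Static members can be reached through either name.
bool sameEmpty = object.ReferenceEquals(string.Empty, String.Empty);

Console.WriteLine($"{sameType} {sameObject} {sameEmpty}");
```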
Declaring and Initializing Strings
We can declare and initialize strings in various ways, as shown in the following example:
// Declare without initializing
string message1;
// Initialize to null
string message2 = null;
// Initialize as an empty string
// Use the Empty constant instead of the literal ""
string message3 = System.String.Empty;
//Initialize with a regular string literal
string oldPath = "c:\\Program Files\\Microsoft Visual Studio 8.0";
// Use System.String if we prefer.
System.String greeting = "Hello World!";
// In local variables (i.e., within a method body)
// you can use implicit typing.
var temp = "I'm still a strongly-typed System.String!";
// Use a const string to prevent 'message4' from
// being used to store another string value.
const string message4 = "You can't get rid of me!";
// Use the String constructor only when creating
// a string from a char*, char[], or sbyte*. See
// System.String documentation for details.
char[] letters = { 'A', 'B', 'C' };
string alphabet = new string(letters);
Immutability of String Objects
String objects are immutable: they cannot be changed after they have been created.
All of the String methods and C# operators that appear to modify a string actually return the results in a new string object.
For example:
string s1 = "A string is more ";
string s2 = "than the sum of its chars.";
// Concatenate s1 and s2. This actually creates a new
// string object and stores it in s1, releasing the
// reference to the original object
s1 += s2;
System.Console.WriteLine(s1);
// Output: A string is more than the sum of its chars.
Note:
• If we create a reference to a string and then "modify" the original string, the reference continues to point to the original object, not to the new object that was created when the string was modified.
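The note above in code: a minimal sketch (C# 9 top-level statements, illustrative names) in which a second variable keeps seeing the old text after the first one is "modified".

```csharp
using System;

string original = "Hello";
string alias = original;      // both variables refer to the same object

original += ", World";        // creates a NEW string and rebinds 'original'

// 'alias' still refers to the first object, which was never changed.
Console.WriteLine(alias);     // Hello
Console.WriteLine(original);  // Hello, World
```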
Remark
When we declare a string literal in our C# code, the compiler stores it in the assembly, and at run time the CLR creates a System.String object for it and places it into an internal table called the intern pool.
If the same string has already been interned, the code simply references the existing instance instead of creating a new one.
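Interning can be observed with reference comparisons. This sketch assumes C# 9 top-level statements and arbitrary sample text; `string.Intern` is the standard way to obtain the pooled instance for a string that was built at run time, since such strings are not interned automatically.

```csharp
using System;

string literal1 = "intern me";
string literal2 = "intern me";
// Identical literals resolve to the same interned object.
bool literalsShared = object.ReferenceEquals(literal1, literal2);  // True

// A string built at run time is a separate object with equal contents.
string built = new string("intern me".ToCharArray());
bool builtShared = object.ReferenceEquals(literal1, built);        // False
bool builtEqual = literal1 == built;                               // True (compares values)

// string.Intern returns the pooled instance that has the same text.
string pooled = string.Intern(built);
bool pooledShared = object.ReferenceEquals(literal1, pooled);      // True

Console.WriteLine($"{literalsShared} {builtShared} {builtEqual} {pooledShared}");
```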
12.2 String Literals
string columns = "Column 1\tColumn 2\tColumn 3";
//Output: Column 1 Column 2 Column 3
string rows = "Row 1\r\nRow 2\r\nRow 3";
/* Output:
Row 1
Row 2
Row 3
*/
string title = "\"The \u00C6olean Harp\", by Samuel Taylor Coleridge";
//Output: "The Æolean Harp", by Samuel Taylor Coleridge
Regular and Verbatim String Literals
Use verbatim strings for convenience and better readability when the string text contains backslash characters, for example in file paths.
A verbatim string literal is declared by prefixing the string with the @ character.
For example:
string filePath = @"C:\Users\scoleridge\Documents\";
//Output: C:\Users\scoleridge\Documents\
// A verbatim string keeps its line breaks.
string text = @"My pensive SARA ! thy soft
cheek reclined
Thus on mine arm, most soothing sweet it is
To sit beside our Cot, ";
/* Output:
My pensive SARA ! thy soft
cheek reclined
Thus on mine arm, most soothing sweet it is
To sit beside our Cot,
*/
string quote = @"Her name was ""Sara.""";
//Output: Her name was "Sara."
String Escape Sequences
Escape sequence   Meaning                                       Example
\U                Unicode escape sequence for surrogate pairs   \Unnnnnnnn
\u                Unicode escape sequence                       \u0041 = "A"
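The two Unicode escapes side by side, as a sketch under C# 9 top-level statements (U+1F600 is just a convenient example of a code point above U+FFFF): `\u` names one UTF-16 code unit, while `\U` can name any code point and becomes a surrogate pair when needed.

```csharp
using System;

// \u takes exactly four hex digits: one UTF-16 code unit.
string a = "\u0041";
// \U takes eight hex digits and can name a code point above U+FFFF;
// the compiler stores it as a surrogate pair.
string grin = "\U0001F600";
// The same character written as its two UTF-16 surrogates.
string grinPair = "\uD83D\uDE00";

Console.WriteLine(a == "A");         // True
Console.WriteLine(grin.Length);      // 2
Console.WriteLine(grin == grinPair); // True
```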
12.3 Format Specifiers and Globalization
We often need to format data for display to users in a specific way, which means:
• dealing with these sorts of issues, and
• handling the formatting of values, and so on.
For example, we may want to:
• display a floating-point value representing some tangible metric in exponential form, or
• display it in fixed-point form.
Format Strings
A format string is a string whose contents can be determined dynamically at run time.
We create a format string by:
• using the static String.Format method, and
• embedding placeholders in braces that will be replaced by other values at run time.
The built-in numeric types use the standard numeric format strings or the custom numeric format strings defined by the .NET Framework.
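Placeholders can carry more than an index. The sketch below (C# 9 top-level statements, sample values chosen arbitrarily) shows the `{index,alignment:format}` form accepted by `String.Format`. The invariant culture is passed so that the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Each placeholder is {index[,alignment][:formatString]}.
// A positive alignment right-aligns in that width; a negative one left-aligns.
string header = string.Format(CultureInfo.InvariantCulture, "{0,-8}|{1,6}", "Item", "Price");
string row = string.Format(CultureInfo.InvariantCulture, "{0,-8}|{1,6:F2}", "Tea", 3.5);

// The same index may be used more than once, in any order.
string repeated = string.Format("{1}-{0}-{1}", "a", "b");

Console.WriteLine(header);   // Item    | Price
Console.WriteLine(row);      // Tea     |  3.50
Console.WriteLine(repeated); // b-a-b
```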
The standard format strings are typically of the form Axx, where:
• A is the desired format requested, and
• xx is an optional precision specifier.
Examples of format specifiers for numbers are:
• "C" for currency
• "D" for decimal
• "E" for scientific notation
• "F" for fixed-point notation
• "X" for hexadecimal notation
• "G" for general. This is the default format specifier, and it is also the format that we get when we call Object.ToString.
Instead of a standard format string, we can also supply one of the custom format strings.
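Each specifier from the list applied to a number, as a sketch under C# 9 top-level statements. The invariant culture is assumed throughout so the results are predictable; with another culture the separators and currency symbol change. The custom format `"#,##0.0"` is just one illustrative pattern.

```csharp
using System;
using System.Globalization;

CultureInfo inv = CultureInfo.InvariantCulture; // culture-neutral, so output is predictable
double value = 1234.5678;

string d = 42.ToString("D5", inv);              // "00042": decimal, padded to 5 digits
string e = value.ToString("E2", inv);           // "1.23E+003": scientific notation
string f = value.ToString("F2", inv);           // "1234.57": fixed-point, 2 decimals
string x = 255.ToString("X", inv);              // "FF": hexadecimal (integral types only)
string g = value.ToString("G", inv);            // "1234.5678": general
string custom = value.ToString("#,##0.0", inv); // "1,234.6": a custom format string

// "C" uses the provider's currency symbol; the invariant culture's symbol is "¤".
string c = value.ToString("C", inv);

Console.WriteLine(string.Join(" | ", d, e, f, x, g, custom, c));
```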
class FormatString
{
    static void Main()
    {
        // Get user input.
        System.Console.WriteLine("Enter a number");
        string input = System.Console.ReadLine();
        // Convert the input string to an int (j stays 0 if parsing fails).
        int j;
        System.Int32.TryParse(input, out j);
        // Write a different string each iteration.
        string s;
        for (int i = 0; i < 10; i++)
        {
            // A simple format string with no alignment formatting.
            s = System.String.Format("{0} times {1} = {2}", i, j, (i * j));
            System.Console.WriteLine(s);
        }
    }
}
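Object.ToString, IFormattable
All built-in numeric types, as well as the date-time types, implement the IFormattable interface, whose ToString(string format, IFormatProvider formatProvider) method takes both a format string and a format provider.
An object that implements the IFormatProvider interface is, as the name suggests, a format provider; CultureInfo is one example.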
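A sketch of both interfaces at work, assuming C# 9 top-level statements and arbitrary sample values. It formats a double and a DateTime through their IFormattable implementations, then swaps in a cloned NumberFormatInfo (itself a format provider) with different separators to show how the provider changes the result.

```csharp
using System;
using System.Globalization;

// Built-in numeric and date-time types implement IFormattable.
IFormattable number = 1234.5;
IFormattable date = new DateTime(2008, 3, 9);

// CultureInfo implements IFormatProvider: it is a format provider.
IFormatProvider invariant = CultureInfo.InvariantCulture;
string n1 = number.ToString("N1", invariant);       // "1,234.5"
string d1 = date.ToString("yyyy-MM-dd", invariant); // "2008-03-09"

// NumberFormatInfo is also a format provider. Cloning gives a writable copy,
// here changed to use a comma as decimal separator and a period for groups.
var nfi = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
nfi.NumberDecimalSeparator = ",";
nfi.NumberGroupSeparator = ".";
string n2 = number.ToString("N1", nfi);             // "1.234,5"

// A provider hands out formatting objects through GetFormat.
bool providesNumberFormat = invariant.GetFormat(typeof(NumberFormatInfo)) is NumberFormatInfo;

Console.WriteLine($"{n1} {d1} {n2} {providesNumberFormat}");
```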
12.4 Working with Strings from Outside Sources
Remark
Encoding is the process of transforming a set of Unicode characters into a sequence of bytes.
Decoding is the reverse: it transforms a sequence of encoded bytes back into a set of Unicode characters.
The Unicode Standard assigns a code point (a number) to each character in every supported script.
A Unicode Transformation Format (UTF) is a way to encode that code point.
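The three ideas above (code point, encoding, decoding) in one round trip. This is a minimal sketch assuming C# 9 top-level statements, with UTF-8 chosen as the example encoding and the Greek capital Pi as the sample character.

```csharp
using System;
using System.Text;

string text = "Pi: \u03A0";                   // U+03A0 GREEK CAPITAL LETTER PI

// The code point is just the number the Unicode Standard assigns.
int codePoint = char.ConvertToUtf32(text, 4); // 0x3A0 (928)

// Encoding: characters -> bytes (here with UTF-8).
byte[] bytes = Encoding.UTF8.GetBytes(text);  // "Pi: " is 4 bytes, Pi is 2 (0xCE 0xA0)

// Decoding: bytes -> characters, the reverse step.
string decoded = Encoding.UTF8.GetString(bytes);

Console.WriteLine($"U+{codePoint:X4}, {bytes.Length} bytes, round trip ok: {decoded == text}");
```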
The Unicode Standard
UTF-x     Description
UTF-8     Represents each code point as a sequence of one to four bytes.
UTF-16    Represents each code point as a sequence of one or two 16-bit integers.
UTF-32    Represents each code point as a single 32-bit integer.
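The table can be checked by counting bytes. In this sketch (C# 9 top-level statements; `ByteCounts` is a hypothetical helper, and the three sample characters are arbitrary) `Encoding.Unicode` is .NET's name for UTF-16 little-endian.

```csharp
using System;
using System.Text;

// Hypothetical helper: bytes needed by UTF-8, UTF-16 and UTF-32 for a string.
(int Utf8, int Utf16, int Utf32) ByteCounts(string s) =>
    (Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s), Encoding.UTF32.GetByteCount(s));

// Latin A, Greek Pi, and an emoji outside the Basic Multilingual Plane.
string[] samples = { "A", "\u03A0", "\U0001F600" };

foreach (string s in samples)
{
    var counts = ByteCounts(s);
    Console.WriteLine($"U+{char.ConvertToUtf32(s, 0):X4}: UTF-8={counts.Utf8}, UTF-16={counts.Utf16}, UTF-32={counts.Utf32}");
}
```

UTF-8 grows from one to four bytes, UTF-16 needs a second 16-bit unit only for the emoji, and UTF-32 always uses four bytes.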
... are supported:
UTF32Encoding     Encodes Unicode characters using the UTF-32 encoding.
GetEncoding and GetEncodings Methods
Use the GetEncoding method to obtain other encodings.
Use the GetEncodings method to get a list of all encodings.
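A sketch of both methods, assuming C# 9 top-level statements. Note that on .NET Core and .NET 5+ the list returned by `GetEncodings` is likely to contain only a small built-in set unless an extra provider (such as `CodePagesEncodingProvider` from the System.Text.Encoding.CodePages package) is registered, whereas the .NET Framework lists many code pages.

```csharp
using System;
using System.Text;

// Look up an encoding by its web name or by its code page number.
Encoding byName = Encoding.GetEncoding("utf-8");
Encoding byCodePage = Encoding.GetEncoding(65001);  // 65001 is the UTF-8 code page
bool same = byName.WebName == byCodePage.WebName;   // both "utf-8"

// GetEncodings lists the encodings available on this platform.
EncodingInfo[] all = Encoding.GetEncodings();
foreach (EncodingInfo info in all)
{
    Console.WriteLine($"{info.CodePage,6}  {info.Name,-12} {info.DisplayName}");
}
```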
GetByteCount and GetBytes Methods
The GetByteCount method determines how many bytes result from encoding a set of Unicode characters.
The GetBytes method performs the actual encoding.
Likewise, the GetCharCount method determines how many characters result from decoding a sequence of bytes.
The GetChars method performs the actual decoding.
For example:
using System;
using System.Text;

string unicodeString = "This string contains the unicode character Pi(\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string
// to illustrate the use of GetCharCount/GetChars.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
// ASCII has no Pi, so it is replaced by '?' in the converted string.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
12.5 StringBuilder
The System.Text.StringBuilder class can be used to modify a string without creating a new object, for example:
• to reduce the number of intermediate strings stored in memory when a string is modified repeatedly;
• to boost performance when concatenating many strings together in a loop.
We can create a new instance of the StringBuilder class by initializing a variable with one of the overloaded constructor methods:
StringBuilder MyStringBuilder = new StringBuilder("Hello World!");
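The loop scenario from the list above, as a minimal sketch under C# 9 top-level statements: one StringBuilder collects all the pieces, where repeated `+=` on a string would allocate a new string object on every iteration.

```csharp
using System;
using System.Text;

// Build "0,1,2,...,9" in a single StringBuilder.
var sb = new StringBuilder();
for (int i = 0; i < 10; i++)
{
    if (i > 0)
    {
        sb.Append(',');  // separator before every item except the first
    }
    sb.Append(i);
}
string csv = sb.ToString();  // one final string is created here
Console.WriteLine(csv);      // 0,1,2,3,4,5,6,7,8,9
```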
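Setting the Capacity and Length
Although a StringBuilder grows as needed, it has a capacity: the maximum number of characters it can hold before it must allocate more memory. When this limit is exceeded, new space is allocated automatically.
Capacity should not be confused with Length, the number of characters the StringBuilder currently holds.
StringBuilder MyStringBuilder = new StringBuilder("Hello World!", 25);
Here the Length is 12 and the Capacity is 25.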
We can use the read/write Capacity property to set the maximum length of the object:
MyStringBuilder.Capacity = 25;
The EnsureCapacity method can be used to check the capacity of the current StringBuilder. If the capacity is already greater than the value passed in, it does not change; otherwise, it is increased to at least that value.
The Length property can also be viewed or set. If Length is set to a value greater than Capacity, Capacity is automatically increased to at least that value; setting Length to a value less than the current length shortens the string.
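These rules can be traced on the example object. A sketch assuming C# 9 top-level statements; the only exact capacity checked is the 25 given to the constructor, and after growth the code only relies on "at least", since the exact growth policy is an implementation detail.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("Hello World!", 25);
int length = sb.Length;          // 12 characters currently held
int capacity = sb.Capacity;      // 25 characters of allocated room

// Already large enough: EnsureCapacity leaves the capacity alone.
sb.EnsureCapacity(10);
int afterSmall = sb.Capacity;    // still 25

// Too small: EnsureCapacity grows the capacity to at least 100.
sb.EnsureCapacity(100);
int afterLarge = sb.Capacity;    // >= 100

// Setting Length below the current length truncates the text.
sb.Length = 5;
string truncated = sb.ToString(); // "Hello"

Console.WriteLine($"{length} {capacity} {afterSmall} {afterLarge} {truncated}");
```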
Modifying the StringBuilder String
The following table lists the methods that modify the contents of a StringBuilder:
Method                       Description
StringBuilder.Append         Appends information to the end of the current StringBuilder.
StringBuilder.AppendFormat   Appends text in which format specifiers are replaced with formatted values.
StringBuilder.Insert         Inserts a string or object at the specified index of the current StringBuilder.
StringBuilder.Remove         Removes a specified number of characters from the current StringBuilder.
StringBuilder.Replace        Replaces all occurrences of a specified character or string with another specified character or string.
Append
The Append method can be used to add text or a string representation of an object to the end of the string represented by the current StringBuilder.
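A small Append sketch (C# 9 top-level statements, arbitrary values). Append has overloads for strings, chars, numbers and other objects, and each call returns the same StringBuilder, so calls can be chained.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("Hello");
// Each Append returns the same StringBuilder, so the calls chain.
sb.Append(' ').Append("World").Append('!').Append(' ').Append(2008);
string result = sb.ToString();
Console.WriteLine(result);  // Hello World! 2008
```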
AppendFormat
int MyInt = 25;
StringBuilder MyStringBuilder = new StringBuilder("Your total is ");
MyStringBuilder.AppendFormat("{0:C} ", MyInt);
Console.WriteLine(MyStringBuilder);
Result (with en-US as the current culture): Your total is $25.00
Insert
The Insert method adds a string or object to a specified position in the current StringBuilder.
StringBuilder MyStringBuilder = new StringBuilder("Hello World!");
MyStringBuilder.Insert(6, "Beautiful ");
Console.WriteLine(MyStringBuilder);
Result: Hello Beautiful World!
Remove
The Remove method is used to remove a specified number of characters from the current StringBuilder, beginning at a specified zero-based index.
StringBuilder MyStringBuilder = new StringBuilder("Hello World!");
MyStringBuilder.Remove(5, 7);
Console.WriteLine(MyStringBuilder);
Result: Hello
Replace
The Replace method replaces all occurrences of a specified character or string with another.
StringBuilder MyStringBuilder = new StringBuilder("Hello World!");
MyStringBuilder.Replace('!', '?');
Console.WriteLine(MyStringBuilder);
Result: Hello World?
Regular Expressions
Regular expressions enable us to do the following:
• create, compare, and modify strings;
• rapidly parse large amounts of text and data to search for, remove, and replace text patterns.
12.6 Searching Strings with Regular Expressions
The System.Text.RegularExpressions.Regex class can be used to search strings.
These searches can range in complexity from very simple to making full use of regular expressions.
The static method Regex.IsMatch performs the search, given the string to search and a string that contains the search pattern.
For example:
// ...
foreach (string s in numbers)
{
    // ...
}
// Keep the console window open in debug mode.
System.Console.WriteLine("Press any key to exit.");
System.Console.ReadKey();
} // end of Main
} // end of TestRegularExpressionValidation
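A complete, runnable version of this kind of validation loop might look like the sketch below. It assumes C# 9 top-level statements; the sample strings in `numbers`, the ddd-ddd-dddd pattern, and the helper `IsPhoneNumber` are hypothetical illustrations, not the slide's original data.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample data: strings that may or may not look like ddd-ddd-dddd.
string[] numbers = { "123-555-0190", "444-234-22450", "146-893-232", "407-555-0111" };

// ^ and $ anchor the pattern, so the whole string must match, not just part of it.
const string pattern = @"^\d{3}-\d{3}-\d{4}$";

// Hypothetical helper wrapping the static Regex.IsMatch call.
bool IsPhoneNumber(string candidate) => Regex.IsMatch(candidate, pattern);

foreach (string s in numbers)
{
    // {0,14} right-aligns each input in a 14-character column.
    Console.WriteLine("{0,14} - {1}", s, IsPhoneNumber(s) ? "valid" : "invalid");
}
```

Without the anchors, `"444-234-22450"` would also be reported as valid, because its first twelve characters match.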
Replacing Text with Regex
Using .NET regular expressions via the Regex.Replace method overloads, we can replace text.
For example:
using System.Text.RegularExpressions;
public class EntryPoint {
static void Main(string[] args)
{
    // ...
}
}
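Three common `Regex.Replace` overloads in a self-contained sketch (C# 9 top-level statements; the input text and patterns are made up for illustration): a fixed replacement string, a replacement that refers to a captured group with `$1`, and a `MatchEvaluator` lambda that computes each replacement in code.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Call 123-555-0190 or 407-555-0111 today.";

// Replace every match of the pattern with fixed text.
string masked = Regex.Replace(input, @"\d{3}-\d{3}-\d{4}", "xxx-xxx-xxxx");
// "Call xxx-xxx-xxxx or xxx-xxx-xxxx today."

// A replacement pattern can refer to captured groups: $1 is the first group.
string areaOnly = Regex.Replace(input, @"(\d{3})-\d{3}-\d{4}", "($1)");
// "Call (123) or (407) today."

// The MatchEvaluator overload computes each replacement from the Match.
string lengths = Regex.Replace("one two three", @"\w+", m => m.Value.Length.ToString());
// "3 3 5"

Console.WriteLine(masked);
Console.WriteLine(areaOnly);
Console.WriteLine(lengths);
```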
In this chapter, we have learned:
• the string-handling capabilities of the .NET Framework and C#;
• why the string is included in the base class library and why the CLR designers chose to annex it into the set of built-in types;
• how common string usage is, and that the library provides a thorough implementation of culture-specific patterns via CultureInfo;
• that we can create our own cultures easily using the CultureAndRegionInfoBuilder class;
• the regular-expression capabilities of the .NET Framework, through a brief tour, even though a full treatment of the regular-expression language is outside the scope of this book;
• that the string and text-handling facilities built into the CLR, the .NET Framework, and the C# language are well designed and easy to use.