The curious case of strings in C#

In the last article I tried to describe the difference between value types and reference types in C#, and I received some interesting feedback from current and former colleagues. Based on those discussions, it makes sense to talk a little about strings. Strings are strange because they are a reference type, yet in some ways they behave like value types. This leads to some common misconceptions, which I will try to clarify in this article.

I would like to start with a really important fact about strings: “String” is a class in C# and therefore a reference type! No matter what the behavior of strings might suggest, this fact remains valid and there is no exception to it. So the value of a string variable is always a reference to the place in memory where the string resides. Trying to identify the root cause of the misunderstandings about strings that I often hear, I think that the way we declare and initialize strings might be misleading. Further, strings are immutable, and this is possibly another source of misconceptions. Let’s consider them one by one.
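A quick way to check this claim is to ask the runtime itself. The following is only a minimal sketch, assuming a console app that uses top-level statements (C# 9 or later); the variable names are made up for illustration.

```csharp
using System;

// "string" is the C# keyword alias for System.String.
Type stringType = typeof(string);
Console.WriteLine(stringType.IsClass);       // True
Console.WriteLine(stringType.IsValueType);   // False
Console.WriteLine(typeof(int).IsValueType);  // True

// Assigning one string variable to another copies the reference,
// so both variables point to the very same object.
string a = "Dan";
string b = a;
Console.WriteLine(object.ReferenceEquals(a, b)); // True
```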

String declaration and initialization

Remember, string is a class in C#. Usually, when we declare a variable of a class type and initialize it, we use code similar to this:


StringBuilder sb = new StringBuilder();

However, when we declare and initialize a string we use so-called string literals:


string name = "Dan";

And when we declare and initialize an integer we do it like this:


int number = 5;

At this point many less experienced developers might think that strings and integers are similar, since strings are declared and initialized in much the same way as integers. This is, of course, wrong. We can easily create custom classes that are declared and initialized in a similar way, and they are still classes, hence reference types. For instance:


public class EmailAddress
{
    private readonly string myEmail;

    public EmailAddress(string email)
    {
        myEmail = email;
    }

    // Allows: EmailAddress ea = "something@something.com";
    public static implicit operator EmailAddress(string email)
    {
        return new EmailAddress(email);
    }

    // Allows an EmailAddress to be used where a string is expected.
    public static implicit operator string(EmailAddress emailAddress)
    {
        // Strings are immutable, so the stored string can be returned as is.
        return emailAddress.myEmail;
    }
}

If we put everything in a console app and run it, it will print the email address:


EmailAddress ea = "something@something.com";
Console.WriteLine(ea);

Console.ReadLine();

Still, EmailAddress is a class, and the value of “ea” is a reference to the place in memory where the EmailAddress object lives (and that object, in turn, holds a reference to the string “something@something.com”). I am not saying that the “string” class is implemented exactly like this. I just want to underline that classes and structs can look alike when it comes to declaration and initialization, yet they remain reference types and value types respectively.
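To make the reference semantics of such a class visible, here is a small sketch. It assumes top-level statements (C# 9 or later) and repeats a trimmed copy of EmailAddress so that it compiles on its own; the variable names “first”, “second” and “third” are only illustrative.

```csharp
using System;

EmailAddress first = "something@something.com";  // implicit conversion creates one object
EmailAddress second = first;                      // copies the reference, not the object
EmailAddress third = "something@something.com";  // creates a second, separate object

Console.WriteLine(object.ReferenceEquals(first, second)); // True
Console.WriteLine(object.ReferenceEquals(first, third));  // False
Console.WriteLine(typeof(EmailAddress).IsClass);          // True

// Trimmed copy of the class from the article, repeated for self-containment.
public class EmailAddress
{
    private readonly string myEmail;
    public EmailAddress(string email) { myEmail = email; }
    public static implicit operator EmailAddress(string email) => new EmailAddress(email);
    public static implicit operator string(EmailAddress e) => e.myEmail;
}
```

Even though the initialization looks just like the one used for an int, assigning “first” to “second” does not copy the object.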

Strings are immutable

Immutability can also generate confusion. Strings are read-only: once a string object is created, its content cannot change. So how is this possible?


string name = "Dan";
name += " Patrascu";
Console.WriteLine(name);

//Output: Dan Patrascu

Well, it’s possible because of what happens under the hood. The first time we declare and initialize “name”, it gets as its value a reference to the place in memory where “Dan” lives. However, when we append “ Patrascu”, a totally new string is created that holds “Dan Patrascu” and lives in a different place in memory, and the value of “name” (which is a reference) is updated to point to the place where the new string lives. We need to note here that every operation that appears to modify a string (or to combine several strings) actually produces a totally new string, and the original is left untouched! The variable changes only because it is assigned the reference to the new string. We can maybe see this better here:


string str1 = "Hello";
Console.WriteLine(str1);
string str2 = str1;
Console.WriteLine(str2);
str2 = "world";
Console.WriteLine(str2);

What will be printed to the console by the last Console.WriteLine(str2)? “Helloworld” or “world”? If needed, take a few seconds and rethink this in the light of what we have explained previously!

If your answer is still “Helloworld” it means that I did a terrible job with this article 🙂
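For anyone who wants to verify the answer, here is a minimal sketch that can be pasted into a console app using top-level statements (C# 9 or later). The variable “before” and the StringBuilder comparison at the end are my additions for illustration; they show how a mutable reference type behaves differently in the same situation.

```csharp
using System;
using System.Text;

// Appending to a string creates a new object; the old one is untouched.
string name = "Dan";
string before = name;   // second reference to the "Dan" object
name += " Patrascu";    // name now refers to a brand new string
Console.WriteLine(before);                               // Dan
Console.WriteLine(object.ReferenceEquals(before, name)); // False

// The quiz: reassigning str2 only changes where str2 points.
string str1 = "Hello";
string str2 = str1;
str2 = "world";
Console.WriteLine(str1); // Hello
Console.WriteLine(str2); // world

// Contrast with a mutable reference type: both variables see the change.
StringBuilder sb1 = new StringBuilder("Hello");
StringBuilder sb2 = sb1;
sb2.Append("world");
Console.WriteLine(sb1.ToString()); // Helloworld
```

With StringBuilder the object itself is modified, so the change is visible through both variables. With strings only the variable “str2” is redirected, and “str1” keeps pointing to “Hello”.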
