Lost Website

Bits of info on Java strings

Update: I don’t keep IRC logs and thus cited the wrong guy. Sorry Vince.

In this blog entry I will take on an assertion made by systemfault. He declared on the #programmeur IRC channel on Freenode (translated here from French) that:

<systemfault> String foo = new String("lol"); is the same as String foo = "lol";
<systemfault> When the compiler sees: String foo = "lol";
              when it compiles, it will really do String foo = new String("lol");

In other words, he claims that:

String s = "hello";

is syntactic sugar for:

String s = new String("hello");

Afterward, this behavior was regarded as a lack of consistency and deemed a fault of the Java language, and the subject was closed. If I were a language lawyer, I would bore people to death by including the proper reference in the Java Language Specification. Since I’m more practically minded and have some experience with Java bytecode, I will dig in and explain a bit of Java internals to you, my reader (or more optimistically, my readers).

An Elegant Proof

Vince is right that there is some syntactic sugar around Java strings, but the example he gave isn’t correct. He failed to consider the fact that, in Java, string literals are first-class objects.

The following statement associates the reference s with the "hello" String object:

String s = "hello";

The next snippet makes s refer to a new String object that is copied from the "hello" String object. It creates another object with exactly the same content as the object "hello". Since Java strings are immutable, it’s really not that useful to duplicate string content this way.

String s = new String("hello");

That behavior is summarized by the following code snippet. If you run this code in your Java virtual machine, you will see that all conditions are satisfied.

    String s1 = "hello";
    String s2 = new String("hello");
    if (s1.equals(s2)) System.out.println("Strings are equal.");
    if (s1 == "hello") System.out.println("s1 refers to the \"hello\" object.");
    if (s2 != "hello") System.out.println("s2 doesn't refer to the \"hello\" object.");
    if (s1 != s2) System.out.println("String references are not equal.");

In line 3 we see that both String objects have the same content. Line 4 checks that s1 is indeed a reference pointing to the "hello" object. Line 5 shows that even though the string referred to by s2 has the same content as s1, it doesn’t point to the "hello" object. Line 6 drives that same point home.

The Magic

In the next half of this article, I’ll try to explain a bit why strings behave the way they do in Java.

String literals in Java are stored in what is called the Runtime Constant Pool. The specification compares this mysterious structure to the symbol table found in many programming languages.

The constant pool holds many kinds of information, including string literals, numeric constants, and references to the methods of other classes.

String literals are loaded from the constant pool when the class is loaded by the class loader of the virtual machine. All direct accesses to those literals refer to the same instance of the object from the pool.
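As a minimal sketch of that last point, the following program compares a literal returned from one class with the same literal written in another. The class and method names (Greeter, LiteralSharingDemo) are made up for illustration and are not part of the original discussion.

```java
// Hypothetical helper class: it only exists to own a "hello" literal.
class Greeter {
    static String greeting() {
        return "hello"; // literal referenced from Greeter's constant pool
    }
}

class LiteralSharingDemo {
    // Compares a literal obtained from another class with a local literal.
    static boolean sameInstanceAcrossClasses() {
        return Greeter.greeting() == "hello";
    }

    // Compares two occurrences of the same literal inside one class.
    static boolean sameInstanceWithinClass() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("across classes: " + sameInstanceAcrossClasses());
        System.out.println("within class: " + sameInstanceWithinClass());
    }
}
```

On a conforming virtual machine both checks should report true, because identical literals resolve to one shared String instance no matter which class mentions them.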

The Java Virtual Machine Specification goes to great lengths to make sure that string literals loaded by the virtual machine are not duplicated in memory:

The Java programming language requires that identical string literals (that is, literals that contain the same sequence of characters) must refer to the same instance of class String. In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus,

("a" + "b" + "c").intern() == "abc"

must have the value true.

And indeed, both conditions in the program further below hold.

While the VM goes to great lengths to make sure string literals are not duplicated in memory, the fact that new String("hello") creates another object should come as no surprise to a programmer. It’s really a case where DWIM (Do What I Mean) prevails.
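To tie the quoted passage to new String("hello"), here is a small sketch showing that an explicit copy is a distinct object, and that calling String.intern on it hands back the literal's instance. The class name InternDemo and its method names are illustrative assumptions.

```java
class InternDemo {
    // An explicit copy is a different object from the literal.
    static boolean copyIsDistinct() {
        String copy = new String("hello");
        return copy != "hello";
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean internedCopyIsLiteral() {
        String copy = new String("hello");
        return copy.intern() == "hello";
    }

    // The content is identical in both cases.
    static boolean contentIsEqual() {
        return new String("hello").equals("hello");
    }

    public static void main(String[] args) {
        System.out.println("copy is distinct: " + copyIsDistinct());
        System.out.println("interned copy is literal: " + internedCopyIsLiteral());
        System.out.println("content is equal: " + contentIsEqual());
    }
}
```

All three checks should come out true: the copy is a separate object, yet interning it leads back to the one shared "hello".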

It’s also the same principle you can see behind the following code. If you can write a condition such as "abc" == "abc", you will instinctively expect the condition "a" + "b" + "c" == "abc" to be true as well.

    if (("a" + "b" + "c") == "abc") System.out.println("yes it is");
    if (("a" + "b" + "c").intern() == "abc") System.out.println("yes it is");

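It may make sense to somebody used to object-oriented programming to think that "a" + "b" + "c" should return a new string instance. Since Java, I think, sticks to the principle of least surprise, it would be a strange discrepancy if the result of "a" + "b" + "c" were not comparable to "abc" using ==. This works because "a" + "b" + "c" is a compile-time constant expression, which the compiler folds into the single literal "abc"; concatenation that involves a non-constant variable is evaluated at run time and does produce a new object.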
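The boundary of that guarantee can be sketched as follows, under the assumption that the comparison is done with literals, with a final local variable, and with a method parameter. The class name ConcatDemo and its method names are hypothetical.

```java
class ConcatDemo {
    // Only literals: a constant expression, folded to "abc" by the compiler.
    static boolean literalsOnly() {
        return ("a" + "b" + "c") == "abc";
    }

    // A final variable initialized with a literal is still a constant.
    static boolean finalVariable() {
        final String a = "a";
        return (a + "b" + "c") == "abc";
    }

    // A parameter is not a constant: the concatenation happens at run time
    // and yields a newly created String object.
    static boolean runtimeConcat(String a) {
        return (a + "b" + "c") == "abc";
    }

    // Interning the run-time result leads back to the pooled literal.
    static boolean runtimeConcatInterned(String a) {
        return (a + "b" + "c").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println("literals only: " + literalsOnly());
        System.out.println("final variable: " + finalVariable());
        System.out.println("runtime concat: " + runtimeConcat("a"));
        System.out.println("runtime concat, interned: " + runtimeConcatInterned("a"));
    }
}
```

Only the third check should report false, which is why equals remains the safe way to compare string content once variables are involved.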

All that work is the result of treating string literals as first-class objects in the code. Things would be very different if Java strings were defined as simple byte arrays, as in many other languages. There is more high-calorie sugar built around Java strings that I might consider for my next blog post.

Written by fdgonthier

October 30th, 2009 at 8:00 pm
