In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?

1 Answer

I'll fill in a bit more background.

The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.

Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.

This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
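To make the inner-loop cost concrete, here is a minimal sketch, not a benchmark. The class and method names (`BoxingSketch`, `sumBoxed`, `sumPrimitive`) are made up for illustration. Both methods compute the same sum, but the first one allocates or looks up an `Integer` for every element and unboxes it again on every read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; the names are not from any library.
class BoxingSketch {

    // Sums 0..n-1 through a List<Integer>.
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);      // autoboxing: compiles to Integer.valueOf(i)
        }
        long total = 0;
        for (Integer v : values) {
            total += v;         // auto-unboxing: compiles to v.intValue()
        }
        return total;
    }

    // Same sum over an int[]: no wrapper objects are involved.
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000;
        // Both routes agree on the result; only the amount of work differs.
        System.out.println(sumBoxed(n) == sumPrimitive(n));
    }
}
```

The source code of the two loops looks almost identical, which is exactly why the overhead is easy to overlook.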

When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)

For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
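As a rough illustration of the "process a bit and go right back to a string" case, the sketch below keeps only the letters of a string in two ways. The class and method names are hypothetical. The first method forces every value through a `Character` object, which is what a `Stream<Character>` return type would have imposed on every caller; the second stays on the `IntStream` and only casts at the point where the value is appended.

```java
import java.util.stream.Collectors;

// Hypothetical illustration class; the names are not from any library.
class CharsSketch {

    // Boxed route: every char is wrapped in a Character along the way.
    static String lettersBoxed(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)                 // boxes to Character
                .filter(ch -> Character.isLetter(ch))
                .map(ch -> String.valueOf(ch))
                .collect(Collectors.joining());
    }

    // Primitive route: values stay ints until they are appended as chars.
    static String lettersPrimitive(String s) {
        return s.chars()
                .filter(Character::isLetter)             // resolves to isLetter(int)
                .collect(StringBuilder::new,
                         (sb, c) -> sb.append((char) c), // cast here, or a number is appended
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersBoxed("a1b2-c"));      // abc
        System.out.println(lettersPrimitive("a1b2-c"));  // abc
    }
}
```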

We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.

The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
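Both points of confusion can be seen in a small sketch. The class and helper names are made up for illustration; `String.valueOf(int)` and `String.valueOf(char)` are used in the helpers because they are overloaded in the same way as `PrintStream.print`.

```java
import java.util.stream.Collectors;

// Hypothetical illustration class; the names are not from any library.
class IntStreamPitfalls {

    // No cast: c is an int, so String.valueOf(int) formats it as a number.
    static String withoutCast(String s) {
        return s.chars()
                .mapToObj(c -> String.valueOf(c))
                .collect(Collectors.joining(" "));
    }

    // With the cast, String.valueOf(char) is chosen instead.
    static String withCast(String s) {
        return s.chars()
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(withoutCast("abc"));  // 97 98 99
        System.out.println(withCast("abc"));     // a b c

        // The same overload trap with PrintStream directly.
        "abc".chars().forEach(c -> System.out.print(c));         // 979899
        System.out.println();
        "abc".chars().forEach(c -> System.out.print((char) c));  // abc
        System.out.println();

        // chars() yields UTF-16 code units, codePoints() yields Unicode code points.
        String s = "a\uD83D\uDE00";  // 'a' followed by U+1F600 as a surrogate pair
        System.out.println(s.chars().count());       // 3
        System.out.println(s.codePoints().count());  // 2
    }
}
```

For strings that contain only characters in the Basic Multilingual Plane the two streams hold the same values, which makes the difference easy to miss in testing.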

So, this boils down to choosing pragmatically among several alternatives:

  1. we could provide no primitive specializations, resulting in a simple, elegant, consistent API, but one that imposes high performance and GC overhead;
  2. we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
  3. we could provide a subset of primitive specializations, giving a moderately sized, high-performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).

We chose the last one.
