Explanation to Teaser 2



Note:
Skip points 0 and 1 of the discussion if you want, but do read points 2, 3, 4 and 5.
They can blow your mind about the byte datatype.

This is an explanation of Teaser 2: Bite me, you damn bytes!!!, in case you haven’t read it.


Poor Eva is in a fix again.

This time the compiler complains:

possible loss of precision
required: byte
found:    int
a=a+b;
   ^


0. How to fix this compiler error?

Child’s play. Just change the culprit line to this:

a=(byte)(a+b);

And everything works fine.
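
Here is a minimal sketch of the fix, assuming the teaser declared two byte variables a and b. The class name ByteAddFix, the method names and the values 10 and 20 are made up for illustration. It also shows the compound assignment a += b, which compiles without an explicit cast because the language applies the narrowing cast implicitly.

```java
class ByteAddFix {
    // Adds two bytes and narrows the int result back to byte with an explicit cast.
    static byte addWithCast(byte a, byte b) {
        // a = a + b;        // does not compile: a + b has type int
        a = (byte) (a + b);  // explicit narrowing cast
        return a;
    }

    // Same addition using compound assignment, which casts implicitly.
    static byte addCompound(byte a, byte b) {
        a += b;              // behaves like a = (byte) (a + b)
        return a;
    }

    public static void main(String[] args) {
        System.out.println(addWithCast((byte) 10, (byte) 20));  // prints 30
        System.out.println(addCompound((byte) 10, (byte) 20));  // prints 30
    }
}
```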

But is that it? What has been our learning? Why did this happen?
Was the teaser just to remove the no-brainer compiler error?

Let’s dig in.


1. THE RULE!

This is interesting: a is a byte, b is a byte, and suddenly a+b is an int and has to be cast back to byte!!

Specifically, we fall prey to this beast: Binary Numeric Promotion.
Straight from the Java Language Specification:

Numeric promotion is applied to the operands of an arithmetic operator. When the arithmetic operator is one of the following binary operators (operating on two operands), the promotion is called Binary Numeric Promotion:

the multiplicative operators *, / and %,

the addition and subtraction operators for numeric types + and -,

the numerical comparison operators <, <=, >, and >=,

the numerical equality operators == and !=,

the integer bitwise operators &, ^, and |,

and in certain cases, the conditional operator ? :

So, in our case the operands are the variables a and b, and the arithmetic operator is the binary plus (+). But did I tell you what happens in a Numeric Promotion?

A Numeric promotion can use a widening primitive conversion. Widening primitive conversion is applied to convert either or both operands as specified by the following rules:

1. If either operand is of type double, the other is converted to double.

2. Otherwise, if either operand is of type float, the other is converted to float.

3. Otherwise, if either operand is of type long, the other is converted to long.

4. Otherwise, both operands are converted to type int.

Right there in the last line is the crux of our problem:
‘Otherwise, both operands are converted to type int.’
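
The four rules can be checked with a small sketch. The class PromotionRules and its helper typeOf are hypothetical names, not something from the teaser. The trick is to box the result of each expression: the wrapper class you get back mirrors the type the compiler gave the expression.

```java
class PromotionRules {
    // Returns the simple name of the wrapper class the expression result was boxed into.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        byte b = 1;
        short s = 2;
        long l = 3L;
        float f = 4.0f;
        double d = 5.0;

        System.out.println(typeOf(b + d)); // Double  (rule 1)
        System.out.println(typeOf(b + f)); // Float   (rule 2)
        System.out.println(typeOf(b + l)); // Long    (rule 3)
        System.out.println(typeOf(b + b)); // Integer (rule 4)
        System.out.println(typeOf(b + s)); // Integer (rule 4)
    }
}
```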

This is what happens, in our case:

In our code the plus sign (+) is used with two byte variables, so it behaves as a binary arithmetic operator. From the rules of 'Numeric promotion': before any arithmetic operation, the byte type is converted (widening primitive conversion) to the int type. So, the variables a and b are both numerically promoted to int before the addition. Once the addition is made, the result is an int. And this int cannot be stuffed back into the byte variable a, for an int uses 4 bytes of memory and a byte uses 1 byte. Thus, an explicit cast to byte is required, so that the result of the addition is truncated to 1 byte, enough to fit in a byte variable.

The story doesn’t end here. Now the million-dollar question arises.
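
To make the truncation concrete, here is a small sketch with values chosen so that the sum does not fit in a byte. The class NarrowingDemo and the value 100 are assumptions for illustration. The int sum 200 is 11001000 in binary; keeping only those 8 bits and reading them as a signed byte gives 200 - 256 = -56.

```java
class NarrowingDemo {
    // The natural result of byte + byte: an int.
    static int sumAsInt(byte a, byte b) {
        return a + b;
    }

    // The same sum forced back into a byte; only the low 8 bits survive.
    static byte sumAsByte(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(sumAsInt(a, b));  // prints 200
        System.out.println(sumAsByte(a, b)); // prints -56
    }
}
```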


2. Why do bytes get promoted to int at all? Why can’t we perform addition on bytes directly?

To answer this question for myself, I had to dig deep into the JVM specification.

Let’s take a look at this:

Table: Type support in the Java virtual machine instruction set (a dash marks a type that has no such instruction).

opcode     byte     short    int        long     float    double   char     reference
Tipush     bipush   sipush   -          -        -        -        -        -
Tconst     -        -        iconst     lconst   fconst   dconst   -        aconst
Tload      -        -        iload      lload    fload    dload    -        aload
Tstore     -        -        istore     lstore   fstore   dstore   -        astore
Tinc       -        -        iinc       -        -        -        -        -
Taload     baload   saload   iaload     laload   faload   daload   caload   aaload
Tastore    bastore  sastore  iastore    lastore  fastore  dastore  castore  aastore
Tadd       -        -        iadd       ladd     fadd     dadd     -        -
Tsub       -        -        isub       lsub     fsub     dsub     -        -
Tmul       -        -        imul       lmul     fmul     dmul     -        -
Tdiv       -        -        idiv       ldiv     fdiv     ddiv     -        -
Trem       -        -        irem       lrem     frem     drem     -        -
Tneg       -        -        ineg       lneg     fneg     dneg     -        -
Tshl       -        -        ishl       lshl     -        -        -        -
Tshr       -        -        ishr       lshr     -        -        -        -
Tushr      -        -        iushr      lushr    -        -        -        -
Tand       -        -        iand       land     -        -        -        -
Tor        -        -        ior        lor      -        -        -        -
Txor       -        -        ixor       lxor     -        -        -        -
i2T        i2b      i2s      -          i2l      i2f      i2d      -        -
l2T        -        -        l2i        -        l2f      l2d      -        -
f2T        -        -        f2i        f2l      -        f2d      -        -
d2T        -        -        d2i        d2l      d2f      -        -        -
Tcmp       -        -        -          lcmp     -        -        -        -
Tcmpl      -        -        -          -        fcmpl    dcmpl    -        -
Tcmpg      -        -        -          -        fcmpg    dcmpg    -        -
if_TcmpOP  -        -        if_icmpOP  -        -        -        -        if_acmpOP
Treturn    -        -        ireturn    lreturn  freturn  dreturn  -        areturn

Did you see that?
Compare the instruction set for bytes and ints.
Come on, see the table again.

The JVM Specification says:

Note that most instructions in the above table do not have forms for the integral types byte, char, and short. None have forms for the boolean type. Compilers encode loads of literal values of types byte and short using Java virtual machine instructions that sign-extend those values to values of type int at compile-time or runtime. Loads of literal values of types boolean and char are encoded using instructions that zero-extend the literal to a value of type int at compile-time or runtime. Likewise, loads from arrays of values of type boolean, byte, short, and char are encoded using Java virtual machine instructions that sign-extend or zero extend the values to values of type int. Thus, most operations on values of actual types boolean, byte, char, and short are correctly performed by instructions operating on values of computational type int.

So, what does this mean? It means there are no instructions in the JVM to add two bytes (or to perform similar operations on them). For this, help is taken from the int instruction set. The JVM specification clearly says (read this very carefully):

There is no direct support for integer arithmetic on values of the byte, short, and char types, or for values of the boolean type; those operations are handled by instructions operating on type int.

So, when there is no direct support for integer arithmetic on values of the ‘byte’ type in the JVM, BYTE HAS TO BE CONVERTED INTERNALLY TO INT!!
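
One way to see this for yourself is to compile a tiny method and disassemble it with javap -c. The class ByteAddBytecode below is a hypothetical example. The comments show the kind of instruction sequence javac typically emits for it; treat them as a sketch, since exact instruction variants can differ between compilers.

```java
class ByteAddBytecode {
    // A byte addition, written the only way the compiler accepts it.
    static byte add(byte a, byte b) {
        return (byte) (a + b);
    }

    // Typical javap -c output for add:
    //   iload_0    load a, already widened to int
    //   iload_1    load b, already widened to int
    //   iadd       int addition (there is no byte add instruction)
    //   i2b        narrow the int result back to byte
    //   ireturn    even the return uses the int instruction

    public static void main(String[] args) {
        System.out.println(add((byte) 1, (byte) 2)); // prints 3
    }
}
```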

Okay, this was good. But a bigger query looms.


3. Why are there no instructions to perform arithmetic operations for the byte type in the JVM?

And the answer can again be found in the JVM spec:

The Java virtual machine provides the most direct support for data of type int. This is partly in anticipation of efficient implementations of the Java virtual machine's operand stacks and local variable arrays. It is also motivated by the frequency of int data in typical programs. Other integral types have less direct support. There are no byte, char, or short versions of the store, load, or add instructions, for instance.

I will explain it in three simple lines:

In a 32-bit system (say), a word is 4 bytes.

It’s faster for the underlying hardware to work on a whole word at a time.

So, it’s faster to handle an int (4 bytes) than a byte or a short.


4. If I use bytes and shorts in a loop, then?

It is evident from the three points above: if you plan to use bytes or shorts as loop counters, you won’t gain anything.

Consider this:

for (byte b = 0; b < 10; b++) { … }

Here, in each iteration of the loop, the byte variable b is promoted to int for the comparison b < 10, and again for the increment b++ (which then narrows the result back to byte).

This time I shout so that you understand:

A byte gets promoted to int every time it takes part in an arithmetic operation!!! Period.
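
A small sketch of the loop case, with made-up names (ByteLoop, countIterations, increment). It shows that a byte counter works but still goes through int on every comparison and increment, and that b++ hides the narrowing cast which b = b + 1 would demand explicitly.

```java
class ByteLoop {
    // Counts loop iterations using a byte counter.
    static int countIterations() {
        int iterations = 0;
        // b < 10 is compared as int; b++ adds as int and narrows back to byte
        for (byte b = 0; b < 10; b++) {
            iterations++;
        }
        return iterations;
    }

    // Increments a byte; the narrowing back to byte is implicit in ++.
    static byte increment(byte b) {
        // b = b + 1;  // does not compile: b + 1 has type int
        b++;           // compiles: behaves like b = (byte) (b + 1)
        return b;
    }

    public static void main(String[] args) {
        System.out.println(countIterations());          // prints 10
        System.out.println(increment((byte) 5));        // prints 6
        System.out.println(increment(Byte.MAX_VALUE));  // prints -128 (wraps around)
    }
}
```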

And when you thought you knew everything about bytes, the billion-dollar-question pops up!


5. Why the hell does byte type exist? It gets promoted to int anyway!

There is a reason for everything in Java.

I said, bytes get promoted to ints during an operation. Let me re-frame it: bytes get promoted to ints ONLY during an operation.

There is a scenario where we do not perform operations on bytes. There is a scenario where we use bytes solely for the data they contain.

Ever heard of a byte array, when handling streams? Why not an int array?

Think.

I found this while surfing the net, to make it clearer to you:

A byte array of 20 bytes is in fact only 20 bytes in memory. That is because the Java bytecode only knows ints and longs as number types (so it must handle all numbers as one of those two types, 4 bytes or 8 bytes), but it knows arrays with every possible number size (so short arrays are in fact two bytes per entry and byte arrays are in fact one byte per entry).

Don’t forget that a byte array also has the normal overheads of being an object, plus the length field, but when the array is large this overhead is negligible.
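
As a rough illustration of the stream case, here is a sketch that reads 20 bytes from an in-memory stream into a byte[] buffer and compares the element payload of a 20-entry byte array with that of a 20-entry int array. The class ByteBufferDemo and its methods are hypothetical; the payload figures ignore the object header and length field mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ByteBufferDemo {
    // Reads up to buffer.length bytes from the stream; returns how many were read.
    static int fill(InputStream in, byte[] buffer) throws IOException {
        return in.read(buffer);
    }

    // Element payload in bytes, ignoring object header and length field.
    static int payloadBytes(int elements, int bytesPerElement) {
        return elements * bytesPerElement;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[20]);
        byte[] buffer = new byte[20];

        System.out.println(fill(in, buffer));                 // prints 20
        System.out.println(payloadBytes(20, Byte.BYTES));     // prints 20
        System.out.println(payloadBytes(20, Integer.BYTES));  // prints 80
    }
}
```

The bytes in the buffer are stored and passed along as data; no arithmetic happens on them, so no promotion to int is involved.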

Phew! Enough with the bytes! :|


2 comments

  1. Tarush Kumar Nigam

    Good Work Abhishek. I learned lot about bytes.
    But its still unclear to me that if
    File A contains 20 bytes of data.
    File B contains 30 bytes of data. If file B’s contents are appended to file A’s, then what is the final size of file A?
    The same question that you asked while starting the complete conversation.

    1. obviously, it would be 50 bytes, as far as the idiotic logic goes.
      the post was to tell the problem when adding ‘byte’ types in java.
      the background situation i created, was just to add some fun to it.
