Class Utf8Safe


  • public final class Utf8Safe
    extends Utf8
    A set of low-level, high-performance utility methods related to the UTF-8 character encoding. This class has no dependencies outside of the core JDK libraries.

    There are several variants of UTF-8. The one implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1, which mandates the rejection of "overlong" byte sequences as well as rejection of 3-byte surrogate codepoint byte sequences. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate codepoint byte sequences.

    The byte sequences considered valid by this class are exactly those that survive a lossless round trip through String: decoding the bytes with the UTF-8 charset and re-encoding the result yields the original bytes.

     
     Arrays.equals(bytes, new String(bytes, Internal.UTF_8).getBytes(Internal.UTF_8))
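The round-trip expression above can be tried with the JDK alone. The following is a minimal sketch, not part of this class: the class name Utf8RoundTripSketch and the method roundTrips are hypothetical, and StandardCharsets.UTF_8 is assumed to stand in for Internal.UTF_8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Utf8Safe.
final class Utf8RoundTripSketch {

    // Returns true when decoding and re-encoding reproduces the input exactly.
    // The JDK String constructor substitutes replacement characters for
    // malformed input, so a malformed sequence changes the bytes on the way back.
    static boolean roundTrips(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Overlong two-byte encoding of U+0000.
        byte[] overlong = {(byte) 0xC0, (byte) 0x80};
        // Three-byte encoding of the surrogate code point U+D800.
        byte[] surrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};

        System.out.println("valid:     " + roundTrips(valid));
        System.out.println("overlong:  " + roundTrips(overlong));
        System.out.println("surrogate: " + roundTrips(surrogate));
    }
}
```

Only the first input is expected to round-trip; the overlong and surrogate sequences are the two cases the restricted UTF-8 definition rejects.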
     

    See the Unicode Standard, Table 3-6 (UTF-8 Bit Distribution) and Table 3-7 (Well-Formed UTF-8 Byte Sequences).

    • Constructor Summary

      Constructors 
      Constructor Description
      Utf8Safe()  
    • Method Summary

      Modifier and Type    Method and Description
      java.lang.String     decodeUtf8(java.nio.ByteBuffer buffer, int offset, int length)
                           Decodes the given UTF-8 portion of the ByteBuffer into a String.
      int                  encodedLength(java.lang.CharSequence in)
                           Returns the number of bytes in the UTF-8-encoded form of the given sequence.
      void                 encodeUtf8(java.lang.CharSequence in, java.nio.ByteBuffer out)
                           Encodes the given characters to the target ByteBuffer using UTF-8 encoding.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Utf8Safe

        public Utf8Safe()
    • Method Detail

      • encodedLength

        public int encodedLength(java.lang.CharSequence in)
        Description copied from class: Utf8
        Returns the number of bytes in the UTF-8-encoded form of the given sequence. For a String, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space because it does not allocate the encoded byte array.
        Specified by:
        encodedLength in class Utf8
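To illustrate how a byte count can be computed without allocating the encoded array, here is a minimal JDK-only sketch. It is not this class's implementation: the names Utf8LengthSketch and utf8Length are hypothetical, throwing IllegalArgumentException for an unpaired surrogate is an assumption of the sketch, and integer overflow for very long inputs is not handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Utf8Safe implementation.
final class Utf8LengthSketch {

    // Counts UTF-8 bytes by inspecting each UTF-16 code unit.
    static int utf8Length(CharSequence in) {
        int bytes = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // ASCII
            } else if (c < 0x800) {
                bytes += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                bytes += 4;                      // surrogate pair, one supplementary code point
                i++;                             // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                // Assumption of this sketch: unpaired surrogates are rejected.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3;                      // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // One character from each length class: 1 + 2 + 3 + 4 bytes.
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        int expected = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(utf8Length(s) == expected);
    }
}
```

The main method compares the count with string.getBytes(UTF_8).length, which is the equivalence stated in the description above.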
      • decodeUtf8

        public java.lang.String decodeUtf8(java.nio.ByteBuffer buffer,
                                           int offset,
                                           int length)
                                    throws java.lang.IllegalArgumentException
        Decodes the given UTF-8 portion of the ByteBuffer into a String.
        Specified by:
        decodeUtf8 in class Utf8
        Throws:
        java.lang.IllegalArgumentException - if the input is not valid UTF-8.
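The contract above (decode a region of a ByteBuffer, fail on invalid input) can be approximated with the JDK's CharsetDecoder. This is a minimal sketch under stated assumptions, not this class's implementation: the names Utf8StrictDecodeSketch and decodeStrict are hypothetical, offset is assumed to be an absolute index into the buffer, and validity is whatever the running JDK's decoder reports as malformed, which may differ from this class on older JDKs for surrogate sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Utf8Safe implementation.
final class Utf8StrictDecodeSketch {

    // Decodes length bytes starting at absolute index offset, throwing
    // IllegalArgumentException instead of substituting replacement characters.
    static String decodeStrict(ByteBuffer buffer, int offset, int length) {
        // Work on a duplicate so the caller's position and limit are untouched.
        ByteBuffer view = buffer.duplicate();
        view.limit(offset + length);
        view.position(offset);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(view)
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Invalid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "xxh\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Skip the two leading 'x' bytes and decode the remaining six.
        System.out.println(decodeStrict(ByteBuffer.wrap(data), 2, 6));

        byte[] bad = {0x41, (byte) 0xC0, (byte) 0x80};
        try {
            decodeStrict(ByteBuffer.wrap(bad), 0, bad.length);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Setting CodingErrorAction.REPORT is what turns silent replacement into an error; the default String constructor would instead return a string containing U+FFFD.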
      • encodeUtf8

        public void encodeUtf8(java.lang.CharSequence in,
                               java.nio.ByteBuffer out)
        Encodes the given characters to the target ByteBuffer using UTF-8 encoding.

        Selects an optimal algorithm based on the type of ByteBuffer (i.e. heap or direct) and the capabilities of the platform.

        Specified by:
        encodeUtf8 in class Utf8
        Parameters:
        in - the source character sequence to be encoded.
        out - the target buffer to receive the encoded bytes.
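As a rough picture of encoding into either kind of ByteBuffer, the following JDK-only sketch writes the same text to a heap buffer and a direct buffer. It does not reproduce this class's algorithm selection: the names Utf8EncodeSketch and encode are hypothetical, and the choice of BufferOverflowException for a full buffer and IllegalArgumentException for unpaired surrogates is an assumption of the sketch.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Utf8Safe implementation.
final class Utf8EncodeSketch {

    // Encodes in as UTF-8 starting at out's current position and advances it.
    static void encode(CharSequence in, ByteBuffer out) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (result.isOverflow()) {
            // Assumption of this sketch: signal a too-small target this way.
            throw new BufferOverflowException();
        }
        if (result.isError()) {
            // Reported for unpaired surrogates in the input.
            throw new IllegalArgumentException("Input is not well-formed UTF-16");
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";
        ByteBuffer heap = ByteBuffer.allocate(16);          // backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(16);  // off-heap memory
        encode(text, heap);
        encode(text, direct);
        System.out.println("heap bytes written:   " + heap.position());
        System.out.println("direct bytes written: " + direct.position());
    }
}
```

The caller sees the same result for both buffer types; any difference between heap and direct handling is an internal performance detail.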