2
0
mirror of synced 2025-02-22 22:38:18 +00:00
mobile/bind/seq/string.go
Elias Naur ba0a725146 mobile/bind: avoid intermediate []rune copy converting Java string to Go
Converting a Go string to a string suitable use a specialized function,
UTF16Encode, that can encode the string directly to a malloc'ed buffer. That
way, only two copies are made when strings are passed from Go to Java; once
for UTF-8 to UTF-16 encoding and once for the creation of the Java String.

This CL implements the same optimization in the other direction, with a
UTF-16 to UTF-8 decoder implemented in C. Unfortunately, while calling into a
Go decoder also saves the extra copy, the Cgo overhead makes the calls much
slower for short strings.

To alleviate the risk of introducing decoding bugs, I've added the tests from
the encoding/utf16 package to SeqTest.

As a sideeffect, both Java and ObjC now always copy strings, regardless of
the argument mode. The cpy argument can therefore be removed from the string
conversion functions. Furthermore, the modeRetained and modeReturned modes
can be collapsed into just one.

While we're here, delete a leftover function from seq/strings.go that
wasn't removed when the old seq buffers went away.

Benchmarks, as compared with benchstat over 5 runs:

name                          old time/op  new time/op  delta
JavaStringShort               11.4µs ±13%  11.6µs ± 4%     ~     (p=0.859 n=10+5)
JavaStringShortDirect         19.5µs ± 9%  20.3µs ± 2%   +3.68%   (p=0.019 n=9+5)
JavaStringLong                 103µs ± 8%    24µs ± 4%  -77.13%   (p=0.001 n=9+5)
JavaStringLongDirect           113µs ± 9%    32µs ± 7%  -71.63%   (p=0.001 n=9+5)
JavaStringShortUnicode        11.1µs ±16%  10.7µs ± 5%     ~      (p=0.190 n=9+5)
JavaStringShortUnicodeDirect  19.6µs ± 7%  20.2µs ± 1%   +2.78%   (p=0.029 n=9+5)
JavaStringLongUnicode         97.1µs ± 9%  28.0µs ± 5%  -71.17%   (p=0.001 n=9+5)
JavaStringLongUnicodeDirect    105µs ±10%    34µs ± 5%  -67.23%   (p=0.002 n=8+5)
JavaStringRetShort            14.2µs ± 2%  13.9µs ± 1%   -2.15%   (p=0.006 n=8+5)
JavaStringRetShortDirect      20.8µs ± 2%  20.4µs ± 2%     ~      (p=0.065 n=8+5)
JavaStringRetLong             42.2µs ± 9%  42.4µs ± 3%     ~      (p=0.190 n=9+5)
JavaStringRetLongDirect       51.2µs ±21%  50.8µs ± 8%     ~      (p=0.518 n=9+5)
GoStringShort                 23.4µs ± 7%  22.5µs ± 3%   -3.55%   (p=0.019 n=9+5)
GoStringLong                  51.9µs ± 9%  53.1µs ± 3%     ~      (p=0.240 n=9+5)
GoStringShortUnicode          24.2µs ± 6%  22.8µs ± 1%   -5.54%   (p=0.002 n=9+5)
GoStringLongUnicode           58.6µs ± 8%  57.6µs ± 3%     ~      (p=0.518 n=9+5)
GoStringRetShort              27.6µs ± 1%  23.2µs ± 2%  -15.87%   (p=0.003 n=7+5)
GoStringRetLong                129µs ±12%    33µs ± 2%  -74.03%  (p=0.001 n=10+5)

Change-Id: Icb9481981493ffca8defed9fb80a9433d6048937
Reviewed-on: https://go-review.googlesource.com/20250
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-03-05 10:02:05 +00:00

50 lines
1.2 KiB
Go

// Copyright 2014 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package seq
import "unicode/utf16"
// Based heavily on package unicode/utf16 from the Go standard library.
const (
replacementChar = '\uFFFD' // Unicode replacement character
maxRune = '\U0010FFFF' // Maximum valid Unicode code point.
)
const (
// 0xd800-0xdc00 encodes the high 10 bits of a pair.
// 0xdc00-0xe000 encodes the low 10 bits of a pair.
// the value is those 20 bits plus 0x10000.
surr1 = 0xd800
surr2 = 0xdc00
surr3 = 0xe000
surrSelf = 0x10000
)
// UTF16Encode utf16 encodes s into chars. It returns the resulting
// length in units of uint16. It is assumed that the chars slice
// has enough room for the encoded string.
func UTF16Encode(s string, chars []uint16) int {
n := 0
for _, v := range s {
switch {
case v < 0, surr1 <= v && v < surr3, v > maxRune:
v = replacementChar
fallthrough
case v < surrSelf:
chars[n] = uint16(v)
n += 1
default:
// surrogate pair, two uint16 values
r1, r2 := utf16.EncodeRune(v)
chars[n] = uint16(r1)
chars[n+1] = uint16(r2)
n += 2
}
}
return n
}