update unicode section in toc.md

jangko 2021-11-01 11:26:26 +07:00
parent c1f75b87f3
commit 4e0b7cbced
1 changed files with 6 additions and 5 deletions

@@ -89,13 +89,14 @@ can bring down the service, both lexer and parser are configurable to mitigate t
  - Input string:
  - Accepted encoding for input string are UTF-8.
  - Escaped unicode in quoted string take the form of UTF-16 BE:
- - Fixed 4 digit hex: e.g. `\u000A`
- - variable length: `\u{1F4A9}` with range (>= 0x0000 and <= 0xD7FF or >= 0xE000 and <= 0x10FFFF)
+ - Fixed length notation using 4 digit hex: e.g. `\u000A`.
+ - Variable length notation using curly braces `\u{1F4A9}` with range (0x0000..0xD7FF, 0xE000..0x10FFFF).
  - Escape sequences are only meaningful within a single-quoted string.
- In multiline string, unicode char must be encoded using UTF-8.
- - SurrogatePair: "\uD83D\uDCA9" is equal to "\u{1F4A9}"
+ - In multiline string, unicode char must be encoded using UTF-8.
+ - Surrogate pair using fixed length notation "\uD83D\uDCA9" is equal to variable length notation "\u{1F4A9}".
+ - Orphaned surrogate will result in error.
  - Output string:
  - Output string subject to output serialization format specification.
  - For example, output using json as serialization format will result in UTF-8 encoded string.
- - Or if the escape flag is set, it will use UTF-16 BE 4 digit hex fixed length similar to GraphQL escape sequence.
+ - If the escape flag is set, it will use UTF-16 BE 4 digit hex fixed length similar to GraphQL escape sequence.
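
The input-string rules in this hunk (fixed length `\uXXXX`, variable length `\u{...}`, surrogate pairs, orphaned surrogates) can be illustrated with a small sketch. It is written in Python for illustration only and is not the project's lexer. The function name `decode_unicode_escapes` and the error messages are hypothetical. It handles only the two `\u` forms and, as a simplification, does not treat `\\` as an escaped backslash.

```python
import re

# Matches \u{H...} (variable length, group 1) or \uHHHH (fixed length, group 2).
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]+)\}|\\u([0-9A-Fa-f]{4})")


def _is_high_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def _is_low_surrogate(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def decode_unicode_escapes(text: str) -> str:
    """Decode \\uXXXX and \\u{...} escapes (hypothetical helper, illustration only)."""
    out = []
    pos = 0
    while pos < len(text):
        m = _ESCAPE.match(text, pos)
        if m is None:
            # Not an escape at this position: copy the character through unchanged.
            out.append(text[pos])
            pos += 1
            continue
        pos = m.end()

        if m.group(1) is not None:
            # Variable length notation: allowed range is
            # 0x0000..0xD7FF and 0xE000..0x10FFFF (no surrogates).
            cp = int(m.group(1), 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"code point out of range: {m.group(0)}")
            out.append(chr(cp))
            continue

        # Fixed length notation: one UTF-16 code unit.
        cp = int(m.group(2), 16)
        if _is_low_surrogate(cp):
            raise ValueError(f"orphaned low surrogate: {m.group(0)}")
        if _is_high_surrogate(cp):
            # A high surrogate must be followed directly by a fixed length low surrogate.
            nxt = _ESCAPE.match(text, pos)
            if (nxt is None or nxt.group(2) is None
                    or not _is_low_surrogate(int(nxt.group(2), 16))):
                raise ValueError(f"orphaned high surrogate: {m.group(0)}")
            low = int(nxt.group(2), 16)
            pos = nxt.end()
            # Combine the pair into a single supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
        out.append(chr(cp))
    return "".join(out)
```

Under these assumptions, `decode_unicode_escapes(r"\uD83D\uDCA9")` and `decode_unicode_escapes(r"\u{1F4A9}")` yield the same one-character string, while a lone `\uD83D` or `\uDCA9` raises `ValueError`.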