py/objstr: Add check for valid UTF-8 when making a str from bytes.

This patch adds a function utf8_check() that checks whether a byte sequence
is validly encoded UTF-8, and calls it when constructing a str from raw
bytes.  The feature is selectable at compile time via
MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and is enabled by default when unicode
is enabled.  It costs about 110 bytes of code on Thumb-2, 150 bytes on
Xtensa and 170 bytes on x86-64.
tll
2017-06-24 08:38:32 +08:00
committed by Damien George
parent 069fc48bf6
commit 68c28174d0
5 changed files with 58 additions and 0 deletions


@@ -33,3 +33,17 @@ try:
    int('\u0200')
except ValueError:
    print('ValueError')
# test invalid UTF-8 string
try:
    str(b'ab\xa1', 'utf8')
except UnicodeError:
    print('UnicodeError')
try:
    str(b'ab\xf8', 'utf8')
except UnicodeError:
    print('UnicodeError')
try:
    str(bytearray(b'ab\xc0a'), 'utf8')
except UnicodeError:
    print('UnicodeError')
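
The three inputs above each break UTF-8 in a different way: a stray continuation byte (0xa1), a byte that can never start a sequence (0xf8), and a lead byte (0xc0) followed by a non-continuation byte. The following is a minimal Python sketch of a structural check that rejects all three. It is not the utf8_check() added by this commit; the name utf8_is_well_formed is hypothetical, and the sketch only verifies lead/continuation byte structure, so it does not reject overlong encodings or surrogate code points.

```python
def utf8_is_well_formed(data):
    """Hypothetical structural UTF-8 check (illustration only).

    Verifies that every lead byte is followed by the right number of
    continuation bytes. Overlong forms and surrogates are NOT rejected.
    """
    need = 0  # continuation bytes still expected for the current sequence
    for c in bytes(data):
        if need:
            # Inside a multi-byte sequence: only 0x80..0xBF is allowed.
            if 0x80 <= c <= 0xBF:
                need -= 1
            else:
                return False
        elif c >= 0xF8:
            return False  # 0xF8..0xFF can never start a sequence
        elif c >= 0xF0:
            need = 3  # 4-byte sequence
        elif c >= 0xE0:
            need = 2  # 3-byte sequence
        elif c >= 0xC0:
            need = 1  # 2-byte sequence
        elif c >= 0x80:
            return False  # continuation byte with no lead byte
    return need == 0  # a truncated trailing sequence is invalid


# The three invalid inputs from the test, plus one valid string.
for sample in (b'ab\xa1', b'ab\xf8', bytearray(b'ab\xc0a'), 'h\u00e9'.encode('utf8')):
    print(utf8_is_well_formed(sample))
```

A single counter is the only state needed, which is in keeping with the small code-size budget the commit message reports.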