Fixes #566
* try to fix an issue where some Chinese characters cannot be
extracted correctly (#566).
* format code to pass flake8 check.
* fix typo and refer to issue #593.
Co-authored-by: huan_cheng <huan_cheng@bestsign.cn>
Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>