Open access research
publication repository
Keiser, John, & Lemire, Daniel (2021). Validating UTF-8 In Less Than One Instruction Per Byte. Software: Practice and Experience, 51 (5), 950-964. https://doi.org/10.1002/spe.2920
File(s) available for this item:
PDF
- 2010.03090.pdf
Content: Submitted Version. License: Creative Commons Attribution.

| Item Type: | Journal Articles |
|---|---|
| Refereed: | Yes |
| Status: | Published |
| Abstract: | The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup algorithm, which outperforms UTF-8 validation routines used in many libraries and languages by more than 10 times using commonly available SIMD instructions. To ensure reproducibility, our work is freely available as open source software. |
| Official URL: | https://onlinelibrary.wiley.com/doi/abs/10.1002/sp... |
| Depositor: | Lemire, Daniel |
| Owner / Manager: | Daniel Lemire |
| Deposited: | 15 Oct 2020 13:53 |
| Last Modified: | 06 Apr 2021 13:29 |