
Transcoding Unicode Characters with AVX-512 Instructions [r-libre/3026]

Clausecker, Robert, & Lemire, Daniel (2023). Transcoding Unicode Characters with AVX-512 Instructions. Software: Practice and Experience, 53 (12). https://doi.org/10.1002/spe.3261

File(s) available for this item:
PDF - simdutfavx512.pdf
Content: Submitted Version
License: Creative Commons Attribution
 
Item Type: Journal Articles
Refereed: Yes
Status: Published
Abstract: Intel includes in its recent processors a powerful set of instructions capable of processing 512-bit registers with a single instruction (AVX-512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF-8 and UTF-16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF-8 to UTF-16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open source library. Our library is part of the popular Node.js JavaScript runtime.
Official URL: https://onlinelibrary.wiley.com/doi/10.1002/spe.32...
Depositor: Lemire, Daniel
Owner / Manager: Daniel Lemire
Deposited: 07 Aug 2023 20:26
Last Modified: 11 Nov 2023 10:12
