Open access research
publication repository

Transcoding Billions of Unicode Characters per Second with SIMD Instructions [r-libre/2400]

Lemire, Daniel, & Muła, Wojciech (2022). Transcoding Billions of Unicode Characters per Second with SIMD Instructions. Software: Practice and Experience, 52 (2).

File(s) available for this item:
PDF - Transcoding Billions of Unicode Characters per Second with SIMD Instructions.pdf
Content: Submitted Version
License: Creative Commons - Public Domain Dedication.
 
Item Type: Journal Articles
Refereed: Yes
Status: Published
Abstract: In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art disks and networks. These transcoding functions make little use of the single-instruction-multiple-data (SIMD) instructions available on commodity processors. By designing transcoding algorithms for SIMD instructions, we multiply the speed of transcoding on current systems (x64 and ARM). To ensure reproducibility, we make our software freely available as an open source library.
Depositor: Lemire, Daniel
Owner / Manager: Daniel Lemire
Deposited: 22 Sep 2021 16:00
Last Modified: 04 Feb 2022 14:49
