Unicode NFC normalisation for Rclone on macOS
Apple devices create all filenames in Unicode Decomposed Normalisation Form (NFD), while every other major OS uses Composed Normalisation Form (NFC). This makes you, as a Mac user, the bad guy, because it is you who is incompatible with the rest of the world.
In a nutshell, the problem is this: whenever you create files with diacritics on a Mac, they are copied to other devices with filenames stored as decomposed strings. This is non‑standard on those OSes, and you never know what problems it will cause.
This article presents my way of solving the problem by configuring Rclone to create all files in NFC (composed form) instead of NFD (decomposed form) – which is not nearly as straightforward as it would seem.
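To make the difference between the two forms concrete, here is a minimal shell sketch that prints the UTF‑8 bytes of the same visible letter in both forms. The helper name `hex_bytes` and the sample letter é are my own choices for illustration; the bytes are written as octal escapes so that no editor or terminal can silently re‑normalise them.

```shell
#!/bin/sh
# Print the UTF-8 bytes of a string as space-separated hex values.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# The same visible letter "é" in both normalisation forms.
nfc=$(printf '\303\251')     # U+00E9: one precomposed code point
nfd=$(printf 'e\314\201')    # U+0065 U+0301: "e" plus a combining acute accent

echo "NFC: $(hex_bytes "$nfc")"    # NFC: c3 a9
echo "NFD: $(hex_bytes "$nfd")"    # NFD: 65 cc 81
```

Both strings render identically, but a byte‑wise comparison of the two filenames fails, which is why tools on other systems may treat them as different names.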
Direct way to solve the problem
TL;DR: If you just want to solve the problem without actually delving into the problem and its technical details, simply follow the steps below. Otherwise, head over to Technical background.
Prereqs: download Rclone and macFUSE.1)
- Download the custom-made iconv library – this is actually Apple's own version of the library (which you have on your macOS), but with the iconv base updated to the latest version and with support for surrogate pairs (this is actually important, because most emoji, for example, lie outside the Basic Multilingual Plane and are encoded as surrogate pairs in UTF‑16).
- Tweak this library yourself, because it is not fully compatible with Apple's version. Specifically, open the file ./include/iconv.h.in and comment out these 6 code blocks (I list the lines below already commented):
  - lines 69–71:
        //#ifndef LIBICONV_PLUG
        //#define iconv_open libiconv_open
        //#endif
  - lines 79–81:
        //#ifndef LIBICONV_PLUG
        //#define iconv libiconv
        //#endif
  - lines 85–87:
        //#ifndef LIBICONV_PLUG
        //#define iconv_close libiconv_close
        //#endif
  - line 129:
        //#define iconv_open_into libiconv_open_into
  - line 134:
        //#define iconvctl libiconvctl
  - line 214:
        //#define iconvlist libiconvlist
- Compile and install the double‑tweaked library using the following commands:
        make -f Makefile.utf8mac autogen
        ./configure --prefix=/usr/local
        make
        make install
  These are stated here by the author of the tweaked version and are similar to the build commands of the original GNU libiconv. However, I added the --prefix=/usr/local parameter to the configure command (present in the GNU version, but not in the tweaked version), since I wasn't sure where the library would install itself without it.
- Now you have two Apple‑style libiconv binaries on your system: the original (and old) Apple one in /usr/bin/iconv and the tweaked (and updated) one in /usr/local/bin/iconv.
  - First, test that the new binary itself works with the following conversions:
        $ echo 🙂 | /usr/bin/iconv -f utf-8 -t utf-8-mac
        �
        $ echo 🙂 | /usr/local/bin/iconv -f utf-8 -t utf-8-mac
        🙂
  - Second, test that the new dynamic library has a proper symbol table. If everything went right, typing the following command:
        $ nm -gU /usr/local/lib/libiconv.2.dylib
    should output something similar to the first listing below, and not to the second (the order of lines might vary; the symbol names are what matters):
    Correct (Apple‑style) symbol table:
        00000000000e3290 D __libiconv_version
        0000000000002ce0 T _iconv
        0000000000003430 T _iconv_canonicalize
        0000000000002d10 T _iconv_close
        00000000000016b0 T _iconv_open
        0000000000002d20 T _iconv_open_into
        0000000000003160 T _iconvctl
        0000000000003270 T _iconvlist
        0000000000015eb0 T _libiconv_set_relocation_prefix
        0000000000015dd0 T _locale_charset
    Incorrect (GNU‑style) symbol table:
        00000000000e3290 D __libiconv_version
        0000000000002ce0 T _libiconv
        0000000000003430 T _iconv_canonicalize
        0000000000002d10 T _libiconv_close
        00000000000016b0 T _libiconv_open
        0000000000002d20 T _libiconv_open_into
        0000000000003160 T _libiconvctl
        0000000000003270 T _libiconvlist
        0000000000015eb0 T _libiconv_set_relocation_prefix
        0000000000015dd0 T _locale_charset
- Now, you need to update the Fuse library to search for the dynamically‑loaded libiconv library in the new place:
  - First, check that Fuse actually looks for the library under /usr/lib/:
        $ otool -L /usr/local/lib/libfuse.2.dylib
        /usr/local/lib/libfuse.2.dylib:
            /usr/local/lib/libfuse.2.dylib (compatibility version 12.0.0, current version 12.9.0)
            /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
            … [other libraries]
  - Now, change the path by using install_name_tool:
        sudo install_name_tool -change /usr/lib/libiconv.2.dylib /usr/local/lib/libiconv.2.dylib /usr/local/lib/libfuse.2.dylib
  - Finally, check that the change is successful:
        $ otool -L /usr/local/lib/libfuse.2.dylib
        /usr/local/lib/libfuse.2.dylib:
            /usr/local/lib/libfuse.2.dylib (compatibility version 12.0.0, current version 12.9.0)
            /usr/local/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
            … [other libraries]
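The mechanical parts of the steps above can be sketched as three small shell functions. This is only a sketch under stated assumptions: the function names are mine, the line numbers are the ones listed in the steps and will only match the same revision of iconv.h.in, and the two check functions read the output of nm and otool from standard input rather than calling the tools themselves.

```shell
#!/bin/sh
# Print the header given as $1 with the six alias blocks commented out.
# The line numbers come from the steps above; verify them against your
# copy of include/iconv.h.in before using the result.
comment_out_iconv_aliases() {
    sed -e '69,71s|^|//|' -e '79,81s|^|//|' -e '85,87s|^|//|' \
        -e '129s|^|//|' -e '134s|^|//|' -e '214s|^|//|' "$1"
}

# Read `nm -gU` output on stdin and report which naming scheme it uses.
classify_iconv_symbols() {
    symbols=$(cat)
    if printf '%s\n' "$symbols" | grep -q ' T _libiconv_open$'; then
        echo "gnu-style"      # matches the incorrect listing
    elif printf '%s\n' "$symbols" | grep -q ' T _iconv_open$'; then
        echo "apple-style"    # matches the correct listing
    else
        echo "unknown"
    fi
}

# Read `otool -L` output on stdin and print the libiconv path it references.
linked_iconv_path() {
    awk '/libiconv\.2\.dylib/ { print $1; exit }'
}
```

Typical use would be to redirect the output of comment_out_iconv_aliases include/iconv.h.in to a new file and move it over the original, then pipe nm -gU /usr/local/lib/libiconv.2.dylib into classify_iconv_symbols (expecting apple-style) and otool -L /usr/local/lib/libfuse.2.dylib into linked_iconv_path (expecting /usr/local/lib/libiconv.2.dylib after the install_name_tool step).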
Technical background
Historically, the way of encoding filenames2) on Mac OS X started to differ from other operating systems when Apple switched from HFS to HFS+ file system in 1998. HFS+ uses Unicode to store filenames on disk and it enforces canonical decomposed form for them.
In contrast, virtually all other common filesystems treat filenames simply as sequences of bytes – this is true both for Linux distros (ext*, ReiserFS) and for Windows filesystems (FAT with Long filenames support, NTFS).
However, standard filename‑manipulation libraries on both Linux and Windows normalize filenames to NFC, so while it is technically possible to create decomposed filenames on them, this does not usually happen unless the user specifically wants it and bypasses standard OS routines. The same is true for web technologies (CSS, HTML, XML), where the W3C specifically recommends always using the composed form (NFC).
When Apple switched from HFS+ to the APFS file system on its devices in 2017, the situation could have changed, since APFS adopts the same approach as other file systems and does not normalize filenames. However, just as on Linux distros and Windows, standard filename‑manipulation libraries on macOS do normalize filenames – but to NFD, the exact opposite of those other OSes. So the current situation with Apple devices is that while the APFS file system itself is normalisation‑agnostic, the underlying macOS routines create decomposed filenames by default.
Problem
Whenever you create files with diacritics on a Mac and copy them through Rclone, they are copied to your clouds with filenames stored as decomposed strings. This creates three different problems:
- First, if other users on different OSes rename the files you created (or renamed), they need to press Backspace twice to remove a letter with diacritics (e.g. á or ü). Renaming a file with a lot of diacritics, like XXX, can become a pretty lengthy process. Note that this also applies to you when you access these clouds from a web client (i.e. your browser).
- Second,
First (naïve) attempt to solve the problem: Rclone with iconv module
Rclone has a special ‑o switch which will forward its parameters to the underlying macFUSE/FUSE-T system providing the mounting functionality of the remote system.
This way, it is possible to tell Fuse to load the iconv module and have it automatically convert all filenames to NFC when they are moved to the remote cloud. Thus, the straightforward way to solve the problem should be to use the following command when mounting the system:
$ rclone mount […] -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
And this actually works, but with some problems.
Problem: bugged Apple implementation of iconv
The problem is that macOS uses its own “Apple‑tweaked” implementation of iconv, which is (1) very old; (2) non‑standard; and (3) unable to convert significant parts of Unicode – for example, emoji. All three of these problems will be crucial in our attempt to deal with the problem. Moreover, the system copy (the /usr/bin/iconv binary and the /usr/lib/libiconv.2.dylib library) is protected by SIP, so you cannot normally do anything with it, and the only way to update it is to update the whole macOS.
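The installation steps earlier tie the emoji failure to surrogate pairs. As a worked example of what a surrogate pair is, the following sketch computes the two UTF‑16 code units for a code point above U+FFFF using plain shell arithmetic. The function name `surrogate_pair` is my own, and the snippet only illustrates the encoding rule; it says nothing about how Apple's library handles these values internally.

```shell
#!/bin/sh
# Print the UTF-16 surrogate pair for a code point above U+FFFF.
surrogate_pair() {
    offset=$(( $1 - 0x10000 ))
    high=$(( 0xD800 + (offset >> 10) ))      # carries the top 10 bits
    low=$(( 0xDC00 + (offset & 0x3FF) ))     # carries the bottom 10 bits
    printf '%04X %04X\n' "$high" "$low"
}

surrogate_pair 0x1F642    # U+1F642 (🙂) prints: D83D DE42
surrogate_pair 0x1F4D6    # U+1F4D6 (📖) prints: D83D DCD6
```

A letter such as á (U+00E1) fits into a single 16‑bit unit and needs no pair, which is why a converter without surrogate support can still handle ordinary diacritics while failing on emoji.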
The first problem: the standard library versions Apple provides are almost always very obsolete.5) In the case of iconv, the versions supplied with different macOS releases are these:
| macOS version | macOS release date | iconv version (Apple) | iconv version (library) | iconv release date |
|---|---|---|---|---|
| macOS Sequoia 15 | 2024‑09‑16 | libiconv‑107 | FreeBSD libiconv 1.11[?] | 2009‑03‑03 |
| macOS Sonoma 14 | 2023‑09‑26 | libiconv‑102 | FreeBSD libiconv 1.11[?] | 2009‑03‑03 |
| macOS Ventura 13 | 2022‑10‑24 | libiconv‑64 | GNU libiconv 1.11 | 2006‑07‑19 |
| macOS Monterey 12 | 2021‑10‑25 | libiconv‑61 | GNU libiconv 1.11 | 2006‑07‑19 |
| macOS Big Sur 11 | 2020‑11‑17 | libiconv‑59 | GNU libiconv 1.11 | 2006‑07‑19 |
| macOS Catalina 10.15 | 2019‑10‑07 | libiconv‑59 | GNU libiconv 1.11 | 2006‑07‑19 |
| macOS Mojave 10.14 | 2018‑09‑24 | libiconv‑51.200.6 | GNU libiconv 1.11 | 2006‑07‑19 |
In a nutshell, despite what the internal Apple versioning says, all these macOS releases still ship libiconv 1.11, released back in 2006.
⚠️‑TODO‑⚠️
    $ nm -gU /usr/lib/libiconv.2.dylib
    00000000000f2700 D __libiconv_version
    0000000000002360 T _iconv
    000000000000267a T _iconv_canonicalize
    0000000000002382 T _iconv_close
    0000000000001049 T _iconv_open
    000000000000238f T _iconvctl
    0000000000002488 T _iconvlist
    0000000000013ff8 T _libiconv_set_relocation_prefix

    $ nm -gU /usr/local/lib/libiconv.2.dylib
    00000000000e3290 D __libiconv_version
    0000000000003430 T _iconv_canonicalize
    0000000000002ce0 T _libiconv
    0000000000002d10 T _libiconv_close
    00000000000016b0 T _libiconv_open
    0000000000002d20 T _libiconv_open_into
    0000000000015eb0 T _libiconv_set_relocation_prefix
    0000000000003160 T _libiconvctl
    0000000000003270 T _libiconvlist
    0000000000015dd0 T _locale_charset
Solution: libiconv with UTF-8-MAC support
There is a patched libiconv library on GitHub which adds support for UTF‑8‑MAC encoding. Installing it allows you not only to convert between real UTF‑8 and UTF‑8‑MAC encodings
Testing whether the patched iconv works correctly
    $ echo "test📖" | /usr/bin/iconv -f utf-8 -t utf-8-mac
    test�
    $ echo "test📖" | /usr/local/bin/iconv -f utf-8 -t utf-8-mac
    test📖
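The manual comparison above can be wrapped in a small check that takes the path of an iconv binary and succeeds only if the emoji survives the conversion. This is a sketch, not part of the patched library: the function name `iconv_keeps_emoji` is my own, and it assumes that a working converter returns the sample string unchanged, which holds here because the sample contains nothing that NFD would decompose.

```shell
#!/bin/sh
# Succeed only if the iconv binary given as $1 passes a sample string
# containing an emoji through the utf-8 -> utf-8-mac conversion unchanged.
iconv_keeps_emoji() {
    sample='test📖'
    # A non-zero exit status from the converter counts as a failure.
    converted=$(printf '%s\n' "$sample" | "$1" -f utf-8 -t utf-8-mac 2>/dev/null) || return 1
    [ "$converted" = "$sample" ]
}
```

Going by the transcript above, calling it with /usr/bin/iconv should fail and calling it with /usr/local/bin/iconv should succeed once the patched library is installed.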
Further reading
Tools for manual conversion of filenames between NFC/NFD
file to mean any inode, whether it is a file or a directory.