The default sort order in Dolphin for Chinese is by pinyin (the romanization of Chinese characters),
but this puts Chinese numerals out of their natural order, which is (from top to bottom: Chinese, pinyin, Arabic numerals):
一、二、三、四、五、六、七、八、九、十
yi er san si wu liu qi ba jiu shi
1 2 3 4 5 6 7 8 9 10
In alphabetical (pinyin) order they become:
八、二、九、六、七、三、十、四、五、一
ba er jiu liu qi san shi si wu yi
8 2 9 6 7 3 10 4 5 1
This is frustrating when there are many files named with Chinese numerals, such as 第一章 (chapter 1).
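To make the mismatch concrete, here is a minimal C++ sketch using only the standard library (it is not Dolphin or ICU code). The `Numeral` struct and the helper functions are made up for illustration, and the pinyin strings are toneless and compared as plain ASCII, which may differ in detail from what ICU's pinyin collation does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: the numeral, its toneless pinyin, and its value.
struct Numeral {
    char32_t ch;
    std::string pinyin;
    int value;
};

// The ten numerals from the post, listed in numeric order.
std::vector<Numeral> numerals() {
    return {
        {U'\u4E00', "yi", 1},  {U'\u4E8C', "er", 2},  {U'\u4E09', "san", 3},
        {U'\u56DB', "si", 4},  {U'\u4E94', "wu", 5},  {U'\u516D', "liu", 6},
        {U'\u4E03', "qi", 7},  {U'\u516B', "ba", 8},  {U'\u4E5D', "jiu", 9},
        {U'\u5341', "shi", 10},
    };
}

// Numeric values in the order produced by sorting on the pinyin string.
std::vector<int> valuesSortedByPinyin() {
    std::vector<Numeral> v = numerals();
    assert(v.size() == 10);
    std::sort(v.begin(), v.end(), [](const Numeral& a, const Numeral& b) {
        return a.pinyin < b.pinyin;
    });
    std::vector<int> out;
    for (const Numeral& n : v) out.push_back(n.value);
    return out;
}

// Numeric values in the order produced by sorting on the numeric value.
std::vector<int> valuesSortedByValue() {
    std::vector<Numeral> v = numerals();
    std::sort(v.begin(), v.end(), [](const Numeral& a, const Numeral& b) {
        return a.value < b.value;
    });
    std::vector<int> out;
    for (const Numeral& n : v) out.push_back(n.value);
    return out;
}
```

Calling `valuesSortedByPinyin()` and `valuesSortedByValue()` side by side shows how far the pinyin order drifts from the order a reader expects for chapter numbers.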
jinliu
December 31, 2023, 12:21pm
I guess Dolphin does its sorting with QCollator from Qt, which in turn uses ICU, so the change would have to be made in ICU.
Part of the problem is that "一" (U+4E00, CJK UNIFIED IDEOGRAPH-4E00) is not classified as a digit (general category Nd) in Unicode, although it has a numeric value of 1. So ICU probably doesn't treat it as a number when sorting.
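As a rough illustration of what treating these characters as numbers would mean for ordering, here is an application-level C++ sketch. It does not touch ICU or QCollator; `numeralValue`, `parseChineseNumber` and `naturalLess` are hypothetical helpers, and only 一 through 十 and values from 1 to 99 are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a single numeral character, or -1 if it is not one of 一..十.
int numeralValue(char32_t c) {
    switch (c) {
        case U'\u4E00': return 1;   // 一
        case U'\u4E8C': return 2;   // 二
        case U'\u4E09': return 3;   // 三
        case U'\u56DB': return 4;   // 四
        case U'\u4E94': return 5;   // 五
        case U'\u516D': return 6;   // 六
        case U'\u4E03': return 7;   // 七
        case U'\u516B': return 8;   // 八
        case U'\u4E5D': return 9;   // 九
        case U'\u5341': return 10;  // 十
        default: return -1;
    }
}

// Parses a run of numerals starting at pos (十二 = 12, 二十三 = 23).
// Advances pos past the run; returns -1 if there is no numeral at pos.
long parseChineseNumber(const std::u32string& s, std::size_t& pos) {
    long total = 0, pending = 0;
    bool any = false;
    while (pos < s.size()) {
        int v = numeralValue(s[pos]);
        if (v < 0) break;
        any = true;
        if (v == 10) {
            // 十 multiplies the preceding digit, or stands for 10 on its own.
            total += (pending == 0 ? 1 : pending) * 10;
            pending = 0;
        } else {
            pending = v;
        }
        ++pos;
    }
    return any ? total + pending : -1;
}

// Compares numeral runs by value and everything else by code point.
bool naturalLess(const std::u32string& a, const std::u32string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (numeralValue(a[i]) >= 0 && numeralValue(b[j]) >= 0) {
            long x = parseChineseNumber(a, i);
            long y = parseChineseNumber(b, j);
            if (x != y) return x < y;
        } else {
            if (a[i] != b[j]) return a[i] < b[j];
            ++i;
            ++j;
        }
    }
    // The string that runs out first sorts first.
    return (a.size() - i) < (b.size() - j);
}
```

Passing `naturalLess` to `std::sort` over a list of `std::u32string` file names would order 第二章 before 第十章. A real fix inside ICU would look quite different, since collation there works on collation elements rather than parsed numbers.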
jinliu
January 1, 2024, 3:15pm
You probably need to change this function to set DIGIT_TAG on all characters that have a numeric value:
                    UChar32 jamo = jamoCpFromIndex(j);
                    jamoCE32s[j] = copyFromBaseCE32(jamo, base->getCE32(jamo),
                                                    /*withContext=*/ true, errorCode);
                }
            }
        }
        return anyJamoAssigned && U_SUCCESS(errorCode);
    }

    void
    CollationDataBuilder::setDigitTags(UErrorCode &errorCode) {
        UnicodeSet digits(UNICODE_STRING_SIMPLE("[:Nd:]"), errorCode);
        if(U_FAILURE(errorCode)) { return; }
        UnicodeSetIterator iter(digits);
        while(iter.next()) {
            U_ASSERT(!iter.isString());
            UChar32 c = iter.getCodepoint();
            uint32_t ce32 = utrie2_get32(trie, c);
            if(ce32 != Collation::FALLBACK_CE32 && ce32 != Collation::UNASSIGNED_CE32) {
                int32_t index = addCE32(ce32, errorCode);
                if(U_FAILURE(errorCode)) { return; }
But I’m not sure if ICU would accept such a change.