KDevelop code highlighting misplaced issue


On my device, KDevelop doesn’t highlight the source code correctly. Is this a bug or some other problem?

Every Chinese character shifts the highlighting by two English characters. It must be a bug: some modules can’t handle CJK characters correctly.

Hi,

Could you please report this on https://bugs.kde.org with an example file?

I have reported this bug: https://bugs.kde.org/show_bug.cgi?id=518035
I also found that Bug 453742 reported the same issue 4 years ago.
So how does KDevelop analyse code, e.g. through the clang AST? If I knew exactly where this logic is handled and where the positions are passed to the Qt front end, I think the root of the problem would be exposed.

It’s not only C++ source files. Could the DUChain part be to blame?

KDevelop uses libclang; an example of how the ranges get translated is here. This builder.cpp file is the root of the C++ parser.

At first guess I see 3 possibilities:

  • clang reports the range incorrectly
  • the clang-to-KTextEditor translation is wrong
  • there’s a bug in KDevelop’s KTextEditor usage

The problem is that our editor (KTextEditor) operates in UTF-16, while clang and the parsing operate in UTF-8. We will need to find a way to quickly translate from one to the other to create the highlight ranges.

So it’s possibility 3.

KTextEditor uses UTF-16 (like QString’s methods). Here is my analysis of how much each character shifts the position:

    UTF-8 1-byte character ==> 0 misplaced ─┐
    UTF-8 2-byte character ==> 1 misplaced ─┼─ 1 UTF-16 code unit
    UTF-8 3-byte character ==> 2 misplaced ─┘
    UTF-8 4-byte character ==> 2 misplaced ======> 2 UTF-16 code units (surrogate pair)
I think KDevelop may be applying UTF-8 byte offsets directly to the UTF-16 text shown in the editor. It’s just my guess: in theory it explains the problem, but I might be wrong, since I’m still trying to confirm it and it takes me a while to understand this code. Could you give me some suggestions?

I found that kdevelop/kdevplatform/language/highlighting/codehighlighting.cpp takes the locations reported by the DUChain data and turns them into KTextEditor lines and columns, so maybe that’s the root of the problem.

My guess (I haven’t investigated it) is that the range goes wrong in the transformation from CXSourceRange to RangeInRevision.

It would be interesting to know whether it also happens with other languages (Python is well supported); that would give a better idea of where to look.

The issue also exists in Python, so it should be in the DUChain or somewhere else shared… I mean: no matter which module calculates the positions in UTF-8 files (clang, the Python DUChain module, or the main DUChain), my reasoning is that they pass an offset calculated from the UTF-8 source file to related modules such as highlighting, but KDevelop interprets these offsets as KTextEditor lines and columns, which are based on UTF-16. That leads to the problem, and what I found in the file above could confirm it.
Addition: in the highlighting logic file, I only see DUChain being used. I’m analysing it; maybe I should make the module return UTF-16 offsets directly?

It could be that KDevelop somewhere treats QStrings as QByteArrays, which indeed don’t interpret the characters. I just tried this example code:

    QString s("Normal中");
    qDebug() << s.size() << s.toUtf8().size(); // prints 7 9: UTF-16 code units vs UTF-8 bytes

which shows pretty much the difference you’re experiencing in kdevelop.

edit: A quick grep of the codebase led me here, another hunch for where this issue could originate. There’s also a matching TODO that sounds related.