Duplication due to faulty Auto-Split in Simplified Chinese

When the English source is updated in the GitHub repo, Simplified Chinese translations in Crowdin sometimes get duplicated. This may be related to how Crowdin splits strings at punctuation, which differs between English and Simplified Chinese.

It seems the auto-splitting logic may not account for Chinese-specific punctuation, or perhaps it splits only at punctuation followed by a space. A Chinese full stop (。) is normally not followed by a space.
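To illustrate the second guess, here is a minimal sketch of a splitter that breaks only at ASCII punctuation followed by whitespace, next to a variant that also recognizes full-width Chinese punctuation. Both patterns and the `split_sentences` helper are hypothetical and written for this post; they are not Crowdin's actual segmentation rules.

```python
import re

# Hypothetical rule: break only after ASCII sentence punctuation followed by whitespace.
NAIVE = re.compile(r"(?<=[.!?])\s+")

# Hypothetical rule: additionally break after full-width Chinese sentence
# punctuation, whether or not whitespace follows it.
CJK_AWARE = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])\s*")


def split_sentences(text, pattern):
    """Split text at every match of pattern and drop empty pieces."""
    return [piece for piece in pattern.split(text) if piece]


english = (
    "This page contains a reference for all options present in the loom Gradle extension. "
    "Please see the Fabric API DSL page for options related to Fabric API specific features."
)
chinese = (
    "本页包含 loom Gradle 扩展中所有选项的参考。"
    "请参阅 Fabric API DSL 页面,了解与 Fabric API 特定功能相关的选项。"
)

print(len(split_sentences(english, NAIVE)))      # English splits into two segments
print(len(split_sentences(chinese, NAIVE)))      # Chinese stays as a single segment
print(len(split_sentences(chinese, CJK_AWARE)))  # Chinese splits into two segments
```

If the real logic behaves like the first pattern, the English source would yield two segments while the Chinese text yields one, and that mismatch could explain why translations end up attached to the wrong segment.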

Example

| Source String | Current Translation | Incoming Translation |
| --- | --- | --- |
| This page contains a reference for all options present in the loom Gradle extension. | 本页包含 loom Gradle 扩展中所有选项的参考。 | 本页包含 loom Gradle 扩展中所有选项的参考。 |
| Please see the Fabric API DSL page for options related to Fabric API specific features. | 请参阅 Fabric API DSL 页面,了解与 Fabric API 特定功能相关的选项。 | 本页包含 loom Gradle 扩展中所有选项的参考。 请参阅 Fabric API DSL 页面,了解与 Fabric API 特定功能相关的选项。 |

In the "Incoming Translation" column, notice how the first sentence has been duplicated at the start of the second segment's translation.
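For anyone who wants to check an exported file for the same pattern, here is a rough sketch that flags any segment whose translation begins with the previous segment's translation. The function name and the list-of-strings input are assumptions made for illustration; a real export would need to be parsed into that shape first.

```python
def find_duplicated_prefixes(translations):
    """Return indices of segments that start with the previous segment's text."""
    flagged = []
    for index in range(1, len(translations)):
        previous = translations[index - 1].strip()
        current = translations[index].strip()
        # Skip exact repeats: identical strings can be legitimate.
        if previous and current != previous and current.startswith(previous):
            flagged.append(index)
    return flagged


incoming = [
    "本页包含 loom Gradle 扩展中所有选项的参考。",
    "本页包含 loom Gradle 扩展中所有选项的参考。 "
    "请参阅 Fabric API DSL 页面,了解与 Fabric API 特定功能相关的选项。",
]

print(find_duplicated_prefixes(incoming))
```

Run against the two incoming translations from the example, this flags the second segment (index 1).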


Hello,

It seems that the auto-splitting logic might not be fully compatible with Chinese-specific punctuation, which could lead to the duplication you’re experiencing.

You are welcome to create a specific parsing rule for Chinese to change the way the system recognizes the file: