Various issues after deleting and recreating Github Integration

This is our Crowdin project. We recently needed to delete our Github Integration and recreate it under another user (an ownership transfer feature might be nice). After recreating the integration with the same settings, we did see new commits to the l10n_main branch, and a PR was successfully made. But we are seeing some problems with synced translations on both sides.

On the Crowdin end, there now appear to be duplicated sources with different statistics. For example, at https://crowdin.com/project/inaturalistweb/af there are multiple representations of /config/locales.en: one (presumably the original, as its URL uses /38/af) lists 43k words, and the second (presumably the new one, with a URL of /810/en-af) lists only 5k words.

Some strings have been modified (and seemingly approved) after recreating the sync. For example, the key onboarding.youve_got_updates_panel_body_p2 from es-MX.yml was imported with different text than what is in that source file in our main repo. The UI suggests we added that text change and approved it, even though the Github integration settings are set to the default of not approving added translations. The text used is also not the source text, and the interpolation variable %{site_name} has somehow been dropped.
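For anyone wanting to scan for this kind of dropped variable across many strings, here is a minimal sketch. It assumes Ruby i18n-style %{name} placeholders, and the two example strings are made up for illustration; they are not the real text of the key above.

```python
import re

# Matches Ruby i18n-style interpolation variables such as %{site_name}.
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def missing_placeholders(source: str, translation: str) -> set[str]:
    """Return variables that appear in the source but not in the translation."""
    return set(PLACEHOLDER.findall(source)) - set(PLACEHOLDER.findall(translation))


# Hypothetical strings, for illustration only.
source = "Updates from %{site_name} appear here."
translation = "Las actualizaciones aparecen aquí."
print(missing_placeholders(source, translation))  # {'site_name'}
```

A non-empty result flags a translation worth checking in the editor's string history.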

Some strings with multiple forms have been imported as separate strings (e.g. x_changes.zero), but others with multiple forms have had their forms preserved (e.g. x_comments).
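To illustrate what "multiple forms" means here, the sketch below groups dotted keys whose last segment is a CLDR plural category name under their parent key. It assumes the forms are stored as sibling keys (as x_changes.zero suggests); the function name and the sample key list are only for illustration, and this is not how Crowdin itself detects plurals.

```python
# CLDR plural category names.
PLURAL_FORMS = {"zero", "one", "two", "few", "many", "other"}


def plural_groups(keys):
    """Map each parent key to the set of plural forms found beneath it."""
    groups = {}
    for key in keys:
        parent, _, last = key.rpartition(".")
        if parent and last in PLURAL_FORMS:
            groups.setdefault(parent, set()).add(last)
    return groups


# Hypothetical key list, for illustration only.
keys = ["x_changes.zero", "x_changes.one", "x_changes.other", "onboarding.title"]
print(plural_groups(keys))
```

Each parent in the result is what we would expect to see as a single string with several forms, rather than as several separate strings.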

On the Github commits end, the biggest issue is that some translation strings were just not included, thus deleted from their respective translation files, e.g. x_changes removed from most files.
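A rough way to list which keys a sync removed is to compare the parsed translation file before and after the commit. The sketch below assumes both versions have already been loaded into nested dicts (with whatever YAML library you use); the helper names and sample data are hypothetical.

```python
def flatten_keys(tree, prefix=""):
    """Yield a dotted key path for every leaf value in a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from flatten_keys(value, path)
        else:
            yield path


def removed_keys(before, after):
    """Return the sorted key paths present in `before` but missing from `after`."""
    return sorted(set(flatten_keys(before)) - set(flatten_keys(after)))


# Hypothetical file contents, for illustration only.
before = {"es-MX": {"x_changes": {"one": "a", "other": "b"}, "x_comments": {"one": "c"}}}
after = {"es-MX": {"x_comments": {"one": "c"}}}
print(removed_keys(before, after))  # ['es-MX.x_changes.one', 'es-MX.x_changes.other']
```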

Our context comments now have spaces restored, e.g. what was being synced as #Tool tip is now being synced properly as # Tool Tip, matching the en.yml source. This is a positive change, but it is curious that the behavior changed just from recreating the Github Integration, and it generates a lot of modified lines in the PR.

Some indentation has been changed. The indentation used to be consistently two spaces, but on the latest sync many two-space indents were changed to four spaces. This is done seemingly inconsistently across locales, and within locales.
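A quick heuristic for spotting the mixed indentation is to count how many spaces each nesting step adds. This is only a sketch: it ignores blank and comment lines, does not understand multi-line block scalars, and the sample text is invented.

```python
from collections import Counter


def indent_steps(yaml_text: str) -> Counter:
    """Count the size of each indentation increase between consecutive lines."""
    steps = Counter()
    previous = 0
    for line in yaml_text.splitlines():
        stripped = line.lstrip(" ")
        # Skip blank lines and comment-only lines.
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(stripped)
        if indent > previous:
            steps[indent - previous] += 1
        previous = indent
    return steps


# Hypothetical sample mixing two-space and four-space steps.
sample = "en:\n  a:\n    b: 1\n  c:\n      d: 2\n"
print(indent_steps(sample))
```

A file that is consistently indented should report a single step size; more than one size points at the lines changed by the sync.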

These are some of the most obvious issues we are seeing at first glance. We’d appreciate any help or feedback that will help us resolve these issues, both to prevent confusion among our translators and to address the data import errors and the synced data export errors.

Thank you for any help you can offer.

-Patrick

One other point to note: Most unfortunate is the loss of attribution for the new source - all translations came in attributed to our user. It is really important to us not to lose the ability to attribute the effort by our many contributors. Is there any way to have the source of new integration be seen as exactly the same source from the old integration? Everything about the integration is exactly the same (as far as I can tell).

Thanks!

Hi Patrick,

Regarding the duplicated sources with different statistics, this could be a result of the reintegration. The new integration may be treating the files as new entities. To resolve this, you may need to remove the old source entries if they are no longer needed.

For the issue with the key and the missing %{site_name} interpolation variable, please review the translation history for this string in Crowdin to identify any unexpected changes and revert them if necessary (3 bullets near the strings in the editor → view history).

The problem with multiple forms being imported as separate strings could be due to the way pluralization in your source files is interpreted. You might need to adjust the source file syntax to ensure that plural forms are handled correctly.

As for the missing translation strings on GitHub, this could be a sync issue where the integration did not recognize some strings as translated.

You may need to manually check the affected files and ensure that the translations are present in Crowdin before syncing again. You could also run a TM pre-translation with auto-approval turned on to fill the project with translations.

The change in context comments and indentation is likely due to updates in the handling of file formatting. If the old file was uploaded a while ago, it has a different parser version that may recognize the file differently.

If you continue to experience issues or need further assistance, please don’t hesitate to reach out directly to support@crowdin.com so our team can review everything in detail from our side.

New integration = new files, with new internal IDs, etc. It’s expected that the system treats them as totally new files. The best workflow here would be as follows:

  1. You add branch 1 with new files inside
  2. Users translate it
  3. You activate the duplicate sharing option (import settings of the project)
  4. You add branch 2 with new files inside (let’s say 90% of the source content is identical, 10% is different/new)
  5. Duplicates are activated, so 90% of the file becomes translated (translation migrates from master strings), the remaining 10% is empty and needs to be localized
  6. You safely delete branch 1. Once it’s deleted, the strings in branch 2 (that 90% were duplicates) become master strings

The same workflow cycle applies when you upload branches 3, 4, 5, etc. This works as long as you keep the same source text field and change only the key, i.e., upload new files with new keys but keep the source text the same.
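As a toy model of steps 4 and 5 (not Crowdin's actual implementation), the sketch below treats duplicate sharing as a lookup by source text: a string in the new branch inherits a translation when a master string has the same source text, even if its key differs. All names and data are hypothetical.

```python
def apply_duplicates(master, new_strings):
    """Toy model: copy translations to new strings that share a source text.

    master: source text -> translation (the already translated master strings)
    new_strings: key -> source text (the strings in the newly added branch)
    Returns key -> translation, or None where the string still needs translating.
    """
    return {key: master.get(text) for key, text in new_strings.items()}


# Hypothetical data: one string matches the master, one is new.
master = {"Save": "Guardar"}
branch_2 = {"buttons.save_v2": "Save", "buttons.share": "Share"}
print(apply_duplicates(master, branch_2))
# {'buttons.save_v2': 'Guardar', 'buttons.share': None}
```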

In short, you need to activate duplicates and then upload the file as a new one, not just update the existing file.

As a second option, if you missed this in the project configuration, you can always run a Translation memory pre-translation. This applies translations to the strings as well (a must-have option in non-key formats, where the system can’t easily map the key relation).

It can be used with either a Perfect (string + key) match or a 100% match.

Thanks to help from @Dima we have since resolved all these issues. They were all rooted in our Crowdin project being several years old, and in a few changes made to Crowdin since then that caused the new integration to behave differently. It may be that not many projects would see these same effects, but I’ll report what worked for us in case any older projects experience the same when recreating Github integrations.

One change we needed to make was to update the name of the root of our original source. It had been something like inaturalist/main, but new integration imports expect it to just be main. This difference caused a second, duplicate source to be created, which led to the issues with translation attribution and the lack of preserved comments.

I addressed this by renaming the new one to main (new), which allowed me to rename the original to main so that new integrations would sync with the original source.

The second change we needed to make was to change the YAML parser for our project. When we initially created our integration, Crowdin used a different default YAML parser. This difference caused all of the other changes I reported: translations changing, multiple forms not being recognized, indentation changing, some strings being removed in the Github pull request, and context comments gaining a space.

To revert to this legacy parser, I needed to install the tool Configurable JSON&YAML. This doesn’t have an immediate effect. But after deleting the Github integration and recreating it, YAML translation parsing is now behaving like it used to.

With those two changes (renaming our original source root to main and installing the alternate YAML parser), our new Github integration appears to be working exactly as our old one was. Thanks again to @Dima for the excellent and responsive support!

@inaturalist

It’s always my pleasure to help! I hope you are enjoying every moment with Crowdin :slight_smile: