Inline HTML in markdown gets transformed

bitstein · June 19, 2024, 9:09pm

I have Markdown files with inline HTML containing segments that need to be translated, so the HTML cannot be excluded. However, when Crowdin generates the output, it transforms the HTML into a different structure that doesn't meet our code standards.

For example, this English input:

<figure>
  <blockquote>
    <p>It might make sense just to get some in case it catches on. If enough people think the same way, that becomes a self fulfilling prophecy.</p>
  </blockquote>
  <figcaption><a href="/satoshi/emails/cryptography/17/">Satoshi Nakamoto</a>, 1/17/2009</figcaption>
</figure>

Becomes this Korean output:

<figure>
  <blockquote>
    <p>널리 사용될 때를 대비해 비트코인을 조금 갖고 있는 것도 좋습니다. 충분히 많은 사람이 같은 생각을 하게 된다면, 그것은 자기 실현적 예언이 될 것입니다.</p>
  </blockquote>  <figcaption><a href="/satoshi/emails/cryptography/17/">사토시 나카모토</a>, 2009년 1월 17일</figcaption>
</figure>

What can I do to make sure the HTML whitespace remains as is?

Thank you.
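One way to catch this kind of change before it gets merged is a small check that compares only the HTML "skeleton" of the source and translated files. The skeleton is the tags, their attributes, and the whitespace-only text between tags, with translatable text reduced to a placeholder. The sketch below is a hypothetical CI step, not a Crowdin feature. It uses Python's standard `html.parser` and shortened copies of the snippets above. `SkeletonParser`, `skeleton`, and `first_difference` are names made up for this example.

```python
from html.parser import HTMLParser
from itertools import zip_longest


class SkeletonParser(HTMLParser):
    """Record tags and whitespace-only text; replace real text with a placeholder."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.items.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.items.append(("end", tag))

    def handle_data(self, data):
        if data.strip() == "":
            self.items.append(("ws", data))  # layout whitespace must match exactly
        else:
            self.items.append(("text",))     # translated text is allowed to differ


def skeleton(html):
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    return parser.items


def first_difference(source, translated):
    """Return (index, source_item, translated_item) for the first mismatch, or None."""
    pairs = zip_longest(skeleton(source), skeleton(translated))
    for i, (a, b) in enumerate(pairs):
        if a != b:
            return i, a, b
    return None


SOURCE = (
    "<figure>\n"
    "  <blockquote>\n"
    "    <p>It might make sense just to get some in case it catches on.</p>\n"
    "  </blockquote>\n"
    '  <figcaption><a href="/satoshi/emails/cryptography/17/">Satoshi Nakamoto</a>, 1/17/2009</figcaption>\n'
    "</figure>"
)
TRANSLATED = (
    "<figure>\n"
    "  <blockquote>\n"
    "    <p>널리 사용될 때를 대비해 비트코인을 조금 갖고 있는 것도 좋습니다.</p>\n"
    '  </blockquote>  <figcaption><a href="/satoshi/emails/cryptography/17/">사토시 나카모토</a>, 2009년 1월 17일</figcaption>\n'
    "</figure>"
)

print(first_difference(SOURCE, TRANSLATED))
# (9, ('ws', '\n  '), ('ws', '  '))
```

The first mismatch is between `</blockquote>` and `<figcaption>`. The source has a newline plus two spaces there, but the Korean output has only two spaces. A check like this only detects the problem. It does not change how Crowdin builds the file.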

Dima · June 20, 2024, 12:32am

Hi,

To maintain the original whitespace and structure, you may need to adjust the parser settings for markdown files in Crowdin. This will help ensure that the HTML is not altered during the translation process.

You can check those:

bitstein · June 20, 2024, 3:47pm

Thank you. The link to the stable release on this page no longer works: Ratel - Okapi Framework

Is this software still in development? Are there other good resources for editing and testing SRX rules?

Natalia · June 20, 2024, 4:34pm

Hi,

You can generate and test SRX rules with the following app:

bitstein · June 20, 2024, 9:20pm

Thank you. Do you have any recommendations on how to learn to write these rules?

In this case, I want the segments to be only the content within the tags and the tags.

Dima · June 20, 2024, 9:35pm

You can use this application to test rules:

You can define your own segmentation rules for each source file individually using the SRX 2.0 standard. You might consider using some AI chat tools for this task.
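When you are learning to write SRX rules, it can help to see the evaluation logic spelled out before testing rules in an app. In SRX, each `<rule>` pairs a `<beforebreak>` regex with an `<afterbreak>` regex. The `<beforebreak>` text must end at a candidate position, and the `<afterbreak>` text must start there. At each position, the first rule that matches decides whether to break (`break="yes"`) or not (`break="no"`).

The Python sketch below parses a tiny, made-up SRX 2.0 file with `xml.etree.ElementTree` and applies its rules with `re`. It is a simplified illustration under several assumptions:

* It picks a language rule by name instead of evaluating `<maprules>`.
* It ignores header options such as `cascade`.
* It uses Python's regex flavour, which is close to, but not identical to, what SRX engines use.

`load_rules` and `segment` are hypothetical helpers. The two rules are examples, not Crowdin's defaults.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical SRX 2.0 rules for illustration only (not Crowdin's defaults):
# never break after "Mr." or "Dr."; otherwise break after . ? ! before whitespace.
SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <rule break="no">
          <beforebreak>\b(?:Mr|Dr)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>"""

NS = {"srx": "http://www.lisa.org/srx20"}


def load_rules(srx_text, rule_name="Default"):
    """Return (is_break, before_re, after_re) tuples in document order."""
    root = ET.fromstring(srx_text)
    rules = []
    for lang_rule in root.iterfind(".//srx:languagerule", NS):
        if lang_rule.get("languagerulename") != rule_name:
            continue
        for rule in lang_rule.iterfind("srx:rule", NS):
            before = rule.findtext("srx:beforebreak", default="", namespaces=NS)
            after = rule.findtext("srx:afterbreak", default="", namespaces=NS)
            rules.append((
                rule.get("break", "yes") == "yes",  # break="yes" is the SRX default
                re.compile(rf"(?:{before})\Z"),      # must end at the candidate position
                re.compile(after),                  # must start at the candidate position
            ))
    return rules


def segment(text, rules):
    """Split text; at each position the FIRST matching rule decides."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        head, tail = text[:pos], text[pos:]
        for is_break, before, after in rules:
            if before.search(head) and after.match(tail):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # later rules are ignored once one matches
    segments.append(text[start:])
    return segments


rules = load_rules(SRX)
print(segment("Dr. Smith agreed. It caught on.", rules))
# ['Dr. Smith agreed.', ' It caught on.']
```

The `break="no"` exception wins at "Dr." because it comes first. With the two rules swapped, the text would also split after "Dr.". Because `\s` is in `<afterbreak>`, the space becomes the start of the next segment, and many tools trim it.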