How can I join two sequential strings?

I have some strings that are being split by the segmentation rules because of abbreviations inside a Title Case sentence, for example:

Location, Alternate Graphics, Etc. Fields

This is being split in two, with "Fields" as a separate string, so there is no proper way to translate the sentence. How can I fix that without resorting to custom segmentation rules (which are definitely overkill here)?

Hi @luizbgomide-ep2e, we have the Segmentation Rules Generator app. You’re welcome to use it to create your own segmentation rule for this case.

I’ve already tried this tool, but I’m not sure how the Merge mode works.

I created a rule using the Merge button on the string line, keeping the default values in the window (one string in the Before Break text box and the following string in the After Break text box). Then I applied the rule to the specific file that I’m having issues with (the same file I used to create the rule), and it still doesn’t work. The strings are still separated.

This is the rule created:

When translating the file after that custom rule has been applied, the strings are still separated.

Curiously, in the same list that I’m translating there are more strings with the same pattern that aren’t being split (even without custom rules). I assume this has something to do with whitespace in the HTML:

      <li><a href="bd25x28x.html">25X-28X: Edition, Imprint, Etc.
      Fields</a></li>

      <li><a href="bd3xx.html">3XX: Physical Description, Etc.
      Fields</a></li>

     ...

      <li><a href="bd84188x.html">841-88X: Holdings, Location,
      Alternate Graphics, Etc. Fields</a></li>

Dear @luizbgomide-ep2e, could you share your file with us so we can check everything with the team?

Sure, here is a link to the problematic file:

Here is the rule created by the Segmentation Rule Tool:

      <rule break="no">
          <beforebreak>841-88X: Holdings, Location,
      Alternate Graphics, Etc\.</beforebreak>
          <afterbreak>Fields</afterbreak>
      </rule>

Hello @luizbgomide-ep2e!

Can you try adding a space after Etc\. in the rule? The devs say it should help.

Looking forward to your reply,


Hello @DianaO
It worked!
Thanks
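
For anyone landing here later, a rough sketch of why the space probably mattered. It assumes the usual SRX-style semantics, where a no-break rule applies at a position only if the before-break pattern matches text ending exactly there and the after-break pattern matches text starting exactly there. Python's `re` module stands in for the tool's regex engine, the patterns are shortened versions of the rule above, and `protected_positions` is a hypothetical helper written for this illustration, not part of any tool.

```python
import re

def protected_positions(text, before, after):
    """Positions where an SRX-style no-break rule would apply.

    The rule applies at position p when `before` matches text ending
    exactly at p and `after` matches text starting exactly at p.
    """
    before_re = re.compile(f"(?:{before})\\Z")  # anchor the match at the end of text[:p]
    after_re = re.compile(after)                # match() anchors at the start of text[p:]
    return [
        p for p in range(len(text) + 1)
        if before_re.search(text[:p]) and after_re.match(text[p:])
    ]

text = "Alternate Graphics, Etc. Fields"

# Rule as generated: neither pattern accounts for the space
# between "Etc." and "Fields", so no position satisfies both.
no_space = protected_positions(text, r"Etc\.", r"Fields")

# Same rule with whitespace added after "Etc\." in the before-break pattern.
with_space = protected_positions(text, r"Etc\.\s", r"Fields")

print(no_space)    # []
print(with_space)  # [25], the index where "Fields" starts
```

Without the space, the two patterns can never both match at the same position, so the rule never fires. Where exactly the tool places the candidate break relative to the space is an assumption here; the thread only confirms that adding the space after `Etc\.` fixed it.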