Pages in topic:   [1 2] >
From Ms Word table to TMX file
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Aug 20, 2022

On Facebook, in the memoQ forum, someone asked:

I have a bilingual .docx document. Is there any way to create TM from it without necessity to create a separate source file for every language?


He has closed the post for replies, so I'm replying here; this may be of interest to others too.

So we have:

[Image: 1]

And would like to get this, without any file splitting or conversion:

[Image: 2]

That shouldn't be too complicated, macro-wise, should it?


 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:18
Member (2006)
English to Afrikaans
+ ...
@Hans Aug 20, 2022

You may be able to write a single macro that does everything, but I would not be able to do that. If I were to do this, some steps would be manual.

The very first thing to do is to use find/replace to remove or convert any characters that can break the XML. For example, replace & with &amp; and < with &lt; (do the ampersand first, otherwise the & in &lt; gets escaped a second time).
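As a rough sketch of that find/replace step in macro form: the macro names EscapeXmlInDocument and ReplaceAllLiteral are made up for this example, and it assumes that the text to be cleaned is the whole active document.

```vba
' Sketch only: escape the characters that would break the XML.
' Assumption: the whole active document is the text to be cleaned.
Sub EscapeXmlInDocument()
    ' The ampersand must come first, otherwise the "&" of "&lt;"
    ' would itself be escaped in a later pass.
    ReplaceAllLiteral "&", "&amp;"
    ReplaceAllLiteral "<", "&lt;"
    ' Escaping ">" is optional in XML, but harmless.
    ReplaceAllLiteral ">", "&gt;"
End Sub

' Hypothetical helper: plain (non-wildcard) Replace All over the document.
Private Sub ReplaceAllLiteral(ByVal findText As String, ByVal replaceText As String)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Run this before any TMX markup is added to the text, otherwise the angle brackets of the markup itself would be escaped too.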

Then, you have to add a column to the left and populate it with numbers starting from 1. I have a macro that does that (I can't remember where I got it from): you type "1" in the first cell, run the macro, and it adds numbers to the cells below it:

Sub AddNumbersToTable()
    ' Type the starting number in a cell, put the cursor in that cell, then
    ' run this macro: it numbers the cells below it in the same column.
    Dim RowNum As Long
    Dim ColNum As Long
    Dim iStartNum As Long
    Dim J As Long
    If Selection.Information(wdWithInTable) Then
        RowNum = Selection.Cells(1).RowIndex
        ColNum = Selection.Cells(1).ColumnIndex
        iStartNum = Val(Selection.Cells(1).Range.Text)
        If iStartNum <> 0 Then
            iStartNum = iStartNum + 1
            ' Note: this always works on the first table in the document.
            For J = RowNum + 1 To ActiveDocument.Tables(1).Rows.Count
                ActiveDocument.Tables(1).Cell(J, ColNum).Range.Text = iStartNum
                iStartNum = iStartNum + 1
            Next
        Else
            MsgBox "Cell doesn't contain a non-zero starting number."
            Exit Sub
        End If
    Else
        MsgBox "Not in table"
    End If
End Sub

The next step is to convert the table to tabbed text. You can do that manually, but I use a macro for that:

Sub TablesConvert_to_tab()
    Dim aTable As Table
    ' Convert every table in the document to tab-separated text.
    For Each aTable In ActiveDocument.Tables
        aTable.ConvertToText wdSeparateByTabs, True
    Next aTable
End Sub

(this macro processes all tables in the file, though)
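If only one table should be converted, a variant along these lines might do. It is a sketch, the name SelectedTableConvert_to_tab is invented for the example, and it assumes the cursor is somewhere inside the table you want to convert.

```vba
' Sketch only: convert just the table that contains the cursor.
Sub SelectedTableConvert_to_tab()
    If Selection.Information(wdWithInTable) Then
        ' Selection.Tables(1) is the table the selection is in.
        Selection.Tables(1).ConvertToText Separator:=wdSeparateByTabs, _
            NestedTables:=True
    Else
        MsgBox "Not in table"
    End If
End Sub
```

All other tables in the file are left as they are.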

Then, you have to make sure that there is one blank line at the top of the text, and one blank line underneath the text.

like so

And then you just record a find/replace macro:

Sub atable2tmx()
'
' atable2tmx Macro
' Macro recorded 8/20/2022
'
' Note: as posted, several replacement strings below contain no TMX markup
' (the tags seem to have been lost), so add the <tu>/<tuv>/<seg> markup and
' the file header that your CAT tool expects.
'
    ' 1. Turn all paragraph marks into line breaks.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = "^l"
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 2. Process the row number and tab at the start of each line
    '    (a wildcard pattern, so MatchWildcards must be on).
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^l([0-9]@)^t"
        .Replacement.Text = "^l"
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 3. Process the tab between source and target text.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^t"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 4. Delete the first character of the document.
    Selection.HomeKey Unit:=wdStory
    Selection.Delete Unit:=wdCharacter, Count:=1
    ' 5. Process the remaining line breaks (end of each unit).
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^l"
        .Replacement.Text = "^l"
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 6. Put a placeholder at the top and replace it (file header).
    Selection.HomeKey Unit:=wdStory
    Selection.TypeText Text:="###"
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "###"
        .Replacement.Text = _
            "^p^p"
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 7. Put a placeholder at the end and replace it (file footer).
    Selection.HomeKey Unit:=wdStory
    Selection.EndKey Unit:=wdStory
    Selection.TypeText Text:="###"
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "###"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    ' 8. Turn the line breaks back into paragraph marks.
    With Selection.Find
        .Text = "^l"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

You can also create a macro that calls other macros, so you can automate everything in separate macros and then call them all from a single macro that you run once.
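A minimal sketch of such a wrapper, assuming the three macros above sit in the same template or document and that the cursor is in the cell containing the starting number. The name TableToTmx_AllSteps is invented here, and any manual steps in between (adding the number column, the blank lines at the top and bottom) would have to be done beforehand or automated as well.

```vba
' Sketch only: run the separate macros one after the other.
Sub TableToTmx_AllSteps()
    ' AddNumbersToTable expects the cursor in the cell that holds "1".
    If Not Selection.Information(wdWithInTable) Then
        MsgBox "Put the cursor in the first numbered cell of the table first."
        Exit Sub
    End If
    AddNumbersToTable       ' number the rows
    TablesConvert_to_tab    ' table -> tabbed text
    atable2tmx              ' tabbed text -> TMX markup
End Sub
```

A macro in the same project is called simply by writing its name on a line of its own, so the wrapper stays short and each step can still be run and tested separately.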


 
Philippe Locquet  Identity Verified
Portugal
Local time: 19:18
English to French
+ ...
Heartsome Aug 20, 2022

I've done this in the past with Heartsome TMX Editor. Converting a bilingual table in Word to TMX is a supported function that requires just a few clicks. I think you need to have the language initials at the top of the columns, but to confirm the formatting, just export a TMX to Word to see how it looks (or use it as a template).

 
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:18
English to Russian
Language codes Aug 20, 2022

Philippe Locquet wrote:
I've done this in the past with Heartsome TMX Editor. Converting a bilingual table in Word to TMX is a supported function that requires just a few clicks. I think you need to have the language initials at the top of the columns, but to confirm the formatting, just export a TMX to Word to see how it looks (or use it as a template).
You are absolutely right. All you need is a 2-column table with a pair of language codes in the top row (the headers) and the segments in the rows below.


 
Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 01:18
English to Thai
+ ...
Excel Bilingual file in Trados Aug 21, 2022

The easiest way for me is to copy and paste the Word text into an Excel file. In Trados, I set the Excel file type to bilingual: the first column is the source text and the second column is the target text. If the file type is set to automatically mark the translations as "translated", the bilingual Trados file can be saved to a TM and exported, for example in TMX format.
This procedure is quick and transparent.

Regards,
Soonthon Lupkitaro


 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:18
Member (2006)
English to Afrikaans
+ ...
What Aug 21, 2022

Hans Lenting wrote:
At FB in the mQ forum someone asked:
I have a bilingual .docx document. Is there any way to create TM from it without necessity to create a separate source file for every language?

What does he mean by "separate source file"?


 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:18
Member (2006)
English to Afrikaans
+ ...
Heartsome Aug 21, 2022

Philippe Locquet wrote:
I've done this in the past with Heartsome TMX Editor.

Yes, although Heartsome doesn't include TU IDs:

[Image: heartsome]


 
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:18
English to Russian
TU IDs Aug 21, 2022

Samuel Murray wrote:
Yes, although Heartsome doesn't include TU IDs
Why do you need TU IDs? When you import a TMX file into a CAT tool, all the entries get arranged accordingly and obtain their IDs and other attributes. Or perhaps I'm misunderstanding something...


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Do not split Aug 21, 2022

Samuel Murray wrote:

What does he mean by "separate source file"?


As far as I know, he didn't want to create 2 files: one with the left column, one with the right column.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Not necessary Aug 21, 2022

Samuel Murray wrote:

Yes, although Heartsome doesn't include TU IDs:


And I don't think these are necessary in this context.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Very thorough Aug 21, 2022

[Image: Screen Shot 2022-08-21 at 14.30.49]

Your approach is very thorough. For instance, I hadn't thought of converting the ampersands.

My initial thought was: it would be nice to have a macro to use when you're reading/proofreading an Ms Word document that contains a bilingual table with two columns and you want to create a simple TMX file from that table. Perhaps retaining the bold and italic formatting, nothing fancy.

Language codes can either be requested via a prompt or preset to e.g. en-US and de-DE and changed later.
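The prompt variant could look roughly like this. It is a sketch: AskLanguageCode and DemoAskLanguageCodes are names invented for the example, and the default codes are placeholders.

```vba
' Sketch only: ask for a language code, falling back to a default.
Function AskLanguageCode(ByVal promptText As String, _
                         ByVal defaultCode As String) As String
    Dim answer As String
    ' InputBox returns an empty string if the user cancels.
    answer = Trim$(InputBox(promptText, "TMX language code", defaultCode))
    If Len(answer) = 0 Then answer = defaultCode
    AskLanguageCode = answer
End Function

Sub DemoAskLanguageCodes()
    Dim srcLang As String
    Dim tgtLang As String
    srcLang = AskLanguageCode("Source language code:", "en-US")
    tgtLang = AskLanguageCode("Target language code:", "de-DE")
    MsgBox "Source: " & srcLang & vbCr & "Target: " & tgtLang
End Sub
```

The returned strings can then be inserted into the xml:lang attributes instead of hard-coded values.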

So, the macro should:

  • Copy the whole table.
  • Replace bold and italic formatting and ampersands with markup.
  • Replace TAB characters and NEWLINE characters with the correct strings to create valid TUs.
  • Prepend the first lines of a TMX file (up to the BODY markup) to the clipboard's contents.
  • Append the /TMX closing markup to the clipboard's contents.
  • Write the clipboard's contents to a file with the extension '.tmx'.
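The steps in the list above can also be sketched without the clipboard, by reading the table cells directly and building the TMX text as a string. This is only a sketch under stated assumptions: the table has exactly two columns and no merged cells, bold and italic formatting is ignored, the header is left empty (some tools may require header attributes), and the names EscapeXml, CellText, BuildTmx and DemoBuildTmx are invented for the example. Writing the string to a UTF-8 file is left out.

```vba
' Escape characters that would break the XML ("&" first).
Function EscapeXml(ByVal s As String) As String
    s = Replace(s, "&", "&amp;")
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    EscapeXml = s
End Function

' Cell text without the two-character end-of-cell marker.
Function CellText(ByVal c As Cell) As String
    Dim s As String
    s = c.Range.Text
    CellText = Left$(s, Len(s) - 2)
End Function

' Build a bare-bones TMX string: column 1 = source, column 2 = target.
Function BuildTmx(ByVal tbl As Table, ByVal srcLang As String, _
                  ByVal tgtLang As String) As String
    Dim r As Long
    Dim src As String
    Dim tgt As String
    Dim tmx As String
    tmx = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbLf & _
          "<tmx version=""1.4"">" & vbLf & _
          "<header></header>" & vbLf & "<body>" & vbLf
    For r = 1 To tbl.Rows.Count
        src = EscapeXml(CellText(tbl.Cell(r, 1)))
        tgt = EscapeXml(CellText(tbl.Cell(r, 2)))
        tmx = tmx & "<tu>" & _
            "<tuv xml:lang=""" & srcLang & """><seg>" & src & "</seg></tuv>" & _
            "<tuv xml:lang=""" & tgtLang & """><seg>" & tgt & "</seg></tuv>" & _
            "</tu>" & vbLf
    Next r
    BuildTmx = tmx & "</body>" & vbLf & "</tmx>" & vbLf
End Function

' Usage: put the cursor in the table and look at the Immediate window.
Sub DemoBuildTmx()
    Debug.Print BuildTmx(Selection.Tables(1), "en-US", "nl-NL")
End Sub
```

Building the string from the cells avoids the question of how to search and replace inside the clipboard, at the cost of losing the formatting.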



[Edited at 2022-08-21 12:41 GMT]


 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:18
Member (2006)
English to Afrikaans
+ ...
@Hans Aug 21, 2022

Hans Lenting wrote:
  • Replace bold and italic formatting and ampersands with markup.

  • Before you deal with formatting, you have to ask yourself what kind of a TMX file your target system will accept. If it's a fairly modern system, it should be able to handle standard TMX formatting tags, but it may be that your CAT tool has specific additional requirements, e.g. that the formatting tags must look a certain way.

    For example:
    [Image: cat sat mat3]

    The sentence "The cat sat on the mat." would have to end up like this:
    <seg>The <bpt type="b">{{b}}</bpt>cat<ept> {{/b}}</ept> sat on the <bpt type="i">{{i}} </bpt><bpt type="u">{{u}}</bpt>mat<ept> {{/u}}</ept><ept>{{/i}}</ept>.</seg>

    The problem is, it's easy to replace a single bold character with the same character plus markup, but it's not easy to replace a run of bold characters with the same characters plus one pair of markup tags. And I don't think any CAT tool can automatically convert this:
    <b>t</b><b>h</b><b>i</b><b>s</b>
    into this:
    <b>this</b>

    Can you think of a Find syntax in Word that would find a piece of bold text and select the entire bold text? I can't. This is because Word regex is non-greedy, so you can't tell it to select an entire piece of bold text.

    That said (just thinking out loud), you could tell it to replace this:
    </b><b>
    with nothing.
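A possible way around this, as a sketch only: a formatting-only search (empty Find text, bold as the search format) should match each contiguous stretch of bold text as one hit, and ^& in the replacement stands for whatever was found, so the whole stretch can be wrapped at once. The macro name WrapBoldRuns and the <b> markup are assumptions for the example; the tags your CAT tool expects may look different, and stretches that span paragraph marks or cell boundaries may need checking.

```vba
' Sketch only: wrap every contiguous run of bold text in <b>...</b>.
Sub WrapBoldRuns()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                      ' no text: search for formatting only
        .Font.Bold = True               ' the formatting to look for
        .Replacement.Text = "<b>^&</b>" ' ^& = the text that was found
        .Format = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindContinue
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

The same pattern with .Font.Italic = True and different markup would cover italics. Escape ampersands and angle brackets before running it, so that the new tags are not escaped afterwards.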

    [Edited at 2022-08-21 13:58 GMT]


     
    Hans Lenting
    Netherlands
    Member (2006)
    German to Dutch
    TOPIC STARTER
    Solution Aug 21, 2022

    Sub TabletoTMX()
        ' Copies the table at the cursor into a new document, turns it into
        ' TMX markup with Find/Replace and saves it as UTF-8 text.
        Dim rngTemp As Range
        Dim tableTemp As Table

        Options.AutoFormatReplaceQuotes = False
        Selection.Tables(1).Select
        Selection.Copy
        Documents.Add
        Selection.Paste

        ' Rows become paragraphs, columns become tabs.
        Set tableTemp = ActiveDocument.Tables(1)
        Set rngTemp = _
            tableTemp.ConvertToText(Separator:=wdSeparateByTabs)
        Selection.Delete

        ' Each paragraph mark closes one TU and opens the next.
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = _
                "</seg></tuv></tu>^p<tu><tuv xml:lang=""en-US""><seg>"
            .Forward = False
            .Wrap = wdFindAsk
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        ' Each tab closes the source TUV and opens the target TUV.
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting
        With Selection.Find
            .Text = "^t"
            .Replacement.Text = "</seg></tuv><tuv xml:lang=""nl-NL""><seg>"
            .Forward = False
            .Wrap = wdFindAsk
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Close the file at the cursor position, then add the header at the top.
        Selection.TypeText Text:="</seg></tuv></tu></body></tmx>"
        Selection.HomeKey Unit:=wdStory
        Selection.TypeText Text:="<?xml version=""1.0"" encoding=""utf-8""?><tmx version=""1.4""><header></header><body><tu><tuv xml:lang=""en-US""><seg>"

        ' Save as plain text in UTF-8 (code page 65001).
        ActiveDocument.SaveAs2 FileName:="memory.tmx", FileFormat:= _
            wdFormatText, LockComments:=False, Password:="", AddToRecentFiles:=True, _
            WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
            SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
            False, Encoding:=65001, InsertLineBreaks:=False, AllowSubstitutions:= _
            False, LineEnding:=wdLFOnly
    End Sub


    [Edited at 2022-08-22 10:52 GMT]


     
    Hans Lenting
    Netherlands
    Member (2006)
    German to Dutch
    TOPIC STARTER
    Different approach Aug 22, 2022

    Since you cannot Find and Replace in the clipboard (perhaps I'm missing something?), I chose another approach, via a temporary document. See the posting above.

    This approach doesn't handle bold and italics, nor the ampersand. Perhaps I'll add that some day.


    [Edited at 2022-08-22 11:02 GMT]


     
    Hans Lenting
    Netherlands
    Member (2006)
    German to Dutch
    TOPIC STARTER
    Demo Aug 22, 2022

    Demo:

    [Image: 3]


     