OPUS: How to use Gigabytes TMX files with CAT tools?
Thread poster: Bishoy Salib
Bishoy Salib
Bishoy Salib
Egypt
Oct 13, 2021

Good evening, everyone.
I'm having trouble getting a Spanish-Arabic TMX file to work with CAT tools (it is 4 gigabytes!).
Does anyone know how to make it work?
I'm trying to use the TMX files from this corpus website: https://opus.nlpl.eu/index.php
Unfortunately, I can't get them to load correctly.
I tried Trados 2019 and 2021, memoQ 9.3, and Wordfast Pro 6.
I also tried to split the file using EmEditor, but that didn't work for me.
Any suggestions?
Thanks in advance.
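
For readers who want to split such a file outside a text editor, here is a minimal Python sketch. It is only an illustration, not a tested recipe for the OPUS files: it assumes the TMX is well-formed XML with plain `<tmx>`, `<header>`, `<body>` and `<tu>` elements, and the function name `split_tmx`, the output naming scheme and the chunk size are all made up for this example. It reads the file as a stream, so the whole 4 GB never has to fit in memory.

```python
import xml.etree.ElementTree as ET


def split_tmx(src_path, out_prefix, tus_per_file=100_000):
    """Stream src_path and write its <tu> elements into numbered TMX parts.

    Hypothetical helper: the name, the "<prefix>_0001.tmx" naming and the
    default chunk size are assumptions made for this sketch.
    """
    out_paths = []
    version = "1.4"   # fallback if <tmx> carries no version attribute
    header_xml = ""   # serialized <header>, copied into every part
    body = None
    out = None
    count = 0

    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                version = elem.get("version", version)
            elif elem.tag == "body":
                body = elem
            continue
        if elem.tag == "header":
            header_xml = ET.tostring(elem, encoding="unicode").strip()
        elif elem.tag == "tu":
            if out is None:
                # Start a new part with its own XML prolog and header.
                path = f"{out_prefix}_{len(out_paths) + 1:04d}.tmx"
                out_paths.append(path)
                out = open(path, "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f'<tmx version="{version}">\n{header_xml}\n<body>\n')
            out.write(ET.tostring(elem, encoding="unicode").strip() + "\n")
            count += 1
            if count % tus_per_file == 0:
                out.write("</body>\n</tmx>\n")
                out.close()
                out = None
            if body is not None:
                body.clear()  # drop processed <tu> elements to keep memory flat

    if out is not None:
        out.write("</body>\n</tmx>\n")
        out.close()
    return out_paths


# Example call (file names are placeholders):
# parts = split_tmx("es-ar.tmx", "es-ar_part", tus_per_file=200_000)
```

Whether a given CAT tool then accepts the smaller parts still depends on the tool; the sketch only addresses the file size.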


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 00:38
English to Russian
OPUS is an NMT engine built on the corpus you mentioned Oct 13, 2021

You can use it by enabling the OPUS CAT MT engine and plugin (both are required for this NMT). Once both the engine and the plugin are installed and running, you can select OPUS CAT MT in the list of TM providers (in Trados) to get suggestions or to pretranslate your files. If you really want to handle the entire corpus itself, I think you need a different OS, such as Linux.

[Edited at 2021-10-13 19:10 GMT]


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
CafeTran Espresso's Total Recall technology Oct 14, 2021

Bishoy Salib wrote:

I am trying to use the TMX files from this Corpus Crawling website https://opus.nlpl.eu/index.php


You can use CafeTran Espresso's Total Recall technology for this.

https://cafetran.freshdesk.com/support/solutions/folders/6000058183

I downloaded an 837 MB TMX file with European Parliament segments from the OPUS site. I imported it into a new TR (Total Recall) database and then imported an MS Word document with the following content:

Während der Plenartagung debattierten die Abgeordneten über Lösungen für die aktuellen Energiepreissteigerungen und neue Enthüllungen im Zusammenhang mit Steuervermeidung.
Energiepreise

Das Europäische Parlament gab am Mittwoch grünes Licht für die überarbeiteten Auswahlregeln für Energieprojekte, die für eine EU-Finanzierung infrage kommen, um grenzüberschreitende Energieinfrastrukturen nachhaltig zu gestalten. In einer Debatte mit dem für Energie zuständigen Mitglied der Europäischen Kommission, Kadri Simson, betonten die Abgeordneten die dringende Notwendigkeit einer Unterstützung von schutzbedürftigen Haushalten, welche mit Gas- und Strompreisen in Rekordhöhe konfrontiert sind.


Steuerenthüllungen

Im weiteren Verlauf des Tages debattierten die Abgeordneten über die Enthüllungen der Pandora Papers, die weltweite Steuervermeidung und Steuerhinterziehung dokumentieren, und beklagten das Unvermögen der Regierungen, veraltete Steuergesetze angemessen zu reformieren.



When I search the concordance for a specific term, I get an immediate result:

[Screenshot 2021-10-14 at 10.09.09: concordance search result]



[Edited at 2021-10-14 10:11 GMT]


 
Bishoy Salib
Bishoy Salib
Egypt
TOPIC STARTER
Explanation Oct 14, 2021

Could you please explain in detail how you imported it into a new TR database? (I didn't understand what you wrote about that.)
Also, I only have a trial license for the program, so what should I do?


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
See the link Oct 14, 2021

Bishoy Salib wrote:

Could you please explain to me in details how did you import it in a new TR database (I didn't understand your words about that)
Another thing is that I have the program with a trial license so what should I do?


The steps are described in the link that I included in my reply. However, you'll need a licensed CafeTran Espresso to test the power feature Total Recall ...


 
Milan Condak
Milan Condak  Identity Verified
Local time: 23:38
English to Czech
64bit and big RAM Oct 14, 2021

Bishoy Salib wrote:

(4 Gigabytes!)



1. You need a lot of memory. The TMX and the project must have the same set-up parameters.

2. Create a model for statistical or neural MT. My example uses the SMT tool Slate Desktop, the only one that works on Windows.

http://www.condak.cz/nove/2020-03/22/cs/03.html

and 04.html and 05.html. My model is 5.5 GB!

3. Create a model for neural MT on Linux.

There are ready-made models for the Fiskmö and Opus CAT engines (both NMT), which work on Windows.

In "Opus CAT", models can be fine-tuned for a specific field with data from a TMX.
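
As a rough illustration of getting training or fine-tuning data out of a TMX: MT toolkits commonly take parallel text as two plain-text files with one aligned segment per line. The sketch below assumes that format is what is wanted; it is not taken from Slate Desktop or Opus CAT, and the function name `tmx_to_parallel` and the output file names are hypothetical. Language codes are compared exactly (ignoring case), so "es" will not match "es-ES".

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(tmx_path, src_lang, tgt_lang, src_out, tgt_out):
    """Stream a TMX and write aligned source/target lines to two text files.

    Hypothetical helper for illustration; returns the number of pairs written.
    """
    pairs = 0
    body = None
    with open(src_out, "w", encoding="utf-8") as fs, \
         open(tgt_out, "w", encoding="utf-8") as ft:
        for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
            if event == "start":
                if elem.tag == "body":
                    body = elem
                continue
            if elem.tag != "tu":
                continue
            segs = {}
            for tuv in elem.findall("tuv"):
                lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
                seg = tuv.find("seg")
                if seg is not None:
                    # itertext() also collects text inside inline tags;
                    # split/join collapses whitespace so each segment is one line.
                    segs[lang] = " ".join("".join(seg.itertext()).split())
            src = segs.get(src_lang.lower())
            tgt = segs.get(tgt_lang.lower())
            if src and tgt:  # skip units missing either side
                fs.write(src + "\n")
                ft.write(tgt + "\n")
                pairs += 1
            if body is not None:
                body.clear()  # free processed units so memory use stays low
    return pairs


# Example call (file names are placeholders):
# n = tmx_to_parallel("es-ar.tmx", "es", "ar", "corpus.es", "corpus.ar")
```

Which input format a particular engine actually expects should be checked in that engine's own documentation.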

@Stepan Konev, the website https://opus.nlpl.eu/index.php hosts data. There is no NMT there.

My examples of using local NMT (in Czech):

https://www.proz.com/forum/czech/352042-bezplatný_lokální_překladač_pro_windows_opus_cat.html

Milan


[Edited at 2021-10-14 14:30 GMT]


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 00:38
English to Russian
RE: OPUS: How to use Gigabytes TMX files with CAT tools Oct 14, 2021

Milan Condak wrote:
@Stepan Konev, on web https://opus.nlpl.eu/index.php are data. There are no NMT.
Yes*, I know, but the title reads "OPUS: How to use Gigabytes TMX files with CAT tools?" The only way [known to me] to use OPUS with Trados at user level is the one I described above. The "Gigabytes TMX files" referred to are what the OPUS CAT MT engine is built on. They are already incorporated into it.
*BTW, there is a Tools & Info section that includes OPUS-CAT link (NMT).

[Edited at 2021-10-15 14:30 GMT]


 
Milan Condak
Milan Condak  Identity Verified
Local time: 23:38
English to Czech
Wordfast Server Oct 14, 2021

Bishoy Salib wrote:

I tried using ..., and WordFast Pro 6.


Wordfast Server is free for individual translators.
Almost all CAT tools have servers for TMs and glossaries, but not all of them are free.
Users have to invest time in learning to use them and in importing the data.

Milan


 
Bishoy Salib
Bishoy Salib
Egypt
TOPIC STARTER
Olifant from Okapi Framework and making of TMs of wordfast.txt Oct 15, 2021

I think I've figured out a way to use the millions of TUs for free in Wordfast Pro, but I can't get it to work correctly. When I use Olifant's output (the translation memory, Wordfast TM Files.txt) in Wordfast, all the source-language segments show text like this > 20211015~145807 and all the target-language segments show the user name I entered when starting Olifant, like this > bishoy salib

The translation memories load in Wordfast in about 3 minutes (2 million TUs). Is there any other way to use this program (Olifant with Trados, or with any other program)?
Thanks in advance.
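
One way to check what is going on is to look at the raw columns of the exported file. The symptom described (a timestamp shown as source, a user name shown as target) would be consistent with the first two columns of the file being read as the segments, but that is only a guess. The sketch below assumes the export is tab-delimited plain text; the encoding is also an assumption (try "utf-16" if "utf-8" fails), and `peek_fields` is a made-up helper name.

```python
def peek_fields(path, max_lines=3, encoding="utf-8"):
    """Return the tab-separated fields of the first few lines of a text file.

    Hypothetical diagnostic helper: it assumes a tab-delimited file and makes
    no claim about what each column means.
    """
    rows = []
    with open(path, "r", encoding=encoding, newline="") as f:
        for line in f:
            rows.append(line.rstrip("\r\n").split("\t"))
            if len(rows) >= max_lines:
                break
    return rows


# Example use (the file name is a placeholder):
# for row in peek_fields("olifant_export.txt"):
#     for index, field in enumerate(row):
#         print(index, repr(field))
```

If the real segments turn up in later columns, the import settings in Wordfast (file type or column mapping) are the first thing to look at.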


 

