What’s the problem with sentence matching? (CAT Tools Technical Help)

Thread poster: Els Eerdekens

Els Eerdekens
Belgium
French to Dutch

May 2, 2013

Dear all,

In her article from 2008, Carme Colominas writes the following:

(http://benjamins.com/#catalog/journals/babel.54.4.03col/details)

“Most of the current Translation Memory systems are based on segments determined by marks that in most cases correspond to a complete sentence. The problem of complete sentence matching is that examples are often excluded from the matching candidates even though they probably contain one or more useful sub-segments that could be helpful to the translation."

I don’t completely understand what the problem with sentence matching is. I suppose that the concordance search resolves the problem, but that noun phrases or pre- and postmodified noun phrases cannot be found. How is it possible that some matching candidates are excluded?
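One way candidates get excluded: a TM lookup scores the whole new sentence against whole stored segments, and anything below the fuzzy-match threshold is simply not offered, even if it contains a long, reusable phrase. The minimal sketch below assumes Python's `difflib` ratio as a stand-in for a fuzzy-match score (real CAT tools use their own scoring) and a hypothetical 70% threshold; the sentences are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical minimum fuzzy-match value; real tools let the user configure this.
THRESHOLD = 0.70

def fuzzy_score(a: str, b: str) -> float:
    """Character-based similarity in [0, 1], a stand-in for a CAT tool's fuzzy-match score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Invented TM source segment and new source sentence.
tm_segment = "Press the red button on the control panel to stop the machine immediately."
new_sentence = "Before cleaning the filter, press the red button on the control panel."
useful_subsegment = "press the red button on the control panel"

score = fuzzy_score(tm_segment, new_sentence)
print(f"whole-sentence score: {score:.2f}")             # 0.58
print("offered as a fuzzy match:", score >= THRESHOLD)  # False
print("shares a useful sub-segment:",
      useful_subsegment in tm_segment.lower()
      and useful_subsegment in new_sentence.lower())   # True
```

The eight-word phrase is identical in both sentences, yet the sentence-level score stays well under the threshold, so a purely sentence-based lookup never shows it.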

"In view of these limitations, some proposals have been made in the literature regarding the possibility of building Translation Memory systems that operate “below” the sentence level, that is to say, at a sub-sentential level. Existing work demonstrates that sub-sentential segmentation of Translation Memories clearly shows a significantly best recall with respect to sentential segmentation.”

Are there any systems yet that work “below” the sentence level?

Thanks!

Els

Sergei Leshchinsky

Ukraine
Member (2008)
English to Russian

May 2, 2013

Simply add a custom end-of-segment separator to break long sentences into smaller pieces.

E.g. if you set comma as a custom separator, it will break at each comma.

IrimiConsulting

Sweden
Member (2010)
English to Swedish

Below sentence level -> phrase level

May 2, 2013

Matching on the phrase level would definitely be possible, but it would require a lot more intelligence from the software, since it needs to analyse word classes and grammar rather than just text strings, which in turn requires dictionaries. There will always be problems with words not found in the dictionary and with discontinuous phrases, and some languages will be less suitable for phrase-level matching.
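To illustrate the discontinuous-phrase problem, here is a minimal sketch of purely string-based phrase matching, which looks for a phrase only as a contiguous run of words. The naive tokenizer and the example sentences are assumptions for illustration; the point is that separable verbs (English "switch ... off", German "ruft ... an") slip past such a matcher, which is why grammar analysis would be needed.

```python
def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, drop full stops, split on spaces.
    return text.lower().replace(".", "").split()

def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

switch_off = tokenize("switch off")
print(contains_phrase(tokenize("Please switch off the lights."), switch_off))  # True
print(contains_phrase(tokenize("Please switch the lights off."), switch_off))  # False

ruft_an = tokenize("ruft an")
print(contains_phrase(tokenize("Er ruft an."), ruft_an))         # True
print(contains_phrase(tokenize("Er ruft morgen an."), ruft_an))  # False
```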

For "my" languages (English, Swedish, German and French), phrase-level matching would be fairly easy in English, Swedish and French. The German word order would complicate matters a bit, but it would still be quite doable.

In the end, the result would depend to a large extent on the quality of the source text. The GIGO principle (garbage in, garbage out) applies to all sorts of language automation.


Heinrich Pesch

Finland
Member (2003)
Finnish to German

It's real

May 2, 2013

In SDL Studio it is called AutoSuggest; in DVX, Deep Mining.
I haven't used those features yet, but they search for phrases within the text and in the TM, which should speed up the translation process.
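As a rough idea of what searching for phrases in the TM means, the sketch below finds TM units whose source shares a run of at least three consecutive words with the new sentence. This is a hypothetical illustration, not how AutoSuggest or Deep Mining actually work (neither algorithm is described here); the tokenizer, the `min_words` cut-off and the small English to French TM are all assumptions, and the sketch only returns the whole TM target for context rather than aligning the matching target fragment.

```python
from difflib import SequenceMatcher
from typing import Iterator

def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, split on spaces, strip surrounding punctuation.
    return [w.strip(".,;:!?") for w in text.lower().split()]

def longest_shared_phrase(a: list[str], b: list[str]) -> list[str]:
    """Longest contiguous run of words that occurs in both token lists."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]

def subsegment_matches(sentence: str, tm: list[tuple[str, str]],
                       min_words: int = 3) -> Iterator[tuple[str, str, str]]:
    """Yield (tm_source, tm_target, shared_phrase) for TM units sharing min_words+ consecutive words."""
    words = tokenize(sentence)
    for source, target in tm:
        shared = longest_shared_phrase(words, tokenize(source))
        if len(shared) >= min_words:
            yield source, target, " ".join(shared)

# Invented two-unit TM (English -> French) for illustration.
tm = [
    ("Press the red button on the control panel to stop the machine.",
     "Appuyez sur le bouton rouge du panneau de commande pour arrêter la machine."),
    ("Clean the filter once a week.",
     "Nettoyez le filtre une fois par semaine."),
]

for source, target, phrase in subsegment_matches(
        "Before cleaning the filter, press the red button on the control panel.", tm):
    print(f"shared phrase: {phrase!r}\n  TM source: {source}\n  TM target: {target}")
```

Only the first unit is reported: the second shares just "the filter", which is below the three-word cut-off.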

Christine Andersen

Denmark
Member (2003)
Danish to English

I find AutoSuggest very useful

May 2, 2013

Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored.

It might be possible to avoid some of them by filtering or editing the TM before using it to create the AutoSuggest dictionary, if one was aware of what to avoid. I did not do that, but still only get a few 'impossible' suggestions.

It would be ideal if the...

Els Eerdekens
Belgium
French to Dutch

TOPIC STARTER

@ Sergei

May 3, 2013

Sergei Leshchinsky wrote:

Simply add a custom end-of-segment separator to break long sentences into smaller pieces.

E.g. if you set comma as a custom separator, it will break at each comma.

Dear Sergei,

Where can I do this? In WinAlign from Trados (segmentation rules) or in the source document?
If I have to change the segmentation rules, what do I have to do exactly?

Kind regards,

Els

David Turner

French to English

All CAT tools should be able to segment at a comma

May 4, 2013

In TWB, for example, go to File/Setup/Segmentation rules, click "Add", add "Comma", and then under "Rule"/"Stop character", enter a comma (",").

TWB will then segment:

"Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored"

as four segments:

Because it works purely on statistical analysis and is...
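What a comma stop-character rule does can be sketched in a few lines. This is a minimal sketch assuming a simplified rule ("split after a stop character that is followed by whitespace"); the function name `segment` is hypothetical and this is not TWB's actual segmentation engine. Requiring whitespace after the stop character is a design choice that keeps decimal numbers such as "1,5" in one piece.

```python
import re

def segment(text: str, stop_chars: str = ".!?") -> list[str]:
    """Split text after any stop character that is followed by whitespace.

    The stop character stays at the end of its segment; "1,5" is not split
    because no whitespace follows the comma.
    """
    pattern = "(?<=[" + re.escape(stop_chars) + r"])\s+"
    return [s for s in re.split(pattern, text.strip()) if s]

sentence = ("Because it works purely on statistical analysis and is not 'intelligent', "
            "it does occasionally come up with a few ridiculous suggestions, but on the "
            "whole the benefit far outweighs these, and they are easily ignored")

print(len(segment(sentence)))  # 1: default rules only, no sentence-final stop character
for seg in segment(sentence, stop_chars=".!?,"):
    print(seg)                 # four segments, each ending at a comma except the last
```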
