An algorithm for segmentation?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Feb 5, 2022

Since we are discussing segmentation on the Keyboard Maestro forum, I was wondering whether someone could provide an algorithm for segmentation, e.g. in VBA, BASIC, AppleScript, etc.

 
Joakim Braun
Sweden
German to Swedish
Objective-C Feb 5, 2022

Core Foundation and Cocoa on macOS. Not very portable, but it illustrates the general approach: an object that slices up a string based on delimiters (in this case built into CFStringTokenizer). This should work across many languages and writing systems. If locale were irrelevant, we wouldn't need the tokenizer and could reduce the code to a line or two.

// Requires #import <Foundation/Foundation.h>; place the code below inside a function or method.

// Collects one NSString per sentence found in aStr.
NSMutableArray<NSString *> *sentences = [NSMutableArray array];
CFLocaleRef locale = CFLocaleCopyCurrent();
NSString *aStr = @"A string? With some sentences. In it!";

// Sentence-level tokenizer over the whole string, using the current locale.
CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (__bridge CFStringRef)aStr, CFRangeMake(0, aStr.length), kCFStringTokenizerUnitSentence, locale);

// Advance token by token until the tokenizer reports that none are left.
while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone)
{
    CFRange cfr = CFStringTokenizerGetCurrentTokenRange(tokenizer);

    [sentences addObject:[aStr substringWithRange:NSMakeRange(cfr.location, cfr.length)]];
}

// Objects obtained from Create/Copy functions are owned by the caller and must be released.
CFRelease(tokenizer);
CFRelease(locale);

[Edited at 2022-02-05 20:35 GMT]
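
The remark above, that the code could shrink to a line or two if locale were irrelevant, can be illustrated with a portable sketch. It is written in Python rather than Objective-C so that it runs outside macOS. It is not the method CFStringTokenizer uses: the boundary pattern, the `ABBREVIATIONS` set, and the `split_sentences` name are all assumptions made for illustration, and a real segmenter would need per-language rules.

```python
import re

# Assumed rule: '.', '!' or '?' followed by whitespace ends a sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# Illustrative exception list (hypothetical, far from complete).
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "mr."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences with a naive, locale-independent rule.

    Pieces that end in a known abbreviation are merged with the next
    piece. Merged pieces are rejoined with a single space, so the
    original whitespace at that position is not preserved.
    """
    sentences = []
    current = ""
    for piece in _BOUNDARY.split(text.strip()):
        if not piece:
            continue
        current = f"{current} {piece}" if current else piece
        last_word = current.split()[-1].lower()
        if last_word in abbreviations:
            continue  # treat the period as part of the abbreviation
        sentences.append(current)
        current = ""
    if current:
        sentences.append(current)
    return sentences


print(split_sentences("A string? With some sentences. In it!"))
```

The one-line core is the `_BOUNDARY.split(...)` call; everything else handles abbreviations. A sentence that genuinely ends in a listed abbreviation (such as "etc.") would be merged with the next one, which is the kind of case where locale-aware rules earn their keep.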
 

