No registered users in community xowiki
in last 10 minutes
in last 10 minutes
Tokenizer
Listing of doc/example-scripts/rosetta-tokenizer.tcl
Assumes Tcl 8.6 (couroutine support)
if {[catch {package req Tcl 8.6}]} return
Rosetta example: Tokenize a string with escaping
Write a class which allows for splitting a string at each non-escaped occurrence of a separator character.
package req nx nx::Class create Tokenizer { :property s:required :method init {} { :require namespace set coro [coroutine [current]::nextCoro [current] iter ${:s}] :public object forward next $coro } :public method iter {s} { yield [info coroutine] for {set i 0} {$i < [string length $s]} {incr i} { yield [string index $s $i] } return -code break } :public object method tokenize {{-sep |} {-escape ^} s} { set t [[current] new -s $s] set part "" set parts [list] while {1} { set c [$t next] if {$c eq $escape} { append part [$t next] } elseif {$c eq $sep} { lappend parts $part set part "" } else { append part $c } } lappend parts $part return $parts } }
Run some tests incl. the escape character:
% Tokenizer tokenize -sep | -escape ^ ^| | % Tokenizer tokenize -sep | -escape ^ ^|^| || % Tokenizer tokenize -sep | -escape ^ ^^^| ^| % Tokenizer tokenize -sep | -escape ^ | {} {}
Test for the output required by the Rosetta example:
% Tokenizer tokenize -sep | -escape ^ one^|uno||three^^^^|four^^^|^cuatro| one|uno {} three^^ four^|cuatro {}