Friday, 13 December 2013

Collapse and Capture a Repeating Pattern

This particular Stack Overflow question shows how much easier parse can be than regex. (It was so simple with parse that this post took me about 5 minutes to write :-)

This is what the OP says:
I keep bumping into situations where I need to capture a number of tokens from a string and after countless tries I couldn't find a way to simplify the process.
This is what the input looks like:
start:test-test-lorem-ipsum-sir-doloret-etc-etc-something:end


How does parse deal with it? 

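Simple. 'parse <string> none' splits a string on whitespace (along with a few common delimiters such as commas and semicolons). Similarly, 'parse/all <string> "x"' splits a string on the character "x", whatever it might be. You can also pass several characters, like "xyz", and the string will be split on any one of them (not on the sequence "xyz"). Without /all, whitespace still counts as a delimiter on top of the characters you give. See: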
>> parse "Off-the-charts On-the-asphalt" "-" ;; No parse/all here
== ["Off" "the" "charts" "On" "the" "asphalt"]
>> parse/all "Off-the-charts On-the-asphalt" "-"
== ["Off" "the" "charts On" "the" "asphalt"]
>> parse/all "Off-the-charts On-the-asphalt" "- " ;; Single dash and space
== ["Off" "the" "charts" "On" "the" "asphalt"]
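
For the curious, the same split can also be spelled out as an explicit parse rule instead of a delimiter string. This is only a sketch, assuming REBOL 2 (where parse/all exists); the names tokens, tok and dash-free are made up for the example, and it does not try to handle empty tokens such as "a--b":

```rebol
tokens: copy []                        ;; collected pieces (hypothetical name)
dash-free: complement charset "-"      ;; any character except the dash

parse/all "Off-the-charts On-the-asphalt" [
    some [
        copy tok some dash-free        ;; grab a run of non-dash characters
        (append tokens tok)            ;; store it
        opt "-"                        ;; step over the delimiter, if any
    ]
]

probe tokens
;; ["Off" "the" "charts On" "the" "asphalt"]
```

The delimiter-string form is shorter; the rule form is handy once the pattern gets more involved.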


This is how we can get strings from the given situation using parse:
divide-string-by-dash: func [str [string!]] [
    return (parse/all str "-")
]


Usage:
>> divide-string-by-dash "Long-live-regexps"
== ["Long" "live" "regexps"]


OP's string

Now, the OP's string starts with "start:" and ends with ":end", and we need to account for that. There are many ways of doing this. Different people will do it differently, but this is what I'll do:
divide-string-by-dash: func [str [string!] /local str-block] [
    str-block: parse/all str ":" ;; result: ["start" "<string as needed>" "end"]
    return (parse/all (second str-block) "-")
]


Lines 2 and 3 of the function can be combined into a single line:
divide-string-by-dash: func [str [string!]] [
    return (parse/all (second (parse/all str ":")) "-")
]


Here is another version if you don't like too many parentheses:
divide-string-by-dash: func [str [string!]] [
    return parse/all second parse/all str ":" "-"
]


Usage (same for all 3 versions of the function):
>> divide-string-by-dash "start:test-test-lorem-ipsum-sir-doloret-etc-etc-something:end"
== ["test" "test" "lorem" "ipsum" "sir" "doloret" "etc" "etc" "something"]
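
All three versions trust that the input really is wrapped in "start:" and ":end". If you would rather check that, one option is a single parse rule that matches the markers and collects the tokens in one pass. This is a rough sketch, again assuming REBOL 2; divide-string-by-rule, tokens, tok and token-chars are names I made up for it, and it returns none when the string does not fit the pattern:

```rebol
divide-string-by-rule: func [
    str [string!]
    /local tokens tok token-chars
][
    tokens: copy []
    token-chars: complement charset "-:"   ;; anything except dash and colon
    either parse/all str [
        "start:"                           ;; leading marker must be present
        some [
            copy tok some token-chars      ;; one token
            (append tokens tok)
            opt "-"                        ;; delimiter between tokens
        ]
        ":end"                             ;; trailing marker must be present
    ] [tokens] [none]
]
```

Usage:
>> divide-string-by-rule "start:test-test-lorem-ipsum-sir-doloret-etc-etc-something:end"
== ["test" "test" "lorem" "ipsum" "sir" "doloret" "etc" "etc" "something"]
>> divide-string-by-rule "no-markers-here"
== none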


Feel free to ask or say anything.
