Saturday, 2 March 2013

Rebol's answer to Regex: parse and Rebol types


If you wish to know more about Rebol, you can start at Rebol in 10 steps. After that, there is awesome documentation and of course, Google is your friend.

In case you want a human touch, feel free to come to the Rebol (and Red) chat room at Stack Overflow. If the Red part mystifies you, there are a lot of kind and helpful people there to tell you about it and more (you should talk to a certain individual named "dockimbel" in this regard, or see the site). You will meet normal programmers, doctors, language implementors, open source contributors, cat herders, businessmen, graphic artists and people living off Rebol. You get it.

Rebol believes in doing things differently. Here is a proof of concept involving parse, Rebol's answer to Regular Expressions. In case you have used Regular Expressions before, you know what's happening at the 8 Regular Expressions you should know page.

In case you have used Rebol before (I strongly recommend it before going through this blog post), you most likely know that it supports a lot of datatypes. So banging your head against things like urls, for example, is not really necessary in Rebol. Hence the parse and Rebol types.


Preliminaries

A great, short introduction to parsing in Rebol can be found here.

Assuming you know a bit about Rebol and parse now, here are a few things we should define before we go ahead. So fire up your interpreter and type the following (a '>>' marks the start of a line typed at the REPL prompt):
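A minimal sketch of such definitions, using the names alpha, numbers and specials that appear later in this post. The exact characters placed in specials are an assumption; pick your own.

```rebol
; Letters, given as two character ranges
alpha: charset [#"a" - #"z" #"A" - #"Z"]

; Digits, given as a single range
numbers: charset [#"0" - #"9"]

; Special characters, listed one by one in a string
; (this particular selection is an assumption)
specials: charset "!@#$%&*_+-=?"
```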

You defined three character sets here. As the names suggest, these were for alphabets, numbers and special characters. You utilized two ways of defining character sets: one is to provide ranges for characters, and the other is to provide characters themselves, just like you did for special characters.

Next, you will want to use the combinations of the above defined character sets to define other character sets. This is as simple as:

You just created a new character set, which will match letters and numbers. Try this:
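Something along these lines should do, assuming the combined set is called alpha-num (the name used further down) and is built with union:

```rebol
alpha: charset [#"a" - #"z" #"A" - #"Z"]
numbers: charset [#"0" - #"9"]

; union combines two character sets into one
alpha-num: union alpha numbers

print parse "abc123" [some alpha-num]     ; true: letters and digits only
print parse "abc-123" [some alpha-num]    ; false: "-" is not in the set
```

The some keyword means "one or more", so the rule succeeds only if every character of the input belongs to alpha-num.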


See, you love Rebol already! If you had problems up to here, you probably need to brush up on your Rebol introduction.

1. A username

Consider a username as made up of only letters and numbers, with a length in some range, say, between 3 and 16 characters.
So, with the code written above, we have

If you do not like magic numbers, you can keep the minimum and maximum in two variables and build the rule with compose.
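A sketch of how this could look. In parse, two integers in front of a rule mean "repeat between this many and that many times". The test strings are made up for illustration.

```rebol
alpha-num: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

; "3 16 alpha-num" matches between 3 and 16 characters from alpha-num
print parse "rebol2013" [3 16 alpha-num]    ; true
print parse "ab" [3 16 alpha-num]           ; false: too short
print parse "user_name" [3 16 alpha-num]    ; false: "_" is not allowed
```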
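One way to do that, as a sketch. The names min-len, max-len and username-rule are hypothetical.

```rebol
alpha-num: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

min-len: 3
max-len: 16

; compose evaluates the parenthesized words and leaves the rest alone,
; so the result here is the block [3 16 alpha-num]
username-rule: compose [(min-len) (max-len) alpha-num]

print parse "rebol2013" username-rule    ; true
print parse "ab" username-rule           ; false: shorter than min-len
```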

2. A password

Considering a password as consisting of anything: letters, numbers and/or special characters, your job here is similar to the previous one. You just need to add another character set to your alpha-num character set. So you type the following:
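A sketch under a few assumptions: the characters in specials are an arbitrary selection, and the 6 to 18 length range is only illustrative, since this post does not fix one.

```rebol
alpha-num: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
specials: charset "!@#$%&*_+-=?"    ; assumed selection

; add the special characters to the letters and numbers
password-chars: union alpha-num specials

; the 6 and 18 limits are assumptions for this example
print parse "s3cr3t!pass" [6 18 password-chars]    ; true
print parse "abc" [6 18 password-chars]            ; false: too short
```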

and by doing so you have done just that. Time to check it.

Keep in mind that the union function takes only 2 arguments. So something like the following would not do what you want, and would not show an error either:
>> password-chars: union alpha numbers specials
This is because password-chars is set to the union of alpha and numbers only. specials is then evaluated on its own, so the whole line simply returns specials. A better way is to make it the union of alpha-num and specials, like we have done. You can also do the whole thing in one line:
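A possible one-line form nests one union inside another, so that each call still gets exactly two arguments. The contents of specials are again an assumption.

```rebol
alpha: charset [#"a" - #"z" #"A" - #"Z"]
numbers: charset [#"0" - #"9"]
specials: charset "!@#$%&*_+-=?"    ; assumed selection

; the inner union runs first and its result becomes
; the first argument of the outer union
password-chars: union union alpha numbers specials

print parse "abc123!" [some password-chars]    ; true
```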

To keep some special character out of passwords, simply do not include it in specials. For example, you might not want to allow "@" or "{" in a password. (You have quirky choices :-))

3. A hex value

Consider a hex value as made up of the digits 0 to 9 and the letters 'a' to 'f'. So, a simple change to the username parse will do.

Maybe you are not interested in "stringified" hex values, and want to use hex values practically, for example to handle color values. In that case, consider using tuples. But I might be nuts, so do whatever you want. There is no single right way of doing things in Rebol, so ... moving on...
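A sketch, assuming a set called hex-chars (a hypothetical name) and the usual 6-digit or 3-digit forms of a hex color value:

```rebol
; digits plus the lowercase letters a to f
hex-chars: charset [#"0" - #"9" #"a" - #"f"]

; exactly 6 hex characters, or else exactly 3
hex-rule: [6 hex-chars | 3 hex-chars]

print parse "ff00aa" hex-rule    ; true
print parse "fa0" hex-rule       ; true
print parse "ff00zz" hex-rule    ; false: "z" is not a hex digit
```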

4. A slug

People working on the web will know what a slug is. Which means everybody. But wait, I do not know what it is. Basically, a slug is the part of a web address that represents the content, like the file name in the url. For parsing purposes, it is a sequence of characters containing only numbers, letters and "-". Since you have already done usernames and passwords and hex values, a slug is, well... a slug :-) Here you go:

You created a rule for slugs in the first line, and simple as anything you checked it for two strings. You are parsing faster than I thought :-)
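A sketch of a slug rule, assuming a set called slug-chars (a hypothetical name) built from alpha-num plus the "-" character. The two test strings are made up.

```rebol
alpha-num: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

; letters, numbers and "-"
slug-chars: union alpha-num charset "-"

print parse "my-first-post" [some slug-chars]    ; true
print parse "my_first_post" [some slug-chars]    ; false: "_" is not allowed
```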

Rebol3 users can also simply append a string to a charset. So, this would have worked fine in r3:
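A sketch of that idea, assuming Rebol 3 accepts append on a charset (bitset) as described above. This is not expected to work in Rebol 2.

```rebol
alpha-num: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

; Rebol 3 only: copy the set first so alpha-num itself stays unchanged,
; then append the extra character
slug-chars: append copy alpha-num "-"

print parse "my-first-post" [some slug-chars]
```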



Enough of this, let's move to using Rebol types.

5, 6 and 8. email, url and tags

A cool guy/gal like yourself remembers that there are lots of datatypes in Rebol. So you have these three.
Before anything, type the following and see for yourself: 
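For instance (the sample values are made up), Rebol recognizes these literals as soon as you type them, with no quotes needed:

```rebol
print email? info@rebol.com         ; true: this literal is an email!
print url? http://www.rebol.com     ; true: this literal is a url!
print tag? <title>                  ; true: this literal is a tag!

; type? returns the datatype itself
probe type? info@rebol.com
probe type? http://www.rebol.com
probe type? <title>
```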

Ok. What just happened? You probably know that you can get the type of something with type?. So, with a minor change, you can find out if something is of a particular type. You can now quite simply write a function for it:
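A first attempt could look like the sketch below. The name is-email? is the one used later in this post. The body is an assumption.

```rebol
; compare the type of the value with the email! datatype
is-email?: func [value] [email! = type? value]

print is-email? info@rebol.com         ; true
print is-email? http://www.rebol.com   ; false: that is a url!
print is-email? "just a string"        ; false: that is a string!
```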

So now,

But wait. What is this?

First of all, an intelligent programmer like yourself never codes what has already been implemented. The same results could have been provided by the following:

And it also raises the same question.
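The surprise is presumably what happens with a bare word that has never been given a value. A sketch, assuming abc is not defined in your session:

```rebol
; abc has no value, so evaluating it fails before type?
; (or any function of your own) ever gets to look at it
type? abc    ; stops with an error saying that abc has no value
```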
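Rebol ships a question-mark function for each datatype, so a sketch with the built-ins looks like this (sample values made up):

```rebol
print email? info@rebol.com         ; true
print url? http://www.rebol.com     ; true
print tag? <title>                  ; true
print email? "info@rebol.com"       ; false: a string, not an email!
```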

Secondly, as a Rebol programmer, you do not face the same old "only letters, numbers and underscore" problem for variable names, and you can afford really readable names. Your variable names can be like is-email? instead of isemail. Again, as a Rebol programmer, you know that functions in Rebol are just variables (that is, they are first class objects). You can even implement a function with the same name as one of Rebol's built-in functions, and use it. Say, print for example.

Thirdly (and this is the main issue), you need to find a way to return false when the input is not an email! or tag! or url! as the case may be. What if you could check if a value is a valid Rebol data type or not? (I know you knew it already, but...) Turns out you can! So, what you are gonna do is return false if the input is not a valid Rebol data type, and return true only if it is an email. Something like:

get gets the value of something, and the /any refinement makes sure that it can be anything, even an unset value. unset? checks whether a value is unset (unset, without the question mark, removes the value of a word). You can always check out the help for any Rebol function. (>> help get)  So here's what I said about unset
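One way to sketch this is to take the argument literally (the quote in front of value in the spec), so that a bare word reaches the function without being evaluated first. This is an assumption about the approach, not necessarily the original listing.

```rebol
is-email?: func ['value] [
    if word? :value [
        ; a word that has no value is certainly not an email
        if unset? get/any value [return false]
        ; otherwise look at what the word refers to
        value: get value
    ]
    email? :value
]

print is-email? info@rebol.com    ; true
print is-email? abc               ; false, and no error this time
```

The second call assumes abc has not been given a value in your session.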
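A small sketch of unset, unset? and get/any working together (x is just a throwaway word):

```rebol
x: 10
print value? 'x            ; true: x has a value

unset 'x                   ; unset removes the value of x

print unset? get/any 'x    ; true: get/any fetches the unset value without an error
print value? 'x            ; false
```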

You will want to check it.

For one-line ninjas, 
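One possible one-liner, as a sketch. It takes the argument literally and never looks up words, so it is shorter but also less capable: a word that holds an email gives false here.

```rebol
; a bare word arrives as a word!, which is never an email!
is-email?: func ['value] [email? :value]

print is-email? info@rebol.com    ; true
print is-email? abc               ; false
```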

I know you don't wanna bore yourself with the functions for the other two things. It is as simple as replacing the word "email" in the definition by tag and url (so it's is-url? and url? etc. It doesn't really need to be is-url?, any function name you can think of will be fine :-) )

7. A Tuple (An IP Address)

Question: What be common between a color value, a coordinate point (2D or 3D, does not matter), and the version number of your latest build?
Unenlightened soul's answer: You be crazy!!
Rebol programmer's answer: They be tuples.

Something you might already know about Rebol is that it takes similar forms of data in a similar manner. So instead of having different datatypes/classes/modules like other languages, it can, and does handle similar data in similar ways. For things that handle data as different numbers, Rebol said, Let there be Tuples. (I mean, if it could speak, this is what it would :-) )

Detecting a tuple is as simple as asking if the data you have is a tuple.
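A sketch with the built-in tuple? function (sample values made up):

```rebol
print tuple? 192.168.1.1      ; true: an IP address
print tuple? 255.0.128        ; true: a color value
print tuple? 2.7.6            ; true: a version number
print tuple? "192.168.1.1"    ; false: a string, not a tuple!
print tuple? 1.2              ; false: with a single dot this is a decimal!
```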

But then again, you will not want to hit your head against a wall over issues like the three points and the last example from the previous section. So, you will make it something like:
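A sketch along the same lines as the email check. The name is-ip? is hypothetical, and treating "a tuple with exactly four parts" as an IP address is an assumption.

```rebol
is-ip?: func ['value] [
    if word? :value [
        ; a word that has no value is not an IP address
        if unset? get/any value [return false]
        value: get value
    ]
    ; all stops at the first failing test; to-logic turns none into false
    to-logic all [tuple? :value  4 = length? :value]
]

print is-ip? 192.168.1.1    ; true
print is-ip? 255.0.128      ; false: a tuple, but with three parts
print is-ip? abc            ; false: abc has no value
```

The last call assumes abc has not been given a value in your session.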

And then,

One thing you might have noticed is that a few values which are not proper Rebol values never reach our type detection, because they cause errors as soon as Rebol tries to read them. If you are using Rebol 2, you can simply accept only string values as input to your functions and use parse the way we did for usernames, passwords, hex values and slugs. If you are using Rebol 3, you can use something called transcode. You can see the discussion here. Since Rebol 3 is under development, you can very easily chat with the developers and let them know if something does not seem right to you. As someone in the Rebol room once said (or maybe still says), these are exciting times for Rebol!
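For the string-based route, a sketch of an IP address check with parse. The names digit, octet and ip-rule are hypothetical. It only checks the shape of the input (four groups of 1 to 3 digits), not that each group is at most 255.

```rebol
digit: charset [#"0" - #"9"]

; one group of 1 to 3 digits
octet: [1 3 digit]

; four groups separated by dots
ip-rule: [octet "." octet "." octet "." octet]

print parse "192.168.1.1" ip-rule      ; true
print parse "192.168.1" ip-rule        ; false: only three groups
print parse "192.168.1.999" ip-rule    ; true: the shape is right, the range is not checked
```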

(A bit of background.)