Introduction

CodePlex is full of fantastic ideas. One of them is Irony, a framework for creating new languages. The scanner, parser, and interpreter are written in C#, and all one needs to do is define the grammar (also in C#) and implement the new keywords, functions, etc. There are several introductory articles about Irony on CodeProject, e.g.: Writing Your First Domain Specific Language, Part 1 of 2, JSBasic - A BASIC to JavaScript Compiler, Writing Your First Visual Studio Language Service, and Irony - .NET Compiler Construction Kit.

This article shows some of Irony's possibilities. The goal is to automate interactions with a web server (that is, to create a kind of web robot), and a simple domain-specific language seems to be the best way to achieve that.

Grammar

The main grammar rules needed for basic WWW operations are defined as follows:

program ::= <stmt>*

stmt ::= 
   getStmt
 | postStmt
 | matchStmt
 | switchStmt
 | gotoStmt
 | labelStmt
 | assignmentStmt
 | expr

getStmt ::= "get" <strArg> "into" <variable> <suite>

postStmt ::= "post" <strArg> "referer" "=" <strArg> <postDataStmt> "into" <variable> <suite>
postDataStmt ::= "postdata" <postDataItemStmt>* "end"
postDataItemStmt ::= <strArg> "=" <strArg>

matchStmt ::= "match" <variable> "using" <matchregex> <suite>

switchStmt ::= "switch" <strArg> ":" <caseStmt>+ [<defaultStmt>] "end"
caseStmt ::= "case" <matchregex> ":" <stmt>+ "end"
defaultStmt ::= "default" <stmt>+

gotoStmt ::= "goto" <identifier>

labelStmt ::= ":" <identifier>

assignmentStmt ::= <variable> "=" <expr>

expr ::= <term> | <unExpr> | <binExpr>

variable ::= "@" <identifier>

suite ::= <stmt>+ [<suiteError>] "end"
suiteError ::= ":error" <stmt>+

Irony allows for easy transformation from this BNF-like form into C# code, e.g.:

getStmt.Rule = Symbol("get") + strArg + "into" + variable + suite;

stmt.Rule = assignmentStmt | expr | matchStmt | getStmt | 
            postStmt | switchStmt | labelStmt | gotoStmt;
program.Rule = MakeStarRule(program, stmt); 

The most important grammar elements are the "get", "post", and "match" statements. "get" and "post" send a GET or POST request to a web server and store the response in a variable, e.g.:

get "http://www.google.com/" into @variable 
   log(@variable) 
end

"log" is one of the functions of the WWW DSL; as the name suggests, it writes information to some storage (a file or the console). Note that variables are not declared explicitly. The address can also come from a variable:

@addr = "http://www.google.com/"
get @addr into @variable 
   log(@variable) 
end

Error handling is also possible:

@addr = "http://www.google.com/"
get @addr into @variable
   log(@variable) 
:error
   log("Error appear!")
end
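The listings do not show how the :error branch is wired into the interpreter. A minimal sketch, assuming the statements before :error run only when the request succeeds and the :error statements run when the request throws; GuardedStatement.Run is a hypothetical helper, not part of Irony or of the article's code:

```csharp
using System;

// Hypothetical model of "get <url> into @var <stmt>+ [:error <stmt>+] end".
// The main statements run only if the operation succeeds; otherwise the
// ":error" statements run. Whether the real interpreter also catches errors
// raised inside the main statements is not modeled here.
public static class GuardedStatement
{
    public static void Run(Func<string> operation, Action<string> body, Action onError)
    {
        string result;
        try
        {
            result = operation();   // e.g. the HTTP request
        }
        catch (Exception) when (onError != null)
        {
            onError();              // the ":error" branch
            return;
        }
        body(result);               // statements before ":error"; without an
                                    // ":error" branch, failures propagate
    }
}
```

For example, GuardedStatement.Run(() => Fetch(addr), page => Log(page), () => Log("Error appear!")) would mirror the script above, where Fetch and Log stand in for the interpreter's own request and logging code.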

Similarly, you can post a request to a server (e.g., to log in to Gmail):

@addr = "https://www.google.com/accounts/ClientLogin"
post @addr 
referer = "https://www.google.com/accounts/ClientLogin"
postdata
   "accountType"="GOOGLE"
   "Email"="account@gmail.com"
   "Passwd"="putpasswordhere"
   "service"="mail"
   "source"="Pol-WWWDSL-1.0"
end
   into @variable
   log(@variable) 
:error
   log("Error appear!")
end
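The article does not say how the postdata pairs are encoded on the wire. Assuming they are sent as an ordinary HTML form body (application/x-www-form-urlencoded), a hypothetical helper could build that body like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns the "name"="value" lines of a postdata block
// into a form-urlencoded request body.
public static class PostData
{
    // Produces "name1=value1&name2=value2", escaping every character outside
    // the RFC 3986 unreserved set with Uri.EscapeDataString.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> items)
    {
        return string.Join("&", items.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }
}
```

With this encoding, the pair "Email"="account@gmail.com" becomes Email=account%40gmail.com. Uri.EscapeDataString writes a space as %20 rather than the "+" that browsers use in forms; %20 is decoded as a space as well.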

Once the server has responded, the response can be processed. The "match" statement executes its statements when the response matches a given regular expression, e.g.:

@addr = "https://www.google.com/accounts/ClientLogin"
post @addr referer = "https://www.google.com/accounts/ClientLogin"
postdata
   "accountType"="GOOGLE"
   "Email"="account@gmail.com"
   "Passwd"="putpasswordhere"
   "service"="mail"
   "source"="Pol-WWWDSL-1.0"
end
into @variable 
  log(@variable)
  match @variable using
>>>
SID=(?<sid>[^\\s]+)\\s+LSID=(?<lsid>[^\\s]+)\\s+Auth=(?<auth>[^\\s]+)
>>>
    log(@sid)
    log(@lsid)
    log(@auth)
  :error
    log("Match failed :(")
  end 
:error
  log("Error appear!")
end

A regular expression starts and ends with three ">" characters (>>>). Variables can be declared inside the expression with named groups; for example, (?<sid>...) makes the captured text available as @sid. One small inconvenience with Irony is the handling of "\" in regular expressions: each backslash must be written as "\\".
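Turning named groups into variables can be sketched with the standard .NET regex API. MatchStatement.Bind is a hypothetical helper, not the article's implementation; it returns null when there is no match, which would correspond to the :error branch. The pattern in the usage line is a C# verbatim string, so it uses single backslashes instead of the DSL's "\\".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchStatement
{
    // Hypothetical helper: applies the pattern and exposes every named group
    // as a DSL variable named "@" + group name.
    public static Dictionary<string, string> Bind(string input, string pattern)
    {
        var regex = new Regex(pattern);
        var match = regex.Match(input);
        if (!match.Success) return null;   // would trigger ":error"

        var variables = new Dictionary<string, string>();
        foreach (var name in regex.GetGroupNames())
        {
            // Skip numbered groups such as "0" (the whole match).
            int unused;
            if (int.TryParse(name, out unused)) continue;
            variables["@" + name] = match.Groups[name].Value;
        }
        return variables;
    }
}
```

For a ClientLogin-style response, Bind(response, @"SID=(?<sid>[^\s]+)\s+LSID=(?<lsid>[^\s]+)\s+Auth=(?<auth>[^\s]+)") yields the entries @sid, @lsid, and @auth.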

When there are multiple choices, a "switch" statement is better than "match", e.g.:

get "http://www.google.pl/search?q=Irony" into @variable
  switch @variable
  case
    >>>
    Wikipedia, the free encyclopedia
    >>>
      log("Wikipedia result")
    end
    case
    >>>
    definition | Dictionary.com
    >>>
      log("Dictionary result, no wikipedia result")
    end
    default
      log("Other results, no wikipedia nor dictionary result")
  end
end
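The switch semantics can be sketched as follows, assuming the cases are tried in order, only the first matching case runs, and default runs only when no case matches. SwitchStatement and its members are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwitchStatement
{
    // Pairs a case pattern with the statements to run when it matches.
    public static KeyValuePair<string, Action> Case(string pattern, Action body)
    {
        return new KeyValuePair<string, Action>(pattern, body);
    }

    // Tries the cases in order and runs only the first one whose pattern
    // matches somewhere in the input; otherwise runs defaultBody, if given.
    public static void Run(string input,
                           IEnumerable<KeyValuePair<string, Action>> cases,
                           Action defaultBody)
    {
        foreach (var c in cases)
        {
            if (Regex.IsMatch(input, c.Key))
            {
                c.Value();
                return;
            }
        }
        if (defaultBody != null) defaultBody();
    }
}
```

If case texts reach the .NET regex engine unchanged, as in this sketch, "|" and "." keep their regex meaning: the pattern definition | Dictionary.com matches either "definition " or " Dictionary.com" (with "." matching any character). Matching the literal text would require escaping them, i.e. \\| and \\. in the DSL's doubled-backslash notation.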

The last syntax element is the "goto" jump. Such "ugly" constructions are very convenient in simple scripts like the one presented here. "goto" transfers execution to a point in code marked with a label used as the jump destination, e.g.:

:restart
get "http://www.google.pl/search?q=Irony" into @variable
  switch @variable
  case
    >>>
    Wikipedia, the free encyclopedia
    >>>
      log("Wikipedia result")
      goto restart
    end
    default
      log("Other results, no wikipedia result")
  end
end

The version of Irony used in this project lacks a "goto" implementation. Therefore, a simple workaround was prepared to provide this functionality.

It works as follows:

The DoEvaluate method of the goto node throws a GotoJumpException that carries the destination label:

protected override void DoEvaluate(EvaluationContext context)
{
  throw new GotoJumpException(labelNode_);
}

This unwinds the stack of nested Evaluate calls. Execution is then restarted from the statement sequence associated with the destination label:

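// Run the statement list; a goto unwinds via GotoJumpException and the
// loop restarts from the continuation stored for the target label.
while (nodes != null)
{
    try
    {
        foreach (var node in nodes)
        {
            node.Evaluate(evalContext);
        }
        nodes = null; // end of the script reached without a jump
    }
    catch (GotoJumpException e)
    {
        nodes = m_gotoNodes[e.LabelId.ValueString];
    }
}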
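The same exception-driven control flow can be reproduced outside Irony. This is a simplified, hypothetical model: statements are plain delegates, a GotoJump exception carries the target label, and a dictionary maps each label to the statement sequence that starts at it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical counterpart of GotoJumpException: carries the target label.
public sealed class GotoJump : Exception
{
    public string Label { get; }
    public GotoJump(string label) : base("goto " + label) { Label = label; }
}

public static class GotoRunner
{
    // Executes 'program'; whenever a statement throws GotoJump, execution
    // restarts from the sequence registered for that label.
    public static void Run(IEnumerable<Action> program,
                           IDictionary<string, IEnumerable<Action>> labels)
    {
        var nodes = program;
        while (nodes != null)
        {
            try
            {
                foreach (var node in nodes) node();
                nodes = null;                  // finished without a jump
            }
            catch (GotoJump jump)
            {
                IEnumerable<Action> target;
                if (!labels.TryGetValue(jump.Label, out target))
                    throw new InvalidOperationException("Unknown label: " + jump.Label);
                nodes = target;
            }
        }
    }
}
```

Unlike the original loop, this version reports an undefined label explicitly instead of relying on the dictionary's KeyNotFoundException.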

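The jump targets stored in m_gotoNodes are prepared before the script runs. For each label, the syntax tree is cut at the label: the statements from the label to the end of its statement list are joined with the statements that follow each enclosing branch in the outer statement lists ("stmt+" or "program"), and the resulting sequence is stored in the m_gotoNodes dictionary under the label's name.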

foreach (var labelStmt in labels)
{
    // Statements from the label to the end of its own statement list.
    var nodes = labelStmt.Parent.ChildNodes.Skip(
                  labelStmt.Parent.ChildNodes.IndexOf(labelStmt));
    var parent = labelStmt.Parent;
    // Climb towards the root, appending the statements that follow the
    // current branch in every enclosing statement list.
    while (parent.Parent != null)
    {
        var upper = parent.Parent;
        if ((upper.Term.Name == "stmt+") || (upper.Term.Name == "program"))
        {
            var upperNextNodes = 
              upper.ChildNodes.Skip(upper.ChildNodes.IndexOf(parent) + 1);
            nodes = nodes.Concat(upperNextNodes);
        }
        parent = upper;
    }
    // Register the continuation under the label name.
    m_gotoNodes.Add(((Token)labelStmt.ChildNodes[0]).ValueString, nodes);
}
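The continuation-building step can be tried on a toy tree. Node and Continuations.From are hypothetical stand-ins for Irony's AST classes; the traversal mirrors the loop above: start at the label, then climb to the root and, inside every enclosing statement list, append the siblings that follow the current branch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified syntax-tree node.
public sealed class Node
{
    public string Name;                      // e.g. "program", "stmt+", "get"
    public Node Parent;
    public List<Node> Children = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        foreach (var c in children) { c.Parent = this; Children.Add(c); }
    }
}

public static class Continuations
{
    // Statements executed after jumping to 'label', in order.
    public static List<Node> From(Node label)
    {
        var parent = label.Parent;
        IEnumerable<Node> nodes = parent.Children.Skip(parent.Children.IndexOf(label));
        while (parent.Parent != null)
        {
            var upper = parent.Parent;
            // Only statement lists contribute "what comes next".
            if (upper.Name == "stmt+" || upper.Name == "program")
                nodes = nodes.Concat(upper.Children.Skip(upper.Children.IndexOf(parent) + 1));
            parent = upper;
        }
        return nodes.ToList();
    }
}
```

For a program a, get(stmt+ [b, L, c]), d, the continuation of label L is L, c, d: statement b is skipped, and the "get" node itself contributes nothing because it is not a statement list.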

Functions

The "standard library" of the WWW DSL consists of four functions.

The "wait" function pauses execution for a specified number of seconds, e.g.:

get @addr into @variable
  match @variable using
  >>>
  you have to wait (?<minutes>\\d+) minutes
  >>>
    wait(int(@minutes)*60)
  end
end

The example above also uses "int", which converts a string to an integer. Basic arithmetic (like the multiplication above) is provided by Irony itself.
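One way to picture these built-ins is a name-to-delegate table. This is a sketch under assumptions: the Builtins class and its Call method are hypothetical, not the article's API, and the sleep delegate is injectable so the example can run without actually waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;

// Hypothetical table of built-in functions, looked up by name.
public sealed class Builtins
{
    private readonly Dictionary<string, Func<object[], object>> functions;

    // 'sleep' receives milliseconds; defaults to Thread.Sleep.
    public Builtins(Action<int> sleep = null)
    {
        if (sleep == null) sleep = Thread.Sleep;
        functions = new Dictionary<string, Func<object[], object>>
        {
            // int("42") -> 42
            ["int"] = args => int.Parse((string)args[0], CultureInfo.InvariantCulture),
            // wait(seconds): blocks for the given number of seconds
            ["wait"] = args => { sleep(Convert.ToInt32(args[0]) * 1000); return null; },
        };
    }

    public object Call(string name, params object[] args) => functions[name](args);
}
```

With @minutes bound to "3", wait(int(@minutes)*60) corresponds to Call("wait", 3 * 60), i.e. a pause of 180 seconds.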

The remaining two functions are "log", shown in the previous examples, and "download", which downloads files from a web server. The following example is the simplest file downloader that can be written in the WWW DSL:

download(@arg1, @arg2)

Here, the two variables @arg1 and @arg2 play the role of "args" in a program's "main" function: @arg1 is the first and @arg2 the second argument passed to the WWW DSL script.
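Exposing script arguments as variables can be sketched in a few lines. ScriptArguments.ToVariables is hypothetical; it assumes the host hands the script a plain list of strings and numbers them from 1, as in @arg1 and @arg2. Which command-line arguments the host forwards is not covered here.

```csharp
using System.Collections.Generic;

public static class ScriptArguments
{
    // Hypothetical: maps arguments to @arg1, @arg2, ... (1-based). Variables
    // are stored by name only, matching the DSL's lack of declarations.
    public static Dictionary<string, string> ToVariables(IList<string> args)
    {
        var variables = new Dictionary<string, string>();
        for (int i = 0; i < args.Count; i++)
            variables["@arg" + (i + 1)] = args[i];
        return variables;
    }
}
```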

Irony Grammar Explorer

Once a script is written, it should be checked for correctness. Irony comes with a very handy application for checking grammars and scripts: the Irony Grammar Explorer.

Fig.1. Irony Grammar Explorer with sample WWW DSL script

You can load the grammar from a DLL assembly and paste in the script to test it for correctness, as shown in figure 1.

Example

The project attached to this article is a simple download application. It automates the steps needed to download files from file-sharing sites. The script in rs.wwwdsl is a recipe for the best-known (at least to me) file-sharing site.


Fig.2. RS.WWWDSL in action

To start downloading, prepare a file containing the list of links and pass it to testwwwdsl.exe as the second argument; the first argument is the name of the processing script. Figure 2 shows the application at work.

Remark

This project requires the Irony alpha release of November 5, 2008.

TODO

In a follow-up article, I will present a WinForms application that automates downloads from different web sources.