The first issue to address when writing CGI scripts is how to
pre-process the input.
Form data arrives URL-encoded (the application/x-www-form-urlencoded
format), which is not naturally amenable to writing regular expressions.
For example, space characters are replaced by the plus sign (`+').
Many of the special characters, including plus (`+'), the angle
brackets (`<', `>') and the ampersand (`&'),
are replaced by a percent sign followed by two hexadecimal digits.
While there is nothing in this encoding that would give technical
difficulty to Lex, it is not comfortable for the human author
of the regular expressions.
To deal with this, we pre-process the characters on their way into the scanner Lex generates. We do so by redefining a macro that reads characters and places them into an array which is scanned by the state machine created by Lex. The new macro is shown below.
%{
/* Feed the scanner one decoded character at a time: */
/* "%XX" becomes the byte with hex value XX, '+' becomes a space. */
#define YY_INPUT(buf,result,max_size) \
{ int digit ; \
  int c = getc( yyin ) ; \
  result = c == EOF ? 0 : 1 ; \
  if( c == '%' ) { \
    c = getc( yyin ) ; \
    digit = c >= 'A' ? (( c & 0xdf ) - 'A' ) + 10 \
                     : c - '0' ; \
    digit *= 16 ; \
    c = getc( yyin ) ; \
    digit += c >= 'A' ? (( c & 0xdf ) - 'A' ) + 10 \
                      : c - '0' ; \
    buf[0] = (char) digit ; \
  } \
  else if( c == '+' ) \
    buf[0] = ' ' ; \
  else \
    buf[0] = (char) c ; \
}
%}
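The same decoding can also be written as a standalone C function, which may be easier to test in isolation than the macro. The following is only a sketch: the names url_decode and hex_value are hypothetical and do not appear in the original code. Unlike the macro, this version checks that both characters after a percent sign are hexadecimal digits and copies a malformed escape through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit.
 * (Hypothetical helper, not part of the original code.) */
static int hex_value(int c)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p = (c != '\0') ? strchr(digits, toupper(c)) : NULL;
    return p != NULL ? (int)(p - digits) : -1;
}

/*
 * Decode the URL-encoded string src into dst, which must have room for
 * at least strlen(src) + 1 bytes. Returns the decoded length.
 * '+' becomes a space and "%XX" becomes the byte with hex value XX.
 */
size_t url_decode(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src == '+') {
            dst[n++] = ' ';
            src++;
        } else if (*src == '%'
                   && hex_value((unsigned char)src[1]) >= 0
                   && hex_value((unsigned char)src[2]) >= 0) {
            dst[n++] = (char)(hex_value((unsigned char)src[1]) * 16
                              + hex_value((unsigned char)src[2]));
            src += 3;
        } else {
            dst[n++] = *src++;   /* ordinary character or malformed escape */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Because the decoded text is never longer than the encoded text, a destination buffer the same size as the source is always sufficient.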
The basic motivation behind using Lex for the CGI scripts is that we can express both the right answers, and in some cases some wrong answers, with regular expressions. The code fragment:
WS [ \t]*
ARG1 {WS}target{WS}
ARG2 {WS}names{WS}\[{WS}n{WS}\]{WS}
ARGS {ARG1},{ARG2}|{ARG2},{ARG1}
COND1 {WS}strcmp{WS}\({ARGS}\){WS}=={WS}0{WS}
COND2 {WS}0{WS}=={WS}strcmp{WS}\({ARGS}\){WS}
COND3 {WS}!{WS}strcmp{WS}\({ARGS}\){WS}
COND {COND1}|{COND2}|{COND3}
%%
answer1={WS}if{WS}\({COND}\){WS} {
send_file( "ans5-7.right" ) ;
return( 0 ) ;
}
[^\r]* {
send_file( "ans5-7.wrong" ) ;
return( 0 ) ;
}
shows the Lex code that looks for
the expected answer of:
if( strcmp( target, names[n] ) == 0 )
and its most likely correct variations. The strategy here is to break the expected answer down into its components and define regular expressions for all valid versions of each component. For example, here we have an
if statement with some
condition inside the parentheses.
We'll allow any amount of whitespace around the if
and around the parentheses.
The condition itself might be expressed either as
strcmp( ... ) == 0 or as 0 == strcmp( ... ).
It might also be !strcmp( ... ) (though the students
haven't seen that particular idiom and would be unlikely to
use it).
So we allow the condition to be either COND1,
COND2 or COND3 where each describes one of the options.
A similar organization is used for the arguments of the
function strcmp().
If we recognize the string we're looking for (in any of
its acceptable variations), we'll copy the file
ans5-7.right out to the client.
This file is an HTML file that contains a message that the
answer was right and a link to the next part of the tutorial.
If any other string was seen, we send the file ans5-7.wrong
back to the client.
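The function send_file() is used above but its definition is not shown. One plausible shape for it is sketched below, under two assumptions that are not confirmed by the text: that the stored HTML files contain no CGI header of their own, and that the response is written to standard output. The helper copy_file is a hypothetical name introduced for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy the file at path to the stream out.
 * Returns 0 on success, -1 if the file cannot be opened or a write fails.
 */
int copy_file(const char *path, FILE *out)
{
    char buf[4096];
    size_t n;
    FILE *in;

    assert(path != NULL && out != NULL);
    in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
    }
    fclose(in);
    return 0;
}

/*
 * Sketch of send_file(): emit the CGI header and the blank line that
 * ends it, then the body. Assumes the file holds only the HTML body.
 */
int send_file(const char *path)
{
    fputs("Content-type: text/html\n\n", stdout);
    return copy_file(path, stdout);
}
```

Separating the copy from the header keeps the copying logic usable with any stream, which also makes it simple to exercise outside a web server.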
For a one-of-many multiple choice question, the Lex code is much simpler. Here we just identify which of a few fixed strings is being sent and then respond with the appropriate file. This is illustrated in this example.
%%
"answer5-5=A"\&? {
send_file( "ans5-5.wronga" ) ; return( 0 ) ;
}
"answer5-5=B"\&? {
send_file( "ans5-5.wrongb" ) ; return( 0 ) ;
}
"answer5-5=C"\&? {
send_file( "ans5-5.wrongc" ) ; return( 0 ) ;
}
"answer5-5=D"\&? {
send_file( "ans5-5.wrongd" ) ; return( 0 ) ;
}
"answer5-5=E"\&? {
send_file( "ans5-5.right" ) ; return( 0 ) ;
}
%%
The many-of-many multiple choice question is more interesting. For this question, there are many combinations of right and wrong selections and we would like to send a response appropriate to each one. If we create an HTML file for each possibility, large portions will be repeated across many of the files. Instead, we generate the response on the fly by choosing to include or exclude each of several responses, each of which addresses what's wrong with one of the selections. If we have five possible answers, we will have 32 possible response pages that can be generated. Lex code that does this processing is exemplified in Figure 5. (This code fragment has been abbreviated, showing only three of the five response components, in order to make it fit on the page.)
Figure 5: Example Lex Code for Scanning a Many-of-Many
Multiple Choice Question
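The Lex code of Figure 5 is not reproduced here. As a rough illustration of the include-or-exclude idea only, the following plain C sketch builds a response from a bitmask of the selected choices. Everything in it is an assumption made for illustration: the function names, the field name used in the usage note, the answer key and the feedback text are all hypothetical, and the field matching is deliberately naive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_CHOICES 5

/* Hypothetical answer key: bit i set means choice 'A'+i should be checked. */
#define CORRECT_MASK 0x15u   /* A, C and E */

/* Placeholder feedback, one fragment per choice (invented for illustration). */
static const char *const FEEDBACK[NUM_CHOICES] = {
    "Choice A: reconsider this one.",
    "Choice B: reconsider this one.",
    "Choice C: reconsider this one.",
    "Choice D: reconsider this one.",
    "Choice E: reconsider this one."
};

/*
 * Scan a decoded query such as "q=A&q=C" and return a bitmask of the
 * choices selected for the given field name. Naive: any occurrence of
 * the field name followed by '=' and a letter A..E counts.
 */
unsigned parse_selections(const char *query, const char *field)
{
    unsigned mask = 0;
    size_t len;
    const char *p = query;

    assert(query != NULL && field != NULL);
    len = strlen(field);
    while (len > 0 && (p = strstr(p, field)) != NULL) {
        p += len;
        if (p[0] == '=' && p[1] >= 'A' && p[1] < 'A' + NUM_CHOICES)
            mask |= 1u << (p[1] - 'A');
    }
    return mask;
}

/*
 * Write a response page containing one feedback fragment for every
 * choice whose selected/unselected state differs from the answer key.
 * Returns the number of fragments included (0 means fully correct).
 */
int write_response(FILE *out, unsigned selected)
{
    unsigned wrong = (selected ^ CORRECT_MASK) & ((1u << NUM_CHOICES) - 1);
    int i, included = 0;

    fputs("<html><body>\n", out);
    for (i = 0; i < NUM_CHOICES; i++) {
        if (wrong & (1u << i)) {
            fprintf(out, "<p>%s</p>\n", FEEDBACK[i]);
            included++;
        }
    }
    if (included == 0)
        fputs("<p>All selections are correct.</p>\n", out);
    fputs("</body></html>\n", out);
    return included;
}
```

A call such as write_response(stdout, parse_selections(query, "answer5-6")) would then produce one of the 32 possible pages from five stored fragments, rather than requiring 32 separate files.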
Brian L. Stuart