Your company develops a variety of web-based products. Many tools for transforming HTML (the language of web pages) require those pages to be written in a ?disciplined? dialect of HTML called XHTML. One of the key factors distinguishing XHTML from ordinary HTML is that all opening and closing tags in XHTML must be properly balanced.

You have been asked to write a validator that can examine an XHTML file for this balancing.

HTML is a ?mark-up? language in which ordinary text is interspersed with ?tags? that introduce formatting properties. Tags are written inside < > brackets. There are three forms of tags:

  • opening tags: ?<whatever?>? The ... may be empty or may contain one or more attributes, separated from the tag name by at least one whitespace character. Attributes will be ignored in this assignment.
  • closing tags: ?</whatever?>? The ... may be empty or whitespace characters.
  • singleton tags: ?<whatever?/>? The ... may be empty or may contain one or more attributes, separated from the tag name by at least one whitespace character.

For example, here is a snippet of HTML in which I have highlighted the opening tags, closing tags, and singleton tags:

<p>
For example, here is a snippet of HTML in which
I have highlighted the <span class="high1">opening
tags</span>, <span class="high2">closing
tags</span>, and
<span class="high3">singleton tags</span>:
</p>
<hr/>

Most tags occur in pairs, in which an opening tag <whatever...> is closed by a similar tag with a / at the start of its name: </whatever>. For example, web pages can get bold text this way: <b>bold</b> and italic text this this way: <i>italic</i>. Notice that the tag names are case sensitive in XHTML. Pairs of tags may be nested: <i>like <b>this</b></i> to produce text _like this _, but may not overlap <b>like <big>this</b></big>. Consequently, a tag with a / must always match the most recently seen non-/ paired tag. That rule suggests the use of stacks.

Some tags are singletons, not occurring in pairs. These are signaled by a ?/? just before the closing >. For example, <hr/> produces a horizontal rule:

  1. Files for this assignment may be found here or, if you are logged in to one of the CS Dept. Linux machines, in ~zeil/Assignments/cs361/stack_xhtml/.
  2. You are given a driver for the XHTML validator. The code you are provided with will read an XHTML file and scan for any tags. It passes each tag it encounters to a Balancer whose job is to make sure that the tags are occurring in properly balanced pairs.
    The driver is designed to take the path to the file to be checked as a command line parameter.
    ./xmlcheck test0.html 
  3. You must supply the Balancer class. This class must use a std::stack as its primary data member. (You may have additional simple data members, bools or ints, but no containers other than the stack.) The two most critical operations the Balancer class must support are
    • tag(string atag): indicates that the driver has encountered a tag. The full text of the tag is passed as the parameter atag. This may be an opening tag, a closing tag, or a singleton tag.
    • status(): returns a single int indicating the status of what has been observed so far:
      • -1 indicates that an error has been detected.
      • 0 indicates that no error has been detected, but that one or more opening tags have yet to be matched
      • 1 indicates that no error has been detected and there are no opening tags that have yet to be matched against a closing tag.
  4. As in the prior assignments, you also will be provided with unit tests for the class that you are working on (Balancer);
    There is also a pair of test files test0.dat and test1.dat that you can use to test the full xmlcheck program. test0.html should be reported as OK, but test1.html has mismatched tags. As always, you should conduct additional systems testing with your own test cases.

FlexLexer.h

// -*-C*-
// FlexLexer.h — define interfaces for lexical analyzer classes generated
// by flex

// Copyright (c) 1993 The Regents of the University of California.
// All rights reserved.
//
// This code is derived from software contributed to Berkeley by
// Kent Williams and Tom Epperly.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:

// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.

// Neither the name of the University nor the names of its contributors
// may be used to endorse or promote products derived from this software
// without specific prior written permission.

// THIS SOFTWARE IS PROVIDED “AS IS” AND WITHOUT ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE.

// This file defines FlexLexer, an abstract class which specifies the
// external interface provided to flex C lexer objects, and yyFlexLexer,
// which defines a particular lexer class.
//
// If you want to create multiple lexer classes, you use the -P flag
// to rename each yyFlexLexer to some other xxFlexLexer. You then
// include <FlexLexer.h> in your other sources once per lexer class:
//
// #undef yyFlexLexer
// #define yyFlexLexer xxFlexLexer
// #include <FlexLexer.h>
//
// #undef yyFlexLexer
// #define yyFlexLexer zzFlexLexer
// #include <FlexLexer.h>
// …

#ifndef __FLEX_LEXER_H
// Never included before – need to define base class.
#define __FLEX_LEXER_H

#include <iostream>

extern “C” {

struct yy_buffer_state;
typedef int yy_state_type;

class FlexLexer
{
public:
virtual ~FlexLexer() { }

const char* YYText() const { return yytext; }
int YYLeng() const { return yyleng; }

virtual void
yy_switch_to_buffer( yy_buffer_state* new_buffer ) = 0;
virtual yy_buffer_state* yy_create_buffer( std::istream* s, int size ) = 0;
virtual yy_buffer_state* yy_create_buffer( std::istream& s, int size ) = 0;
virtual void yy_delete_buffer( yy_buffer_state* b ) = 0;
virtual void yyrestart( std::istream* s ) = 0;
virtual void yyrestart( std::istream& s ) = 0;

virtual int yylex() = 0;

// Call yylex with new input/output sources.
int yylex( std::istream& new_in, std::ostream& new_out )
{
switch_streams( new_in, new_out );
return yylex();
}

int yylex( std::istream* new_in, std::ostream* new_out = 0)
{
switch_streams( new_in, new_out );
return yylex();
}

// Switch to new input/output streams. A nil stream pointer
// indicates “keep the current one”.
virtual void switch_streams( std::istream* new_in,
std::ostream* new_out ) = 0;
virtual void switch_streams( std::istream& new_in,
std::ostream& new_out ) = 0;

int lineno() const { return yylineno; }

int debug() const { return yy_flex_debug; }
void set_debug( int flag ) { yy_flex_debug = flag; }

protected:
char* yytext;
int yyleng;
int yylineno; // only maintained if you use %option yylineno
int yy_flex_debug; // only has effect with -d or “%option debug”
};

}
#endif // FLEXLEXER_H

#if defined(yyFlexLexer) || ! defined(yyFlexLexerOnce)
// Either this is the first time through (yyFlexLexerOnce not defined),
// or this is a repeated include to define a different flavor of
// yyFlexLexer, as discussed in the flex manual.
# define yyFlexLexerOnce

extern “C” {

class yyFlexLexer : public FlexLexer {
public:
// arg_yyin and arg_yyout default to the cin and cout, but we
// only make that assignment when initializing in yylex().
yyFlexLexer( std::istream& arg_yyin, std::ostream& arg_yyout );
yyFlexLexer( std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0 );
private:
void ctor_common();

public:

virtual ~yyFlexLexer();

void yy_switch_to_buffer( yy_buffer_state* new_buffer );
yy_buffer_state* yy_create_buffer( std::istream* s, int size );
yy_buffer_state* yy_create_buffer( std::istream& s, int size );
void yy_delete_buffer( yy_buffer_state* b );
void yyrestart( std::istream* s );
void yyrestart( std::istream& s );

void yypush_buffer_state( yy_buffer_state* new_buffer );
void yypop_buffer_state();

virtual int yylex();
virtual void switch_streams( std::istream& new_in, std::ostream& new_out );
virtual void switch_streams( std::istream* new_in = 0, std::ostream* new_out = 0 );
virtual int yywrap();

protected:
virtual int LexerInput( char* buf, int max_size );
virtual void LexerOutput( const char* buf, int size );
virtual void LexerError( const char* msg );

void yyunput( int c, char* buf_ptr );
int yyinput();

void yy_load_buffer_state();
void yy_init_buffer( yy_buffer_state* b, std::istream& s );
void yy_flush_buffer( yy_buffer_state* b );

int yy_start_stack_ptr;
int yy_start_stack_depth;
int* yy_start_stack;

void yy_push_state( int new_state );
void yy_pop_state();
int yy_top_state();

yy_state_type yy_get_previous_state();
yy_state_type yy_try_NUL_trans( yy_state_type current_state );
int yy_get_next_buffer();

std::istream yyin; // input source for default LexerInput
std::ostream yyout; // output sink for default LexerOutput

// yy_hold_char holds the character lost when yytext is formed.
char yy_hold_char;

// Number of characters read into yy_ch_buf.
int yy_n_chars;

// Points to current character in buffer.
char* yy_c_buf_p;

int yy_init; // whether we need to initialize
int yy_start; // start state number

// Flag which is used to allow yywrap()’s to do buffer switches
// instead of setting up a fresh yyin. A bit of a hack …
int yy_did_buffer_switch_on_eof;

size_t yy_buffer_stack_top; /**< index of top of stack. */
size_t yy_buffer_stack_max; /**< capacity of stack. */
yy_buffer_state ** yy_buffer_stack; /**< Stack as an array. */
void yyensure_buffer_stack(void);

// The following are not always needed, but may be depending
// on use of certain flex features (like REJECT or yymore()).

yy_state_type yy_last_accepting_state;
char* yy_last_accepting_cpos;

yy_state_type* yy_state_buf;
yy_state_type* yy_state_ptr;

char* yy_full_match;
int* yy_full_state;
int yy_full_lp;

int yy_lp;
int yy_looking_for_trail_begin;

int yy_more_flag;
int yy_more_len;
int yy_more_offset;
int yy_prev_more_offset;
};

}

#endif // yyFlexLexer || ! yyFlexLexerOnce

bin/Linux/xmlcheck

bin/Windows/xmlcheck.exe

lex.yy.cpp

lex.yy.cpp




#line
 
3
 
"lex.yy.cc"




#define
  YY_INT_ALIGNED 
short
 
int




/* A lexical scanner generated by flex */




#define
 FLEX_SCANNER

#define
 YY_FLEX_MAJOR_VERSION 
2


#define
 YY_FLEX_MINOR_VERSION 
6


#define
 YY_FLEX_SUBMINOR_VERSION 
4


#if
 YY_FLEX_SUBMINOR_VERSION 
>
 
0


#define
 FLEX_BETA

#endif




    
/* The c scanner is a mess. The FlexLexer.h header file relies on the

     * following macro. This is required in order to pass the cmultiple-scanners

     * test in the regression suite. We get reports that it breaks inheritance.

     * We will address this in a future release of flex, or omit the C scanner

     * altogether.

     */


    
#define
 yyFlexLexer yyFlexLexer



/* First, we deal with  platform-specific or compiler-specific issues. */




/* begin standard C headers. */




/* end standard C headers. */




/* flex integer type definitions */




#ifndef
 FLEXINT_H

#define
 FLEXINT_H



/* C99 systems have <inttypes.h>. Non-C99 systems may or may not. */




#if
 defined 
(
__STDC_VERSION__
)
 
&&
 __STDC_VERSION__ 
>=
 
199901L




/* C99 says to define __STDC_LIMIT_MACROS before including stdint.h,

 * if you want the limit (max/min) macros for int types. 

 */


#ifndef
 __STDC_LIMIT_MACROS

#define
 __STDC_LIMIT_MACROS 
1


#endif




#include
 
<
inttypes
.
h
>


typedef
 int8_t flex_int8_t
;


typedef
 uint8_t flex_uint8_t
;


typedef
 int16_t flex_int16_t
;


typedef
 uint16_t flex_uint16_t
;


typedef
 int32_t flex_int32_t
;


typedef
 uint32_t flex_uint32_t
;


#else


typedef
 
signed
 
char
 flex_int8_t
;


typedef
 
short
 
int
 flex_int16_t
;


typedef
 
int
 flex_int32_t
;


typedef
 
unsigned
 
char
 flex_uint8_t
;
 

typedef
 
unsigned
 
short
 
int
 flex_uint16_t
;


typedef
 
unsigned
 
int
 flex_uint32_t
;




/* Limits of integral types. */


#ifndef
 INT8_MIN

#define
 INT8_MIN               
(
-
128
)


#endif


#ifndef
 INT16_MIN

#define
 INT16_MIN              
(
-
32767
-
1
)


#endif


#ifndef
 INT32_MIN

#define
 INT32_MIN              
(
-
2147483647
-
1
)


#endif


#ifndef
 INT8_MAX

#define
 INT8_MAX               
(
127
)


#endif


#ifndef
 INT16_MAX

#define
 INT16_MAX              
(
32767
)


#endif


#ifndef
 INT32_MAX

#define
 INT32_MAX              
(
2147483647
)


#endif


#ifndef
 UINT8_MAX

#define
 UINT8_MAX              
(
255U
)


#endif


#ifndef
 UINT16_MAX

#define
 UINT16_MAX             
(
65535U
)


#endif


#ifndef
 UINT32_MAX

#define
 UINT32_MAX             
(
4294967295U
)


#endif




#ifndef
 SIZE_MAX

#define
 SIZE_MAX               
(
~
(
size_t
)
0
)


#endif




#endif
 
/* ! C99 */




#endif
 
/* ! FLEXINT_H */




/* begin standard C headers. */


#include
 
<
iostream
>


#include
 
<
errno
.
h
>


#include
 
<
cstdlib
>


#include
 
<
cstdio
>


#include
 
<
cstring
>


/* end standard C? headers. */




/* TODO: this is always defined, so inline it */


#define
 yyconst 
const




#if
 defined
(
__GNUC__
)
 
&&
 __GNUC__ 
>=
 
3


#define
 yynoreturn __attribute__
((
__noreturn__
))


#else


#define
 yynoreturn

#endif




/* Returned upon end-of-file. */


#define
 YY_NULL 
0




/* Promotes a possibly negative, possibly signed char to an

 *   integer in range [0..255] for use as an array index.

 */


#define
 YY_SC_TO_UI
(
c
)
 
((
YY_CHAR
)
 
(
c
))




/* Enter a start condition.  This macro really ought to take a parameter,

 * but we do it the disgusting crufty way forced on us by the ()-less

 * definition of BEGIN.

 */


#define
 BEGIN 
(
yy_start
)
 
=
 
1
 

 
2
 
*


/* Translate the current start state into a value that can be later handed

 * to BEGIN to return to the state.  The YYSTATE alias is for lex

 * compatibility.

 */


#define
 YY_START 
(((
yy_start
)
 
-
 
1
)
 
/
 
2
)


#define
 YYSTATE YY_START

/* Action number for EOF rule of a given start state. */


#define
 YY_STATE_EOF
(
state
)
 
(
YY_END_OF_BUFFER 

 state 

 
1
)


/* Special action meaning "start processing a new file". */


#define
 YY_NEW_FILE yyrestart
(
 yyin  
)


#define
 YY_END_OF_BUFFER_CHAR 
0




/* Size of default input buffer. */


#ifndef
 YY_BUF_SIZE

#ifdef
 __ia64__

/* On IA-64, the buffer size is 16k, not 8k.

 * Moreover, YY_BUF_SIZE is 2*YY_READ_BUF_SIZE in the general case.

 * Ditto for the __ia64__ case accordingly.

 */


#define
 YY_BUF_SIZE 
32768


#else


#define
 YY_BUF_SIZE 
16384


#endif
 
/* __ia64__ */


#endif




/* The state buf must be large enough to hold one state per character in the main buffer.

 */


#define
 YY_STATE_BUF_SIZE   
((
YY_BUF_SIZE 

 
2
)
 
*
 
sizeof
(
yy_state_type
))




#ifndef
 YY_TYPEDEF_YY_BUFFER_STATE

#define
 YY_TYPEDEF_YY_BUFFER_STATE

typedef
 
struct
 yy_buffer_state 
*
YY_BUFFER_STATE
;


#endif




#ifndef
 YY_TYPEDEF_YY_SIZE_T

#define
 YY_TYPEDEF_YY_SIZE_T

typedef
 size_t yy_size_t
;


#endif




extern
 
int
 yyleng
;




#define
 EOB_ACT_CONTINUE_SCAN 
0


#define
 EOB_ACT_END_OF_FILE 
1


#define
 EOB_ACT_LAST_MATCH 
2


    

    
#define
 YY_LESS_LINENO
(
n
)


    
#define
 YY_LINENO_REWIND_TO
(
ptr
)


    

/* Return all but the first "n" matched characters back to the input stream. */


#define
 yyless
(
n
)
 

    
do
 

        
{
 

        
/* Undo effects of setting up yytext. */
 

        
int
 yyless_macro_arg 
=
 
(
n
);
 

        YY_LESS_LINENO
(
yyless_macro_arg
);


        
*
yy_cp 
=
 
(
yy_hold_char
);
 

        YY_RESTORE_YY_MORE_OFFSET 

        
(
yy_c_buf_p
)
 
=
 yy_cp 
=
 yy_bp 

 yyless_macro_arg 
-
 YY_MORE_ADJ
;
 

        YY_DO_BEFORE_ACTION
;
 
/* set up yytext again */
 

        
}
 

    
while
 
(
 
0
 
)


#define
 unput
(
c
)
 yyunput
(
 c
,
 
(
yytext_ptr
)
  
)




#ifndef
 YY_STRUCT_YY_BUFFER_STATE

#define
 YY_STRUCT_YY_BUFFER_STATE

struct
 yy_buffer_state

    
{




    std
::
streambuf
*
 yy_input_file
;




    
char
 
*
yy_ch_buf
;
        
/* input buffer */


    
char
 
*
yy_buf_pos
;
       
/* current position in input buffer */




    
/* Size of input buffer in bytes, not including room for EOB

     * characters.

     */


    
int
 yy_buf_size
;




    
/* Number of characters read into yy_ch_buf, not including EOB

     * characters.

     */


    
int
 yy_n_chars
;




    
/* Whether we "own" the buffer - i.e., we know we created it,

     * and can realloc() it to grow it, and should free() it to

     * delete it.

     */


    
int
 yy_is_our_buffer
;




    
/* Whether this is an "interactive" input source; if so, and

     * if we're using stdio for input, then we want to use getc()

     * instead of fread(), to make sure we stop fetching input after

     * each newline.

     */


    
int
 yy_is_interactive
;




    
/* Whether we're considered to be at the beginning of a line.

     * If so, '^' rules will be active on the next match, otherwise

     * not.

     */


    
int
 yy_at_bol
;




    
int
 yy_bs_lineno
;
 
/**< The line count. */


    
int
 yy_bs_column
;
 
/**< The column count. */




    
/* Whether to try to fill the input buffer when we reach the

     * end of it.

     */


    
int
 yy_fill_buffer
;




    
int
 yy_buffer_status
;




#define
 YY_BUFFER_NEW 
0


#define
 YY_BUFFER_NORMAL 
1


    
/* When an EOF's been seen but there's still some text to process

     * then we mark the buffer as YY_EOF_PENDING, to indicate that we

     * shouldn't try reading from the input source any more.  We might

     * still have a bunch of tokens to match, though, because of

     * possible backing-up.

     *

     * When we actually see the EOF, we change the status to "new"

     * (via yyrestart()), so that the user can continue scanning by

     * just pointing yyin at a new input file.

     */


#define
 YY_BUFFER_EOF_PENDING 
2




    
};


#endif
 
/* !YY_STRUCT_YY_BUFFER_STATE */




/* We provide macros for accessing buffer states in case in the

 * future we want to put the buffer states in a more general

 * "scanner state".

 *

 * Returns the top of the stack, or NULL.

 */


#define
 YY_CURRENT_BUFFER 
(
 
(
yy_buffer_stack
)
 

                          
?
 
(
yy_buffer_stack
)[(
yy_buffer_stack_top
)]
 

                          
:
 NULL
)


/* Same as previous macro, but useful when we know that the buffer stack is not

 * NULL or when we need an lvalue. For internal use only.

 */


#define
 YY_CURRENT_BUFFER_LVALUE 
(
yy_buffer_stack
)[(
yy_buffer_stack_top
)]




void
 
*
yyalloc 
(
 yy_size_t  
);


void
 
*
yyrealloc 
(
 
void
 
*
,
 yy_size_t  
);


void
 yyfree 
(
 
void
 
*
  
);




#define
 yy_new_buffer yy_create_buffer

#define
 yy_set_interactive
(
is_interactive<