Your company develops a variety of web-based products. Many tools for transforming HTML (the language of web pages) require those pages to be written in a ?disciplined? dialect of HTML called XHTML. One of the key factors distinguishing XHTML from ordinary HTML is that all opening and closing tags in XHTML must be properly balanced.
You have been asked to write a validator that can examine an XHTML file for this balancing.
HTML is a ?mark-up? language in which ordinary text is interspersed with ?tags? that introduce formatting properties. Tags are written inside < >
brackets. There are three forms of tags:
- opening tags: ?<whatever?>? The
...
may be empty or may contain one or more attributes, separated from the tag name by at least one whitespace character. Attributes will be ignored in this assignment. - closing tags: ?</whatever?>? The
...
may be empty or whitespace characters. - singleton tags: ?<whatever?/>? The
...
may be empty or may contain one or more attributes, separated from the tag name by at least one whitespace character.
For example, here is a snippet of HTML in which I have highlighted the opening tags, closing tags, and singleton tags:
<p>
For example, here is a snippet of HTML in which
I have highlighted the <span class="high1">opening
tags</span>, <span class="high2">closing
tags</span>, and
<span class="high3">singleton tags</span>:
</p>
<hr/>
Most tags occur in pairs, in which an opening tag <whatever...>
is closed by a similar tag with a /
at the start of its name: </whatever>
. For example, web pages can get bold text this way: <b>bold</b>
and italic text this this way: <i>italic</i>
. Notice that the tag names are case sensitive in XHTML. Pairs of tags may be nested: <i>like <b>this</b></i>
to produce text _like this _, but may not overlap <b>like <big>this</b></big>
. Consequently, a tag with a /
must always match the most recently seen non-/
paired tag. That rule suggests the use of stacks.
Some tags are singletons, not occurring in pairs. These are signaled by a ?/? just before the closing >
. For example, <hr/>
produces a horizontal rule:
- Files for this assignment may be found here or, if you are logged in to one of the CS Dept. Linux machines, in
~zeil/Assignments/cs361/stack_xhtml/
. - You are given a driver for the XHTML validator. The code you are provided with will read an XHTML file and scan for any tags. It passes each tag it encounters to a
Balancer
whose job is to make sure that the tags are occurring in properly balanced pairs.
The driver is designed to take the path to the file to be checked as a command line parameter.
./xmlcheck test0.html
- You must supply the
Balancer
class. This class must use astd::stack
as its primary data member. (You may have additional simple data members,bool
s orint
s, but no containers other than the stack.) The two most critical operations theBalancer
class must support aretag(string atag)
: indicates that the driver has encountered a tag. The full text of the tag is passed as the parameteratag
. This may be an opening tag, a closing tag, or a singleton tag.status()
: returns a singleint
indicating the status of what has been observed so far:- -1 indicates that an error has been detected.
- 0 indicates that no error has been detected, but that one or more opening tags have yet to be matched
- 1 indicates that no error has been detected and there are no opening tags that have yet to be matched against a closing tag.
- As in the prior assignments, you also will be provided with unit tests for the class that you are working on (
Balancer
);
There is also a pair of test filestest0.dat
andtest1.dat
that you can use to test the fullxmlcheck
program.test0.html
should be reported as OK, buttest1.html
has mismatched tags. As always, you should conduct additional systems testing with your own test cases.
FlexLexer.h
// -*-C*-
// FlexLexer.h — define interfaces for lexical analyzer classes generated
// by flex
// Copyright (c) 1993 The Regents of the University of California.
// All rights reserved.
//
// This code is derived from software contributed to Berkeley by
// Kent Williams and Tom Epperly.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// Neither the name of the University nor the names of its contributors
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
// THIS SOFTWARE IS PROVIDED “AS IS” AND WITHOUT ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE.
// This file defines FlexLexer, an abstract class which specifies the
// external interface provided to flex C lexer objects, and yyFlexLexer,
// which defines a particular lexer class.
//
// If you want to create multiple lexer classes, you use the -P flag
// to rename each yyFlexLexer to some other xxFlexLexer. You then
// include <FlexLexer.h> in your other sources once per lexer class:
//
// #undef yyFlexLexer
// #define yyFlexLexer xxFlexLexer
// #include <FlexLexer.h>
//
// #undef yyFlexLexer
// #define yyFlexLexer zzFlexLexer
// #include <FlexLexer.h>
// …
#ifndef __FLEX_LEXER_H
// Never included before – need to define base class.
#define __FLEX_LEXER_H
#include <iostream>
extern “C” {
struct yy_buffer_state;
typedef int yy_state_type;
class FlexLexer
{
public:
virtual ~FlexLexer() { }
const char* YYText() const { return yytext; }
int YYLeng() const { return yyleng; }
virtual void
yy_switch_to_buffer( yy_buffer_state* new_buffer ) = 0;
virtual yy_buffer_state* yy_create_buffer( std::istream* s, int size ) = 0;
virtual yy_buffer_state* yy_create_buffer( std::istream& s, int size ) = 0;
virtual void yy_delete_buffer( yy_buffer_state* b ) = 0;
virtual void yyrestart( std::istream* s ) = 0;
virtual void yyrestart( std::istream& s ) = 0;
virtual int yylex() = 0;
// Call yylex with new input/output sources.
int yylex( std::istream& new_in, std::ostream& new_out )
{
switch_streams( new_in, new_out );
return yylex();
}
int yylex( std::istream* new_in, std::ostream* new_out = 0)
{
switch_streams( new_in, new_out );
return yylex();
}
// Switch to new input/output streams. A nil stream pointer
// indicates “keep the current one”.
virtual void switch_streams( std::istream* new_in,
std::ostream* new_out ) = 0;
virtual void switch_streams( std::istream& new_in,
std::ostream& new_out ) = 0;
int lineno() const { return yylineno; }
int debug() const { return yy_flex_debug; }
void set_debug( int flag ) { yy_flex_debug = flag; }
protected:
char* yytext;
int yyleng;
int yylineno; // only maintained if you use %option yylineno
int yy_flex_debug; // only has effect with -d or “%option debug”
};
}
#endif // FLEXLEXER_H
#if defined(yyFlexLexer) || ! defined(yyFlexLexerOnce)
// Either this is the first time through (yyFlexLexerOnce not defined),
// or this is a repeated include to define a different flavor of
// yyFlexLexer, as discussed in the flex manual.
# define yyFlexLexerOnce
extern “C” {
class yyFlexLexer : public FlexLexer {
public:
// arg_yyin and arg_yyout default to the cin and cout, but we
// only make that assignment when initializing in yylex().
yyFlexLexer( std::istream& arg_yyin, std::ostream& arg_yyout );
yyFlexLexer( std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0 );
private:
void ctor_common();
public:
virtual ~yyFlexLexer();
void yy_switch_to_buffer( yy_buffer_state* new_buffer );
yy_buffer_state* yy_create_buffer( std::istream* s, int size );
yy_buffer_state* yy_create_buffer( std::istream& s, int size );
void yy_delete_buffer( yy_buffer_state* b );
void yyrestart( std::istream* s );
void yyrestart( std::istream& s );
void yypush_buffer_state( yy_buffer_state* new_buffer );
void yypop_buffer_state();
virtual int yylex();
virtual void switch_streams( std::istream& new_in, std::ostream& new_out );
virtual void switch_streams( std::istream* new_in = 0, std::ostream* new_out = 0 );
virtual int yywrap();
protected:
virtual int LexerInput( char* buf, int max_size );
virtual void LexerOutput( const char* buf, int size );
virtual void LexerError( const char* msg );
void yyunput( int c, char* buf_ptr );
int yyinput();
void yy_load_buffer_state();
void yy_init_buffer( yy_buffer_state* b, std::istream& s );
void yy_flush_buffer( yy_buffer_state* b );
int yy_start_stack_ptr;
int yy_start_stack_depth;
int* yy_start_stack;
void yy_push_state( int new_state );
void yy_pop_state();
int yy_top_state();
yy_state_type yy_get_previous_state();
yy_state_type yy_try_NUL_trans( yy_state_type current_state );
int yy_get_next_buffer();
std::istream yyin; // input source for default LexerInput
std::ostream yyout; // output sink for default LexerOutput
// yy_hold_char holds the character lost when yytext is formed.
char yy_hold_char;
// Number of characters read into yy_ch_buf.
int yy_n_chars;
// Points to current character in buffer.
char* yy_c_buf_p;
int yy_init; // whether we need to initialize
int yy_start; // start state number
// Flag which is used to allow yywrap()’s to do buffer switches
// instead of setting up a fresh yyin. A bit of a hack …
int yy_did_buffer_switch_on_eof;
size_t yy_buffer_stack_top; /**< index of top of stack. */
size_t yy_buffer_stack_max; /**< capacity of stack. */
yy_buffer_state ** yy_buffer_stack; /**< Stack as an array. */
void yyensure_buffer_stack(void);
// The following are not always needed, but may be depending
// on use of certain flex features (like REJECT or yymore()).
yy_state_type yy_last_accepting_state;
char* yy_last_accepting_cpos;
yy_state_type* yy_state_buf;
yy_state_type* yy_state_ptr;
char* yy_full_match;
int* yy_full_state;
int yy_full_lp;
int yy_lp;
int yy_looking_for_trail_begin;
int yy_more_flag;
int yy_more_len;
int yy_more_offset;
int yy_prev_more_offset;
};
}
#endif // yyFlexLexer || ! yyFlexLexerOnce
bin/Linux/xmlcheck
bin/Windows/xmlcheck.exe
lex.yy.cpp
lex.yy.cpp
#line
3
"lex.yy.cc"
#define
YY_INT_ALIGNED
short
int
/* A lexical scanner generated by flex */
#define
FLEX_SCANNER
#define
YY_FLEX_MAJOR_VERSION
2
#define
YY_FLEX_MINOR_VERSION
6
#define
YY_FLEX_SUBMINOR_VERSION
4
#if
YY_FLEX_SUBMINOR_VERSION
>
0
#define
FLEX_BETA
#endif
/* The c scanner is a mess. The FlexLexer.h header file relies on the
* following macro. This is required in order to pass the cmultiple-scanners
* test in the regression suite. We get reports that it breaks inheritance.
* We will address this in a future release of flex, or omit the C scanner
* altogether.
*/
#define
yyFlexLexer yyFlexLexer
/* First, we deal with platform-specific or compiler-specific issues. */
/* begin standard C headers. */
/* end standard C headers. */
/* flex integer type definitions */
#ifndef
FLEXINT_H
#define
FLEXINT_H
/* C99 systems have <inttypes.h>. Non-C99 systems may or may not. */
#if
defined
(
__STDC_VERSION__
)
&&
__STDC_VERSION__
>=
199901L
/* C99 says to define __STDC_LIMIT_MACROS before including stdint.h,
* if you want the limit (max/min) macros for int types.
*/
#ifndef
__STDC_LIMIT_MACROS
#define
__STDC_LIMIT_MACROS
1
#endif
#include
<
inttypes
.
h
>
typedef
int8_t flex_int8_t
;
typedef
uint8_t flex_uint8_t
;
typedef
int16_t flex_int16_t
;
typedef
uint16_t flex_uint16_t
;
typedef
int32_t flex_int32_t
;
typedef
uint32_t flex_uint32_t
;
#else
typedef
signed
char
flex_int8_t
;
typedef
short
int
flex_int16_t
;
typedef
int
flex_int32_t
;
typedef
unsigned
char
flex_uint8_t
;
typedef
unsigned
short
int
flex_uint16_t
;
typedef
unsigned
int
flex_uint32_t
;
/* Limits of integral types. */
#ifndef
INT8_MIN
#define
INT8_MIN
(
-
128
)
#endif
#ifndef
INT16_MIN
#define
INT16_MIN
(
-
32767
-
1
)
#endif
#ifndef
INT32_MIN
#define
INT32_MIN
(
-
2147483647
-
1
)
#endif
#ifndef
INT8_MAX
#define
INT8_MAX
(
127
)
#endif
#ifndef
INT16_MAX
#define
INT16_MAX
(
32767
)
#endif
#ifndef
INT32_MAX
#define
INT32_MAX
(
2147483647
)
#endif
#ifndef
UINT8_MAX
#define
UINT8_MAX
(
255U
)
#endif
#ifndef
UINT16_MAX
#define
UINT16_MAX
(
65535U
)
#endif
#ifndef
UINT32_MAX
#define
UINT32_MAX
(
4294967295U
)
#endif
#ifndef
SIZE_MAX
#define
SIZE_MAX
(
~
(
size_t
)
0
)
#endif
#endif
/* ! C99 */
#endif
/* ! FLEXINT_H */
/* begin standard C headers. */
#include
<
iostream
>
#include
<
errno
.
h
>
#include
<
cstdlib
>
#include
<
cstdio
>
#include
<
cstring
>
/* end standard C? headers. */
/* TODO: this is always defined, so inline it */
#define
yyconst
const
#if
defined
(
__GNUC__
)
&&
__GNUC__
>=
3
#define
yynoreturn __attribute__
((
__noreturn__
))
#else
#define
yynoreturn
#endif
/* Returned upon end-of-file. */
#define
YY_NULL
0
/* Promotes a possibly negative, possibly signed char to an
* integer in range [0..255] for use as an array index.
*/
#define
YY_SC_TO_UI
(
c
)
((
YY_CHAR
)
(
c
))
/* Enter a start condition. This macro really ought to take a parameter,
* but we do it the disgusting crufty way forced on us by the ()-less
* definition of BEGIN.
*/
#define
BEGIN
(
yy_start
)
=
1
2
*
/* Translate the current start state into a value that can be later handed
* to BEGIN to return to the state. The YYSTATE alias is for lex
* compatibility.
*/
#define
YY_START
(((
yy_start
)
-
1
)
/
2
)
#define
YYSTATE YY_START
/* Action number for EOF rule of a given start state. */
#define
YY_STATE_EOF
(
state
)
(
YY_END_OF_BUFFER
state
1
)
/* Special action meaning "start processing a new file". */
#define
YY_NEW_FILE yyrestart
(
yyin
)
#define
YY_END_OF_BUFFER_CHAR
0
/* Size of default input buffer. */
#ifndef
YY_BUF_SIZE
#ifdef
__ia64__
/* On IA-64, the buffer size is 16k, not 8k.
* Moreover, YY_BUF_SIZE is 2*YY_READ_BUF_SIZE in the general case.
* Ditto for the __ia64__ case accordingly.
*/
#define
YY_BUF_SIZE
32768
#else
#define
YY_BUF_SIZE
16384
#endif
/* __ia64__ */
#endif
/* The state buf must be large enough to hold one state per character in the main buffer.
*/
#define
YY_STATE_BUF_SIZE
((
YY_BUF_SIZE
2
)
*
sizeof
(
yy_state_type
))
#ifndef
YY_TYPEDEF_YY_BUFFER_STATE
#define
YY_TYPEDEF_YY_BUFFER_STATE
typedef
struct
yy_buffer_state
*
YY_BUFFER_STATE
;
#endif
#ifndef
YY_TYPEDEF_YY_SIZE_T
#define
YY_TYPEDEF_YY_SIZE_T
typedef
size_t yy_size_t
;
#endif
extern
int
yyleng
;
#define
EOB_ACT_CONTINUE_SCAN
0
#define
EOB_ACT_END_OF_FILE
1
#define
EOB_ACT_LAST_MATCH
2
#define
YY_LESS_LINENO
(
n
)
#define
YY_LINENO_REWIND_TO
(
ptr
)
/* Return all but the first "n" matched characters back to the input stream. */
#define
yyless
(
n
)
do
{
/* Undo effects of setting up yytext. */
int
yyless_macro_arg
=
(
n
);
YY_LESS_LINENO
(
yyless_macro_arg
);
*
yy_cp
=
(
yy_hold_char
);
YY_RESTORE_YY_MORE_OFFSET
(
yy_c_buf_p
)
=
yy_cp
=
yy_bp
yyless_macro_arg
-
YY_MORE_ADJ
;
YY_DO_BEFORE_ACTION
;
/* set up yytext again */
}
while
(
0
)
#define
unput
(
c
)
yyunput
(
c
,
(
yytext_ptr
)
)
#ifndef
YY_STRUCT_YY_BUFFER_STATE
#define
YY_STRUCT_YY_BUFFER_STATE
struct
yy_buffer_state
{
std
::
streambuf
*
yy_input_file
;
char
*
yy_ch_buf
;
/* input buffer */
char
*
yy_buf_pos
;
/* current position in input buffer */
/* Size of input buffer in bytes, not including room for EOB
* characters.
*/
int
yy_buf_size
;
/* Number of characters read into yy_ch_buf, not including EOB
* characters.
*/
int
yy_n_chars
;
/* Whether we "own" the buffer - i.e., we know we created it,
* and can realloc() it to grow it, and should free() it to
* delete it.
*/
int
yy_is_our_buffer
;
/* Whether this is an "interactive" input source; if so, and
* if we're using stdio for input, then we want to use getc()
* instead of fread(), to make sure we stop fetching input after
* each newline.
*/
int
yy_is_interactive
;
/* Whether we're considered to be at the beginning of a line.
* If so, '^' rules will be active on the next match, otherwise
* not.
*/
int
yy_at_bol
;
int
yy_bs_lineno
;
/**< The line count. */
int
yy_bs_column
;
/**< The column count. */
/* Whether to try to fill the input buffer when we reach the
* end of it.
*/
int
yy_fill_buffer
;
int
yy_buffer_status
;
#define
YY_BUFFER_NEW
0
#define
YY_BUFFER_NORMAL
1
/* When an EOF's been seen but there's still some text to process
* then we mark the buffer as YY_EOF_PENDING, to indicate that we
* shouldn't try reading from the input source any more. We might
* still have a bunch of tokens to match, though, because of
* possible backing-up.
*
* When we actually see the EOF, we change the status to "new"
* (via yyrestart()), so that the user can continue scanning by
* just pointing yyin at a new input file.
*/
#define
YY_BUFFER_EOF_PENDING
2
};
#endif
/* !YY_STRUCT_YY_BUFFER_STATE */
/* We provide macros for accessing buffer states in case in the
* future we want to put the buffer states in a more general
* "scanner state".
*
* Returns the top of the stack, or NULL.
*/
#define
YY_CURRENT_BUFFER
(
(
yy_buffer_stack
)
?
(
yy_buffer_stack
)[(
yy_buffer_stack_top
)]
:
NULL
)
/* Same as previous macro, but useful when we know that the buffer stack is not
* NULL or when we need an lvalue. For internal use only.
*/
#define
YY_CURRENT_BUFFER_LVALUE
(
yy_buffer_stack
)[(
yy_buffer_stack_top
)]
void
*
yyalloc
(
yy_size_t
);
void
*
yyrealloc
(
void
*
,
yy_size_t
);
void
yyfree
(
void
*
);
#define
yy_new_buffer yy_create_buffer
#define
yy_set_interactive
(
is_interactive<