for DC Perl Mongers, 4 February 2003
If you are looking at these notes on your own, be aware that several of the sample files refer to older versions of this module, some going all the way back to its original incarnation as a helper built on SGML::SPGrove (now XML::Grove). The main differences are in how the parsed data is accessed, but there are some cosmetic changes here and there (and, of course, the older samples cannot use newer capabilities). You can get the SGML::ElementMap module and its submodules, examples and documentation from this directory. Be sure to get the highest version. That directory also contains these notes and files compressed into a single archive.
Mention:
The processing model
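A code sample for OmniMark and ElementMap. This fragment converts elements of the form "<ext.xref pointer='ARTICLE_ID'>content</ext.xref>" into "<ext.xref vol.no='NUM' collection='NUM'>content</ext.xref>" (the code also copies the element's existing attributes, including pointer, into the new start tag).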
;; omnimark code
element ext.xref
   local counter junk
   and stream volnum
   and stream colnum
   and switch successful
   output "<%lq"
   repeat over specified attributes as spec-attr
      output " "
      output key of attribute spec-attr
      output "=%"%hv(spec-attr)%""
   again
   activate successful
   reset junk to system-call "%g(idcommand) --format='vol.no=%%v col.no=%%c' --save-output=%g(TempFile) %v(pointer)"
   do unless file "%g(TempFile)" exists
      deactivate successful
      put #error "Warning: auto-generated file %g(TempFile) not found%n"
      increment ErrorCount
   else
      do scan file "%g(TempFile)"
         match "vol.no=" (letter or digit)+ => vol white-space+ "col.no=" (letter or digit)+ => col
            set buffer volnum to "%x(vol)"
            set buffer colnum to "%x(col)"
         else
            deactivate successful
            put #error "Warning: auto-generated file %g(TempFile) is invalid: [%v(pointer)]%n"
            increment ErrorCount
      done
      ;reset junk to system-call "rm %g(TempFile)"
   done
   do when not active successful
      set buffer volnum to "unknown"
      set buffer colnum to "unknown"
   done
   output " vol.no=%"%g(volnum)%" collection=%"%g(colnum)%""
   output ">%c"
   output "</%lq%/>"
Above: the OmniMark version. Below: a Perl translation for SGML::ElementMap.
# this is just thrown together from the omnimark code: there might be errors
$e->element('EXT.XREF', sub {
    my ($engine, $element) = @_;
    my ($attrs, $successful, $line, $volnum, $colnum);
    $output->print("<" . $element->{'Name'});
    $attrs = $element->{'Attributes'};    # hash ref: attribute name => value
    foreach (keys %$attrs) {
        $output->print(" " . $_ . '="' . $attrs->{$_} . '"');
    }
    # the list form of system() bypasses the shell, so the format needs no extra quotes
    system($idcommand, "--format=vol.no=%v col.no=%c",
           "--save-output=" . $TempFile, $attrs->{'pointer'});
    OK: {
        $successful = 0;
        if (! -f $TempFile) {
            warn "Warning: auto-generated file " . $TempFile . " not found\n";
            $ErrorCount += 1;
            last OK;
        }
        # $TempFile holds a file name, not a handle, so open it before reading
        if (open(my $fh, '<', $TempFile)) {
            $line = <$fh>;
            close($fh);
        }
        if ($line && $line =~ m/vol\.no=(\w+)\s+col\.no=(\w+)/) {
            $volnum = $1;
            $colnum = $2;
        } else {
            warn "Warning: auto-generated file " . $TempFile .
                " is invalid: [" . $attrs->{'pointer'} . "]\n";
            $ErrorCount += 1;
            last OK;
        }
        $successful = 1;
    }
    if (!$successful) {
        $volnum = $colnum = 'unknown';
    }
    $output->print(' vol.no="' . $volnum . '" collection="' . $colnum . '">');
    $engine->process_content;
    $output->print("</" . $element->{'Name'} . ">");
});
Other processors
You should read the SGML::ElementMap documentation, then start looking at the ElementMap.pm code
Why use constants to reference object data?
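One likely answer, sketched here under our own assumptions: if an object is an array and its fields are addressed through `use constant` names, a misspelled field name fails at compile time under `use strict`, and because constants are inlined the access costs the same as a literal index. The package `Demo::State` and its field names are hypothetical (loosely modeled on the `$state_data` list further down), not the module's actual code.

```perl
use strict;
use warnings;

package Demo::State;    # hypothetical package, for illustration only

# Field indices for an array-based object.  use constant creates subs with
# empty prototypes, which Perl inlines, so $self->[NODE_PATH] compiles to
# the same thing as $self->[1].
use constant {
    DRIVER        => 0,
    NODE_PATH     => 1,
    LAST_GEN_NAME => 2,
};

sub new {
    my ($class, %args) = @_;
    my $self = [];
    $self->[DRIVER]        = $args{driver};
    $self->[NODE_PATH]     = '';
    $self->[LAST_GEN_NAME] = 'aaa';
    return bless $self, $class;
}

# Combined getter/setter.  Misspelling the constant (say NODE_PTH) is a
# compile-time error under use strict ("Bareword ... not allowed"), whereas
# a misspelled hash key would silently read undef.
sub node_path {
    my $self = shift;
    $self->[NODE_PATH] = shift if @_;
    return $self->[NODE_PATH];
}

package main;

my $s = Demo::State->new(driver => 'dummy');
$s->node_path('/ARTICLE/SECTION');
print $s->node_path, "\n";    # prints /ARTICLE/SECTION
```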
What do we do with handlers?
What do the main objects look like? (Note the colons: this is a kind of structure-describing pseudo-Perl, nothing formal or correct.)
mode : {
    'handler_type' => handler_set : {
        'NAME' or '' => handler_pair : [ pattern, handler_ref ]
    }
}
$mode = { '_ MODENAME ' => 'FOO',
          '_ FINALIZE ' => '',
          'Element' => {
              'PARA' => [ '.*/SECTION/.*', \&section_para ],
              ''     => [ '', \&no_handler_warning ] },
          'CData' => {
              ''     => [ '', \&data_accumulate ] },
        };
Mode
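How might a handler be picked from such a mode? The sketch below is a guess at the rule, not the module's code: look up the element name in the handler set for the event type, accept that pair if its pattern is empty or matches the whole node path, and otherwise fall back to the `''` entry. `find_handler` and the stand-in handler bodies are hypothetical; only the shape of `$mode` comes from the notes.

```perl
use strict;
use warnings;

# Stand-ins for the handler subs named in the mode (hypothetical bodies).
sub section_para       { return "section para: $_[0]" }
sub no_handler_warning { return "no handler for $_[0]" }
sub data_accumulate    { return "data: $_[0]" }

my $mode = {
    '_ MODENAME ' => 'FOO',
    '_ FINALIZE ' => '',
    'Element' => {
        'PARA' => [ '.*/SECTION/.*', \&section_para ],
        ''     => [ '',              \&no_handler_warning ],
    },
    'CData' => {
        '' => [ '', \&data_accumulate ],
    },
};

# Assumed rule: try the pair registered under the name, then the '' pair.
# A pair applies if its pattern is empty or matches the whole node path.
sub find_handler {
    my ($mode, $type, $name, $path) = @_;
    my $set = $mode->{$type} or return;
    for my $key ($name, '') {
        my $pair = $set->{$key} or next;
        my ($pattern, $handler) = @$pair;
        return $handler if $pattern eq '' || $path =~ /\A(?:$pattern)\z/;
    }
    return;
}

my $h = find_handler($mode, 'Element', 'PARA', '/ARTICLE/SECTION/PARA');
print $h->('PARA'), "\n";    # section para: PARA
$h = find_handler($mode, 'Element', 'PARA', '/ARTICLE/PARA');
print $h->('PARA'), "\n";    # no handler for PARA
```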
$main = [
$state_data,
$all_modes,
$global_vars,
$stack_vars
];
$state_data = [
driver : SGML::ElementMap::Driver
node_path : ''
handler_modes : [ $mode, $mode_2, $mode_3, ... ]
handler_mode_stack : [ $mode_set_1, $mode_set_2, ... ]
named_handlers : { 'NAME' => \&handler }
last_gen_name : 'aaa'
];
$all_modes = { 'MODE_NAME_1' => $mode,
'MODE_NAME_2' => $mode_2 };
$global_vars = { 'NAME' => $some_value };
$stack_vars = Hash::Layered;
Why global variable support?
Why stack variable support?
Different processors need different interfaces to work with them
# these can default to Driver methods
$d->input($type); # 'file' 'literal' 'handle' etc.
$d->markup($type); # 'xml' or 'sgml'
$d->parser($parser_object);
$d->process_xml_file($elementmap, $file, @handler_args);
$d->process_sgml_file($elementmap, $file, @handler_args);
# these must be implemented in Driver sub-classes
$d->process(...);
$d->reparent_current_subtree($new_el_name, @attribute_pairs);
$d->reparent_subtree($new_el_name, @attribute_pairs);
$d->dispatch_subtrees($elementmap, $pattern, @handler_args);
$d->skip_subtrees();
$d->context_path();
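The split between methods that can default to the Driver base class and methods each sub-class must implement is a standard Perl pattern. A minimal sketch: the method names come from the list above, but the package names (`Demo::Driver`, `Demo::Driver::Echo`), the accessor behavior and every method body are hypothetical.

```perl
use strict;
use warnings;

package Demo::Driver;    # hypothetical base class

sub new {
    my ($class, %opts) = @_;
    return bless { input => 'file', markup => 'xml', %opts }, $class;
}

# Methods with a usable default in the base class.
sub input  { my $self = shift; $self->{input}  = shift if @_; return $self->{input} }
sub markup { my $self = shift; $self->{markup} = shift if @_; return $self->{markup} }

sub process_xml_file {
    my ($self, $elementmap, $file, @handler_args) = @_;
    $self->markup('xml');
    $self->input('file');
    return $self->process($elementmap, $file, @handler_args);
}

# Methods each concrete driver must supply: the base versions just die.
for my $method (qw(process reparent_current_subtree reparent_subtree
                   dispatch_subtrees skip_subtrees context_path)) {
    no strict 'refs';
    *{"Demo::Driver::$method"} = sub {
        my $class = ref($_[0]) || $_[0];
        die "$class does not implement $method\n";
    };
}

package Demo::Driver::Echo;    # toy subclass that only reports its settings
our @ISA = ('Demo::Driver');

sub process {
    my ($self, $elementmap, $file) = @_;
    return join ' ', $self->markup, $self->input, $file;
}

package main;

my $d = Demo::Driver::Echo->new;
print $d->process_xml_file(undef, 'article.xml'), "\n";    # xml file article.xml
eval { $d->skip_subtrees };
print $@;    # Demo::Driver::Echo does not implement skip_subtrees
```

Dying with a clear message in the base class turns a forgotten override into an immediate, readable error instead of a silent no-op.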
Some of the drivers, the simple event-based ones, have a lot in common, so they share Driver::EventQueue.
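Sample execution of Hash::Layered. Each table shows the layers after the statements before it: the header row gives the hash's default behavior (here cascade) and the keys, and each following row is one layer, oldest first, labeled with its own behavior setting ('default' means it uses the hash's default).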
$h->set_default('cascade');
$h->{'a'} = 31;
$h->{'b'} = 32;
$h->{'c'} = 33;
$h->push;
cascade | a  | b  | c  | d  | e  |
default | 31 | 32 | 33 |    |    |
default |    |    |    |    |    |
assert($h->{'a'} == 31);
$h->set_layer('opaque');
assert(! defined $h->{'a'});
$h->{'c'} = 34;
$h->{'d'} = 35;
cascade | a  | b  | c  | d  | e  |
default | 31 | 32 | 33 |    |    |
opaque  |    |    | 34 | 35 |    |
$h->set_layer('default');
assert($h->{'a'} == 31);
assert($h->{'b'} == 32);
assert($h->{'c'} == 34);
assert($h->{'d'} == 35);
$h->push;
$h->{'a'} = 36;
$h->set_layer('oneway');
$h->{'e'} = 37;
$h->{'a'} = 38;
cascade | a  | b  | c  | d  | e  |
default | 36 | 32 | 33 |    |    |
default |    |    | 34 | 35 |    |
oneway  | 38 |    |    |    | 37 |
assert($h->{'a'} == 38);
assert($h->{'b'} == 32);
assert($h->{'e'} == 37);
$h->pop;
assert(! defined $h->{'e'});
$h->pop;
assert($h->{'a'} == 36);
assert($h->{'b'} == 32);
assert($h->{'c'} == 33);
assert(! defined $h->{'d'});
We want to use the object as a hash reference but still have access to object methods. I initially tried this with a single object, but that did not work. Unfortunately I don't have notes, but I think the problem was getting at the underlying data structure. With a single object, it is harder for a method to tell whether it needs to call tied() to reach the data (the hash ref and the object ref are blessed into the same package, so ref() won't help). Using two objects makes this easy.
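A minimal sketch of the two-object arrangement, assuming nothing about Hash::Layered's real internals: the outer object is the tied hash itself, blessed into one package, while the tie object lives in another package and owns the data. Outer methods reach the data with `tied(%$self)`, and because the two objects are in different packages a method never has to guess which one it was handed. The package names are hypothetical, and only the simplest behavior is modeled (reads search from the top layer down, writes go to the top layer); there are no per-layer behaviors such as opaque or oneway.

```perl
use strict;
use warnings;

# Inner object: implements the tie interface and owns a stack of layers.
package Demo::Layered::Tie;    # hypothetical name

sub TIEHASH    { my $class = shift; return bless { layers => [ {} ], iter => [] }, $class }
sub push_layer { my $self = shift; push @{ $self->{layers} }, {}; return }
sub pop_layer  {
    my $self = shift;
    pop @{ $self->{layers} } if @{ $self->{layers} } > 1;    # keep the bottom layer
    return;
}

# Reads look from the top layer down; writes go to the top layer only.
sub FETCH {
    my ($self, $key) = @_;
    for my $layer (reverse @{ $self->{layers} }) {
        return $layer->{$key} if exists $layer->{$key};
    }
    return undef;
}
sub STORE  { my ($self, $key, $value) = @_; $self->{layers}[-1]{$key} = $value }
sub EXISTS {
    my ($self, $key) = @_;
    my $count = grep { exists $_->{$key} } @{ $self->{layers} };
    return $count ? 1 : '';
}
sub DELETE { my ($self, $key) = @_; return delete $self->{layers}[-1]{$key} }
sub CLEAR  { my $self = shift; %{ $self->{layers}[-1] } = (); return }

# Iteration walks the union of the keys in all layers.
sub FIRSTKEY {
    my $self = shift;
    my %seen;
    $self->{iter} = [ grep { !$seen{$_}++ } map { keys %$_ } @{ $self->{layers} } ];
    return shift @{ $self->{iter} };
}
sub NEXTKEY { my $self = shift; return shift @{ $self->{iter} } }

# Outer object: the tied hash itself, blessed into a different package.
package Demo::Layered;

sub new {
    my $class = shift;
    my %hash;
    tie %hash, 'Demo::Layered::Tie';
    return bless \%hash, $class;
}
sub push { my $self = shift; tied(%$self)->push_layer }
sub pop  { my $self = shift; tied(%$self)->pop_layer }

package main;

my $h = Demo::Layered->new;
$h->{a} = 31;
$h->push;
$h->{a} = 36;         # stored in the new top layer, hiding 31
print "$h->{a}\n";    # 36
$h->pop;
print "$h->{a}\n";    # 31
```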
(Note: this hasn't been converted to use sub constants for object fields yet.)
Behavior settings live in two places: the hash-wide default (set_default) and each layer's own setting (set_layer)
Layers have IDs
intervening_layer($target_index, $is_write)
behaviors for the hash
OK, OK, how does it work?
$layered_hash = [
$default_layer_state : 'cascade'
$layer_data_count : -1
$layer_data_list : [ $layer_data_1, $layer_data_2, ... ]
$var_val_stack_hash : { }
$iter_data : [ [ keys], key_index, intervening_layer_for_reads]
];
$layer_data = [
$sub_id,
$behavior,
'VAR1', # list of all variables that have values in this layer
'VAR2',
...
];
$var_val_stack_hash = {
'VAR1' => [ $layer_index_1, $val_1, $layer_index_2, $val_2, ... ]
...
};
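One way the layer data and the per-variable stacks could be kept in step, assuming (1) each variable's pairs are stored in increasing layer order and (2) a layer's variable list exists so that popping the layer knows which stacks to trim. `push_layer`, `store_var` and `pop_layer` are hypothetical helpers, and using the layer index as the `$sub_id` is a placeholder. The usage lines replay part of the sample execution above.

```perl
use strict;
use warnings;

my %var_val_stack;      # 'VAR' => [ layer_index, value, layer_index, value, ... ]
my @layer_data_list;    # one entry per layer: [ sub_id, behavior, 'VAR1', 'VAR2', ... ]

# Start a new top layer.  The real sub_id is unknown; the index stands in for it.
sub push_layer {
    my ($behavior) = @_;
    my $index = scalar @layer_data_list;
    push @layer_data_list, [ $index, $behavior ];
    return $index;
}

# Store a value in the top layer.
sub store_var {
    my ($key, $value) = @_;
    my $index = $#layer_data_list;
    my $stack = $var_val_stack{$key} ||= [];
    if (@$stack && $stack->[-2] == $index) {
        $stack->[-1] = $value;                       # key already set in this layer
    } else {
        push @$stack, $index, $value;                # first value in this layer
        push @{ $layer_data_list[$index] }, $key;    # remember it for pop_layer
    }
    return;
}

# Drop the top layer; its variable list says which stacks to trim.
sub pop_layer {
    my $layer = pop @layer_data_list or return;
    my $index = scalar @layer_data_list;             # the index the popped layer had
    my (undef, undef, @vars) = @$layer;
    for my $key (@vars) {
        my $stack = $var_val_stack{$key} or next;
        splice @$stack, -2 if @$stack && $stack->[-2] == $index;
        delete $var_val_stack{$key} unless @$stack;
    }
    return;
}

push_layer('default');
store_var(a => 31);
store_var(c => 33);
push_layer('opaque');
store_var(c => 34);
store_var(d => 35);
print "c: @{ $var_val_stack{c} }\n";    # c: 0 33 1 34
pop_layer();
print "c: @{ $var_val_stack{c} }\n";    # c: 0 33
print exists $var_val_stack{d} ? "d present\n" : "d gone\n";    # d gone
```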
Huh?
Lookup of a key
Iteration
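A sketch of lookup and iteration over `$var_val_stack_hash`, assuming each variable's pairs are kept in increasing layer order and that reads may not see below some floor layer (our guess at what `intervening_layer` computes for reads). Under those assumptions only a key's topmost pair can be visible, so lookup is a single comparison, and iteration snapshots the visible keys into a structure shaped like `$iter_data`. The data is the state right after the 'opaque' step of the sample execution; `visible`, `lookup`, `first_key`, `next_key` and `$read_floor` are hypothetical names.

```perl
use strict;
use warnings;

# State after the "opaque" step of the sample execution.
my %var_val_stack = (
    a => [ 0, 31 ],
    b => [ 0, 32 ],
    c => [ 0, 33, 1, 34 ],
    d => [ 1, 35 ],
);
# Lowest layer a read may reach.  Layer 1 is opaque, so reads stop there.
# (A stand-in for what we assume intervening_layer(top_index, false) returns.)
my $read_floor = 1;

# Only a key's topmost pair can be visible, since pairs are kept in layer order.
sub visible {
    my ($key) = @_;
    my $stack = $var_val_stack{$key} or return 0;
    return $stack->[-2] >= $read_floor;
}

sub lookup {
    my ($key) = @_;
    return visible($key) ? $var_val_stack{$key}[-1] : undef;
}

# Iteration state shaped like $iter_data: [ [keys], key_index, floor ].
my $iter_data;

sub first_key {
    # Snapshot the visible keys (sorted only to make the output predictable).
    my @keys = grep { visible($_) } sort keys %var_val_stack;
    $iter_data = [ \@keys, 0, $read_floor ];
    return next_key();
}

sub next_key {
    my ($keys, $i) = @$iter_data;
    return undef if $i > $#$keys;
    $iter_data->[1]++;
    return $keys->[$i];
}

print defined lookup('a') ? "a visible\n" : "a hidden\n";    # a hidden
print lookup('c'), "\n";                                      # 34
for (my $k = first_key(); defined $k; $k = next_key()) {
    print "$k\n";                                             # c, then d
}
```

The hidden `a` matches the sample's `assert(! defined $h->{'a'})` after `set_layer('opaque')`.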