Copyright © 2003, 2004, 2005, 2006, 2007 Igor Russkih (Cail Lomecb)
Abstract
Example 1. Common HRC file
<?xml version="1.0"?>
<!DOCTYPE hrc PUBLIC "-//Cail Lomecb//DTD Colorer HRC take5//EN"
  "http://colorer.sf.net/2003/hrc.dtd">
<hrc version="take5" xmlns="http://colorer.sf.net/2003/hrc"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://colorer.sf.net/2003/hrc
                         http://colorer.sf.net/2003/hrc.xsd">
  <annotation>
    <documentation>
      your documentation...
    </documentation>
  </annotation>
  your definitions...
</hrc>
Use <import type='def'/> to import all definitions from the 'def' type. Note that if several imported types define identical local names, the names are resolved in the order of the import statements, i.e. the definition from the first imported type is used.
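As a minimal sketch of this resolution rule, the hypothetical type below imports 'def' first and a made-up type 'mylib' second. It assumes, purely for illustration, that both types define a region with the local name Number; neither assumption is stated in this section.

```xml
<type name="mylang">
  <!-- imported types are consulted in this order -->
  <import type="def"/>
  <import type="mylib"/> <!-- hypothetical type, not part of the distribution -->

  <scheme name="mylang">
    <!-- the unqualified name 'Number' is taken from 'def',
         because 'def' is imported first -->
    <regexp match="/\b\d+\b/" region="Number"/>
    <!-- a qualified name selects the other definition explicitly -->
    <regexp match="/\b0x[\dA-Fa-f]+\b/" region="mylib:Number"/>
  </scheme>
</type>
```

When in doubt about which definition wins, using the qualified form (type:name) keeps the intent independent of the import order.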
Example 4. Sample type definition
<type name="somelang">
  <region name="Keyword" description="This language's keyword"/>
  <scheme name="somelang">
    <keywords region="Keyword">
      <word name='word1'/><word name='word2'/>
      <word name='otherkeyword'/>
    </keywords>
    <regexp match="/other(keyword)?/i" region="Keyword"/>
  </scheme>
</type>
Table A.1. Metacharacters
| ^ | Match the beginning of the line |
| $ | Match the end of the line |
| . | Match any character (except \r\n) |
| [...] | Match any character in set |
| [^...] | Match any character that is not in set. RE operators do not work inside a set, but some metacharacters and the range operator are allowed: a-z stands for all alphabet characters between a and z. A Unicode class reference can be used inside an RE in the form [{ASSIGNED}-[{Lu}]-[{Ll}]]. Additional boolean operations: -[] (class subtraction) and &&[] (class intersection). See Unicode RE TR#18 for more information. |
| \# | The literal symbol '#' following the backslash, where '#' is any symbol except a-z and 1-9 |
| \b | Word break at this point |
| \B | No word break at this point |
| \xHH, \x{HHHH} | HH, HHHH - character code (hex) |
| \n | 0x0A (lf) |
| \r | 0x0D (cr) |
| \t | 0x09 (tab) |
| \s | Whitespace character (tab/space/cr/lf) |
| \S | Not whitespace |
| \w | Word symbol (chars, digits, _) |
| \W | Not word symbols |
| \d | Digit |
| \D | Not Digit |
| \u | Uppercase symbol |
| \l | Lowercase symbol |
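To make the table more concrete, here is a small sketch of a type that uses several of these metacharacters in <regexp> elements. The type, scheme, and region names are invented for illustration. The last pattern assumes that class subtraction combines with ordinary ranges in the same way as in the Unicode class example given in the table.

```xml
<type name="metademo">
  <region name="Number" description="Decimal number"/>
  <region name="Capitalized" description="Capitalized word"/>
  <region name="Copyright" description="Copyright sign"/>
  <region name="Trailing" description="Trailing whitespace"/>
  <region name="Consonants" description="Run of lowercase consonants"/>

  <scheme name="metademo">
    <!-- \b: word break, \d: digit -->
    <regexp match="/\b\d+\b/" region="Number"/>
    <!-- \u: uppercase symbol, \l: lowercase symbol -->
    <regexp match="/\b\u\l+\b/" region="Capitalized"/>
    <!-- \x{HHHH}: character by hex code (U+00A9) -->
    <regexp match="/\x{00A9}/" region="Copyright"/>
    <!-- \s: whitespace, $: end of the line -->
    <regexp match="/\s+$/" region="Trailing"/>
    <!-- a-z minus the vowels (assumed subtraction syntax) -->
    <regexp match="/[a-z-[aeiou]]+/" region="Consonants"/>
  </scheme>
</type>
```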
Table A.2. Extended Metacharacters
| \c | The preceding symbol is not a word symbol |
| \N | Back-reference from inside a regexp to one of its own bracket pairs. N is the number of the bracket pair. This operator works only with non-operator symbols in the bracket. |
Table A.3. Colorer-take5 Parsing Metacharacters
| ~ | Matches the start of the parent scheme (the end of the <start> tag). |
| \m | Changes start of regexp |
| \M | Changes end of regexp |
| \yN \YN \y{name} \Y{name} | Link to an external regexp (used in the <end> token to refer to a bracket of the <start> token). N is the number of the required bracket pair; name is the name of a named bracket. |
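The \yN link is useful when the end of a block depends on what its start matched. The sketch below assumes a here-document style construct (<<MARK ... MARK). All type, scheme, and region names are hypothetical. The region01 and region10 attributes are assumed to address bracket 1 of the start match and the whole end match respectively.

```xml
<type name="heredemo">
  <region name="HereDoc" description="Here-document text"/>
  <region name="HereMark" description="Here-document delimiter"/>

  <!-- empty scheme: nothing is parsed inside the block -->
  <scheme name="HereContent"/>

  <scheme name="heredemo">
    <!-- start remembers the delimiter word in bracket 1;
         \y1 in the end regexp must match that same word on a line of its own -->
    <block start="/&lt;&lt;(\w+)/" end="/^\y1$/"
           scheme="HereContent" region="HereDoc"
           region01="HereMark" region10="HereMark"/>
  </scheme>
</type>
```

Note that the '<' characters of the start pattern have to be written as &lt; because the regexp lives inside an XML attribute.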
Table A.4. Operators
| ( ) | Group and remember characters for later use. |
| (?{name} ) | Group and remember characters using named group. |
| (?{} ) or (?: ) | Group characters, but don't remember (unnamed group). |
| (?{} ) | Group and remember characters using unnamed uncounted group. |
| | | Alternative. Match previous or next pattern. |
| * | Match preceding pattern 0 or more times. |
| + | Match preceding pattern 1 or more times. |
| ? | Match preceding pattern 0 or 1 time. |
| {n} | Repeat n times. |
| {n,} | Repeat n or more times. |
| {n,m} | Repeat from n to m times. |
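A short sketch of how grouping, alternation, and the repetition operators might be combined in practice follows. The names are invented for illustration. The region1 and region2 attributes (declared in the HRC schema) are assumed to assign regions to bracket pairs 1 and 2 of the match.

```xml
<type name="opdemo">
  <region name="Literal" description="Boolean literal"/>
  <region name="Date" description="ISO-like date"/>
  <region name="Color" description="Hex color"/>
  <region name="Key" description="Left side of an assignment"/>
  <region name="Value" description="Right side of an assignment"/>

  <scheme name="opdemo">
    <!-- ( ) with |: either alternative -->
    <regexp match="/\b(true|false)\b/" region="Literal"/>
    <!-- {n}: exact repetition -->
    <regexp match="/\b\d{4}-\d{2}-\d{2}\b/" region="Date"/>
    <!-- {n,m}: bounded repetition -->
    <regexp match="/#[\dA-Fa-f]{3,6}\b/" region="Color"/>
    <!-- +, *, and remembered groups: key = 123 -->
    <regexp match="/(\w+)\s*=\s*(\d+)/" region1="Key" region2="Value"/>
  </scheme>
</type>
```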
Table A.5. Extended Operators
| ?#N | Look-behind. N is the number of symbols to look behind. |
| ?~N | Negative look-behind. |
| ?= | Look-ahead. |
| ?! | Negative Look-ahead. |
Example A.1. RE examples
<schema targetNamespace="http://colorer.sf.net/2003/catalog"
        elementFormDefault="qualified"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <element name="catalog" type="catalog"/>
  <complexType name="catalog">
    <sequence>
      <element name="hrc-sets" type="hrc-sets"/>
      <element name="hrd-sets" type="hrd-sets"/>
    </sequence>
  </complexType>
  <complexType name="hrc-sets">
    <sequence>
      <element name="location" type="location" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="log-location" type="xs:string"/>
  </complexType>
  <complexType name="hrd-sets">
    <sequence>
      <element name="hrd" type="hrd-entry" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <complexType name="hrd-entry">
    <sequence>
      <element name="location" type="location" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="class" type="xs:NMTOKEN" use="required"/>
    <attribute name="name" type="xs:NMTOKEN" use="required"/>
    <attribute name="description" type="xs:string"/>
  </complexType>
  <complexType name="location">
    <attribute name="link" type="xs:string" use="required"/>
  </complexType>
</schema>
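For orientation, a catalog document that follows the structure of this schema could look like the sketch below. The file paths, the log file name, and the class/name values are placeholders chosen for illustration, not values taken from a real installation.

```xml
<?xml version="1.0"?>
<catalog xmlns="http://colorer.sf.net/2003/catalog">
  <!-- one or more locations of HRC sets; log-location is optional -->
  <hrc-sets log-location="colorer.log">
    <location link="hrc/proto.hrc"/>
  </hrc-sets>

  <!-- zero or more HRD entries, each with at least one location -->
  <hrd-sets>
    <hrd class="rgb" name="default" description="Default colors">
      <location link="hrd/rgb/default.hrd"/>
    </hrd>
  </hrd-sets>
</catalog>
```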
<schema targetNamespace="http://colorer.sf.net/2003/hrd"
        elementFormDefault="qualified"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <element name="hrd" type="hrd"/>
  <complexType name="hrd">
    <sequence>
      <element name="documentation" type="documentation" minOccurs="0"/>
      <sequence minOccurs="0" maxOccurs="unbounded">
        <element name="assign" type="assign"/>
      </sequence>
    </sequence>
  </complexType>
  <complexType name="documentation" mixed="true">
    <sequence minOccurs="0" maxOccurs="unbounded">
      <any namespace="##other" processContents="skip"/>
    </sequence>
  </complexType>
  <complexType name="assign">
    <attribute name="name" use="required" type="region-name"/>
    <attribute name="fore" type="color"/>
    <attribute name="back" type="color"/>
    <attribute name="style" type="style"/>
    <attribute name="stext" type="xs:string"/>
    <attribute name="etext" type="xs:string"/>
    <attribute name="sback" type="xs:string"/>
    <attribute name="eback" type="xs:string"/>
  </complexType>
  <simpleType name="region-name">
    <restriction base="xs:string">
      <pattern value="\i\c*\:\i\c*"/>
    </restriction>
  </simpleType>
  <simpleType name="color">
    <restriction base="xs:string">
      <pattern value="#?[\dA-Fa-f]{1,6}"/>
    </restriction>
  </simpleType>
  <simpleType name="style">
    <restriction base="xs:string">
      <pattern value="\d"/>
    </restriction>
  </simpleType>
</schema>
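A minimal HRD document shaped after this schema might look as follows. The region names def:Keyword and def:Comment and the concrete color and style values are assumptions made for illustration. The schema only requires that name is a qualified region name, that colors are up to six hex digits with an optional leading '#', and that style is a single digit.

```xml
<?xml version="1.0"?>
<hrd xmlns="http://colorer.sf.net/2003/hrd">
  <documentation>Hypothetical color scheme used as an example.</documentation>

  <!-- name must be qualified (type:region) -->
  <assign name="def:Keyword" fore="#0000FF" style="1"/>
  <assign name="def:Comment" fore="#808080" back="#FFFFFF"/>
</hrd>
```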
<schema targetNamespace="http://colorer.sf.net/2003/hrc"
        elementFormDefault="qualified"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <simpleType name="REstring">
    <restriction base="xs:string">
      <whiteSpace value="collapse"/>
      <pattern value="/.*/[ix]*"/>
    </restriction>
  </simpleType>
  <simpleType name="REworddiv">
    <restriction base="xs:string">
      <whiteSpace value="collapse"/>
      <pattern value="\[.*\]|%.*;"/>
    </restriction>
  </simpleType>
  <simpleType name="REentity">
    <restriction base="xs:string">
      <whiteSpace value="collapse"/>
      <pattern value=".*"/>
    </restriction>
  </simpleType>
  <simpleType name="REstring-or-null">
    <union memberTypes="REstring">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value=""/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
  <simpleType name="QName">
    <restriction base="xs:QName">
      <pattern value="(\i\c*:)?\i\c*"/>
    </restriction>
  </simpleType>
  <attributeGroup name="regionX">
    <attribute name="region" type="QName"/>
    <attribute name="region0" type="QName"/>
    <attribute name="region1" type="QName"/>
    <attribute name="region2" type="QName"/>
    <attribute name="region3" type="QName"/>
    <attribute name="region4" type="QName"/>
    <attribute name="region5" type="QName"/>
    <attribute name="region6" type="QName"/>
    <attribute name="region7" type="QName"/>
    <attribute name="region8" type="QName"/>
    <attribute name="region9" type="QName"/>
    <attribute name="regiona" type="QName"/>
    <attribute name="regionb" type="QName"/>
    <attribute name="regionc" type="QName"/>
    <attribute name="regiond" type="QName"/>
    <attribute name="regione" type="QName"/>
    <attribute name="regionf" type="QName"/>
  </attributeGroup>
  <element name="hrc" type="hrc"/>
  <complexType name="hrc">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="prototype" type="prototype"/>
        <element name="package" type="package"/>
        <element name="type" type="type"/>
      </choice>
    </sequence>
    <attribute name="version" type="xs:NMTOKEN" use="required"/>
  </complexType>
  <complexType name="annotation">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="appinfo">
        <complexType mixed="true">
          <sequence minOccurs="0" maxOccurs="unbounded">
            <any namespace="##other" processContents="lax"/>
          </sequence>
        </complexType>
      </element>
      <element name="documentation">
        <complexType mixed="true">
          <sequence minOccurs="0" maxOccurs="unbounded">
            <any namespace="##other" processContents="skip"/>
          </sequence>
        </complexType>
      </element>
      <element name="contributors">
        <complexType mixed="true">
          <sequence minOccurs="0" maxOccurs="unbounded">
            <any namespace="##other" processContents="lax"/>
          </sequence>
        </complexType>
      </element>
    </choice>
  </complexType>
  <complexType name="package">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <element name="location" type="location" minOccurs="0"/>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="description" type="xs:string" use="required"/>
    <attribute name="targetNamespace" type="xs:anyURI"/>
  </complexType>
  <complexType name="prototype">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <element name="location" type="location" minOccurs="0"/>
      <element name="filename" type="filename" minOccurs="0" maxOccurs="unbounded"/>
      <element name="firstline" type="firstline" minOccurs="0" maxOccurs="unbounded"/>
      <element name="parameters" type="parameters" minOccurs="0"/>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="description" type="xs:string" use="required"/>
    <attribute name="group" type="xs:Name"/>
    <attribute name="targetNamespace" type="xs:anyURI"/>
  </complexType>
  <complexType name="location">
    <attribute name="link" type="xs:anyURI" use="required"/>
  </complexType>
  <complexType name="filename">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="2"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="firstline">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="1"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="parameters">
    <sequence minOccurs="0" maxOccurs="unbounded">
      <element name="param">
        <complexType>
          <attribute name="name" type="xs:string" use="required"/>
          <attribute name="value" type="xs:string" use="required"/>
          <attribute name="description" type="xs:string" use="optional"/>
        </complexType>
      </element>
    </sequence>
  </complexType>
  <complexType name="type">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="annotation" type="annotation"/>
      <element name="import" type="import"/>
      <element name="region" type="region"/>
      <element name="entity" type="entity"/>
      <element name="scheme" type="scheme"/>
    </choice>
    <attribute name="name" type="xs:NCName" use="required"/>
  </complexType>
  <complexType name="scheme">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="regexp" type="regexp"/>
        <element name="block" type="block"/>
        <element name="keywords" type="keywords"/>
        <element name="inherit" type="inherit"/>
      </choice>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="if" type="xs:NCName" use="optional"/>
    <attribute name="unless" type="xs:NCName" use="optional"/>
  </complexType>
  <complexType name="import">
    <attribute name="type" type="xs:NCName" use="required"/>
  </complexType>
  <complexType name="entity">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="value" type="REentity" use="required"/>
  </complexType>
  <complexType name="region">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="parent" type="QName"/>
    <attribute name="description" type="xs:string"/>
  </complexType>
  <complexType name="regexp">
    <complexContent>
      <extension base="blockInner">
        <attribute name="region" type="QName"/>
        <attribute name="priority" type="priority" default="normal"/>
      </extension>
    </complexContent>
  </complexType>
  <simpleType name="priority">
    <restriction base="xs:string">
      <enumeration value="low"/>
      <enumeration value="normal"/>
    </restriction>
  </simpleType>
  <complexType name="block">
    <sequence minOccurs="0">
      <element name="start" type="blockInner"/>
      <element name="end" type="blockInner"/>
    </sequence>
    <attribute name="start" type="REstring"/>
    <attribute name="end" type="REstring"/>
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="priority" type="priority" default="normal"/>
    <attribute name="content-priority" type="priority" default="normal"/>
    <attribute name="inner-region" default="no">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value="yes"/>
          <enumeration value="no"/>
        </restriction>
      </simpleType>
    </attribute>
    <attributeGroup ref="regionXX"/>
  </complexType>
  <attributeGroup name="regionXX">
    <attribute name="region" type="QName"/>
    <attribute name="region00" type="QName"/>
    <attribute name="region01" type="QName"/>
    <attribute name="region02" type="QName"/>
    <attribute name="region03" type="QName"/>
    <attribute name="region04" type="QName"/>
    <attribute name="region05" type="QName"/>
    <attribute name="region06" type="QName"/>
    <attribute name="region07" type="QName"/>
    <attribute name="region08" type="QName"/>
    <attribute name="region09" type="QName"/>
    <attribute name="region0a" type="QName"/>
    <attribute name="region0b" type="QName"/>
    <attribute name="region0c" type="QName"/>
    <attribute name="region0d" type="QName"/>
    <attribute name="region0e" type="QName"/>
    <attribute name="region0f" type="QName"/>
    <attribute name="region10" type="QName"/>
    <attribute name="region11" type="QName"/>
    <attribute name="region12" type="QName"/>
    <attribute name="region13" type="QName"/>
    <attribute name="region14" type="QName"/>
    <attribute name="region15" type="QName"/>
    <attribute name="region16" type="QName"/>
    <attribute name="region17" type="QName"/>
    <attribute name="region18" type="QName"/>
    <attribute name="region19" type="QName"/>
    <attribute name="region1a" type="QName"/>
    <attribute name="region1b" type="QName"/>
    <attribute name="region1c" type="QName"/>
    <attribute name="region1d" type="QName"/>
    <attribute name="region1e" type="QName"/>
    <attribute name="region1f" type="QName"/>
  </attributeGroup>
  <complexType name="blockInner">
    <simpleContent>
      <extension base="REstring">
        <attributeGroup ref="regionX"/>
        <attribute name="match" type="REstring"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="inherit">
    <sequence>
      <element name="virtual" type="virtual" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="virtual">
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="subst-scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="keywords">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="word" type="word"/>
      <element name="symb" type="symb"/>
    </choice>
    <attribute name="ignorecase" default="yes">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value="yes"/>
          <enumeration value="no"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="region" type="QName"/>
    <attribute name="priority" type="priority" default="low"/>
    <attribute name="worddiv" type="REworddiv"/>
  </complexType>
  <complexType name="symb">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
  <complexType name="word">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
</schema>
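As an illustration of the prototype part of this schema, the sketch below declares a prototype for the hypothetical language 'somelang'. The file name, group name, extension, and regexps are invented. Only the element order, the /.../ form of the patterns, and the default weights (2 for <filename>, 1 for <firstline>) are taken from the schema.

```xml
<?xml version="1.0"?>
<hrc version="take5" xmlns="http://colorer.sf.net/2003/hrc">
  <prototype name="somelang" group="main" description="Some Language">
    <!-- where the type definition itself is stored (hypothetical path) -->
    <location link="somelang.hrc"/>
    <!-- file name pattern; weight defaults to 2 when omitted -->
    <filename>/\.sl$/i</filename>
    <!-- first line pattern with an explicit weight (default is 1) -->
    <firstline weight="3">/^#!.*somelang/</firstline>
    <parameters>
      <param name="tab-width" value="4" description="Hypothetical parameter"/>
    </parameters>
  </prototype>
</hrc>
```

The child elements have to appear in the order shown (annotation, location, filename, firstline, parameters), since the schema declares them as a sequence.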
take5.be5, 26 April 2007
* disambiguation concerning language features and conventions + finish rewriting - file still contains some other ideas for improvement in comments + rephrasings and fixes to chapter 5 and more + proofreading fixes up to part 5 - coding conventions + rephrase regex notes and document region features of the block * minor clarifications core syntax -> basics + Igor remarks quickly go with edit fixes till the end of 3rd chapter rewrote keyword lists and RE descriptions process attributes derived from extended types spellcheck reduce amount of text to be placed in reader's buffer to gain an idea of connection between scheme and region name space clarifications (not complete yet) consistent description of what type is rewrap a package story continues - expanded prototypes language (confusing, eh?) - simplify core syntax - mess with the rest core syntax - two parts explanations simplified reworded introduction rephrased abstract + ids in examples and tables to fix compiler warnings replace confusing entities with meaningful names display element type only if it is different from element name add angle brackets to make XML nature of HRC elements obvious it doesn't make sense to output complexType element name twice consistent ids also for xsd reference generate consistent ids to be referenced from manual, add x:hrc reference element move 'parameters' complexType to be found by xslt for reference
take5.beta4, 28 April 2005