OpenXRI:Syntax 3
From I-names Development Wiki
The OpenXRI Syntax library offers methods for parsing XRIs. It allows an application to parse a string identifier and obtain information about it, e.g. the number of segments it consists of, the value of its path component, or its cross-references.
All operations performed by the Syntax library are purely string-based, i.e. there is no network access or other external operation.
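The following minimal Java sketch shows the kind of purely string-based decomposition described above. It splits an XRI into authority, path, query and fragment, strips an optional `xri:` prefix, and treats everything inside parentheses as an opaque cross-reference. The class `XriComponentsSketch` and its members are hypothetical illustrations, not the OpenXRI API, and the input is not validated against the ABNF.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the OpenXRI API: splits an XRI string into
 * authority, path, query and fragment without any network access.
 * Delimiters inside "( )" cross-references are ignored; the input is
 * not validated against the XRI 3.0 ABNF.
 */
final class XriComponentsSketch {
    final String authority;
    final String path;     // text after the first top-level "/", or null
    final String query;    // text after "?", or null
    final String fragment; // text after "#", or null

    private XriComponentsSketch(String authority, String path, String query, String fragment) {
        this.authority = authority;
        this.path = path;
        this.query = query;
        this.fragment = fragment;
    }

    static XriComponentsSketch parse(String xri) {
        String s = xri.startsWith("xri:") ? xri.substring(4) : xri;
        int slash = -1, question = -1, hash = -1, depth = 0;
        // Stop scanning once the fragment delimiter has been found.
        for (int i = 0; i < s.length() && hash < 0; i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--; // unbalanced parentheses are not detected here
            } else if (depth == 0) {
                if (c == '#') hash = i;
                else if (c == '?' && question < 0) question = i;
                else if (c == '/' && slash < 0 && question < 0) slash = i;
            }
        }
        int queryEnd = hash >= 0 ? hash : s.length();
        int pathEnd = question >= 0 ? question : queryEnd;
        int authorityEnd = slash >= 0 ? slash : pathEnd;
        return new XriComponentsSketch(
                s.substring(0, authorityEnd),
                slash >= 0 ? s.substring(slash + 1, pathEnd) : null,
                question >= 0 ? s.substring(question + 1, queryEnd) : null,
                hash >= 0 ? s.substring(hash + 1) : null);
    }

    /** Splits the path on top-level "/" characters. */
    List<String> pathSegments() {
        List<String> segments = new ArrayList<>();
        if (path == null) return segments;
        int depth = 0, start = 0;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == '/' && depth == 0) {
                segments.add(path.substring(start, i));
                start = i + 1;
            }
        }
        segments.add(path.substring(start));
        return segments;
    }
}
```

For example, parsing `@cordance/documentation/xri?page=overview#introduction` yields authority `@cordance`, path segments `documentation` and `xri`, query `page=overview` and fragment `introduction`. Tracking parenthesis depth is what keeps the `//` inside `$is$type+(http://schemas.xmlsoap.org)` from being mistaken for a path delimiter.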
The current OpenXRI Syntax library is based on the XRI Syntax 2.0 specification:
http://www.oasis-open.org/committees/download.php/15377
To support upcoming XRI resolution work as well as XDI implementations, the library needs to be updated to the new XRI Syntax 3.0 specification. XRI Syntax 3.0 is not yet a formal OASIS specification, but a proposed ABNF exists:
http://wiki.oasis-open.org/xri/XriThree/SyntaxAbnf
This ABNF is the basis for the new OpenXRI Syntax library.
Features
- The library’s main purpose is to take a string as input, parse it, and return a Java object that represents the parsed XRI or XRI component.
- Therefore, the library offers classes for XRIs and the main components of XRIs. These classes roughly correspond to rule names in the ABNF.
- List of classes:
- XRI3
- XRI3Reference
- XRI3Authority
- XRI3Segment
- XRI3SubSegment
- XRI3Path
- XRI3Query
- XRI3Fragment
- XRI3Literal
- XRI3XRef
- For each of these classes, an instance can be constructed from a string using a constructor.
- Each of these classes provides convenience methods for obtaining relevant information about the XRI or XRI component, and for listing / getting its sub-components (where relevant). Examples are:
- XRI.getAuthority()
- XRI.getPath()
- XRI.getQuery()
- XRI.getFragment()
- XRIAuthority.listSubSegments()
- XRISubSegment.isGlobal()
- XRISubSegment.isLocal()
- XRISubSegment.hasLiteral()
- XRISubSegment.hasReference()
- XRIPath.listSegments()
- The library is able to construct new instances of XRIs and XRI components not only from a string, but also from other XRI component instances. For example, given an XRI and a relative XRI reference, it is possible to construct a new XRI.
- The library is able to convert XRIs to IRI normal form and URI normal form. Note: IRI normal form and URI normal form are string representations of an XRI that follow additional rules (e.g. in URI normal form, non-ASCII characters are percent-encoded).
- All classes representing XRIs or XRI components implement Serializable and Comparable, and they implement meaningful equals(), hashCode(), toString() and compareTo() methods, which simply operate on the underlying string of the object.
- The library is able to check whether an XRI is a valid i-name or i-number as defined in sections 4.2 and 4.3 of the XDI.org Global Services Specifications. (XDI.org is the public trust organization responsible for the XRI global registry services, GRS.) I-names and i-numbers are XRIs that conform to a small number of additional character restrictions for usability and security.
- The library can convert a URI to an XRI (using a constructor).
- The library contains some constants (static object instances) that represent well-known XRI components that are inherent to XRI infrastructure.
- If a string cannot be parsed, an exception is thrown.
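Several of the features above (string-based equals/hashCode/compareTo, Serializable and Comparable, conversion to URI normal form, and an exception for unusable input) can be illustrated with one small value class. This Java sketch is hypothetical: `StringBackedXri` and `toURINormalForm()` are illustrative names, the constructor only rejects empty input instead of parsing, and the conversion covers only the RFC 3987 (section 3.1) step of percent-encoding non-ASCII characters as UTF-8, assuming the input is already in IRI normal form.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not the OpenXRI class: an XRI component whose
 * equals/hashCode/compareTo/toString all operate on the underlying string.
 */
final class StringBackedXri implements Serializable, Comparable<StringBackedXri> {
    private static final long serialVersionUID = 1L;
    private final String value;

    StringBackedXri(String value) {
        // Stands in for a real ABNF parse, which would throw on any invalid input.
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("cannot parse empty XRI");
        }
        this.value = value;
    }

    @Override public boolean equals(Object o) {
        return o instanceof StringBackedXri && value.equals(((StringBackedXri) o).value);
    }

    @Override public int hashCode() { return value.hashCode(); }

    @Override public String toString() { return value; }

    @Override public int compareTo(StringBackedXri other) { return value.compareTo(other.value); }

    /** Percent-encodes every non-ASCII code point as its UTF-8 bytes (RFC 3987, 3.1). */
    String toURINormalForm() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
        }
        return out.toString();
    }
}
```

For instance, `new StringBackedXri("=m\u00FCller").toURINormalForm()` returns `=m%C3%BCller`, while an all-ASCII XRI such as `@cordance` comes back unchanged. Deriving every comparison from the string keeps the semantics simple, at the cost of treating differently written but equivalent XRIs as unequal.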
The following features are candidates for inclusion:
- Support creation of random local i-number subsegments like !B7BD.2A1D.1040.58CD
- Provide compatibility support for 2.0 syntax, e.g. a method that attempts to convert a 2.0 XRI to a 3.0 XRI and vice versa
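The first candidate could look roughly like the following Java sketch. It assumes the format is exactly what the example suggests: `!` followed by four dot-separated groups of four uppercase hexadecimal digits. The class and method names are hypothetical, and the sketch makes no claim about uniqueness or about how registries actually assign i-numbers.

```java
import java.security.SecureRandom;

/** Hypothetical sketch: generates local i-number subsegments like "!B7BD.2A1D.1040.58CD". */
final class RandomINumberSketch {
    // SecureRandom makes generated values hard to predict.
    private static final SecureRandom RANDOM = new SecureRandom();

    static String randomLocalSubSegment() {
        StringBuilder sb = new StringBuilder("!");
        for (int group = 0; group < 4; group++) {
            if (group > 0) sb.append('.');
            // nextInt(0x10000) yields 0..65535, i.e. exactly four hex digits when zero-padded.
            sb.append(String.format("%04X", RANDOM.nextInt(0x10000)));
        }
        return sb.toString();
    }
}
```

Each call yields 64 random bits; collisions are therefore unlikely, but a production version would still need some uniqueness check within its authority.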
Test cases
This section contains a number of sample XRIs in 3.0 syntax, together with the information the OpenXRI Syntax library is able to infer from each of them.
- =peacekeeper
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is = and whose value is a literal
- is an i-name
- xri:@cordance
- is an XRI with scheme
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is @ and whose value is a literal
- is an i-name
- @cordance*drummond
- authority consists of 2 subsegments
- 1st subsegment is a global subsegment, whose gcs is @ and whose value is a literal
- 2nd subsegment is a local subsegment, whose lcs is * and whose value is a literal
- is an i-name
- @cordance/+hr
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is @ and whose value is a literal
- path consists of 1 segment
- 1st segment of path consists of 1 subsegment
- 1st subsegment of 1st segment of path is a global subsegment, whose gcs is + and whose value is a literal
- @cordance/(+hr)
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is @ and whose value is a literal
- path consists of 1 segment
- 1st segment of path consists of 1 subsegment
- 1st subsegment of 1st segment of path is a cross reference
- @cordance/documentation/xri?page=overview#introduction
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is @ and whose value is a literal
- path consists of 2 segments
- 1st segment of path consists of 1 subsegment
- 1st subsegment of 1st segment of path is a literal
- 2nd segment of path consists of 1 subsegment
- 1st subsegment of 2nd segment of path is a literal
- has query "page=overview"
- has fragment "introduction"
- +!123
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a local subsegment, whose lcs is ! and whose value is a literal
- is an i-number
- =!B7BD.2A1D.1040.58CD!2000
- authority consists of 2 subsegments
- 1st subsegment is a global subsegment, whose gcs is = and whose value is a local subsegment, whose lcs is ! and whose value is a literal
- 2nd subsegment is a local subsegment, whose lcs is ! and whose value is a literal
- is an i-number
- +person
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a literal
- +person+name
- authority consists of 2 subsegments
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a literal
- 2nd subsegment is a global subsegment, whose gcs is + and whose value is a literal
- +person+address+street
- authority consists of 3 subsegments
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a literal
- 2nd subsegment is a global subsegment, whose gcs is + and whose value is a literal
- 3rd subsegment is a global subsegment, whose gcs is + and whose value is a literal
- +person/$has/+name
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a literal
- path consists of 2 segments
- 1st segment of path consists of 1 subsegment
- 1st subsegment of 1st segment of path is a global subsegment, whose gcs is $ and whose value is a literal
- 2nd segment of path consists of 1 subsegment
- 1st subsegment of 2nd segment of path is a global subsegment, whose gcs is + and whose value is a literal
- =markus/$is$a/+person
- authority consists of 1 subsegment
- 1st subsegment is a global subsegment, whose gcs is = and whose value is a literal
- path consists of 2 segments
- 1st segment of path consists of 2 subsegments
- 1st subsegment of 1st segment of path is a global subsegment, whose gcs is $ and whose value is a literal
- 2nd subsegment of 1st segment of path is a global subsegment, whose gcs is $ and whose value is a literal
- 2nd segment of path consists of 1 subsegment
- 1st subsegment of 2nd segment of path is a global subsegment, whose gcs is + and whose value is a literal
- +!15+!16$v!3
- authority consists of 4 subsegments
- 1st subsegment is a global subsegment, whose gcs is + and whose value is a local subsegment, whose lcs is ! and whose value is a literal
- 2nd subsegment is a global subsegment, whose gcs is + and whose value is a local subsegment, whose lcs is ! and whose value is a literal
- 3rd subsegment is a global subsegment, whose gcs is $ and whose value is a literal
- 4th subsegment is a local subsegment, whose lcs is ! and whose value is a literal
- $type*mime+text
- authority consists of 3 subsegments
- 1st subsegment is a global subsegment, whose gcs is $ and whose value is a literal
- 2nd subsegment is a local subsegment, whose lcs is * and whose value is a literal
- 3rd subsegment is a global subsegment, whose gcs is + and whose value is a literal
- $is$type+(http://schemas.xmlsoap.org)
- authority consists of 3 subsegments
- 1st subsegment is a global subsegment, whose gcs is $ and whose value is a literal
- 2nd subsegment is a global subsegment, whose gcs is $ and whose value is a literal
- 3rd subsegment is a global subsegment, whose gcs is + and whose value is a cross reference, which is an IRI
- +name and +first
- +name is a valid XRI
- +first is a valid XRI reference
- --> new XRI +name+first can be constructed
- = and http://markus.myopenid.net
- = is a valid gcs
- http://markus.myopenid.net is a valid URI
- --> new XRI =(http://markus.myopenid.net) can be constructed
- =!B7BD.2A1D.1040.58CD and !2000
- =!B7BD.2A1D.1040.58CD is a valid XRI authority
- !2000 is a valid XRI subsegment
- --> new XRI authority =!B7BD.2A1D.1040.58CD!2000 can be constructed
- = and !B7BD.2A1D.1040.58CD
- = is a valid gcs
- !B7BD.2A1D.1040.58CD is a valid local subsegment
- --> new global subsegment =!B7BD.2A1D.1040.58CD can be constructed
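The subsegment counts above follow a simple scanning rule, sketched here in Java as an assumption rather than as the library's implementation: outside parentheses, a new subsegment starts at every gcs character (`=`, `@`, `+`, `$`) or lcs character (`*`, `!`), except that an lcs character directly after a gcs character belongs to the same global subsegment (as in `+!123`). The composition helpers mirror the last four test cases. All names are hypothetical, and no ABNF validation is performed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: subsegment splitting and composition for XRI 3.0 strings. */
final class SubSegmentSketch {

    static boolean isGcs(char c) { return c == '=' || c == '@' || c == '+' || c == '$'; }

    static boolean isLcs(char c) { return c == '*' || c == '!'; }

    /** Splits an authority or a single path segment into subsegments; "(...)" is opaque. */
    static List<String> split(String s) {
        List<String> subSegments = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') { depth++; continue; }
            if (c == ')') { depth--; continue; }
            if (depth > 0 || i == start) continue;
            // "+!123": an lcs right after a gcs is part of the same global subsegment
            boolean lcsAfterGcs = isLcs(c) && isGcs(s.charAt(i - 1));
            if ((isGcs(c) || isLcs(c)) && !lcsAfterGcs) {
                subSegments.add(s.substring(start, i));
                start = i;
            }
        }
        if (start < s.length()) subSegments.add(s.substring(start));
        return subSegments;
    }

    /** "+name" + "+first" gives "+name+first"; "=" + "!B7BD..." gives "=!B7BD...". */
    static String append(String base, String subSegment) {
        if (subSegment.isEmpty() || !(isGcs(subSegment.charAt(0)) || isLcs(subSegment.charAt(0)))) {
            throw new IllegalArgumentException("not a subsegment: " + subSegment);
        }
        return base + subSegment;
    }

    /** "=" and "http://markus.myopenid.net" give "=(http://markus.myopenid.net)". */
    static String crossReference(char gcs, String iri) {
        if (!isGcs(gcs)) throw new IllegalArgumentException("not a gcs: " + gcs);
        return gcs + "(" + iri + ")";
    }
}
```

Applied to `+!15+!16$v!3`, `split` returns `+!15`, `+!16`, `$v` and `!3`, matching the four subsegments listed above; applied to `$is$type+(http://schemas.xmlsoap.org)` it returns three, because the cross-reference is skipped as a unit.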
Implementation
The Java ABNF parser generator aParse is used to generate a parser from the XRI ABNF rules. The ABNF from http://wiki.oasis-open.org/xri/XriCd02/XriAbnf2dot1 was used, with the following adaptations:
# 1) aParse requires that ; is used to signal the end of a rule.
# 2) aParse requires that # is used for comments.
# 3) aParse doesn't support prose-val. The "ipath-empty" rule has been adjusted accordingly.
# 4) aParse requires that the case of rule names and rule references must match. References
# to the IRI rule have been adjusted accordingly.
# 5) aParse doesn't support "backtracking". As soon as a lower level rule succeeds in one
# way, aParse never tries to match it in a different way, even if that may be required
# for successfully matching a higher level rule. This causes a problem in the xref and
# xref-value rules. The xref rule has therefore been adjusted, and the xref-value rule
# has been replaced with the three rules xref-empty, xref-xri-reference and xref-IRI.
# This is only a change to the ABNF's internal structure and does not affect parsing results.
# 6) Due to the same aParse problem, the "(" and ")" characters have been removed from
# the sub-delims rule. Otherwise parsing of the xref rule would not work correctly if
# the xref is an IRI. Unfortunately this DOES affect parsing results (IRIs can no longer
# contain parentheses).
# 7) Java regular expressions only support the BMP of Unicode. The "ucschar" and "iprivate"
# rules have been adjusted accordingly.
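The interaction described in adaptations 5 and 6 can be reproduced in miniature with `java.util.regex`, whose possessive quantifier `*+` never gives characters back, much as aParse never re-matches a lower-level rule once it has succeeded. This is an analogy rather than aParse itself: the character classes below are a toy stand-in for the real xref and IRI rules.

```java
import java.util.regex.Pattern;

/** Analogy sketch: why a non-backtracking matcher forces "(" and ")" out of the IRI characters. */
final class NoBacktrackingSketch {
    // Toy stand-in for the IRI characters allowed inside an xref, WITH parentheses.
    static final String WITH_PARENS = "[A-Za-z0-9:/.()]";
    // The same class with "(" and ")" removed, as in adaptation 6.
    static final String WITHOUT_PARENS = "[A-Za-z0-9:/.]";

    /** Greedy "*": may give back the closing ")" so that the xref can still match. */
    static boolean backtracking(String charClass, String input) {
        return Pattern.matches("\\(" + charClass + "*\\)", input);
    }

    /** Possessive "*+": once the inner rule has consumed characters, they stay consumed. */
    static boolean noBacktracking(String charClass, String input) {
        return Pattern.matches("\\(" + charClass + "*+\\)", input);
    }
}
```

With parentheses in the class, only the backtracking version accepts `(http://a.b)`, because the possessive inner match swallows the closing parenthesis. With them removed, the non-backtracking version accepts `(http://a.b)` but rejects `(http://a.b/(x))`, which is the parsing difference adaptation 6 accepts.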
Packaging
This is still undecided. My current instinct is that the new 3.0 functionality should extend the existing OpenXRI Syntax project with one additional Java package. This way, users of the 2.0 syntax classes will still be able to use them.
Alternatives would be to completely replace the existing implementation, or to create a completely new package for the new 3.0 functionality.
