XProc 2.0: Standard Step Library

W3C Editor's Draft 2 September 2015 at 15:09 UTC (build 168)

This Version:: https://ndw.github.io/specification/langspec
Latest Version:: http://www.w3.org/TR/xproc20-steps/
Editors:: Norman Walsh, MarkLogic Corporation <norman.walsh@marklogic.com>; Alex Milowski, Invited expert <alex@milowski.org>; Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
Repository:: This specification on GitHub; Report an issue
Changes:: Diff against current “status quo” draft; Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.

Abstract

This specification describes the standard step vocabulary of XProc 2.0: An XML Pipeline Language.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document is a product of the XML Processing Model Working Group as part of the W3C XML Activity. This draft is a first attempt to address some of the requirements of [XProc V2.0 Requirements]. It is in many ways substantially incomplete. The Working Group is publishing it in order to establish an intended direction and to provide an official opportunity for comment.

Please report errors in this document by raising issues on the specification repository. Alternatively, you may report errors in this document to the public mailing list public-xml-processing-model-comments@w3.org (public archives are available).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 14 October 2005 W3C Process Document.

1 Introduction
- 1.1 Required Steps
  - 1.1.1 p:add-attribute
  - 1.1.2 p:add-xml-base
  - 1.1.3 p:cast-content-type
  - 1.1.4 p:compare
  - 1.1.5 p:count
  - 1.1.6 p:delete
  - 1.1.7 p:directory-list
  - 1.1.8 p:error
  - 1.1.9 p:escape-markup
  - 1.1.10 p:filter
  - 1.1.11 p:http-request
  - 1.1.12 p:identity
  - 1.1.13 p:insert
  - 1.1.14 p:label-elements
  - 1.1.15 p:load
  - 1.1.16 p:make-absolute-uris
  - 1.1.17 p:namespace-rename
  - 1.1.18 p:pack
  - 1.1.19 p:parameters
  - 1.1.20 p:rename
  - 1.1.21 p:replace
  - 1.1.22 p:set-attributes
  - 1.1.23 p:set-properties
  - 1.1.24 p:sink
  - 1.1.25 p:split-sequence
  - 1.1.26 p:store
  - 1.1.27 p:string-replace
  - 1.1.28 p:unescape-markup
  - 1.1.29 p:unwrap
  - 1.1.30 p:wrap
  - 1.1.31 p:wrap-sequence
  - 1.1.32 p:xinclude
  - 1.1.33 p:xslt
- 1.2 Optional Steps
  - 1.2.1 p:exec
  - 1.2.2 p:hash
  - 1.2.3 p:in-scope-names
  - 1.2.4 p:template
  - 1.2.5 p:uuid
  - 1.2.6 p:validate-with-relax-ng
  - 1.2.7 p:validate-with-schematron
  - 1.2.8 p:validate-with-xml-schema
  - 1.2.9 p:www-form-urldecode
  - 1.2.10 p:www-form-urlencode
  - 1.2.11 p:xquery
  - 1.2.12 p:xsl-formatter
- 1.3 Serialization Options
2 Errors
- 2.1 Static Errors
- 2.2 Dynamic Errors
- 2.3 Step Errors
A Step Errors
B References
- 1 Normative References
- 2 Informative References

1 Introduction

This specification describes the standard, atomic XProc steps of [XProc 2.0].

Some aspects of documents are generally unchanged by steps:

When a step in this library produces an output document, the base URI of the output is the base URI of the step's primary input document unless the step's process explicitly sets an xml:base attribute or the step's description explicitly states how the base URI is constructed.
Unless otherwise specified, steps in this library do not modify the document properties^XP of the documents that flow through them.

Also, in this specification, several steps use this element for result information:

<c:result>
    string
</c:result>

When a step uses an XPath to compute an option value, the XPath context is as defined in Section 2.7, “XPaths in XProc”^XP.

When a step specifies a particular version of a technology, implementations must implement that version or a subsequent version that is backwards compatible with that version. At user-option, they may implement other non-backwards compatible versions.

1.1 Required Steps

This section describes standard steps that must be supported by any conforming processor.

1.1.1 p:add-attribute

The p:add-attribute step adds a single attribute to a set of matching elements. The input document specified on the source is processed for matches specified by the match pattern in the match option. For each of these matches, the attribute whose name is specified by the attribute-name option is set to the attribute value specified by the attribute-value option.

The resulting document is produced on the result output port and consists of an exact copy of the input with the exception of the matched elements. Each of the matched elements is copied to the output with the addition of the specified attribute with the specified value.

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if the match pattern does not match an element.

The value of the attribute-name option must be a QName. If the lexical value does not contain a colon, then the attribute-namespace may be used to specify the namespace of the attribute. In that case, the attribute-prefix may be specified to suggest a prefix for the attribute name. It is a dynamic error (err:XD0034^XP) to specify a new namespace or prefix if the lexical value of the specified name contains a colon. The corresponding expanded name is used to construct the attribute.

The value of the attribute-value option must be a legal attribute value according to XML.

If an attribute with the same name as the expanded name from the attribute-name option exists on the matched element, the value specified in the attribute-value option is used to set the value of that existing attribute. That is, the value of the existing attribute is changed to the attribute-value value.
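The following non-normative Python sketch illustrates the add-or-replace behavior described above. It is a simplification under stated assumptions: a plain element name stands in for the XSLT match pattern, and the function name `add_attribute` is hypothetical, not part of XProc.

```python
import xml.etree.ElementTree as ET


def add_attribute(root, element_name, attribute_name, attribute_value):
    """Set one attribute on every element named element_name.

    Assumption: element_name replaces the real match option, which is an
    XSLT match pattern and can be far more expressive.
    """
    for element in root.iter(element_name):
        # Element.set() adds the attribute, or overwrites the value if an
        # attribute with that name already exists on the element.
        element.set(attribute_name, attribute_value)
    return root


doc = ET.fromstring('<doc><p/><p class="old"/><q/></doc>')
add_attribute(doc, "p", "class", "new")
print(ET.tostring(doc, encoding="unicode"))
```

Both p elements end up with the new value, including the one that already carried the attribute; the unmatched q element is copied unchanged.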

Note

If multiple attributes need to be set on the same element(s), the p:set-attributes step can be used to set them all at once.

This step cannot be used to add namespace declarations. It is a dynamic error (err:XC0059) if the QName value in the attribute-name option uses the prefix “xmlns” or any other prefix that resolves to the namespace name http://www.w3.org/2000/xmlns/. Note, however, that while namespace declarations cannot be added explicitly by this step, adding an attribute whose name is in a namespace for which there is no namespace declaration in scope on the matched element may result in a namespace binding being added by Section 2.5.1, “Namespace Fixup on XML Outputs”^XP.

If an attribute named xml:base is added or changed, the base URI of the element must also be amended accordingly.

1.1.2 p:add-xml-base

The p:add-xml-base step exposes the base URI via explicit xml:base attributes. The input document from the source port is replicated to the result port with xml:base attributes added to or corrected on each element as specified by the options on this step.

The value of the all option must be a boolean.

The value of the relative option must be a boolean.

It is a dynamic error (err:XC0058) if the all and relative options are both true.

The p:add-xml-base step modifies its input as follows:

For the document element: force the element to have an xml:base attribute with the document's [base URI] property's value as its value.
For other elements:
- If the all option has the value true, force the element to have an xml:base attribute with the element's [base URI] value as its value.
- If the element's [base URI] is different from its parent's [base URI], force the element to have an xml:base attribute with the following value: if the value of the relative option is true, a string which, when resolved against the parent's [base URI], will give the element's [base URI], otherwise the element's [base URI].
- Otherwise, if there is an xml:base attribute present, remove it.

1.1.3 p:cast-content-type

The p:cast-content-type step changes the media type of its input.

The input document is transformed from one media type to another. It is a dynamic error (err:XC1002) if the supplied content-type is not a valid media type of the form “type/subtype+ext”.

Casting from one XML media type to another simply changes the “content-type” document property^XP.
Casting from a non-XML media type to an XML media type produces an XML document with a c:data document element. The original media type will be preserved in the content-type attribute on the c:data element.
<c:data
  content-type = ContentType
  charset? = string
  encoding? = string>
    string
</c:data>
The content of the c:data element is the base64 encoded representation of the non-XML content.
Casting from an XML media type to a non-XML media type must support the case where the input document is a c:data document. The resulting document will have the specified media type and a representation^XP that is the content of the c:data element after decoding the base64 encoded content.
It is a dynamic error (err:XC1004) if the content of the c:data element is not a valid base64 string.
It is a dynamic error (err:XC1005) if the c:data element does not have a content-type attribute.
It is a dynamic error (err:XC1006) if the content-type is supplied and is not the same as the content-type specified on the c:data element.
Casting from an XML media type to a non-XML media type when the input document is not a c:data document is implementation-defined^XP.
What happens when one non-XML media type is cast to another non-XML media type is implementation-defined^XP. It is a dynamic error (err:XC1003) if the p:cast-content-type step cannot perform the requested cast.

In all cases except when the input document is a c:data element, it is a dynamic error (err:XC1007) if the content-type is not supplied.
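The round trip between non-XML content and a c:data document can be sketched in Python as follows. This is a non-normative illustration: the function names `to_c_data` and `from_c_data` are hypothetical, errors are modelled as `ValueError` messages carrying the error code, and whitespace inside the base64 text is assumed to be ignorable.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def to_c_data(payload, content_type):
    """Wrap raw bytes in a c:data element holding their base64 encoding."""
    data = ET.Element(f"{{{C_NS}}}data", {"content-type": content_type})
    data.text = base64.b64encode(payload).decode("ascii")
    return data


def from_c_data(data, content_type=None):
    """Recover (media type, bytes) from a c:data element."""
    declared = data.get("content-type")
    if declared is None:
        raise ValueError("err:XC1005: c:data has no content-type attribute")
    if content_type is not None and content_type != declared:
        raise ValueError("err:XC1006: content-type differs from c:data")
    # Assumption: whitespace in the element content is not significant.
    text = "".join((data.text or "").split())
    try:
        payload = base64.b64decode(text, validate=True)
    except binascii.Error:
        raise ValueError("err:XC1004: content is not valid base64") from None
    return declared, payload


wrapped = to_c_data(b"\x89PNG", "image/png")
print(from_c_data(wrapped))
```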

1.1.4 p:compare

The p:compare step compares two documents for equality.

The value of the fail-if-not-equal option must be a boolean.

This step takes single documents on each of two ports and compares them using fn:deep-equal (as defined in [XPath 2.0 Functions and Operators]). It is a dynamic error (err:XC0019) if the documents are not equal and the value of the fail-if-not-equal option is true. If the documents are equal, or if the value of the fail-if-not-equal option is false, a c:result document is produced with contents true if the documents are equal, otherwise false.
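The interaction between the comparison result and the fail-if-not-equal option can be sketched as below. This is non-normative and rests on one loud assumption: comparing canonical XML serializations is used as a stand-in for fn:deep-equal, which it only approximates. The function name `compare` and its parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def compare(first_xml, second_xml, fail_if_not_equal=False):
    """Return a serialized c:result, or raise when inequality must fail."""
    # Assumption: canonical-form equality approximates fn:deep-equal.
    equal = ET.canonicalize(first_xml) == ET.canonicalize(second_xml)
    if not equal and fail_if_not_equal:
        raise ValueError("err:XC0019: the documents are not equal")
    value = "true" if equal else "false"
    return f'<c:result xmlns:c="{C_NS}">{value}</c:result>'


# Attribute order and empty-element syntax do not affect the comparison.
print(compare('<a b="1" c="2"/>', "<a c='2' b='1'></a>"))
print(compare("<a/>", "<b/>"))
```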

1.1.5 p:count

The p:count step counts the number of documents in the source input sequence and returns a single document on result containing that number. The generated document contains a single c:result element whose content is the string representation of the number of documents in the sequence.

If the limit option is specified and is greater than zero, the p:count step will count at most that many documents. This provides a convenient mechanism to discover, for example, if a sequence consists of more than 1 document, without requiring every single document to be buffered before processing can continue.
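The benefit of the limit option is easiest to see with a lazily produced sequence. In this non-normative Python sketch (the name `count_documents` is hypothetical, and a generator stands in for the source port), counting stops as soon as the limit is reached, so later documents are never requested.

```python
from itertools import islice


def count_documents(documents, limit=0):
    """Count documents, reading at most `limit` of them when limit > 0."""
    if limit > 0:
        # islice stops pulling from the underlying iterator after `limit`
        # items, so the rest of the sequence is never materialized.
        documents = islice(documents, limit)
    return sum(1 for _ in documents)


def lazy_sequence(n):
    """Stand-in for an input port that yields n documents on demand."""
    for i in range(n):
        yield f"<doc n='{i}'/>"


# "More than one document?" needs only two documents to be read.
print(count_documents(lazy_sequence(1000), limit=2) > 1)
```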

1.1.6 p:delete

The p:delete step deletes items specified by a match pattern from the source input document and produces the resulting document, with the deleted items removed, on the result port.

The value of the match option must be an XSLTMatchPattern. A match pattern may match multiple items to be deleted.

If an element is selected by the match option, the entire subtree rooted at that element is deleted.

This step cannot be used to remove namespaces. It is a dynamic error (err:XC0062) if the match option matches a namespace node. Also, note that deleting an attribute named xml:base does not change the base URI of the element on which it occurred.

1.1.7 p:directory-list

The p:directory-list step produces a list of the contents of a specified directory.

The value of the path option must be an anyURI. It is interpreted as an IRI reference. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:directory-list in the case of a syntactic shortcut^XP value).

It is a dynamic error (err:XC0017) if the absolute path does not identify a directory. It is a dynamic error (err:XC0012) if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run.

Conformant processors must support directory paths whose scheme is file. It is implementation-defined^XP what other schemes are supported by p:directory-list, and what the interpretation of 'directory', 'file' and 'contents' is for those schemes.

If present, the value of the include-filter or exclude-filter option must be a regular expression as specified in [XPath 2.0 Functions and Operators], section 7.6.1 “Regular Expression Syntax”.

If the include-filter pattern matches a directory entry's name, the entry is included in the output. If the exclude-filter pattern matches a directory entry's name, the entry is excluded from the output. If both options are provided, the include filter is processed first, then the exclude filter.

The result document produced for the specified directory path has a c:directory document element whose base URI is the directory path and whose name attribute is the last segment of the directory path (that is, the directory's (local) name).

<c:directory
  name = string>
    (c:file | 
     c:directory | 
     c:other)*
</c:directory>

Its contents are determined as follows, based on the entries in the directory identified by the directory path. For each entry in the directory, if either no filter was specified, or the (local) name of the entry matches the filter pattern, a c:file, a c:directory, or a c:other element is produced, as follows:

A c:directory is produced for each subdirectory not determined to be special.
A c:file is produced for each file not determined to be special.
<c:file
name = string />
Any file or directory determined to be special by the p:directory-list step may be output using a c:other element but the criteria for marking a file as special are implementation-defined^XP.
<c:other
name = string />

When a directory entry is a subdirectory, that directory's entries are not output as part of that entry's c:directory. A user must apply this step again to the subdirectory to list subdirectory contents.

Each of the elements c:file, c:directory, and c:other has a name attribute when it appears within the top-level c:directory element, whose value is a relative IRI reference, giving the (local) file or directory name.

Any attributes other than name on c:file, c:directory, or c:other are implementation-defined^XP.
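The filtering order and the shape of the result can be sketched in Python as follows. This is a non-normative illustration with several assumptions: directory entries are passed in as (name, kind) pairs rather than read from a file system, Python's `re` syntax stands in for the XPath regular expression syntax, and a filter is assumed to have to match the whole entry name. The function name `directory_list` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def directory_list(dir_name, entries, include_filter=None, exclude_filter=None):
    """Build a c:directory element from (name, kind) pairs.

    kind is "file", "directory" or "other". Subdirectories are listed but
    not descended into, matching the non-recursive behavior of the step.
    """
    root = ET.Element(f"{{{C_NS}}}directory", {"name": dir_name})
    for name, kind in entries:
        # The include filter is applied first, then the exclude filter.
        if include_filter is not None and not re.fullmatch(include_filter, name):
            continue
        if exclude_filter is not None and re.fullmatch(exclude_filter, name):
            continue
        ET.SubElement(root, f"{{{C_NS}}}{kind}", {"name": name})
    return root


entries = [("a.xml", "file"), ("b.txt", "file"),
           ("sub", "directory"), ("draft.xml", "file")]
listing = directory_list("docs", entries, r".*\.xml", r"draft.*")
print([child.get("name") for child in listing])
```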

1.1.8 p:error

The p:error step generates a dynamic error using the input provided to the step.

The value of the code option must be a QName. If the lexical value does not contain a colon, then the code-namespace may be used to specify the namespace of the code. In that case, the code-prefix may be specified to suggest a prefix for the code. It is a dynamic error (err:XD0034^XP) to specify a new namespace or prefix if the lexical value of the specified name contains a colon.

This step uses the document provided on its input as the content of the error raised. An instance of the c:errors^XP element will be produced on the error output port, as is always the case for dynamic errors. The error generated can be caught by a p:try^XP just like any other dynamic error.

For authoring convenience, the p:error step is declared with a single, primary output port. With respect to connections^XP, this port behaves like any other output port even though nothing can ever appear on it since the step always fails.

For example, given the following invocation:

<p:error xmlns:my="http://www.example.org/error"
         name="bad-document" code="my:unk12">
   <p:input port="source">
     <p:inline>
       <message>The document element is unknown.</message>
     </p:inline>
   </p:input>
</p:error>

The error vocabulary element (and document) generated on the error output port would be:

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"
          xmlns:p="http://www.w3.org/ns/xproc"
          xmlns:my="http://www.example.org/error">
 <c:error name="bad-document" type="p:error"
          code="my:unk12"><message>The document element is unknown.</message>
</c:error>
</c:errors>

The href, line and column, or offset, might also be present on the c:error to identify the location of the p:error element in the pipeline.

1.1.9 p:escape-markup

The p:escape-markup step applies XML serialization to the children of the document element and replaces those children with their serialization. The outcome is a single element with text content that represents the "escaped" syntax of the children as they were serialized.

This step supports the standard serialization options as specified in Section 1.3, “Serialization Options”. These options control how the output markup is produced before it is escaped.

For example, the input:

<description>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is a chunk of XHTML.</p>
</div>
</description>

produces:

<description>
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;
&lt;p&gt;This is a chunk of XHTML.&lt;/p&gt;
&lt;/div&gt;
</description>

Note

The result of this step is an XML document that contains the Unicode characters that are the characters that result from escaping the input. It is not encoded characters in a serialized octet stream, therefore, the serialization options related to encoding characters (byte-order-mark, encoding, and normalization-form) do not apply. They are omitted from the standard serialization options on this step.

By default, this step must not generate an XML declaration in the escaped result.
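The effect of the step can be sketched with Python's `xml.dom.minidom`. This is a non-normative illustration: the function name `escape_markup` is hypothetical, and minidom's default serialization is assumed to be an acceptable stand-in for the serialization options of the step.

```python
from xml.dom import minidom


def escape_markup(xml_text):
    """Replace the children of the document element with their serialization."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # Serialize every child first; no XML declaration is produced because
    # individual nodes are serialized, not the whole document.
    serialized = "".join(child.toxml() for child in root.childNodes)
    while root.firstChild is not None:
        root.removeChild(root.firstChild)
    # The markup is now ordinary character data; it will be escaped when
    # the result document is itself serialized.
    root.appendChild(doc.createTextNode(serialized))
    return doc


result = escape_markup("<description><p>This is a chunk.</p></description>")
print(result.documentElement.firstChild.data)
```

The document element ends up with a single text node whose characters are the markup of the former children.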

1.1.10 p:filter

The p:filter step selects portions of the source document based on a (possibly dynamically constructed) XPath select expression.

This step behaves just like a p:input^XP with a select expression except that the select expression is computed dynamically.

1.1.11 p:http-request

The p:http-request step provides for interaction with resources over HTTP or related protocols. The input document provided on the source port specifies a request by a single c:request element. This element specifies the method, resource, and other request properties as well as possibly including an entity body (content) for the request.

The standard serialization options are provided to control the serialization of any XML content which is sent as part of the request. The effect of these options is as specified in Section 1.3, “Serialization Options”. See Section 1.1.11.3, “Request Entity body conversion” for a discussion of when serialization occurs in constructing a request.

It is a dynamic error (err:XC0040) if the document element of the document that arrives on the source port is not c:request.

Editorial Note

Can the input document be JSON?

1.1.11.1 Specifying a request

An HTTP request is represented by a c:request element.

<c:request
  method = NCName
  href? = anyURI
  detailed? = boolean
  status-only? = boolean
  username? = string
  password? = string
  auth-method? = string
  send-authorization? = boolean
  override-content-type? = ContentType>
    (c:header*,
     (c:multipart | 
      c:body)?)
</c:request>

It is a dynamic error (err:XC0006) if the method is not specified on a c:request. It is a dynamic error (err:XC0005) if the request contains a c:body or c:multipart but the method does not allow for an entity body being sent with the request.

It is a dynamic error (err:XC0004) if the status-only attribute has the value true and the detailed attribute does not have the value true.
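The attribute checks described so far can be sketched as a small validation routine. This is non-normative: `StepError` and `check_request` are hypothetical names, only the literal string "true" is treated as a true boolean, and the check for err:XC0005 is omitted because the set of methods that allow an entity body is not enumerated here.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


class StepError(Exception):
    """Hypothetical exception type carrying an XProc error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_request(request):
    """Raise StepError for the structural problems named in the text."""
    if request.tag != f"{{{C_NS}}}request":
        raise StepError("err:XC0040", "document element is not c:request")
    if request.get("method") is None:
        raise StepError("err:XC0006", "method is not specified")
    # Simplification: other lexical forms of xs:boolean are not handled.
    status_only = request.get("status-only") == "true"
    detailed = request.get("detailed") == "true"
    if status_only and not detailed:
        raise StepError("err:XC0004", "status-only requires detailed")


ok = ET.fromstring(
    f'<c:request xmlns:c="{C_NS}" method="GET" href="http://example.com/"/>')
check_request(ok)  # passes silently
```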

The method attribute specifies the method to be used against the IRI specified by the href attribute, e.g. GET or POST (the value is not case-sensitive). If the href attribute is not absolute, it will be resolved against the base URI of the element on which it occurs.

Note

In the case of simple “GET” requests, implementors are encouraged to support as many protocols as practical. In particular, pipeline authors may attempt to use p:http-request to load documents with computed URIs using the file: scheme.

If the username attribute is specified, the username, password, auth-method, and send-authorization attributes are used to handle authentication according to the selected authentication method.

For the purposes of avoiding an authentication challenge, if the send-authorization attribute has the value true and the authentication method specified by the auth-method supports generation of an Authorization header without a challenge, then an Authorization header is generated and sent on the first request. If the send-authorization attribute is absent or has the value false, then the first request is sent without an Authorization header.

If the initial response to the request is an authentication challenge, the auth-method, username, password and any relevant data from the challenge are used to generate an Authorization header and the request is sent again. If that authorization fails, the request is not retried.

Appropriate values for the auth-method attribute are “Basic” or “Digest” but other values are allowed. If the authentication method is “Basic” or “Digest”, authentication is handled as per [RFC 2617]. The interpretation of auth-method values on c:request other than “Basic” or “Digest” is implementation-defined^XP.

It is a dynamic error (err:XC0003) if a username or password is specified without specifying an auth-method, if the requested auth-method isn't supported, or the authentication challenge contains an authentication method that isn't supported. All implementations are required to support "Basic" and "Digest" authentication per [RFC 2617].
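The decision about sending credentials on the first request can be sketched as follows. This is non-normative. The function names are hypothetical, the credentials are assumed to be encoded as UTF-8, and only “Basic” is treated as able to produce an Authorization header without a challenge, since “Digest” depends on values supplied by the server's challenge.

```python
import base64


def basic_authorization(username, password):
    """Build a Basic Authorization header value per RFC 2617."""
    # Assumption: UTF-8 is used to encode the credentials.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def first_request_headers(username=None, password=None, auth_method=None,
                          send_authorization=False):
    """Return the authentication headers to send before any challenge."""
    if (username is not None or password is not None) and auth_method is None:
        raise ValueError("err:XC0003: credentials given without auth-method")
    headers = {}
    preemptive = (send_authorization and username is not None
                  and auth_method is not None
                  and auth_method.lower() == "basic")
    if preemptive:
        headers["Authorization"] = basic_authorization(username, password or "")
    return headers


print(first_request_headers("Aladdin", "open sesame", "Basic", True))
```

With send-authorization absent or false, the sketch returns no headers and the credentials would only be used after a challenge arrives.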

The c:header element specifies a header name and value, either for inclusion in a request, or as received in a response.

<c:header
  name = string
  value = string />

The request is formulated from the attribute values on the c:request element and its c:header and c:multipart or c:body children, if present, and transmitted to the host (and port, if present) specified by the href attribute. The details of how the request entity body, if any, is constructed are given in Section 1.1.11.3, “Request Entity body conversion”.

When the request is formulated, the step and/or protocol implementation may add headers as necessary to either complete the request or as appropriate for the content specified (e.g. transfer encodings). A user of this step is guaranteed that their requested headers and content will be sent with the exception of any conflicts with protocol-related headers.

The p:http-request step allows users to specify independently values that are not always independent. For example, some combinations of c:header values (e.g., Content-Type) may be inconsistent with values that the step and/or protocol implementation must set. In a few cases, the step provides more than one mechanism to specify what is actually a single value (e.g., the boundary string in multipart messages). It is a dynamic error (err:XC0020) if the user specifies a value or values that are inconsistent with each other or with the requirements of the step or protocol.

1.1.11.2 Filename globbing

Implementations that support file: URIs should support “globbing”. For example, the URI file:///path/to/dir/*.xml should return all of the XML documents in the directory /path/to/dir.

Note

Must define the globbing rules!

1.1.11.3 Request Entity body conversion

The c:multipart element specifies a multi-part body, per [RFC 1521], either for inclusion in a request or as received in a response.

<c:multipart
  content-type = ContentType
  boundary = string>
    c:body+
</c:multipart>

In the context of a request, the media type of the c:multipart must be a multipart media type (i.e. have a main type of 'multipart'). If the content-type attribute is not specified, a value of “multipart/mixed” will be assumed. (Whether or not, and to what extent, “multipart/byte-ranges” responses are supported is implementation-defined^XP.)

The boundary attribute is required and is used to provide a multipart boundary marker. The implementation must use this boundary marker and must prefix the value with the string “--” when formulating the multipart message. It is a dynamic error (err:XC0002) if the value starts with the string “--”.

If the boundary is also specified as a parameter in the content-type attribute, then the parameter value and the value of the boundary attribute must be the same.
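The way the boundary marker is used when a multipart message is formulated can be sketched as below. This is a non-normative illustration: `build_multipart` is a hypothetical name, each part is reduced to a content type and a text body, and the closing delimiter follows the usual MIME convention of appending “--” to the final boundary line.

```python
def build_multipart(boundary, parts):
    """Assemble a multipart body from (content_type, body_text) pairs."""
    if boundary.startswith("--"):
        raise ValueError("err:XC0002: boundary must not start with '--'")
    lines = []
    for content_type, body in parts:
        # Each part is introduced by the boundary prefixed with "--",
        # followed by its headers, a blank line, and the part content.
        lines += ["--" + boundary, f"Content-Type: {content_type}", "", body]
    # MIME convention: the last delimiter carries a trailing "--".
    lines.append("--" + boundary + "--")
    return "\r\n".join(lines) + "\r\n"


message = build_multipart("part-sep", [
    ("text/plain", "Hello"),
    ("application/xml", "<doc/>"),
])
print(message)
```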

The c:body element holds the body or body part of the message. Each of the attributes controls some aspect of encoding the request body or decoding the body element's content when the request is formulated. These are specified as follows:

<c:body
  content-type = ContentType
  encoding? = string
  id? = string
  description? = string
  disposition? = string>
    anyElement*
</c:body>

The content-type attribute specifies the media type of the body or body part, that is, the value of its Content-Type header. If the media type is not an XML type or text, the content must already be base64-encoded.

The encoding attribute controls the decoding of the element content for formulating the body. A value of base64 indicates the element's content is a base64 encoded string whose byte stream should be sent as the message body. An implementation may support encodings other than base64 but these encodings and their names are implementation-defined^XP. It is a dynamic error (err:XC0052) if the encoding specified is not supported by the implementation.

Note

The p:http-request step provides only a single set of serialization options for XML media types. There's no direct support for sending a multipart message with two XML parts encoded differently.

For each body or body part, the id attribute specifies the value of the Content-ID header; the description attribute specifies the value of the Content-Description header; and the disposition attribute specifies the value of the Content-Disposition header.

If an entity body is to be sent as part of a request (e.g. a POST), either a c:body element, specifying the request entity body, or a c:multipart element, specifying multiple entity body parts, may be used. When c:multipart is used it may contain multiple c:body children. A c:body specifies the construction of a body or body part as follows:

If the content-type attribute does not specify an XML media type, or the encoding attribute is “base64”, then it is a dynamic error (err:XC0028) if the content of the c:body element does not consist entirely of characters, and the entity body or body part will consist of exactly those characters.

Otherwise (the content-type attribute does specify an XML media type and the encoding attribute is not 'base64'), it is a dynamic error (err:XC0022) if the content of the c:body element does not consist of exactly one element, optionally preceded and/or followed by any number of processing instructions, comments or whitespace characters, and the entity body or body part will consist of the serialization of a document node containing that content. The serialization of that document is controlled by the serialization options on the p:http-request step itself.

For example, the following input to a p:http-request step will POST a small XML document:

<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="POST" href="http://example.com/someservice">
<c:body content-type="application/xml">
<doc>
<title>My document</title>
</doc>
</c:body>
</c:request>

The corresponding request should look something like this:

POST http://example.com/someservice HTTP/1.1
Host: example.com
Content-Type: application/xml; charset="utf-8"

<?xml version='1.0'?>
<doc>
<title>My document</title>
</doc>

1.1.11.4 Managing the response

Note

Where do we say that for URI schemes (such as file: and ftp:) where a content type is not provided by the underlying request, the content type is implementation-dependent^XP?

The handling of the response to the request and the generation of the step's result document is controlled by the status-only, override-content-type and detailed attributes on the c:request input.

The override-content-type attribute controls interpretation of the response's Content-Type header. If this attribute is present, the response will be treated as if it returned the Content-Type given by its value. The original Content-Type header will, however, be reflected unchanged as a c:header in the result document. It is a dynamic error (err:XC0030) if the override-content-type value cannot be used (e.g. text/plain to override image/png).

If the override-content-type includes an encoding parameter, then that encoding must be used to read the document.

If the status-only attribute has the value true, the result document will contain only header information. The entity of the response will not be processed to produce a c:body or c:multipart element.

The c:response element represents an HTTP response. The response's status code is encoded in the status attribute and the headers and entity body are processed into c:header and c:multipart or c:body content.

<c:response
  status? = integer>
    (c:header*,
     (c:multipart | 
      c:body)?)
</c:response>

The value of the detailed attribute determines the content of the result document. If it is true, the response to the request is handled as follows:

A single c:response element is produced with the status attribute containing the status of the response received.
Each response header is translated into a c:header element.
Unless the status-only attribute has the value true, the entity body of the response is converted into a c:body or c:multipart element via the rules given in Section 1.1.11.5, “Converting Response Entity Bodies”.

Otherwise (the detailed attribute is not specified or its value is false), the response to the request is handled as follows:

If the media type (as determined by the override-content-type attribute or the Content-Type response header) is an XML media type, the entity is decoded if necessary, then parsed as an XML document:
- The parser which p:http-request employs must process the external subset; all general and external parsed entities must be fully expanded.
  Editorial Note
  The requirement to process the external subset comes from p:load, we probably don't want to impose that on all p:http-request calls. Need a way to control it?
- It may perform xml:id processing, but it must not perform any other processing, such as expanding XIncludes.
- The parser must be conformant to Namespaces in XML.
- Parsing the document must not fail due to validation errors.
The resulting XML document is produced on the result output port as the entire output of the step.
Otherwise, the entity body of the response is converted into a c:body or c:multipart element via the rules given in Section 1.1.11.5, “Converting Response Entity Bodies”.

In either case the base URI of the output document is the resolved value of the href attribute from the input c:request.
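The detailed case described above can be sketched in Python. This is a minimal, non-normative illustration: the function name `detailed_response` and its parameters are hypothetical, and the already-converted entity body is simply passed in as an element rather than built by the rules of Section 1.1.11.5.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"  # namespace of c:response in this document

def detailed_response(status, headers, body=None, status_only=False):
    """Hypothetical helper: build a c:response for the detailed="true" case."""
    response = ET.Element(f"{{{C_NS}}}response", {"status": str(status)})
    # Each response header becomes one c:header element.
    for name, value in headers:
        ET.SubElement(response, f"{{{C_NS}}}header", {"name": name, "value": value})
    # The entity body is only represented when status-only is not true.
    if body is not None and not status_only:
        response.append(body)
    return response

resp = detailed_response(200, [("Content-Type", "text/plain")], status_only=True)
print(ET.tostring(resp, encoding="unicode"))
```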

1.1.11.4.1 Redirects

One possible response from an HTTP request is a redirect, indicated by a status code in the three-hundred range. The precise semantics of the 3xx return codes are laid out by section 10.3 Redirection 3xx in [RFC 2616].

The p:http-request step should follow redirect requests (in a manner consistent with [RFC 2616]) if they are returned by the server.

1.1.11.4.2 Cookies

With one exception, in version 1.0 of XProc, the p:http-request step does not provide any standard mechanisms for managing cookies. Pipeline authors that need to preserve cookies across several p:http-request calls in the same pipeline or across multiple invocations of the same or different pipelines will have to rely on implementation-defined^XP mechanisms.

The exception arises in the case of redirection. If a redirect response includes cookies, those cookies should be forwarded as appropriate to the redirected location when the redirection is followed.

This behavior will allow the p:http-request step to interoperate with web services that use cookies as part of an authentication protocol.

1.1.11.5 Converting Response Entity Bodies

The entity of a response may be multipart per [RFC 1521]. In those situations, the result document will be a c:multipart element that contains multiple c:body elements inside.

Note

Although it is technically possible for any of the individual parts of a multipart message to also be multipart, XProc does not provide a standard representation for such messages. The interpretation of a multipart message inside another multipart message is implementation-dependent^XP.

The result of the p:http-request step is an XML document. For media types (images, binaries, etc.) that can't be represented as a sequence of Unicode characters, the response is encoded as base64 and then returned as text children of the c:body element. If the content is base64-encoded, the encoding attribute on c:body must be set to “base64”.
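As an illustration of the base64 rule above, the following Python sketch wraps a binary entity in a c:body element. It is a sketch under stated assumptions, not a normative algorithm: the helper name `make_body` is hypothetical; only the c:body element, its content-type and encoding attributes, and the namespace URI are taken from this specification.

```python
import base64
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"  # namespace of c:body in this document

def make_body(content_type, payload):
    """Hypothetical helper: represent a binary entity as base64 text in c:body."""
    body = ET.Element(f"{{{C_NS}}}body")
    body.set("content-type", content_type)
    body.set("encoding", "base64")  # required whenever the content is base64-encoded
    body.text = base64.b64encode(payload).decode("ascii")
    return body

body = make_body("image/png", b"\x89PNG\r\n")
print(ET.tostring(body, encoding="unicode"))
```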

Editorial Note

This section hasn't been updated to reflect the fact that non-XML documents are now possible. It should probably say something like:

If the document identified has a non-XML content type, no extra processing is mandated. The number and variety of media types that an implementation can load is implementation-defined^XP.

If the media type of the response is a text type with a charset parameter that is a Unicode character encoding (per [Unicode TR#17]) or is recognized as a non-XML media type whose contents are encoded as a sequence of Unicode characters (e.g. it has a charset parameter or the definition of the media type is such that it requires Unicode), the content of the constructed c:body element is the translation of the text into a sequence of Unicode characters.

If the response is an XML media type, the content of the constructed c:body element is the result of decoding the body as necessary, then parsing it with an XML parser.

- The parser which p:http-request employs must process the external subset; all general and external parsed entities must be fully expanded.
  Editorial Note
  The requirement to process the external subset comes from p:load, we probably don't want to impose that on all p:http-request calls. Need a way to control it?
- It may perform xml:id processing, but it must not perform any other processing, such as expanding XIncludes.
- The parser must be conformant to Namespaces in XML.
- Parsing the document must not fail due to validation errors.

If the content is not well-formed, the step fails.

Editorial Note

This prose should be consolidated into a single place.

In a c:body in a response, the content-type attribute must be an exact copy of the value returned in the Content-Type header. That is, it must reflect the content type actually returned, not any override value that may have been specified, and it must include any parameters returned by the server.

In the case of a multipart response, the same rules apply when constructing a c:body element for each body part encountered.

Note

Given the above description, any content identified as text/html will be encoded as (escaped) text or base64-encoded in the c:body element, as HTML isn't always well-formed XML. A user can attempt to convert such content into XML using the p:unescape-markup step.

1.1.11.6 HTTP Request Example

A simple form might be posted as follows:

<c:request method="POST" href="http://www.example.com/form-action" xmlns:c="http://www.w3.org/ns/xproc-step">
<c:body content-type="application/x-www-form-urlencoded">
name=W3C&amp;spec=XProc
</c:body>
</c:request>

and if the response was an XHTML document, the result document would be:

<c:response status="200" xmlns:c="http://www.w3.org/ns/xproc-step">
<c:header name="Date" value=" Wed, 09 May 2007 23:12:24 GMT"/>
<c:header name="Server" value="Apache/1.3.37 (Unix) PHP/4.4.5"/>
<c:header name="Vary" value="negotiate,accept"/>
<c:header name="TCN" value="choice"/>
<c:header name="P3P" value="policyref='http://www.w3.org/2001/05/P3P/p3p.xml'"/>
<c:header name="Cache-Control" value="max-age=600"/>
<c:header name="Expires" value="Wed, 09 May 2007 23:22:24 GMT"/>
<c:header name="Last-Modified" value="Tue, 08 May 2007 16:10:49 GMT"/>
<c:header name="ETag" value="'4640a109;42380ddc'"/>
<c:header name="Accept-Ranges" value="bytes"/>
<c:header name="Keep-Alive" value="timeout=2, max=100"/>
<c:header name="Connection" value="Keep-Alive"/>
<c:body content-type="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>OK</title></head>
<body><p>OK!</p></body>
</html>
</c:body>
</c:response>
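For readers who generate such requests programmatically, the form-posting c:request shown above could be assembled as in the following Python sketch. The helper name `form_request` is hypothetical; the element names, attributes and example values are those of the example. Note that the serializer escapes the ampersand in the form data, matching the `&amp;` in the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"
ET.register_namespace("c", C_NS)  # serialize with the conventional c: prefix

def form_request(href, fields):
    """Hypothetical helper: build a c:request that posts URL-encoded form data."""
    request = ET.Element(f"{{{C_NS}}}request", {"method": "POST", "href": href})
    body = ET.SubElement(request, f"{{{C_NS}}}body",
                         {"content-type": "application/x-www-form-urlencoded"})
    body.text = urlencode(fields)
    return request

req = form_request("http://www.example.com/form-action",
                   {"name": "W3C", "spec": "XProc"})
print(ET.tostring(req, encoding="unicode"))
```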

1.1.12 p:identity

The p:identity step makes a verbatim copy of its input available on its output.

</p:declare-step>

If the implementation supports passing PSVI annotations between steps, the p:identity step must preserve any annotations that appear in the input.

1.1.13 p:insert

The p:insert step inserts the insertion port's document into the source port's document relative to the matching elements in the source port's document.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if that pattern matches anything other than element, text, processing-instruction, or comment nodes. Multiple matches are allowed, in which case multiple copies of the insertion documents will occur. If no elements match, then the document is unchanged.

The value of the position option must be an NMTOKEN in the following list:

“first-child” - the insertion is made as the first child of the match;
“last-child” - the insertion is made as the last child of the match;
“before” - the insertion is made as the immediately preceding sibling of the match;
“after” - the insertion is made as the immediately following sibling of the match.

It is a dynamic error (err:XC0025) if the match pattern matches anything other than an element node and the value of the position option is “first-child” or “last-child”.

As the inserted elements are part of the output of the step they are not considered in determining matching elements. If an empty sequence appears on the insertion port, the result will be the same as the source.
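The four positions can be illustrated with a small Python sketch. It makes several simplifying assumptions that are not part of this specification: a plain element name stands in for the XSLTMatchPattern, only element matches are handled, a single insertion document is used, and a “before” or “after” match on the document element is not handled. The function name `insert` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def insert(source, tag, insertion, position):
    """Insert a copy of `insertion` relative to every element named `tag`."""
    parents = {child: parent for parent in source.iter() for child in parent}
    # Collect the matches first, so inserted elements are never matched themselves.
    matches = list(source.iter(tag))
    for match in matches:
        new = copy.deepcopy(insertion)
        if position == "first-child":
            # Existing leading text must follow the new first child.
            new.tail, match.text = match.text, None
            match.insert(0, new)
        elif position == "last-child":
            match.append(new)
        elif position in ("before", "after"):
            parent = parents[match]  # KeyError for the document element (not handled)
            index = list(parent).index(match)
            if position == "after":
                # Text that followed the match now follows the inserted element.
                new.tail, match.tail = match.tail, None
                index += 1
            parent.insert(index, new)
        else:
            raise ValueError(f"unknown position: {position}")
    return source

doc = ET.fromstring("<doc><p>one</p><p>two</p></doc>")
insert(doc, "p", ET.fromstring("<note/>"), "after")
print([child.tag for child in doc])  # ['p', 'note', 'p', 'note']
```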

1.1.14 p:label-elements

The p:label-elements step generates a label for each matched element and stores that label in the specified attribute.

</p:declare-step>

The value of the attribute option must be a QName. If the lexical value does not contain a colon, then the attribute-namespace may be used to specify the namespace of the attribute name. In that case, the attribute-prefix may be specified to suggest a prefix for the attribute name. It is a dynamic error (err:XD0034^XP) to specify a new namespace or prefix if the lexical value of the specified name contains a colon.

The value of the label option is an XPath expression used to generate the value of the attribute label.

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if that expression matches anything other than element nodes.

The value of the replace option must be a boolean value and is used to indicate whether existing attribute values are replaced.

This step operates by generating attribute labels for each element matched. For every matched element, the expression is evaluated with the context node set to the matched element. An attribute is added to the matched element using the attribute name specified by the attribute option and the string value of the result of evaluating the expression. If the attribute already exists on the matched element, the value is replaced with the string value only if the replace option has the value true.

If this step is used to add or change the value of an attribute named “xml:base”, the base URI of the element must also be amended accordingly.

An implementation must bind the variable “p:index” in the static context of each evaluation of the XPath expression to the position of the element in the sequence of matched elements. In other words, the first element (in document order) matched gets the value “1”, the second gets the value “2”, the third, “3”, etc.

The result of the p:label-elements step is the input document with the attribute labels associated with matched elements. All other non-matching content remains the same.
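The labelling behaviour, including the role of p:index and the replace option, can be sketched as follows. This is an illustration only: a Python callable stands in for the XPath label expression, a plain element name stands in for the match pattern, and the function name `label_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET

def label_elements(root, tag, attribute, label, replace=True):
    """Set `attribute` on each element named `tag`.

    `label(index, element)` stands in for the XPath expression; `index`
    plays the role of the p:index variable (1-based, document order).
    """
    for index, element in enumerate(root.iter(tag), start=1):
        value = str(label(index, element))
        # An existing attribute is only overwritten when replace is true.
        if replace or attribute not in element.attrib:
            element.set(attribute, value)
    return root

doc = ET.fromstring('<doc><p/><p id="keep"/><p/></doc>')
label_elements(doc, "p", "id", lambda i, e: f"id-{i}", replace=False)
print([p.get("id") for p in doc])  # ['id-1', 'keep', 'id-3']
```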

1.1.15 p:load

The p:load step has no inputs but produces as its result a document (or documents) specified by an IRI.

</p:declare-step>

The value of the href option must be an anyURI. It is interpreted as an IRI reference. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:load in the case of a syntactic shortcut^XP value).

The document or documents identified by the URI are loaded and returned. If the URI protocol supports redirection, then redirects must be followed.

If dtd-validate is false, the p:load step is equivalent to performing the following p:http-request:

<p:http-request>
  <p:input port="source">
    <p:inline>
      <c:request method="GET"
                 href="{HREF}"
                 detailed="false"
                 status-only="false"
                 override-content-type="{OVERRIDE}"/>
    </p:inline>
  </p:input>
</p:http-request>

Where the “{HREF}” value is the value of the href option made absolute and the “{OVERRIDE}” value is the value of the override-content-type option. If no value is provided for the override-content-type option, then the override-content-type attribute is not present on the c:request.

If dtd-validate is true, the p:load step is equivalent to performing the following pipeline:

<p:declare-step>
  <p:output port="result" sequence="false"/>
  <p:option name="href" required="true"/>
  <p:option name="override-content-type"/>

  <p:http-request>
    <p:input port="source">
      <p:inline expand-text="true">
        <c:request method="GET"
                   href="{$href}"
                   detailed="false"
                   status-only="false"
                   override-content-type="text/plain"/>
      </p:inline>
    </p:input>
  </p:http-request>

  <p:xml-parse dtd-validate="true"/>

  <p:choose>
    <p:when test="p:value-available('override-content-type')">
      <p:cast-content-type content-type="{$override-content-type}"/>
    </p:when>
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>

The retrieved document or documents are produced on the result port. For single part responses, the base URI of the result is the (absolute) IRI used to retrieve it. For multipart responses, the base URI of each part is the (absolute) IRI used to retrieve it unless the content-disposition header indicates a URI. If the content-disposition header indicates a relative URI, it is made absolute against the (absolute) IRI used to retrieve it.

Editorial Note

How does the preceding paragraph jibe with what p:http-request says about multipart responses?

1.1.16 p:make-absolute-uris

The p:make-absolute-uris step makes an element or attribute's value in the source document an absolute IRI value in the result document.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if the pattern matches anything other than element or attribute nodes.

The value of the base-uri option must be an anyURI. It is interpreted as an IRI reference. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:make-absolute-uris in the case of a syntactic shortcut^XP value).

For every element or attribute in the input document which matches the specified pattern, its XPath string-value is resolved against the specified base URI and the resulting absolute IRI is used as the matched node's entire contents in the output.

The base URI used for resolution defaults to the matched attribute's element or the matched element's base URI unless the base-uri option is specified. When the base-uri option is specified, the option value is used as the base URI regardless of any contextual base URI value in the document. This option value is resolved against the base URI of the p:option^XP element used to set the option.

If the IRI reference specified by the base-uri option on p:make-absolute-uris is not valid, or if it is absent and the input document has no base URI, the results are implementation-dependent^XP.
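The resolution performed on each matched node can be sketched in Python. The sketch assumes that the base-uri option is always supplied (ElementTree does not track base URIs), handles only a named attribute on a named element instead of a general match pattern, and uses `urllib.parse.urljoin`, which implements URI rather than full IRI resolution. The function name `make_absolute` is hypothetical.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

def make_absolute(root, tag, attribute, base_uri):
    """Resolve `attribute` on every element named `tag` against `base_uri`."""
    for element in root.iter(tag):
        if attribute in element.attrib:
            # The resolved value replaces the attribute's entire value.
            element.set(attribute, urljoin(base_uri, element.get(attribute)))
    return root

doc = ET.fromstring(
    '<doc><a href="../img/logo.png"/><a href="http://example.org/x"/></doc>'
)
make_absolute(doc, "a", "href", "http://example.com/docs/guide/index.xml")
print([a.get("href") for a in doc])
```

Values that are already absolute, such as the second href, pass through unchanged.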

1.1.17 p:namespace-rename

The p:namespace-rename step renames any namespace declaration or use of a namespace in a document to a new IRI value.

</p:declare-step>

The value of the from option must be an anyURI. It should be either empty or absolute, but will not be resolved in any case.

The value of the to option must be an anyURI. It should be empty or absolute, but will not be resolved in any case.

The value of the apply-to option must be one of “all”, “elements”, or “attributes”. If the value is “elements”, only elements will be renamed; if the value is “attributes”, only attributes will be renamed; if the value is “all”, both elements and attributes will be renamed.

It is a dynamic error (err:XC0014) if the XML namespace (http://www.w3.org/XML/1998/namespace) or the XMLNS namespace (http://www.w3.org/2000/xmlns/) is the value of either the from option or the to option.

If the value of the from option is the same as the value of the to option, the input is reproduced unchanged on the output. Otherwise, namespace bindings, namespace attributes and element and attribute names are changed as follows:

Namespace bindings: If the from option is present and its value is not the empty string, then every binding of a prefix (or the default namespace) in the input document whose value is the same as the value of the from option is
- replaced in the output with a binding to the value of the to option, provided it is present and not the empty string;
- otherwise (the to option is not specified or has an empty string as its value) absent from the output.
If the from option is absent, or its value is the empty string, then no bindings are changed or removed.
Elements and attributes: If the from option is present and its value is not the empty string, for every element and attribute, as appropriate, in the input whose namespace name is the same as the value of the from option, in the output its namespace name is
- replaced with the value of the to option, provided it is present and not the empty string;
- otherwise (the to option is not specified or has an empty string as its value) changed to have no value.
If the from option is absent, or its value is the empty string, then for every element and attribute, as appropriate, whose namespace name has no value, in the output its namespace name is set to the value of the to option.
Namespace attributes: If the from option is present and its value is not the empty string, for every namespace attribute in the input whose value is the same as the value of the from option, in the output
- the namespace attribute's value is replaced with the value of the to option, provided it is present and not the empty string;
- otherwise (the to option is not specified or has an empty string as its value) the namespace attribute is absent.

Note

The apply-to option is primarily intended to make it possible to avoid renaming attributes when the from option specifies no namespace, since many attributes are in no namespace.

Care should be taken when specifying no namespace with the to option. Prefixed names in content, for example QNames and XPath expressions, may end up with no appropriate namespace binding.
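The renaming of element and attribute names, and the effect of the apply-to option, can be sketched in Python. This is a partial illustration under stated assumptions: ElementTree resolves prefixes at parse time, so namespace bindings and namespace attributes are not modelled; an absent option is represented by the empty string; and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def _rename(name, old, new):
    """Rename one name given in ElementTree's '{uri}local' notation."""
    uri, local = name[1:].split("}", 1) if name.startswith("{") else ("", name)
    if uri != old:
        return name
    return f"{{{new}}}{local}" if new else local

def namespace_rename(root, old, new, apply_to="all"):
    """Move names from namespace `old` to namespace `new` ('' means no namespace)."""
    if old == new:
        return root  # the input is reproduced unchanged
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        if apply_to in ("all", "elements"):
            element.tag = _rename(element.tag, old, new)
        if apply_to in ("all", "attributes"):
            renamed = {_rename(k, old, new): v for k, v in element.attrib.items()}
            element.attrib.clear()
            element.attrib.update(renamed)
    return root

doc = ET.fromstring('<a x="1"><b/></a>')
namespace_rename(doc, "", "urn:new", apply_to="elements")
print(doc.tag, doc.attrib)  # {urn:new}a {'x': '1'}
```

The usage line shows the situation the Note describes: with apply-to set to “elements”, the unqualified attribute x stays in no namespace.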

1.1.18 p:pack

The p:pack step merges two document sequences in a pair-wise fashion.

</p:declare-step>

The value of the wrapper option must be a QName. If the lexical value does not contain a colon, then the wrapper-namespace may be used to specify the namespace of the wrapper. In that case, the wrapper-prefix may be specified to suggest a prefix for the wrapper element. It is a dynamic error (err:XD0034^XP) to specify a new namespace or prefix if the lexical value of the specified name contains a colon.

The step takes each pair of documents, in order, one from the source port and one from the alternate port, wraps them with a new element node whose QName is the value specified in the wrapper option, and writes that element to the result port as a document.

If the step reaches the end of one input sequence before the other, then it simply wraps each of the remaining documents in the longer sequence.

Note

In the common case, where the document element of a document in the result sequence has two element children, any comments, processing instructions, or white space text nodes that occur between them may have come from either of the input documents; this step does not attempt to distinguish which one.
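The pair-wise merge, including the handling of sequences of unequal length, corresponds closely to `itertools.zip_longest` in Python. The sketch below is illustrative only: documents are represented by their document elements, the wrapper is an unqualified name, and the function name `pack` is hypothetical.

```python
from itertools import zip_longest
import copy
import xml.etree.ElementTree as ET

def pack(source, alternate, wrapper):
    """Pair documents from two sequences and wrap each pair in a new element."""
    result = []
    for pair in zip_longest(source, alternate):
        wrapped = ET.Element(wrapper)
        # A missing partner (None) means one sequence ended before the other;
        # the remaining document is then wrapped on its own.
        wrapped.extend([copy.deepcopy(doc) for doc in pair if doc is not None])
        result.append(wrapped)
    return result

src = [ET.fromstring("<a/>"), ET.fromstring("<b/>"), ET.fromstring("<c/>")]
alt = [ET.fromstring("<x/>")]
for document in pack(src, alt, "pair"):
    print(ET.tostring(document, encoding="unicode"))
```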

1.1.19 p:parameters

The p:parameters step exposes a set of parameters as a c:param-set document.

</p:declare-step>

Each parameter in the parameters map is converted into a c:param element. The resulting c:param elements are wrapped in a c:param-set and the parameter set document is written to the result port. The order in which c:param elements occur in the c:param-set is implementation-dependent^XP.

For consistency and user convenience, if any of the parameters have names that are in a namespace, the namespace attribute on the c:param element must be used. Each name must be an NCName.

The base URI of the output document is the URI of the pipeline document that contains the step.

1.1.19.1 The c:param element

A c:param represents a parameter on a parameter input.

<c:param
  name = QName
  namespace? = anyURI
  value = string />

The name attribute of the c:param must have the lexical form of a QName.

If the namespace attribute is specified, then the expanded name of the parameter is constructed from the specified namespace and the name value. It is a dynamic error (err:XD0025^XP) if the namespace attribute is specified, the name contains a colon, and the specified namespace is not the same as the in-scope namespace binding for the specified prefix.

If the namespace attribute is not specified, and the name contains a colon, then the expanded name of the parameter is constructed using the name value and the namespace declarations in-scope on the c:param element.

If the namespace attribute is not specified, and the name does not contain a colon, then the expanded name of the parameter is in no namespace.

Any namespace-qualified attribute names that appear on the c:param element are ignored. It is a dynamic error (err:XD0014^XP) for any unqualified attribute names other than “name”, “namespace”, or “value” to appear on a c:param element.
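The rules above for determining the expanded name of a parameter can be summarized in a short Python sketch. The function name `expanded_name` and the representation of in-scope namespace bindings as a dictionary are assumptions made for illustration; an unbound prefix is not handled specially.

```python
def expanded_name(name, namespace=None, in_scope=None):
    """Return (namespace URI or None, local name) for a c:param.

    `in_scope` maps prefixes to namespace URIs, standing in for the
    namespace declarations in scope on the c:param element.
    """
    in_scope = in_scope or {}
    prefix, _, local = name.rpartition(":")
    if namespace is not None:
        # A prefix, if present, must agree with the namespace attribute.
        if prefix and in_scope.get(prefix) != namespace:
            raise ValueError("err:XD0025: namespace conflicts with prefix binding")
        return (namespace, local)
    if prefix:
        return (in_scope[prefix], local)  # KeyError if the prefix is unbound
    return (None, local)  # no namespace attribute and no colon: no namespace

print(expanded_name("ex:limit", in_scope={"ex": "urn:ex"}))  # ('urn:ex', 'limit')
```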

1.1.19.2 The c:param-set element

A c:param-set represents a set of parameters on a parameter input.

<c:param-set>
    c:param*
</c:param-set>

The c:param-set contains zero or more c:param elements. It is a dynamic error (err:XD0018^XP) if the parameter list contains any elements other than c:param.

Any namespace-qualified attribute names that appear on the c:param-set element are ignored. It is a dynamic error (err:XD0014^XP) for any unqualified attribute names to appear on a c:param-set element.

1.1.20 p:rename

The p:rename step renames elements, attributes, or processing-instruction targets in a document.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if the pattern matches anything other than element, attribute or processing instruction nodes.

The value of the new-name option must be a QName. If the lexical value does not contain a colon, then the new-namespace may be used to specify the namespace of the new name. In that case, the new-prefix may be specified to suggest a prefix for the new name. It is a dynamic error (err:XD0034^XP) to specify a new namespace or prefix if the lexical value of the specified name contains a colon.

Each element, attribute, or processing-instruction in the input matched by the match pattern specified in the match option is renamed in the output to the name specified by the new-name option.

If the match option matches an attribute and if the element on which it occurs already has an attribute whose expanded name is the same as the expanded name of the specified new-name, then the result is as if the current attribute named “new-name” was deleted before renaming the matched attribute.

With respect to attributes named “xml:base”, the following semantics apply: renaming an attribute from “xml:base” to something else has no effect on the underlying base URI of the element; however, if an attribute is renamed from something else to “xml:base”, the base URI of the element must also be amended accordingly.

If the pattern matches processing instructions, then it is the processing instruction target that is renamed. It is a dynamic error (err:XC0013) if the pattern matches a processing instruction and the new name has a non-null namespace.

1.1.21 p:replace

The p:replace step replaces matching nodes in its primary input with the document element of the replacement port's document.

</p:declare-step>

Every node in the primary input matching the specified pattern is replaced in the output by the document element of the replacement document. Only non-nested matches are replaced. That is, once a node is replaced, its descendants cannot be matched.

1.1.22 p:set-attributes

The p:set-attributes step sets attributes on matching elements.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if that pattern matches anything other than element nodes.

Each attribute on the document element of the document that appears on the attributes port is copied to each element that matches the match expression.

If an attribute with the same name as one of the attributes to be copied already exists, the value specified on the attributes port's document is used. The result port of this step produces a copy of the source port's document with the matching elements' attributes modified.

The matching elements are specified by the match pattern in the match option. All matching elements are processed. If no elements match, the step will not change any elements.

This step must not copy namespace declarations. If the attributes copied from the attributes port use namespaces, prefixes, or prefixes bound to different namespaces, the document produced on the result output port will require Section 2.5.1, “Namespace Fixup on XML Outputs”^XP.

If an attribute named xml:base is added or changed, the base URI of the element must also be amended accordingly.
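The copying rule, including the precedence of the attributes port over existing values, can be sketched in a few lines of Python. As elsewhere, a plain element name stands in for the match pattern and the function name `set_attributes` is hypothetical; namespace fixup and base URI handling are not modelled.

```python
import xml.etree.ElementTree as ET

def set_attributes(source, tag, attributes_doc):
    """Copy every attribute of `attributes_doc`'s document element to each `tag`."""
    for element in source.iter(tag):
        # update() overwrites same-named attributes, so the value from the
        # attributes document wins over an existing value.
        element.attrib.update(attributes_doc.attrib)
    return source

doc = ET.fromstring('<doc><p class="old"/><q/></doc>')
attrs = ET.fromstring('<any class="new" lang="en"/>')
set_attributes(doc, "p", attrs)
print(ET.tostring(doc, encoding="unicode"))
```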

1.1.23 p:set-properties

The p:set-properties step sets document properties^XP on the source document.

</p:declare-step>

The document properties^XP of the document on the source port are augmented with the values specified in the properties option. The document produced on the result port has the same representation but the adjusted property values.

It is a dynamic error (err:XC1001) if the properties map contains a key equal to the string “content-type”.

1.1.24 p:sink

The p:sink step accepts a sequence of documents and discards them. It has no output.

</p:declare-step>

1.1.25 p:split-sequence

The p:split-sequence step accepts a sequence of documents and divides it into two sequences.

</p:declare-step>

The value of the test option must be an XPathExpression.

The XPath expression in the test option is applied to each document in the input sequence. If the effective boolean value of the expression is true, the document is copied to the matched port; otherwise it is copied to the not-matched port.

If the initial-only option is true, then when the first document that does not satisfy the test expression is encountered, it and all the documents that follow it are written to the not-matched port. In other words, it only writes the initial series of matched documents (which may be empty) to the matched port. All other documents are written to the not-matched port, irrespective of whether or not they match.

The XPath context^XP for the test option changes over time. For each document that appears on the source port, the expression is evaluated with that document as the context document. The context position (position()) is the position of that document within the sequence and the context size (last()) is the total number of documents in the sequence.
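The splitting rule, the initial-only option, and the changing context position and size can be sketched together in Python. The callable `test(document, position, last)` stands in for the XPath test expression, and the function name `split_sequence` is hypothetical.

```python
def split_sequence(documents, test, initial_only=False):
    """Return (matched, not_matched) for a sequence of documents."""
    documents = list(documents)  # buffering is what makes last() available
    last = len(documents)
    matched, not_matched = [], []
    in_initial_run = True
    for position, document in enumerate(documents, start=1):
        # With initial-only, nothing after the first failure can match.
        may_match = in_initial_run or not initial_only
        if may_match and test(document, position, last):
            matched.append(document)
        else:
            not_matched.append(document)
            in_initial_run = False
    return matched, not_matched

docs = [1, 2, 3, 1]
print(split_sequence(docs, lambda d, pos, last: d < 3))
print(split_sequence(docs, lambda d, pos, last: d < 3, initial_only=True))
```

In the second call the final document satisfies the test but still goes to the not-matched sequence, because it follows the first failing document.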

Note

In principle, this component cannot stream because it must buffer all of the input sequence in order to find the context size. In practice, if the test expression does not use the last() function, the implementation can stream and ignore the context size.

If the implementation supports passing PSVI annotations between steps, the p:split-sequence step must preserve any annotations that appear in the input.

1.1.26 p:store

The p:store step stores (a possibly serialized version of) its input to a URI. This step outputs a reference to the location of the stored document.

</p:declare-step>

The value of the href option must be an anyURI. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:store in the case of a syntactic shortcut^XP value).

The step attempts to store the XML document to the specified URI. It is a dynamic error (err:XC0050) if the URI scheme is not supported or the step cannot store to the specified location.

The output of this step is a document containing a single c:result element whose content is the absolute URI of the document stored by the step.

The standard serialization options are provided to control the serialization of XML content when it is stored. These options are as specified in Section 1.3, “Serialization Options”.

1.1.27 p:string-replace

The p:string-replace step matches nodes in the document provided on the source port and replaces them with the string result of evaluating an XPath expression.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern.

The value of the replace option must be an XPathExpression.

The matched nodes are specified with the match pattern in the match option. For each matching node, the XPath expression provided by the replace option is evaluated with the matching node as the XPath context node. The string value of the result is used in the output. Nodes that do not match are copied without change.

If the expression given in the match option matches an attribute, the string value of the replace expression is used as the new value of the attribute in the output. If the attribute is named “xml:base”, the base URI of the element must also be amended accordingly.

If the expression matches any other kind of node, the entire node (and not just its contents) is replaced by the string value of the replace expression.

1.1.28 p:unescape-markup

The p:unescape-markup step takes the string value of the document element and parses the content as if it was a Unicode character stream containing serialized XML. The output consists of the same document element with children that result from the parse. This is the reverse of the p:escape-markup step.

</p:declare-step>

The value of the namespace option must be an anyURI. It should be absolute, but will not be resolved.

When the string value is parsed, the original document element is preserved so that the result will be well-formed XML even if the content consists of multiple, sibling elements.

The namespace option specifies a default namespace. Elements that are in no namespace in the unescaped content will be placed into this namespace unless there is an in-scope namespace declaration that specifies a different namespace (or explicitly undeclares the default namespace).

The content-type option may be used to specify an alternate content type for the string value. An implementation may use a different parser to produce XML content depending on the specified content-type. For example, an implementation might provide an HTML to XHTML parser (e.g. [HTML Tidy] or [TagSoup]) for the content type 'text/html'.

All implementations must support the content type application/xml, and must use a standard XML parser for it. It is a dynamic error (err:XC0051) if the content-type specified is not supported by the implementation. Behavior of p:unescape-markup for content-types other than application/xml is implementation-defined^XP.

The encoding option specifies how the data is encoded. All implementations must support the base64 encoding (and the absence of an encoding option, which implies that the content is plain Unicode text). It is a dynamic error (err:XC0052) if the encoding specified is not supported by the implementation.

If an encoding is specified, a charset may also be specified. The character set may be specified as a parameter on the content-type or via the separate charset option. If it is specified in both places, the value of the charset option must be used.

If the specified encoding is base64, then the character set must be specified. It is a dynamic error (err:XC0010) if an encoding of base64 is specified and the character set is not specified or if the specified character set is not supported by the implementation.

The octet-stream that results from decoding the text must be interpreted using the character encoding named by the value of the charset option to produce a sequence of Unicode characters to parse.

If no encoding is specified, the character set is ignored, irrespective of where it was specified.
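The interaction between the encoding and charset options can be sketched as a small Python function. The function name `decode_content` is hypothetical, and the sketch only models the base64 encoding that all implementations must support.

```python
import base64

def decode_content(text, encoding=None, charset=None):
    """Turn the string value into the Unicode characters that will be parsed."""
    if encoding is None:
        return text  # the character set is ignored when no encoding is given
    if encoding != "base64":
        raise ValueError("err:XC0052: unsupported encoding")
    if charset is None:
        raise ValueError("err:XC0010: base64 requires a character set")
    try:
        # Decode base64 to octets, then interpret them using the charset.
        return base64.b64decode(text).decode(charset)
    except LookupError:
        raise ValueError("err:XC0010: unsupported character set") from None

encoded = base64.b64encode("<p>caf\u00e9</p>".encode("utf-8")).decode("ascii")
print(decode_content(encoded, encoding="base64", charset="utf-8"))
```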

For example, with the 'namespace' option set to the XHTML namespace, the following input:

<description>
&lt;p>This is a chunk.&lt;/p>
&lt;p>This is another chunk.&lt;/p>
</description>

would produce:

<description>
<p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
<p xmlns="http://www.w3.org/1999/xhtml">This is another chunk.</p>
</description>
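One way to approximate this behaviour for the application/xml content type is sketched below in Python. The approach shown, parsing the string value inside a temporary wrapper element that carries the default namespace, is an assumption of this sketch rather than a technique prescribed by this specification; the function name `unescape_markup` is hypothetical, and content that starts with an XML declaration is not handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def unescape_markup(root, namespace=None):
    """Parse the string value of `root` and return `root` with parsed children."""
    text = "".join(root.itertext())
    # A temporary wrapper keeps the parse well-formed for sibling elements
    # and supplies the default namespace, if one was requested.
    xmlns = f" xmlns={quoteattr(namespace)}" if namespace else ""
    parsed = ET.fromstring(f"<wrapper{xmlns}>{text}</wrapper>")
    result = ET.Element(root.tag, dict(root.attrib))  # original element preserved
    result.text = parsed.text
    result.extend(list(parsed))
    return result

doc = ET.fromstring("<description>&lt;p>This is a chunk.&lt;/p></description>")
out = unescape_markup(doc, "http://www.w3.org/1999/xhtml")
print(out[0].tag)  # {http://www.w3.org/1999/xhtml}p
```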

1.1.29 p:unwrap

The p:unwrap step replaces matched elements with their children.

</p:declare-step>

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if that pattern matches anything other than element nodes.

Every element in the source document that matches the specified match pattern is replaced by its children, effectively “unwrapping” the children from their parent. Non-element nodes and unmatched elements are passed through unchanged.

Note

The matching applies to the entire document, not just the “top-most” matches. A pattern of the form h:div will replace all h:div elements, not just the top-most ones.

This step produces a single document; if the document element is unwrapped, the result might not be well-formed XML.
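
The unwrapping behavior can be sketched, non-normatively, with Python's xml.dom.minidom. This sketch assumes a much simpler notion of matching than an XSLTMatchPattern (it matches elements by tag name only), and the function name unwrap is hypothetical. It does not handle the case where the document element itself is unwrapped.

```python
from xml.dom import minidom

def unwrap(xml_text, tag_name):
    """Replace every element named tag_name with its children."""
    doc = minidom.parseString(xml_text)
    # All matches are collected up front, so nested matches are unwrapped too,
    # not just the top-most ones.
    for element in list(doc.getElementsByTagName(tag_name)):
        parent = element.parentNode
        # Move each child in front of the matched element, preserving order.
        while element.firstChild is not None:
            parent.insertBefore(element.firstChild, element)
        parent.removeChild(element)
    return doc.documentElement.toxml()
```

Applied to a document in which one div is nested inside another, both div elements disappear and their content remains in document order.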

1.1.30 p:wrap

The p:wrap step wraps matching nodes in the source document with a new parent element.

The value of the match option must be an XSLTMatchPattern. It is a dynamic error (err:XC0023) if the pattern matches anything other than document, element, text, processing instruction, and comment nodes.

The value of the group-adjacent option must be an XPathExpression.

If the node matched is the document node (match="/"), the result is a new document where the document element is a new element node whose QName is the value specified in the wrapper option. That new element contains copies of all of the children of the original document node.

When the match pattern does not match the document node, every node that matches the specified match pattern is replaced with a new element node whose QName is the value specified in the wrapper option. The content of that new element is a copy of the original, matching node. The p:wrap step performs a "deep" wrapping: the children of the matching node and their descendants are processed, and wrappers are added to all matching nodes.

The group-adjacent option can be used to group adjacent matching nodes in a single wrapper element. The specified XPath expression is evaluated for each matching node with that node as the XPath context node. Whenever two or more adjacent matching nodes have the same “group adjacent” value, they are wrapped together in a single wrapper element.

Two matching nodes are considered adjacent if and only if they are siblings and either there are no nodes between them or all intervening, non-matching nodes are whitespace text, comment, or processing instruction nodes.

1.1.31 p:wrap-sequence

The p:wrap-sequence step accepts a sequence of documents and produces either a single document or a new sequence of documents.

The value of the group-adjacent option must be an XPathExpression.

In its simplest form, p:wrap-sequence takes a sequence of documents and produces a single, new document by placing each document in the source sequence inside a new document element as sequential siblings. The name of the document element is the value specified in the wrapper option.

The group-adjacent option can be used to group adjacent documents. The XPath context^XP for the group-adjacent option changes over time. For each document that appears on the source port, the expression is evaluated with that document as the context document. The context position (position()) is the position of that document within the sequence and the context size (last()) is the total number of documents in the sequence. Whenever two or more sequentially adjacent documents have the same “group adjacent” value, they are wrapped together in a single wrapper element.
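
The grouping rule can be illustrated with a short, non-normative Python sketch. It assumes documents are represented as serialized XML strings without XML declarations, that the “group adjacent” value is computed by an ordinary Python function (standing in for the XPath expression), and that a document with no equal neighbor is wrapped on its own. The name wrap_sequence is hypothetical.

```python
from itertools import groupby

def wrap_sequence(documents, wrapper, group_key=None):
    """Wrap a sequence of serialized documents in wrapper elements."""
    documents = list(documents)
    if group_key is None:
        # Simplest form: one wrapper around the whole sequence.
        groups = [documents]
    else:
        # groupby only merges runs of *adjacent* items with equal keys,
        # which mirrors the "sequentially adjacent" rule.
        groups = [list(run) for _, run in groupby(documents, key=group_key)]
    return ["<%s>%s</%s>" % (wrapper, "".join(run), wrapper) for run in groups]
```

Note that two documents with the same key that are separated by a document with a different key end up in different wrappers.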

1.1.32 p:xinclude

The p:xinclude step applies [XInclude] processing to the source document.

The value of the fixup-xml-base option must be a boolean. If it is true, base URI fixup will be performed as per [XInclude].

The value of the fixup-xml-lang option must be a boolean. If it is true, language fixup will be performed as per [XInclude].

The included documents are located with the base URI of the input document and are not provided as input to the step.

It is a dynamic error (err:XC0029) if an XInclude error occurs during processing.

1.1.33 p:xslt

The p:xslt step applies an [XSLT 1.0] or [XSLT 2.0] stylesheet to a document.

If present, the value of the initial-mode option must be a QName.

If present, the value of the template-name option must be a QName.

If present, the value of the output-base-uri option must be an anyURI. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:xslt in the case of a syntactic shortcut^XP value).

If the step specifies a version, then that version of XSLT must be used to process the transformation. It is a dynamic error (err:XC0038) if the specified version is not available. If the step does not specify a version, the implementation may use any version it has available and may use any means to determine what version to use, including, but not limited to, examining the version of the stylesheet.

The XSLT stylesheet provided on the stylesheet port is applied to the document on the source port. Any parameters passed in the parameters option are used to define top-level stylesheet parameters. The primary result document of the transformation, if there is one, appears on the result port. At most one document can appear on the result port. All other result documents appear on the secondary port. The order in which result documents appear on the secondary port is implementation-dependent^XP. If XSLT 1.0 is used, an empty sequence of documents must appear on the secondary port.

If a sequence of documents is provided on the source port, the first document is used as the primary input document. The whole sequence is also the default collection. If no documents are provided on the source port, the primary input document is undefined and the default collection is empty. It is a dynamic error (err:XC0039) if a sequence of documents (including an empty sequence) is provided to an XSLT 1.0 step.

A dynamic error occurs if the XSLT processor signals a fatal error. This includes the case where the transformation terminates due to an xsl:message instruction with a terminate attribute value of “yes”. How XSLT message termination errors are reported to the XProc processor is implementation-dependent^XP.

The invocation of the transformation is controlled by the initial-mode and template-name options that set the initial mode and/or named template in the XSLT transformation where processing begins. It is a dynamic error (err:XC0056) if the specified initial mode or named template cannot be applied to the specified stylesheet.

The output-base-uri option sets the context's output base URI per the XSLT 2.0 specification. If the option is not specified, the base URI of the result document is the base URI of the first document in the source port's sequence. If the value of the output-base-uri option is not absolute, it will be resolved using the base URI of its p:option^XP element. An XSLT 1.0 step should use the value of the output-base-uri as the base URI of its output, if the option is specified.

If XSLT 2.0 is used, the outputs of this step may include PSVI annotations.

The static and initial dynamic contexts of the XSLT processor are the contexts defined in Section 2.7.2, “Step XPath Context”^XP with the following adjustments.

The dynamic context is augmented as follows:

Context item: The first document that appears on the source port.
Variable values: Any parameters passed in the parameters option are available as variable bindings to the XSLT processor.
Function implementations: The function implementations provided by the XSLT processor.
Default collection: The sequence of documents provided on the source port.

1.2 Optional Steps

The following steps are optional. If they are supported by a processor, they must conform to the semantics outlined here, but a conformant processor is not required to support all (or any) of these steps.

1.2.1 p:exec

The p:exec step runs an external command, passing the input that arrives on its source port as standard input and reading the result from standard output and errors from standard error.

The values of the command, args, cwd, path-separator, and arg-separator options must be strings.

The values of the source-is-xml, result-is-xml, errors-is-xml, and fix-slashes options must be boolean.

The p:exec step executes the command passed on command with the arguments passed on args. The processor does not interpolate the values of the command or args (for example, expanding references to environment variables). It is a dynamic error (err:XC0033) if the command cannot be run.

If cwd is specified, then the current working directory is changed to the value of that option before execution begins. It is a dynamic error (err:XC0034) if the current working directory cannot be changed to the value of the cwd option. If cwd is not specified, the current working directory is implementation-defined^XP.

If the path-separator option is specified, every occurrence of the character identified as the path-separator character that occurs in the command, args, or cwd will be replaced by the platform-specific path separator character. It is a dynamic error (err:XC0063) if the path-separator option is specified and is not exactly one character long.

The value of the args option is a string. In order to support passing more than one argument to a command, the args string is broken into a sequence of values. The arg-separator option specifies the character that is used to separate values; by default it is a single space. It is a dynamic error (err:XC0066) if the arg-separator option is specified and is not exactly one character long.

The following examples of p:exec are equivalent. The first uses the default arg-separator:

<p:exec command="someCommand" args="arg1 arg2 arg3"/>

The second specifies an alternate separator:

<p:exec command="someCommand" args="arg1,arg2,arg3"
	arg-separator=","/>

If one of the arguments contains a space (e.g., a filename that contains a space), then you must specify an alternate separator.
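
The splitting of args can be sketched non-normatively in Python. The function name split_args is hypothetical, ValueError stands in for err:XC0066, and the treatment of an empty args string (no arguments) is an assumption of this sketch.

```python
def split_args(args, arg_separator=" "):
    """Break the args string into a sequence of argument values."""
    if len(arg_separator) != 1:
        raise ValueError("err:XC0066: arg-separator must be exactly one character")
    if args == "":
        return []  # assumption: an empty string means no arguments
    return args.split(arg_separator)
```

With an alternate separator such as "|", an argument like "my file.txt" survives as a single value, which is not possible with the default single-space separator.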

The source port is declared to accept a sequence so that it can be empty. If no document appears on the source port, then the command receives nothing on standard input. If a document does arrive on the source port, it will be passed to the command as its standard input. It is a dynamic error (err:XD0006^XP) if more than one document appears on the source port of the p:exec step. If source-is-xml is true, the serialization options are used to convert the input into serialized XML which is passed to the command, otherwise the XPath string-value of the document is passed.

The standard output of the command is read and returned on result; the standard error output is read and returned on errors. In order to ensure that the result will be an XML document, each of the results will be wrapped in a c:result element.

If result-is-xml is true, the standard output of the program is assumed to be XML and will be parsed as a single document. If it is false, the output is assumed not to be XML and will be returned as escaped text.

If wrap-result-lines is true, a c:line element will be wrapped around each line of output.

<c:line>
    string
</c:line>

It is a dynamic error (err:XC0035) to specify both result-is-xml and wrap-result-lines.

The same rules apply to the standard error output of the program, with the errors-is-xml and wrap-error-lines options, respectively.
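
The wrapping of non-XML output can be illustrated with the following non-normative Python sketch. It builds the result as a string for brevity; the function name wrap_output is hypothetical, and how line ends are recognized (Python's str.splitlines) is an assumption of this sketch.

```python
from xml.sax.saxutils import escape

C_NS = "http://www.w3.org/ns/xproc-step"

def wrap_output(output, wrap_lines=False):
    """Wrap command output (not XML) in c:result, optionally line by line."""
    if wrap_lines:
        # One c:line element per line of output, each with escaped text.
        body = "".join("<c:line>%s</c:line>" % escape(line)
                       for line in output.splitlines())
    else:
        # The whole output is returned as escaped text.
        body = escape(output)
    return '<c:result xmlns:c="%s">%s</c:result>' % (C_NS, body)
```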

If either of the results is XML, it must be parsed with namespaces enabled and validation turned off, just like p:document^XP.

The exit-status port always returns a single c:result element which contains the system exit status that the process returned. The specific exit status values returned by a process invoked with p:exec are implementation-dependent^XP.

If a failure-threshold value is supplied, and the exit status is greater than that threshold, then the p:exec step must fail. It is a dynamic error (err:XC0064) if the exit code from the command is greater than the specified failure-threshold value. This failure, like any step failure, can be captured with a p:try^XP.
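
The failure-threshold check can be sketched non-normatively with Python's subprocess module. The names run_command and ExecFailure are hypothetical; ExecFailure stands in for err:XC0064.

```python
import subprocess
import sys

class ExecFailure(RuntimeError):
    """Stands in for err:XC0064 in this sketch."""

def run_command(argv, failure_threshold=None):
    """Run argv and fail if the exit status exceeds the threshold."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    status = completed.returncode
    # Only an exit status strictly greater than the threshold is a failure.
    if failure_threshold is not None and status > failure_threshold:
        raise ExecFailure("exit status %d exceeds threshold %d"
                          % (status, failure_threshold))
    return status, completed.stdout, completed.stderr

if __name__ == "__main__":
    # Exit status 3 with threshold 5 is reported, not treated as a failure.
    print(run_command([sys.executable, "-c", "import sys; sys.exit(3)"], 5)[0])
```

When no threshold is supplied, the exit status is simply reported and never causes the step to fail.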

1.2.2 p:hash

The p:hash step generates a hash, or digital “fingerprint”, for some value and injects it into the source document.

The value of the algorithm option must be a QName. If it does not have a prefix, then it must be one of the following values: “crc”, “md”, or “sha”.

If a version is not specified, the default version is algorithm-defined. For “crc” it is 32, for “md” it is 5, for “sha” it is 1.

A hash is constructed from the string specified in the value option using the specified algorithm and version. Implementations must support [CRC32], [MD5], and [SHA1] hashes. It is implementation-defined^XP what other algorithms are supported. The resulting hash should be returned as a string of hexadecimal characters.

The value of the match option must be an XSLTMatchPattern.

The hash of the specified value is computed using the algorithm and parameters specified. It is a dynamic error (err:XC0036) if the requested hash algorithm is not one that the processor understands or if the value or parameters are not appropriate for that algorithm.
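
As a non-normative illustration, the three mandatory algorithms and their default versions can be sketched with Python's hashlib and zlib modules. This specification does not say how the string is converted to octets or how a CRC is formatted; the UTF-8 encoding and the eight-digit hexadecimal CRC below are assumptions of this sketch, as is the function name compute_hash. ValueError stands in for err:XC0036.

```python
import hashlib
import zlib

DEFAULT_VERSIONS = {"crc": 32, "md": 5, "sha": 1}

def compute_hash(value, algorithm, version=None):
    """Return the hash of value as a string of hexadecimal characters."""
    if algorithm not in DEFAULT_VERSIONS:
        raise ValueError("err:XC0036: unknown algorithm " + algorithm)
    if version is None:
        version = DEFAULT_VERSIONS[algorithm]
    data = value.encode("utf-8")  # assumption: hash the UTF-8 octets
    if (algorithm, version) == ("crc", 32):
        return format(zlib.crc32(data), "08x")  # assumption: zero-padded hex
    if (algorithm, version) == ("md", 5):
        return hashlib.md5(data).hexdigest()
    if (algorithm, version) == ("sha", 1):
        return hashlib.sha1(data).hexdigest()
    raise ValueError("err:XC0036: unsupported version %r of %s" % (version, algorithm))
```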

The matched nodes are specified with the match pattern in the match option. For each matching node, the string value of the computed hash is used in the output (if more than one node matches, the same hash value is used in each match). Nodes that do not match are copied without change.

If the expression given in the match option matches an attribute, the hash is used as the new value of the attribute in the output. If the attribute is named “xml:base”, the base URI of the element must also be amended accordingly.

If the expression matches any other kind of node, the entire node (and not just its contents) is replaced by the hash.

1.2.3 p:in-scope-names

The p:in-scope-names step exposes all of the in-scope variables and options as a set of parameters in a c:param-set document.

Each in-scope variable and option is converted into a c:param element. The resulting c:param elements are wrapped in a c:param-set and the parameter set document is written to the result port. The order in which c:param elements occur in the c:param-set is implementation-dependent^XP.

For consistency and user convenience, if any of the variables or options have names that are in a namespace, the namespace attribute on the c:param element must be used. Each name must be an NCName.

The base URI of the output document is the URI of the pipeline document that contains the step.

For consistency with the p:parameters step, the result port is not primary.

1.2.3.1 Example

This unlikely pipeline demonstrates the behavior of p:in-scope-names:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="1.0">
<p:output port="result">
  <p:pipe step="vars" port="result"/>
</p:output>

<p:option name="username" required="true"/>
<p:option name="password" required="true"/>
<p:variable name="host" select="'http://example.com/'"/>

<p:in-scope-names name="vars"/>

</p:declare-step>

Assuming the values supplied for the username and password options are “user” and “pass”, respectively, the output would be:

<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:param name="username" namespace="" value="user"/>
  <c:param name="host" namespace="" value="http://example.com/"/>
  <c:param name="password" namespace="" value="pass"/>
</c:param-set>

1.2.4 p:template

The p:template step replaces each XPath expression, delimited with curly braces, in the template document with the result of evaluating that expression.

While evaluating each expression, the names of any parameters passed to the step are available as variable values in the XPath dynamic context.

The step searches for XPath expressions in attribute values, text content (adjacent text nodes, if they occur in the data model, must be coalesced; this step always processes maximal length text nodes), processing instruction data, and comments. XPath expressions are identified by curly braces, similar to attribute value templates in XSLT or enclosed expressions in XQuery.

In order to allow curly braces to appear literally in content, they can be escaped by doubling them. In other words, where “{” would start an XPath expression, “{{” is simply a single, literal opening curly brace. The same applies for closing curly braces.

Inside an XPath expression, strings quoted by single (') or double (") quotes are treated literally. Outside of quoted text, it is an error for an opening curly brace to occur. A closing curly brace ends the XPath expression (whether or not it is followed immediately by another closing curly brace).

These parsing rules can be described by the following algorithm, though implementations are by no means required to implement the parsing in exactly this way, provided that they achieve the same results.

The parser begins in regular-mode at the start of each unit of content where expansion may occur. In regular-mode:
1. “{{” is replaced by a single “{”.
2. “}}” is replaced by a single “}”.
  Note: It is a dynamic error (err:XC0067) to encounter a single closing curly brace “}” that is not immediately followed by another closing curly brace.
3. A single opening curly brace “{” (not immediately followed by another opening curly brace) is discarded and the parser moves into xpath-mode. The initial expression is empty.
4. All other characters are copied without change.
In xpath-mode:
1. It is a dynamic error (err:XC0067) to encounter an opening curly brace “{”.
2. A closing curly brace “}” is discarded and ends the expression. The expression is evaluated and the result of that evaluation is copied to the output. The parser returns to regular-mode.
  Note: Braces cannot be escaped by doubling them in xpath-mode.
3. A single quote (') is added to the current expression and the parser moves to single-quote-mode.
4. A double quote (") is added to the current expression and the parser moves to double-quote-mode.
5. All other characters are appended to the current expression.
In single-quote-mode:
1. A single quote (') is added to the current expression and the parser moves to xpath-mode.
2. All other characters are appended to the current expression.
In double-quote-mode:
1. A double quote (") is added to the current expression and the parser moves to xpath-mode.
2. All other characters are appended to the current expression.

It is a dynamic error (err:XC0067) if the parser reaches the end of the unit of content and it is not in regular-mode.
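
The four parser modes described above can be sketched, non-normatively, as a small state machine in Python. The evaluate callback is a hypothetical stand-in for an XPath evaluator, TemplateError stands in for err:XC0067, and the result of each expression is simply converted to a string. As stated above, implementations are not required to parse in this way.

```python
class TemplateError(ValueError):
    """Stands in for err:XC0067 in this sketch."""

def expand(text, evaluate):
    """Expand {expr} in one unit of content; {{ and }} are literal braces."""
    out, expr, mode = [], [], "regular"
    i = 0
    while i < len(text):
        ch = text[i]
        if mode == "regular":
            if ch in "{}" and text[i + 1:i + 2] == ch:
                out.append(ch)          # doubled brace: emit one literal brace
                i += 2
                continue
            if ch == "}":
                raise TemplateError("single '}' outside an expression")
            if ch == "{":
                mode, expr = "xpath", []  # discard brace, start empty expression
            else:
                out.append(ch)
        elif mode == "xpath":
            if ch == "{":
                raise TemplateError("'{' inside an expression")
            if ch == "}":
                out.append(str(evaluate("".join(expr))))
                mode = "regular"
            else:
                expr.append(ch)
                if ch == "'":
                    mode = "single-quote"
                elif ch == '"':
                    mode = "double-quote"
        else:
            # Inside a quoted string every character is literal; only the
            # matching quote returns the parser to xpath-mode.
            expr.append(ch)
            if (mode, ch) in (("single-quote", "'"), ("double-quote", '"')):
                mode = "xpath"
        i += 1
    if mode != "regular":
        raise TemplateError("content ended inside an expression")
    return "".join(out)
```

For example, with an evaluator that upper-cases its argument, the content a{{b}}{x} expands to a{b}X.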

The context node used for each expression is the document passed on the source port. It is a dynamic error (err:XC0068) if more than one document appears on the source port. In an XPath 1.0 implementation, if p:empty^XP is given or implied on the source port, an empty document node is used as the context node. In an XPath 2.0 implementation, the context item is undefined. It is a dynamic error (err:XC0026) if any XPath expression makes reference to the context node, size, or position when the context item is undefined.

In an attribute value, processing instruction, or comment, the string value of the XPath expression is used. In text content, an expression that selects nodes will cause those nodes to be copied into the template document.

Note

Depending on which version of XPath an implementation supports, and possibly on the xpath-version setting on the p:template, some implementations may report errors or produce results that differ from those of other implementations in cases where the interpretation of an XPath expression differs between versions of XPath.

1.2.4.1 Example

It's quite common to construct documents using values computed by the pipeline. This is particularly (but not exclusively) the case when the pipeline uses the p:http-request step. The input to p:http-request is a c:request document; attributes on the c:request element control most of the request parameters; the body of the document forms the body of the request.

A typical example looks like this:

<c:request method="POST" href="http://example.com/post"
           username="user" password="password">
<c:body>
  <computed-content/>
</c:body>
</c:request>

If we assume that the href value and the computed content come from an input document, and the username and password are options, then a typical pipeline to compute the request becomes quite complex.

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:c="http://www.w3.org/ns/xproc-step"
            name="main" version="1.0">
<p:option name="username" required="true"/>
<p:option name="password" required="true"/>

<p:identity>
  <p:input port="source">
    <p:inline>
      <c:request method="POST"/>
    </p:inline>
  </p:input>
</p:identity>

<p:add-attribute match="/c:request" attribute-name="href">
  <p:with-option name="attribute-value" select="/doc/request/@uri">
    <p:pipe step="main" port="source"/>
  </p:with-option>
</p:add-attribute>

<p:add-attribute match="/c:request" attribute-name="username">
  <p:with-option name="attribute-value" select="$username"/>
</p:add-attribute>

<p:add-attribute match="/c:request" attribute-name="password">
  <p:with-option name="attribute-value" select="$password"/>
</p:add-attribute>

<p:insert position="first-child" match="/c:request">
  <p:input port="insertion" select="/doc/request">
    <p:pipe step="main" port="source"/>
  </p:input>
</p:insert>

<p:unwrap match="/c:request/request"/>

</p:pipeline>

There's nothing wrong with this pipeline, but it requires several steps to accomplish what the pipeline author probably considers a single operation. What's more, the result of these steps is not immediately obvious on casual inspection.

In order to make this simple construction case both literally and conceptually simpler, this note introduces two new XProc steps in the XProc namespace. Support for these steps is optional, but we strongly encourage implementors to provide them.

The new steps are p:in-scope-names and p:template. Taken together, they greatly simplify the pipeline:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:c="http://www.w3.org/ns/xproc-step"
            name="main" version="1.0">
<p:option name="username" required="true"/>
<p:option name="password" required="true"/>

<p:in-scope-names name="vars"/>

<p:template>
  <p:input port="template">
    <p:inline>
      <c:request method="POST" href="{/doc/request/@uri}"
                 username="{$username}" password="{$password}">
        { /doc/request/node() }
      </c:request>
    </p:inline>
  </p:input>
  <p:input port="source">
    <p:pipe step="main" port="source"/>
  </p:input>
  <p:input port="parameters">
    <p:pipe step="vars" port="result"/>
  </p:input>
</p:template>

</p:pipeline>

The p:in-scope-names step provides all of the in-scope options and variables in a c:param-set (this operation is exactly analogous to what the p:parameters step does, except that it operates on the options and variables instead of on parameters).

The p:template step searches for XPath expressions, delimited by curly braces, in a template document and replaces each with the result of evaluating the expression. All of the parameters passed to the p:template step are available as in-scope variable names when evaluating each XPath expression.

Where the expressions occur in attribute values, their string value is used. Where they appear in text content, their node values are used.

1.2.5 p:uuid

The p:uuid step generates a [UUID] and injects it into the source document.

The value of the match option must be an XSLTMatchPattern. The value of the version option must be an integer.

If the version is specified, that version of UUID must be computed. It is a dynamic error (err:XC0060) if the processor does not support the specified version of the UUID algorithm. If the version is not specified, the version of UUID computed is implementation-defined^XP.

Implementations must support version 4 UUIDs. Support for other versions of UUID, and the mechanism by which the necessary inputs are made available for computing other versions, is implementation-defined^XP.

The matched nodes are specified with the match pattern in the match option. For each matching node, the generated UUID is used in the output (if more than one node matches, the same UUID is used in each match). Nodes that do not match are copied without change.

If the expression given in the match option matches an attribute, the UUID is used as the new value of the attribute in the output. If the attribute is named “xml:base”, the base URI of the element must also be amended accordingly.

If the expression matches any other kind of node, the entire node (and not just its contents) is replaced by the UUID.
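
The attribute case can be sketched non-normatively with Python's uuid and xml.etree.ElementTree modules. The sketch supports only version 4, chooses version 4 when no version is given (that default is implementation-defined), and matches attributes by name rather than by an XSLTMatchPattern. The function names are hypothetical and ValueError stands in for err:XC0060.

```python
import uuid
import xml.etree.ElementTree as ET

def generate_uuid(version=None):
    """Generate a UUID string; only version 4 is supported here."""
    if version is None:
        version = 4  # assumption: this sketch's implementation-defined default
    if version != 4:
        raise ValueError("err:XC0060: unsupported UUID version %r" % version)
    return str(uuid.uuid4())

def inject_uuid(xml_text, attribute_name, version=None):
    """Set every attribute named attribute_name to one freshly generated UUID."""
    new_value = generate_uuid(version)  # generated once, reused for every match
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if attribute_name in element.attrib:
            element.set(attribute_name, new_value)
    return ET.tostring(root, encoding="unicode")
```

Because the UUID is generated before the document is traversed, every matching attribute receives the same value.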

1.2.6 p:validate-with-relax-ng

The p:validate-with-relax-ng step applies [RELAX NG] validation to the source document.

The values of the dtd-attribute-values and dtd-id-idref-warnings options must be booleans.

If the schema document has an XML media type, then it must be interpreted as a RELAX NG Grammar. If the media type has a “text” type, then it must be interpreted as a [RELAX NG Compact Syntax] document for validation.

If the dtd-attribute-values option is true, then the attribute value defaulting conventions of [RELAX NG DTD Compatibility] are also applied.

If the dtd-id-idref-warnings option is true, then the validator should treat a schema that is incompatible with the ID/IDREF/IDREFs feature of [RELAX NG DTD Compatibility] as if the document was invalid.

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid.

The output from this step is a copy of the input, possibly augmented by application of the [RELAX NG DTD Compatibility]. The output of this step may include PSVI annotations.

Support for [RELAX NG DTD Compatibility] is implementation defined^XP.

1.2.7 p:validate-with-schematron

The p:validate-with-schematron step applies [Schematron] processing to the source document.

It is a dynamic error (err:XC0054) if the assert-valid option is true and any Schematron assertions fail.

The value of the phase option identifies the Schematron validation phase with which validation begins.

The parameters option provides name/value pairs which correspond to Schematron external variables.

The result output from this step is a copy of the input.

Schematron assertions and reports, if any, must appear on the report port. The output should be in Schematron Validation Report Language (SVRL).

The output of this step may include PSVI annotations.

1.2.8 p:validate-with-xml-schema

The p:validate-with-xml-schema step applies [W3C XML Schema: Part 1] validity assessment to the source input.

The values of the use-location-hints, try-namespaces, and assert-valid options must be boolean.

The value of the mode option must be an NMTOKEN whose value is either “strict” or “lax”.

Validation is performed against the set of schemas represented by the documents on the schema port. These schemas must be used in preference to any schema locations provided by schema location hints encountered during schema validation, that is, schema locations supplied for xs:import or xsi:schemaLocation, or determined by schema-processor-defined namespace-based strategies, for the namespaces covered by the documents available on the schema port.

If xs:include elements occur within the supplied schema documents, they are treated like any other external documents^XP. It is implementation-defined^XP whether the documents supplied on the schema port are considered when resolving xs:include elements in the schema documents provided.

The use-location-hints and try-namespaces options allow the pipeline author to control how the schema processor should attempt to locate schema documents necessary but not provided on the schema port. Any schema documents provided on the schema port must be used in preference to schema documents located by other means.

If the use-location-hints option is “true”, the processor should make use of schema location hints to locate schema documents. If the option is “false”, the processor should ignore any such hints.

If the try-namespaces option is “true”, the processor should attempt to dereference the namespace URI to locate schema documents. If the option is “false”, the processor should not dereference namespace URIs.

The mode option allows the pipeline author to control how schema validation begins. The “strict” mode means that the document element must be declared and schema-valid, otherwise it will be treated as invalid. The “lax” mode means that the absence of a declaration for the document element does not itself count as an unsuccessful outcome of validation.

If the step specifies a version, then that version of XML Schema must be used to process the validation. It is a dynamic error (err:XC0038) if the specified version is not available. If the step does not specify a version, the implementation may use any version it has available and may use any means to determine what version to use, including, but not limited to, examining the version of the schema(s).

It is a dynamic error (err:XC0053) if the assert-valid option is true and the input document is not valid. If the assert-valid option is false, it is not an error for the document to be invalid. In this case, if the implementation does not support the PSVI, p:validate-with-xml-schema is essentially just an “identity” step, but if the implementation does support the PSVI, then the resulting document will have additional type information (at least for the subtrees that are valid).

When XML Schema validation assessment is performed, the processor is invoked in the mode specified by the mode option. It is a dynamic error (err:XC0055) if the implementation does not support the specified mode.

The result of the assessment is a document with the Post-Schema-Validation-Infoset (PSVI) ([W3C XML Schema: Part 1]) annotations, if the pipeline implementation supports such annotations. If not, the input document is reproduced with any defaulting of attributes and elements performed as specified by the XML Schema recommendation.

1.2.9 p:www-form-urldecode

The p:www-form-urldecode step decodes an x-www-form-urlencoded string into an XML representation.

The value option is interpreted as a string of parameter values encoded using the x-www-form-urlencoded algorithm. Each name/value pair is written in a c:param element. The entire set of parameters is written (as a c:param-set) on the result output port.

It is a dynamic error (err:XC0037) if the value provided is not a properly x-www-form-urlencoded value. It is a dynamic error (err:XC0061) if any encoded parameter name is not a valid xs:NCName. In other words, this step can only decode simple name/value pairs where the names do not contain colons or any characters that cannot be used in XML names.

The order of the c:param elements in the result is the same as the order of the encoded parameters, reading from left to right.

If any parameter name occurs more than once in the encoded string, the resulting parameter set will contain a c:param for each instance.
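
The decoding rules can be sketched non-normatively with Python's urllib.parse.parse_qsl, which preserves both the left-to-right order and repeated names. The function name www_form_urldecode is hypothetical, ValueError stands in for err:XC0037 and err:XC0061, and the NCName check is a deliberately simplified ASCII-only approximation.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

C_NS = "http://www.w3.org/ns/xproc-step"
# Assumption: ASCII-only approximation of xs:NCName, for illustration.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def www_form_urldecode(value):
    """Decode value into a c:param-set element with one c:param per pair."""
    try:
        pairs = parse_qsl(value, keep_blank_values=True, strict_parsing=True)
    except ValueError as exc:
        raise ValueError("err:XC0037: not x-www-form-urlencoded") from exc
    param_set = ET.Element("{%s}param-set" % C_NS)
    for name, param_value in pairs:
        if not NCNAME_ASCII.match(name):
            raise ValueError("err:XC0061: %r is not an NCName" % name)
        ET.SubElement(param_set, "{%s}param" % C_NS,
                      {"name": name, "value": param_value})
    return param_set
```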

1.2.10 p:www-form-urlencode

The p:www-form-urlencode step encodes a set of parameter values as an x-www-form-urlencoded string and injects it into the source document.

The value of the match option must be an XSLTMatchPattern.

The set of parameters is encoded as a single x-www-form-urlencoded string of name/value pairs. When parameters are encoded into name/value pairs, only the local name of each parameter is used. The namespace name is ignored and no prefix or colon appears in the name.

The order of the parameters is implementation-dependent^XP.

The matched nodes are specified with the match pattern in the match option. For each matching node, the encoded string is used in the output. Nodes that do not match are copied without change.

If the expression given in the match option matches an attribute, the encoded string is used as the new value of the attribute in the output. If the expression matches any other kind of node, the entire node (and not just its contents) is replaced by the encoded string.
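
The encoding of the parameter set can be sketched non-normatively with Python's urllib.parse.urlencode. For this sketch, parameter names are assumed to be given either as plain local names or in "{namespace}local" notation; that representation and the function name www_form_urlencode are assumptions, not part of this specification. The sketch keeps the order in which parameters are supplied, which is one permissible choice since the order is implementation-dependent.

```python
from urllib.parse import urlencode

def www_form_urlencode(parameters):
    """Encode (name, value) pairs, using only the local part of each name."""
    pairs = []
    for name, value in parameters:
        # Drop a leading "{namespace}" so that no namespace, prefix, or colon
        # appears in the encoded name.
        local_name = name.rsplit("}", 1)[-1]
        pairs.append((local_name, value))
    return urlencode(pairs)
```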

1.2.11 p:xquery

The p:xquery step applies an [XQuery 1.0] query to the sequence of documents provided on the source port.

</p:declare-step>

If a sequence of documents is provided on the source port, the first document is used as the initial context item. The whole sequence is also the default collection. If no documents are provided on the source port, the initial context item is undefined and the default collection is empty.

The query port must receive a single document:

If the document root element is c:query, the text descendants of this element are considered the query.
<c:query>
string
</c:query>
If the document root element is in the XQueryX namespace, the document is treated as an XQueryX-encoded query. Support for XQueryX is implementation-defined^XP.
If the query document has an XML media type, then the string value of the document must be treated as the query. If the media type has a “text” type, then it must be interpreted as the query.
Otherwise, the interpretation of the query is implementation-defined^XP.
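
The first two cases above can be sketched in Python as follows. This is a non-normative model: the `classify_query` name and its return values are hypothetical, the media type case is not modeled, and the XQueryX namespace URI used here is assumed to be `http://www.w3.org/2005/XQueryX`.

```python
import xml.etree.ElementTree as ET

C_QUERY = "{http://www.w3.org/ns/xproc-step}query"
XQUERYX_NS = "http://www.w3.org/2005/XQueryX"  # assumed XQueryX namespace


def classify_query(document_text):
    """Return a (kind, payload) pair for a document on the query port."""
    root = ET.fromstring(document_text)
    if root.tag == C_QUERY:
        # The text descendants of c:query, concatenated, form the query.
        return ("xquery", "".join(root.itertext()))
    if root.tag.startswith("{" + XQUERYX_NS + "}"):
        # XQueryX support is implementation-defined; pass the tree along.
        return ("xqueryx", root)
    # Media-type handling is omitted from this sketch.
    return ("implementation-defined", root)
```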

If the step specifies a version, then that version of XQuery must be used to process the query. It is a dynamic error (err:XC0038) if the specified version is not available. If the step does not specify a version, the implementation may use any version it has available and may use any means to determine what version to use, including, but not limited to, examining the version of the query.

The result of the p:xquery step must be a sequence of documents. It is a dynamic error (err:XC0057) if the sequence that results from evaluating the XQuery contains items other than documents and elements. Any elements that appear in the result sequence will be treated as documents with the element as their document element.

For example:


<c:query>
declare namespace atom="http://www.w3.org/2005/Atom";
/atom:feed/atom:entry
</c:query>
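
With a query like the one above, each atom:entry element in the result becomes its own document. The rule for result items can be modeled by the following non-normative Python sketch, in which `StepError` and `to_documents` are hypothetical names and ElementTree objects stand in for XDM documents and elements.

```python
import xml.etree.ElementTree as ET


class StepError(Exception):
    """Hypothetical exception carrying an XProc error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def to_documents(items):
    """Turn a query result sequence into a sequence of documents."""
    documents = []
    for item in items:
        if isinstance(item, ET.ElementTree):
            documents.append(item)  # already a document
        elif isinstance(item, ET.Element):
            # An element becomes the document element of a new document.
            documents.append(ET.ElementTree(item))
        else:
            raise StepError("err:XC0057",
                            f"unsupported result item: {item!r}")
    return documents
```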

The output of this step may include PSVI annotations.

The static context of the XQuery processor is augmented in the following way:

Statically known default collection type: document()*
Statically known namespaces: Unchanged from the implementation defaults. No namespace declarations in the XProc pipeline are automatically exposed in the static context.

The dynamic context of the XQuery processor is augmented in the following way:

Context item: The first document that appears on the source port.
Context position: 1
Context size: 1
Variable values: Any parameters passed in the parameters option augment any implementation-defined variable bindings known to the XQuery processor.
Function implementations: The function implementations provided by the XQuery processor.
Current dateTime: The point in time returned as the current dateTime is implementation-defined^XP.
Implicit timezone: The implicit timezone is implementation-defined^XP.
Available documents: The set of available documents (those that may be retrieved with a URI) is implementation-dependent^XP.
Available collections: The set of available collections is implementation-dependent^XP.
Default collection: The sequence of documents provided on the source port.

1.2.11.1 Example

The following pipeline applies XInclude processing and schema validation before using XQuery:

Example 1. A Sample Pipeline Document

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

<p:xinclude/>

<p:validate-with-xml-schema name="validate">
  <p:input port="schema">
    <p:document href="http://example.com/path/to/schema.xsd"/>
  </p:input>
</p:validate-with-xml-schema>

<p:xquery>
   <p:input port="query">
      <p:data href="countp.xq" />
   </p:input>
</p:xquery>

</p:pipeline>

Where countp.xq might contain:

<count>{count(.//p)}</count>

1.2.12 p:xsl-formatter

The p:xsl-formatter step receives an [XSL 1.1] document and renders the content. The result of rendering is stored to the URI provided via the href option. A reference to that result is produced on the output port.

</p:declare-step>

The value of the href option must be an anyURI. If it is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option^XP or p:xsl-formatter in the case of a syntactic shortcut^XP value).

The content-type of the output is controlled by the content-type option. This option specifies a media type as defined by [IANA Media Types]. The option may include media type parameters as well (e.g. "application/someformat; charset=UTF-8"). The use of media type parameters on the content-type option is implementation-defined^XP.

If the content-type option is not specified, the output type is implementation-defined^XP. The default should be PDF.

A formatter may take any number of optional rendering parameters via the step's parameters; such parameters are defined by the XSL implementation used and are implementation-defined^XP.

The output of this step is a document containing a single c:result element whose content is the absolute URI of the document stored by the step.
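
A non-normative Python sketch of the href resolution and of the shape of the result document follows. The `formatter_result` name and the example URIs are hypothetical, and the rendering itself is not modeled.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"  # namespace bound to the c: prefix


def formatter_result(base_uri, href):
    """Resolve href against base_uri and build the c:result document."""
    absolute = urljoin(base_uri, href)  # absolute hrefs pass through unchanged
    result = ET.Element(f"{{{C_NS}}}result")
    result.text = absolute
    return result


doc = formatter_result("http://example.com/pipelines/main.xpl",
                       "out/report.pdf")
```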

1.3 Serialization Options

Several steps in this step library require serialization options to control the serialization of XML. These options are used to control serialization as in the [Serialization] specification.

The following options may be present on steps that perform serialization:

byte-order-mark

The value of this option must be a boolean. If it's not specified, the default varies by encoding: for UTF-16 it's true, for all others, it's false.

cdata-section-elements

The value of this option must be a list of QNames. They are interpreted as element names.

doctype-public

The value of this option must be a string. The public identifier of the doctype.

doctype-system

The value of this option must be an anyURI. The system identifier of the doctype. It need not be absolute, and is not resolved.

encoding

A character set name. If no encoding is specified, the encoding used is implementation-defined^XP. If the method is “xml” or “xhtml”, the implementation-defined encoding must be either UTF-8 or UTF-16.

escape-uri-attributes

The value of this option must be a boolean. It is ignored unless the specified method is “xhtml” or “html”.

include-content-type

The value of this option must be a boolean. It is ignored unless the specified method is “xhtml” or “html”.

indent

The value of this option must be a boolean.

media-type

The value of this option must be a string. It specifies the media type (MIME content type). If not specified, the default varies according to the method:

xml: application/xml
html: text/html
xhtml: application/xhtml+xml
text: text/plain

For methods other than xml, html, xhtml, and text, the media-type is implementation-defined^XP.

method

The value of this option must be a QName. It specifies the serialization method.

normalization-form

The value of this option must be an NMTOKEN, one of the enumerated values NFC, NFD, NFKC, NFKD, fully-normalized, none or an implementation-defined value.

omit-xml-declaration

The value of this option must be a boolean.

standalone

The value of this option must be an NMTOKEN, one of the enumerated values true, false, or omit.

undeclare-prefixes

The value of this option must be a boolean.

version

The value of this option must be a string.

In order to be consistent with the rest of this specification, boolean values for the serialization parameters must use one of the XML Schema lexical forms for boolean: "true", "false", "1", or "0". This is different from the [Serialization] specification which uses “yes” and “no”. No change in semantics is implied by this different spelling.
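
The defaults and the boolean spelling described above can be summarized in a small, non-normative Python sketch. All names here are hypothetical, and the method is assumed to be one of the unprefixed standard names.

```python
# Default media types for the standard serialization methods.
DEFAULT_MEDIA_TYPES = {
    "xml": "application/xml",
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "text": "text/plain",
}

# XML Schema lexical forms for boolean.
_BOOLEANS = {"true": True, "1": True, "false": False, "0": False}


def default_media_type(method):
    """Return the default media type, or None if implementation-defined."""
    return DEFAULT_MEDIA_TYPES.get(method)


def default_byte_order_mark(encoding):
    """byte-order-mark defaults to true only for UTF-16."""
    return encoding.upper() == "UTF-16"


def to_serialization_keyword(lexical):
    """Map an XML Schema boolean to the yes/no spelling of [Serialization]."""
    if lexical not in _BOOLEANS:
        raise ValueError(f"not an XML Schema boolean: {lexical!r}")
    return "yes" if _BOOLEANS[lexical] else "no"
```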

The method option controls the serialization method used by this component with standard values of 'html', 'xml', 'xhtml', and 'text' but only the 'xml' value is required to be supported. The interpretation of the remaining options is as specified in [Serialization].

Implementations may support other method values but their results are implementation-defined^XP.

A minimally conforming implementation must support the xml output method with the following option values:

The version must support the value 1.0.
The encoding must support the value UTF-8.
The omit-xml-declaration must be supported. If the value is not specified or has the value false, an XML declaration must be produced.

All other option values may be ignored for the xml output method.
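
As a rough, non-normative model of this minimal profile, the following Python sketch serializes with the xml method, version 1.0, and UTF-8, and honors only omit-xml-declaration. The `serialize_minimal` name is hypothetical, and Python's ElementTree serializer is used purely as a stand-in for an XProc serializer.

```python
import xml.etree.ElementTree as ET


def serialize_minimal(element, omit_xml_declaration=False):
    """Serialize as XML 1.0 in UTF-8; only omit-xml-declaration is honored."""
    return ET.tostring(
        element,
        encoding="UTF-8",
        method="xml",
        # A declaration is produced unless the option is explicitly true.
        xml_declaration=not omit_xml_declaration,
    )


with_decl = serialize_minimal(ET.Element("doc"))
without_decl = serialize_minimal(ET.Element("doc"), omit_xml_declaration=True)
```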

If a processor chooses to implement an option for serialization, it must conform to the semantics defined in the [Serialization] specification.

Note

The use-character-maps parameter in the [Serialization] specification has not been provided in the standard serialization options provided by this specification.

2 Errors

Errors in a pipeline can be divided into two classes: static errors and dynamic errors.

2.1 Static Errors

[Definition: A static error is one which can be detected before pipeline evaluation is even attempted.] Examples of static errors include cycles and incorrect specification of inputs and outputs.

Static errors are fatal and must be detected before any steps are evaluated.

For a complete list of static errors, see Section 1, “Static Errors”^XP.

2.2 Dynamic Errors

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space).

If a step fails due to a dynamic error, failure propagates upwards until either a p:try^XP is encountered or the entire pipeline fails. In other words, outside of a p:try^XP, step failure causes the entire pipeline to fail.

For a complete list of dynamic errors, see Section 2, “Dynamic Errors”^XP.

2.3 Step Errors

Several of the steps in the standard and optional step libraries can generate dynamic errors.

For a complete list of the dynamic errors raised by built-in pipeline steps, see Appendix A, Step Errors.

A Step Errors

The following dynamic errors can be raised by steps in this specification:

Step Errors

err:XC0002

It is a dynamic error if the value starts with the string “--”.

See: Request Entity body conversion

err:XC0003

It is a dynamic error if a username or password is specified without specifying an auth-method, if the requested auth-method isn't supported, or the authentication challenge contains an authentication method that isn't supported.

See: Specifying a request

err:XC0004

It is a dynamic error if the status-only attribute has the value true and the detailed attribute does not have the value true.

See: Specifying a request

err:XC0005

It is a dynamic error if the request contains a c:body or c:multipart but the method does not allow for an entity body being sent with the request.

See: Specifying a request

err:XC0006

It is a dynamic error if the method is not specified on a c:request.

See: Specifying a request

err:XC0010

It is a dynamic error if an encoding of base64 is specified and the character set is not specified or if the specified character set is not supported by the implementation.

See: p:unescape-markup

err:XC0012

It is a dynamic error if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run.

See: p:directory-list

err:XC0013

It is a dynamic error if the pattern matches a processing instruction and the new name has a non-null namespace.

See: p:rename

err:XC0014

It is a dynamic error if the XML namespace (http://www.w3.org/XML/1998/namespace) or the XMLNS namespace (http://www.w3.org/2000/xmlns/) is the value of either the from option or the to option.

See: p:namespace-rename

err:XC0017

It is a dynamic error if the absolute path does not identify a directory.

See: p:directory-list

err:XC0019

It is a dynamic error if the documents are not equal, and the value of the fail-if-not-equal option is true.

See: p:compare

err:XC0020

It is a dynamic error if the user specifies a value or values that are inconsistent with each other or with the requirements of the step or protocol.

See: Specifying a request

err:XC0022

It is a dynamic error if the content of the c:body element does not consist of exactly one element, optionally preceded and/or followed by any number of processing instructions, comments or whitespace characters.

See: Request Entity body conversion

err:XC0023

It is a dynamic error if the match pattern does not match an element.

See: p:add-attribute, p:insert, p:label-elements, p:make-absolute-uris, p:rename, p:replace, p:set-attributes, p:unwrap, p:wrap

err:XC0025

It is a dynamic error if the match pattern matches anything other than an element node and the value of the position option is “first-child” or “last-child”.

See: p:insert

err:XC0026

It is a dynamic error if any XPath expression makes reference to the context node, size, or position when the context item is undefined.

See: p:template

err:XC0028

It is a dynamic error if the content of the c:body element does not consist entirely of characters.

See: Request Entity body conversion

err:XC0029

It is a dynamic error if an XInclude error occurs during processing.

See: p:xinclude

err:XC0030

It is a dynamic error if the override-content-type value cannot be used (e.g. text/plain to override image/png).

See: Managing the response

err:XC0033

It is a dynamic error if the command cannot be run.

See: p:exec

err:XC0034

It is a dynamic error if the current working directory cannot be changed to the value of the cwd option.

See: p:exec

err:XC0035

It is a dynamic error to specify both result-is-xml and wrap-result-lines.

See: p:exec

err:XC0036

It is a dynamic error if the requested hash algorithm is not one that the processor understands or if the value or parameters are not appropriate for that algorithm.

See: p:hash

err:XC0037

It is a dynamic error if the value provided is not a properly x-www-form-urlencoded value.

See: p:www-form-urldecode

err:XC0038

It is a dynamic error if the specified version is not available.

See: p:xslt, p:validate-with-xml-schema, p:xquery

err:XC0039

It is a dynamic error if a sequence of documents (including an empty sequence) is provided to an XSLT 1.0 step.

See: p:xslt

err:XC0040

It is a dynamic error if the document element of the document that arrives on the source port is not c:request.

See: p:http-request

err:XC0050

It is a dynamic error if the URI scheme is not supported or the step cannot store to the specified location.

See: p:store

err:XC0051

It is a dynamic error if the content-type specified is not supported by the implementation.

See: p:unescape-markup

err:XC0052

It is a dynamic error if the encoding specified is not supported by the implementation.

See: Request Entity body conversion, p:unescape-markup

err:XC0053

It is a dynamic error if the assert-valid option is true and the input document is not valid.

See: p:validate-with-relax-ng, p:validate-with-xml-schema

err:XC0054

It is a dynamic error if the assert-valid option is true and any Schematron assertions fail.

See: p:validate-with-schematron

err:XC0055

It is a dynamic error if the implementation does not support the specified mode.

See: p:validate-with-xml-schema

err:XC0056

It is a dynamic error if the specified initial mode or named template cannot be applied to the specified stylesheet.

See: p:xslt

err:XC0057

It is a dynamic error if the sequence that results from evaluating the XQuery contains items other than documents and elements.

See: p:xquery

err:XC0058

It is a dynamic error if the all and relative options are both true.

See: p:add-xml-base

err:XC0059

It is a dynamic error if the QName value in the attribute-name option uses the prefix “xmlns” or any other prefix that resolves to the namespace name http://www.w3.org/2000/xmlns/.

See: p:add-attribute

err:XC0060

It is a dynamic error if the processor does not support the specified version of the UUID algorithm.

See: p:uuid

err:XC0061

It is a dynamic error if the name of any encoded parameter name is not a valid xs:NCName.

See: p:www-form-urldecode

err:XC0062

It is a dynamic error if the match option matches a namespace node.

See: p:delete

err:XC0063

It is a dynamic error if the path-separator option is specified and is not exactly one character long.

See: p:exec

err:XC0064

It is a dynamic error if the exit code from the command is greater than the specified failure-threshold value.

See: p:exec

err:XC0066

It is a dynamic error if the arg-separator option is specified and is not exactly one character long.

See: p:exec

err:XC0067

It is a dynamic error to encounter a single closing curly brace “}” that is not immediately followed by another closing curly brace.

See: p:template

err:XC0068

It is a dynamic error if more than one document appears on the source port.

See: p:template

err:XC1001

It is a dynamic error if the properties map contains a key equal to the string “content-type”.

See: p:set-properties

err:XC1002

It is a dynamic error if the supplied content-type is not a valid media type of the form “type/subtype+ext”.

See: p:cast-content-type

err:XC1003

It is a dynamic error if the p:cast-content-type step cannot perform the requested cast.

See: p:cast-content-type

err:XC1004

It is a dynamic error if the c:data element contains content that is not a valid base64 string.

See: p:cast-content-type

err:XC1005

It is a dynamic error if the c:data element does not have a content-type attribute.

See: p:cast-content-type

err:XC1006

It is a dynamic error if the content-type is supplied and is not the same as the content-type specified on the c:data element.

See: p:cast-content-type

err:XC1007

In all cases except when the input document is a c:data element, it is a dynamic error if the content-type is not supplied.

See: p:cast-content-type

B References

1 Normative References

[XProc V2.0 Requirements] XProc V2.0 Requirements. Alex Milowski, James Fuller, and Norman Walsh editors. W3C Working Draft 5 November 2013.

[XProc 2.0] XProc 2.0: An XML Pipeline Language. Norman Walsh, Alex Milowski, and Henry Thompson, editors. W3C Working Draft 15 December 2014.

[XSLT 1.0] XSL Transformations (XSLT) Version 1.0. James Clark, editor. W3C Recommendation. 16 November 1999.

[XPath 2.0 Functions and Operators] XQuery 1.0 and XPath 2.0 Functions and Operators. Ashok Malhotra, Jim Melton, and Norman Walsh, editors. W3C Recommendation. 23 January 2007.

[XSLT 2.0] XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007.

[XSL 1.1] Extensible Stylesheet Language (XSL) Version 1.1. Anders Berglund, editor. W3C Recommendation. 5 December 2006.

[XQuery 1.0] XQuery 1.0: An XML Query Language. Scott Boag, Don Chamberlin, Mary Fernández, et al., editors. W3C Recommendation. 23 January 2007.

[RELAX NG] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2008(E) Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG 2008.

[RELAX NG Compact Syntax] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2003/Amd 1:2006 Document Schema Definition Languages (DSDL) — Part 2: Grammar-based validation — RELAX NG AMENDMENT 1 Compact Syntax 2006.

[RELAX NG DTD Compatibility] RELAX NG DTD Compatibility. OASIS Committee Specification. 3 December 2001.

[Schematron] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-3:2006(E) Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron 2006.

[W3C XML Schema: Part 1] XML Schema Part 1: Structures Second Edition. Henry S. Thompson, David Beech, Murray Maloney, et al., editors. World Wide Web Consortium, 28 October 2004.

[XInclude] XML Inclusions (XInclude) Version 1.0 (Second Edition). Jonathan Marsh, David Orchard, and Daniel Veillard, editors. W3C Recommendation. 15 November 2006.

[Serialization] XSLT 2.0 and XQuery 1.0 Serialization. Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation. 23 January 2007.

[MD5] RFC 1321: The MD5 Message-Digest Algorithm. R. Rivest. Network Working Group, IETF, April 1992.

[RFC 1521] RFC 1521: MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein, N. Freed, editors. Internet Engineering Task Force. September, 1993.

[RFC 2616] RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys, J. Mogul, et al., editors. Internet Engineering Task Force. June, 1999.

[RFC 2617] RFC 2617: HTTP Authentication: Basic and Digest Access Authentication. J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart. June, 1999.

[Unicode TR#17] Unicode Technical Report #17: Character Encoding Model. Ken Whistler, Mark Davis, and Asmus Freytag, authors. The Unicode Consortium. 11 November 2008.

[IANA Media Types] IANA MIME Media Types. Internet Engineering Task Force.

[HTML Tidy] HTML Tidy Library Project. SourceForge project.

[TagSoup] TagSoup - Just Keep On Truckin'. John Cowan.

[UUID] ITU X.667: Information technology - Open Systems Interconnection - Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components. 2004.

[SHA1] Federal Information Processing Standards Publication 180-1: Secure Hash Standard. 1995.

2 Informative References

[CRC32] “32-Bit Cyclic Redundancy Codes for Internet Applications”, The International Conference on Dependable Systems and Networks: 459. 10.1109/DSN.2002.1028931. P. Koopman. June 2002.
