Because I’m crazy, that’s why.
Say something profound
Or say something reflective
XML in a non-XML world?
Non-XML in an XML world?
Hence the title
Why XML?
Why not XML?
It’s all XML, some of it is just weirdly serialized
And what if you could do more?
With fun demos!
I’ve trained myself to think in terms of structure
Structure is absolutely essential for reuse
Using the same content in different documents
Using the same content in different contexts
Everyone wants reuse
It’s too complex
It’s too verbose
It’s too draconian
It’s hard to produce
It’s hard to consume
It doesn’t map nicely to programming language data structures
It’s complex enough to capture rich document structures (mixed content)
It allows (requires) the author to speak with precision
It’s always better to be correct
It’s hard to produce/consume with non-XML tools
Patient: “Doctor, doctor, it hurts when I scratch my elbow.”
Doctor: “Then stop scratching your elbow.”
Tens of very big projects. Very big budgets. Very sensitive to error.
Hundreds of big projects. Big budgets. Sensitive to error.
Thousands of smaller projects. Small budgets. Less sensitive to error.
Millions of small projects. No budgets. Mostly insensitive to error.
This is what winning looks like, by the way.
If all you need are:
Atomic values
Lists of atomic values
Key/value pairs (associative arrays, hashes, “objects”)
No elements
No attributes
No namespaces
No mixed content
A paucity of data types
“Just” objects, arrays, and scalar values
It maps directly to programming language concepts that are provided “natively” by most programming languages and well understood by most programmers.
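A minimal sketch of that mapping, assuming Python and its standard `json` module (both my choice for illustration; the sample document is invented). Parsing yields nothing but dictionaries, lists, and scalars:

```python
import json

# Invented sample document: scalars, a list, and a null.
text = (
    '{"name": "Links Bike Shop", "stock": 10, "open": true, '
    '"tags": ["parts", "chains"], "owner": null}'
)

data = json.loads(text)

# Every JSON construct lands on a built-in Python type.
kinds = {key: type(value).__name__ for key, value in data.items()}
print(kinds)
```

No binding layer, no schema, no object model to learn: that is the whole appeal.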
{
"title": "Example Schema",
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"age": {
"description": "Age in years",
"type": "integer",
"minimum": 0
}
},
"required": ["firstName", "lastName"]
}
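To show what a schema like this asks of a consumer, here is a hand-rolled sketch in Python. It is not a JSON Schema implementation: the `check` helper is hypothetical and understands only the `type`, `required`, and `minimum` keywords used above, and the sample instances are invented.

```python
import json

schema = json.loads("""
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {"description": "Age in years", "type": "integer", "minimum": 0}
  },
  "required": ["firstName", "lastName"]
}
""")

# Hypothetical helper: covers only the keywords used in the schema above.
TYPES = {"string": str, "integer": int, "object": dict}


def check(instance, schema):
    """Return a list of problems; an empty list means the instance passed."""
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    problems = []
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPES[rule["type"]]):
            problems.append(f"{name}: expected {rule['type']}")
        elif "minimum" in rule and value < rule["minimum"]:
            problems.append(f"{name}: below minimum {rule['minimum']}")
    return problems


good = check({"firstName": "Jane", "lastName": "Doe", "age": 21}, schema)
bad = check({"firstName": "Jane", "age": -1}, schema)
print(good)
print(bad)
```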
{
"@id": "http://store.example.com/",
"@type": "Store",
"name": "Links Bike Shop",
"description": "The most \"linked\" bike store on earth!",
"product": [
{
"@id": "p:links-swift-chain",
"@type": "Product",
"name": "Links Swift Chain",
"description": "A fine chain with many links.",
"category": ["cat:parts", "cat:chains"],
"price": "10.00",
"stock": 10
}
],
"@context": {
"Store": "http://ns.example.com/store#Store",
"Product": "http://ns.example.com/store#Product",
"product": "http://ns.example.com/store#product",
"category":
{
"@id": "http://ns.example.com/store#category",
"@type": "@id"
},
"price": "http://ns.example.com/store#price",
"stock": "http://ns.example.com/store#stock",
"name": "http://purl.org/dc/terms/title",
"description": "http://purl.org/dc/terms/description",
"p": "http://store.example.com/products/",
"cat": "http://store.example.com/category/"
}
}
But I digress.
Simplicity is rarely an absolute
Making some things simpler almost always makes other things more complicated
Complexity creeps up on you
AWS CloudFormation templates
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "MarkLogic Sample Template:: Build Date: NDW HVM 8.0.1",
"Parameters": {
"AdminUser": {
"Description": "The MarkLogic Administrator Username",
"Type": "String"
},
It all looks fairly reasonable, until…
"UserData": {"Fn::Base64": {"Fn::Join": [
"",
[
"#!/bin/bash\n",
"function error_exit\n",
"{\n",
" logger -t MarkLogic \"$1\"",
Funny key names with implied semantics
No multi-line strings
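A short Python sketch of why the template ends up looking like that. The script text is the fragment from the slides; generating the `Fn::Join` array with `splitlines` is my assumption about one way to do it, not a description of how the original template was built.

```python
import json

# The shell fragment from the slides, written as ordinary multi-line text.
script = '''#!/bin/bash
function error_exit
{
  logger -t MarkLogic "$1"
  exit 1
}
'''

# JSON has no multi-line strings, so each line becomes its own array
# entry, and Fn::Join glues the entries back together with "".
user_data = {"Fn::Base64": {"Fn::Join": ["", script.splitlines(keepends=True)]}}

print(json.dumps(user_data, indent=2))
```

Every newline becomes `\n` and every double quote becomes `\"`, which is exactly the noise visible in the template.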
<object xmlns="http://marklogic.com/xdmp/json/basic"
xmlns:AWS="http://amazon.com/aws"
xmlns:MarkLogic="http://marklogic.com/ns">
<AWSTemplateFormatVersion>2010-09-09</AWSTemplateFormatVersion>
<Description>MarkLogic Sample Template:: Build Date: NDW HVM 8.0.1</Description>
<Parameters>
<AdminUser>
<Description>The MarkLogic Administrator Username</Description>
<Type>String</Type>
</AdminUser>
I totally made up this XML vocabulary
<UserData encoding="base64">
#!/bin/bash
function error_exit
{
logger -t MarkLogic "$1"
exit 1
}
[Error: Parse error on line 342:
...} } }}
-------------------^
Expecting '}', ',', got 'EOF']
Very, uhm, “helpful”
cftemp.xml:203: parser error :
Opening and ending tag mismatch: UserData line 173 and Properties
</Properties>
Missing end tag precisely identified
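The same contrast can be reproduced with Python's standard parsers. This is a sketch with invented fragments, each missing one closing delimiter; the messages come from Python's `json` module and its expat-based `xml.etree.ElementTree`, not from the tools that produced the output above, so the wording differs.

```python
import json
import xml.etree.ElementTree as ET

# Invented fragments: each is missing exactly one closing delimiter.
broken_json = '{"Resources": {"Properties": {"UserData": "x"}}'
broken_xml = "<Resources>\n<Properties>\n<UserData>x\n</Properties>\n</Resources>"

try:
    json.loads(broken_json)
except json.JSONDecodeError as err:
    json_error = err
    print("JSON:", err)

try:
    ET.fromstring(broken_xml)
except ET.ParseError as err:
    xml_error = err
    print("XML: ", err)
```

The JSON parser can only say that something is missing by the time input runs out; the XML parser stops at the end tag that does not match, because named end tags carry redundant information.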
XML is too verbose
XML is too hard to type
Why do I have to quote attribute values?
Why do I have to have matching end tags?
Simpler input formats are all the rage
What’s your favorite Markdown flavor this week?
This is not a new concept
I wrote a JSON parser…
in SGML
First of all: YOU ARE NUTS! Creative, ingenious, and NUTS!
— B. Tommie Usdin
By the by, did I bother to tell you that this is ingenious. And yes, you can do it. Aaaaaaqaaaaaaaaargh! You are one very sick puppy
— Debbie Lapeyre
You must have a schema (a DTD)
You are allowed to omit (some) start and end tags
Structures can span across entity boundaries
You can write state machines with SHORTREF
Parsing JSON with SGML
<!DOCTYPE doc SYSTEM "json.dtd">
<doc>
{
"object": {
"key": "value",
"key2": "value2",
"array": [
"a",
"b",
"c"
]
}
}
</doc>
<!ENTITY object-open "<object>">
<!ENTITY object-close "</object>">
<!ENTITY key-open "<pair><key>">
<!ENTITY key-close "</key>">
<!ENTITY value-open "<value>">
<!ENTITY value-close "</value></pair>">
<!ENTITY array-open "<array>">
<!ENTITY array-close "</array>">
<!ENTITY entry-open "<entry>">
<!ENTITY entry-close "</entry>">
<!ENTITY string-open "<string>">
<!ENTITY string-close "</string>">
<!SHORTREF start-map '{' object-open
'[' array-open
'"' string-open>
<!SHORTREF object-map '"' key-open
'}' object-close
':' value-open>
<!SHORTREF value-map '"' string-open
',' value-close
'{' object-open
'[' array-open
']' array-close
'}' object-close>
<!SHORTREF array-map '"' string-open
'{' object-open
',' entry-open
'[' array-open
']' array-close
'}' object-close>
<!SHORTREF key-map '"' key-close>
<!SHORTREF string-map '"' string-close>
<!USEMAP start-map doc>
<!USEMAP object-map object>
<!USEMAP key-map key>
<!USEMAP value-map value>
<!USEMAP string-map string>
<!USEMAP array-map array>
<!ELEMENT doc - - (object|array)+>
<!ELEMENT pair O O (key,value)>
<!ELEMENT object - - (pair)+>
<!ELEMENT key - - (#PCDATA)*>
<!ELEMENT value - O (object|array|string)*>
<!ELEMENT string - - (#PCDATA)*>
<!ELEMENT array - - (entry)+>
<!ELEMENT entry O O (object|string)>
<!DOCTYPE doc SYSTEM "json.dtd">
<doc>
<object>
<pair>
<key>object</key>
<value>
<object>
<pair>
<key>key</key>
<value>
<string>value</string>
</value>
</pair>
<pair>
<key>key2</key>
<value>
<string>value2</string>
</value>
</pair>
<pair>
<key>array</key>
<value>
<array>
<entry>
<string>a</string>
</entry>
<entry>
<string>b</string>
</entry>
<entry>
<string>c</string>
</entry>
</array>
</value>
</pair>
</object>
</value>
</pair>
</object>
</doc>
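For comparison, the same tree can be built without SGML. The sketch below is not the SHORTREF technique: it parses the JSON with Python's `json` module and emits the object/pair/key/value/array/entry/string vocabulary with `xml.etree.ElementTree`. The `to_element` helper is hypothetical, and it handles only the value types the DTD above declares.

```python
import json
import xml.etree.ElementTree as ET


def to_element(value):
    """Convert a parsed JSON value to the vocabulary declared in the DTD."""
    if isinstance(value, dict):
        obj = ET.Element("object")
        for key, member in value.items():
            pair = ET.SubElement(obj, "pair")
            ET.SubElement(pair, "key").text = key
            ET.SubElement(pair, "value").append(to_element(member))
        return obj
    if isinstance(value, list):
        arr = ET.Element("array")
        for item in value:
            ET.SubElement(arr, "entry").append(to_element(item))
        return arr
    if isinstance(value, str):
        string = ET.Element("string")
        string.text = value
        return string
    # The DTD declares only objects, arrays, and strings.
    raise TypeError(f"unsupported JSON value: {value!r}")


source = '{"object": {"key": "value", "key2": "value2", "array": ["a", "b", "c"]}}'
doc = ET.Element("doc")
doc.append(to_element(json.loads(source)))
print(ET.tostring(doc, encoding="unicode"))
```

The difference is that here a program does the work; in the SGML version the parser does it, driven entirely by declarations.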
Some folks say you can’t write in XML:
Generally speaking, [NSFW –ed]
I assert that structure aids my writing
Markdown
CommonMark
GitHub
Flavor of the week
AsciiDoc
Asciidoctor
Org Mode
Simpler is better
HTML5 isn’t helping authors
HTML5
JavaScript
jQuery
Angular
CSS
Sass
Analytics
It’s frameworks all the way down
Write in DocBook, or some customization
Convert to HTML with CSS and JavaScript for presentation
Convert to PDF for print
Refactor, reuse, repurpose, restyle with ease
Was written entirely in AsciiDoc
Formatted by Asciidoctor into DocBook 5
Transformed by XSLT into HTML5
Displayed with Reveal.JS
For simple content, “markdown” is easier
But for complex content…
= Main Title
Norman Walsh <ndw@nwalsh.com>
:docinfo:
:imagesdir: img
== Section Title
image::world.png[World map,link=http://example.com/]
[#frag5]
* Simplicity is rarely an absolute
****
Speaker notes
****
[quote, B. Tommie Usdin]
First of all: YOU ARE NUTS! Creative, ingenious, and NUTS!
It’s really a step back to the pre-markup days
Is it easier to let a thousand flowers bloom these days?
Who recognizes these dot commands?
Description | Command
Bidirectional print on/off |
Microjustify on/off |
Page offset |
Comment (not printed) |
And this markup?
@Heading(The Beginning)
@Begin(Quotation)
Let's start at the very beginning, a very good place to start
@End(Quotation)
JSON is structured markup
We have good tools for leveraging document structure
What if we could use them on JSON?
{
"a": {
"b" : "v1",
"c1" : 1,
"d" : null,
"g" : ["s1", "s2", "s3"]
}
}
/a/b
= "v1"
/a/g
= ("s1", "s2", "s3")
/a/g[2]
= "s2"
❔
{
"a": {
"b" : "v1",
"c1" : 1,
"d" : null,
"g" : ["s1", "s2", "s3"]
}
}
/a/d
= null
❔
/a/g[1]/node-name()
= g
❔
/a/g[2]/node-name()
= g
❔
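One possible reading of those paths, sketched in Python over plain dictionaries and lists. The `select` helper and the tiny path syntax it accepts are hypothetical, not any real XPath engine; the choices to flatten arrays into the result sequence and to count positions from 1 are assumptions taken from the answers shown above.

```python
import json

doc = json.loads('{"a": {"b": "v1", "c1": 1, "d": null, "g": ["s1", "s2", "s3"]}}')


def select(value, path):
    """Evaluate a tiny path such as "/a/g[2]" and return a list of matches."""
    items = [value]
    for step in path.strip("/").split("/"):
        name, _, index = step.partition("[")
        position = int(index.rstrip("]")) if index else None
        found = []
        for item in items:
            if isinstance(item, dict) and name in item:
                child = item[name]
                # Arrays flatten into the sequence, one member per item.
                members = child if isinstance(child, list) else [child]
                if position is not None:
                    members = members[position - 1:position]  # 1-based
                found.extend(members)
        items = found
    return items


print(select(doc, "/a/b"))
print(select(doc, "/a/g"))
print(select(doc, "/a/g[2]"))
print(select(doc, "/a/d"))
print(select(doc, "/a/missing"))
```

Note what the sketch cannot tell you: which property a selected member came from. Once `g` is flattened, "s2" is just a string, which is why the `node-name()` questions are worth asking.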
JSON in MarkLogic Server
var json = xdmp.documentGet("/projects/presentations/2015/11-xmlams/cf/cftemp.json").next().value.toObject()
json["Resources"]["InstanceSecurityGroup"]["Type"]
Returns “AWS::EC2::SecurityGroup”
xquery version "1.0-ml";
let $json := xdmp:document-get("/projects/presentations/2015/11-xmlams/cf/cftemp.json")
return
$json//InstanceSecurityGroup/Type
Returns “AWS::EC2::SecurityGroup”
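Outside MarkLogic, the descendant step in that query has to be written by hand. A rough Python sketch, assuming an invented stand-in for the template (only the shape matters) and a hypothetical `descendants` helper that plays the role of `//name`:

```python
import json

# Invented stand-in for the template file; only the shape matters here.
template = json.loads("""
{
  "Resources": {
    "InstanceSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup"
    }
  }
}
""")


def descendants(value, name):
    """Yield the value of every property called `name`, at any depth."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == name:
                yield child
            yield from descendants(child, name)
    elif isinstance(value, list):
        for child in value:
            yield from descendants(child, name)


types = [group["Type"] for group in descendants(template, "InstanceSecurityGroup")]
print(types)
```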
With apologies to Taylor Swift
'Cause the parsers gonna parse, parse, parse,
And the haters gonna hate, hate, hate,
Baby, I’m just gonna shake, shake, shake, shake, shake
I shake it off, I shake it off