mirror of
https://github.com/discourse/discourse.git
synced 2025-02-25 18:55:32 -06:00
FEATURE: update CommonMark to version 0.28
174
spec/fixtures/md/spec.txt
vendored
@@ -1,8 +1,8 @@
|
||||
---
|
||||
title: CommonMark Spec
|
||||
author: John MacFarlane
|
||||
version: 0.27
|
||||
date: '2016-11-18'
|
||||
version: 0.28
|
||||
date: '2017-08-01'
|
||||
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
|
||||
...
|
||||
|
||||
@@ -11,10 +11,12 @@ license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
|
||||
## What is Markdown?
|
||||
|
||||
Markdown is a plain text format for writing structured documents,
|
||||
based on conventions used for indicating formatting in email and
|
||||
usenet posts. It was developed in 2004 by John Gruber, who wrote
|
||||
the first Markdown-to-HTML converter in Perl, and it soon became
|
||||
ubiquitous. In the next decade, dozens of implementations were
|
||||
based on conventions for indicating formatting in email
|
||||
and usenet posts. It was developed by John Gruber (with
|
||||
help from Aaron Swartz) and released in 2004 in the form of a
|
||||
[syntax description](http://daringfireball.net/projects/markdown/syntax)
|
||||
and a Perl script (`Markdown.pl`) for converting Markdown to
|
||||
HTML. In the next decade, dozens of implementations were
|
||||
developed in many languages. Some extended the original
|
||||
Markdown syntax with conventions for footnotes, tables, and
|
||||
other document elements. Some allowed Markdown documents to be
|
||||
@@ -312,7 +314,7 @@ form feed (`U+000C`), or carriage return (`U+000D`).
|
||||
characters].
|
||||
|
||||
A [Unicode whitespace character](@) is
|
||||
any code point in the Unicode `Zs` class, or a tab (`U+0009`),
|
||||
any code point in the Unicode `Zs` general category, or a tab (`U+0009`),
|
||||
carriage return (`U+000D`), newline (`U+000A`), or form feed
|
||||
(`U+000C`).
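
The definition above can be read as a simple predicate. The following is a minimal sketch in Python, assuming the standard `unicodedata` module is an acceptable source for general categories; the function name `is_unicode_whitespace` is illustrative and does not come from the spec or from this commit.

```python
import unicodedata

# The four control characters the definition names explicitly:
# tab, carriage return, newline, form feed.
_EXTRA_WHITESPACE = frozenset({"\t", "\r", "\n", "\f"})


def is_unicode_whitespace(ch: str) -> bool:
    """Return True if the single character `ch` is Unicode whitespace
    in the sense of the definition above (hypothetical helper)."""
    return ch in _EXTRA_WHITESPACE or unicodedata.category(ch) == "Zs"
```

Note that a zero-width space (`U+200B`) has general category `Cf`, so under this reading it does not count as whitespace.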
|
||||
|
||||
@@ -331,7 +333,7 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
|
||||
|
||||
A [punctuation character](@) is an [ASCII
|
||||
punctuation character] or anything in
|
||||
the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
|
||||
the general Unicode categories `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
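
As a rough illustration of the punctuation definition, the sketch below combines the ASCII punctuation set with the seven general categories listed above. It assumes Python's `string.punctuation` (which contains the same 32 ASCII characters) and `unicodedata`; the name `is_punctuation` is hypothetical.

```python
import string
import unicodedata

# General categories named in the definition above.
_PUNCT_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_punctuation(ch: str) -> bool:
    """Return True if the single character `ch` is an ASCII punctuation
    character or falls in one of the listed Unicode categories."""
    return ch in string.punctuation or unicodedata.category(ch) in _PUNCT_CATEGORIES
```

One consequence worth noticing: `$` counts because it is ASCII punctuation, while a non-ASCII currency sign such as `€` (category `Sc`) does not.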
|
||||
|
||||
## Tabs
|
||||
|
||||
@@ -402,8 +404,8 @@ as indentation with four spaces would:
|
||||
Normally the `>` that begins a block quote may be followed
|
||||
optionally by a space, which is not considered part of the
|
||||
content. In the following case `>` is followed by a tab,
|
||||
which is treated as if it were expanded into spaces.
|
||||
Since one of theses spaces is considered part of the
|
||||
which is treated as if it were expanded into three spaces.
|
||||
Since one of these spaces is considered part of the
|
||||
delimiter, `foo` is considered to be indented six spaces
|
||||
inside the block quote context, so we get an indented
|
||||
code block starting with two spaces.
|
||||
@@ -481,7 +483,7 @@ We can think of a document as a sequence of
|
||||
quotations, lists, headings, rules, and code blocks. Some blocks (like
|
||||
block quotes and list items) contain other blocks; others (like
|
||||
headings and paragraphs) contain [inline](@) content---text,
|
||||
links, emphasized text, images, code, and so on.
|
||||
links, emphasized text, images, code spans, and so on.
|
||||
|
||||
## Precedence
|
||||
|
||||
@@ -525,7 +527,7 @@ Markdown document.
|
||||
|
||||
A line consisting of 0-3 spaces of indentation, followed by a sequence
|
||||
of three or more matching `-`, `_`, or `*` characters, each followed
|
||||
optionally by any number of spaces, forms a
|
||||
optionally by any number of spaces or tabs, forms a
|
||||
[thematic break](@).
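
The updated wording (spaces or tabs between the marker characters) can be approximated with a regular expression. This is a sketch only, assuming the line has already been split off without its line ending; `is_thematic_break` is an illustrative name, and interactions with other block rules (for example setext headings or list items) are not handled.

```python
import re

# 0-3 spaces of indentation, then three or more matching -, _ or *
# characters, each optionally followed by any number of spaces or tabs.
_THEMATIC_BREAK = re.compile(r" {0,3}([-_*])[ \t]*(?:\1[ \t]*){2,}")


def is_thematic_break(line: str) -> bool:
    """Check one line (without its trailing newline) against the rule above."""
    return _THEMATIC_BREAK.fullmatch(line) is not None
```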
|
||||
|
||||
```````````````````````````````` example
|
||||
@@ -1582,7 +1584,7 @@ begins with a code fence, indented no more than three spaces.
|
||||
|
||||
The line with the opening code fence may optionally contain some text
|
||||
following the code fence; this is trimmed of leading and trailing
|
||||
spaces and called the [info string](@).
|
||||
whitespace and called the [info string](@).
|
||||
The [info string] may not contain any backtick
|
||||
characters. (The reason for this restriction is that otherwise
|
||||
some inline code would be incorrectly interpreted as the
|
||||
@@ -1643,6 +1645,15 @@ With tildes:
|
||||
</code></pre>
|
||||
````````````````````````````````
|
||||
|
||||
Fewer than three backticks is not enough:
|
||||
|
||||
```````````````````````````````` example
|
||||
``
|
||||
foo
|
||||
``
|
||||
.
|
||||
<p><code>foo</code></p>
|
||||
````````````````````````````````
|
||||
|
||||
The closing code fence must use the same character as the opening
|
||||
fence:
|
||||
@@ -2031,6 +2042,37 @@ or [closing tag] (with any [tag name] other than `script`,
|
||||
or the end of the line.\
|
||||
**End condition:** line is followed by a [blank line].
|
||||
|
||||
HTML blocks continue until they are closed by their appropriate
|
||||
[end condition], or the last line of the document or other [container block].
|
||||
This means any HTML **within an HTML block** that might otherwise be recognised
|
||||
as a start condition will be ignored by the parser and passed through as-is,
|
||||
without changing the parser's state.
|
||||
|
||||
For instance, `<pre>` within a HTML block started by `<table>` will not affect
|
||||
the parser state; as the HTML block was started in by start condition 6, it
|
||||
will end at any blank line. This can be surprising:
|
||||
|
||||
```````````````````````````````` example
|
||||
<table><tr><td>
|
||||
<pre>
|
||||
**Hello**,
|
||||
|
||||
_world_.
|
||||
</pre>
|
||||
</td></tr></table>
|
||||
.
|
||||
<table><tr><td>
|
||||
<pre>
|
||||
**Hello**,
|
||||
<p><em>world</em>.
|
||||
</pre></p>
|
||||
</td></tr></table>
|
||||
````````````````````````````````
|
||||
|
||||
In this case, the HTML block is terminated by the newline — the `**Hello**`
|
||||
text remains verbatim — and regular parsing resumes, with a paragraph,
|
||||
emphasised `world` and inline and block HTML following.
|
||||
|
||||
All types of [HTML blocks] except type 7 may interrupt
|
||||
a paragraph. Blocks of type 7 may not interrupt a paragraph.
|
||||
(This restriction is intended to prevent unwanted interpretation
|
||||
@@ -3637,11 +3679,15 @@ The following rules define [list items]:
|
||||
If the list item is ordered, then it is also assigned a start
|
||||
number, based on the ordered list marker.
|
||||
|
||||
Exceptions: When the first list item in a [list] interrupts
|
||||
a paragraph---that is, when it starts on a line that would
|
||||
otherwise count as [paragraph continuation text]---then (a)
|
||||
the lines *Ls* must not begin with a blank line, and (b) if
|
||||
the list item is ordered, the start number must be 1.
|
||||
Exceptions:
|
||||
|
||||
1. When the first list item in a [list] interrupts
|
||||
a paragraph---that is, when it starts on a line that would
|
||||
otherwise count as [paragraph continuation text]---then (a)
|
||||
the lines *Ls* must not begin with a blank line, and (b) if
|
||||
the list item is ordered, the start number must be 1.
|
||||
2. If any line is a [thematic break][thematic breaks] then
|
||||
that line is not a list item.
|
||||
|
||||
For example, let *Ls* be the lines
|
||||
|
||||
@@ -5796,6 +5842,15 @@ we just have literal backticks:
|
||||
<p>`foo</p>
|
||||
````````````````````````````````
|
||||
|
||||
The following case also illustrates the need for opening and
|
||||
closing backtick strings to be equal in length:
|
||||
|
||||
```````````````````````````````` example
|
||||
`foo``bar``
|
||||
.
|
||||
<p>`foo<code>bar</code></p>
|
||||
````````````````````````````````
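
The equal-length requirement illustrated by the example above can be sketched as a scan over backtick strings. This is a simplified illustration, not a full inline parser: it ignores backslash escapes and does not strip or collapse whitespace in the span content. The function name `find_code_spans` is hypothetical.

```python
import re


def find_code_spans(text: str) -> list[str]:
    """Return the raw contents of code spans in `text`, pairing each
    opening backtick string with the next backtick string of equal length."""
    runs = [(m.start(), m.end()) for m in re.finditer(r"`+", text)]
    spans = []
    i = 0
    while i < len(runs):
        start, end = runs[i]
        for j in range(i + 1, len(runs)):
            close_start, close_end = runs[j]
            if close_end - close_start == end - start:
                spans.append(text[end:close_start])
                i = j  # resume scanning after the closing backtick string
                break
        # If no closer was found, the opening backticks stay literal.
        i += 1
    return spans
```

For the input in the example above, the single leading backtick finds no one-backtick closer and stays literal, while the two double-backtick strings enclose `bar`.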
|
||||
|
||||
|
||||
## Emphasis and strong emphasis
|
||||
|
||||
@@ -5845,19 +5900,20 @@ for efficient parsing strategies that do not backtrack.
|
||||
|
||||
First, some definitions. A [delimiter run](@) is either
|
||||
a sequence of one or more `*` characters that is not preceded or
|
||||
followed by a `*` character, or a sequence of one or more `_`
|
||||
characters that is not preceded or followed by a `_` character.
|
||||
followed by a non-backslash-escaped `*` character, or a sequence
|
||||
of one or more `_` characters that is not preceded or followed by
|
||||
a non-backslash-escaped `_` character.
|
||||
|
||||
A [left-flanking delimiter run](@) is
|
||||
a [delimiter run] that is (a) not followed by [Unicode whitespace],
|
||||
and (b) either not followed by a [punctuation character], or
|
||||
and (b) not followed by a [punctuation character], or
|
||||
preceded by [Unicode whitespace] or a [punctuation character].
|
||||
For purposes of this definition, the beginning and the end of
|
||||
the line count as Unicode whitespace.
|
||||
|
||||
A [right-flanking delimiter run](@) is
|
||||
a [delimiter run] that is (a) not preceded by [Unicode whitespace],
|
||||
and (b) either not preceded by a [punctuation character], or
|
||||
and (b) not preceded by a [punctuation character], or
|
||||
followed by [Unicode whitespace] or a [punctuation character].
|
||||
For purposes of this definition, the beginning and the end of
|
||||
the line count as Unicode whitespace.
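
The two flanking definitions, as reworded in this version, can be sketched as a single classification function. The code below is a minimal illustration under stated assumptions: the helper names are hypothetical, the delimiter run is given as a half-open index range `[start, end)` into one line of text, and a missing neighbour (beginning or end of the line) is treated as Unicode whitespace as the text above specifies.

```python
import string
import unicodedata
from typing import Optional

_PUNCT_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def _is_ws(ch: Optional[str]) -> bool:
    # None stands for the beginning or end of the line.
    return ch is None or ch in {"\t", "\r", "\n", "\f"} or unicodedata.category(ch) == "Zs"


def _is_punct(ch: Optional[str]) -> bool:
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch) in _PUNCT_CATEGORIES
    )


def flanking(line: str, start: int, end: int) -> tuple[bool, bool]:
    """Return (left_flanking, right_flanking) for the run line[start:end]."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    left = not _is_ws(after) and (
        not _is_punct(after) or _is_ws(before) or _is_punct(before)
    )
    right = not _is_ws(before) and (
        not _is_punct(before) or _is_ws(after) or _is_punct(after)
    )
    return left, right
```

For instance, the `*` in `foo*bar` is both left- and right-flanking, while the `*` in `a * b` is neither.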
|
||||
@@ -5936,7 +5992,7 @@ The following rules define emphasis and strong emphasis:
|
||||
7. A double `**` [can close strong emphasis](@)
|
||||
iff it is part of a [right-flanking delimiter run].
|
||||
|
||||
8. A double `__` [can close strong emphasis]
|
||||
8. A double `__` [can close strong emphasis] iff
|
||||
it is part of a [right-flanking delimiter run]
|
||||
and either (a) not part of a [left-flanking delimiter run]
|
||||
or (b) part of a [left-flanking delimiter run]
|
||||
@@ -5976,8 +6032,8 @@ the following principles resolve ambiguity:
|
||||
an interpretation `<strong>...</strong>` is always preferred to
|
||||
`<em><em>...</em></em>`.
|
||||
|
||||
14. An interpretation `<strong><em>...</em></strong>` is always
|
||||
preferred to `<em><strong>..</strong></em>`.
|
||||
14. An interpretation `<em><strong>...</strong></em>` is always
|
||||
preferred to `<strong><em>...</em></strong>`.
|
||||
|
||||
15. When two potential emphasis or strong emphasis spans overlap,
|
||||
so that the second begins before the first ends and ends after
|
||||
@@ -7000,14 +7056,14 @@ Rule 14:
|
||||
```````````````````````````````` example
|
||||
***foo***
|
||||
.
|
||||
<p><strong><em>foo</em></strong></p>
|
||||
<p><em><strong>foo</strong></em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
```````````````````````````````` example
|
||||
_____foo_____
|
||||
.
|
||||
<p><strong><strong><em>foo</em></strong></strong></p>
|
||||
<p><em><strong><strong>foo</strong></strong></em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
@@ -7148,8 +7204,9 @@ A [link destination](@) consists of either
|
||||
- a nonempty sequence of characters that does not include
|
||||
ASCII space or control characters, and includes parentheses
|
||||
only if (a) they are backslash-escaped or (b) they are part of
|
||||
a balanced pair of unescaped parentheses that is not itself
|
||||
inside a balanced pair of unescaped parentheses.
|
||||
a balanced pair of unescaped parentheses. (Implementations
|
||||
may impose limits on parentheses nesting to avoid performance
|
||||
issues, but at least three levels of nesting should be supported.)
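
The relaxed parentheses rule for the non-bracketed form can be sketched as a depth counter. This is an illustrative validator over a candidate string, not a description of how any particular parser scans input; `is_plain_link_destination` is a hypothetical name, and only backslash-escaped parentheses are treated specially here.

```python
def is_plain_link_destination(dest: str) -> bool:
    """Check a candidate destination (the form without <...>):
    nonempty, no ASCII space or control characters, and any unescaped
    parentheses must be balanced."""
    if not dest:
        return False
    depth = 0
    i = 0
    while i < len(dest):
        ch = dest[i]
        # A backslash-escaped parenthesis does not affect nesting depth.
        if ch == "\\" and i + 1 < len(dest) and dest[i + 1] in "()":
            i += 2
            continue
        if ch == " " or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0
```

An implementation that wants to cap nesting, as the parenthetical above permits, could additionally reject input once `depth` exceeds a chosen limit.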
|
||||
|
||||
A [link title](@) consists of either
|
||||
|
||||
@@ -7255,35 +7312,29 @@ Parentheses inside the link destination may be escaped:
|
||||
<p><a href="(foo)">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
One level of balanced parentheses is allowed without escaping:
|
||||
|
||||
```````````````````````````````` example
|
||||
[link]((foo)and(bar))
|
||||
.
|
||||
<p><a href="(foo)and(bar)">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
However, if you have parentheses within parentheses, you need to escape
|
||||
or use the `<...>` form:
|
||||
Any number of parentheses are allowed without escaping, as long as they are
|
||||
balanced:
|
||||
|
||||
```````````````````````````````` example
|
||||
[link](foo(and(bar)))
|
||||
.
|
||||
<p>[link](foo(and(bar)))</p>
|
||||
<p><a href="foo(and(bar))">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
However, if you have unbalanced parentheses, you need to escape or use the
|
||||
`<...>` form:
|
||||
|
||||
```````````````````````````````` example
|
||||
[link](foo\(and\(bar\))
|
||||
.
|
||||
<p><a href="foo(and(bar)">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
```````````````````````````````` example
|
||||
[link](foo(and\(bar\)))
|
||||
[link](<foo(and(bar)>)
|
||||
.
|
||||
<p><a href="foo(and(bar))">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
```````````````````````````````` example
|
||||
[link](<foo(and(bar))>)
|
||||
.
|
||||
<p><a href="foo(and(bar))">link</a></p>
|
||||
<p><a href="foo(and(bar)">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
@@ -7567,13 +7618,16 @@ that [matches] a [link reference definition] elsewhere in the document.
|
||||
A [link label](@) begins with a left bracket (`[`) and ends
|
||||
with the first right bracket (`]`) that is not backslash-escaped.
|
||||
Between these brackets there must be at least one [non-whitespace character].
|
||||
Unescaped square bracket characters are not allowed in
|
||||
[link labels]. A link label can have at most 999
|
||||
characters inside the square brackets.
|
||||
Unescaped square bracket characters are not allowed inside the
|
||||
opening and closing square brackets of [link labels]. A link
|
||||
label can have at most 999 characters inside the square
|
||||
brackets.
|
||||
|
||||
One label [matches](@)
|
||||
another just in case their normalized forms are equal. To normalize a
|
||||
label, perform the *Unicode case fold* and collapse consecutive internal
|
||||
label, strip off the opening and closing brackets,
|
||||
perform the *Unicode case fold*, strip leading and trailing
|
||||
[whitespace] and collapse consecutive internal
|
||||
[whitespace] to a single space. If there are multiple
|
||||
matching reference link definitions, the one that comes first in the
|
||||
document is used. (It is desirable in such cases to emit a warning.)
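
The normalization steps in the updated text can be sketched as follows. This assumes Python's `str.casefold` is an adequate stand-in for the Unicode case fold and that the label is passed in with its surrounding brackets; `normalize_label` and `labels_match` are illustrative names.

```python
import re

# Whitespace characters as the spec defines them: space, tab, newline,
# line tabulation, form feed, carriage return.
_WS = " \t\n\v\f\r"
_WS_RUN = re.compile(r"[ \t\n\v\f\r]+")


def normalize_label(label: str) -> str:
    """Strip the brackets, case-fold, trim, and collapse internal whitespace."""
    inner = label[1:-1]            # strip the opening and closing brackets
    folded = inner.casefold()      # Unicode case fold
    trimmed = folded.strip(_WS)    # strip leading and trailing whitespace
    return _WS_RUN.sub(" ", trimmed)


def labels_match(a: str, b: str) -> bool:
    """Two labels match when their normalized forms are equal."""
    return normalize_label(a) == normalize_label(b)
```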
|
||||
@@ -8326,11 +8380,11 @@ The link labels are case-insensitive:
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
If you just want bracketed text, you can backslash-escape the
|
||||
opening `!` and `[`:
|
||||
If you just want a literal `!` followed by bracketed text, you can
|
||||
backslash-escape the opening `[`:
|
||||
|
||||
```````````````````````````````` example
|
||||
\!\[foo]
|
||||
!\[foo]
|
||||
|
||||
[foo]: /url "title"
|
||||
.
|
||||
@@ -8570,7 +8624,7 @@ a [single-quoted attribute value], or a [double-quoted attribute value].
|
||||
|
||||
An [unquoted attribute value](@)
|
||||
is a nonempty string of characters not
|
||||
including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``.
|
||||
including [whitespace], `"`, `'`, `=`, `<`, `>`, or `` ` ``.
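
The change from "spaces" to "[whitespace]" means a line ending can no longer appear inside an unquoted attribute value. A minimal regular-expression sketch of the updated definition follows; the name `is_unquoted_attribute_value` is hypothetical.

```python
import re

# A nonempty string with no whitespace, ", ', =, <, > or backtick.
_UNQUOTED_VALUE = re.compile(r"""[^ \t\n\v\f\r"'=<>`]+""")


def is_unquoted_attribute_value(value: str) -> bool:
    """Check a candidate string against the definition above."""
    return _UNQUOTED_VALUE.fullmatch(value) is not None
```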
|
||||
|
||||
A [single-quoted attribute value](@)
|
||||
consists of `'`, zero or more
|
||||
@@ -8691,9 +8745,13 @@ Illegal [whitespace]:
|
||||
```````````````````````````````` example
|
||||
< a><
|
||||
foo><bar/ >
|
||||
<foo bar=baz
|
||||
bim!bop />
|
||||
.
|
||||
<p>< a><
|
||||
foo><bar/ ></p>
|
||||
foo><bar/ >
|
||||
<foo bar=baz
|
||||
bim!bop /></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
|
||||