WikiMarkupStandard

From Jimmy's Wiki
Revision as of 09:09, 14 October 2021 (Thursday) by Jimmyho

This page discusses ways to allow visitors from one wiki engine to edit pages on other wikis without having to learn their WikiSyntax. There are quite a lot of WikiEngines out there by now. Most have similar, but not identical, markup languages.

This page tries to determine a potential convergence target for text markup rules, which we call the "BasicSet", that most wiki engine authors would be willing to support.

During WikiSym 06 there was a workshop at which WikiCreole was agreed upon.

CategoryWikiStandard (proposal)

# Creating the WikiMarkupStandard

# Suggestion A

Goal

  • Trying to make it easier to contribute to several wikis, and trying to facilitate the exchange of content between wikis.
  • Not trying to be compatible and not trying to do old and new markup simultaneously (that would be hopeless and bloat).
  • The proposal therefore defines the basic stuff and tries to be complete in the things it defines. If you need other things, you should be able to extend it, but it is not intended that you have to extend the basic stuff or use old and new markup in parallel (at least not long-term).

Process

  1. Collect ideas and existing markup.
  2. Find the optimum markup (free of internal conflicts).
  3. Define a convergence standard that could become an official standard.
  4. Implement parsers and converters for converting existing markup to the target markup (the plan is to migrate to the new markup, not to support it in addition; see bloat above).

Problems

  • How to handle wikis with site-specific extensions.
  • How to deal with multilingual wikis.


# Suggestion B

Goal

  • Try to (only) make it easier for newbies to contribute to several wikis.
  • Try to stay compatible by implementing the proposed markup as a small subset of, or in parallel to, the old markup already in use.
  • Exchange of content between wikis won't be generally possible if old markup is used.

Process

  1. Collect existing markup.
  2. Find useful common subset with few exceptions.
  3. Ask wiki engine authors to implement the remaining exceptions.

Problems

  • Markup styles mix uncontrollably, and content is not exchangeable.

# General Guidelines for a Future Markup Standard

  • Do not use HTML-like tags. Wiki does not try to be HTML, so it shouldn't look like it, either. See also: Wiki:WhyDoesntWikiDoHtml
  • Use easy-to-type characters. Minimize the number of key presses required for common things.
    • (Should international considerations factor into this? For instance, square brackets occur on fewer keyboard layouts than parentheses.)
  • Try to think about what syntax would be best for the average user. Favor syntax that's obvious yet hard to parse over syntax that's obscure yet easy to parse.
  • Assign meaning to as few characters as possible, especially the most common punctuation.
  • Use pieces of existing, ubiquitous standards such as mailto: to denote an email address.

# Collecting Existing Markup

# Links

# WikiWord Links

  • WikiWord
  • ~WikiWord
    • Argument against: cannot be used on internationalised wikis (most non-Western writing systems do not have capital letters), problematic with agglutinating languages.
    • Argument against: some keyboard layouts don't have a "~" key. For example, with a Spanish keyboard on Windows you must press ALT+126 to get the "~" character. Not very user-friendly.
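To make the internationalisation objection concrete, here is a minimal Python sketch of one common WikiWord rule: two or more runs of an ASCII capital followed by lowercase letters or digits. The pattern and the name `find_wikiwords` are assumptions for illustration; engines differ in their exact rules, and a script without capital letters can never match.

```python
import re
from typing import List

# Assumed CamelCase rule: two or more runs of [A-Z][a-z0-9]+, e.g. "WikiWord".
# ASCII only, which is exactly the limitation discussed above.
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def find_wikiwords(text: str) -> List[str]:
    """Return every CamelCase word found in `text`, in order."""
    return WIKIWORD.findall(text)


print(find_wikiwords("See WikiWord and FrontPage/SubPage, not Wiki or URLs."))
# ['WikiWord', 'FrontPage', 'SubPage']
```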

# Nonwiki Links

External link
http:// or ftp:// or gopher:// etc.
Email link
name@domain.org or mailto:name@domain.org
File attachment of a page
attachment:filename.doc
File from user upload area
Upload:UserName/filename.doc

# Free Links

Inline
  • _free_link_
  • free_link
  • %free link%
  • |free link|
  • `free link` (a backtick, usually found under the tilde (~) key; not a single quote ('))
    • Argument against: backticks are hard to read in many fonts and can be mangled by typesetting software.
Block

What is the difference between an "inline free link" and a "block free link"? Are there *any* wikis that implement a "block free link"?

free links with square brackets
    • Argument against: square brackets are often used inside quotations.
      • Quotations should be quoted anyway.
    • Argument against: square brackets can be difficult to type on many keyboard layouts.
  • [free link]
  • _[free link]
  • [[free link]]
  • ["free link"]
free links without square brackets
  • {{free link}}
  • ((free link))
  • _(free link)
    • Argument for: parentheses are on almost every keyboard.
  • <free link>
    • Argument against: easy to confuse with SGML/XML.
    • Argument for: often used in e-mails and news posts
Discussion

Link thoughts: we may have to require that a wiki supporting WikiMarkupStandard support free links, or we will have migration problems. CamelCase links could be emulated by free links, so we may want them but need not require them. (Alternatively, use WikiNameCanonicalization.)

Using spaces as word separators is very important for usability. One successful wiki stores underscores in page names while displaying spaces.
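A minimal Python sketch of that store-with-underscores, display-with-spaces convention. The function names and the rule of collapsing each whitespace run into one underscore are assumptions; real engines typically apply further canonicalization (capitalization, forbidden characters) that is not shown here.

```python
def storage_name(title: str) -> str:
    """Page name as stored: each run of whitespace becomes one underscore."""
    return "_".join(title.split())


def display_name(stored: str) -> str:
    """Page name as shown to readers: underscores become spaces again."""
    return stored.replace("_", " ")


# Both spellings of the free link refer to the same stored page.
assert storage_name("free link") == storage_name(" free   link ") == "free_link"
print(display_name("free_link"))  # free link
```

With this scheme, authors can type natural spaces in `[[free link]]` while URLs and file names stay free of spaces.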

There are only two theoretical types of free links: external framed (LEFT_DELIMITER free text RIGHT_DELIMITER) and internally linked (HANGING_DELIMITER? free INTERNAL_LINKAGE text HANGING_DELIMITER?). External frames have the advantage of making phrases easy to link; internal links have the advantage of putting architectural pressure to make smaller link titles.


# Titled Links

Inline
  • |title | link|
    • Argument against: the order in which the title and URI go is not obvious.
    • Counterpoint: I don't think I get it. Is the opposite order more obvious, or is the order in any of the items below more obvious? (See further remarks in Discussion, below.)
Block
  • [link title]
  • [title | link]
  • [link | title]
    • Argument against: square brackets are often used inside quotations.
      • Quotations should be quoted anyway.
  • [:link: text]
  • [[link][title]]
  • ((link | title))
  • [[link | title]]
    • Argument against: the order in which the title and URI go is not obvious.
  • "title":link
  • [title > link]
  • [title >> link]
  • [title -> link]
    • Argument for: the order in which the title and URI go is relatively obvious.
Discussion

May I cast a vote (put forth an argument) supporting a link format with the title first? My only reason is that, in an ordinary text editor, I can then sort a list of titled (or untitled) links alphabetically with minimal jumping through hoops. I would point out that, if you want links ordered alphabetically, you want that done on the basis of the text that a reader sees, i.e., the title, not the actual link URL. -- RandyKramer

# Links to Anchors

  • [link#anchor]
  • [#anchorname anchor-reference]
  • #anchor[link]

# Interwiki Links

Inline
  • TargetWiki:TargetPage
  • TargetWiki.TargetPage
  • TargetWiki@TargetPage
Block
  • [[TargetWiki>TargetPage]]

# Avoiding Linking

See also the section Literal/Unprocessed Text below.

Method 1: Enclosing Markup
Inline
  • ""NoWikiLink""
Block
  • ]]NoWikiLink[[
  • ))NoWikiLink((
    • Argument against: using inverted brackets could lead to ambiguity for reading and parsing (for instance: [[link]]WikiWord[[link]].)
  • [[NoWikiLink]]
  • [[NoWikiLink]
SGML/XML Markup
  • <n>NoWikiLink</n>
  • <noautolink>NoWikiLink</noautolink>
  • <nolink>NoWikiLink</nolink>
Method 2: Breaking the Link Pattern
  • N¦oWikiLink
  • No``Wiki
    • Argument against: backticks are hard to read in many fonts and can be mangled by typesetting software.
  • No''''''WikiLink
    • Argument against: far too much mark-up.
Method 3: Escape Character
  • !NoWikiLink
    • Argument against: exclamation points are rather ubiquitous (counterpoint: they shouldn't appear at the start of a word anyway).

# Assorted

WikiWord link to anchor/section
WikiWord#anchor
Link to other language
[languagecode:pagename]
Link to other language and namespace
[languagecode:Namespace:pagename]
Absolute link to subpage
WikiWord/SubPage
Relative link to subpage
/SubPage
Discussion

I think that parsing http://... links could be harmful.

  • They are low quality links. Most of them are ugly and unreadable: http://somedomain.com/~mumbo/jumbo?p=blah
  • They are tricky to parse. "visit http://blah.com/00,123,002.html, and see yourself" (notice commas)
  • They might contain wiki markup or tags that confuse parsers. "http://html.info/tellmeabout=<pre>"

Enforcing short links also makes them low quality:

  • WAI priority 2 requirement: "Make sure that all link phrases make sense when read out of context". One-word links hardly meet that requirement.
  • Wikis often contain useful info, so being present in search engines is a good idea. Search engines take keywords from links. "see [article] about mumbo and jumbo" is meaningless. "see [article about mumbo and jumbo]" is a nice source of keywords, and the link is self-descriptive.

-- KornelLesinski
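One common heuristic for the trailing-punctuation problem is to match a URL greedily up to whitespace, an angle bracket or a double quote, then strip punctuation that usually belongs to the surrounding sentence. The Python sketch below assumes that heuristic; the character sets and the name `find_urls` are illustrative choices, and it will still mis-handle URLs that legitimately end in `)` or `.`.

```python
import re
from typing import List

# Greedy match up to whitespace, an angle bracket or a double quote.
URL = re.compile(r"\b(?:https?|ftp|gopher)://[^\s<>\"]+")
# Punctuation that usually belongs to the surrounding sentence, not the URL.
TRAILING = ".,;:!?)'"


def find_urls(text: str) -> List[str]:
    """Return the URLs in `text` with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(TRAILING) for m in URL.finditer(text)]


print(find_urls("visit http://blah.com/00,123,002.html, and see yourself"))
# ['http://blah.com/00,123,002.html']
print(find_urls("http://html.info/tellmeabout=<pre>"))
# ['http://html.info/tellmeabout=']
```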

# Text Formatting

# Assorted

Paragraphs
Blank lines separate paragraphs.
Strong emphasis
[+example text+]
Very strong emphasis
[++example text++]
Highlighted text
##example text##
Notes
[example text]
Reversed background color text
[rev example text]
Red text
{r}example text{/r} or <r>example text</r>
Green text
{g}example text{/g} or <g>example text</g>
Blue text
{b}example text{/b} or <b>example text</b>
Colored text
{#FFFFFF}example text{/#}
Justified text
<>( example )
  • Argument against: markup should be about content rather than presentation and looks.

# Citations

  • ??source??
  • "quote (source)" or 'quote (source)'
  • "quote [source]" or 'quote [source]'

# Emphasis (italics)

Inline
  • /emphasized words/
  • //emphasized words
  • //emphasized words//
    • Argument for: intuitive. Looks like italics.
  • ''emphasized words''
    • Argument for: a natural translation from print, where double-quote means italics. (I added a Gutenberg text to a wiki that uses this, and it naturally italicized where it should have because of this.)
  • ^emphasized words^
  • _emphasized words_
    • Argument for: established popular "markup" in text-only environments.
    • Argument against: ambiguity with computer hostnames and URLs, which use underscores.
      • These should be quoted anyway.
  • ~~emphasized words~~
  • {I}italicized text
  • ///emphasized words///
    • Argument against: too much markup.
Block
  • [/italicized text/]
  • [i italicized text]


# Bold

Inline
  • *bold text*
  • **bold text**
    • Argument for: established popular "markup" in text-only environments.
    • Argument against: ambiguous with the established bulleting method.
  • ##bold text##
  • ||bold text||
  • __bold text__
  • {B}bold text
  • '''bold text'''
    • Argument against:
      • Multiple single quotes can look like double quote characters in proportional fonts.
      • Too much markup.
      • Too similar to italic, and far too much markup when combining italic with bold.
Block
  • [*bold text*]
  • [b bold text]


# Inserted Text (underline)

Inline
  • _underline_
  • __underline__
  • ++underline++
Block
  • [_underline_]


# Deleted Text (strikethrough)

Inline
  • -strikethrough-
  • --strikethrough--
    • Argument against: unacceptable because hyphens are far too common - both single hyphens representing minus signs and double hyphens representing em-dashes.
Block
  • [-strikethrough-]
  • -/strikethrough/-


# Monospaced Text

Inline
  • =technical term=
    • Argument against: could conceivably cause some problem with mathematics.
  • @technical term@
  • @@technical term@@
  • #technical term#
  • ##technical term##
    • Argument for: `#` is often a comment character, and you don't put comments in inline code snippets.
  • `technical term`
    • Argument against: backticks are hard to read in many fonts and can be mangled by typesetting software.
  • 'technical term'
    • Argument against: unacceptable because single quotes are too common.
Block
  • [=technical term=]
  • {{technical term}}
  • /*technical term*/


# Literal/Unprocessed Text

Inline

  • %%example text%%
  • `example text`
  • ``example text``
    • Argument against: backticks are hard to read in many fonts and can be mangled by typesetting software.
  • ```example text```
    • Argument against:
      • Backticks are hard to read in many fonts and can be mangled by typesetting software.
      • Too much markup.

Block

  • {example text}
  • {{{example text}}}
    • Argument against: too much markup.
  • [%example text%]
  • [\example text\]
  • [=example text=]
  • [esc]example text[/esc]
  • [literal]example text[/literal]
    • Argument against: no benefit over SGML/XML.
SGML/XML Markup
  • <nowiki>example text</nowiki>
  • <verbatim>example text</verbatim>
  • <ignore>example text</ignore>
  • <literal>example text</literal>


# Superscript Text

Inline
  • ^superscript text^
  • ^^superscript text^^
Block
  • [^superscript text^]


# Subscript Text

Inline
  • ,,subscript text,,
  • ~subscript text~
  • vvsubscript textvv
    • Argument against: the name of the wiki that has this syntax has been withheld to avoid humiliating the author.
Block
  • [,subscript text,]


# Large Text

Inline
  • +large text+
    • Argument against: could conceivably cause problems with mathematics.
Block
  • [+large text+]
  • ~+large text+~


# Small Text

Inline
  • -small text-
    • Argument against: unacceptable because single hyphens are too common.


Block
  • [-small text-]
  • !-small text-!
  • ~-small text-~


# Centered Text

Inline
  • {c}example text
  • <:>example text
Block
  • >>example text<<
  • ><( example text )

# Left Aligned Text

Inline
  • {l}example text
  • <(>example text
Block
  • <-( example text )

# Right Aligned Text

Inline
  • {r}example text
  • <)>example text
Block
  • ->( example text )

# Line Breaks

  • %%%%
  • >>>
  • \\
  • [[BR]]
    • Argument against:
      • No benefit over HTML.
      • Locale-specific.
      • Obfuscated abbreviation.

# Headings

Method 1

A sequence of heading characters at the beginning of a line indicates heading level.

= Heading 1
== Heading 2
=== Heading 3
Argument against
Less important titles stand out more.
! Heading 1
!! Heading 2
!!! Heading 3
Argument for
Intuitive. Exclamation point says: here's something important.
Argument against
Less important titles stand out more.
- Heading 1
-- Heading 2
--- Heading 3
Argument against
Less important titles stand out more.
Argument against
Double hyphens at the beginning of a line may also introduce a signature.

Method 2

A sequence of heading characters at the beginning and end of a line indicates heading level.

= Heading 1 =
== Heading 2 ==
=== Heading 3 ===
Argument for
Intuitive. Looks like a banner.
Argument against
Less important titles stand out more.
-= Heading 1 =-
-== Heading 2 ==-
-=== Heading 3 ===-
Argument against
Forces user to count the correct number of characters twice.
Argument against
Less important titles stand out more.

Method 3

Rule: a sequence of heading characters at the beginning of a line indicates heading level, and any heading characters after the title are ignored.

= Heading 1 =======================================
== Heading 2 ===================
=== Heading 3 ===

Basically the number of heading characters at the end is ignored, as long as there is at least one.

Argument against
While more important titles may stand out, they don't have to. It would be nice if this rule was enforced.
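Method 3 is easy to parse: count the leading `=` characters and discard the trailing run. A minimal Python sketch, assuming `=` is the heading character and that a title may not itself begin with `=`; `parse_heading` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Leading '=' run gives the level; the title must not start with '=';
# the trailing '=' run (at least one character, any length) is ignored.
HEADING = re.compile(r"^(=+)\s*([^=].*?)\s*=+\s*$")


def parse_heading(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, title) for a Method 3 heading line, else None."""
    m = HEADING.match(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)


print(parse_heading("== Heading 2 ==================="))  # (2, 'Heading 2')
```

Note that this sketch does not address the objection above: nothing forces the trailing run of a level-1 heading to be longer than that of a level-3 heading.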

Method 4

Rule: a line of text with all-capitalized words.

Any Line Of All Capitalized Words Becomes A Heading
Argument for
Clever.
Argument against
Actual titles don't have all capitalized words.
Argument against
Not feasible for most non-English languages.

Method 5

Rule: headings are underlined (or over-and underlined) with a printing nonalphanumeric character. The underline/overline must be at least as long as the title text.

=============
First Heading
=============
Second Heading
~~~~~~~~~~~~~~
Third Heading
-------------
Argument pro
Important titles stand out more.
Argument against
Hard to use with proportional fonts. A possible fix is to just require a minimum amount of underlining (e.g., four characters).

Method 6

Rule: heading characters plus a number indicate heading level.

---+1 Heading 1
---+2 Heading 2
---+3 Heading 3

Method 7

Rule: heading characters plus additional sequence of characters to indicate heading level.

---+ Heading 1
---++ Heading 2
---+++ Heading 3
Argument against:
Too much markup.
Argument pro
Less important titles stand out more.

Method 8

Rule: Change of bullet character indicates change of level

* Heading 1 *
+ Heading 2 +
* Heading 1 *
- Heading 2 -
+ Heading 3 +

Method 9

Rule: Number of heading characters indicates level importance. The highest number of heading characters is heading 1, the second highest is heading 2, etc.

==== Heading 1
=== Heading 2
== Heading 3
= Heading 4
Argument pro
Important titles stand out more.
Argument against
There must be a maximum number of levels.

Miscellaneous

= # enumerated heading text =

# Lists and Indentations

# Known Bullet Characters

  • *
  • -
  • @
  • +
  • !
  • ?
  • >
  • %
  • #
  • o
    • Argument against: the letter 'o' is a word in many languages.
  • x

# Unordered Lists

Method 1

Rule: a sequence of bullet characters indicates level.

* level 1
** level 2
*** level 3
000 also aligned with third level, but no bullet
... also aligned with third level, but no bullet
Argument for
easier to parse.
Argument against
less intuitive.
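To show how Method 1 maps onto nested HTML, here is a minimal Python sketch that handles only `*` items (the `000` and `...` continuation lines are skipped). It keeps each `<li>` open so a deeper list nests inside it, and inserts an empty item when a level is skipped. `render_list` is a hypothetical name, and item text is not HTML-escaped.

```python
import re
from typing import List

ITEM = re.compile(r"^(\*+)\s*(.*)$")


def render_list(lines: List[str]) -> str:
    """Render '*'-prefixed lines as nested <ul>; the '*' count is the level."""
    out: List[str] = []
    depth = 0  # number of currently open <ul> elements
    for line in lines:
        m = ITEM.match(line)
        if not m:
            continue  # only '*' items are handled in this sketch
        level, text = len(m.group(1)), m.group(2)
        if level > depth:
            for d in range(depth, level):
                if d > depth:
                    out.append("<li>")  # placeholder item for a skipped level
                out.append("<ul>")
            depth = level
        else:
            out.append("</li>")  # close the previous sibling item
            while depth > level:
                out.append("</ul></li>")
                depth -= 1
        out.append(f"<li>{text}")
    if depth == 0:
        return ""
    out.append("</li>")
    while depth > 1:
        out.append("</ul></li>")
        depth -= 1
    out.append("</ul>")
    return "".join(out)


print(render_list(["* a", "** b", "* c"]))
# <ul><li>a<ul><li>b</li></ul></li><li>c</li></ul>
```

A line such as `*bold text*` would be read as a list item here, which is the bulleting ambiguity raised under Bold.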
Method 2

Rule: a sequence of spaces followed by a bullet character indicates level.

* level 1
 * level 2
  * level 3
    * level 5
Argument against
Counting spaces, like other invisible characters, is not user friendly.
Method 3

Rule: an indent followed by a bullet character indicates level.

* level 1

     * level 2
            * level 3
             * level 4
Method 4

Rule: a change of bullet character indicates a change of level. Indentation is optional.

* level 1
- level 2
+ level 3
- level 2 again
@ level 3
+ level 4

# Ordered Lists

Method 1

Rule: a sequence of ordered list characters indicates level.

# level 1
## level 2
### level 3
#3 restart numbering from 3
> level 1
>> level 2
>>> level 3
0 level 1
00 level 2
000 level 3
Method 2

Rule: a sequence of spaces followed by an enumerator indicates level.

1. level 1
 1. level 2
  1. level 3
1.#3 restart numbering from 3
1) level 1
 2) level 2
  3) level 3
Method 3

Rule: each level has its own numbering.

1. level 1
1.1 level 2
1.1.1 level 3
1.2 level 2

# Indentations / Block Quotes

Method 1

Rule: a sequence of spaces indicates indentation level.


outer
 indent 1
  indent 2
Method 2

Rule: a sequence of indentation characters indicates indentation level.

outer
: indent 1
:: indent 2
outer
> indent 1
>> indent 2
  • Argument for: commonly used when pretty-printing e-mails and news posts.

# Definition Lists

Method 1
;Term: Definition
   $ Term: Definition
Method 2
Term:: Definition
Method 3
Term:
    Definition


# Tables

Method 1: Sequence of Rows

  • | col1 | col2 | col3 |
  • || col1 || col2 || col3 ||

Method 2

[| col 1, row 1 || col 2, row 1 ||
|| col 1, row 2 || col 2, row 2 |]

Method 3: Drawing Boxes

+------------+------------+-----------+
| Header 1   | Header 2   | Header 3  |
+============+============+===========+
| body row 1 | column 2   | column 3  |
+------------+------------+-----------+
| body row 2 | Cells may span columns.|
+------------+------------+-----------+
| body row 3 | Cells may  | - Cells   |
+------------+ span rows. | - contain |
| body row 4 |            | - blocks. |
+------------+------------+-----------+
Argument against
Very hard to parse and takes a lot of effort to type.
|---------------------|
| Header 1 | Header 2 |
|=====================|
| Column 1 | Column 2 |
|---------------------|

Method 4

=====  =====  ======
   Inputs     Output
------------  ------
  A      B    A or B
=====  =====  ======
False  False  False
True   False  True
False  True   True
True   True   True
=====  =====  ======

Method 5: Definition Tables

Term 1 |
   Definition 1 begins here.
   Term 1.1 |
      Definition 1.1
   Term 1.2 |
      Definition 1.2
   This is part of definition 1.
Term 2 |
   Here's definition 2.

Method 6: Wiki-pipe Syntax

{|
!heading 1 !! heading2
!heading 3
|-
|text 1a || text 2a || text 3a
|-
|text 1b
|text 2b
|text 3b
|}

Method 7: Relational

[[Table][Seperator=;]
[Columns=Person,Height,Weight]
Person=Person; Height=Height; Weight=Weight
Person=Peter; Height=180; Weight=84
Person=Martha; Weight=52; Height=167
]
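The relational form names every cell, so columns can appear in any order within a row (note Martha's `Weight` before `Height`). A minimal Python sketch of reading such rows, assuming the `Seperator` and `Columns` header lines have already been parsed and are passed in as `sep` and `columns`; `parse_relational` is a hypothetical name.

```python
from typing import Dict, List


def parse_relational(rows: List[str], columns: List[str], sep: str = ";") -> List[List[str]]:
    """Turn 'Key=Value<sep>Key=Value' rows into cell lists ordered by `columns`.

    Key order inside a row does not matter; a missing key gives an empty cell.
    """
    table: List[List[str]] = []
    for row in rows:
        cells: Dict[str, str] = {}
        for pair in row.split(sep):
            if "=" in pair:
                key, value = pair.split("=", 1)
                cells[key.strip()] = value.strip()
        table.append([cells.get(col, "") for col in columns])
    return table


rows = ["Person=Peter; Height=180; Weight=84", "Person=Martha; Weight=52; Height=167"]
print(parse_relational(rows, ["Person", "Height", "Weight"]))
# [['Peter', '180', '84'], ['Martha', '167', '52']]
```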

Method 8: Double commas

(Here's how it's done in SdiDesk)

Without headings

a,, b,, c
1,, 2,, 3
4,, 5,, 6

With heading

a,, b,, c
____
1,, 2,, 3
4,, 5,, 6
Argument for
Very quick and easy to enter
Argument against
Can confuse people who think that double comma implies an empty cell
Argument against
gets confusing to edit a wide table. (But this is true of most alternatives)
Argument against
precludes double-commas for other purposes (like what?)
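A minimal Python sketch of reading the double-comma format. It assumes, from the example above only and not from SdiDesk's source, that a line of four or more underscores directly after the first row marks that row as the heading; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def is_heading_rule(line: str) -> bool:
    """A line of four or more underscores (assumed minimum)."""
    return len(line) >= 4 and set(line) == {"_"}


def parse_double_comma(lines: List[str]) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Return (heading cells or None, body rows) for a ',,'-separated table."""
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if header is None and len(rows) == 1 and is_heading_rule(line):
            header = rows.pop()  # the single row above the rule is the heading
            continue
        rows.append([cell.strip() for cell in line.split(",,")])
    return header, rows


print(parse_double_comma(["a,, b,, c", "____", "1,, 2,, 3", "4,, 5,, 6"]))
# (['a', 'b', 'c'], [['1', '2', '3'], ['4', '5', '6']])
```

Because `,,` is the delimiter, an empty cell has to be written with a space between the separators (`a,, ,, c`), which is the confusion noted in the first argument against.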


# Cell Attributes

Cell Attribute Specification
  • |abc
  • ||<abc>
  • ||{a,b=c}
Cell Attributes in Use
Top alignment
{t} or <^>
Bottom alignment
{b} or <v>
Column spanning
{w=number} or <-number>
Row spanning
<|number>
Border width
{Tb=number}
Cell class
{C=string}
Cell style
{s=string}
Cell width
<100%>
Background color
<#XXXXXX>

Miscellaneous

|<<END|<<END|
col1 text is here
END
col2 text is here
END

# Horizontal Rules/Separators

  • ---
  • ---- (4 dashes at beginning of line--extra dashes are ignored.)
  • -----
  • ____ (4 underscores at beginning of line)


  • ---- (4 dashes at beginning of line--more than 4 gets thicker.)
    • Argument against: this seems like a petty stylistic feature that I can't imagine anyone actually caring about.

Discussion

Having four as a minimum is totally arbitrary. Parsing is not ambiguous as a separator should begin and end with a newline. Thus the only possible ambiguity is for the reader in that a small separator could get "lost" in the document. -- IanBollinger

One and two could not be the minimum for obvious reasons. Three cannot, because I've seen at least one wiki that uses --- for an em dash, as do some word processors. So four hyphens is the smallest number that can safely be chosen.
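Because a separator occupies a whole line, the four-hyphen minimum can be enforced with a single anchored pattern, which also rejects `---` and signature lines such as `-- Name`. A Python sketch; returning the dash count lets the caller apply either rule listed above (ignore extra dashes, or make the rule thicker). `rule_length` is a hypothetical name.

```python
import re
from typing import Optional

# A separator is a whole line of four or more hyphens (trailing spaces allowed).
HR = re.compile(r"^(-{4,})\s*$")


def rule_length(line: str) -> Optional[int]:
    """Return the number of hyphens if `line` is a separator, else None."""
    m = HR.match(line)
    return len(m.group(1)) if m else None


print(rule_length("------"))  # 6
print(rule_length("---"))     # None
```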


# Meta-Wiki

# Macros, Variables, Plugins and Extensions

  • [[MacroName(arguments)]]
  • <MacroName>
  • <MacroName(arguments)>
  • @MacroName@
  • {{MacroName(arguments)}}
  • %MacroName{"parameter" key="named parameter"}%
  • <?plugin MacroName arg1=val1 arg2=val2 ?>
  • [[MacroName][parameter=value]...body...]
  • $MacroName

# Comments

  • #comment
  • ##comment
SGML/XML Markup
  • <hide>comment</hide>

# Processing Instructions and Meta Data

  • #TYPE value
  • @type:name:value
  • %META:type{arg1="val1" arg2="val2"}%

# Character Replacement

  • -- becomes an em-dash. (—)
  • 1-1 becomes an en-dash. (1–1)
  • "text becomes a double left quote. (“text)
  • text" becomes a double right quote. (text”)
  • 'text becomes a single left quote. (‘text)
    • Argument against: some English abbreviations begin with single right quotes ("I said 'e would").
  • don't becomes a single right quote. (don’t)
  • text' becomes a single right quote. (text’)
  • ' becomes &apos; in XML.
  • > becomes &gt; in SGML/XML.
  • < becomes &lt; in SGML/XML.
  • & becomes &amp; in SGML/XML.
  • ff, fi, fl, ffi, ffl and st become their appropriate ligatures.
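A Python sketch of a few of these replacements: the en dash between digits, the em dash for `--`, the apostrophe inside a word, and XML escaping via the standard library's `xml.sax.saxutils.escape`. The order of operations and the regular expressions are assumptions; opening quotes and ligatures are left out because they need more context than a simple regex sees, and the function is meant for paragraph text after block markup such as `----` has already been handled.

```python
import re
from xml.sax.saxutils import escape


def typeset(text: str) -> str:
    """Apply a few character replacements, then escape for XML output."""
    text = re.sub(r"(?<=\d)-(?=\d)", "\u2013", text)  # hyphen between digits -> en dash (U+2013)
    text = text.replace("--", "\u2014")                 # double hyphen -> em dash (U+2014)
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)    # apostrophe inside a word -> U+2019
    # escape() always handles &, < and >; the dict adds ' and ".
    return escape(text, {"'": "&apos;", '"': "&quot;"})


print(typeset("a < b & 'c'"))  # a &lt; b &amp; &apos;c&apos;
```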

# X/HTML Markup in Wikis

(Not including wikis that are too lazy to restrict the use of HTML at all, which is inherently insecure.)

  • <abbr>abbreviation</abbr>
  • <br> or <br/>
  • <cite>cited source</cite>
  • <code>program source code</code>
  • <dfn>definition</dfn>
  • <em>emphasis</em>
  • <h1>Heading 1</h1> through <h6>Heading 6</h6>
  • <hr> or <hr/> for horizontal rules
  • <kbd>keyboard text</kbd>
  • <pre>preformatted text</pre>
  • <strong>strong emphasis</strong>
  • <samp>sample output</samp>
  • <sub>subscript text</sub>
  • <sup>superscript text</sup>
  • <var>variable</var>
  • <acronym>acronym</acronym>: deprecated in XHTML 2.
  • <b>bold</b>: deprecated in XHTML 2.
  • <big>big text</big>: deprecated in XHTML 2.
  • <del>deleted text</del>: deprecated in XHTML 2.
  • <i>italics</i>: deprecated in XHTML 2.
  • <ins>inserted text</ins>: deprecated in XHTML 2.
  • <small>small text</small>: deprecated in XHTML 2.
  • <tt>technical term</tt>: deprecated in XHTML 2.
  • <s>strikethrough</s>: deprecated in HTML 4.
  • <strike>strikethrough</strike>: deprecated in HTML 4.
  • <u>underline</u>: deprecated in HTML 4.


# Suggested Basic Set

# Basic Set A

We are still in the idea collecting phase, so there is nothing here yet.

# Plan B

The "original" Basic Set B is smaller than the one below.

You can find it on CommunityWiki:MarkupStandardPlanB.

# Basic Set B

Internal CamelCase link
WikiWord
Internal free link
[[free link]]
External link
URL or [[URL][text]]
Paragraph
empty line separates paragraphs
Emphasis (usually italics)
''emphasized words''
Strong emphasis (usually bold)
'''strong emphasis'''
Headings
== Headline text ==, use more equal signs to get lower level headlines
Lists
Use number of asterisks, no leading space
Horizontal line (separator)
----
Indenting
:+<text>
Description lists
;hello: world
Line break
\\
Wiki escape
<nowiki>...</nowiki>

# TikiWiki RFC

TikiWiki tried to publish their syntax as an IETF RFC; cf. http://tikiwiki.org/RFCWiki

# Heilbronn University Proposal

The Heilbronn University is leading the Wiki Markup Standard Workshop at WikiSym 2006. More details: http://www.i3g.hs-heilbronn.de/Wiki.jsp?page=WikiMarkupStandard

Our recent discussion about WMS can be found at HeilbronnWMSDiscussion.

Discussion

This is like a minimal UseMod/OddMuse set, and doesn't use significant whitespace except for the empty line separating paragraphs. Note that working URL links mean that mailto:alex@emacswiki.org will be a valid link.

I suggest backslash as the only escape character. Backslash as the first character of a line = ignore wiki markup on the whole line. Backslash immediately before markup = ignore that markup. Simple, short and standard (\ is the escape character in almost all programming languages).

\''no emphasis here\'', and ''thats emphasis with \'' in it''
\ this line is like in <pre>
normal line \
without line-break

Um, no: either that first line would be whole-line escaped, or the second would not. --Kevin D. Keck

Then perhaps it should read "backslash-space as first pair of characters of line = ignoring wiki markup"? -- ChrisPurcell
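A minimal Python sketch of the backslash rules as refined above: a trailing `\` joins a line with the next, a leading backslash-space makes the whole line literal, and `\` before `''` keeps that markup literal. Only `''` emphasis is handled, literal lines are not HTML-escaped, and the function names are hypothetical.

```python
import re
from typing import List

# A backslash before '' escapes it; otherwise '' toggles emphasis.
EMPHASIS = re.compile(r"(\\''|'')")


def render_emphasis(line: str) -> str:
    out: List[str] = []
    open_em = False
    for token in EMPHASIS.split(line):
        if token == "\\''":
            out.append("''")  # escaped markup is kept as literal text
        elif token == "''":
            out.append("</em>" if open_em else "<em>")
            open_em = not open_em
        else:
            out.append(token)
    if open_em:
        out.append("</em>")  # close emphasis left open at end of line
    return "".join(out)


def render(lines: List[str]) -> List[str]:
    # 1. A trailing backslash joins a line with the next (no line break).
    joined: List[str] = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1]
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)
    # 2. Backslash-space at line start: the rest of the line is literal.
    return [line[2:] if line.startswith("\\ ") else render_emphasis(line)
            for line in joined]
```

Applied to the four example lines above, this yields `''no emphasis here'', and <em>thats emphasis with '' in it</em>`, then `this line is like in <pre>`, then `normal line without line-break`.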


# General Discussion

Most users don't even know what a wiki is yet, so it is the task of wiki authors to agree on a standard soon.

It is unnatural to have to speak 5 different wiki markup languages to discuss 5 topics on 5 different wiki engines, so migrating to a common basic markup standard is less problematic than keeping multiple markups.

Imagine the web without having a (mostly) common HTML markup language. -- Anonymous


Clearly, having a common tongue for wikis (as English seems to serve for *some* of the web) would make sense, however, and make life a lot easier, especially if it gets promoted as an "official second language" on multiple wikis.

Ask yourself:

  • How does the above markup allow for editing, reference or inclusion of a specific named paragraph in a block of text?
  • How does it allow a table to include the contents of another page based on a pattern match?
    • For these, please suggest a way to do it, or a general way to extend the markup.
  • How does it allow for page links that include spaces (i.e., a non-CamelCase link)?
    • See free link above.
  • How about defining slides for a presentation?
  • What about links to in-browser edited images/diagrams ?
  • How do you deal with the fact that some wikis allow structured records editable in the browser?
  • Or actively deal with structure? (PurpleWiki being one of the more intriguing - http://www.blueoxen.org/tools/purplewiki/)
  • Or deal with editing of adhoc structured data — such as descriptions of pictures in a semi-automatically generated gallery?
  • Probably more!
    • These might not be topics for a basic standard, like some of the other stuff mentioned here, but please suggest a way anyway...

These are by definition the hardest aspects to deal with (all of which I've seen used in wikis) - and whilst clearly not suitable for being in the core of a minimalist markup, there is a need to be inclusive towards these desires rather than exclusive.

    • The goal is a markup standard that should be extensible with such stuff.

Personally, I'm led to two main conclusions:

  1. Any minimalist wiki markup should be very, very minimalistic and should also provide a method for extending the syntax and indicating this.
    1. If that would be the case, this would be completely hopeless. I hope most wiki authors could agree on such a basic syntax as that really should be in the basic set (not sure about underlining though as this might be confused with links).
  2. Wiki markup could become separated from storage; after all, we might allow users to speak their local lingo to a wiki (which might be WikiStandardMarkup, rather than Usemod, MoinMoin, TWiki Wiki, etc.). Like SunirShah, I think this implies at least a partially parser-based model rather than plain replacement.


These obviously aren't mutually exclusive - so I'm encouraged to see this page. (It's not the only page of this kind however - when I get a chance I'll dig up references to others). However the syntax presented is almost entirely different to the syntaxes I use at present!

To give a flavor of the problem, just for linking to content:

  • PurpleWiki allows (or aims to allow) linking to specific paragraphs linked by NodeIDs (think paragraphs identifiable by MD5 hash), and by Hierarchical IDs ("Section 1, subsection 3, sub-sub-section 2, paragraph 3").
  • TWiki (http://twiki.org) allows links to:
    • Pages inside other namespaces (a 1-tier hierarchy)
    • Specific named paragraphs (its "TOC" equivalent generates these)
  • O'Wiki (a Twiki fork - http://owiki.org/ ) allows linking to specific named sections (which may overlap)
    • This will however also allow logically nested namespaces, and topic/page-space slicing and splicing.
  • MegaTWiki (another fork, no longer supported I believe) allows links to nested namespaces.

-- MichaelSparks



So still nobody has agreed on a common markup. Personally, I strongly share the opinion that the Wiki idea will never get close to WorldDomination if there isn't even the most basic set of markup that users could assume works in every WikiWare. But I also believe it is too arrogant to call it WikiMarkupStandard, given the multitude of existing implementations (and the authors of each probably had good reasons to choose a different markup set). Perhaps this page should be renamed to StandardWikiMarkup and document just that, instead of all available markup variations.

Then it would be possible to register a WikiTextMimeType (text/wiki) with IANA and the IETF. This finally gets us a bit closer to the InterWiki idea by providing the WikiWorld with a standard similar to the base of the WWW (namely text/html). -- MarioSalzer


I've been thinking a lot about this topic as well, and the only insight I've come up with so far is that maybe there should in effect be three different Wiki formats. The first is oriented toward ease of entry, and actually works with most Wiki variants' TextFormattingRules all at the same time! The system then parses that text into a standard text format (which I'm calling canonical text, or CanText), which is also editable as text and very readable as is, but maybe not quite as easy to enter from scratch. One interesting principle is that if canonical text is run through the wiki text filter, the result will be exactly equal to the input. Finally, there is the HTML markup. -- ChristopherAllen


Speaking of which, my ideas are embedded in the InfiniteMonkey parser (script) where I break down syntax types by the functional forms of the parser. Blocks are the fundamental part, which are broken down into line & paragraph parsing, then aggregate blocks like lists and tables. Links are also special. cf. WikiParserModel. We will likely have a workable "standard" by the end of the year for those who wish to follow it. -- SunirShah
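Roughly how such a layered block parser could look, as a Python sketch: a line-level pass classifies each line, then consecutive lines of the same kind are aggregated into paragraphs, lists, or tables. The `* ` list prefix and the `||` table delimiter are assumed markup chosen for the example; this is not the InfiniteMonkey code.

```python
from itertools import groupby
from typing import List, Tuple


def classify(line: str) -> str:
    """Line-level pass: decide what kind of block a single line belongs to."""
    s = line.strip()
    if not s:
        return "blank"
    if s.startswith("* "):
        return "list"
    if s.startswith("||"):
        return "table"
    return "para"


def parse_blocks(text: str) -> List[Tuple[str, list]]:
    """Group consecutive lines of the same kind into aggregate blocks."""
    blocks: List[Tuple[str, list]] = []
    for kind, group in groupby(text.splitlines(), key=classify):
        lines = [line.strip() for line in group]
        if kind == "blank":
            continue  # blank lines only separate blocks
        if kind == "list":
            blocks.append(("list", [line[2:] for line in lines]))
        elif kind == "table":
            rows = [[cell.strip() for cell in line.strip("|").split("||")]
                    for line in lines]
            blocks.append(("table", rows))
        else:
            # Paragraph pass: consecutive text lines join into one paragraph.
            blocks.append(("para", [" ".join(lines)]))
    return blocks
```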


It seems impossible to define a one-size-fits-all wiki markup that everyone will agree on (still, of course, a standard should be made; authors may come to agree with it later).

I think every WikiEngine should be able to transform its markup into a standardized DocumentModel, an HTML-like structure of paragraphs, lists, sections, headings, etc., which could be stored as XML and loaded into another WikiEngine. This would allow relatively easy conversion between any flavor of Wiki markup.

-- KornelLesinski
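A sketch of the DocumentModel exchange idea using Python's standard `xml.etree.ElementTree`. The `Node` type and the element names (`document`, `heading`, `para`, `list`, `item`) are hypothetical; a real interchange format would need an agreed schema. One engine exports its parsed page with `to_xml`, another imports it with `from_xml`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of a hypothetical DocumentModel (heading, para, list, item...)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> ET.Element:
    """Export an engine-neutral tree as XML elements."""
    elem = ET.Element(node.tag)
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(to_xml(child))
    return elem


def from_xml(elem: ET.Element) -> Node:
    """Import the same tree in another engine (tail text is ignored)."""
    return Node(elem.tag, elem.text or "", [from_xml(c) for c in elem])


doc = Node("document", children=[
    Node("heading", "Intro"),
    Node("para", "Hello"),
    Node("list", children=[Node("item", "a"), Node("item", "b")]),
])
xml_text = ET.tostring(to_xml(doc), encoding="unicode")
assert from_xml(ET.fromstring(xml_text)) == doc  # lossless round trip
```

Mixed content (text followed by child elements) would also need each element's `tail` text; the sketch ignores it to stay short.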


Requiring arbitrary sequences of identical characters in markup seems like a very bad idea to me; more than three is probably too many. (How many letters are in this sequence: 'lll'? How many in this one: 'llllll'?)

Also, some of the formatting options seem superfluous. Underline, for instance, is bad typography, and is confusing because only links should have underlines. Several of the corresponding presentational tags were removed from the XHTML 2 specification for good reason, as others were from the HTML 4 specification. -- IanBollinger


I've been involved in standards work for the past decade, so that argument isn't anything I'd disagree with. But in the case of some of the things I've helped standardize, there was a ready audience that wanted it standardized. I'm not sure that the wiki community would actually want one standard for all of them to follow. Put it this way: if next month a wiki standard showed up, how many wikis would dump what they have now (both pages and supporting software) and go with the standard? I think a more viable "standard" would be one for interchange, which is a different tack and doesn't require anyone to alter the primary syntax (and software) they use; it only suggests a method(ology) for interchanging wiki content. I do still advocate a wiki-wide "standard" for identifying wiki syntaxes (the !#wiki idea) because those who wish to self-identify need only make a very minor change. I'd not suggest anyone adopt an entirely new syntax. I don't think very many would do that. (Esperanto as an interchange language) -- MurrayAltheim


Who said they'd have to dump everything? There can be a transition phase when both standard and custom syntax can be used. You could automatically convert pages.
I found this page because I started to create my own wiki and I wanted to be compatible with something (and more familiar to users). Since there is no standard (or even a recommendation) :( I'm left making yet another incompatible wiki derivative.
-- KornelLesinski

I get extremely frustrated by the differences in markup between different wikis, but I'm also involved in standards work (NNTP and the Usenet format), and it can be even more frustrating to create a standard that has a chance of actually being adopted (NNTP is doing it; hell might freeze over before USEFOR does). From what I have seen of the wiki world, I concur with Murray that it's just very unlikely to happen, and think the effort would be best spent on interoperability, easy import, and conversion, parts of which are referred to above as "markup babel fish" or "markup skins". But if you want to proceed with this, your best chance is to do it via an IETF standards process, to which you should invite all those wiki authors who'll need to implement that standard. --AlixPiranha


I'd like to see people use some sort of CSS markup instead of inventing HTML2-style markup extensions for Wiki. -- MarioSalzer

----


It might be useful to measure how often each category of markup is used. Bold is probably used on something like 90% of pages, italics maybe 70%, underline 10%. This is just off the top of my head, based on intuition. Regardless, they're the same category of markup. Color and font type/size changes strike me as ancillary markups. They sometimes serve a purpose of emphasis or clarification, but aren't generally needed over and above the basics. Of course, you could suggest the reason people don't use these extended markups is that they aren't readily available. But I imagine the majority of users come from a paper writing/word processing background where one doesn't typically make use of these effects. So it seems to me that it would make sense to make those "harder to use" in the interest of keeping the standard cleaner and more consistent.
-- JerryHsu
----

Maybe I could clean this up and make a mock standards document out of it? The TWiki people have done the same and I'm not too much of a fan of that syntax.

What seems best is decide on one syntax for each operation, put it here, and then put the rest and discussion into a separate page. This would be much easier on us implementors :). -- RyanNorton

Can anyone provide a link to that TWiki (mock) standards document? --RandyKramer

Actually it's Tiki, http://tikiwiki.org/tiki-index.php?page=RFCWiki -- RyanNorton


Is the Tiki work Ryan refers to fairly well included in the material that has now accumulated here?

One aspect of a markup standard that appears to not yet have been considered is the specific intent of the use of the wiki technology. For example:

By recognizing that all wikis produce html and many can save it (rather than just display it), it becomes practical to use a wiki software's editing and display functions separately, at different times, which reduces my concerns about the use of different markups. In effect, I can use a Personal wiki that makes it possible for me to choose my 'personal markup'. As long as the Personal Wiki produces standard HTML (pretty well assured for any wiki that expects a browser to do its presentation), then all that may be needed is an HTML2MyMarkUp conversion utility.

-- HansWobbe


May I suggest another way to look at some of this—how about looking at what we (I?) would like to achieve with wiki markup? Here are some things I'd like to be able to do that have not been possible / easy in the wikis I've looked at or used:

  • Indent "continuation paragraphs" in lists to match the indentation of the (relevant) list item—AFAICT, this can be facilitated by the addition of start and end list markers (equivalent to (i.e., to be translated to) <ul/ol>)
  • Allow continuous numbering of noncontinuous list items—again, AFAICT, this can be facilitated by the addition of start and end list markers
  • Allowing single or double (vertical) spacing between list items (same potential solution as above), and between other entities like headings and following paragraphs, (or headings and subheadings with no intervening text)
  • Allow "hierarchical" (not sure that's the right word, I mean variations of multiple numbers, i.e., 2.1, 2.a, II.A, etc.) numbering of list items and headings—although there is some support (IIUC) for numbering of list items in HTML, it does not include numbering as described, and (AFAIK) there is no support for numbering of headings (I've done a little but not much research trying to determine that), hence, both would require "logic" in the wiki engine rather than a simple translation to HTML to accomplish—hmm, it seems Usemod has accomplished this, at least for headings—I'll have to look into how they've done it.
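Hierarchical numbering does need some state in the engine rather than a one-to-one translation to HTML. A Python sketch, assuming headings arrive as a list of nesting levels: a stack of counters is reset when leaving a level, and a per-depth formatter produces styles such as 2.1, 2.a, or II.A. The function names are illustrative only.

```python
from typing import Callable, List, Sequence

Formatter = Callable[[int], str]


def roman(n: int) -> str:
    """1 -> 'I', 4 -> 'IV', 2021 -> 'MMXXI'."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def upper_letter(n: int) -> str:
    return chr(ord("A") + n - 1)  # 1 -> 'A'; only valid up to 26


def lower_letter(n: int) -> str:
    return chr(ord("a") + n - 1)  # 1 -> 'a'; only valid up to 26


def number_headings(levels: Sequence[int],
                    styles: Sequence[Formatter] = (str,)) -> List[str]:
    """Label headings given their nesting levels (1 = top level).

    styles[d] formats the counter at depth d; the last style is reused for
    deeper levels. With the default arabic style, a skipped level shows as 0.
    """
    counters: List[int] = []
    labels = []
    for level in levels:
        del counters[level:]          # leaving a subsection resets deeper counters
        while len(counters) < level:  # entering a deeper level starts at 0
            counters.append(0)
        counters[level - 1] += 1
        parts = [styles[min(d, len(styles) - 1)](c) for d, c in enumerate(counters)]
        labels.append(".".join(parts))
    return labels
```

Because the counters live outside any single list, the same mechanism could keep numbering continuous across interrupted list items, which is what explicit start and end list markers would make possible.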

Asides:

  • Part of what I want in a wiki is an easy means to "publish" nicely formatted stuff without knowing/learning HTML. In addition, make it easy for others to [modify | add to] my work (therefore the wiki).
  • I'm thinking about writing a TWiki-like thing in Ruby, and considering how to do the parsing. I think I want it to be fairly fast (I want the wiki to be fairly fast; I'm not sure how significant the markup parsing time is in the overall scheme of things). Since I'm thinking about this, I'm (re)considering some of the TWiki markup with the thought of at least adding some that hasn't been covered so far (some of that is implied in the items above). If I really want fast parsing, I'm considering writing something in C to work character by character, and then "wrap" that for Ruby (I will have to learn C first ;-)
  • one of the (fairly oddball) ideas I'm considering is to "render" < and > as & lt; and & gt; by default, and (as a consequence) require "real" HTML tags to be marked up with HTML entities (& lt;, & gt;). Although I plan to allow HTML (probably filtering out some things, though, like script tags), I expect to use it rarely. (It sounds crazy perhaps, but can be done: I can write <, the TWiki thing can translate that to & lt; so the HTML will render it as <. Similarly, if I write & lt; for an HTML tag, the TWiki thing can translate that to < so it will be recognized (by the browser) as an HTML tag.)
  • I hope this renders reasonably well, as this is not my "native" wiki language ;-)

-- RandyKramer
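The entity swap Randy describes can be expressed in four string passes. A Python sketch; the placeholder characters and the script-stripping regex are assumptions, and a regex is no substitute for a proper HTML sanitizer.

```python
import re

# Private-use code points, assumed never to appear in page text.
_LT, _GT = "\ue000", "\ue001"


def swap_angle_markup(source: str) -> str:
    """Bare < and > are literal text; typed &lt; / &gt; delimit real HTML tags."""
    # 1. Protect the writer's explicit entities: they mark *real* tags.
    text = source.replace("&lt;", _LT).replace("&gt;", _GT)
    # 2. Every remaining bare bracket is a literal character: escape it.
    #    Other entities such as &amp; pass through unchanged.
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # 3. Turn the protected entities into real tag delimiters.
    text = text.replace(_LT, "<").replace(_GT, ">")
    # 4. Crude stand-in for the planned filtering: drop script elements.
    return re.sub(r"<script\b.*?</script\s*>", "", text, flags=re.I | re.S)
```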

Formation of an IETF WorkingGroup

See also WikiMarkupStandardWorkingGroup for the mailing list supplementing this discussion.

Proposal for page refactoring

  • Follow WikiSyntax, move syntax examples to their own pages. Build a little lexicon of WikiSyntax.
  • Clean up those pages.
    • First, list known examples in as simple and clean fashion as possible.
    • Next, refactor the theory for the examples into a solid design argument.
  • Use these new syntax nuggets to inform a recommended WikiParserModel.
  • Develop from the WikiParserModel a WikiMarkupStandard (unlikely to ever happen), a WikiInterchangeFormat, and/or a standard theory of parsing that could be used to develop APIs for such things as a widely compatible WysiwygWiki widget.

This is a solid BarnRaisingNomination. Start with low-hanging fruit until the job is done. -- SunirShah


Dissenting Opinion(s)

Moved to WikiMarkupStandardIsMisguided.

When did DisagreeByDeletion become normal behaviour on MeatBall? -- MichaelSamuels
What deletion are you referring to? If you are referring to discussion being moved to a separate page, then I did that as an effort to reduce the size of this page. Since this page's topic is on how best to implement a wiki markup standard and not why a wiki markup standard may not be the best idea, it made sense to move that discussion to its own page. Perhaps the title I chose for this page offended you and for that I apologize but I could not think of a better one at the time. -- IanBollinger
I didn't understand the context of the edit (indeed I didn't see the original edit), and as a result it looked like DisagreeByDeletion. I have no objection to shrinking things (refactoring GOOD :); I obviously disagree with the level of the change, but I'm not about to undo your work :). Sunir let me know that the discussion had picked up again recently (which is also good :) and I'm currently thinking of how best to put forward my thoughts on the current efforts in a positive way. Noticing the deletion just made me rather surprised, that's all. I probably could've phrased the question better - apologies!
As for the title, I think it's fairly accurate, though I would probably have said WikiMarkupStandardSyntaxIsMisguided is more accurate. shrug :) -- MichaelSamuels

Practical Considerations for Wiki Programmers

I've implemented several WikiParsers and also created several levels of complexity of WikiMarkup for various CMS. I'm currently involved in creating a simplified, basic parser for the Wikipedia database.

Personally (from a programmer's/practical POV) I'd proceed as follows. First, create an ancillary markup around the final html output, such as enclosing the rendered html in a dedicated wrapper element (...). This would be the first step toward letting others extract the html result from a rendered WikiPage without choking on other page elements.

Second, create reverse parsers for each wiki that turn the html back into the respective wiki markup. At this stage you might find some irreversible/ambiguous markup which needs to be changed/disambiguated.
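A reverse parser of this kind can start from Python's standard `html.parser.HTMLParser`. The sketch below assumes a hypothetical target markup (`'''bold'''`, `''italic''`, `* ` list items); tags it cannot map are kept as raw html so that unsupported markup survives the trip.

```python
from html.parser import HTMLParser

# Hypothetical target markup for one particular wiki.
INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}


class ReverseParser(HTMLParser):
    """Turn rendered HTML back into (hypothetical) wiki markup."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "li":
            self.out.append("* ")
        elif tag not in ("p", "ul"):
            # No wiki equivalent: keep the original tag text verbatim.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("li", "ul"):
            self.out.append("\n")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def html_to_wiki(html: str) -> str:
    parser = ReverseParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Feeding it `<b><i>x</i></b>` yields `'''''x'''''`, exactly the kind of ambiguous markup such a round trip is meant to expose.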

At this stage you will have a basic interchange format. After all, html exists as a well(?!) defined standard, and most wiki engines can already convert wiki markup to html.

Another advantage of using the html as interchange format is that any wiki markup that might be supported by one wiki and not by another will become portable.

Many wikis have some form of support for basic html syntax, so supporting the html versions of otherwise unsupported wiki markup is trivial. For example, if one wiki supports underline syntax by turning "_underlined words_" into "<u>underlined words</u>" and the next wiki does not recognise underline markup, it will simply keep the html syntax and thus preserve the document.

At the same time, any security concerns, such as otherwise unsupported html entering the wiki's document space, can be allayed by treating any remaining, non-converted html the same as html entered by a user. This might, for example, strip the tags from my example and just leave "underlined words" in the final wiki text on the target wiki.
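Treating leftover html like user-entered html amounts to a tag whitelist. A Python sketch with `html.parser`; the allowed set `{"b", "i"}` is an assumption, and attributes are dropped outright rather than checked.

```python
from html import escape
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Keep whitelisted tags (without attributes); keep only the text of the rest."""

    def __init__(self, allowed) -> None:
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out: list = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes deliberately discarded

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # text stays text


def filter_html(fragment: str, allowed=frozenset({"b", "i"})) -> str:
    f = TagFilter(allowed)
    f.feed(fragment)
    f.close()
    return "".join(f.out)
```

With this whitelist, the underline example reduces to its plain text, while an allowed tag loses any attributes such as event handlers.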

At this stage we can start investigating the spread and breadth of markup. For example using sample bodies of wiki pages and various engines it will be trivial to create statistics about the spread of particular markup and also to see which important markup is split into majorities among the compared wikis.
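Such statistics can begin as a simple count of how many sample pages use each feature. A Python sketch; the regular expressions are hypothetical detectors for one apostrophe-style wiki, and each engine compared would need its own table.

```python
import re
from typing import Dict, List

# Hypothetical detectors for one wiki's markup; each engine needs its own table.
FEATURES = {
    "bold": re.compile(r"'''.+?'''"),
    "bullet_list": re.compile(r"^\* ", re.M),
    "heading": re.compile(r"^=+.+=+\s*$", re.M),
    "raw_html": re.compile(r"<[a-zA-Z][^>]*>"),
}


def feature_usage(pages: List[str]) -> Dict[str, float]:
    """Fraction of sample pages in which each feature appears at least once."""
    if not pages:
        return {name: 0.0 for name in FEATURES}
    return {name: sum(1 for page in pages if rx.search(page)) / len(pages)
            for name, rx in FEATURES.items()}
```

Running `feature_usage` once per engine's corpus and comparing the resulting dictionaries shows which features are near-universal and which are split between engines.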

This will show a practical way forward regarding the basic wiki markup set. It can also serve as a guide to those looking to implement wiki parsers for wikis or CMS systems.

I think starting to discuss a standard markup set in isolation, without a study of the practical considerations and realities, is difficult. I've always seen wiki as a practical solution to a practical problem, rather than a theoretical ideal for a narrow purpose.

I have many times rebuilt a basic wiki syntax set from my best memories of my initial experiences with C2, and have managed to create multiple related syntaxes that are incompatible with each other.

In addition, wikis may go a similar way to html. If we offer raw wiki text output, and a browser-side extension to process it (similar to RSS or FTP support), this may encourage more versatile parsers that, like html browsers, can interpret multiple wiki markups on a best-effort basis. This may not be desirable, but it might be necessary for the evolution of wiki markup. End of uncontrolled rant. -- Wiki:SvenNeumann

The WikiGateway library provides getPage and getPageHTML functions to retrieve the wiki markup and the HTML associated with the content of a given wiki page. Currently WikiGateway supports UseMod, MoinMoin, and OddMuse, but the plan is for it to eventually support a lot more -- perhaps it would be of use to you in extracting text and HTML from wiki pages.

Also, if you write Python screenscraping routines for MediaWiki, and would like to contribute them to WikiGateway, I'd be interested :)

-- BayleShanks


See also MetaWeb:Wikitext_standard http://www.metaweb.com/wiki/wiki.phtml?title=Wikitext_standard .


Another critique/proposal: WikiCoreAstStandard


From WikiSym it seems that few developers are interested in the topic of a WikiMarkupStandard, because almost everyone expects to move to WYSIWYG / an exchange format, so that markups lose most of their importance. -- HelmutLeitner