Content Management/Publishing System Problems

While reading Treating user Myopia, I remembered something I have been thinking about lately: the problems common to all Content Management Systems.
Please note that I will be talking about Web Content Publishing, which produces (as an end result) a web page represented in HTML.

Ever since the first CMS appeared, a bunch of issues have kept arising, and most of them relate to two basic questions:
  1. How should the user edit the content?
  2. How should the content provided by the user be represented as HTML?

Editing user content

The most popular options to edit the content are:
  1. Poor HTML - the user provides HTML as is.
  2. Plain text - the text is rendered as-is. Similar to 1, but the text is HTML encoded, so the page shows a true 1-to-1 match of what was typed.
  3. Plain text with formatting - the user edits plain text according to specific rules, then the text is parsed and rendered as HTML.
  4. Rich Editing (WYSIWYG) - basically user-friendly Poor HTML. The main difference is that the user does not need to know HTML itself (with all the pros and cons of that).
  5. Preprocessed HTML - a mix of Poor HTML/Rich Editing and Plain text with formatting. The edited content is in HTML format but reserves special markup that is parsed dynamically. (Think of ASP.NET, JSP or any other dynamic page generated on the server, but provided by the user.)
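
To make options 2 and 3 a bit more concrete, here is a minimal Python sketch. The rule set (`*emphasis*`, `**strong**`, blank line between paragraphs) and the function names are made up for illustration and are not taken from any particular wiki engine; a real formatter handles far more cases.

```python
import html
import re


def render_plain(text: str, keep_newlines: bool = False) -> str:
    """Option 2: HTML-encode the text so markup typed by the user stays inert."""
    encoded = html.escape(text)
    if keep_newlines:
        # Browsers collapse newlines, so turn them into explicit line breaks.
        encoded = encoded.replace("\n", "<br>\n")
    return encoded


def render_formatted(text: str) -> str:
    """Option 3: toy formatting rules (hypothetical syntax) rendered as HTML."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        encoded = html.escape(block)  # encode first, then apply the rules
        encoded = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", encoded)
        encoded = re.sub(r"\*(.+?)\*", r"<em>\1</em>", encoded)
        paragraphs.append(f"<p>{encoded}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(render_plain("<b>hi</b> & bye"))
    print(render_formatted("Hello **world**\n\nThis is *fine* <script>"))
```

Because the formatter only ever emits `<p>`, `<strong>` and `<em>`, the site keeps control over which tags can appear in the output, which is what makes standards and layout rules enforceable with this option.
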
To give you an idea of where each of them is used, along with its benefits and issues:
  1. Poor HTML - usually used in primitive/simple management systems. It is also almost always (>99%) offered as a back door when option 4 is available. Characteristics:

    1. Very customisable (anything that can be represented with HTML can be done in this mode).
    2. Requires knowledge from the users.
    3. The actual output corresponds 100% to the edited content.
    4. Easily overused (users can apply fonts/colors/text sizes etc. with no actual need).
    5. The previous point makes the CMS itself non-maintainable, as it is technically very hard to change the common layout with this approach.
    6. No way to enforce web-standards.
    7. No way to enforce common site layout rules.

  2. Plain text. This is actually part of any system (not only a CMS). We see it everywhere users' input is shown on a page: anything from Contact Us and Registration forms to a real CMS. Its characteristics:

    1. Easy to edit.
    2. No customisation points at all. The output will always be plain text.
    3. The actual output will often look different (newlines and runs of spaces are shown as such while editing plain text, but browsers mostly collapse or ignore them).
    4. Web-standards are automatically enforced.
    5. Common site layout rules are N/A.

  3. Plain text with formatting. Used on most wiki sites and often called markdown. The aim is to let the user edit content as plain text, which is then reformatted into rich (HTML) content according to the rules of the markdown. Characteristics:

    1. Requires the user to know the markdown.
    2. There are a lot of different mark-ups/downs, which steepens the overall learning curve for end users.
    3. Really describes the content and NOT how to represent the content.
    4. Has limited customisation. Only a particular set of HTML tags can be rendered as a result.
    5. Easy to enforce web-standards.
    6. Easy to enforce common site layout rules.

  4. Rich Editing (WYSIWYG). The aim is to give users the ability to edit content and see how it looks right in place: What You See Is What You Get. Used in most CMSs, blogs, forums etc., wherever there is a requirement to publish something better than just plain text. It is considered a must-have option and is the de facto standard for content publishing these days. Characteristics (some inherited from Poor HTML):

    1. Very customisable.
    2. Does not require knowledge from the users (easy to use).
    3. The actual output in most cases is > 90% similar to the edited content (which is pretty good).
    4. Easily overused (users can apply fonts/colors/text sizes etc. with no actual need).
    5. The previous point makes the CMS itself non-maintainable, as it is technically very hard to change the common layout with this approach.
    6. Issues with the editors.
    7. No way to enforce web-standards.
    8. No way to enforce common site layout rules.
    9. Copy-paste from other documents is very buggy.

  5. Preprocessed HTML/MarkDown - this can be used in systems that dynamically build online forms or are in some way application builders. The idea is to allow the user to provide content but still make it dynamic. Usually used in pretty complex applications. A typical example off the top of my head is DekiWiki. Characteristics:

    1. Requires knowledge of the markup.
    2. Customisation points are limited.
    3. Content is always mixed with dynamic behavior which leads to painful maintainability.
    4. The actual output is often totally different from the original content.
    5. No way to enforce web-standards.
    6. No way to enforce common site layout rules.
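
As a rough illustration of option 5, the sketch below treats user-provided HTML as a template and fills in reserved markup at render time. The `{{ name }}` placeholder syntax and the `render_preprocessed` function are assumptions made up for this example; they are not the syntax of DekiWiki, ASP.NET or JSP.

```python
import html
import re

# Hypothetical reserved markup: {{ name }} inside otherwise ordinary HTML.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_preprocessed(template: str, context: dict) -> str:
    """Replace each known placeholder with its HTML-encoded value."""

    def substitute(match):
        name = match.group(1)
        if name not in context:
            # Leave unknown placeholders visible so the author can spot them.
            return match.group(0)
        return html.escape(str(context[name]))

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    page = "<p>Hello, {{ user }}! You have {{ count }} new messages.</p>"
    print(render_preprocessed(page, {"user": "Ann & Bob", "count": 3}))
```

Even in this tiny form the content and the dynamic behaviour live in the same string, and what the author typed is no longer what the visitor sees, which is where the maintainability pain comes from.
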
Probably, the Rich Editor is the golden mean here. But things go wrong when people try to copy-paste very richly formatted content from, let's say, MS Word. Hardly any WYSIWYG editor can handle that, and the resulting output will be awful. Forget about web standards...
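
One common way to soften the copy-paste problem is to run pasted markup through an allowlist before storing it. The sketch below uses Python's standard `html.parser`; the allowlist contents and the names `PasteCleaner` and `clean_pasted_html` are assumptions for illustration. It is a clean-up pass only, not a security filter (for instance, `href` values are not validated).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: which tags and attributes survive a paste.
ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SKIP_CONTENT = {"style", "script"}  # drop these tags together with their content


class PasteCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS:
            allowed = ALLOWED_ATTRS.get(tag, ())
            kept = "".join(
                f' {name}="{escape(value)}"'
                for name, value in attrs
                if name in allowed and value is not None
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_pasted_html(markup: str) -> str:
    """Keep allowlisted tags, drop everything else, keep the visible text."""
    cleaner = PasteCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    pasted = '<p class="MsoNormal" style="margin:0"><span style="color:red">Hello <b>world</b></span></p>'
    print(clean_pasted_html(pasted))
```

The trade-off is the same one discussed above: the stricter the allowlist, the more of the user's formatting is silently lost, so the user still has to review the result.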

The point I want to make here is that no single option fits all possible solutions. If you create a system for geeks, use markdown; for normal users, use a Rich Editor and be prepared to review its content.

But generally, I feel there is a lack of science behind this. We need to analyse this area to provide a good solution for both users and developers, so they can live happy lives and not bother writing posts like Treating user Myopia.

For any ideas on how to solve this publishing issue I promise beer.