Using TinyMCE for WYSIWYG editing

I will cover the architecture of my blog engine in another article, but one feature that I wanted to add was some form of WYSIWYG editor for two main uses: (i) so that end-users can add rich-text comments to any article, and (ii) to allow authors to make quick WYSIWYG edits to articles.

The existing architecture allowed authors to use an FTP-mapped directory to edit articles. In my case I use an Ubuntu laptop as my personal workstation, and Ubuntu supports FTP directory mapping as standard through the GNU virtual filesystem, so I have a simple script where typing article 24, say, will open an OpenOffice.org HTML editor on article 24 so that I can use OOo’s rich edit features to author an article. Saving this will automatically update the master copy in the blog database. I find this to be a very effective method of doing bulk authoring. However, in cases where I want to make a quick change (say, when I spot a typo), I would really like to have an alternative one-click method of editing an article without needing to depend on FTP or OOo.

There are a number of good open-source HTML-based WYSIWYG editors available, such as FCKeditor and TinyMCE, which offer such functionality. In this article, I describe how I have integrated TinyMCE into my blog engine to add these features.

The TinyMCE engine is a complex client-side JavaScript application which makes heavy use of the dynamic HTML support in modern browsers to allow end-users to edit rich-text content. #11 – Fullpage example shows one use of it and its integration into a web application. I don’t want to regurgitate the material that Moxiecode Systems AB have provided on their website, but rather to cover some of the specific issues that I had to address in doing this integration.

Customising the engine to my application

The engine is reasonably well documented through its wiki, and customising was a process of trial and error, plus trawling through the documentation. The editor comes in two variants: simple and advanced, but I decided that the simple version wasn’t up to either of my needs, so I went with the advanced. I then had to:

Decide which standard plugins I needed. In the case of the comment editor, these were safari, emotions, inlinepopups, preview and searchreplace;
Allocate the buttons that I wished to enable to the theme_advanced_buttons1 button bar;
Add other trim, such as the allowed block formats and font sizes.

I then tested the configuration script in a variant of one of the Moxiecode-supplied examples. Once this was working, I could then optimise it for both user responsiveness and server load.

Optimising Javascript caching

As discussed in #13 – Load on demand using compressor, the editor engine supports lazy loading of the JavaScript, but I decided to use a simple variant, which is to default to no comment input (as a significant number of article views do not result in a comment). If the reader wishes to make a comment, then clicking the Post a comment URL will result in the editor being downloaded and enabled in the reader’s browser. This includes the following lines in the page HTML (to enable comment submission):

<script type="text/javascript" src="includes/tinymce/tiny_mce.js"></script> 
<script type="text/javascript" src="includes/tinymce/tiny_mce_comment_bootstrap.js"></script>

The first is the bulk (some 105 KB) of the compressed JavaScript code. The second is a small (3 KB) file which includes the tinyMCE_GZ client-side dynamic loader as a preamble, followed by the configuration for the comments editor, as given in the next section.

This stub determines the load-list of extra TinyMCE components needed to run this edit configuration and uses an AJAX-style XMLHttpRequest to a server-side loader, tiny_mce_gzip.php, which assembles, compresses and caches these in a single JavaScript download. I had a lot of problems getting the Moxiecode-supplied version of the server-side PHP loader to work. It had quite a few bugs in it, and the TinyMCE forum comments seemed to indicate that it is rarely used, so in the end I found it simpler just to rewrite it in current PHP 5 style (it’s only 120 lines long).

Because of my Apache / .htaccess configuration (see Using .htaccess files on a Webfusion shared service), these scripts are normally held in the browser cache, so they run without additional network delays, and only when the user has requested an edit. Even in the case of a cache miss, the editor is downloaded as three compressed JavaScript files totalling some 160 KB, so editing is still responsive.
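The pages described above appear to include the two script tags in the page HTML once a comment has been requested. A purely client-side variant of the same click-to-load idea could be sketched as follows. This is only an illustration under stated assumptions: the helper names loadScriptsInOrder and makeCommentLoader are hypothetical, the document object is passed in as a parameter so that the code can be exercised outside a browser, and the sketch relies on the script onload event, which older versions of Internet Explorer do not fire.

```javascript
// Hypothetical helper: appends one <script> element per URL, waiting for each
// to load before adding the next, then calls done().
// `doc` is the document object (passed in so the sketch can run without a browser).
function loadScriptsInOrder(doc, urls, done) {
  var head = doc.getElementsByTagName("head")[0];
  var index = 0;
  function next() {
    if (index >= urls.length) {
      if (done) done();
      return;
    }
    var script = doc.createElement("script");
    script.type = "text/javascript";
    script.src = urls[index++];
    script.onload = next; // assumption: the browser fires onload for scripts
    head.appendChild(script);
  }
  next();
}

// Hypothetical helper: returns a click handler that loads the editor scripts
// at most once, however many times the link is clicked.
function makeCommentLoader(doc, urls, onReady) {
  var requested = false;
  return function () {
    if (requested) return false;
    requested = true;
    loadScriptsInOrder(doc, urls, onReady);
    return false; // suppress the default link navigation
  };
}
```

In a page, the handler returned by makeCommentLoader would be attached to the Post a comment link and given the two script paths shown earlier, so that readers who never comment never download the editor.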

WYSIWYG comment generation

Here is the customised part of tiny_mce_comment_bootstrap.js as at the date of this article. (The valid elements list needs a bit more trimming.)

var themeList = "advanced";
var pluginList = "safari,emotions,inlinepopups,preview,searchreplace";
function setup() {
 // Fetch the theme, plugins and language pack in one compressed download,
 // then initialise the editor once they have arrived.
 tinyMCE_GZ.init(
  { themes : themeList, plugins : pluginList, languages : "en", disk_cache : true },
  function() {
   tinyMCE.init({
    mode : "textareas",
    theme : themeList,
    plugins : pluginList,
    // A single button bar; the other three are explicitly emptied.
    theme_advanced_buttons1 : "bold,italic,underline,strikethrough,sub,sup,teletype,charmap,emotions,|,hr,bullist,numlist,|," +
     "cut,copy,paste,undo,redo,pastetext,|,replace,|,link,|,formatselect,fontsizeselect,|,preview",
    theme_advanced_buttons2 : "",
    theme_advanced_buttons3 : "",
    theme_advanced_buttons4 : "",
    theme_advanced_toolbar_location : "bottom",
    theme_advanced_toolbar_align : "left",
    theme_advanced_font_sizes : "1,2,3,4",
    theme_advanced_blockformats : "p,pre,h3,h4,blockquote",
    // Whitelist of elements and attributes that the editor will emit.
    valid_elements : "@[id|class|style|title],a[type|name|href|target],b/strong,i/em,strike,u,#p,-ol[type|compact],"+
     "-ul[type|compact],-li,br,-sub,-sup,-div,-span,-pre,-h3,-h4,hr,-font[face|size],q[cite],"+
     "tt,small,big,img[align<bottom?left?middle?right?top|alt|border|height|src|style|title|width]",
    // No trailing comma after the last property: older Internet Explorer rejects it.
    content_css : "themes/terry/tinymce_content.css"
   });
  });
}
setup();

The only wrinkles were:

I use the htmlspecialchars function on any text returned to the client <textarea> to prevent insertion attacks from malicious clients.
As I discuss below, I can’t trust that the end-user won’t disable javascript and bypass the editor to submit hostile HTML, so I post-process any submitted HTML to limit markup tags and attributes to a safe subset.
I allow users to add subheadings to comments but since each comment is at <h3>, these are limited to <h4> and <h5>.

WYSIWYG article generation

I already constrain my article content to use the site CSS and thus have a consistent look and feel. Therefore, by convention, I don’t make use of in-line styles. (There are a few exceptions: for example, you need to use inline styles or classes to implement underlining in strict XHTML 1.0 or later.) This means that my article editor has a very similar configuration to the comment configuration discussed above. However, as blog authors are trusted, I have included such features as HTML view/editing, full-screen editing and single-click save. I won’t bother to include the full configuration, but here are the two button bars that I use in the article editor:

theme_advanced_buttons1 : "bold,italic,underline,strikethrough,teletype,sub,sup,charmap,emotions,"+
  "|,justifyleft,justifycenter,|,hr,bullist,numlist,|,cut,copy,paste,pastetext,pasteword,visualchars,"+
  "|,link,unlink,anchor,image,cleanup,help,code,|,fullscreen,|,save",
theme_advanced_buttons2 : "undo,redo,|,search,replace,|,tablecontrols,link,|,formatselect",

Note that I don’t even have a fontsizeselect drop-down.

The one slight problem that I had is that I’ve picked up a common Wikipedia practice, which is to use the teletype (<tt>) tag to mark fixed-font inline text (such as this), and TinyMCE does not provide a button for this as standard. I have added this as a quick code mod of a dozen or so new lines of code (as, from the editor viewpoint, this is simply another stateful matched tag, such as <b> and <sup>). However, I do need to go back and do this as a proper custom plugin when I have time.

Using pure CSS – removing in-line style attributes

I have to scrub the returned HTML for two separate reasons:

The OOo webpage editor generates HTML 4.01 with some OOo-specific layout characteristics which are valid in HTML 4.01 but are either not compliant under XHTML 1.0 strict or include attributes which are redundant with my CSS. These include:
- Tags are upper case.
- The HTML is “prettified” by adding CRs and indentation whitespace.
- The <TT> tag includes an attribute CLASS="western".
The TinyMCE editor generates XHTML 1.0 strict output, but this again includes attributes which are redundant with my CSS. It also preserves any indentation whitespace but removes the CRs, and as a result it outputs XHTML which contains “random” redundant whitespace. Also, as I mentioned above, I can’t trust that users haven’t disabled their client-side editors and aren’t returning malicious HTML (for example, javascript attributes).

So I post-process any returned HTML. I use a common routine for both comments and articles, though this is data driven to permit the small rule variations for article versus comment content. This code:

- Replaces all whitespace (other than inside <pre> tags) with a single space or CR, depending on line length.
- Converts all tags to lowercase, and closes tags such as <BR>, giving <br/> etc.
- Closes <LI> tags, giving <li> … </li> pairs.
- Removes unwanted tags and attributes.
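Three of these steps can be illustrated with a short sketch. The scrubber described in this article is written in PHP and is not reproduced here; the JavaScript below is only an assumption-laden illustration of the same idea. The names ALLOWED, VOID_TAGS, cleanTag and scrubHtml, and the particular whitelist, are all hypothetical. It collapses whitespace to single spaces only (no line-length handling), does not add missing </li> tags, keeps only double-quoted attributes, and is not a hardened sanitiser.

```javascript
// Hypothetical whitelist: tag name -> attribute names that may be kept.
var ALLOWED = {
  p: [], br: [], hr: [], b: [], i: [], tt: [], pre: [],
  ul: [], ol: [], li: [], h4: [], h5: [],
  a: ["name", "type", "href"]
};
// Tags that have no content and are written in the <br/> form.
var VOID_TAGS = { br: true, hr: true };

// Rewrites a single tag: lowercases the name, keeps only whitelisted
// attributes, and returns "" for tags that are not on the whitelist.
function cleanTag(match, slash, name, attrText) {
  var tag = name.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ALLOWED, tag)) return "";
  if (slash) return VOID_TAGS[tag] ? "" : "</" + tag + ">";
  var out = "<" + tag;
  var attrRe = /([a-zA-Z-]+)\s*=\s*"([^"]*)"/g; // double-quoted values only
  var m;
  while ((m = attrRe.exec(attrText)) !== null) {
    var attr = m[1].toLowerCase();
    if (ALLOWED[tag].indexOf(attr) === -1) continue;
    // Drop script URLs; a real filter would need a stricter scheme check.
    if (attr === "href" && /^\s*javascript:/i.test(m[2])) continue;
    out += " " + attr + '="' + m[2] + '"';
  }
  return out + (VOID_TAGS[tag] ? "/>" : ">");
}

function scrubHtml(html) {
  // Splitting with a capturing group puts each <pre>...</pre> block at an odd
  // index, so its internal whitespace can be left untouched.
  var parts = html.split(/(<pre\b[^>]*>[\s\S]*?<\/pre>)/i);
  for (var i = 0; i < parts.length; i++) {
    if (i % 2 === 0) parts[i] = parts[i].replace(/\s+/g, " ");
    parts[i] = parts[i].replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/g, cleanTag);
  }
  return parts.join("");
}
```

Note that a disallowed tag is removed but the text between its opening and closing tags is kept, so stripped markup degrades to plain text rather than disappearing.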

My goal is to produce XHTML 1.0 strict output that will be rendered by my CSS as I want and that will be subsequently editable both within OOo and TinyMCE.

One side effect of modifying the TinyMCE code was that I liked its technique of building data-driven vectors of lambda functions to do the HTML parsing, so I rewrote my HTML parser to do likewise. (I can’t use the nice PHP library support as this isn’t available on Webfusion shared services, so it’s a case of “roll-your-own”.) The generator uses create_function to bind each generated decode routine to a dispatch table. It works well, even if the PHP 5.2 implementation is a little tacky. So in the case of a hyperlink tag, the data string “a:name:type:href,” executes $tagParse['a'] = create_function( '$endTag,$attr', $code) where the generated $code is:

if( $endTag ) { $newTag = '</a>'; } else {  $newTag = '<a';
if( isset( $attr['name'] ) ) $newTag .= " name=\"$attr[name]\""; 
if( isset( $attr['type'] ) ) $newTag .= " type=\"$attr[type]\""; 
if( isset( $attr['href'] ) ) $newTag .= " href=\"$attr[href]\""; 
$newTag .= ">"; } 
return $newTag;
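The same data-driven dispatch table can be sketched in JavaScript, where closures take the place of source strings passed to create_function. This is an illustration only and not the PHP generator described above: buildTagParsers is a hypothetical name, the spec format is assumed to be comma-separated entries of the form tag:attr:attr, and, like the generated PHP, it assumes that attribute values have already been escaped.

```javascript
// Builds a dispatch table from a spec such as "a:name:type:href,b,i".
// Each table entry is a closure that renders the cleaned-up tag.
function buildTagParsers(spec) {
  var table = {};
  spec.split(",").forEach(function (entry) {
    if (entry === "") return; // tolerate a trailing comma in the spec
    var fields = entry.split(":");
    var tag = fields[0];
    var allowed = fields.slice(1); // attribute names permitted on this tag
    table[tag] = function (endTag, attr) {
      if (endTag) return "</" + tag + ">";
      var newTag = "<" + tag;
      allowed.forEach(function (name) {
        // Only whitelisted attributes that are actually present are emitted.
        if (Object.prototype.hasOwnProperty.call(attr, name)) {
          newTag += " " + name + '="' + attr[name] + '"';
        }
      });
      return newTag + ">";
    };
  });
  return table;
}

// Usage: the onclick attribute is not in the spec, so it is dropped.
var tagParse = buildTagParsers("a:name:type:href,");
var openTag = tagParse.a(false, { href: "index.html", onclick: "evil()" });
var closeTag = tagParse.a(true, {});
// openTag is '<a href="index.html">' and closeTag is '</a>'
```

Because the whitelist lives in the spec string, supporting another tag or attribute is a data change rather than a code change, which is the appeal of the technique.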
