Preprocessing

Contact

About Preprocessing Tools

This short text explains the architecture and background of the tools in this project:

multi-language authoring
live data embedder

Architecture

Here is the basic architecture: how these tools are combined and connected to carry out authoring and publishing work.

A text passes through three stages. In reverse order: the final output is HTML (or possibly PDF); an intermediate Markdown (or other wikitext) source is converted to that final output; and the original source text contains specially designed markings, which one or more authoring tools process to produce pure Markdown text. I call this original manuscript PREPRO-marked text.

Preprocessing prior to the Markdown-to-HTML conversion is done by a collection of Perl scripts. They are connected by UNIX pipes to form a processing chain. The last link in the chain is a Markdown-to-HTML converter (I currently use Pandoc for this purpose).

This processing chain can be run from a UNIX command line, usually with a shell script that pipes the tools together.

$ ./Selector.pm --select ja < README.mdp | ./Draw.pm | pandoc ... > README.html
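The same chaining idea can be sketched outside the shell. The project's tools are Perl scripts; the Python sketch below only illustrates how a list of filter commands can be run so that each one's output becomes the next one's input. The name `run_chain` and the two stand-in stages are assumptions made for this illustration and are not part of the project. Unlike a real pipe, the sketch runs the stages one after another rather than concurrently, which gives the same result for simple filters.

```python
import subprocess
import sys
from typing import Sequence


def run_chain(commands: Sequence[Sequence[str]], data: bytes) -> bytes:
    """Feed `data` through each command in turn, as `a | b | c` would."""
    for command in commands:
        # check=True raises CalledProcessError if a stage exits non-zero.
        completed = subprocess.run(
            command, input=data, stdout=subprocess.PIPE, check=True
        )
        data = completed.stdout
    return data


# Stand-in stages (hypothetical, not the project's tools): each is a tiny
# filter run as a separate process that reads stdin and writes stdout.
UPPERCASE = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
EXCLAIM = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().replace('.', '!'))",
]

if __name__ == "__main__":
    result = run_chain([UPPERCASE, EXCLAIM], b"hello.\n")
    print(result.decode(), end="")
```

With the real tools, each entry in the list would be one stage of the command line above, for example `["./Selector.pm", "--select", "ja"]` followed by `["./Draw.pm"]` and the converter.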

In addition, the tools can be incorporated into an Apache web server to produce content dynamically. A PREPRO-marked text can be processed and converted to HTML on the fly when the server receives a browser request. A CGI shell script starts the Perl tools and the converter and connects them.
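For readers unfamiliar with CGI, the following Python sketch shows the general shape of such a handler: locate the requested source, convert it, and write an HTTP header block followed by the HTML. It is not the project's CGI script, which is a shell script. The function names and the toy converter are hypothetical, and the use of the standard CGI variable `PATH_TRANSLATED` to find the source file is an assumption about the server setup.

```python
import html
import os
import sys
from typing import Callable, Mapping, Optional

HTML_HEADER = "Content-Type: text/html; charset=utf-8\r\n\r\n"
NOT_FOUND = (
    "Status: 404 Not Found\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n\r\n"
    "Not found\n"
)


def read_source(environ: Mapping[str, str]) -> Optional[str]:
    """Return the requested source text, or None if there is no such file."""
    # Assumption: the server passes the file path in PATH_TRANSLATED.
    path = environ.get("PATH_TRANSLATED", "")
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def build_response(source: Optional[str], convert: Callable[[str], str]) -> str:
    """Build the complete CGI response: headers, a blank line, then the body."""
    if source is None:
        return NOT_FOUND
    return HTML_HEADER + convert(source)


def toy_convert(text: str) -> str:
    """Hypothetical stand-in for the preprocessing chain and the converter."""
    lines = (line for line in text.splitlines() if line.strip())
    return "".join(f"<p>{html.escape(line)}</p>\n" for line in lines)


if __name__ == "__main__":
    sys.stdout.write(build_response(read_source(os.environ), toy_convert))
```

In a real deployment, `toy_convert` would be replaced by a call that runs the preprocessing tools and the Markdown-to-HTML converter on the source text.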

Background

The basic idea behind this authoring project is to preprocess a source text so that the result can be passed to downstream tools such as a Markdown-to-HTML converter.

Preprocessing of text or program code goes by many names. The old UNIX tool 'm4' is called a macro processor and is used to replace words in a text file. The C language preprocessor replaces names with constants and selects code blocks. Web engineers use tools called template engines to replace words in an HTML page, such as JavaServer Pages for Java, PHP, and React for JavaScript, to name a few.
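As a minimal illustration of the word replacement these tools have in common, the sketch below uses Python's standard `string.Template`. The function name `expand_macros` and the sample placeholders are made up for this example; m4, the C preprocessor, and template engines each have their own syntax and many more features.

```python
from string import Template
from typing import Mapping


def expand_macros(text: str, macros: Mapping[str, str]) -> str:
    """Replace $name and ${name} placeholders with values from `macros`.

    Names that have no definition are left in the text unchanged.
    """
    return Template(text).safe_substitute(macros)


if __name__ == "__main__":
    source = "Built with $tool on ${date}; $unknown stays."
    print(expand_macros(source, {"tool": "Pandoc", "date": "2020-05-21"}))
```

Leaving undefined names untouched lets text pass through several such stages, each replacing only the names it knows.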

With preprocessing, you can freely add small but convenient features to a source text as long as the additions do not break wikitext or HTML syntax rules. You can still use a popular wikitext converter and a stable web server such as Apache without any compromise.

Two kinds of preprocessing exist: passage- or block-level markings operate on one or more lines, while inline or span-level markings operate on phrases or words within a line. For example, the Selector handles block-level text, while the Live Data Embedder handles inline macros (variables and actions).
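The difference between the two levels can be sketched in a few lines of Python. The marking syntax here (`@@tag` ... `@@end` for blocks and `{{name}}` for inline variables) is invented for this illustration and is not the syntax of the Selector or the Live Data Embedder. The function names are likewise hypothetical.

```python
import re
from typing import Mapping

# Hypothetical block marking: a line "@@ja" opens a block tagged "ja",
# and a line "@@end" closes it.
BLOCK_OPEN = re.compile(r"^@@(\w+)\s*$")
BLOCK_CLOSE = "@@end"

# Hypothetical inline marking: "{{name}}" inside a line.
INLINE = re.compile(r"\{\{(\w+)\}\}")


def select_blocks(text: str, wanted: str) -> str:
    """Block level: keep untagged lines and lines in blocks tagged `wanted`."""
    kept = []
    current = None  # tag of the block we are inside, or None
    for line in text.splitlines():
        if line.strip() == BLOCK_CLOSE:
            current = None
            continue
        opened = BLOCK_OPEN.match(line)
        if opened:
            current = opened.group(1)
            continue
        if current is None or current == wanted:
            kept.append(line)
    return "\n".join(kept) + "\n"


def expand_inline(text: str, values: Mapping[str, str]) -> str:
    """Inline level: replace {{name}} with its value; unknown names stay."""
    return INLINE.sub(lambda m: values.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    source = (
        "Title\n"
        "@@en\nHello, {{name}}.\n@@end\n"
        "@@ja\nKonnichiwa, {{name}}.\n@@end\n"
        "Footer\n"
    )
    print(expand_inline(select_blocks(source, "ja"), {"name": "Kobu"}), end="")
```

The block-level pass decides which whole lines survive, and the inline pass then rewrites pieces inside the surviving lines, so the two compose naturally as consecutive stages of a chain.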

A good thing about preprocessing is that it is personal, or private. A source text, before preprocessing is applied, does not have to be public and does not need to comply with any well-known standard or specification. You can freely design the input format as long as you have a tool to convert it to a legitimate final output format such as Markdown or HTML.

I am developing these preprocessing tools in order to gain freedom in writing style. I want to write the way I like. I would be glad if any of my tools suit your needs and help you write more easily and enjoyably.

Presented by Kobu.Com

Written 2020-May-21
Updated 2020-Oct-11 minor changes
