Editoria Typescript transforms HTML into the format required by the Coko Foundation’s Wax WYSIWYG word processor for Ketida (previously named Editoria). Although Wax was built specifically for book editing and publication, that is by no means its only possible application, and it could be repurposed. Similar transformation chains could be implemented to target other formats.
Editoria Typescript translates the document structure, inline and class formatting, endnotes and footnotes into a subset of near-HTML, while eliminating HTML attributes not used by Wax.
The Editoria Typescript steps should be run in the following order:
p-split-around-br.xsl
editoria-basic.xsl
editoria-reduce.xsl
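One way to chain the three steps is to run an XSLT processor once per stylesheet and feed each result into the next step. The sketch below only builds the command lines and does not execute them. It assumes Saxon HE is used as the processor; the jar path, the file names, the `xsl_dir` folder, and the intermediate-file naming are all hypothetical and not part of Editoria Typescript.

```python
# Step order as listed above.
STEPS = ["p-split-around-br.xsl", "editoria-basic.xsl", "editoria-reduce.xsl"]


def build_commands(source, result, xsl_dir=".", saxon_jar="saxon-he.jar"):
    """Return one Saxon command line (as an argv list) per step.

    Each step reads the previous step's output; only the last step
    writes to `result`. Intermediate file names are made up here.
    """
    commands = []
    current = source
    for number, step in enumerate(STEPS, start=1):
        is_last = number == len(STEPS)
        target = result if is_last else f"{result}.step{number}.tmp"
        commands.append([
            "java", "-jar", saxon_jar,
            f"-s:{current}",            # source document
            f"-xsl:{xsl_dir}/{step}",   # stylesheet for this step
            f"-o:{target}",             # output document
        ])
        current = target
    return commands


# Print the commands instead of running them (hypothetical file names).
for argv in build_commands("extracted.html", "wax-ready.html"):
    print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` once the paths point at real files.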
p-split-around-br.xsl
It is possible to specify line breaks within paragraphs in Word (<w:br/>, which are extracted as XHTML <br class="br" /> tags). As Wax does not support <br>s, this step simply divides paragraphs on breaks, removing the break and creating two separate <p> elements instead.
<p style="font-family: Times New Roman; text-indent: 36pt">
Kṛṣṇadevarāya discusses this practice in the following verse:
<br class="br"/>
Make trustworthy Brahmins
<br class="br"/>
The commanders of your forts
</p>
becomes
<p>Kṛṣṇadevarāya discusses this practice in the following verse:</p>
<p>Make trustworthy Brahmins</p>
<p>The commanders of your forts</p>
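The splitting logic can be illustrated outside XSLT. The following is a minimal Python sketch of the idea, not the stylesheet itself: it walks the children of a paragraph and starts a new <p> at every <br>. The function names are invented for this illustration, and the new paragraphs carry no attributes only because the example output above shows bare <p> elements; the real stylesheet may behave differently.

```python
import copy
import xml.etree.ElementTree as ET


def _trim(piece):
    """Strip whitespace at the outer edges of a paragraph's content."""
    if len(piece) == 0:
        piece.text = (piece.text or "").strip()
    else:
        piece.text = (piece.text or "").lstrip()
        piece[-1].tail = (piece[-1].tail or "").rstrip()


def split_around_br(p):
    """Return a list of new <p> elements, one per <br>-delimited run."""
    pieces = [ET.Element("p")]
    pieces[0].text = p.text or ""
    for child in p:
        if child.tag == "br":
            # The break itself is dropped; the text after it opens a new paragraph.
            new_p = ET.Element("p")
            new_p.text = child.tail or ""
            pieces.append(new_p)
        else:
            # Other inline children (with their trailing text) stay in the current run.
            pieces[-1].append(copy.deepcopy(child))
    for piece in pieces:
        _trim(piece)
    return pieces


source = ET.fromstring(
    '<p style="font-family: Times New Roman; text-indent: 36pt">\n'
    "Kṛṣṇadevarāya discusses this practice in the following verse:\n"
    '<br class="br"/>\n'
    "Make trustworthy Brahmins\n"
    '<br class="br"/>\n'
    "The commanders of your forts\n"
    "</p>"
)
for paragraph in split_around_br(source):
    print(ET.tostring(paragraph, encoding="unicode"))
```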
editoria-basic.xsl
XSweet’s initial extraction divides the contents of the HTML document into sections: <div class="docx-content">, <div class="docx-endnotes">, and <div class="docx-footnotes">. This step rearranges the content:
<div class="docx-content"> becomes <container id="main">
<div class="docx-endnotes"> and <div class="docx-footnotes"> are merged into a single <div id="notes">
Notes and their ids are also rewritten, from:
<div class="docx-endnotes">
<div class="docx-endnote" id="en1">
<p class="EndnoteText">
<span class="EndnoteReference">
<span class="endnoteRef">1</span>
</span> endnote</p>
</div>
</div>
<div class="docx-footnotes">
<div class="docx-footnote" id="fn1">
<p class="FootnoteText">
<span class="FootnoteReference">
<span class="footnoteRef">a</span>
</span> footnote</p>
</div>
</div>
to
<div id="notes">
<note-container id="container-en1">
<p class="EndnoteText"> endnote</p>
</note-container>
<note-container id="container-fn1">
<p class="FootnoteText"> footnote</p>
</note-container>
</div>
These are then properly linked and nicely displayed in Wax. Endnotes and footnotes are combined into one sequential list.
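The note rewriting shown above can be sketched in Python as follows. This is an illustration of the restructuring only, not the XSLT: the function names are invented, and the assumption that every note sits in a <div> directly under a docx-endnotes or docx-footnotes section is taken from the example rather than from the stylesheet.

```python
import copy
import xml.etree.ElementTree as ET

NOTE_SECTIONS = ("docx-endnotes", "docx-footnotes")
REFERENCE_CLASSES = ("EndnoteReference", "FootnoteReference")


def strip_reference(p):
    """Remove note-reference spans from a paragraph, keeping the text after them."""
    for span in list(p):
        if span.tag == "span" and span.get("class") in REFERENCE_CLASSES:
            index = list(p).index(span)
            tail = span.tail or ""
            if index == 0:
                p.text = (p.text or "") + tail
            else:
                p[index - 1].tail = (p[index - 1].tail or "") + tail
            p.remove(span)


def collect_notes(body):
    """Build <div id="notes"> with endnotes and footnotes in document order."""
    notes = ET.Element("div", {"id": "notes"})
    for section in body.findall("div"):
        if section.get("class") not in NOTE_SECTIONS:
            continue
        for note in section.findall("div"):
            container = ET.SubElement(
                notes, "note-container", {"id": f"container-{note.get('id')}"}
            )
            for p in note.findall("p"):
                p = copy.deepcopy(p)  # leave the input tree untouched
                strip_reference(p)
                p.tail = None
                container.append(p)
    return notes


body = ET.fromstring(
    "<body>"
    '<div class="docx-content"><p>Text</p></div>'
    '<div class="docx-endnotes"><div class="docx-endnote" id="en1">'
    '<p class="EndnoteText"><span class="EndnoteReference">'
    '<span class="endnoteRef">1</span></span> endnote</p></div></div>'
    '<div class="docx-footnotes"><div class="docx-footnote" id="fn1">'
    '<p class="FootnoteText"><span class="FootnoteReference">'
    '<span class="footnoteRef">a</span></span> footnote</p></div></div>'
    "</body>"
)
print(ET.tostring(collect_notes(body), encoding="unicode"))
```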
editoria-basic.xsl writes some properties from CSS style attributes inline:
font-style: italic is written to inline elements, wrapped in an <em> tag
font-weight: bold is written inline as <strong> tags
text-decoration: underline is written inline as <i> tags (see the note on underlining below)
The following inline formatting tag mappings then occur:
<b> is converted to <strong>
<u> is converted to <i>
<i> is then converted to <em>
Note that we have decided to convert underlining to italics, because Wax does not currently support underlining.
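The two passes described above (promoting CSS properties to inline tags, then renaming tags in sequence) can be sketched in Python. This is a hedged illustration rather than the stylesheet's actual logic: the helper names are invented, and limiting the style promotion to <span> elements is an assumption made to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# CSS declaration -> inline tag that wraps the element's content.
STYLE_TO_TAG = {
    ("font-style", "italic"): "em",
    ("font-weight", "bold"): "strong",
    ("text-decoration", "underline"): "i",
}
# Applied in this order, so <u> becomes <i> and then <em>.
TAG_RENAMES = [("b", "strong"), ("u", "i"), ("i", "em")]


def parse_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'} (lower-cased)."""
    declarations = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def wrap_content(element, tag):
    """Move the element's text and children into a new child <tag>."""
    wrapper = ET.Element(tag)
    wrapper.text = element.text
    for child in list(element):
        element.remove(child)
        wrapper.append(child)
    element.text = None
    element.append(wrapper)


def promote_styles(root):
    """Wrap span content in the inline tag matching its style (assumption: spans only)."""
    for span in list(root.iter("span")):
        style = parse_style(span.get("style", ""))
        for (prop, value), tag in STYLE_TO_TAG.items():
            if style.get(prop) == value:
                wrap_content(span, tag)


def rename_tags(root):
    """Apply the tag mappings one after another."""
    for old, new in TAG_RENAMES:
        for element in list(root.iter(old)):
            element.tag = new


root = ET.fromstring(
    '<p><span style="font-style: italic">one</span> <b>two</b> <u>three</u> '
    '<span style="text-decoration: underline">four</span></p>'
)
promote_styles(root)
rename_tags(root)
print(ET.tostring(root, encoding="unicode"))
```

Because the renames run in sequence, both <u> elements and underlined spans end up as <em>, which matches the decision to render underlining as italics.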
editoria-reduce.xsl
All class and style information is dropped. Bye-bye class, bye-bye style!
<p class="EndnoteText"> endnote</p> becomes <p> endnote</p>
Other tag attributes (e.g. id) are passed through
<sub> and <sup> tags are passed through
Inline markup on whitespace only (spaces, tabs) is removed, e.g. <b> </b>
Tabs are removed: <span class="tab">
Paragraphs or headings with only whitespace or no content at all are removed, e.g. <p></p>, <p> </p>, <h1></h1>
Internal-to-Word bookmarks (see [this example](/xsweet-core/#links)) are removed
The <head><style> tag is removed
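Several of the reduction rules above can be sketched together in Python. This is an illustration under stated assumptions, not the stylesheet: the function names and tag sets are invented, bookmark removal is left out, and whitespace inside whitespace-only inline markup is kept as plain text (the source only says the markup is removed).

```python
import xml.etree.ElementTree as ET

INLINE_TAGS = {"b", "i", "u", "em", "strong", "span"}
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


def is_blank(element):
    """True when the element has no children and no non-whitespace text."""
    return len(element) == 0 and not (element.text or "").strip()


def drop_child(parent, child, keep_text=False):
    """Remove child; its tail (and optionally its text) joins the preceding text."""
    kept = (child.text or "") if keep_text else ""
    kept += child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + kept
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + kept
    parent.remove(child)


def reduce_tree(element):
    """Apply the reduction rules bottom-up, then drop class and style."""
    for child in list(element):
        # Check the class before the recursive call strips it.
        is_tab = child.tag == "span" and child.get("class") == "tab"
        reduce_tree(child)
        if is_tab or child.tag == "style":
            drop_child(element, child)
        elif child.tag in INLINE_TAGS and is_blank(child):
            drop_child(element, child, keep_text=True)  # assumption: keep the space
        elif child.tag in BLOCK_TAGS and is_blank(child):
            drop_child(element, child)
    element.attrib.pop("class", None)
    element.attrib.pop("style", None)


root = ET.fromstring(
    '<body><p class="EndnoteText" id="n1"> endnote</p>'
    '<p>a<b> </b>b<span class="tab">\t</span>c</p>'
    "<p> </p><h1></h1><p>H<sub>2</sub>O</p></body>"
)
reduce_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

Children are processed before their parent is tested, so a paragraph that contains nothing but whitespace-only markup is also recognised as empty and removed.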