Editoria Typescript transforms HTML into the format required by the Coko Foundation’s Wax WYSIWYG word processor for Editoria. Although Wax was built specifically for book editing and publication, that is by no means its only possible application, and it could be repurposed. Similar chains could be implemented to target other formats.
Editoria Typescript translates the document structure, inline and class formatting, endnotes and footnotes into a subset of near-HTML, while eliminating HTML attributes not used by Wax.
Editoria Typescript should be run in the following order:
It is possible to specify line breaks within paragraphs in Word (<w:br/>, which are extracted as XHTML <br class="br" /> tags). As Wax does not support <br>s, this step simply divides paragraphs on breaks, removing each break and creating separate <p> elements instead.
<p style="font-family: Times New Roman; text-indent: 36pt"> Kṛṣṇadevarāya discusses this practice in the following verse: <br class="br"/> Make trustworthy Brahmins <br class="br"/> The commanders of your forts </p>
<p>Kṛṣṇadevarāya discusses this practice in the following verse:</p> <p>Make trustworthy Brahmins</p> <p>The commanders of your forts</p>
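The splitting behaviour described above can be illustrated with a minimal Python sketch. This is not the XSLT that Editoria Typescript actually runs; the function name `split_paragraph` is hypothetical, only `<br>` elements that are direct children of the paragraph are handled, and copying the paragraph's attributes onto every piece is an assumption (the example above shows `style` gone, but that is the work of a later step).

```python
import copy
import xml.etree.ElementTree as ET


def split_paragraph(p):
    """Return one <p> per <br>-delimited segment of `p` (direct-child breaks only)."""

    def new_segment(text):
        # Assumption: each piece inherits the original paragraph's attributes.
        segment = ET.Element("p", dict(p.attrib))
        segment.text = text
        return segment

    segments = [new_segment(p.text)]
    for child in p:
        if child.tag == "br":
            # The break itself is dropped; the text after it opens a new paragraph.
            segments.append(new_segment(child.tail))
        else:
            # Inline children are copied along with their trailing text.
            segments[-1].append(copy.deepcopy(child))
    return segments


source = '<p class="verse">First line<br class="br"/>Second <i>line</i> here</p>'
for piece in split_paragraph(ET.fromstring(source)):
    print(ET.tostring(piece, encoding="unicode"))
```

A paragraph with n breaks yields n + 1 paragraphs, which matches the three `<p>` elements in the example output above.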
XSweet’s initial extraction divides the contents of the HTML document into sections, including <div class="docx-endnotes"> and <div class="docx-footnotes">. This step rearranges the content:
Notes and their ids are also rewritten, from:
<div class="docx-endnotes"> <div class="docx-endnote" id="en1"> <p class="EndnoteText"> <span class="EndnoteReference"> <span class="endnoteRef">1</span> </span> endnote</p> </div> </div> <div class="docx-footnotes"> <div class="docx-footnote" id="fn1"> <p class="FootnoteText"> <span class="FootnoteReference"> <span class="footnoteRef">a</span> </span> footnote</p> </div> </div>
<div id="notes"> <note-container id="container-en1"> <p class="EndnoteText"> endnote</p> </note-container> <note-container id="container-fn1"> <p class="FootnoteText"> footnote</p> </note-container> </div>
These are then properly linked and nicely displayed in Wax. Endnotes and footnotes are combined into one sequential list.
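The note rewrite shown above can be sketched in Python as follows. This is an illustration only, not the project's XSLT: the names `build_notes` and `drop_reference_marker` are hypothetical, and the sketch assumes the reference `<span>` is always the first child of the note paragraph, as it is in the sample markup.

```python
import copy
import xml.etree.ElementTree as ET

REFERENCE_CLASSES = {"EndnoteReference", "FootnoteReference"}


def drop_reference_marker(p):
    """Remove a leading note-reference <span> from `p`, keeping the text after it."""
    first = p[0] if len(p) else None
    if first is not None and first.get("class") in REFERENCE_CLASSES:
        p.text = (p.text or "") + (first.tail or "")
        p.remove(first)


def build_notes(root):
    """Collect endnotes, then footnotes, into a single <div id="notes">."""
    notes = ET.Element("div", {"id": "notes"})
    sections = (("docx-endnotes", "docx-endnote"), ("docx-footnotes", "docx-footnote"))
    for section_class, note_class in sections:
        path = f".//div[@class='{section_class}']/div[@class='{note_class}']"
        for note in root.findall(path):
            # The original id is kept, prefixed with "container-".
            container = ET.SubElement(
                notes, "note-container", {"id": f"container-{note.get('id')}"}
            )
            for child in note:
                child = copy.deepcopy(child)
                if child.tag == "p":
                    drop_reference_marker(child)
                container.append(child)
    return notes
```

Calling `build_notes` on a parsed document whose notes sit inside the two `docx-` sections returns the combined element; it does not modify the input tree.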
editoria-basic.xsl writes some properties from CSS style attributes inline:
font-style: italic is written to inline elements wrapped in an <i> tag
font-weight: bold is written inline as <b>
text-decoration: underline is written inline as <u>
<i>tags, which is*
The following inline formatting tag mappings then occur:
<b>s are converted to
<u> is converted to
<i> is then converted to
Note that we have made the decision to convert underlining to italics, because Wax does not currently support underlining.
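As a rough illustration of the two stages described above, the Python sketch below first derives inline tags from a CSS style attribute and then folds underlining into italics. It is not the logic of editoria-basic.xsl: the function names are hypothetical, the intermediate tag names (`i`, `b`, `u`) are inferred from the mapping list, property values are matched exactly, and the later renaming of the bold and italic tags is left out because their targets are not given here.

```python
# Assumed intermediate mapping from CSS declarations to inline tags.
STYLE_TO_TAG = {
    ("font-style", "italic"): "i",
    ("font-weight", "bold"): "b",
    ("text-decoration", "underline"): "u",
}


def parse_style(style):
    """Parse 'name: value; name: value' into a lower-cased dict."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip().lower()
    return props


def tags_from_style(style):
    """Stage 1: list the inline tags implied by a style attribute."""
    props = parse_style(style)
    return [tag for (name, value), tag in STYLE_TO_TAG.items() if props.get(name) == value]


def fold_underline(tags):
    """Stage 2: treat underlining as italics and drop duplicate tags."""
    folded = []
    for tag in tags:
        tag = "i" if tag == "u" else tag
        if tag not in folded:
            folded.append(tag)
    return folded


style = "font-family: Times New Roman; font-weight: bold; text-decoration: underline"
print(fold_underline(tags_from_style(style)))
```

Properties that have no entry in the table, such as `font-family`, simply produce no tag, which is consistent with style information being dropped afterwards.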
style information is dropped. Bye bye
<p class="EndnoteText"> endnote</p> becomes
Other tag attributes (e.g. id) are passed through
<sup> tags are passed through
Inline markup on whitespace only (spaces, tabs) is removed, e.g.
tabs are removed:
Paragraphs or headings with only whitespace or no content at all are removed, e.g.
Internal-to-Word bookmarks (see [this example](/xsweet-core/#links)) are removed
<head><style> tag is removed
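Two of the cleanup rules listed above (dropping `style` while passing other attributes through, and removing paragraphs or headings that hold only whitespace) can be sketched in Python as below. This is a simplified stand-in for the XSLT, under stated assumptions: the function names are hypothetical, and "blank" here means "contains no non-whitespace text", so a paragraph holding only a non-text element would also be removed.

```python
import xml.etree.ElementTree as ET

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


def is_blank(element):
    """True when the element and its descendants contain no non-whitespace text."""
    return not "".join(element.itertext()).strip()


def clean(root):
    """Drop style attributes and remove blank paragraphs and headings, in place."""
    for element in root.iter():
        # Only `style` is dropped; id, class and other attributes pass through.
        element.attrib.pop("style", None)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in BLOCK_TAGS and is_blank(child):
                parent.remove(child)


doc = ET.fromstring('<body><p style="text-indent: 36pt" id="p1">Kept</p><p> </p><h2/></body>')
clean(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Iterating over snapshots (`list(...)`) keeps the traversal stable while children are being removed.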