<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph"><p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph"><p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>.  The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span>
structures are insufficient for the purposes of Unicode.  There is no
<span class="monospaced">LargeChar</span> structure, which is itself a deficiency, since a
programmer cannot program against the largest supported character
size.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph"><p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants.  This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, it allows one
to copy text from a browser and paste it into a string constant in an
editor and, furthermore, if the string is printed to a terminal, then
it will (typically) appear as the original text.  See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p></div>
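<div class="paragraph"><p>As a minimal sketch of what "UTF-8 convenience" means in practice, the
following treats a <span class="monospaced">string</span> as a sequence of bytes.  The constant is written
with decimal escapes so that the example stays ASCII; with extended text
constants enabled, pasting the character itself should yield the same two bytes.
The names <span class="monospaced">isContinuation</span> and <span class="monospaced">codePoints</span> are hypothetical helpers introduced
for illustration and are not part of the Basis Library or of MLton.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(* The two UTF-8 bytes of U+03BB (Greek small lambda), written with
   decimal escapes.  Assumption: a pasted lambda in an extended text
   constant is stored as these same two bytes. *)
val lambda = "\206\187"

(* A UTF-8 continuation byte has the bit pattern 10xxxxxx. *)
fun isContinuation (c : char) : bool =
  Word8.andb (Byte.charToByte c, 0wxC0) = 0wx80

(* Hypothetical helper: count code points by skipping continuation
   bytes.  It assumes well-formed UTF-8 and performs no validation. *)
fun codePoints (s : string) : int =
  CharVector.foldl (fn (c, n) =&gt; if isContinuation c then n else n + 1) 0 s

(* String.size counts bytes, not code points. *)
val () = print (Int.toString (String.size lambda) ^ " bytes, "
                ^ Int.toString (codePoints lambda) ^ " code point(s)\n")</pre>
</div></div>
<div class="paragraph"><p>Because <span class="monospaced">String.size</span> and <span class="monospaced">String.sub</span> operate on bytes, any code-point-aware
processing has to be layered on top in this way.</p></div>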
<div class="paragraph"><p>MLton, also as a minor extension over the Definition, supports
<span class="monospaced">\Uxxxxxxxx</span> numeric escapes in text constants and has preliminary
internal support for 16- and 32-bit characters and strings.</p></div>
<div class="paragraph"><p>MLton provides <span class="monospaced">WideChar</span> and <span class="monospaced">WideString</span> structures, corresponding
to 32-bit characters and strings, respectively.</p></div>
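<div class="paragraph"><p>A rough sketch of how the <span class="monospaced">\Uxxxxxxxx</span> escape and <span class="monospaced">WideString</span> might be used
together follows.  It assumes that MLton resolves the text constant at type
<span class="monospaced">WideString.string</span> when given an explicit type annotation; since the wide
character support is described as preliminary, treat this as illustrative rather
than as a guaranteed interface.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(* Assumption: the type annotation selects the 32-bit string type, so
   the escape denotes a single character with code point U+03BB. *)
val w : WideString.string = "\U000003BB"

(* WideString.size counts 32-bit characters rather than bytes. *)
val n = WideString.size w

(* The code point of the first character, as an int. *)
val cp = WideChar.ord (WideString.sub (w, 0))

val () = print (Int.toString n ^ " character(s), code point "
                ^ Int.toString cp ^ "\n")</pre>
</div></div>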
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML.  In December 2004, there was a discussion that led to
some seemingly sound design decisions.  The discussion started at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding.</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>
