<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>ShareZeroVec</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css">
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20180207</a>
</div>
</div>
<div id="header">
<h1>ShareZeroVec</h1>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p><a href="ShareZeroVec">ShareZeroVec</a> is an optimization pass for the <a href="SSA">SSA</a>
<a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_description">Description</h2>
<div class="sectionbody">
<div class="paragraph"><p>An SSA optimization to share zero-length vectors.</p></div>
<div class="paragraph"><p>From <a href="https://github.com/MLton/mlton/commit/be8c5f576"><span class="monospaced">be8c5f576</span></a>, which replaced the use of the
<span class="monospaced">Array_array0Const</span> primitive in the Basis Library implementation with a
(nullary) <span class="monospaced">Vector_vector</span> primitive:</p></div>
<div class="quoteblock">
<div class="content">
<div class="paragraph"><p>The original motivation for the <span class="monospaced">Array_array0Const</span> primitive was to share the
heap space required for zero-length vectors among all vectors (of a given type).
It was claimed that this optimization is important, e.g., in a self-compile,
where vectors are used for lots of syntax tree elements and many of those
vectors are empty. See:
<a href="http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html">http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html</a></p></div>
<div class="paragraph"><p>Curiously, the full effect of this optimization has been missing for quite some
time (perhaps since the port of <a href="ConstantPropagation">ConstantPropagation</a> to the SSA IL).  While
<a href="ConstantPropagation">ConstantPropagation</a> has "globalized" the nullary application of the
<span class="monospaced">Array_array0Const</span> primitive, it also simultaneously transformed it to an
application of the <span class="monospaced">Array_uninit</span> (previously, the <span class="monospaced">Array_array</span>) primitive to
the zero constant.  The hash-consing of globals, meant to create exactly one
global for each distinct constant, treats <span class="monospaced">Array_uninit</span> primitives as unequal
(appropriately, since <span class="monospaced">Array_uninit</span> allocates an array with identity (though
the identity may be suppressed by a subsequent <span class="monospaced">Array_toVector</span>)), hence each
distinct <span class="monospaced">Array_array0Const</span> primitive in the program remained as distinct
globals.  The limited amount of inlining prior to <a href="ConstantPropagation">ConstantPropagation</a> meant
that there were typically fewer than a dozen "copies" of the same empty vector
in a program for a given type.</p></div>
<div class="paragraph"><p>As a "functional" primitive, a nullary <span class="monospaced">Vector_vector</span> is globalized by
ClosureConvert, but is further recognized by ConstantPropagation and hash-consed
into a unique instance for each type.</p></div>
</div>
<div class="attribution">
</div></div>
<div class="paragraph"><p>However, a single, shared, global <span class="monospaced">Vector_vector ()</span> inhibits the
coercion-based optimizations of <span class="monospaced">Useless</span>.  For example, consider the
following program:</p></div>
<div class="listingblock">
<div class="content"><div class="highlight"><pre><span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valOf</span><span class="w"> </span><span class="p">(</span><span class="n">Int</span><span class="p">.</span><span class="n">fromString</span><span class="w"> </span><span class="p">(</span><span class="n">hd</span><span class="w"> </span><span class="p">(</span><span class="n">CommandLine</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">())))</span><span class="w"></span>

<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">v1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">tabulate</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"></span>
<span class="w">                              </span><span class="k">let</span><span class="w"> </span><span class="k">val</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">fromInt</span><span class="w"> </span><span class="n">i</span><span class="w"></span>
<span class="w">                              </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mh">0wx1</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="mh">0wx1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w</span><span class="p">)</span><span class="w"></span>
<span class="w">                              </span><span class="k">end</span><span class="p">)</span><span class="w"></span>
<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">v2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span><span class="n">w1</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx2</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w2</span><span class="p">,</span><span class="w"> </span><span class="mh">0wx3</span><span class="w"> </span><span class="n">*</span><span class="w"> </span><span class="n">w3</span><span class="p">))</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>
<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">v3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">vector</span><span class="w"> </span><span class="p">(</span><span class="n">VectorSlice</span><span class="p">.</span><span class="n">slice</span><span class="w"> </span><span class="p">(</span><span class="n">v1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">SOME</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="n">-</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span><span class="w"></span>
<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">ans1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((</span><span class="n">w1</span><span class="p">,</span><span class="n">w2</span><span class="p">,</span><span class="n">w3</span><span class="p">),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w1</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w3</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v1</span><span class="w"></span>
<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">ans2</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v2</span><span class="w"></span>
<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="n">ans3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">Vector</span><span class="p">.</span><span class="n">foldl</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">((_,</span><span class="n">w2</span><span class="p">,_),</span><span class="n">w</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="n">w2</span><span class="p">)</span><span class="w"> </span><span class="mh">0wx0</span><span class="w"> </span><span class="n">v3</span><span class="w"></span>

<span class="w">    </span><span class="k">val</span><span class="w"> </span><span class="p">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">print</span><span class="w"> </span><span class="p">(</span><span class="n">concat</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;ans1 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;  &quot;</span><span class="p">,</span><span class="w"></span>
<span class="w">                           </span><span class="s">&quot;ans2 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans2</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;  &quot;</span><span class="p">,</span><span class="w"></span>
<span class="w">                           </span><span class="s">&quot;ans3 = &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">Word16</span><span class="p">.</span><span class="n">toString</span><span class="w"> </span><span class="n">ans3</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">])</span><span class="w"></span>
</pre></div></div></div>
<div class="paragraph"><p>We would like <span class="monospaced">v2</span> and <span class="monospaced">v3</span> to be optimized from
<span class="monospaced">(word16 * word16 * word16) vector</span> to <span class="monospaced">word16 vector</span> because only
the 2nd component of the elements is needed to compute the answer.</p></div>
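<div class="paragraph"><p>As a rough source-level picture of that goal (a hand-written sketch, not
compiler output), the listing below builds <span class="monospaced">word16 vector</span> counterparts of
<span class="monospaced">v2</span> and <span class="monospaced">v3</span> directly.  The names <span class="monospaced">v2'</span>, <span class="monospaced">v3'</span>, <span class="monospaced">ans2'</span>, and <span class="monospaced">ans3'</span> and
the fixed <span class="monospaced">n = 5</span> are assumptions made for illustration; the actual change of
representation is performed by <span class="monospaced">Useless</span> on the SSA IL.</p></div>
<div class="listingblock">
<div class="content"><div class="highlight"><pre>    (* Assumption: n is fixed here; the original program reads it from the command line. *)
    val n = 5

    val v1 = Vector.tabulate (n, fn i =&gt;
                              let val w = Word16.fromInt i
                              in (w - 0wx1, w, w + 0wx1 + w)
                              end)

    (* Hypothetical flattened forms: only the 2nd component of each element is kept. *)
    val v2' : Word16.word vector = Vector.map (fn (_, w2, _) =&gt; 0wx2 * w2) v1
    val v3' : Word16.word vector =
       Vector.tabulate (n - 2, fn i =&gt; #2 (Vector.sub (v1, i + 1)))

    (* The folds no longer need to project a component out of a triple. *)
    val ans2' = Vector.foldl (fn (w2, w) =&gt; w + w2) 0wx0 v2'
    val ans3' = Vector.foldl (fn (w2, w) =&gt; w + w2) 0wx0 v3'
</pre></div></div></div>
<div class="paragraph"><p>Both folds visit only the second components, so <span class="monospaced">ans2'</span> and <span class="monospaced">ans3'</span> agree
with <span class="monospaced">ans2</span> and <span class="monospaced">ans3</span> of the original program for the same <span class="monospaced">n</span>.</p></div>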
<div class="paragraph"><p>With <span class="monospaced">Array_array0Const</span>, each distinct occurrence of
<span class="monospaced">Array_array0Const((word16 * word16 * word16))</span> arising from
polyvariance and inlining remained a distinct
<span class="monospaced">Array_uninit((word16 * word16 * word16)) (0x0)</span> global, which
resulted in distinct occurrences for the
<span class="monospaced">val v1 = Vector.tabulate ...</span> and for the
<span class="monospaced">val v2 = Vector.map ...</span>. The latter could be optimized to
<span class="monospaced">Array_uninit(word16) (0x0)</span> by <span class="monospaced">Useless</span>, because its result only
flows to places requiring the 2nd component of the elements.</p></div>
<div class="paragraph"><p>With <span class="monospaced">Vector_vector ()</span>, the distinct occurrences of
<span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> arising from
polyvariance are globalized during <span class="monospaced">ClosureConvert</span>, those global
references may be further duplicated by inlining, but the distinct
occurrences of <span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> are
merged to a single occurrence.  Because this result flows to places
requiring all three components of the elements, it remains
<span class="monospaced">Vector_vector((word16 * word16 * word16)) ()</span> after
<span class="monospaced">Useless</span>. Furthermore, because one cannot (in constant time) coerce a
<span class="monospaced">(word16 * word16 * word16) vector</span> to a <span class="monospaced">word16 vector</span>, the <span class="monospaced">v2</span>
value remains of type <span class="monospaced">(word16 * word16 * word16) vector</span>.</p></div>
<div class="paragraph"><p>One option would be to drop the 0-element vector "optimization"
entirely.  This costs some space (no sharing of empty vectors) and
some time (allocation and garbage collection of empty vectors).</p></div>
<div class="paragraph"><p>Another option would be to reinstate the <span class="monospaced">Array_array0Const</span> primitive
and associated <span class="monospaced">ConstantPropagation</span> treatment.  However, the semantics
and purpose of <span class="monospaced">Array_array0Const</span> were poorly understood, which is what
led to this break.</p></div>
<div class="paragraph"><p>The <a href="ShareZeroVec">ShareZeroVec</a> pass pursues a different approach: perform the 0-element
vector "optimization" as a separate optimization, after
<span class="monospaced">ConstantPropagation</span> and <span class="monospaced">Useless</span>.  A trivial static analysis is
used to match each <span class="monospaced">val v: t vector = Array_toVector(t) (a)</span> with its
corresponding <span class="monospaced">val a: t array = Array_uninit(t) (l)</span>, and the latter is
expanded to
<span class="monospaced">val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)</span>
with a single global <span class="monospaced">val zeroArr_[t] = Array_uninit(t) (0)</span> created
for each distinct type (after coercion-based optimizations).</p></div>
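<div class="paragraph"><p>The expansion can be mimicked at the source level.  The following is a minimal
sketch under stated assumptions, not the code of the pass, which rewrites SSA:
<span class="monospaced">zeroArrInt</span> stands in for <span class="monospaced">zeroArr_[t]</span> at <span class="monospaced">t = int</span>, <span class="monospaced">uninit</span> stands in for
<span class="monospaced">Array_uninit</span> (SML has no uninitialized arrays, so a dummy fill value is used),
and <span class="monospaced">Array.vector</span> stands in for <span class="monospaced">Array_toVector</span> (it copies, whereas the
primitive updates the object header in place).  All of these names are
hypothetical.</p></div>
<div class="listingblock">
<div class="content"><div class="highlight"><pre>    (* Hypothetical stand-in for the single global zeroArr_[t], at t = int. *)
    val zeroArrInt : int array = Array.fromList []

    (* Hypothetical stand-in for Array_uninit(t) (l); elements get a dummy value. *)
    fun uninit (l : int) : int array = Array.array (l, 0)

    (* Before the pass: every request allocates, even when l = 0. *)
    fun allocBefore (l : int) : int array = uninit l

    (* After the pass: a zero-length request yields the shared array. *)
    fun allocAfter (l : int) : int array =
       if 0 = l then zeroArrInt else uninit l

    (* Source-level stand-in for Array_toVector(t) (a). *)
    val vecEmpty : int vector = Array.vector (allocAfter 0)
    val vecThree : int vector = Array.vector (allocAfter 3)
</pre></div></div></div>
<div class="paragraph"><p>Only the zero-length case changes: <span class="monospaced">allocAfter 0</span> always returns the same
array, while non-zero lengths still allocate as before.</p></div>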
<div class="paragraph"><p>One disadvantage of this approach, compared to the <span class="monospaced">Vector_vector(t) ()</span>
approach, is that <span class="monospaced">Array_toVector</span> is applied each time a vector
is created, even if it is being applied to the <span class="monospaced">zeroArr_[t]</span>
zero-length array.  (This was, however, also the behavior of the
<span class="monospaced">Array_array0Const</span> approach.)  Each application updates the object
header, whereas the <span class="monospaced">Vector_vector(t) ()</span> approach would have updated
the object header once, when the global was created, and the
<span class="monospaced">zeroVec_[t]</span> global and the <span class="monospaced">Array_toVector</span> result would flow to the
join point.</p></div>
<div class="paragraph"><p>It would be possible to properly share zero-length vectors, but doing
so is a more sophisticated analysis and transformation, because there
can be arbitrary code between the
<span class="monospaced">val a: t array = Array_uninit(t) (l)</span> and the corresponding
<span class="monospaced">val v: t vector = Array_toVector(t) (a)</span>, although, in practice,
nothing happens when a zero-length vector is created.  It may be best
to pursue a more general "array to vector" optimization that
transforms creations of static-length vectors (e.g., all the
<span class="monospaced">Vector.new&lt;N&gt;</span> functions) into <span class="monospaced">Vector_vector</span> primitives (some of
which could be globalized).</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_implementation">Implementation</h2>
<div class="sectionbody">
<div class="ulist"><ul>
<li>
<p>
<a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/share-zero-vec.fun"><span class="monospaced">share-zero-vec.fun</span></a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_details_and_notes">Details and Notes</h2>
<div class="sectionbody">
<div class="paragraph"><p></p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>
