Table of Contents
Issue
I am trying to encode a small lambda calculus with algebraic datatypes in Scheme. I want it to use lazy evaluation, for which I tried to use the primitives delay
and force
. However, this has a large negative impact on the performance of evaluation: the execution time on a small test case goes up by a factor of 20x.
While I did not expect laziness to speed up this particular test case, I did not expect a huge slowdown either. My question is thus: What is causing this huge overhead with lazy evaluation, and how can I avoid this problem while still getting lazy evaluation? I would already be happy to get within 2x the execution time of the strict version, but faster is of course always better.
Below are the strict and lazy versions of the test case I used. The test deals with natural numbers in unary notation: it constructs a sequence of 2^24
suc
s followed by a zero
and then destructs the result again. The lazy version was constructed from the strict version by adding delay
and force
in appropriate places, and adding let
-bindings to avoid forcing an argument more than once. (I also tried a version where zero
and suc
were strict but other functions were lazy, but this was even slower than the fully lazy version so I omitted it here.)
I compiled both programs using compile-file
in Chez Scheme 9.5 and executed the resulting .so
files with petite --program
. Execution time (user only) for the strict version was 0.578s, while the lazy version takes 11,891s, which is almost exactly 20x slower.
Strict version
(define zero 'zero)
(define (suc x) (cons 'suc x))
(define one (suc zero))
(define two (suc one))
(define three (suc two))
(define (twice m)
(if (eq? m zero)
zero
(suc (suc (twice (cdr m))))))
(define (pow2 m)
(if (eq? m zero)
one
(twice (pow2 (cdr m)))))
(define (consume m)
(if (eq? m zero)
zero
(consume (cdr m))))
(consume (pow2 (twice (twice (twice three)))))
Lazy version
(define zero (delay 'zero))
(define (suc x) (delay (cons 'suc x)))
(define one (suc zero))
(define two (suc one))
(define three (suc two))
(define (twice m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force zero)
(force (suc (suc (twice (cdr mv)))))))))
(define (pow2 m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force one)
(force (twice (pow2 (cdr mv))))))))
(define (consume m)
(delay
(let ((mv (force m)))
(if (eq? mv 'zero)
(force zero)
(force (consume (cdr mv)))))))
(force (consume (pow2 (twice (twice (twice three))))))
Solution
One can see statistics for the two phases of the test program using ChezScheme’s (time ...)
procedure:
$ scheme
Chez Scheme Version 9.5.2
> (load-program "strict.ss")
(time (pow2 (twice (...))))
21 collections
0.695561822s elapsed cpu time, including 0.521065634s collecting
0.695607000s elapsed real time, including 0.521191000s collecting
672586992 bytes allocated, including 236483824 bytes reclaimed
(time (consume u2^24))
no collections
0.037766347s elapsed cpu time
0.037762000s elapsed real time
0 bytes allocated
and for the lazy version:
$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
no collections
0.000000000s elapsed cpu time
0.000000000s elapsed real time
400 bytes allocated
(time (force (consume u2^24)))
572 collections
11.997971385s elapsed cpu time, including 10.798406971s collecting
12.012723000s elapsed real time, including 10.813517000s collecting
4832215216 bytes allocated, including 4460306000 bytes reclaimed
So 90% of time is collecting. Adjusting collector parameters may improve this, eg:
(collect-trip-bytes 1000000)
(collect-generation-radix (greatest-fixnum))
(heap-reserve-ratio 2.0)
(these values halve lazy time OMM)
One might also replace ChezScheme’s delay
and force
with stripped down versions:
(import (except (chezscheme) delay force))
(define (make-promise p)
(let ([value (box p)])
(lambda ()
(when (box? value)
(let ([x ((unbox value))])
(when (box? value)
(set! value x))))
value)))
(define-syntax delay
(syntax-rules ()
[(_ expr) (make-promise (lambda () expr))]))
(define (force promise)
(promise))
(add above to the beginning of lazy.ss)
NB these have no error checking, and don’t handle multiple values or lazy boxes.
(The ChezScheme implementation is here)
With these changes the lazy version is about 4x slower than strict:
$ scheme
> (load-program "lazy.ss")
(time (pow2 (twice (...))))
no collections
0.000000000s elapsed cpu time
0.000000000s elapsed real time
336 bytes allocated
(time (force (consume u2^24)))
3813 collections
2.977003428s elapsed cpu time, including 2.175818398s collecting
2.977292000s elapsed real time, including 2.179504000s collecting
4029652320 bytes allocated, including 2414247968 bytes reclaimed
Answered By – mnemenaut
Answer Checked By – Mary Flores (BugsFixing Volunteer)