Brett Saiki

Rearchitecting Herbie’s improvement loop

2024-01-30T00:00:00+00:00

Herbie’s improvement loop has five basic phases:

1. Expression selection
2. Local error analysis
3. Rewriting
   a. Taylor polynomial approximation ("taylor")
   b. rule-based rewriting ("rr")
4. Simplify
5. Prune

First, Herbie selects a few expressions (1) from its database as a starting point for rewriting. Then, it analyzes (2) these expressions to find AST nodes that exhibit high local error. Based on the analysis, the tool applies (3) a couple of rewriting techniques, including polynomial approximation and equality saturation via the egg library. Finally, a second rewriting pass simplifies (4) expressions using egg before merging (5) the new expressions with the current set of alternative implementations, keeping only the best.

This sequential process is repeated for a fixed number of iterations before Herbie extracts and fuses expressions through an algorithm called regime inference. All versions of Herbie utilize this architecture in one way or another. In the last couple of versions of the tool, I have noticed a tension building between the rewriting phases within the improvement loop.

Herbie employs two primary IRs which we will call SpecIR (programs with real number semantics) and ProgIR (programs with floating-point semantics). We can convert from ProgIR to SpecIR without additional information since we simply replace the rounding operation for every mathematical operator with the identity function. However, translating from SpecIR to ProgIR requires assigning a finite-precision rounding operation to each mathematical function.
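To make the asymmetry concrete, here is a minimal sketch under an assumed toy encoding (not Herbie's actual data structures): a ProgIR node is a tuple `(op, repr, args...)` and a SpecIR node is `(op, args...)`. The names `lift`, `lower`, and `choose_repr` are hypothetical. Lifting just forgets information, while lowering needs an extra decision at every node.

```python
# Toy encoding, assumed for illustration only (not Herbie's real IRs):
#   ProgIR node: (op, repr, arg, ...)   e.g. ("+", "binary64", "x", "y")
#   SpecIR node: (op, arg, ...)         e.g. ("+", "x", "y")
# Leaves are variable names (str) or numeric constants.

def lift(prog):
    """ProgIR -> SpecIR: drop the rounding annotation on every operator."""
    if not isinstance(prog, tuple):
        return prog
    op, _repr, *args = prog
    return (op, *[lift(a) for a in args])

def lower(spec, choose_repr):
    """SpecIR -> ProgIR: requires a rounding choice (choose_repr) per node."""
    if not isinstance(spec, tuple):
        return spec
    op, *args = spec
    return (op, choose_repr(spec), *[lower(a, choose_repr) for a in args])

prog = ("/", "binary32", ("+", "binary64", "x", 1), "y")
spec = lift(prog)
uniform = lower(spec, lambda node: "binary64")
```

Note that `lower(lift(prog), ...)` does not recover `prog` unless `choose_repr` happens to make the same choices, which is exactly the information lifting throws away.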

Returning to Herbie’s rewriting phases, observe the approximate type signatures of the three rewriting steps:

val taylor : SpecIR -> SpecIR list
val rr : ProgIR -> ProgIR list
val simplify : ProgIR -> ProgIR list

Why the difference? In truth, engineering challenges. The egg-based rewriters rr and simplify are newer and an important part of the Pareto-Herbie (Pherbie) design from 2021. On the other hand, the first version of Herbie included taylor as far back as 2014. Clearly, ProgIR is richer than SpecIR since it contains number format information. For this reason, both rewriting and precision tuning are performed in rr (see Pareto-Herbie) and a shim is required for taylor.

But in light of a recent conversation I had with a fellow PLSE member, I hypothesize that this architecture is likely incorrect, or at the least, problematic. Of course, we could dream of the “sufficiently smart” numerical compiler that resembles the following architecture:

SpecIR, error bound ~~~~~~ magic ~~~~~~> C, machine code, etc.

But just as a compiler achieves its goal via numerous lowering passes, so too should a system like Herbie. In fact, Herbie really is just a messy version of this compiler, taking a mathematical specification and number representations and producing floating-point code. In light of these observations, I propose rearchitecting Herbie’s improvement loop. Expanding on the numerical compiler diagram, we would implement

SpecIR ~~~~~~ rewrite ~~~~~~> SpecIR ~~~~~~ lowering ~~~~~~> ProgIR

There are two important phases:

rewrite: either equivalent rewrites or approximations producing a program over real numbers
lowering: assignment of rounding operations (e.g. precision tuning), operator selection, and other decisions required for finite-precision computation.

Of course, just as in Herbie’s current architecture, we use this process potentially multiple times with expression selection at the beginning and pruning at the end. Therefore, the whole process looks something like the following:

1. Expression selection
2. Local error analysis
3. Lifting from ProgIR to SpecIR
4. Rewriting
   a. Taylor polynomial approximation ("taylor")
   b. equivalent (real-number semantics) rewrites ("rr")
5. Lowering from SpecIR to ProgIR
   a. operator selection, e.g. reciprocal vs. division
   b. precision tuning
6. Pruning

How (4) and (5) interact is an open question. An interesting proposal would be to seed an egraph with the starting subexpressions from (3) and the approximations from (4a), run rewrites from (4b), and extract from the same egraph in (5a) and possibly (5b). Additionally, it is unclear if separating (4) and (5) will prevent us from finding certain rewrites.

The new design requires that egg be used for both SpecIR and ProgIR, which suggests splitting the rules into those over SpecIR and those over ProgIR. The majority of rewriting will be done over the reals, but a small set of rewrites may be used in (5) for operator selection, e.g. posit-quire operations. Constant folding should only be done over SpecIR since these programs are over the reals. If rewriting ProgIR is done in an egraph, it will be minimal (no constant folding) and will just be for extraction via Herbie’s cost model.

Based on this new design, we have a number of insights and engineering improvements:

if we wish to rewrite expressions over the reals, then the rewrites should be performed over real number programs
operator (implementation) selection is precision tuning; choosing to approximate 1/x (real number program) with recip_f64(x) is both a choice of syntax and rounding.
egg becomes a pure syntactic rewriter; it need not know about representations; Herbie’s egg interface can be slimmed down, e.g. no rule expansion and a simpler EggIR.
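The second insight, that picking an implementation is itself a rounding decision, can be sketched as a lookup from a real-number operator to candidate floating-point implementations. Everything here is hypothetical except the name recip_f64, which comes from the text above: the table, the costs, and the helper names are made up, and the sketch picks purely by cost, whereas a real lowering pass would also weigh accuracy.

```python
# Hypothetical implementation table: real operator -> [(impl name, cost)].
# Names other than recip_f64 and all costs are invented for illustration.
IMPLS = {
    "/": [("div_f64", 4.0), ("div_f32", 2.0)],
    "recip": [("recip_f64", 1.5)],
}

def candidates(spec):
    """Enumerate floating-point implementations of one real-number node."""
    op, *args = spec
    out = [(name, cost, args) for name, cost in IMPLS.get(op, [])]
    # Over the reals, 1/x can also be implemented by a reciprocal operator.
    if op == "/" and args[0] == 1:
        out += [(name, cost, args[1:]) for name, cost in IMPLS["recip"]]
    return out

def lower_cheapest(spec):
    """Pick the lowest-cost implementation (accuracy is ignored here)."""
    name, _cost, args = min(candidates(spec), key=lambda c: c[1])
    return (name, *args)
```

With this table, `lower_cheapest(("/", 1, "x"))` selects the reciprocal, while `lower_cheapest(("/", "a", "b"))` can only choose between the two divisions. In both cases the chosen name fixes the syntax and the rounding at once.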

To summarize, I propose a new design for Herbie’s improvement loop. Problematic expressions should be identified via the normal expression selection and local error analysis phases. These expressions should be lifted to purely real-number programs and passed through two phases. The first phase applies equivalent rewrites (over the real numbers) and polynomial approximations. The second phase lowers to actual floating-point operations via operator selection and precision tuning. Finally, the same pruning operation is performed to shrink the set of useful alternative implementations.

A One-Year Retrospective on Minim

2021-10-02T00:00:00+00:00

September 20th marked the one-year anniversary of Minim’s creation. It initially started as a pandemic-fueled side project based on an online guide to making a small Lisp language. Since then, Minim has undergone significant changes in design and scope. Today Minim is a fully-interpreted language, complete with a small standard library, syntax macros (not quite R6RS compliant), and many more features. To celebrate the past year of development, I have decided to describe my experience designing and implementing the language.

The Early Days

The initial versions of Minim were hacky at best. The goal of development during Minim 0.1.x was expanding the number of available types. By version 0.1.2, the language supported booleans, exact and inexact numbers, symbols, strings, pairs, lists, user-defined functions, hash tables, vectors, and sequences. Nearly all of these types have since undergone structural changes. The set of procedures to create and modify these types was minimal at the time, just enough to be useful.

The worst feature of this era was the owner-reference system that kept track of objects, so it could free resources when objects went out of scope. Initially, new copies were created every time objects were referenced, but with an “improvement”, objects could be set as owners of their data or references of another object’s data. This required unnecessary amounts of copying objects, annoying equality checks, and hours of debugging segmentation faults. What I didn’t know at the time was Scheme implementations usually never copy immutable objects and use garbage collectors to know when to free memory. This issue was not fixed until I finally added my own garbage collector in version 0.3.0.

Expansion

With a plethora of types in the language, the focus for Minim 0.2.x shifted to increasing the number of procedures available. First, file reading was added to support a standard library that was not hard-coded in C. In these versions, file reading occurred separately from the existing parser, tracking parentheses and ignoring comments to extract a hopefully parsable string. Initially, reading source code was rough since the reader would spontaneously fail, leading to hours of searching for and fixing bugs, and often, to my frustration, the creation of new bugs. In the same release, I added errors with backtracing and syntax with source locations, which proved to be helpful when modifying the standard library.

Minim 0.2.1 and 0.2.2 were expansions of the math and list libraries. I put as many of the new procedures into the standard library as I could rather than adding to the already large set of primitive procedures. I also added arity and type errors so that these new procedures could reject arguments upfront and print a more descriptive error. By then, issues with parsing motivated me to remake the parser from scratch, which turned out to be a huge improvement for the language. Since then the parser has barely changed and no longer hinders development.

Standardization

Development of Minim 0.3.x felt quite different from the previous versions. Primarily, I began reading the existing standards on Scheme including R4RS, R5RS, and R6RS. For the most part, I ignored these standards before because I wasn’t interested in sticking to an existing blueprint for Minim. However, my design choices began to feel flawed and haphazard and the way forward was becoming unclear. Implementing procedures and features described in the standards became the main goal during this era of development.

The first few changes were performance-related: the addition of a garbage collector and the implementation of tail calling. I detailed the design of the garbage collector in my previous blog post which you can read here. These changes caused a significant slowdown, but accelerated the pace of development since I didn’t have to worry about memory management, aside from a few bugs that needed patching.

With garbage collection in place, I turned to important features of any Scheme language: quoting and syntax macros. Syntax macros turned out to be quite the headache; my initial implementation continually caused Minim to crash. I eventually reimplemented macros from scratch, but I found out that even my most recent attempt is still not compliant with the standard. As of today, I consider that part of the project to be “good enough”, but fixing it is still definitely on my to-do list.

After the syntax macro mess, I moved on and added types like characters, records, and file ports; additional procedures for strings and lists; and more features like multi-valued expressions, continuations, and multi-signature functions with case-lambda.

Today

And with that, we have finally reached the present day. As you have just read, the development of Minim has been a long and winding path, from basic and hacky beginnings to a much more robust implementation. If there’s anything I’ve learned, it’s that a small idea can be fully realized with time and effort.

As for those following my footsteps: I’d highly recommend reading standards for an existing language. You might not be able to implement everything, but I found that following the Scheme standards made Minim a much more robust and sensical language. Standards are an important part of language design no matter how long and dense they might seem.

The Future

What’s next for Minim? As I’ve mentioned before, syntax macros are not Scheme-compliant, but there are many more features of Minim that have deviated from the standard, usually because of my naive choices. A couple of examples include the use of def instead of define and the syntax of function definitions. I am still weighing whether or not to resolve these design differences.

More recently, I have implemented caching for Minim source code files: syntax macros are applied and the resulting desugared code is emitted for later use. Testing shows that this decreases the number of expressions executed on boot significantly. On top of caching, I have implemented constant folding since certain expressions can be resolved before runtime. In particular, my implementation of case-lambda is egregious in its use of constant expressions for resolving arity.
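For readers unfamiliar with constant folding, here is a minimal sketch of the idea in Python. It assumes a toy representation (nested tuples with strings for symbols) and a hypothetical `fold` function. It is not Minim's actual AST or implementation, which lives in C.

```python
import operator

# Assumed toy representation: expressions are nested tuples such as
# ("+", 1, ("*", 2, 3)); symbols are strings. Not Minim's real AST.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Replace operator applications on constants with their value."""
    if not isinstance(expr, tuple):
        return expr                       # symbol or literal: nothing to do
    op, *args = expr
    args = [fold(a) for a in args]        # fold subexpressions first
    constant = all(isinstance(a, (int, float)) for a in args)
    if op in FOLDABLE and len(args) == 2 and constant:
        return FOLDABLE[op](*args)        # resolved before runtime
    return (op, *args)                    # keep the (partially folded) call
```

Anything that mentions a symbol is left in place, so `("+", "x", ("*", 2, 3))` only folds its inner product.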

My goal is to implement a native-code compiler for Minim, but having just begun a compilers course this quarter, I have a feeling this may be a long way off. Until then, I will focus on implementing more of the standard. Please check out the source repository for Minim to see my progress and give the language a try!

Garbage Collection in Minim

2021-07-14T00:00:00+00:00

Currently, one of the key issues in Minim is the lack of garbage collection. To handle precise memory management, I implemented a weird owner-and-reference system that has been quite tiresome to keep track of. To solve this, I am developing a garbage collector for Minim, and I recently merged a working version into the project for testing. The garbage collector greatly simplifies the code base by getting rid of calls to free() and removes most instances of copying objects. In terms of performance, Minim is now much slower, anywhere between 2x and 4x. Much of this slowdown is because allocated objects are no longer freed precisely, and there is additional overhead from tracking allocations.

The Minim GC is a generational, conservative, mark-and-sweep garbage collector implemented in C based on the Tiny Garbage Collector, a minimal garbage collector. It expands on the TGC by separating allocations into two generations, young and old. The younger generation is marked and swept every cycle, by default every time new allocations total 8 MB in size. Any surviving allocations are moved to the older generation. The older generation is marked and swept every 15 cycles.
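The collection schedule described above can be modeled in a few lines. This is only a toy simulation in Python, not the C implementation: the class `ToyGC` and the `is_live` callback (a stand-in for the mark phase) are assumptions, while the 8 MB threshold and the 15-cycle period come from the text.

```python
YOUNG_LIMIT = 8 * 1024 * 1024   # bytes of new allocations per cycle
OLD_EVERY = 15                  # old generation is collected every 15 cycles

class ToyGC:
    def __init__(self, is_live):
        self.is_live = is_live  # stand-in for marking from the roots
        self.young, self.old = [], []
        self.allocated = 0      # bytes allocated since the last cycle
        self.cycles = 0

    def alloc(self, obj, size):
        # Run a cycle once new allocations would exceed the threshold.
        if self.allocated + size > YOUNG_LIMIT:
            self.collect()
        self.young.append(obj)
        self.allocated += size

    def collect(self):
        self.cycles += 1
        # Young generation: survivors are promoted, the rest are dropped.
        self.old += [o for o in self.young if self.is_live(o)]
        self.young = []
        self.allocated = 0
        # Old generation: only examined on every 15th cycle.
        if self.cycles % OLD_EVERY == 0:
            self.old = [o for o in self.old if self.is_live(o)]
```

One consequence visible in the model: an object that dies after being promoted can linger for up to 14 further cycles before the old generation is examined again.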

The stack is swept from the “bottom” of the stack, provided during initialization, to a local address within the sweeping function. A neat trick for forcing pointer values onto the stack, no matter the optimization level, is to use the setjmp(jmp_buf env) function from setjmp.h, which saves the values of registers to set a jump point. It makes sense that this function exists, but I honestly didn’t realize it could be used for this mechanism.

Like the TGC, the Minim GC provides a way to associate a destructor with an allocated block in case memory allocated outside of the garbage collector needs to be freed upon sweeping. However, it also provides a way to associate a “marking” function with each memory block. These functions precisely mark internal pointers so that the garbage collector doesn’t naively mark every possible pointer within the memory block. These functions can cause issues if a developer changes a struct but not its respective marking function; I ran into this issue on a few occasions when switching Minim to use the garbage collector. In addition, atomic allocation macros are provided to allocate data not containing any pointers.

Although naive, the Minim GC has made working with the Minim code base much easier since I don’t have to spend hours dealing with segmentation faults. If I had to make Minim all over again, I definitely would have stuck with using a garbage collector from the beginning. It eliminates my owner-and-reference system that was causing quite a headache when dealing with copies of objects. Now the same object can be in a list, hash table, and vector, all at the same time, since immutability is respected.

As for performance issues, I managed to claw back almost all of the slowdown by adding tail call optimization, which ensures the amount of allocated memory and the size of the stack remain quite small. This garbage collector, along with tail call optimization, quasiquoting, and syntax macros, will be in the next release and is currently in the main branch.

Developing Minim

2021-03-07T00:00:00+00:00

Recently, I released version 0.2.0 of Minim, my hobby-language that I’ve been developing since last fall. It’s inspired by my time working with Racket (now more than a year). Despite having no formal experience with programming languages, I’ve made quite a bit of progress, although I still have lots to learn. Here are a few of my thoughts from developing the language.

From Static to Dynamic Types

My language of choice for developing Minim is C. Not the best choice for a smooth experience, but it’s been quite interesting. The most annoying part is obviously dealing with pointers; nearly every bug I’ve run into is a segmentation fault. The most interesting concept, however, is the interaction of Minim’s dynamic type system with C’s static type system.

Mainly, how do you define a dynamic type system in a language that is statically typed, especially without classes like in C++ or Java? The best solution that I’ve found is void pointers and enums. Intuitively, store a void pointer and an enumerated value such that the value hints at what is stored at the pointer. With this method, we can store anything from an integer to a string in a unified object, or the MinimObject in the case of the Minim language. Minim can easily verify the type of inputs when invoking a function and can throw an error when necessary.
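The tag-plus-pointer idea is easy to show outside of C as well. Below is a rough transliteration into Python, where `Any` stands in for the void pointer. The names `Tag`, `Obj`, and `add` are hypothetical and do not mirror the actual MinimObject definition.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any

class Tag(Enum):          # the enumerated value
    INT = auto()
    STRING = auto()
    PAIR = auto()

@dataclass
class Obj:                # plays the role of MinimObject
    tag: Tag              # hints at what `data` holds
    data: Any             # the "void pointer": an untyped payload

def add(a: Obj, b: Obj) -> Obj:
    """A primitive that checks the tags of its inputs before using them."""
    for arg in (a, b):
        if arg.tag is not Tag.INT:
            raise TypeError(f"+: expected integer, given {arg.tag.name}")
    return Obj(Tag.INT, a.data + b.data)
```

The payload is never inspected until the tag has been checked, which is what lets a statically typed host language carry values of any dynamic type through one struct.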

Of course, all of this is abstracted away when running Minim. The user can still identify the type of an object with procedures such as number? or string?, but they need not worry about its initialization, storage, and deletion. An added benefit is multiple types of objects can be stored in the same list such as (list 1 'a "str"). This concept may be trivial, but it is quite interesting to see that a simple construct can do away with the restrictions of statically-typed code.

Parsing, Parsing, Parsing

The worst experience, by far, has been figuring out how to parse an input string without any fancy libraries. For the most part, if we can munge the string, it’s not too awful. Split the string by looking for spaces and parentheses/brackets. Newlines and tabs are really just spaces in Minim, so unlike in Python they carry no meaning. All we have to do is keep track of quoted strings, Lisp-style quotes, and comments.

Unfortunately, with version 0.2.0, Minim supports executing expressions from a file, and errors without syntax information are quite unhelpful. Therefore, we need to store the syntax information of where expressions and functions are located. We can’t munge the string before parsing since we lose information about row and column numbers of characters. We must (a) track row and column information, (b) ignore distracting whitespace, and (c) still tokenize words. I managed to pull it off with a reader that’s around 200 lines long and a parser of similar length, but it’s quite a mess.
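Requirements (a) through (c) can be illustrated with a much smaller tokenizer than the real reader. The following is a sketch in Python under simplifying assumptions (no string escapes, no multi-line strings, `;` starts a comment); the function name `tokenize` is hypothetical and this is not Minim's reader.

```python
def tokenize(src):
    """Yield (token, row, col) triples; rows and columns are 1-based."""
    row, col, i = 1, 1, 0
    while i < len(src):
        ch = src[i]
        if ch == "\n":                       # (a) a newline resets the column
            row, col, i = row + 1, 1, i + 1
        elif ch.isspace():                   # (b) other whitespace is skipped
            col, i = col + 1, i + 1
        elif ch == ";":                      # comment: skip to end of line
            while i < len(src) and src[i] != "\n":
                i += 1
        elif ch in "()[]'":                  # delimiters and Lisp-style quote
            yield ch, row, col
            col, i = col + 1, i + 1
        else:                                # (c) strings and ordinary words
            start = i
            if ch == '"':
                i += 1
                while i < len(src) and src[i] != '"':
                    i += 1
                i += 1                       # consume the closing quote
            else:
                while (i < len(src) and not src[i].isspace()
                       and src[i] not in "()[];"):
                    i += 1
            yield src[start:i], row, col
            col += i - start
```

Because positions are recorded as tokens are produced, nothing has to be munged beforehand, and each token can later be attached to a syntax object for error reporting.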

Nevertheless, the results have been impressive. Backtraces from errors are quite detailed and they print out “stack frames” with the following format: : . Its crudeness will definitely be a problem in the future, but for now it works.

Problems to Come

The most broken part of Minim is the blurry distinction between owners and references, and the lack of separation between mutable and immutable objects.

The first problem borrows a concept from C++. In brief, certain objects are the original owners of their information. It’s better to pass a reference of that object to a function rather than the entire copy since it takes less space and is less cumbersome than pointers that are dominant in C. Minim also implements this system, since it’s less resource intensive to not copy lists every time we use them for read-only purposes. However, without a static type system, the use for this is more subtle. Every built-in function in Minim needs to have two separate cases for owners and references, and this parallel strategy causes many issues. It’s not ideal, but it seems to work.

The second problem is a distinction that Racket makes clear: there are immutable objects and there are mutable objects. In Racket, the two different types of objects each have their own set of procedures. Initially, I chose not to care since it seemed cumbersome to have two sets of procedures. Invoking an in-place update of any hash table seemed reasonable, but it’s problematic with function calls and references. There are a number of examples that I can think of that will break Minim.

Conclusion and Future Work

Although I spent much of this blog talking about what is bad, there has been a lot of good. Most importantly, I’ve learned quite a bit about developing something as complex as a “lightweight” programming language. As of the time I’m writing this, the repository is well over 10,000 lines of code and the language contains over 120 procedures and numerous types like lists, strings, hash tables, vectors, and more.

In the next update, there will be considerably more procedures in the “standard library” I’ve been developing. They will mostly include math functions like gcd and lcm. More list procedures are a must since lists form the backbone of any Lisp/Scheme language. Additionally, I need to resolve the issues mentioned above as well as making procedures proper closures (storing the environment from which they were created).

This blog was long-winded mostly because there was a lot to talk about. I hope to write more about Minim in the near future as it develops from the small fledgling it is today to a language that is full and robust. Stay tuned for more.

Publishing the “generic-flonum” package

2021-01-21T00:00:00+00:00

As a side effect of recent work, I created an alternate MPFR interface in Racket. I mentioned in the previous post that I was planning on extracting that code into a package for public use. As of today, that library has officially been cleaned up, documented, and published in the Racket Package Index. To try it out, install Racket and run raco pkg install generic-flonum. Here is an excerpt from the documentation.

While the math/bigfloat interface is sufficient for most high-precision computing, it is lacking in a couple areas. Mainly, it does not properly emulate subnormal arithmetic or allow the exponent range to be changed.

Normally, neither of these problems causes concern. For example, if a user intends to find an approximate value for some computation on the reals, then subnormal arithmetic or a narrower exponent range is not particularly useful. However, if a user wants to know the result of a computation specifically in some format, say half-precision, then math/bigfloat is insufficient.

At half-precision, (exp -10) and (exp 20) evaluate to 4.5419e-05 and +inf.0, respectively. On the other hand, evaluating (bfexp (bf -10)) and (bfexp (bf 20)) with (bf-precision 11) returns (bf "4.5389e-5") and (bf "#e4.8523e8"). While the latter results are certainly more accurate, they do not reflect proper behavior in half-precision. The standard bigfloat library does not subnormalize the first result (no subnormal arithmetic), nor does it recognize the overflow in the second result (fixed exponent range).
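The two behaviors can be reproduced outside of Racket. The sketch below uses Python's standard struct module, whose "e" format is IEEE 754 binary16, to get the half-precision results, and a hypothetical helper `round_to_bits` to mimic "11 significand bits, unbounded exponent". Both helpers round a double-precision value rather than computing exp directly in the target format, so this is an approximation of the technique, not of generic-flonum itself.

```python
import math
import struct

def round_to_half(x):
    """Round a float to IEEE 754 binary16, with subnormals and overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):   # magnitude too large for binary16
        return math.copysign(math.inf, x)

def round_to_bits(x, bits):
    """Round to `bits` significand bits, ignoring any exponent limits."""
    m, e = math.frexp(x)                    # x == m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * 2**bits) / 2**bits, e)

small, large = math.exp(-10), math.exp(20)
half_small = round_to_half(small)           # subnormal in binary16
half_large = round_to_half(large)           # overflows to infinity
wide_small = round_to_bits(small, 11)       # keeps all 11 bits
wide_large = round_to_bits(large, 11)       # stays finite
```

Here `half_small` is 762 × 2^-24 (about 4.5419e-05) because only ten bits are available below the smallest normal value, whereas `wide_small` is 1523 × 2^-25 (about 4.5389e-05).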

This library fixes the issues mentioned above by automatically emulating subnormal arithmetic when necessary and providing a way to change the exponent range. In addition, the interface is quite similar to math/bigfloat, so it will feel familiar to anyone who has used the standard bigfloat library before. There are also a few extra operations from the C math library such as gflfma, gflmod, and gflremainder that the bigfloat library does not support.

See math/bigfloat for more information on bigfloats.

To read more of the documentation, please visit here. The source code for the package can be found at this repository. In the future, I plan on integrating this into the FPBench reference interpreter, so we can finally emulate subnormal arithmetic for various floating-point formats correctly. This has been an outstanding issue for a long time.

First Entry

2020-11-11T00:00:00+00:00

Note: This is my first entry for my blog, a compendium of my thoughts, articles, references, etc. Is blog even the correct word? I’m still trying to figure out what exactly this will be. Hopefully, I’ll be able to write here often.

Life Update: I’m currently in the first quarter of my sophomore year and things are going fairly well considering the state of the world. My work in Herbie has started moving much faster with my exploration of multi-precision expressions and cost/accuracy Pareto curves. Today, I began work on a generic IEEE-754 floating-point plugin for Herbie which will be very useful in the future.

One of the problems I had run into was the lack of subnormals in Racket’s bigfloat library - the language’s MPFR interface. Fortunately, the FFI procedures buried in the math library proved to be quite useful, so now I have a parallel set of operations that correctly handles subnormals and checks the range on the result. I emphasize the fact that it is parallel, since I still have access to the original, arguably more practical, set of operations that allow floats to have exponents somewhere on the order of 2^15 and ignore subnormals entirely. On a side note - it would be excellent to see support for limiting the exponent size and doing subnormal arithmetic in Racket’s library; however, I may be the only person in the world currently in need of such features, so… maybe not?

Another problem is the mapping of ordinals since such procedures are done on the Racket side of things, but they have no regard for subnormal numbers and seem to be causing issues on the Herbie side of things. Now that I think about it, it seems that all my problems have to do with subnormal floating point numbers. Go figure.

I see some possibility in separating out this new set of MPFR bindings and creating a generic floating point package that allows a user to specify a float with a given significand and exponent. It would be useful in FPBench since some of the tooling is very broken.

On a separate note: My website is quite new. It’s only been up for one week!! I’ve been working on it recently, and things are looking good. I began using Racket to generate my html pages, so everything is much easier to work with.

Anyways, that’s all my thoughts for today. Maybe next time I’ll write something more technical rather than rambly… Stay tuned for more!!