Rust Creator Graydon Hoare Recounts the History of Compilers

Key Summary

Overview

Event: On March 26, Graydon Hoare, creator of Rust, spoke to University of British Columbia students in an introductory compiler construction class about the history of compilers.
Purpose: To demystify the gap between academic projects and industrial compilers, encouraging students to explore compiler design as a career.

Key Points from the Talk

Compiler Examples:
1. Clang: 2M lines of C++ (800k for Clang, 1.2M for LLVM), multi-organization team, permissive licensing, fast code, good diagnostics.
2. Swiftc: 530k lines of C++ plus 2M lines of Clang/LLVM, supports Swift compilation.
3. Rustc: 360k lines of Rust plus 1.2M LLVM lines, originally Mozilla-driven; Hoare led initial development.
4. GCC: 2.2M lines (mostly C/C++, 600k Ada), since 1987, large multi-organization team, generates fast code.
5. V8: 660k lines, JavaScript JIT compiler for Chrome/Node, balances runtime performance and compile time.
6. Chez Scheme: 87k lines, single-developer project, uses 27 intermediate representations (IRs).
7. Poly/ML: 44k lines, a Standard ML implementation that supports multicore hardware.
8. TREE-META: 184-line metacompiler from a 1967 U.S. Air Force project.
Historical Context:
1. 1940s: ENIAC programming via rewiring; Jean Bartik’s team stored instructions in memory.
2. 1949: High-level pseudo codes with software interpreters emerged.
3. 1950s: Grace Hopper’s A-0 System for UNIVAC, the first compiler, converted pseudo-code to machine language.
4. 1970s: Frances E. Allen co-authored “A Catalogue of Optimizing Transformations” with John Cocke, detailing key optimizations (inline, unroll and vectorize, CSE, DCE, code motion, constant fold, peephole); per Hoare, many compilers do just these and reach about 80% of best-case performance.
Compiler Tradeoffs:
1. Size and Complexity: Large compilers (e.g., GCC, Clang) justify high development costs with benefits like runtime performance, diagnostics, and hardware support, often using verbose languages for compatibility and familiarity.
2. Optimization Costs: Per Proebsting’s Law, compiler advances lag behind hardware (doubling compute power every 18 years vs. 18 months for transistors). Over-optimization can lead to excessive memory use, development effort, or slow compilation.
3. Language Impact: Proebsting’s Law is less applicable to languages with more abstractions but truer for low-level languages.
Smaller and Specialized Compilers:
1. Explored compilers like Glasgow Haskell Compiler, Franz Lisp, Manx Aztec C, 8cc, CakeML, Roslyn, Pharo/Cog, and Eclipse Compiler for Java.
2. Highlighted Mesa (1976–1981, Xerox PARC), a favorite for its influence.
3. JonesForth: 1,490 lines of Forth, 692-instruction virtual machine, showcasing minimal compiler design.
Interpreters and Metacompilers:
1. Discussed interplay between compilers and interpreters, citing Xavier Leroy: bytecode interpreters offer 1/4 the performance of native-code compilers at 1/20 the cost.
2. Truffle/Graal: An interpreter library yielding a compiler “for free” via partial evaluation; Hoare called it “seriously competitive” and a potential future Oracle JVM.
3. Some compilers only compile specific functions, leaving others to interpreters.
Inspiration and Advice:
1. Hoare emphasized the diversity of 8,945 programming languages (per the Online Historical Encyclopaedia of Programming Languages).
2. Encouraged students to study past and present languages, explore compiler design, and “pick a future you like.”
3. Acknowledged omissions (e.g., Fortran, Algol, Cobol) due to time constraints.

Key Takeaways

Compilers vary widely in size, purpose, and complexity, from 184-line metacompilers to multi-million-line industrial tools.
Historical milestones (e.g., Hopper’s A-0, Allen’s optimizations) shaped modern compiler design.
Tradeoffs in optimization, language choice, and development effort are critical in compiler construction.
Hoare’s talk aimed to inspire curiosity and reduce intimidation, showing compiler design as an accessible, impactful field.

Balancing Costs and Benefits

“Compilers get big because the development costs are seen as justified by the benefits, at least to the people paying the bills,” Hoare explained, citing desired goals like better runtime performance and developer productivity (from things like diagnostics tools), as well as exploiting the capabilities of new hardware. The final bullet on that slide adds that some compilers are written in “verbose” languages “for all the usual reasons (compatibility, performance, familiarity).”

The rest of the talk explored how those tradeoffs can be made, and whether they should be.

“In some contexts, ‘all the optimizations’ is too much,” explained one slide. “If you try to write a compiler performing every optimization, you’ll end up using too much memory or creating a compiler requiring far too much effort to develop and maintain — or that takes too long to compile!”

Hoare reminded the students of Proebsting’s Law, a sarcastic riff by University of Arizona computer science professor Todd A. Proebsting that posits advances in compilers will double our computing power every 18 years — an eternity compared to the 18 months it takes for chip manufacturers to double the number of transistors on their processors (“Moore’s Law”).

Hoare’s own take? Proebsting’s Law is less true if a language has more abstractions to eliminate — but unfortunately, it’s truer for lower-level languages.
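To make the gap in Proebsting’s Law concrete, the two doubling periods can be turned into yearly growth rates. The snippet below is only a back-of-the-envelope sketch: the function names are illustrative, and it follows the law’s own framing by treating both doublings as if they measured the same thing (delivered compute power).

```python
def annual_growth(doubling_period_years: float) -> float:
    """Yearly growth rate implied by a fixed doubling period."""
    return 2 ** (1 / doubling_period_years) - 1


def total_gain(doubling_period_years: float, span_years: float) -> float:
    """Cumulative multiplier after span_years of steady doubling."""
    return 2 ** (span_years / doubling_period_years)


# Doubling periods as stated in the text: 18 years vs. 18 months.
compiler_rate = annual_growth(18.0)
hardware_rate = annual_growth(1.5)

# Cumulative effect over one 18-year "compiler doubling".
compiler_gain = total_gain(18.0, 18.0)
hardware_gain = total_gain(1.5, 18.0)

print(f"compilers: {compiler_rate:.1%} per year, x{compiler_gain:.0f} in 18 years")
print(f"hardware:  {hardware_rate:.1%} per year, x{hardware_gain:.0f} in 18 years")
```

Under these assumptions, compiler advances contribute roughly 3.9% a year against roughly 58.7% a year from hardware, which is a factor of 2 versus a factor of 4,096 over the same 18 years.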

Wandering Through Weirder Landscapes

Hoare also examined the smaller (660,000 lines of code) V8, the just-in-time JavaScript compiler in both Chrome and Node, which he describes as “always adjusting for the sweet spot of runtime performance vs. compile time.”

The Chez Scheme compiler uses 27 different IRs (a compiler’s internal “intermediate representation” structures) but is just 87,000 lines. Hoare added that it’s mostly a single-developer project, something its relatively small codebase makes possible. And the compiler for Poly/ML (an implementation of Standard ML that supports multicore hardware) is just 44,000 lines. Eventually, his presentation arrived at the 184-line TREE-META metacompiler from a 1967 U.S. Air Force research project at the Stanford Research Institute’s Augmentation Research Center.

The “wander through a weird landscape” continued, with Glasgow Haskell Compiler, Franz Lisp, Manx Aztec C, and 8cc. There’s CakeML, Roslyn, Pharo/Cog, and the Eclipse Compiler for Java. There’s a slide for the compiler for the “highly-influential” language Mesa (which he notes is one of his favorites) developed at Xerox PARC between 1976 and 1981.

And that led him to a discussion about how compilers interact with interpreters — and a quick history of computers.

It starts with the 1940s-era ENIAC, where “programming” actually involved rewiring the machine until a team led by Jean Bartik began storing instructions in memory. 1949 saw the arrival of high-level pseudo codes with software interpreters, and soon Grace Hopper was converting pseudo-code directly into machine language for the UNIVAC with her A-0 System, which was the first compiler.

Hoare also reminded the students of the pioneering work of Frances E. Allen, whose 45-year career at IBM included work on the compiler-optimization team for IBM’s “Harvest” supercomputer, installed at the National Security Agency.

February 1962 image of IBM HARVEST computer

In the early 1970s she co-authored “A Catalogue of Optimizing Transformations” with John Cocke, a paper that aimed to “systematize the potpourri of optimizing transformations that a compiler can make to a program,” describing these optimizations in detail:

Inline
Unroll (and vectorize)
CSE (common subexpression elimination)
DCE (dead code elimination)
Code Motion
Constant Fold
Peephole

Hoare added that many compilers do just these eight things (counting unrolling and vectorizing separately) and get about 80% of best-case performance.
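Two of the transformations in that list, constant folding and dead code elimination, are small enough to sketch directly. The example below is a minimal illustration, not taken from the talk or the paper: the tiny expression types (`Num`, `Var`, `BinOp`) and the straight-line program format are assumptions made up for this sketch, and expressions are assumed to be pure (no side effects).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def fold(e: Expr) -> Expr:
    """Constant folding: evaluate operators whose operands are both constants."""
    if isinstance(e, BinOp):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[e.op](left.value, right.value))
        return BinOp(e.op, left, right)
    return e


def free_vars(e: Expr) -> set:
    """Names of the variables an expression reads."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, BinOp):
        return free_vars(e.left) | free_vars(e.right)
    return set()


def eliminate_dead(stmts, live_out):
    """Dead code elimination over straight-line assignments.

    Walks backwards and keeps an assignment only if its target is
    still needed by something later (or by the program's outputs).
    """
    live = set(live_out)
    kept = []
    for target, expr in reversed(stmts):
        if target in live:
            live.discard(target)
            live |= free_vars(expr)
            kept.append((target, expr))
    kept.reverse()
    return kept


program = [
    ("a", BinOp("*", Num(2), Num(3))),      # a = 2 * 3
    ("b", BinOp("+", Var("a"), Var("x"))),  # b = a + x
    ("c", BinOp("-", Var("x"), Num(1))),    # c = x - 1, never used
]
optimized = eliminate_dead([(t, fold(e)) for t, e in program], live_out={"b"})
```

Here `a` is folded to the constant 6 and the unused assignment to `c` is dropped, leaving only the two statements needed to compute `b`.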

A Grand Finale

Hoare touched on metacompilers and discussed the tradeoffs of doing compilation versus interpretation with an appropriate quote from Xavier Leroy, a primary developer on OCaml. “As a cheap implementation device, bytecode interpreters offer 1/4 of the performance of optimizing native-code compilers at 1/20 of the implementation cost.”
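Part of why a bytecode interpreter is cheap to implement is that its core is a single dispatch loop. The following is a minimal sketch of such a loop for a stack machine; the instruction set (`PUSH`, `ADD`, `MUL`, `JUMP_IF_ZERO`, `HALT`) is invented for this example and does not correspond to OCaml’s or any other real bytecode.

```python
def run(code):
    """Execute a list of (opcode, *operands) tuples on a stack machine."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JUMP_IF_ZERO":
            # Pop the condition; jump to the target index if it is zero.
            if stack.pop() == 0:
                pc = args[0]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
arithmetic = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]

# A zero condition skips the instruction at index 2 and lands on index 3.
branching = [("PUSH", 0), ("JUMP_IF_ZERO", 3), ("PUSH", 111), ("PUSH", 7), ("HALT",)]
```

Calling `run(arithmetic)` leaves `[20]` on the stack and `run(branching)` leaves `[7]`. Every instruction pays for a tuple unpack and a chain of comparisons, which is the kind of overhead a native-code compiler removes.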

He also included a pithy observation about Truffle/Graal, an open-source library for building interpreters: “Write an interpreter with some machinery to help the partial evaluator, get a compiler for free.” The project is now maintained by Oracle, and Hoare called it “seriously competitive! Potential future Oracle JVM.”
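The idea of getting a compiler out of an interpreter can be illustrated by specializing an interpreter to one fixed program. The sketch below does that specialization by hand, turning each node of a program into a closure ahead of time; a partial evaluator aims to derive this kind of residual code automatically. It does not use or imitate the Truffle API, and the tuple-based expression format and function names are assumptions made for this example.

```python
def interpret(expr, env):
    """Plain tree-walking interpreter: re-inspects the program on every run."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return interpret(expr[1], env) + interpret(expr[2], env)
    if kind == "mul":
        return interpret(expr[1], env) * interpret(expr[2], env)
    raise ValueError(f"unknown node: {kind}")


def specialize(expr):
    """Do all the dispatch on node kinds once, up front.

    Returns a function of env only; the program itself has been
    "compiled away" into nested closures.
    """
    kind = expr[0]
    if kind == "num":
        value = expr[1]
        return lambda env: value
    if kind == "var":
        name = expr[1]
        return lambda env: env[name]
    if kind == "add":
        left, right = specialize(expr[1]), specialize(expr[2])
        return lambda env: left(env) + right(env)
    if kind == "mul":
        left, right = specialize(expr[1]), specialize(expr[2])
        return lambda env: left(env) * right(env)
    raise ValueError(f"unknown node: {kind}")


# 2 * x + 1
program = ("add", ("mul", ("num", 2), ("var", "x")), ("num", 1))
compiled = specialize(program)
```

Both `interpret(program, {"x": 5})` and `compiled({"x": 5})` evaluate to 11, but the specialized version no longer examines node kinds at run time.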

Some compilers only compile some functions, leaving the rest to be handled by the interpreter.
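One common way to decide which functions get compiled is to count calls and compile only the "hot" ones. The sketch below is a toy version of that policy under stated assumptions: the `HOT_THRESHOLD` value, the `MixedModeFunction` class, and the single-variable expression format are all hypothetical, and "compiling" here just means generating Python source and handing it to the built-in `compile` function.

```python
HOT_THRESHOLD = 3  # hypothetical: calls before a function counts as hot


def interpret(expr, x):
    """Slow path: walk the expression tree on every call."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return x
    if kind == "add":
        return interpret(expr[1], x) + interpret(expr[2], x)
    if kind == "mul":
        return interpret(expr[1], x) * interpret(expr[2], x)
    raise ValueError(f"unknown node: {kind}")


def to_source(expr):
    """Translate the expression tree into Python source text."""
    kind = expr[0]
    if kind == "num":
        return repr(expr[1])
    if kind == "var":
        return "x"
    if kind in ("add", "mul"):
        symbol = "+" if kind == "add" else "*"
        return f"({to_source(expr[1])} {symbol} {to_source(expr[2])})"
    raise ValueError(f"unknown node: {kind}")


class MixedModeFunction:
    """Interpret until hot, then switch to a compiled version."""

    def __init__(self, expr):
        self.expr = expr
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            source = f"lambda x: {to_source(self.expr)}"
            self.compiled = eval(compile(source, "<jit>", "eval"))
        return interpret(self.expr, x)


# 2 * x + 1
f = MixedModeFunction(("add", ("mul", ("num", 2), ("var", "x")), ("num", 1)))
results = [f(i) for i in range(5)]
```

A function called only once or twice never pays the compilation cost, while `f` above switches to the compiled path after its third call and returns the same results either way.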

I missed lots of things. Only 60 minutes, sadly. I also skipped Fortran, Algol, Cobol, PL/I, Simula, everything related to HPC, databases, array languages, Clu, Dylan, Lustre, Mumps, Basic, Eiffel, lots I’d have loved to have time to cover. Had to pick, sorry!

— Graydon Hoare (@graydon_pub) March 28, 2019

For his grand finale, he showed the audience JonesForth, one developer’s educational implementation of Forth with a 692-instruction virtual machine and 1,490 lines of Forth for its compiler, debugger, and read-eval-print loop. “Forth, like Lisp, is nearly virtual machine code at the input,” he told the audience.
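The remark about Forth being close to virtual machine code can be seen in how little work a Forth-style evaluator has to do: each whitespace-separated token is either a number to push or a word to run. The following is a rough sketch of that idea only. It is not JonesForth and does not follow its design; the handful of primitive words and the handling of `:` and `;` are simplified assumptions.

```python
def run_forth(source):
    """Evaluate a tiny Forth-like program and return the final stack."""
    stack = []
    words = {}  # user-defined words: name -> list of tokens
    primitives = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),
        "drop": lambda s: s.pop(),
    }

    def execute(tokens):
        stream = iter(tokens)
        for token in stream:
            if token == ":":
                # ": name body... ;" defines a new word.
                name = next(stream)
                body = []
                for inner in stream:
                    if inner == ";":
                        break
                    body.append(inner)
                words[name] = body
            elif token in words:
                execute(words[token])
            elif token in primitives:
                primitives[token](stack)
            else:
                stack.append(int(token))  # anything else must be a number

    execute(source.lower().split())
    return stack


result = run_forth(": square dup * ; 3 square 4 +")
```

The program defines `square`, applies it to 3, and adds 4, leaving `[13]` on the stack. There is no separate parser or syntax tree: the token stream is executed almost directly.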

Hoare’s appreciation for language design is evident, and he left the students with an inspiring parting message. “There have been a lot of languages,” he said, citing the 8,945 identified by the Online Historical Encyclopaedia of Programming Languages dating back to the 18th century.

“Go study them: past and present! Many compilers are possible!” he urged the students. “Pick a future you like!”

Anh Hoang

Anh Hoang is Head of SEO Optimization at InApps Technology, ensuring that the message and research of InApps Technology reach the most people possible while adhering to our strict journalistic standards of excellence and integrity.