C--
C-- (pronounced see minus minus) is a C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-level languages rather than written by human programmers. Unlike many other intermediate languages, its representation is plain ASCII text, not bytecode or another binary format.[1][2]
Paradigm | imperative |
---|---|
Designed by | Simon Peyton Jones and Norman Ramsey |
First appeared | 1997 |
Typing discipline | static, weak |
Website | https://www.cs.tufts.edu/~nr/c--/index.html |
Influenced by | C |
There are two main branches of C--. One is the original C-- branch, with the final version 2.0 released in May 2005.[3] The other is the Cmm fork actively used by the Glasgow Haskell Compiler as its intermediate representation.[4]
Design
C-- is a "portable assembly language", designed to ease the task of implementing a compiler that produces high-quality machine code. The compiler generates C-- code and delegates the harder work of low-level code generation and optimisation to a C-- compiler.
Work on C-- began in the late 1990s. Writing a custom code generator is a challenge in itself, and the compiler back ends available to researchers at that time were complex and poorly documented, so several projects had written compilers that generated C code (for instance, the original Modula-3 compiler). However, C is a poor choice for functional languages: it does not guarantee tail-call optimization, and it does not support accurate garbage collection or efficient exception handling. C-- is a simpler, tightly defined alternative to C that does support all of these things. Its most innovative feature is a run-time interface that allows portable garbage collectors, exception-handling systems and other run-time features to be written once and used with any C-- compiler.
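To make the tail-call problem concrete, consider mutually recursive `is_even`/`is_odd` functions, which a functional language would express as tail calls. Translated naively into C, each call may consume a stack frame, because C compilers are not required to eliminate tail calls. A common workaround for compilers that target C is a *trampoline*: each function returns a description of the next call instead of making it, and a driver loop performs the calls. The sketch below is a generic illustration of that technique, not code from the C-- papers or from any particular compiler; all names (`step`, `tail_call`, `run_trampoline`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One trampoline step: either a finished result or the next function to call. */
struct step;
typedef struct step (*step_fn)(unsigned long n);
struct step {
    bool done;       /* true once a final result is available */
    bool result;     /* valid only when done */
    step_fn next;    /* valid only when !done */
    unsigned long n; /* argument to pass to next */
};

static struct step finish(bool result) {
    struct step s = { true, result, NULL, 0 };
    return s;
}

/* Instead of tail-calling f(n) directly, return a request to call it. */
static struct step tail_call(step_fn f, unsigned long n) {
    struct step s = { false, false, f, n };
    return s;
}

static struct step is_odd_step(unsigned long n);

static struct step is_even_step(unsigned long n) {
    return n == 0 ? finish(true) : tail_call(is_odd_step, n - 1);
}

static struct step is_odd_step(unsigned long n) {
    return n == 0 ? finish(false) : tail_call(is_even_step, n - 1);
}

/* Driver loop: the C stack depth stays constant no matter how many
   "tail calls" are executed, because each step returns before the next runs. */
static bool run_trampoline(step_fn f, unsigned long n) {
    assert(f != NULL);
    struct step s = tail_call(f, n);
    while (!s.done)
        s = s.next(s.n);
    return s.result;
}

static bool is_even(unsigned long n) {
    return run_trampoline(is_even_step, n);
}
```

Calling `is_even(10000000UL)` completes in constant stack space, but every step pays for a struct return and an indirect call through the driver loop. That overhead is the kind of cost a language with guaranteed tail calls, such as C--, lets a front end avoid.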
The language's syntax borrows heavily from C. It omits or changes standard C features such as variadic functions, pointer syntax, and aspects of C's type system, because they hamper certain essential features of C-- and the ease with which code-generation tools can produce it.
The name of the language is an in-joke: C-- is a reduced form of C, just as C++ is an expanded form of C. (In C-like languages, "--" and "++" are the decrement and increment operators.)
The first version of C-- was released in April 1998 as an MSRA paper,[1] accompanied by a January 1999 paper on garbage collection.[2] A revised manual was posted in HTML form in May 1999.[5] Two sets of major changes proposed in 2000 by Norman Ramsey ("Proposed Changes") and Christian Lindig ("A New Grammar") led to C-- version 2, which was finalized around 2004 and officially released in 2005.[3]
Type system
The C-- type system is deliberately designed to reflect constraints imposed by hardware rather than conventions imposed by higher-level languages. In C--, a value stored in a register or memory may have only one type: bit vector. However, bit vector is a polymorphic type and may come in several widths, e.g., bits8, bits32, or bits64. A separate family of 32- or 64-bit floating-point types is also supported. In addition to the bit-vector type, C-- provides a Boolean type bool, which can be computed by expressions and used for control flow but cannot be stored in a register or in memory. As in an assembly language, any higher type discipline, such as the distinctions between signed, unsigned, float, and pointer, is imposed by the C-- operators or other syntactic constructs in the language.
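The idea that "type" lives in the operator rather than in the value can be sketched in C by treating every value as a raw 32-bit vector and letting separate functions decide how to interpret it. This is an illustrative analogy in C, not C-- syntax: `bits32` is a typedef chosen to echo the C-- type name, and helpers such as `div_signed` and `lt_unsigned` are hypothetical stand-ins for C-- operators. Comparisons return `int`, since C has no equivalent of a Boolean that cannot be stored. The float reinterpretation assumes `float` is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Every value is just a 32-bit vector; nothing records signedness or floatness. */
typedef uint32_t bits32;

_Static_assert(sizeof(float) == sizeof(bits32), "assumes 32-bit float");

/* Read the bits as two's complement without relying on the
   implementation-defined unsigned-to-signed conversion. */
static int32_t as_signed(bits32 x) {
    return x <= (bits32)INT32_MAX ? (int32_t)x : -(int32_t)(bits32)~x - 1;
}

/* Unsigned division: the operator treats the bits as a non-negative integer. */
static bits32 div_unsigned(bits32 a, bits32 b) {
    assert(b != 0);
    return a / b;
}

/* Signed division on the same bits (truncates toward zero, as in C99+). */
static bits32 div_signed(bits32 a, bits32 b) {
    assert(b != 0);
    assert(!(a == 0x80000000u && b == 0xFFFFFFFFu)); /* INT32_MIN / -1 overflows */
    return (bits32)(as_signed(a) / as_signed(b));
}

/* Two comparisons that can disagree on identical operands. */
static int lt_unsigned(bits32 a, bits32 b) { return a < b; }
static int lt_signed(bits32 a, bits32 b) { return as_signed(a) < as_signed(b); }

/* Reinterpret the same bits as a floating-point value. */
static float as_float(bits32 x) {
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

With `x = 0xFFFFFFF0`, `div_unsigned(x, 2)` yields `0x7FFFFFF8` while `div_signed(x, 2)` yields `0xFFFFFFF8` (that is, -8), and `lt_unsigned(x, 1)` is false while `lt_signed(x, 1)` is true: the bits never change, only the operator's interpretation does.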
C-- version 2 removes the distinction between bit-vector and floating-point types. Instead, programmers may annotate these types with a string "kind" tag that distinguishes, among other things, whether a variable holds an integer or a floating-point value and how it is stored (global or local). The integer/float distinction is useful on targets that have separate registers for integer and floating-point values. In addition, special types for pointers and for the native word are introduced, although they merely map to a bit vector of target-dependent length.[3]:10 C-- is not type-checked, and it neither enforces nor checks the calling convention.[3]:28
Implementations
The specification page of C-- lists a few implementations of C--. The "most actively developed" compiler, Quick C--, was abandoned in 2013.[6]
Haskell
A C-- dialect called Cmm is the intermediate representation for the Glasgow Haskell Compiler.[7] GHC backends are responsible for further transforming C-- into executable code, via LLVM IR, via a slower C backend, or directly through the built-in native backend.[8]
Some of the developers of C--, including Simon Peyton Jones, João Dias, and Norman Ramsey, work or have worked on the Glasgow Haskell Compiler. Work on GHC has also led to extensions in the C-- language, forming the Cmm dialect. Cmm uses the C preprocessor for ergonomics.[4]
Despite the original intention, GHC does perform many of its generic optimizations on C--. As with other compiler IRs, GHC can dump the C-- representation for debugging.[9] Target-specific optimizations are performed later by the backend.
References
- Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1998-04-23). "The C-- Language Reference Manual".
- Reig, Fermin; Ramsey, Norman; Jones, Simon Peyton (1999-01-01). "C--: a portable assembly language that supports garbage collection".
- Ramsey, Norman; Jones, Simon Peyton. "The C-- Language Specification, Version 2.0" (PDF). Retrieved 11 December 2019.
- "GHC Commentary: What the hell is a .cmm file?"
- Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1999-05-23). "The C-- Language Reference Manual".
- "C-- Downloads". www.cs.tufts.edu. Retrieved 11 December 2019.
- "An improved LLVM backend".
- "GHC Backends".
- "Debugging compilers with optimization fuel".
External links
- Archive of old official website (cminusminus.org)
- Quick C-- code archive (the reference implementation)