Off-side rule
A computer programming language is said to adhere to the off-side rule if blocks in that language are expressed by their indentation.[1][2] The term was coined by Peter J. Landin, possibly as a pun on the offside rule in football. This is contrasted with free-form languages, notably curly-bracket programming languages, where indentation is not meaningful and indent style is only a matter of convention and code formatting.
Definition
Peter J. Landin, in an article called "The Next 700 Programming Languages", defined the off-side rule thus: "Any non-whitespace token to the left of the first such token on the previous line is taken to be the start of a new declaration."[3]
Code examples
The following is an example of indentation blocks in Python. The colons are part of the Python language syntax for readability;[4] they are not necessary to implement the off-side rule.
In Python, the rule is taken to define the boundaries of statements rather than declarations.
```python
def is_even(a: int) -> bool:
    """Determine whether number 'a' is even."""
    if a % 2 == 0:
        print('Even!')
        return True
    print('Odd!')
    return False
```
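Because indentation is part of Python's syntax, two sources that differ only in how one line is indented can mean different things. A dedent to a column that matches no enclosing block is rejected outright. The sketch below shows both cases using Python's built-in exec and compile functions. The helpers run and is_rejected, and the sample programs, are hypothetical illustrations.

```python
# Two programs that differ only in how far the last line is indented.
INSIDE = "total = 0\nfor i in range(3):\n    total += i\n    total += 100\n"
OUTSIDE = "total = 0\nfor i in range(3):\n    total += i\ntotal += 100\n"

# A dedent to a column (2) that matches no enclosing block (0 or 4).
BAD = "if True:\n    x = 1\n  y = 2\n"


def run(source: str) -> int:
    """Hypothetical helper: execute source in a fresh namespace, return 'total'."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["total"]


def is_rejected(source: str) -> bool:
    """Hypothetical helper: True if compiling fails because of indentation."""
    try:
        compile(source, "<demo>", "exec")
    except IndentationError:
        return True
    return False


print(run(INSIDE))       # 303: the "+= 100" line is part of the loop body
print(run(OUTSIDE))      # 103: the "+= 100" line runs once, after the loop
print(is_rejected(BAD))  # True
```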
Python also suspends the off-side rule within brackets. A statement within brackets continues until its brackets match (or mismatch):
```python
{
    "this": True,
    "that": False,
    "them": [1, 2,
3, 4, 5,
6, 7, 8,
9, 10, 11, 23]
}
```
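In this dictionary, the keys are indented, and a list is split across four lines. Because the list is still inside brackets, the indentation of its continuation lines has no effect.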
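You can observe this suspension with the standard library's tokenize module. Inside brackets the tokenizer emits no INDENT or DEDENT tokens, and line breaks come out as NL (a non-logical newline) rather than NEWLINE. The following minimal sketch uses an abbreviated, deliberately inconsistently indented version of the dictionary above. The helper token_names is hypothetical. tokenize.generate_tokens and token.tok_name are standard-library names.

```python
import io
import token
import tokenize

# Continuation lines inside the brackets are indented inconsistently on purpose.
SOURCE = '''data = {
    "this": True,
    "them": [1, 2,
3, 4, 5,
        6, 7, 8]
}
'''


def token_names(source: str) -> list[str]:
    """Hypothetical helper: the token type names tokenize produces for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [token.tok_name[tok.type] for tok in tokens]


names = token_names(SOURCE)
print("INDENT" in names, "DEDENT" in names)  # False False
print(names.count("NEWLINE"))                # 1: the whole literal is one logical line
```

The line breaks inside the brackets still appear in the stream, but only as NL tokens. These tokens do not end the statement and do not affect the indentation stack.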
Implementation
The off-side rule can be implemented in the lexical analysis phase, as in Python. There, increasing the indentation causes the lexer to output an INDENT token, and decreasing it causes the lexer to output a DEDENT token (one for each indentation level closed).[5] These tokens correspond to the opening brace { and closing brace } in languages that use braces for blocks. As a result, the phrase grammar does not depend on whether braces or indentation are used. This requires the lexer to hold state, namely the current indentation level (in Python, a stack of the indentation levels still open), so that it can detect when the indentation changes. Consequently, the lexical grammar is not context-free: whether INDENT or DEDENT is emitted depends on the contextual information of the previous indentation level.
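The stateful bookkeeping described above can be approximated with a stack of indentation widths, in the spirit of the algorithm in Python's lexical-analysis documentation. This is a simplified sketch under stated assumptions, not CPython's tokenizer: layout_tokens, to_braces and the "LINE:" token format are hypothetical. Indentation is spaces only, and brackets, line continuations and comments are not handled. The to_braces helper makes the INDENT/DEDENT to { } correspondence explicit.

```python
def layout_tokens(source: str) -> list[str]:
    """Toy lexer pass that turns leading indentation into INDENT/DEDENT tokens.

    Simplifying assumptions: indentation uses spaces only, there are no
    brackets or line continuations, and blank lines are ignored. Each
    non-blank line becomes a single "LINE:<text>" token.
    """
    levels = [0]  # lexer state: stack of indentation widths still open
    out: list[str] = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines never open or close blocks
        width = len(raw) - len(raw.lstrip(" "))
        if width > levels[-1]:
            # Deeper than the enclosing block: open exactly one new block.
            levels.append(width)
            out.append("INDENT")
        else:
            # Same or shallower: close blocks until a matching level is on top.
            while width < levels[-1]:
                levels.pop()
                out.append("DEDENT")
            if width != levels[-1]:
                raise IndentationError(f"line {lineno}: dedent matches no outer level")
        out.append("LINE:" + raw.strip())
    # Close every block still open at the end of the input.
    out.extend("DEDENT" for _ in levels[1:])
    return out


def to_braces(tokens: list[str]) -> str:
    """Render the stream with INDENT as '{' and DEDENT as '}'."""
    parts = []
    for tok in tokens:
        if tok == "INDENT":
            parts.append("{")
        elif tok == "DEDENT":
            parts.append("}")
        else:
            parts.append(tok[len("LINE:"):])
    return " ".join(parts)


SOURCE = """def f(a):
    if a:
        return 1
    return 0
"""

print(layout_tokens(SOURCE))
print(to_braces(layout_tokens(SOURCE)))  # def f(a): { if a: { return 1 } return 0 }
```

A dedent that closes two levels at once pops two entries and yields two DEDENT tokens, just as two closing braces would appear in a brace language. The check against the top of the stack is what rejects a dedent to a column that was never opened.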
Alternatives
The primary alternative for delimiting blocks, popularized by C, is to ignore whitespace and mark blocks explicitly with curly brackets (i.e., { and }) or some other delimiter. This allows more freedom: the developer might choose not to indent small pieces of code such as the break and continue statements. However, sloppily indented code might lead the reader astray.
Lisp and other S-expression-based languages do not differentiate statements from expressions, and parentheses are enough to control the scoping of all statements within the language. As in curly bracket languages, whitespace is mostly ignored by the reader (i.e., the read function). Whitespace is used to separate tokens.[6] The explicit structure of Lisp code allows it to perform automatic indentation, which acts as a visual cue for human Lisp readers.
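The reader's indifference to layout can be sketched in a few lines of Python. The toy reader below builds the same nested structure from tidy and messy text. It then derives indentation from nesting depth alone, which is the kind of automatic indentation that explicit structure makes possible. The helpers tokenize_sexpr, read and pretty are hypothetical illustrations, not the Common Lisp read function. Atoms are kept as strings, and the layout chosen by pretty is deliberately naive.

```python
def tokenize_sexpr(text: str) -> list[str]:
    """Split S-expression text into tokens; whitespace only separates them."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: list[str]):
    """Build nested lists from tokens, consuming them from the front."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    if tok != "(":
        return tok  # an atom such as a symbol or number, kept as a string
    items = []
    while tokens and tokens[0] != ")":
        items.append(read(tokens))
    if not tokens:
        raise SyntaxError("missing ')'")
    tokens.pop(0)  # discard the matching ')'
    return items


def pretty(expr, depth: int = 0) -> str:
    """Lay out an expression using nothing but its nesting depth."""
    pad = "  " * depth
    if isinstance(expr, str):
        return pad + expr
    if not any(isinstance(e, list) for e in expr):
        return pad + "(" + " ".join(expr) + ")"  # flat lists stay on one line
    inner = "\n".join(pretty(e, depth + 1) for e in expr)
    return pad + "(" + inner.lstrip() + ")"  # first element follows the '('


messy = "(define (sq x)\n(*    x\n             x))"
tidy = "(define (sq x) (* x x))"

print(read(tokenize_sexpr(messy)) == read(tokenize_sexpr(tidy)))  # True
print(pretty(read(tokenize_sexpr(messy))))
```

Only the parentheses decide the structure. The messy and tidy inputs read identically, and the printed layout is recomputed from that structure rather than copied from the input.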
Another alternative is for each block to begin and end with explicit keywords. For example, in ALGOL 60 and its descendant Pascal, blocks start with the keyword begin and end with the keyword end. In some languages that use keywords this way (but not Pascal), newlines are significant (unlike in curly-bracket languages), but indentation is not.
In BASIC and Fortran, blocks begin with the block name (such as IF) and end with the block name prefixed by END (e.g., END IF). In Fortran, each block can also be given its own unique name, which adds a further level of explicitness to lengthy code. ALGOL 68 and the Bourne shell (sh and bash) are similar, but the end of a block is usually marked by the block's keyword written backward. For example, case starts a switch statement that spans until the matching esac, and conditionals take the form if ... then ... [elif ... [else ...]] fi. Loops are written for ... do ... od in ALGOL 68 and for ... do ... done in bash, where the terminator is not a reversed keyword.
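Because each closing keyword names the block it ends, a checker can report which block a stray terminator was meant for, something a bare closing brace or dedent cannot convey. The following is a minimal sketch with Bourne-shell pairs (if/fi, case/esac, do/done). It is a toy and not how sh or bash actually parses: splitting on whitespace and semicolons ignores quoting, comments and keyword position. PAIRS and first_block_error are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical table of Bourne-shell style pairs: opening keyword -> closing keyword.
PAIRS = {"if": "fi", "case": "esac", "do": "done"}
OPENER_OF = {close: open_ for open_, close in PAIRS.items()}


def first_block_error(script: str) -> Optional[str]:
    """Return a message for the first mis-nested block, or None if all match.

    Toy scanner: splits on whitespace and ';', so it ignores quoting,
    comments and whether a keyword is really in command position.
    """
    stack: list[str] = []
    for word in re.split(r"[\s;]+", script):
        if word in PAIRS:
            stack.append(word)
        elif word in OPENER_OF:
            wanted = OPENER_OF[word]
            if not stack:
                return f"'{word}' closes '{wanted}', but no block is open"
            if stack[-1] != wanted:
                return (f"'{word}' closes '{wanted}', but the innermost open block "
                        f"is '{stack[-1]}' (expected '{PAIRS[stack[-1]]}')")
            stack.pop()
    if stack:
        return f"'{stack[-1]}' is never closed (expected '{PAIRS[stack[-1]]}')"
    return None


good = "if test -f x; then for f in a b; do echo $f; done; fi"
bad = "if test -f x; then for f in a b; do echo $f; fi; done"
print(first_block_error(good))  # None
print(first_block_error(bad))
```

For the bad script, the message names both the block that fi tried to close and the block that is actually innermost. With anonymous closing braces or a run of dedents, a reader would have to backtrack to recover this information.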
An interesting variation on this is made by Modula-2, a Pascal-like language that does away with the difference between single-line and multiline blocks. This allows the block opener ({ or BEGIN) to be skipped for all but the function-level block, so that only a block-terminating token (} or END) is required. It also resolves the dangling else problem. By convention, the END token is placed at the same indentation level as the rest of the block, giving a very readable block structure.
One advantage of the Fortran approach is that it improves the readability of long, nested, or otherwise complex code. A group of outdents or closing brackets alone provides no contextual cues as to which blocks are being closed, which forces backtracking and closer scrutiny while debugging. Languages that allow a suffix on END-like keywords, such as continue versus continue for x, strengthen these cues further. However, modern code editors often mitigate these drawbacks with visual indicators and features such as code folding.
Off-side rule languages
Programming languages
- ABC
- Boo
- BuddyScript
- Cobra
- CoffeeScript
- Converge
- Curry
- Elixir (", do:" blocks)
- Elm
- F# (in early versions, when #light is specified; in later versions, when #light "off" is not[7])
- Genie
- Haskell[8] (only for where, let, do, or case ... of clauses when braces are omitted)
- Inform 7
- ISWIM, the abstract language that introduced the rule
- LiveScript
- Miranda
- MoonScript[9][10]
- Nemerle
- Nim
- occam
- PROMAL
- Python
- Scheme, when using one of several Scheme Requests for Implementation (SRFIs), the latest of which is SRFI 119
- Spin
- XL
Other languages
- Haml
- Make (build tool; tabs introduce commands in Make rules)
- reStructuredText[11]
- Sass
- Stylus
- YAML
- Pug (formerly Jade)
References
- Hutton, G. (Dec 6, 2012). "Parsing Using Combinators". In Davis, Kei; Hughes, John (eds.). Functional Programming: Proceedings of the 1989 Glasgow Workshop 21–23 August 1989, Fraserburgh, Scotland. Springer Science & Business Media. pp. 362–364. Retrieved September 3, 2015.
- Turner, D.A. (August 13, 2013). "Some History of Functional Programming Languages (Invited Talk)". In Loidl, Hans Wolfgang; Peña, Ricardo (eds.). Trends in Functional Programming: 13th International Symposium, TFP 2012, St Andrews, UK, June 12–14, 2012, Revised Selected Papers. Springer. p. 8. Retrieved September 3, 2015.
- Landin, P. J. (March 1966). "The next 700 programming languages" (PDF). Comm. ACM. 9 (3): 157–166. doi:10.1145/365230.365257.
- Python FAQ on colons
- Python Documentation, 2. Lexical analysis: 2.1.8. Indentation
- Common Lisp HyperSpec, section 2.1.4.7 "Whitespace Characters". http://clhs.lisp.se/Body/02_adg.htm
- Syme, Don. "Detailed Release Notes for the F# May 2009 CTP Update and Visual Studio 2010 Beta1 releases". Archived from the original on 2019-01-21.
- The Haskell Report - Layout
- MoonScript, a language that compiles to Lua
- MoonScript 0.5.0 - Language Guide
- reStructuredText Markup Specification - Indentation
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.