Fat binary
A fat binary (or multiarchitecture binary) is a computer executable program or library which has been expanded (or "fattened") with code native to multiple instruction sets, so that it can be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, hence the name.
The usual method of implementation is to include a version of the machine code for each instruction set, preceded by a single entry point with code compatible with all operating systems, which executes a jump to the appropriate section. Alternative implementations store different executables in different forks, each with its own entry point that is directly used by the operating system.
The use of fat binaries is not common in operating system software; there are several alternatives to solve the same problem, such as the use of an installer program to choose an architecture-specific binary at install time (such as with Android multiple APKs), selecting an architecture-specific binary at runtime (such as with Plan 9's union directories and GNUstep's fat bundles),[1][2] distributing software in source code form and compiling it in-place, or the use of a virtual machine (such as with Java) and Just In Time compilation.
Apollo
Apollo's compound executables
In 1988, Apollo Computer's Domain/OS SR10.1 introduced a new file type, "cmpexe" (compound executable), that bundled executables for both Motorola 680x0 and Apollo PRISM processors in a single file.[3]
Apple
Apple's fat binary
A fat-binary scheme smoothed the Apple Macintosh's transition, beginning in 1994, from 68k microprocessors to PowerPC microprocessors. Many applications for the old platform ran transparently on the new platform under an evolving emulation scheme, but emulated code generally runs slower than native code. Applications released as "fat binaries" took up more storage space, but they ran at full speed on either platform. This was achieved by packaging both a 68000-compiled version and a PowerPC-compiled version of the same program into their executable files. The older 68K code (CFM-68K or classic 68K) continued to be stored in the resource fork, while the newer PowerPC code was contained in the data fork, in PEF format.[4]
Fat binaries were larger than programs supporting only the PowerPC or 68k, which led to the creation of a number of utilities that would strip out the unneeded version. In the era of small hard drives, when 80 MB hard drives were a common size, these utilities were sometimes useful, as program code was generally a large percentage of overall drive usage, and stripping the unneeded members of a fat binary would free up a significant amount of space on a hard drive.
NeXTSTEP Multi-Architecture Binaries
Fat binaries were a feature of NeXT's NeXTSTEP/OPENSTEP operating system, starting with NeXTSTEP 3.1. In NeXTSTEP, they were called "Multi-Architecture Binaries". Multi-Architecture Binaries were originally intended to allow software to be compiled to run both on NeXT's Motorola 68k-based hardware and on Intel IA-32-based PCs running NeXTSTEP, with a single binary file for both platforms. They were later used to allow OPENSTEP applications to run on PCs and the various RISC platforms OPENSTEP supported.
Multi-Architecture Binary files are in a special archive format, in which a single file stores one or more Mach-O subfiles, one for each architecture supported by the Multi-Architecture Binary. Every Multi-Architecture Binary starts with a structure (struct fat_header) containing two unsigned integers. The first integer ("magic") is used as a magic number to identify the file as a fat binary. The second integer ("nfat_arch") defines how many Mach-O files the archive contains (that is, how many instances of the same program for different architectures). This header is followed by nfat_arch fat_arch structures (struct fat_arch). Each of these structures gives the offset (from the start of the file) at which the Mach-O file is found, its alignment, its size, and the CPU type and subtype which that Mach-O binary targets.
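The header layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Apple's or NeXT's implementation. It assumes the 32-bit layout declared in the `<mach-o/fat.h>` system header, which the text above does not spell out: all fields are big-endian 32-bit integers, the magic number is 0xCAFEBABE, and each fat_arch holds cputype, cpusubtype, offset, size and align (a power-of-two exponent) in that order. The helper `build_demo_fat` and the CPU type numbers passed to it are hypothetical test scaffolding.

```python
import struct

FAT_MAGIC = 0xCAFEBABE      # assumed value, from <mach-o/fat.h>
HEADER_FMT = ">II"          # struct fat_header: magic, nfat_arch (big-endian)
ARCH_FMT = ">iiIII"         # struct fat_arch: cputype, cpusubtype, offset, size, align


def parse_fat_header(data: bytes) -> list:
    """Return one dict per fat_arch entry; raise ValueError if the magic is wrong."""
    magic, nfat_arch = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a Multi-Architecture Binary")
    archs = []
    pos = struct.calcsize(HEADER_FMT)
    for _ in range(nfat_arch):
        cputype, cpusubtype, offset, size, align = struct.unpack_from(ARCH_FMT, data, pos)
        archs.append({"cputype": cputype, "cpusubtype": cpusubtype,
                      "offset": offset, "size": size, "align": align})
        pos += struct.calcsize(ARCH_FMT)
    return archs


def extract_slice(data: bytes, arch: dict) -> bytes:
    """Return the embedded subfile that a fat_arch entry points at."""
    return data[arch["offset"]:arch["offset"] + arch["size"]]


def build_demo_fat(slices) -> bytes:
    """Hypothetical helper: pack (cputype, cpusubtype, payload) tuples into an archive."""
    table_size = struct.calcsize(HEADER_FMT) + len(slices) * struct.calcsize(ARCH_FMT)
    out = struct.pack(HEADER_FMT, FAT_MAGIC, len(slices))
    offset = table_size
    for cputype, cpusubtype, payload in slices:
        # align = 0 means 2**0, i.e. byte alignment, which keeps the demo compact
        out += struct.pack(ARCH_FMT, cputype, cpusubtype, offset, len(payload), 0)
        offset += len(payload)
    return out + b"".join(payload for _, _, payload in slices)
```

A loader would walk the returned list, pick the entry whose CPU type and subtype best match the running machine, and map only that slice.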
The version of the GNU Compiler Collection shipped with the Developer Tools was able to cross-compile source code for the different architectures on which NeXTSTEP was able to run. For example, it was possible to choose the target architectures with multiple '-arch' options (each taking the architecture as its argument). This was a convenient way to distribute a program for NeXTSTEP running on different architectures.
It was also possible to create libraries (e.g. using libtool) with different targeted object files.
Mach-O and Mac OS X
Apple Computer acquired NeXT in 1996 and continued to work with the OPENSTEP code. Mach-O became the native object file format in Apple's free Darwin operating system (2000) and Apple's Mac OS X (2001), and NeXT's Multi-Architecture Binaries continued to be supported by the operating system. Under Mac OS X, Multi-Architecture Binaries can be used to support multiple variants of an architecture, for instance to have different versions of 32-bit code optimized for the PowerPC G3, PowerPC G4, and PowerPC 970 generations of processors. They can also be used to support multiple architectures, such as 32-bit and 64-bit PowerPC, or PowerPC and x86, or x86-64 and ARM64.[5]
Apple's Universal binary
In 2005, Apple announced another transition, from PowerPC processors to Intel x86 processors. Apple promoted the distribution of new applications that support both PowerPC and x86 natively by using executable files in Multi-Architecture Binary format. Apple calls such programs "Universal applications" and calls the file format "Universal binary", perhaps to distinguish this new transition from the previous transition or from other uses of the Multi-Architecture Binary format.
Universal binary format was not necessary for forward migration of pre-existing native PowerPC applications; from 2006 to 2011, Apple supplied Rosetta, a PowerPC (PPC)-to-x86 dynamic binary translator, to play this role. However, Rosetta had a fairly steep performance overhead, so developers were encouraged to offer both PPC and Intel binaries, using Universal binaries. The obvious cost of Universal binary is that every installed executable file is larger, but in the years since the release of the PPC, hard-drive space has greatly outstripped executable size; while a Universal binary might be double the size of a single-platform version of the same application, free-space resources generally dwarf the code size, which becomes a minor issue. In fact, often a Universal-binary application will be smaller than two single-architecture applications because program resources can be shared rather than duplicated. If not all of the architectures are required, the lipo and ditto command-line applications can be used to remove versions from the Multi-Architecture Binary image, thereby creating what is sometimes called a thin binary.
In addition, Multi-Architecture Binary executables can contain code for both 32-bit and 64-bit versions of PowerPC and x86, allowing applications to be shipped in a form that supports 32-bit processors but that makes use of the larger address space and wider data paths when run on 64-bit processors.
In versions of the Xcode development environment from 2.1 through 3.2 (running on Mac OS X 10.4 through Mac OS X 10.6), Apple included utilities which allowed applications to be targeted for both Intel and PowerPC architectures; universal binaries could eventually contain up to four versions of the executable code (32-bit PowerPC, 32-bit x86, 64-bit PowerPC, and 64-bit x86). However, PowerPC support was removed from Xcode 4.0 and is therefore not available to developers running Mac OS X 10.7 or later.
In 2020, Apple announced another transition, this time from Intel x86 processors to Apple silicon. To smooth the transition, Apple added support for the Universal 2 binary format, which allows the creation of binaries that run natively on both 64-bit Intel and 64-bit Apple silicon (an AArch64 variant).
DOS
Combined COM-style binaries for CP/M-80 and DOS
CP/M-80, MP/M-80, Concurrent CP/M, CP/M Plus and Personal CP/M-80 executables for the Intel 8080 (and Z80) processor families use the same .COM file extension as DOS-compatible operating systems for Intel 8086 binaries.[nb 1] In both cases programs are loaded at offset +100h and executed by jumping to the first byte in the file. As the opcodes of the two processor families are not compatible, attempting to start a program under the wrong operating system leads to incorrect and unpredictable behaviour.
In order to avoid this, some methods have been devised to build fat binaries which contain both a CP/M-80 and a DOS program, preceded by initial code which is interpreted correctly on both platforms. The methods either combine two fully functional programs each built for their corresponding environment, or add stubs which cause the program to exit gracefully if started on the wrong processor. For this to work, the first few instructions in the .COM file have to be valid code for both 8086 and 8080 processors, which would cause the processors to branch into different locations within the code. For example, the utilities in Simeon Cran's emulator MyZ80 start with the opcode sequence EBh, 52h, EBh. An 8086 sees this as a jump and reads its next instruction from offset +154h whereas an 8080 or compatible processor goes straight through and reads its next instruction from +103h. A similar sequence used for this purpose is EBh, 03h, C3h.[6][7]
Another method to keep a DOS-compatible operating system from erroneously executing .COM programs for CP/M-80 and MSX-DOS machines is to start the 8080 code with C3h, 03h, 01h. x86 processors decode the first byte as a "RET" instruction and thereby exit the program gracefully, whereas 8080 processors decode the three bytes as a "JP 103h" instruction and simply jump to the next instruction in the program.
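How the two processor families diverge on the byte sequences described above can be traced with a small Python sketch. It is an illustration only and decodes just the handful of opcodes that appear in these prefixes: EBh (8086 "JMP SHORT", 8080 "XCHG"), 52h (8080 "MOV D,D"), 03h (8080 "INX B"), C3h (8086 "RET", 8080 "JMP") and C9h (8080 "RET"). The function names and the example target address 0200h are hypothetical.

```python
LOAD_ADDR = 0x100  # both CP/M-80 and DOS load a .COM file at offset +100h


def trace_8086(code: bytes) -> str:
    """Decode only the first instruction, which is all the trick depends on."""
    op = code[0]
    if op == 0xEB:  # JMP SHORT rel8, relative to the address after the 2-byte instruction
        disp = code[1] - 256 if code[1] > 127 else code[1]
        return f"JMP SHORT -> {LOAD_ADDR + 2 + disp:04X}h"
    if op == 0xC3:  # near RET
        return "RET -> exit"
    raise ValueError(f"opcode {op:02X}h is not covered by this sketch")


# One-byte 8080 instructions that are harmless when executed as a prefix
ONE_BYTE_8080 = {0xEB: "XCHG", 0x52: "MOV D,D", 0x03: "INX B"}


def trace_8080(code: bytes) -> str:
    """Step through the prefix until control leaves it."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op in ONE_BYTE_8080:
            pc += 1
        elif op == 0xC3:  # JMP a16, address stored low byte first
            target = code[pc + 1] | (code[pc + 2] << 8)
            return f"JMP -> {target:04X}h"
        elif op == 0xC9:  # RET
            return "RET -> exit"
        else:
            raise ValueError(f"opcode {op:02X}h is not covered by this sketch")
    return f"falls through -> {LOAD_ADDR + pc:04X}h"


if __name__ == "__main__":
    for prefix in (bytes([0xEB, 0x52, 0xEB]),
                   bytes([0xEB, 0x03, 0xC3, 0x00, 0x02]),
                   bytes([0xC3, 0x03, 0x01])):
        print(prefix.hex(" "), "|", trace_8086(prefix), "|", trace_8080(prefix))
```

For EBh, 52h, EBh the sketch reports a jump to 0154h on the 8086 and a fall-through to 0103h on the 8080, matching the offsets given above.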
Some CP/M-80 3.0 .COM files may have one or more RSX overlays attached to them by GENCOM.[8] If so, they start with an extra 256-byte header (one page). In order to indicate this, the first byte in the header is set to C9h, which serves both as a signature identifying this type of COM file to the CP/M 3.0 executable loader and as a "RET" instruction for 8080-compatible processors, leading to a graceful exit if the file is executed under older versions of CP/M-80.
C9h is never appropriate as the first byte of a program for any x86 processor (it has different meanings for different generations,[nb 2] but is never a meaningful first byte); the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding incorrect operation.
Combined binaries for CP/M-86 and DOS
CP/M-86 and DOS do not share a common file extension for executables.[nb 1] Thus, it is not normally possible to confuse executables. However, early versions of DOS had so much in common with CP/M in terms of architecture that some early DOS programs were developed to share binaries containing executable code. One program known to do this was WordStar 3.2x, which used identical overlay files in its ports for CP/M-86 and MS-DOS,[9] and used dynamically fixed-up code to adapt to the differing calling conventions of these operating systems at runtime.[9]
Digital Research's GSX for CP/M-86 and DOS also shares binary identical 16-bit drivers.[10]
Combined COM and SYS files
DOS device drivers start with a file header whose first four bytes are FFFFFFFFh by convention, although this is not a requirement.[11] This value is fixed up dynamically by the operating system when the driver loads (typically in the DOS BIOS when it executes DEVICE statements in CONFIG.SYS). Since DOS does not refuse to load files with a .COM extension via DEVICE and does not test for FFFFFFFFh, it is possible to combine a COM program and a device driver into the same file[12][11] by placing a jump instruction to the entry point of the embedded COM program within the first four bytes of the file (three bytes are usually sufficient).[11] If the embedded program and the device driver sections share a common portion of code or data, the code has to cope with being loaded at offset +0100h as a .COM-style program and at +0000h as a device driver.[12] For shared code that is loaded at the "wrong" offset but not designed to be position-independent, this requires an internal address fix-up[12] similar to what a relocating loader would otherwise have carried out, except that in this case it has to be done by the loaded program itself; this is similar to the situation with self-relocating drivers, but with the program already loaded at the target location by the operating system's loader.
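The first four bytes of such a combined file, and the offset difference the shared code has to compensate for, can be sketched as follows. This is a hedged illustration rather than the layout of any particular driver: it assumes a three-byte 8086 near jump (opcode E9h followed by a 16-bit little-endian displacement relative to the end of the instruction) padded with a single FFh byte, and the function names and the example entry offset are hypothetical.

```python
COM_LOAD_OFFSET = 0x0100  # a .COM program is loaded at +0100h
SYS_LOAD_OFFSET = 0x0000  # a device driver is loaded at +0000h


def header_with_jump(com_entry: int) -> bytes:
    """Replace the conventional FFFFFFFFh with JMP NEAR to the COM entry point.

    com_entry is the file offset of the embedded COM program's entry point.
    The jump is relative, so it reaches the same file offset wherever the
    file is loaded.
    """
    rel = (com_entry - 3) & 0xFFFF  # displacement counts from the end of the 3-byte jump
    return bytes([0xE9, rel & 0xFF, rel >> 8, 0xFF])


def runtime_offset(file_offset: int, loaded_as_com: bool) -> int:
    """Offset at which a byte of the file ends up within its segment."""
    base = COM_LOAD_OFFSET if loaded_as_com else SYS_LOAD_OFFSET
    return base + file_offset


if __name__ == "__main__":
    print(header_with_jump(0x0040).hex(" "))
    print(hex(runtime_offset(0x0040, loaded_as_com=True)),
          hex(runtime_offset(0x0040, loaded_as_com=False)))
```

The 0100h difference returned by `runtime_offset` for the two load modes is the amount by which absolute addresses in shared, non-position-independent code have to be adjusted.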
Crash-protected system files
Under DOS, some files, by convention, have file extensions which do not reflect their actual file type.[nb 3] For example, COUNTRY.SYS[13] is not a DOS device driver,[nb 4] but a binary NLS database file for use with the CONFIG.SYS COUNTRY directive and the NLSFUNC driver.[13] The PC DOS and DR-DOS system files IBMBIO.COM and IBMDOS.COM are special binary images, not COM-style programs.[nb 4] Trying to load COUNTRY.SYS with a DEVICE statement or executing IBMBIO.COM or IBMDOS.COM at the command prompt will cause unpredictable results.[nb 3][nb 5]
It is sometimes possible to avoid this by utilizing techniques similar to those described above. For example, DR-DOS 7.02 and higher incorporate a safety feature developed by Matthias R. Paul:[14] If these files are called inappropriately, tiny embedded stubs will just display some file version information and exit gracefully.[15][14][16][13]
A similar protection feature was the 8080 instruction C7h ("RST 0") at the very start of Z-System language overlay files, which would result in a warm start (instead of a crash) under CP/M-80 if loaded inappropriately.[17]
In a remotely similar fashion, many (binary) file formats by convention include a 1Ah byte (ASCII ^Z) near the beginning of the file. This control character is interpreted as a "soft" end-of-file (EOF) marker when a file is opened in non-binary mode, and thus, under many operating systems (including RT-11, VMS, CP/M,[18][19] DOS,[20] and Windows[21]), it prevents "binary garbage" from being displayed when a file is accidentally typed at the console.
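The effect of the soft end-of-file marker can be imitated in a few lines of Python. The sketch below only models the behaviour described above (a text-mode reader stops at the first 1Ah byte) and does not call any operating system facility; the function name is hypothetical, and the PNG signature is used merely as one well-known example of a format that places 1Ah near the start of the file.

```python
SOFT_EOF = 0x1A  # ASCII ^Z


def text_mode_view(data: bytes) -> bytes:
    """Return the part of a file that a ^Z-aware text-mode reader would show."""
    end = data.find(bytes([SOFT_EOF]))
    return data if end == -1 else data[:end]


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature contains 1Ah
    sample = png_signature + b"...binary image data..."
    # Only the bytes before 1Ah would reach the console
    print(text_mode_view(sample))
```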
Linux
FatELF: Universal binaries for Linux
FatELF[22] is a fat binary implementation for Linux and other Unix-like operating systems. Technically, a FatELF binary is a concatenation of ELF binaries with some metadata indicating which binary to use on which architecture.[23] In addition to abstracting over the CPU architecture (byte order, word size, CPU instruction set, etc.), it offers the advantage of binaries that support multiple kernel ABIs and versions.
FatELF has several use-cases, according to developers:[22]
- Distributions no longer need to have separate downloads for various platforms.
- Separated /lib, /lib32 and /lib64 trees are not required anymore in OS directory structure.
- The correct binary and libraries are centrally chosen by the system instead of shell scripts.
- If the ELF ABI changes someday, legacy users can still be supported.
- Distribution of web browser plug-ins that work out of the box with multiple platforms.
- Distribution of one application file that works across Linux and BSD OS variants, without a platform compatibility layer on them.
- One hard drive partition can be booted on different machines with different CPU architectures, for development and experimentation. Same root file system, different kernel and CPU architecture.
- Applications provided by network share or USB sticks will work on multiple systems. This is also helpful for creating portable applications and cloud computing images for heterogeneous systems.[24]
A proof-of-concept Ubuntu 9.04 image is available.[25] As of 25 April 2020, FatELF has not been integrated into the mainline Linux kernel.[26][27]
Windows
Fatpack
Although the Portable Executable format used by Windows does not allow assigning code to platforms, it is still possible to make a loader program that dispatches based on architecture. This is because desktop versions of Windows on ARM support 32-bit x86 emulation, which makes 32-bit x86 a useful "universal" machine code target. Fatpack is a loader that demonstrates the concept: it includes a 32-bit x86 program that tries to run the executables packed into its resource sections one by one.[28]
Similar systems
The following approaches are similar to fat binaries in that multiple versions of machine code of the same purpose are provided in the same file.
Fat objects
GCC and LLVM do not have a fat binary format, but they do have fat object files for link-time optimization (LTO). Since LTO involves delaying the compilation to link-time, the object files must store the intermediate representation, but on the other hand machine code may need to be stored too (for speed or compatibility). An LTO object containing both IR and machine code is known as a fat object.[29]
Function multi-versioning
Even in a program or library intended for the same instruction set architecture, a programmer may wish to make use of some newer instruction set extensions while keeping compatibility with an older CPU. This can be achieved with function multi-versioning (FMV): multiple versions of the same function are written into the program, and a piece of code decides which one to use by detecting the CPU's capabilities (such as through CPUID). Intel C++ Compiler, GNU Compiler Collection, and LLVM all have the ability to automatically generate multi-versioned functions.[30] This is a form of dynamic dispatch without any semantic effects.
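The dispatch step of function multi-versioning can be illustrated with a Python analogy. It is only a sketch of the selection logic, not how any of the compilers named above implement it: real FMV selects between machine-code variants, whereas here both variants are ordinary Python functions that return the same result. The feature name "avx2", the function names and the idea of passing the detected features in as a set are all assumptions made for the example.

```python
def dot_baseline(a, b):
    """Version that runs everywhere."""
    return sum(x * y for x, y in zip(a, b))


def dot_unrolled(a, b):
    """Stand-in for a version that would use a newer instruction-set extension."""
    n = min(len(a), len(b))
    main = n - n % 4
    total = 0
    for i in range(0, main, 4):  # four products per iteration
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
    for i in range(main, n):     # leftover elements
        total += a[i] * b[i]
    return total


# Candidate versions, most specialised first: (required feature, implementation)
VERSIONS = [("avx2", dot_unrolled)]


def resolve(cpu_features):
    """Pick the first version whose requirement is met; fall back to the baseline."""
    for required, implementation in VERSIONS:
        if required in cpu_features:
            return implementation
    return dot_baseline


if __name__ == "__main__":
    # In a real program the feature set would come from a CPUID-style query
    dot = resolve({"sse2", "avx2"})
    print(dot.__name__, dot([1, 2, 3], [4, 5, 6]))
```

Because every version computes the same result, the choice made by `resolve` affects speed only, which is the sense in which the dispatch has no semantic effects.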
Many math libraries feature hand-written assembly routines that are automatically chosen according to CPU capability. Examples include glibc, Intel MKL, and OpenBLAS. In addition, the library loader in glibc supports loading from alternative paths for specific CPU features.[31]
Notes
- This isn't a problem for CP/M-86 style executables under CP/M-86, CP/M-86 Plus, Personal CP/M-86, S5-DOS, Concurrent CP/M-86, Concurrent DOS, Concurrent DOS 286, FlexOS, Concurrent DOS 386, DOS Plus, Multiuser DOS, System Manager and REAL/32 because they use the file extension .CMD rather than .COM for these files. (The .CMD extension, however, conflicts with the file extension used for batch jobs written for the command line processor CMD.EXE under the OS/2 and Windows NT operating system families.)
- On 8088/8086 processors, the opcode C9h is an undocumented alias for CBh ("RETF"), whereas it decodes as "LEAVE" on 80188/80186 and newer processors.
- This problem could have been avoided by choosing non-conflicting file extensions, but, once introduced, these particular file names were retained from very early versions of MS-DOS/PC DOS for compatibility reasons with (third-party) tools hard-wired to expect these specific file names.
- Other DOS files of this type are KEYBOARD.SYS, a binary keyboard layout database file for the keyboard driver KEYB under MS-DOS and PC DOS, IO.SYS containing the DOS BIOS under MS-DOS, and MSDOS.SYS, a text configuration file under Windows 95/MS-DOS 7.0 and higher, but originally a binary system file containing the MS-DOS kernel. However, MS-DOS and PC DOS do not provide crash-protected system files at all, and these file names are neither used nor needed in DR-DOS 7.02 and higher, which otherwise does provide crash-protected system files.
- This is the reason why these files have the hidden attribute set, so that they are not listed by default, thereby reducing the risk of being invoked accidentally.
References
- "PackagingDrafts/GNUstep". Fedora Project Wiki.
- "gnustep/tools-make: README.Packaging". GitHub.
- "Domain System Software Release Notes, Software Release 10.1" (PDF) (first printing ed.). Chelmsford, Massachusetts, USA: Apollo Computer Inc. December 1988. p. 2-16. Order No. 005809-A03. Archived (PDF) from the original on 2020-08-27. Retrieved 2020-08-17. (256 pages)
- Apple Computer (1997-03-11). "Creating Fat Binary Programs". Inside Macintosh: Mac OS Runtime Architectures. Archived from the original on 2004-03-07. Retrieved 2011-06-20.
- Apple Computer (2006-03-08). "Universal Binaries and 32-bit/64-bit PowerPC Binaries". Mac OS X ABI Mach-O File Format Reference. Archived from the original on 2009-04-04. Retrieved 2006-07-13.
- ChristW (2012-11-14) [2012-11-13]. Chen, Raymond (ed.). "Microsoft Money crashes during import of account transactions or when changing a payee of a downloaded transaction". The Old New Thing. Archived from the original on 2018-07-05. Retrieved 2018-05-19.
[…] byte sequence […] EB 03 C3 yy xx […] If you create a .COM file with those 5 bytes as the first ones […] you'll see 'JMP SHORT 3', followed by 3 garbage bytes. […] If you look at a Z80 disassembly […] that translates to 'EX DE,HL; INC BC;' […] The 3rd byte is 'JUMP' followed by the 16-bit address specified as yy xx […] you'll have a .COM file that runs on MS-DOS and […] CP/M […]
(NB. While the author speaks about the Z80, this sequence also works on the 8080 and compatible processors.)
- Brehm, Andrew J. (2016). "CP/M and MS-DOS Fat Binary". DesertPenguin.org. Archived from the original on 2018-05-19. Retrieved 2018-05-19. (NB. While the article speaks about the Z80, the code sequence also works on the 8080 and compatible processors.)
- Elliott, John C.; Lopushinsky, Jim (2002) [1998-04-11]. "CP/M 3 COM file header". Seasip.info. Archived from the original on 2016-08-30. Retrieved 2016-08-29.
- Necasek, Michal (2018-01-30) [2018-01-28, 2018-01-26]. "WordStar Again". OS/2 Museum. Archived from the original on 2019-07-28. Retrieved 2019-07-28.
[…] The reason to suspect such difference is that version 3.2x also supported CP/M-86 (the overlays are identical between DOS and CP/M-86, only the main executable is different) […] the .OVR files are 100% identical between DOS and CP/M-86, with a flag (clearly shown in the WordStar 3.20 manual) switching between them at runtime […] the OS interface in WordStar is quite narrow and well abstracted […] the WordStar 3.2x overlays are 100% identical between the DOS and CP/M-86 versions. There is a runtime switch which chooses between calling INT 21h (DOS) and INT E0h (CP/M-86). WS.COM is not the same between DOS and CP/M-86, although it's probably not very different either. […]
- Lineback, Nathan. "GSX Screen Shots". Toastytech.com. Archived from the original on 2020-01-15. Retrieved 2020-01-15.
- Paul, Matthias R. (2002-04-11). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-21. Retrieved 2020-02-21.
[…] FreeKEYB is […] a true .COM and .SYS driver (tiny model) in one. You can safely overwrite the first JMP, that's part of what I meant by "tricky header". […] you can replace the FFFFh:FFFFh by a 3-byte jump and a pending DB FFh. It works with MS-DOS, PC DOS, DR-DOS, and most probably any other DOS issue as well. […]
- Paul, Matthias R. (2002-04-06). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-07. Retrieved 2020-02-07.
[…] Add a SYS device driver header to the driver, so that CTMOUSE could be both in one, a normal TSR and a device driver - similar to our FreeKEYB advanced keyboard driver. […] This is not really needed under DR DOS because INSTALL= is supported since DR DOS 3.41+ and DR DOS preserves the order of [D]CONFIG.SYS directives […] but it would […] improve the […] flexibility on MS-DOS/PC DOS systems, which […] always execute DEVICE= directives prior to any INSTALL= statements, regardless of their order in the file. […] software may require the mouse driver to be present as a device driver, as mouse drivers have always been device drivers back in the old times. These mouse drivers have had specific device driver names depending on which protocol they used ("PC$MOUSE" for Mouse Systems Mode for example), and some software may search for these drivers in order to find out the correct type of mouse to be used. […] Another advantage would be that device drivers usually consume less memory (no environment, no PSP) […] It's basically a tricky file header, a different code to parse the command line, a different entry point and exit line, and some segment magics to overcome the ORG 0 / ORG 100h difference. Self-loadhighing a device driver is a bit more tricky as you have to leave the driver header where it is and only relocate the remainder of the driver […]
- Paul, Matthias R. (2001-06-10) [1995]. "DOS COUNTRY.SYS file format" (COUNTRY.LST file) (1.44 ed.). Archived from the original on 2016-04-20. Retrieved 2016-08-20.
- Paul, Matthias R. (1997-07-30) [1994-05-01]. "Chapter II.4. Undokumentierte Eigenschaften externer Kommandos - SYS.COM". NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP. Release 157 (in German) (3 ed.). Archived from the original on 2017-09-10. Retrieved 2014-08-06.
(Translated from German:) For a future update to Caldera's OpenDOS 7.01, I have modified the startup code of IBMBIO.COM so that, if it is mistakenly started as a normal program, it returns to the command line without crashing. It is not yet foreseeable, however, when this safety feature will find its way into the official version.
(NB. NWDOSTIP.TXT is a comprehensive work on Novell DOS 7 and OpenDOS 7.01, including the description of many undocumented features and internals. It is part of the author's yet larger MPDOSTIP.ZIP collection maintained up to 2001 and distributed on many sites at the time. The provided link points to a HTML-converted older version of the NWDOSTIP.TXT file.)
- Paul, Matthias R. (1997-10-02). "Caldera OpenDOS 7.01/7.02 Update Alpha 3 IBMBIO.COM README.TXT". Archived from the original on 2003-10-04. Retrieved 2009-03-29.
- DR-DOS 7.03 WHATSNEW.TXT - Changes from DR-DOS 7.02 to DR-DOS 7.03. Caldera, Inc. 1998-12-24. Archived from the original on 2019-04-08. Retrieved 2019-04-08.
- Sage, Jay (November–December 1992). Carlson, Art; Kibler, Bill D. (eds.). "Regular Feature, ZCPR Support, Language Independence, part 2". The Computer Journal (TCJ) - Programming, User Support, Applications. The Z-System Corner. Lincoln, CA, USA (58): 7–10. ISSN 0748-9331. ark:/13960/t70v9g87h. Retrieved 2020-02-09.
[…] there was an opcode of "RST 0", which, if executed, would result in a warm boot. A file containing a Z3TXT module should never be executed, but at a cost of one byte we could protect ourself against that outside chance. The header also contained the string of characters "Z3TXT" followed by a null (0) byte. Many Z-System modules include such identifiers. In this category are resident command packages (RCPs), flow command packages (FCPs), and environment descriptor modules (Z3ENVs). Programs, such as Bridger Mitchell's […] JETLDR.COM, that load these modules from files into memory can use the ID string to validate the file, that is, to make sure that it is the kind of module that the user has stated it to be. User mistakes and damaged files can thus be detected. […] The header, thus, now stands as follows: […] rst […] db 'Z3TXT',0 ; null-terminated ID […] ; 12345678 ; must be 8 characters, […] db 'PROGNAME' ; pad with spaces […] ; 123 ; must be 3 characters […] db 'ENG' ; name of language […] dw LENGTH ; length of module […]
- "2. Operating System Call Conventions". CP/M 2.0 Interface Guide (PDF) (1 ed.). Pacific Grove, California, USA: Digital Research. 1979. p. 5. Archived (PDF) from the original on 2020-02-28. Retrieved 2020-02-28.
[…] The end of an ASCII file is denoted by a control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. […]
(56 pages)
- Hogan, Thom (1982). "3. CP/M Transient Commands". Osborne CP/M User Guide - For All CP/M Users (2 ed.). Berkeley, California, USA: A. Osborne/McGraw-Hill. p. 74. ISBN 0-931988-82-9. Retrieved 2020-02-28.
[…] CP/M marks the end of an ASCII file by placing a CONTROL-z character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the end-of-file marker is possible because CONTROL-z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. […]
- BC_Programmer (2010-01-31) [2010-01-30]. "Re: Copy command which merges several files tags the word SUB at the end". Computer Hope Forum. Archived from the original on 2020-02-26. Retrieved 2020-02-26.
- "What are the differences between Linux and Windows .txt files (Unicode encoding)". Superuser. 2011-08-03 [2011-06-08]. Archived from the original on 2020-02-26. Retrieved 2020-02-26.
- Gordon, Ryan C. (October 2009). "FatELF: Universal Binaries for Linux". icculus.org. Archived from the original on 2020-08-27. Retrieved 2010-07-13.
- Gordon, Ryan C. (November 2009). "FatELF specification, version 1". icculus.org. Archived from the original on 2020-08-27. Retrieved 2010-07-25.
- Windisch, Eric (2009-11-03). "Subject: Newsgroups: gmane.linux.kernel, Re: FatELF patches..." gmane.org. Archived from the original on 2016-11-15. Retrieved 2010-07-08.
- "VM image of Ubuntu 9.04 with Fat Binary support".
- Holwerda, Thom (2009-11-03). "Ryan Gordon Halts FatELF Project". osnews.com. Retrieved 2010-07-05.
- Brockmeier, Joe (2010-06-23). "SELF: Anatomy of an (alleged) failure". Linux Weekly News. Retrieved 2011-02-06.
- Mulder, Sijmen J. (2020-04-28). "sjmulder/fatpack". GitHub.
- "LTO Overview (GNU Compiler Collection (GCC) Internals)". gcc.gnu.org.
- Wennborg VI, Hans (2018). "Attributes in Clang". Clang 7 documentation.
- "Transparent use of library packages optimized for Intel architecture". Clear Linux* Project.