Self-relocation
In computer programming, a self-relocating program is a program that relocates its own address-dependent instructions and data when run, and is therefore capable of being loaded into memory at any address.[1][2] In many cases, self-relocating code is also a form of self-modifying code.
Overview
Self-relocation is similar to the relocation process employed by the linker-loader when a program is copied from external storage into main memory; the difference is that it is the loaded program itself rather than the loader in the operating system or shell that performs the relocation.
One form of self-relocation occurs when a program copies its own instructions from one range of locations in the main memory of a computer to another range within the same memory, and then transfers processor control from the instructions at the source locations to their copies at the destination locations. In this case, the data operated on by the program is the sequence of bytes that makes up the program itself.
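The copy-then-jump step, together with the address fix-ups that a program which is not position-independent also needs, can be sketched in Python by treating main memory as a byte array. This is a minimal illustration under stated assumptions: the function name `self_relocate`, the 16-bit little-endian address format and the relocation table of fix-up offsets are all hypothetical, and "transferring control" is modelled simply by returning the relocated entry point.

```python
def self_relocate(memory: bytearray, src: int, dst: int, size: int,
                  fixups: list[int], entry: int) -> int:
    """Copy a program image of `size` bytes from `src` to `dst` in `memory`,
    patch its absolute addresses and return the relocated entry point.

    `fixups` is a hypothetical relocation table: image-relative offsets of
    little-endian 16-bit words that hold absolute addresses.
    """
    delta = dst - src
    # The right-hand slice is a copy, so overlapping ranges behave like memmove().
    memory[dst:dst + size] = memory[src:src + size]
    # Fix-up stage: shift every absolute address by the distance moved.
    for off in fixups:
        pos = dst + off
        old = int.from_bytes(memory[pos:pos + 2], "little")
        memory[pos:pos + 2] = ((old + delta) & 0xFFFF).to_bytes(2, "little")
    # A real program would now jump here; this model just returns the address.
    return entry + delta


mem = bytearray(0x1000)
# Tiny image at 0100h: an opcode byte, an absolute pointer to 0103h, a data byte.
mem[0x100:0x104] = bytes([0xAA, 0x03, 0x01, 0x55])
new_entry = self_relocate(mem, 0x100, 0x800, 4, fixups=[1], entry=0x100)
print(hex(new_entry))                                   # 0x800
print(hex(int.from_bytes(mem[0x801:0x803], "little")))  # 0x803
```

A program built from position-independent code would pass an empty fix-up table and only need the copy and the jump.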
Self-relocation typically happens at load-time (after the operating system has loaded the software and passed control to it, but still before its initialization has finished), sometimes also when changing the program's configuration at a later stage during runtime.[3][4]
Examples
Boot loaders
As an example, self-relocation is often employed in the early stages of bootstrapping operating systems on architectures like IBM PC compatibles, where lower-level chain boot loaders (like the master boot record (MBR), volume boot record (VBR) and initial boot stages of operating systems such as DOS) move themselves out of the way to make room for loading the next stage into memory.
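A hedged sketch of the classic case: on IBM PC compatibles the BIOS loads the boot sector to 0000:7C00h, and DOS-era MBR code is commonly described as copying itself to 0000:0600h before loading the VBR to 7C00h. These two addresses are not given in the text above and vary between boot loaders; the function `boot_chain` is hypothetical and only models the memory moves, not the actual x86 instructions.

```python
SECTOR = 512
BIOS_LOAD_ADDR = 0x7C00  # where the BIOS places the boot sector
MBR_RELOC_ADDR = 0x0600  # where classic DOS-era MBR code copies itself (assumed)


def boot_chain(memory: bytearray, mbr: bytes, vbr: bytes) -> int:
    """Model the memory moves of an MBR that relocates itself before chaining."""
    if len(mbr) != SECTOR or len(vbr) != SECTOR:
        raise ValueError("boot sectors must be exactly 512 bytes")
    # 1. The BIOS loads the MBR to 7C00h and jumps to it.
    memory[BIOS_LOAD_ADDR:BIOS_LOAD_ADDR + SECTOR] = mbr
    # 2. The MBR copies itself out of the way and continues running at 0600h.
    memory[MBR_RELOC_ADDR:MBR_RELOC_ADDR + SECTOR] = \
        memory[BIOS_LOAD_ADDR:BIOS_LOAD_ADDR + SECTOR]
    # 3. The running copy reads the VBR into 7C00h, overwriting only the old copy.
    memory[BIOS_LOAD_ADDR:BIOS_LOAD_ADDR + SECTOR] = vbr
    # 4. Control passes to the VBR at the address where the BIOS would have put it.
    return BIOS_LOAD_ADDR


mem = bytearray(0x10000)  # first 64 KiB of real-mode memory
entry = boot_chain(mem, bytes([0x11]) * SECTOR, bytes([0x22]) * SECTOR)
print(hex(entry))  # 0x7c00
```

The MBR has to vacate 7C00h first because the next-stage boot sector is written to run at that same address.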
x86 DOS drivers
Under DOS, self-relocation is sometimes also used by more advanced drivers and RSXs/TSRs to load themselves "high" into upper memory more effectively than is possible with externally provided "high"-loaders (like LOADHIGH/HILOAD, INSTALLHIGH/HIINSTALL or DEVICEHIGH/HIDEVICE etc.[5] since DOS 5), in order to maximize the memory available for applications. This is because the operating system has no knowledge of the inner workings of a driver to be loaded and therefore has to load it into a free memory area large enough to hold the whole driver as a block, including its initialization code, even if that code would be freed after initialization. For TSRs, the operating system also has to allocate a Program Segment Prefix (PSP) and an environment segment.[6] This might cause the driver not to be loaded into the most suitable free memory area, or even prevent it from being loaded high at all. In contrast, a self-relocating driver can be loaded anywhere (including into conventional memory) and then relocate only its (typically much smaller) resident portion into a suitable free memory area in upper memory. In addition, advanced self-relocating TSRs (even if already loaded into upper memory by the operating system) can relocate over most of their own PSP segment and command line buffer and free their environment segment in order to further reduce the resulting memory footprint and avoid fragmentation.
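The size argument can be illustrated with a little paragraph arithmetic (DOS allocates memory in 16-byte paragraphs). The driver sizes and the list of free upper memory blocks below are invented numbers, and best-fit placement is just one plausible policy; the helpers `paragraphs` and `best_fit` are hypothetical and ignore the PSP and environment segment that a TSR would additionally need.

```python
from typing import List, Optional

PARAGRAPH = 16  # DOS allocates memory in 16-byte paragraphs


def paragraphs(nbytes: int) -> int:
    """Round a byte count up to whole paragraphs."""
    return (nbytes + PARAGRAPH - 1) // PARAGRAPH


def best_fit(free_blocks: List[int], needed: int) -> Optional[int]:
    """Index of the smallest free block (sizes in paragraphs) that holds `needed`."""
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= needed]
    return min(candidates)[1] if candidates else None


# Hypothetical numbers: a 12000-byte driver whose resident part is 2000 bytes,
# and three free upper memory blocks of 400, 200 and 150 paragraphs.
umbs = [400, 200, 150]
whole_driver = paragraphs(12000)   # 750 paragraphs, init code included
resident_only = paragraphs(2000)   # 125 paragraphs after discarding init code

print(best_fit(umbs, whole_driver))   # None: an external loader cannot load it high
print(best_fit(umbs, resident_only))  # 2: the resident part fits the 150-paragraph block
```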
Some self-relocating TSRs can also dynamically change their "nature" and morph into device drivers even if originally loaded as TSRs, thereby typically also freeing some memory.[4] Finally, it is technically impossible for an external loader to relocate drivers into expanded memory (EMS), the high memory area (HMA) or extended memory (via DPMS or CLOAKING), because these methods require small driver-specific stubs to remain in conventional or upper memory in order to coordinate the access to the relocation target area,[7][nb 1][nb 2] and in the case of device drivers also because the driver's header must always remain in the first megabyte.[7][6] In order to achieve this, the drivers must be specially designed to support self-relocation into these areas.[7]
Some advanced DOS drivers also contain both a device driver (which would be loaded at offset +0000h by the operating system) and a TSR (loaded at offset +0100h), sharing a common code portion internally as a fat binary.[6] If the shared code is not designed to be position-independent, it requires some form of internal address fix-up similar to what a relocating loader would otherwise have carried out; this is similar to the fix-up stage of self-relocation, except that the code has already been loaded at the target location by the operating system's loader rather than moved there by the driver itself.
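Two ways such a fix-up could look, sketched in Python with hypothetical helper names: patching each absolute 16-bit offset that was assembled for ORG 100h so that it is correct at offset 0, or, without patching anything, running the code with a segment value 10h paragraphs lower so that the unmodified ORG 100h offsets land on the right bytes (presumably the kind of "segment magics" mentioned in the CuteMouse reference). The fix-up table is an assumption, and the segment arithmetic ignores A20 wrap-around.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset (A20 wrap-around ignored)."""
    return (segment << 4) + offset


def fixup_for_org0(image: bytearray, fixups: list[int], org: int = 0x100) -> None:
    """Patch 16-bit absolute offsets assembled for ORG `org` so that they are
    correct when the same bytes sit at offset 0 (the device-driver case).
    `fixups` lists the positions of those offsets (hypothetical table)."""
    for pos in fixups:
        value = int.from_bytes(image[pos:pos + 2], "little")
        image[pos:pos + 2] = ((value - org) & 0xFFFF).to_bytes(2, "little")


def adjusted_segment(load_segment: int, org: int = 0x100) -> int:
    """Alternative without patching: a segment in which the unmodified
    ORG `org` offsets address the bytes loaded at load_segment:0000."""
    return load_segment - (org >> 4)


image = bytearray([0x90, 0x05, 0x01])  # a NOP, then the offset 0105h
fixup_for_org0(image, [1])
print(image.hex())  # 900500
seg = adjusted_segment(0x1234)
print(hex(seg), linear(seg, 0x105) == linear(0x1234, 0x005))  # 0x1224 True
```

The segment approach avoids a fix-up table but only works where the code never assumes a particular segment value; the patching approach makes the opposite trade-off.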
IBM DOS/360 and OS/360 programs
IBM DOS/360 did not have the ability to relocate programs during loading. Sometimes multiple versions of a program were maintained, each built for a different load address. A special class of programs, called self-relocating programs, were coded to relocate themselves after loading.[8] IBM OS/360 relocated executable programs when they were loaded into memory. Only one copy of the program was required, but once loaded, the program could not be moved (so-called one-time position-independent code).
Other examples
As an extreme example of (many-time) self-relocation it is possible to construct a computer program so that it does not stay at a fixed address in memory, even as it executes. The Apple Worm[9] is a dynamic self-relocator.
See also
- Dynamic dead code elimination
- RPLOADER - a DR-DOS API to assist remote/network boot code in relocating itself while DOS boots
- Garbage collection
- Self-replication
- Self-reference
- Quine (computing)
Notes
- An exception to the requirement for a stub is when expanded memory is converted into permanent upper memory by the memory manager via EMSUMB, so that it is effectively accessed as upper memory rather than via EMS.
- There are two exceptions to the stub requirement for a driver to load into the HMA. First, a stub is not necessary when high memory is permanently enabled on machines without gate A20 logic; however, as this condition is not met in general, generic DOS drivers cannot take advantage of it (unless they explicitly test for this condition beforehand). Second, a stub is also not necessary under DR DOS 6.0 and higher when resident system extensions (like SHARE and NLSFUNC) only hook the multiplex interrupt INT 2Fh, because they can then use a backdoor interface to hook into the interrupt chain in kernel space, so that the kernel's gate A20 handler provides the functionality of the stub. Even so, the driver has to perform self-relocation in order to function correctly in the HMA.
References
- Dhamdhere, Dhananjay M. (1999). Systems Programming and Operating Systems. New Delhi: Tata McGraw-Hill Education. p. 232. ISBN 0-07-463579-4. Archived from the original on 2020-02-01. Retrieved 2011-11-08. (658 pages)
- Dhamdhere, Dhananjay M. (2006). Operating Systems: A Concept-based Approach. New Delhi: Tata McGraw-Hill Education. p. 231. ISBN 0-07-061194-7. Archived from the original on 2020-02-20. Retrieved 2020-02-20. (799 pages)
- Paul, Matthias R.; Frinke, Axel C. (1997-10-13) [1991], FreeKEYB - Enhanced DOS keyboard and console driver (User Manual) (6.5 ed.) (NB. FreeKEYB is a Unicode-based dynamically configurable driver supporting most keyboard layouts, code pages, and country codes. Utilizing an off-the-shelf macro assembler as well as a framework of automatic pre- and post-processing analysis tools to generate dependency and code morphing meta data to be embedded into the executable file alongside the binary code and a self-discarding, relaxing and relocating loader, the driver can be loaded in various ways and install itself as a TSR or device driver, and it implements advanced self-relocation techniques (including into normal DOS memory, UMBs, unused video memory, or raw memory, also utilizing program segment prefix overloading and environment segment recombination) and byte-level granular dynamic dead code elimination at load-time as well as self-modifying code and reconfigurability at run-time to minimize its memory footprint depending on the hardware, operating system and driver configuration as well as the selected feature set and locale.)
- Paul, Matthias R.; Frinke, Axel C. (2006-01-16), FreeKEYB - Advanced international DOS keyboard and console driver (User Manual) (7 (preliminary) ed.)
- "Chapter 10 Managing Memory". Caldera DR-DOS 7.02 User Guide. Caldera, Inc. 1998 [1993, 1997]. Archived from the original on 2017-08-30. Retrieved 2017-08-30.
- Paul, Matthias R. (2002-04-06). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-07. Retrieved 2020-02-07.
[…] Add a SYS device driver header to the driver, so that CTMOUSE could be both in one, a normal TSR and a device driver - similar to our FreeKEYB advanced keyboard driver. […] This is not really needed under DR DOS because INSTALL= is supported since DR DOS 3.41+ and DR DOS preserves the order of [d]config.sys directives […] but it would […] improve the […] flexibility on MS-DOS/PC DOS systems, which […] always execute device= directives prior to any INSTALL= statements, regardless of their order in the file. […] software may require the mouse driver to be present as a device driver, as mouse drivers have always been device drivers back in the old times. These mouse drivers have had specific device driver names depending on which protocol they used ("PC$MOUSE" for Mouse Systems Mode for example), and some software may search for these drivers in order to find out the correct type of mouse to be used. […] Another advantage would be that device drivers usually consume less memory (no environment, no PSP) […] It's basically a tricky file header, a different code to parse the command line, a different entry point and exit line, and some segment magics to overcome the ORG 0 / ORG 100h difference. Self-loadhighing a device driver is a bit more tricky as you have to leave the driver header where it is and only relocate the remainder of the driver […]
- Paul, Matthias R. (2002-02-02). "Treiber dynamisch nachladen" [Loading drivers dynamically] (in German). Newsgroup: de.comp.os.msdos. Archived from the original on 2017-09-09. Retrieved 2017-07-02. (NB. Gives an overview on load-high methods under DOS, including the usage of LOADHIGH etc. commands and self-relocating methods into UMBs utilizing the XMSUMB API. It also discusses more sophisticated methods necessary for TSRs to relocate into the HMA utilizing intra-segment offset relocation.)
- Boothe Management Systems (1972-11-01). "Throughput - Are you getting all you deserve? - DOSRELO". Computerworld - The Newsweekly For The Computer Community (advertisement). VI (44). San Francisco, California, USA: Computerworld, Inc. p. 9. Archived from the original on 2020-02-06. Retrieved 2020-02-07.
[…] DOSRELO provides a method of making DOS problem programs self-relocating. DOSRELO accomplishes the self-relocation capability for all programs, regardless of the language, by adding entry point logic to the object code of the program before the Linkage Editor catalogs it on the Core Image Library. […]
- Dewdney, Alexander Keewatin (March 1985). "Computer Recreations - A Core War bestiary of viruses, worms and other threats to computer memories". Scientific American. 252 (3): 38–39. Archived from the original on 2017-07-04. Retrieved 2017-07-04.
Further reading
- Kildall, Gary Arlen (February 1978). "A simple technique for static relocation of absolute machine code". Dr. Dobb's Journal of Computer Calisthenics & Orthodontia. People's Computer Company. 3 (2): 10–13 (66–69). ISBN 0-8104-5490-4. #22. Archived from the original on 2017-09-09. Retrieved 2017-08-19. (This "resize" method, called page boundary relocation, could be applied statically to a CP/M-80 disk image using MOVCPM in order to maximize the TPA for programs to run. It was also used dynamically by the CP/M debugger Dynamic Debugging Tool (DDT) to relocate itself into higher memory. The same approach was independently developed by Bruce Van Natta of IMS Associates to produce relocatable PL/M code. Another variant of this method, paragraph boundary relocation, was later used by TSRs like KEYB, SHARE, and NLSFUNC under DR DOS 6.0 and higher to relocate themselves dynamically into the HMA. A much more sophisticated and byte-level granular offset relocation method based on a somewhat similar approach was independently conceived and implemented by Matthias R. Paul and Axel C. Frinke for their dynamic dead-code elimination to dynamically minimize the runtime footprint of resident drivers and TSRs (like FreeKEYB).)
- Huitt, Robert; Eubanks, Gordon; Rolander, Thomas "Tom" Alan; Laws, David; Michel, Howard E.; Halla, Brian; Wharton, John Harrison; Berg, Brian; Su, Weilian; Kildall, Scott; Kampe, Bill (2014-04-25). Laws, David (ed.). "Legacy of Gary Kildall: The CP/M IEEE Milestone Dedication" (PDF) (video transcription). Pacific Grove, California, USA: Computer History Museum. CHM Reference number: X7170.2014. Archived (PDF) from the original on 2014-12-27. Retrieved 2020-01-19. (33 pages)
[…] Laws: […] "dynamic relocation" of the OS. Can you tell us what that is and why it was important? […] Eubanks: […] what Gary did […] was […] mind boggling. […] I remember the day at the school he came bouncing into the lab and he said, I have figured out how to relocate. He took advantage of the fact that the only byte was always going to be the high order byte. And so he created a bitmap. […] it didn't matter how much memory the computer had, the operating system could always be moved into the high memory. Therefore, you could commercialize this […] on machines of different amounts of memory. […] you couldn't be selling a 64K CP/M and a 47K CP/M. It'd just be ridiculous to have a hard compile in the addresses. So Gary figured this out one night, probably in the middle of the night thinking about some coding thing, and this really made CP/M possible to commercialize. I really think that without that relocation it would have been a very tough problem. To get people to buy it, it'd seem complicated to them, and if you added more memory you'd have to go get a different operating system. […] Intel […] had the bytes reversed, right, for the memory addresses. But they were always in the same place, so you could relocate it on a 256 byte boundary, to be precise. You could therefore always relocate it with just a bitmap of where those […] Laws: Certainly the most eloquent explanation I've ever had of dynamic relocation […]
- Mitchell, Bridger (July–August 1988). Carlson, Art (ed.). "Z3PLUS & Relocation - Information on ZCPR3PLUS, and how to write self relocating Z80 code". The Computer Journal (TCJ) - Programming, User Support, Applications. Advanced CP/M. Columbia Falls, Montana, USA (33): 9–15. ISSN 0748-9331. ark:/13960/t36121780. Retrieved 2020-02-09.
- Sage, Jay (September–October 1988). Carlson, Art (ed.). "ZCPR3 Corner - More on relocatable code, PRL files, ZCPR34, and Type-4 programs". The Computer Journal (TCJ) - Programming, User Support, Applications. Advanced CP/M. Columbia Falls, Montana, USA (34): 20–25. ISSN 0748-9331. ark:/13960/t0ks7pc39. Retrieved 2020-02-09.
- Harrell III, John B. (October 1983). "DOSPLUS 3.5". 80 Micro. Review. 1001001, Inc. (45): 160, 162, 164–168, 170. ISSN 0744-7868. ark:/13960/t8z906r42. Retrieved 2020-02-06.
- Smith, Lee; Haines, Lionel (1989-02-02) [1987-08-14]. RISC OS Application Image Format (previously Arthur Image Format) (Technical Memorandum) (1.00 ed.). Cambridge, UK: Acorn Computers Limited, Programming Languages Group. PLG-AIF. Archived from the original on 2017-08-30. Retrieved 2017-08-30.
- Properties of ARM Image Format. 1993. Archived from the original on 2017-08-31. Retrieved 2017-08-31.
- Huck, Alex (2016-08-14). "Nachladbare Treiber unter CP/M - PRL2COM". Homecomputer DDR (in German). Archived from the original on 2020-02-21. Retrieved 2020-02-21; Pohlers, Volker (2017-04-24) [2012-02-20, 2009, 2002, 1988-07-26, 1987-10-11]. "PRL2COM". Homecomputer DDR (in German). Archived from the original on 2020-02-21. Retrieved 2020-02-21.
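The page boundary relocation described in the Kildall entry and in Eubanks's recollection above can be sketched as follows. The sketch assumes the commonly described way of building the bitmap: assembling the program twice at origins exactly one 256-byte page apart and marking every byte that differs, since only high-order address bytes change. The function names and the most-significant-bit-first bit order are assumptions, not a description of the MOVCPM or PRL file formats.

```python
def make_bitmap(image_at_0000: bytes, image_at_0100: bytes) -> bytes:
    """Mark (one bit per byte, MSB first) every byte that differs between two
    assemblies whose origins are exactly one 256-byte page apart."""
    if len(image_at_0000) != len(image_at_0100):
        raise ValueError("both assemblies must have the same length")
    bitmap = bytearray((len(image_at_0000) + 7) // 8)
    for i, (a, b) in enumerate(zip(image_at_0000, image_at_0100)):
        if a != b:  # only high-order address bytes change by a whole page
            bitmap[i // 8] |= 0x80 >> (i % 8)
    return bytes(bitmap)


def relocate_to_page(image_at_0000: bytes, bitmap: bytes, page: int) -> bytes:
    """Add the target page number (the new high address byte) to every marked byte."""
    out = bytearray(image_at_0000)
    for i in range(len(out)):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out[i] = (out[i] + page) & 0xFF
    return bytes(out)


# 8080 code: JMP to a HLT at image offset 5, assembled at 0000h and at 0100h.
at_0000 = bytes([0xC3, 0x05, 0x00, 0x00, 0x00, 0x76])
at_0100 = bytes([0xC3, 0x05, 0x01, 0x00, 0x00, 0x76])
bits = make_bitmap(at_0000, at_0100)
print(bits.hex())                                    # 20
print(relocate_to_page(at_0000, bits, 0xD0).hex())  # c305d0000076
```

Because the bitmap costs one bit per program byte, a loader can move the image to any page boundary at load time without a symbol table or a full relocating linker, which is what made a single distribution usable on machines with different amounts of memory.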