Friday, June 13, 2008

C used to be simple


Why does the following C code cause a "Bus Error"?

  char * bug = "bug";

  bug[0] = 'm';

while this one runs fine:

  char bug [] = "bug";

  bug[0] = 'h';


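The "char *" variable actually points to a string literal, which behaves like a "const char []" buffer and is typically placed in read-only memory. The array version, in contrast, gets its own writable copy of the characters.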
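A minimal sketch of how to avoid the bug, assuming nothing beyond standard C. The function name patch_copy_of_literal is made up for this example. Declaring the pointer as "const char *" lets the compiler reject the bad write, and the array form gives you a copy you are allowed to modify.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
   Returns 1 if the writable copy was changed to "mug". */
int patch_copy_of_literal(void)
{
    /* const pointer to the literal: `literal[0] = 'm';` would no longer compile */
    const char *literal = "bug";

    /* array initialised from the literal: a private, writable copy */
    char copy[] = "bug";

    copy[0] = 'm';

    /* the literal itself was never touched */
    assert(strcmp(literal, "bug") == 0);

    return strcmp(copy, "mug") == 0;
}
```

Calling patch_copy_of_literal() from main should return 1, with no Bus Error involved.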


Read only address??

Isn't C "unmanaged" code?

Then who is checking that I'm not writing on "const"s??


Back in the old days - no one did.

Nowadays, we have Virtual Memory and Paging.

Introducing Virtual Memory:

Virtual memory is a feature of the CPU which common Operating Systems employ to:


  • Restrict programs from accessing the memory of other programs.
  • Let programs use more memory than the machine's RAM, in a backwards-compatible way: "tricking" the program into "believing" it is using RAM.
How (in short):

When a program accesses a memory address, many things happen:
  • The CPU translates the ("virtual") address to a physical address in the RAM, using the "Page Table"
  • The Page Table tells where to find each Page (a 4KB piece of address space)
  • Each Page may either be mapped to a location in the RAM or "unavailable"
  • If the page is mapped to RAM then the CPU will simply access it
  • If the page is "unavailable", the CPU will interrupt your program and jump to the Operating System
  • Why is the page unavailable? Because you don't have enough RAM to hold all of Virtual Memory in it
  • So where is the data? On your hard drive.
  • The OS will return control to your program only after it reads the page from the hard drive and changes the Page Table to reflect its new mapping
And your unsuspecting code runs as if it was just using regular RAM all along.
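
The translation steps above can be sketched as a toy model. This is not how any real CPU lays out its page table. The names toy_page_table, toy_translate and toy_demo_table are invented for illustration, and the only number taken from the text is the 4KB page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   4096u       /* 4KB pages, as in the text */
#define TOY_NUM_PAGES   4u          /* tiny address space for the demo */
#define TOY_NOT_PRESENT UINT32_MAX  /* marks an "unavailable" page */

/* Hypothetical one-level page table: virtual page number -> physical frame number */
typedef struct {
    uint32_t frame[TOY_NUM_PAGES];
} toy_page_table;

/* Pages 0 and 2 are in RAM (frames 5 and 1); pages 1 and 3 are "unavailable" */
static const toy_page_table toy_demo_table = {
    { 5u, TOY_NOT_PRESENT, 1u, TOY_NOT_PRESENT }
};

/* Returns 0 and stores the physical address in *paddr when the page is mapped.
   Returns -1 when it is not, which is where a real CPU would interrupt the
   program and jump to the OS. */
int toy_translate(const toy_page_table *pt, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr / TOY_PAGE_SIZE;  /* which page */
    uint32_t offset = vaddr % TOY_PAGE_SIZE;  /* where inside the page */

    assert(pt != NULL && paddr != NULL);

    if (page >= TOY_NUM_PAGES || pt->frame[page] == TOY_NOT_PRESENT)
        return -1;

    *paddr = pt->frame[page] * TOY_PAGE_SIZE + offset;
    return 0;
}
```

With toy_demo_table, an address inside page 0 lands in frame 5 at the same offset, while any address inside page 1 reports a fault.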

  • The OS will need an available "Physical" RAM page to read the data from disk into.
  • To have one available it may need to "Swap Out" other memory.
  • Swapping out is the process of moving a page from RAM to disk to make room for other pages
More about Virtual Memory

Theoretical Alternative to Virtual Memory:
  • Protection of applications is achieved by running only "Managed Code"
  • Swapping is done explicitly or more intelligently using "hints". (The current strategy is typically an approximation of the Least-Recently-Used heuristic.)
Advantages of Virtual Memory, and reasons it prevailed:
  • Operating Systems had to be backwards-compatible to be adopted.
  • Programmers refused to code in "Managed" software environments. They wanted to program directly to the CPU.
  • No alternative was developed
Use cases showing Virtual Memory/Paging's disadvantages:
  • Data that is faster to recompute than to write to disk and read back is still swapped out and in.
  • Data of objects that the program has freed and will overwrite with new objects (without reading it again) is still swapped out and in.
  • Cases where the LRU heuristic isn't the best you can do, because the program knows what it will need next and could prefetch it
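
For the prefetch case, POSIX does offer a hint call, posix_madvise. Here is a rough sketch of using it, assuming a POSIX system with anonymous mmap. The function prefetch_then_touch is made up for this example, and the hint is advisory only: the OS is free to ignore it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD/macOS spelling */
#endif

/* Hypothetical demo: map some pages, tell the OS we are about to use them,
   then touch each one. Returns the number of pages touched, or -1 if the
   mapping could not be set up. *hint_result receives posix_madvise's
   return value (0 means the hint was accepted). */
long prefetch_then_touch(size_t pages, int *hint_result)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(hint_result != NULL);
    if (page_size <= 0 || pages == 0)
        return -1;

    size_t len = pages * (size_t)page_size;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* The hint: "I will need this range soon". Purely advisory. */
    *hint_result = posix_madvise(buf, len, POSIX_MADV_WILLNEED);

    long touched = 0;
    for (size_t i = 0; i < len; i += (size_t)page_size) {
        buf[i] = 1;  /* first write to each page */
        touched++;
    }

    munmap(buf, len);
    return touched;
}
```

The same call accepts POSIX_MADV_DONTNEED, which is the closest standard hint for the "I have freed this data" case, though what the OS does with it varies between systems.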
And I still haven't answered the question

(but all this exposition was mandatory)
  • Pages can also be marked as "Read Only"
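  • When a program writes to such a page, the CPU interrupts it and calls the OS, just as it does for unavailable pages. This time the OS has no page to load, so it kills the program with a signal (the "Bus Error" above, or a "Segmentation Fault" on other systems).
  • C's string literals, which behave like "const char []", are typically stored in Read-Only pages.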
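
You can reproduce the effect on purpose. The sketch below assumes a POSIX system (mmap, mprotect, fork), and the function name write_to_readonly_page_faults is invented for this example. It marks a page Read-Only and lets a child process attempt the write, so the parent survives to report what happened.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD/macOS spelling */
#endif

/* Hypothetical demo. Returns 1 if the child was killed by a signal when it
   wrote to the read-only page, 0 if the write went through, -1 on setup failure. */
int write_to_readonly_page_faults(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t len = (size_t)page_size;

    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    page[0] = 'b';  /* fine: the page is still writable */
    assert(page[0] == 'b');

    /* Flip the page to Read-Only, like the pages that hold string literals */
    if (mprotect(page, len, PROT_READ) != 0) {
        munmap(page, len);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        munmap(page, len);
        return -1;
    }
    if (child == 0) {
        ((volatile char *)page)[0] = 'm';  /* expected to fault here */
        _exit(0);                          /* only reached if the write succeeded */
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    munmap(page, len);
    if (waited != child)
        return -1;

    return WIFSIGNALED(status) ? 1 : 0;
}
```

Which signal the child receives (SIGSEGV or SIGBUS) depends on the OS, which is why the sketch only checks that some signal killed it.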
BTW: Read only pages are also used to implement "Copy-On-Write" of "Shared Memory"

C used to be simple..