by Phil Martin
Figure 60: Computer Architecture
The computer processor is usually called the central processing unit, or CPU, which is itself made up of three components: the ALU, the control unit and registers.
The arithmetic logic unit, or ALU, is a chip that performs mathematical and logical operations on data that is fed to it from the registers.
The control unit plays traffic cop by telling the other components what to do and when.
A register is a special type of memory that is built for extreme speed and is therefore fairly expensive. The register holds the data that the CPU will operate on.
The register can only hold a small amount of data, so other types of system memory are used by a computer to hold the bulk of data such as random-access memory, or RAM, hard disks, optical disks such as DVDs and CDs, and flash memory such as USB drives.
An I/O device is used by the computer to interact with external interfaces such as hard drives, printers, and USB devices. For example, a USB keyboard is an input device, while an HDMI monitor is an output device. Some devices can act as both input and output simultaneously, such as a touch screen.
The various components of a computer all talk with each other using a communication channel called a bus.
Before we can put all of this together and see how a computer operates, we need to understand instructions. After all, if a computer doesn’t have instructions, how does it know what to do? An instruction is composed of two elements: an opcode and one or more operands. An operand is a bit of data, and the opcode tells the computer what should be done with an operand. Instructions are very specific to the hardware architecture, and only the designer of that architecture gets to define them. For example, the x86 instruction set used by Intel and AMD processors is not compatible with the ARM instruction set used by most mobile processors. Both can do the same things, but each architecture has its own instruction set.
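The opcode-plus-operands idea can be sketched in a few lines of Python. The opcode numbers, the `Instruction` class and the one-byte-per-field encoding below are all assumptions made up for illustration; real instruction sets define their own, far richer, encodings.

```python
from dataclasses import dataclass

# Hypothetical opcode numbers for a toy machine (not taken from any real CPU).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}


@dataclass(frozen=True)
class Instruction:
    opcode: str      # what should be done
    operands: tuple  # the data (or addresses) to do it with


def encode(instr: Instruction) -> bytes:
    """Pack an instruction into bytes: one opcode byte, then one byte per operand."""
    return bytes([OPCODES[instr.opcode], *instr.operands])


def decode(raw: bytes) -> Instruction:
    """Reverse of encode: map the opcode byte back to its name and collect the operands."""
    names = {value: name for name, value in OPCODES.items()}
    return Instruction(names[raw[0]], tuple(raw[1:]))


add = Instruction("ADD", (7, 35))
machine_code = encode(add)
print(machine_code.hex())    # 020723
print(decode(machine_code))  # the same ADD instruction, recovered from the bytes
```

A processor built around a different opcode table would read the same three bytes as a different instruction, or as nothing valid at all, which is why code built for one architecture does not run on another.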
One last detail: the CPU register cannot hold a lot of data, so instead of loading data directly into its memory area, it often simply loads a memory address in RAM where the real data lives. When an instruction needs to be executed, it will fetch the data from RAM into the register and operate on it. Memory in RAM is arranged in sequential blocks, with each block having a unique address expressed as a hexadecimal value. For example, in a 32-bit address space memory blocks start with 0x00000000 and run up to 0xffffffff. The memory at the lower end near location 0x00000000 is called low memory, and the memory blocks at the high end of the range near 0xffffffff are called high memory.
Now, why are we really discussing this? After all, this book is about software security, not hardware architecture. The short answer is that an attacker will know this information and use it to exploit your software, and if you don’t understand it yourself, then you are not going to be able to properly secure your own software. So, armed with all this knowledge, let’s pull everything together.
Figure 61: Memory Layout
First, software will be stored in non-volatile memory such as a hard drive. When the program is launched, the operating system will allocate space in RAM and load it. Programs will be loaded in the same general layout as shown in Figure 61. At the lower end of memory, we find program text, which contains the instructions to execute. Then read-write data is loaded. This area contains all of the various variables used by the instructions. Next, we find the stack, which is where function variables, local data and some special register values are placed. One special register, the stack pointer (called ESP, for extended stack pointer, on 32-bit x86 processors), keeps track of the currently executing function by pointing to the top of the stack, where that function’s data is located. In the same area as the stack we also find the heap, which is where variable-sized objects and objects too large to be placed on the stack are stored. The heap also allows us to run more than one process at a time, but attacks will be mostly focused on the stack.
Figure 62: How the CPU Executes Instructions
Now let’s zero in on the program text space, where the instructions to be executed are stored as shown in Figure 62. The CPU will execute four steps for each instruction it encounters in this space. First, it fetches the instruction from system memory into the local registers. To do this, the control unit keeps track of two things – an instruction pointer that points to the currently executing instruction, and a data pointer, which points to the memory address where the data for the instruction is stored.
Next, the control unit decodes the instruction and directs the required data to be moved from system memory to the ALU. The ALU then executes the instruction by performing the mathematical or logical operation on the data. Finally, the ALU stores the results of the operation in memory or in a register. The control unit then releases the result to an output device or a storage device, such as a hard drive.
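The four steps can be sketched as a loop in Python. This is a toy model under stated assumptions, not how any real processor is built: the `run` function, the three opcodes, the four-field instruction tuples and the dictionary standing in for RAM are all hypothetical.

```python
def run(program, memory):
    """Push each instruction of a toy program through fetch, decode, execute and store.

    program: list of (opcode, address_a, address_b, destination) tuples (hypothetical format).
    memory:  dict mapping an address to a value, standing in for RAM.
    """
    # The "ALU": one small function per supported operation.
    alu = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "AND": lambda a, b: a & b,
    }
    instruction_pointer = 0
    while instruction_pointer < len(program):
        # 1. Fetch the instruction the instruction pointer refers to.
        instruction = program[instruction_pointer]
        # 2. Decode it and bring the data it names in from memory.
        opcode, address_a, address_b, destination = instruction
        a, b = memory[address_a], memory[address_b]
        # 3. Execute the operation.
        result = alu[opcode](a, b)
        # 4. Store the result back to memory.
        memory[destination] = result
        instruction_pointer += 1
    return memory


ram = {0x10: 40, 0x14: 2, 0x18: 0, 0x1C: 0}
program = [
    ("ADD", 0x10, 0x14, 0x18),  # ram[0x18] = 40 + 2
    ("SUB", 0x18, 0x14, 0x1C),  # ram[0x1C] = 42 - 2
]
run(program, ram)
print(ram[0x18], ram[0x1C])  # 42 40
```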
Figure 63: LIFO
The fetch-decode-execute-store process is known as the machine cycle. Remember that CPU registers store only the most immediate data, while the vast majority of data is stored in RAM. Both the stack and heap are found in RAM, and when a program instantiates an object the memory allocated will be on the stack or heap depending on the dynamic size of the object. The stack stores the most recently used function values and local variables, and operates on a LIFO principle: last-in, first-out. To store information on the stack, a PUSH operation is performed, while a POP operation is used to remove information. Think of loading a Pringles chip can, one chip at a time. The first chip you PUSH into the can will fall to the bottom, and the next chip PUSHed will cover it up. The only way to get to the bottom chip is to POP the second chip off. The last chip put into the can must be the first chip removed. Figure 63 illustrates this mechanism.
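The chip-can behavior is easy to try out. The short Python sketch below uses an ordinary list as the can, with `append` playing the role of PUSH and `pop` the role of POP; the variable names are only illustrative.

```python
can = []

# PUSH three chips, one at a time.
can.append("first chip")
can.append("second chip")
can.append("third chip")

# POP returns the most recently pushed chip, not the first one.
top = can.pop()
print(top)  # third chip
print(can)  # ['first chip', 'second chip']
```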
As an attacker, this behavior is very interesting. If you look at how the stack is oriented in relation to low and high memory addresses, you can see that the first ‘chip’ pushed into the can will be assigned the highest memory address, and each subsequent ‘chip’ will be assigned the next available lower memory address. Now think ‘evilly’ – what could I do to crash a program? I could force the program to PUSH so many functions onto the stack that the stack space is overrun, and I can start overwriting the heap area, and then the read-write area, and finally the program text area! This is why an infinite recursive loop in a program is so destructive – a function calls itself, which calls itself, which calls itself, until eventually the stack is exhausted, and we encounter the blue screen of death. That is, if the machine is running Windows, anyway. It’s important to understand this so that we can put in proper countermeasures.
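The direction of stack growth, and one possible countermeasure, can be sketched in Python. Everything here is an assumption for illustration: the `DownwardStack` and `StackOverflow` names, the 4-byte slots and the addresses are made up. The sketch shows the first PUSH landing at the highest address, each later PUSH landing lower, and a bounds check that refuses a PUSH rather than overwriting whatever lies below the stack.

```python
class StackOverflow(Exception):
    """Raised when a PUSH would run past the stack's lower bound."""


class DownwardStack:
    def __init__(self, highest_address, capacity, slot_size=4):
        self.slot_size = slot_size
        # Lowest address the stack is allowed to use.
        self.lowest_address = highest_address - (capacity - 1) * slot_size
        # Where the next PUSH will land; starts at the top and moves down.
        self.next_address = highest_address
        self.cells = {}

    def push(self, value):
        # Countermeasure: refuse to write below the stack's own region.
        if self.next_address < self.lowest_address:
            raise StackOverflow(f"no room below {hex(self.lowest_address)}")
        address = self.next_address
        self.cells[address] = value
        self.next_address -= self.slot_size
        return address


stack = DownwardStack(highest_address=0xFFFC, capacity=3)
addresses = [stack.push(f"frame {n}") for n in range(3)]
print([hex(a) for a in addresses])  # ['0xfffc', '0xfff8', '0xfff4']

# A fourth frame, as runaway recursion would eventually produce, is refused.
refused = False
try:
    stack.push("frame 3")
except StackOverflow as error:
    refused = True
    print("refused:", error)
```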
Evolution of Programming Languages
Now that we have covered hardware operations, let’s step back a little and look at how programming languages have evolved over time. Then we can take a look at how programming languages interact with the hardware architecture we just covered to produce even more opportunities for evil hackers to make our lives difficult.
The History
Imagine having to write software one instruction at a time in binary code, nothing but a series of 0’s and 1’s. Back in the days when computers were still new, this is exactly what programmers did by writing machine language, a language that machines were able to understand, but very difficult for people to comprehend, much less create. Then, some very smart people created assembly language, which consists of a series of very rudimentary abbreviations. For example, to PUSH a value onto the stack, assembly language provides the ‘PUSH’ command. An assembler converts assembly language into machine language. Assembly language can still be found in use today for very specific needs when performance needs to be high and code size needs to be low.
Figure 64: Programming Language Levels
At some point, more smart people decided a better approach should be taken and created the first high-level programming language that separated how a computer thinks from how a human thinks. High-level languages allow a programmer to express logic in a way that makes sense to the human mind, and a compiler converts the instructions into machine language. At this point, programmers became more concerned with implementing business rules than how to make a computer understand what to do with individual bits. This drastically increased the complexity that was possible in programs. Unfortunately, this also allowed hackers to not have to understand how a computer operates at a low level as well, so while increasing productivity, high-level languages also increased security risks.
Today programmers almost exclusively use very high-level languages such as Java and .Net, which can be read and somewhat understood even by non-programmers. The latest type of programming languages, called natural language, allows the programmer to simply state the problem in an English-like syntax instead of writing algorithms. Natural languages are still not in common use, however.
Each language has its own style of verbiage and constraints, which is called its syntax. When a programmer writes text using the proper syntax, it is called source code. Source code must be converted into machine language before a computer can execute it, which is called compiling. A different approach, which is very prevalent in modern languages, is to convert source code to an intermediate format, which is then run by a separate program at run-time. This is called an interpreted language as opposed to a compiled language.
Compiled Languages
Compiled languages were the norm up until the 1990s. This approach converts source code directly into machine code that a computer can execute. This conversion requires two separate steps as shown in Figure 65.
First, the textual source code written by the programmer is compiled into raw instructions that are specific to the target processor. This process is carried out by a program called a compiler. Remember that each processor has its own unique set of possible instructions. The output of the compiler is called object code, but this cannot be used by a computer until we combine it with other files the machine will need, which is the second step called linking. Linking produces the executable file that the computer can understand and use. In other words, object code that has been linked with other dependent files is the executable code. The process that performs linking is called a linker.
Figure 65: The Build Process
There are two types of linking – static and dynamic. When the linker copies all functions, variables and libraries into the executable itself, we are performing static linking. This results in better performance and easier distribution since it is self-contained. The downside is that the executable can become quite large in terms of both file size and how much dedicated memory must be allocated for it at run-time. Dynamic linking simply references these required dependencies in the executable but does not embed the resources. Instead, at run-time the operating system takes care of locating the referenced files and loads them on behalf of the executable. While this results in a much smaller executable, it can often cause problems if the required dependencies cannot be located at run-time. This also opens up a significant security vulnerability, as an attacker could replace the dependent files with his own malicious version, allowing him access to the internal workings of a process. Figure 66 summarizes this information.
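The risk of a replaced dependency can be illustrated with a small Python simulation. It does not load real libraries; it only mimics the idea of a loader walking a search path at run-time and taking the first match. The `resolve` function, the directory names, the `crypto.lib` file name and the search order are all hypothetical.

```python
import tempfile
from pathlib import Path


def resolve(library_name, search_path):
    """Return the first file named library_name found along search_path, or None."""
    for directory in search_path:
        candidate = Path(directory) / library_name
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    system_dir = Path(root) / "system"
    app_dir = Path(root) / "app"
    system_dir.mkdir()
    app_dir.mkdir()
    (system_dir / "crypto.lib").write_text("genuine")

    # Assumed search order: the application's own directory is checked first.
    search_path = [app_dir, system_dir]
    before = resolve("crypto.lib", search_path).read_text()

    # An attacker who can write to the earlier directory plants a look-alike file.
    (app_dir / "crypto.lib").write_text("malicious")
    after = resolve("crypto.lib", search_path).read_text()

print(before, "->", after)  # genuine -> malicious
```

The executable itself never changed between the two lookups, which is what makes this kind of substitution hard to notice. A statically linked executable would not perform the lookup at all.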
Figure 66: Types of Linking
Interpreted Languages
Programs written in a compiled language are able to run directly on the processor, resulting in faster execution times. However, as they are by nature closer to the hardware, they also are more difficult to program. That changed when interpreted languages burst onto the scene in the 1990s, as represented by REXX, PostScript, Perl, Ruby and Python. Interpreted languages are still in common use today, with more modern examples being JavaScript and PHP. It is common to refer to such languages as ‘script’. Interpreted languages rely on a fully-compiled process to carry out instructions on their behalf. While interpreted source code is not compiled into anything close to native code, and as a result suffers from decreased run-time performance, scripted languages have the benefit of being easily updated without a need for recompilation or linking.
Figure 67: Types of Languages
Hybrid Languages
While compiled languages are difficult to use but fast, and interpreted languages are easy to use but slow, a hybrid language provides a very good compromise between the two by being easy to use while providing acceptable performance. In this approach, the source code is compiled into an intermediary format which resembles object code, which is then interpreted as required. Java and .Net are the most well-known examples of this approach, and arguably represent the bulk of modern programming going on today. However, they each have their own take on how to carry out the process.
Java compiles source code into bytecode, which closely resembles processor instructions but is still one step above assembly language. It requires a run-time interpreter called the Java Virtual Machine, or JVM, to execute the instructions. In some ways, this is not far from an interpreted language as an interpreter is involved, but the primary difference is that instead of the source code being interpreted at run-time, the compiled bytecode is.
.Net compiles source code into an intermediate format, known as Common Intermediate Language, or CIL. At run-time, the Common Language Runtime, or CLR, compiles the code into native code, which is executed by the machine directly. This results in a faster performance than Java but does have a slight impact on its portability across platforms.
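The compile-to-an-intermediate-format-then-execute pattern can be observed without Java or .Net installed. The sketch below uses CPython as a stand-in, which is an assumption of convenience: Python is usually classed as interpreted, but its standard implementation also compiles source into bytecode that a virtual machine then runs. The expression and variable names are made up, and the bytecode listing differs between Python versions.

```python
import dis

source = "price * quantity"

# Step 1: compile the source text into an intermediate code object (bytecode).
bytecode = compile(source, "<example>", "eval")

# Inspect the intermediate instructions; the exact listing varies by Python version.
dis.dis(bytecode)

# Step 2: the runtime executes the bytecode, not the original source text.
total = eval(bytecode, {"price": 4, "quantity": 3})
print(total)  # 12
```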
Figure 67 summarizes the pros and cons of each type of language.
Programming Language Environment
Now that we’ve covered the various types of programming languages, let’s focus on the real question – which one should we use, and how does the answer impact secure software?
Selecting the Right Programming Language
Usually an organization will use the programming language based on the strengths of existing development staff. If 80% of developers know Java, then the language of choice will naturally be Java. Sometimes the choice is made due to preference – for example, if a Perl-based development team must move onto a modern platform, they will most likely choose Java as well, since it is closer to their preferred *nix environment. Other times a new and upcoming language that looks fun or would look good on a resume is chosen. This is perhaps the worst possible basis to use when selecting a language, but it happens more often than you might think. I have inherited more than one project that failed to deliver because the wrong language was chosen for this reason. Sometimes a complete rewrite was required to overcome limitations imposed by the wrong language choice. The appropriate programming language must be a purposeful choice made after design considerations. For example, an unmanaged code programming language such as C or C++ may be required when speed is of the essence and memory is at a premium, such as on embedded devices. However, unmanaged languages are inherently less secure than managed code such as Java or .Net, so the cost of a security breach must be taken into consideration if cost is the reason for choosing the language to begin with. A decent compromise is to use managed code by default and then switch to unmanaged code only when needed.
Protection mechanisms such as encryption, hashing and masking are crucial to security, but are concepts that are too high-level to be of much use when addressing security concerns down in the bowels of low-level source code. Here, we need to think in terms of data types, formatting, range limitations and length of values.
Primitive Data Types
Programming languages have something called primitive data types. Some common examples are character, integer, floating point numbers and Boolean. They are considered primitive as they are the most basic building blocks for everything else in the language. Some languages allow developers to define their own data types, but this is not recommended as it unduly increases the attack surface. Such languages are generally scripting-based such as JavaScript, VBScript and PHP, while strongly-typed languages such as C++, .Net and Java do not allow such a construct and are considered to be more secure since the attack surface remains protected from dynamic data types.
| Name | Size (bits) | Unsigned range | Signed range |
| --- | --- | --- | --- |
| byte | 8 | 0 to 255 | -128 to 127 |
| int, short, Int16, Word | 16 | 0 to 65,535 | -32,768 to 32,767 |
| long int, Int32, Double Word | 32 | 0 to 4,294,967,295 | -2,147,483,648 to 2,147,483,647 |
| long | 64 | 0 to 18,446,744,073,709,551,615 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
Figure 68: Integer data type sizes and ranges
With strongly-typed languages, the possible values assignable to a data type are restricted. For example, a Boolean has only two values: true or false. An integer can only contain whole numbers, never a fraction. A concept that can be confusing to non-developers is that of signed and unsigned values. For example, a 16-bit signed integer can hold values from -32,768 to 32,767, while a 16-bit unsigned integer can represent a value from 0 to 65,535. The same number of bits is used for each, but with a signed value a single bit is used to denote the -/+ sign, which halves the largest positive value it can hold.
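The ranges in Figure 68 follow directly from the number of bits, which a short Python sketch can confirm. The function names are hypothetical, and the signed calculation assumes the two's-complement representation used by virtually all current hardware. The last few lines show the same two bytes being read once as unsigned and once as signed.

```python
def unsigned_range(bits):
    """Smallest and largest values an unsigned integer of this width can hold."""
    return 0, 2**bits - 1


def signed_range(bits):
    """Smallest and largest values of a two's-complement signed integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16, 32, 64):
    print(bits, unsigned_range(bits), signed_range(bits))

# The same 16 bits, all set to 1, mean different things depending on the type.
raw = bytes([0xFF, 0xFF])
as_unsigned = int.from_bytes(raw, "big", signed=False)
as_signed = int.from_bytes(raw, "big", signed=True)
print(as_unsigned, as_signed)  # 65535 -1
```

Code that treats a value as signed in one place and unsigned in another can therefore turn a very large number into a negative one, which is why range and sign checks matter at this level.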