
Security in Embedded Hardware

Daniel Ziener

Computer Architecture for Embedded Systems

University of Twente

Email: [emailprotected]

5 February 2019

Contents

1 Motivation
  1.1 Security
  1.2 FPGAs
  1.3 ASICs

2 Definitions
  2.1 Dependability and its Attributes
    2.1.1 Availability
    2.1.2 Reliability
    2.1.3 Safety
    2.1.4 Confidentiality
    2.1.5 Integrity
    2.1.6 Maintainability
    2.1.7 Security
  2.2 Fault, Error, Failure
    2.2.1 Failure
    2.2.2 Errors
    2.2.3 Faults
  2.3 Means to Attain Dependability
    2.3.1 Fault Prevention
    2.3.2 Fault Tolerance
    2.3.3 Fault Removal
    2.3.4 Fault Forecasting
  2.4 Security Flaws and Attacks
  2.5 Overhead
    2.5.1 Area Overhead
    2.5.2 Memory Overhead
    2.5.3 Execution Time Overhead
  2.6 IP Cores and Design Flow

3 Attacks on Embedded Systems
  3.1 Code Injection Attacks
  3.2 Invasive Physical Attacks
  3.3 Non-Invasive Logical Attacks
  3.4 Non-Invasive Physical Attacks

4 Defenses Against Code Injection Attacks
  4.1 Methods using an Additional Return Stack
  4.2 Methods using Address Obfuscation and Software Encryption
  4.3 Safe Languages
  4.4 Code Analyzers
  4.5 Anomaly Detection
  4.6 Compiler, Library, and Operating System Support
    4.6.1 Compiler Support
    4.6.2 Library Support
    4.6.3 Operating System Support
  4.7 Control Flow Checking
    4.7.1 Software-Based Methods
    4.7.2 Methods using Watchdog Processors
    4.7.3 New Control Flow Checking

5 IP Protection
  5.1 Encryption of IP Cores
    5.1.1 Encrypted HDL or Netlist Cores
    5.1.2 Encrypted FPGA Configurations
  5.2 Additive Watermarking of IP Cores
    5.2.1 HDL Cores
    5.2.2 Netlist Cores
    5.2.3 Bitfile Cores
  5.3 Constraint-Based Watermarking of IP Cores
    5.3.1 HDL Cores
    5.3.2 Netlist Cores
    5.3.3 Bitfile and Layout Cores
  5.4 Other Approaches

Bibliography

1 Motivation

Since the invention of the transistor, the complexity of integrated circuits has grown rapidly. At first, only basic functions like discrete logic gates were implemented as integrated circuits. With improvements in chip manufacturing, the size of the transistors was drastically reduced and the maximum size of a die was increased. Now, it is possible to integrate more than one billion transistors [Xil03] on one chip.

In the beginning, electric circuits (e.g., a central processing unit) consisted of discrete electronic devices which were integrated on printed circuit boards (PCBs) and consumed a lot of power. The invention of integrated circuits at the end of the 1950s also laid the cornerstone for the development of embedded systems. For the first time, the circuits were small enough and consumed little enough power that applications embedded into a device, like production machines or consumer products, became possible. An embedded system is considered a complete special-purpose computer that may consist of one or more CPUs, memories, a bus structure, and special-purpose cores.

The first integrated circuits were able to integrate basic logic functions (e.g., AND and OR gates) and flip-flops. With further integration, complex circuits, like processors, could be implemented on one chip. Today, it is possible to integrate a whole system with processors, buses, memories, and specific hardware cores on a single chip, a so-called system-on-chip (SoC).

These small, power- and cost-efficient, yet versatile embedded systems finally took off on their triumphant course. Today, embedded systems are included in most electrical devices, from coffee machines and stereo systems to washing machines. The application field of embedded systems spans from consumer products, like mobile phones or television sets, through safety-critical applications, like automotive or nuclear plant applications, to security applications, such as smart cards or identity cards.

As integration density grew, problems with heat dissipation arose. The embedding of electronics into systems with little space and reduced cooling options, or operation in areas with extreme temperatures, intensifies this problem. Furthermore, an embedded system which is integrated into an environment with moving parts is exposed to shock. Thermal and shock problems have a high influence on the reliability of the system. On the other hand, a system that steers big machines or controls a dangerous process must have a high operational reliability.


These are all reasons why design for reliability is gaining more and more influence on the development of embedded systems.

However, what is the need for reliability if everyone may alter critical parameters or shut down important functions? To solve these problems, we need access control to the embedded system. But, today, embedded systems are also used to grant access to other systems or buildings. One example is chip cards. Inside these cards, a secret key is stored. It is important that no unauthorized persons or systems are able to read this secret key. Thus, an embedded system should not only be reliable but also secure.

Integrating functions that guarantee reliability and security also increases the complexity of the integrated system enormously, and thus the design time. On the other hand, the market requires shorter product cycles. The only solution is to reuse cores which have been designed for other projects or were purchased from other companies. The number of reused cores constantly increases. The advantages of IP core (intellectual property core) reuse are substantial. For example, they offer a modular concept and fast development cycles.

IP cores are licensed and distributed like software. One problem of IP core distribution, however, is the lack of protection against unlicensed usage, as cores can be easily copied. Future embedded systems should be able to prevent the usage of unlicensed cores, or core developers should be able to detect their cores inside an embedded system from third-party manufacturers.

Considering today's embedded systems, the integration of reliability- and security-increasing functions depends on the application field. In the area of security-critical systems (e.g., chip cards, access systems, etc.), several security functions are implemented. We find additional reliability functions in systems where human life or valuable assets are at stake (e.g., power plants, banking mainframes, airplanes, etc.). On the other hand, the problem of all these additional functions is the requirement for additional chip area. For cost-sensitive products which are produced in huge volumes, like mobile phones or chip cards, the developer must think twice before integrating such additional functions.

1.1 Security

Security becomes more and more important for computers and embedded systems. With the ongoing integration of personal computers and embedded systems into networks, and finally into the Internet, security attacks on these systems arose. These networked, distributed devices may now process sensitive data from all over the world, and the attacker does not need to be physically present. Also, the increased complexity of these devices increases the probability of errors which can be used to break into a system. Figure 1.1 shows a classification of different types of attacks related to computer systems.


This information is obtained from the CSI Computer Crime and Security Survey [Ric08], in which 522 US companies reported their experience with computer crime. Further, the integration of networking interfaces into embedded devices for which they are not obviously necessary leads to curious attacks, for example, that someone can break into a coffee machine over the Internet and alter the composition of the coffee [Wri08].

Figure 1.1: Security attacks reported in the CSI Computer Crime and Security Survey [Ric08], where 522 US companies reported their experience with computer crime for the year 2008.

Within the last decade, the embedded software community has paid more attention to the security of software-based applications. Today, most software updates fix security bugs and provide only little additional functionality. At the same time, the number of embedded electronic devices including at least one processor is increasing.

The awareness of security in digital systems has led to the investigation of secure communication standards, for example SSL (Secure Socket Layer) [FKK96], the implementation of cryptographic methods, for example AES (Advanced Encryption Standard) [Fed01], better reviews of software code to find vulnerabilities, and the integration of security measures into hardware. Nevertheless, Figure 1.2 shows that the number of vulnerabilities of digital systems has increased rapidly over the last years. The main cause of vulnerability is software errors through which a system may be compromised.


The software of embedded systems moves from monolithic software towards module-based software organized and scheduled by an operating system. By means of modern communication structures like the Internet, the software on embedded systems may be updated, partially or completely. These update mechanisms and the different communication possibilities open the door for software-based attacks on the embedded system. For example, the number of viruses and trojans on mobile phones has increased rapidly over the last years. One main gateway for these attacks is buffer overflows. A wrong jump destination or a wrong return address from a subroutine might cause the execution of infiltrated code (see also Section 3.1).

Figure 1.2: Vulnerabilities of digital systems reported to US-CERT between 1995 and 2007 [US-08].

However, hardware errors can also make a system vulnerable. For example, Kaspersky shows that the execution of appropriate instruction sequences on a certain processor can allow an attacker to take control of the system [KC08]. In this case, it does not matter which operating system or software security programs are running on the system.

A common objective of attackers is the sensitive data stored inside a digital system. To reach this objective, attackers are not limited to software attacks. Hardware attacks, where the digital system is physically penetrated to gather information about the security facilities or to extract sensitive information, are also practical.


Figure 1.3: On the left side, the percentage of unlicensed software usage is shown for different areas of the world. On the right side, the corresponding losses in million US dollars are depicted [All07].

If an embedded device stores secret data, like a cryptographic key, attackers may try to read out this data by physically manipulating the processor on the embedded device. This may be done by differential fault analysis (DFA) [BS97] or by specific local manipulation of control registers inside the processor (see also Section 3.2). The attacker's goal thereby is to execute infiltrated code or to deactivate the protection of the secured data, which may result from the manipulation of the program counter.

Another relevant security aspect in embedded systems is intellectual property protection (IPP). Due to shorter design cycles, many products can only be developed with acquired hardware cores or software modules. The companies selling these cores and modules naturally have a high interest in securing their products against unlicensed usage. Figure 1.3 shows the estimated percentage of unlicensed software used in different areas of the world, together with the calculated revenue losses. Additionally, many unlicensed hardware IP cores are used in products. At the RSA conference in 1998, it was estimated that the losses due to the usage of unlicensed IP cores approach 1 billion US$ per day [All00].

1.2 FPGAs

FPGAs (Field Programmable Gate Arrays) have their roots in the area of PLDs (Programmable Logic Devices), such as PLAs (Programmable Logic Arrays) or PALs (Programmable Array Logics). Today, FPGAs have a significant market segment in microelectronics, particularly in the embedded systems area.


The advantages of FPGAs over ASICs are their flexibility, the reduced development costs, and the short implementation time. Also, developers have a limited implementation risk, a) because of the easy possibility to update an erroneous design, and b) because of the awareness that the silicon devices are proven and the underlying technology operates correctly under the specified terms.

The main advantage of FPGAs is their reconfigurability. According to the ITRS [ITR07], the demand for flexibility through reconfigurability will rise from 28% of all functionalities in 2007 to an estimated 68% in the year 2022. Note that the ITRS also takes into account software running on a microprocessor, which can be updated. Furthermore, many FPGA devices support dynamic partial reconfiguration, which means that the design, or a part of it, can be reconfigured during runtime. With this advantage, we can envisage new designs with new and improved possibilities and properties, like an adaptive design which can adapt itself to a new operating environment. Unfortunately, dynamic reconfiguration is currently rarely used due to the lack of mature design tools, which increases the development costs for dynamic reconfiguration. But now, improved design tools for partial reconfiguration are starting to become available, like the ReCoBus-Builder [KBT08, KHT08] or Xilinx PlanAhead [DSG05]. Nevertheless, dynamic reconfiguration for industrial designs is in its infancy, and it will take several years to use all the great features of FPGAs.

In the last years, the main application areas of FPGAs were small-volume embedded systems and rapid prototyping platforms, where ASIC designs can be implemented and verified before the expensive masks are produced. Nevertheless, FPGA usage in higher-volume markets rises, mainly due to lower FPGA prices, higher logic density, and lower power consumption. Furthermore, due to shorter time-to-market cycles and rising ASIC costs, FPGAs are breaking more and more into traditional ASIC domains. On the other hand, FPGAs face competition in the (reconfigurable) DSP domain from multi-core and coarse-grained reconfigurable architectures, as well as from graphics processing units (GPUs), onto which DSP algorithms are adapted to run. Nevertheless, these architectures suffer from a lack of flexibility, and today only FPGA technology is flexible enough to implement a heterogeneous reconfigurable system-on-a-chip.

1.3 ASICs

Besides the advantages and the success of FPGAs, there still exists a huge market for traditional ASICs (Application Specific Integrated Circuits). ASICs are designed for high-volume production, where a small cost per unit is important, as well as for low-power and high-performance applications and designs with a high logic density.

The implementation of a core on an ASIC instead of an FPGA (both in 90 nm technology) may require 40 times less area, may speed up the critical path by a factor between 3 and 4, and may reduce the power consumption by a factor of about 12 [KR06].


Here, we see that the big advantage of ASICs over FPGAs is the higher logic density, which results in significantly lower production costs per unit. The disadvantages of ASICs are the higher development and higher initial production costs (e.g., masks, package design, test development [Kot06]). Therefore, the decision between ASICs and FPGAs with respect to minimizing the total costs depends highly on the production volume. Figure 1.4 shows a comparison of the total costs of ASICs and FPGAs in different technology generations over the production volume. The ASIC graphs start with higher costs due to the high initial production costs, but have a lower slope due to cheap production costs per unit. The initial costs of ASICs increase from technology generation to generation, mainly because of the increasing chip and technology complexity and logic density. FPGA designs have lower initial costs, but higher costs per unit. In summary, the total costs of a design using FPGA technology are lower until a certain production volume is reached. However, according to Xilinx [RBD+01], this point shifts with each technology generation in the direction of higher volumes.

Figure 1.4: This figure from [RBD+01] shows a comparison of the total costs of FPGAs and ASICs in different technology generations over the production volume. With every new technology generation, the break-even point between the total costs of FPGA and ASIC designs is shifted more and more to the ASIC side. As an implication, one may expect the market for FPGAs to grow.

Nevertheless, besides the total cost discussion, there exist many design solutions, especially in the area of embedded systems, which can only be implemented using ASIC technology.


Examples include very low-power designs and high-performance designs.

Before summarizing the major contributions of the thesis with respect to the above topic, a set of definitions is in order.


2 Definitions

In this section, we introduce necessary definitions of terms with respect to the security and reliability of embedded systems that will be used throughout this thesis. First, definitions in the field of dependability and the difference between defects, faults, and errors are outlined. After the categorization of faults and errors, definitions stemming from the area of security attacks are presented. Finally, different types of overhead, which are indispensable for additional security and reliability functions, are described.

2.1 Dependability and its Attributes

The dependability of a system is defined by the IFIP 10.4 Working Group on Dependable Computing and Fault Tolerance as: “... the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers ...” [IFI90]. According to Laprie and others [ALR01], the concept of dependability consists of three parts: the threats to, the attributes of, and the means by which dependability is attained (see Figure 2.1).

The attributes of dependability are a way to assess the trustworthiness of a system. The attributes are: availability, reliability, safety, confidentiality, integrity, and maintainability.

2.1.1 Availability

Availability is considered the readiness for correct service [ALR01]. This means that availability is a measure of the possibility to start a new function or task of the system. Usually, the availability is given as the percentage of time that a system is capable of serving its intended function, and it can be calculated using the following formula:

Availability = (Total Elapsed Time − Down Time) / Total Elapsed Time        (2.1)


Figure 2.1: The relationship between the attributes, threats, and means of dependability [ALR01].

Availability is also often measured in “nines”. Two nines means an availability of 99%, three nines means 99.9%, and so on. Table 2.1 shows the maximum downtime within a year for different availability values.

Availability    Percentage    8-hour day     24-hour day
Two nines       99%           29.22 hours    87.66 hours
Three nines     99.9%         2.922 hours    8.766 hours
Four nines      99.99%        17.53 mins     52.60 mins
Five nines      99.999%       1.753 mins     5.260 mins
Six nines       99.9999%      10.52 secs     31.56 secs

Table 2.1: The maximum annual downtime of a system for different values of availability, running either 8 hours or 24 hours per day [Rag06].
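The entries of Table 2.1 follow directly from Equation (2.1): the maximum annual downtime is (1 − availability) multiplied by the operating time per year. A minimal C sketch that reproduces the table (assuming 365.25 days per year):

#include <stdio.h>

/* Maximum annual downtime in hours for a given availability and
   a system running hours_per_day hours per day (Equation 2.1). */
static double downtime_hours(double availability, double hours_per_day) {
    return (1.0 - availability) * hours_per_day * 365.25;
}

int main(void) {
    /* two nines ... six nines */
    const double levels[] = { 0.99, 0.999, 0.9999, 0.99999, 0.999999 };
    for (int i = 0; i < 5; i++) {
        printf("%9.4f%%   8-hour day: %10.3f h   24-hour day: %10.3f h\n",
               100.0 * levels[i],
               downtime_hours(levels[i], 8.0),
               downtime_hours(levels[i], 24.0));
    }
    return 0;
}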

2.1.2 Reliability

Reliability is defined as the ability of a system or component to perform its required functions under well-defined conditions for a specified time period [Ger91]. Laprie and others describe reliability as the continuity of correct service [ALR01]. Important parameters of reliability are the failure rate and its inverse, the MTTF (mean time to failure).


Other parameters, like the MTBF (mean time between failures), include the time which is necessary to repair the system. The MTBF is the sum of the MTTF and the MTTR (mean time to repair).

2.1.3 Safety

Safety is the attribute of a safe system. This means that the system cannot lead to catastrophic consequences for the users or the environment. Safety is relative; the elimination of all possible risks is usually impossible. Furthermore, the safety of a system cannot be measured directly. It is rather a subjective confidence in the system. Whereas availability and reliability address the avoidance of all failures, safety avoids only the catastrophic failures, which are only a small subset.

2.1.4 Confidentiality

The confidentiality of a system describes the absence of unauthorized disclosure of information. The International Organization for Standardization (ISO) defines confidentiality as “ensuring that information is accessible only to those authorized to have access” [ISO05]. In many embedded systems (e.g., cryptographic systems), it is very important to secure the information stored inside the system (e.g., the secret key) against unauthorized access. But the prevention of unlicensed usage of software programs or hardware cores is also a topic of confidentiality. Confidentiality is, like safety, subjective and cannot be measured directly.

2.1.5 Integrity

Integrity is the absence of improper system state alteration. This alteration can be unauthorized access which alters system information that is necessary for the correctness of the system. Furthermore, the system state alteration can also be a damage or modification of the system. System integrity assures that no part of the system (software or hardware) can be altered without the necessary privileges. Also, IP core verification, which ensures the correct creator and the absence of unauthorized supplementary changes, can elevate the integrity of a system. Integrity is a precondition for availability, reliability, and safety [ALR01].

2.1.6 Maintainability

Maintainability is the ability to undergo repairs and modifications. This can be done to repair errors, meet new requirements, make further maintenance easier, or cope with a changed requirement or environment. A system with a high maintainability may have good documentation and a modular structure, be parameterizable, use assertions, and implement built-in self-tests.


2.1.7 Security

Security is defined as a combination of the attributes (1) confidentiality (the prevention of the unauthorized disclosure of information), (2) integrity (the prevention of the unauthorized amendment or deletion of information), and (3) availability (the prevention of the unauthorized withholding of information) [ITS91]. An alternative definition of security could be the absence of unauthorized access to the system state [ALR01]. The prevention or detection of the usage of unlicensed software or IP cores can also be seen as a security aspect (confidentiality), as can the prevention of the unauthorized alteration of software or IP cores (integrity). Like safety, security shall prevent only the class of failures which are caused by unauthorized access or unauthorized handling of information.

2.2 Fault, Error, Failure

Faults, errors, and failures are the threats which affect the dependability (see Figure 2.1).

2.2.1 Failure

A system is typically composed of different components. Each component can be further subdivided into other components. All of these system components may have internal states. If a system delivers its intended function, then the system is working correctly. The intended function of a system can be described as an input/output or interface specification which defines the behavior of the system on the system boundaries with its users or other systems.

The system interface specification may not be complete. For example, it may be specified that an event occurs on the output of the system, but the exact time at which this event occurs is not specified. In this case, the system behavior can vary without violating the specification. If the specification is violated, the system fails. A failure is an event which occurs when the system deviates from its interface specification (see Figure 2.2).

2.2.2 Errors

If the internal state of a component deviates from the specification (the specification of the states of the component), the component is erroneous and an error occurs. An error is an unintended internal state, whereas a failure is an unintended interface behavior of the system. An error may lead to a failure. But it is also possible that an error occurs and does not lead to a system failure, because the component is currently not used or the error is detected and corrected fast enough.


Figure 2.2: Faults may lead to an error, which in turn may lead to a system failure.

Errors can be transient or permanent. Transient errors caused by transient faults usually occur in systems without feedback. In systems with feedback, an error might become permanent by affecting all following states. In this case, the error only disappears through a reset or a shutdown of the system.

2.2.3 Faults

A fault is defined as a defect that has the potential to cause an error [Jal94]. All errors are caused by faults, but a fault may not lead to an error. In the latter case, the fault is masked out and has no impact on the system.

For example, consider the control path of a processor core. A fault like a single event transient fault, caused by an alpha particle impact, occurs on one signal of the program counter between two pipeline stages. If the time of occurrence is near the rising active clock edge, an error may occur. Otherwise, if the time of occurrence is far away from the rising edge of the clock, the fault does not lead to an error. The erroneous program counter value can then lead to a system failure if the wrong subroutine is executed and the interface behavior differs from the specification. Otherwise, if an error detection technique, like a control flow checker, as introduced later in Section 4.7.3, is used, the error can be detected after the fault appears, and the error may be corrected by a re-execution of the corresponding instruction. However, this additional re-execution needs several clock cycles to restore the error-free state. For real-time systems with very critical timing requirements, the possible output events might be too late, and the system thus might still fail.


2.3 Means to Attain Dependability

Means are ways to increase the dependability of a system. There exist four means, namely fault prevention, fault tolerance, fault removal, and fault forecasting.

2.3.1 Fault Prevention

Fault prevention deals with the question of how the occurrence or introduction of faults can be prevented. Design faults might be prevented with quality control techniques during the development and manufacturing of the software and hardware of a system. Fault prevention is furthermore closely related to maintainability. Transient faults, like single event effects, might be reduced by shielding, radiation hardening, or larger structure sizes. Attacks might be prevented by security measures, like firewalls or user authentication. To prevent the usage of unlicensed programs or IP cores, the code (source, binary, or netlist code) can be delivered encrypted so that only the authorized customer has the right cryptographic key to decrypt the code. To prevent the key from being passed on, techniques like dongles or authentication with a MAC address can be used.

2.3.2 Fault Tolerance

A fault-tolerant system does not fail, even if an erroneous state is reached. Fault tolerance enables a system to continue operation in the case of one or more errors. This is usually implemented by error detection and system recovery to an error-free state. In a fault-tolerant system, errors may occur, but they must be handled correctly to prevent a system failure.

The first step towards a fault-tolerant system is error detection. Error detection can be subdivided into two classes: concurrent error detection and preemptive error detection [ALR01]. Concurrent error detection takes place at runtime during the service delivery, whereas preemptive error detection runs in phases where the service delivery is suspended. Examples of concurrent error detection are error codes (e.g., parity or CRC), control flow checking, or razor flip-flops [ABMF04].
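As an illustration of concurrent error detection with error codes, the following minimal C sketch (the function names are hypothetical) protects a 32-bit data word with an even parity bit that is recomputed and compared on every read:

#include <stdbool.h>
#include <stdint.h>

/* Even parity bit of a 32-bit word: the XOR of all its bits. */
static bool parity32(uint32_t w) {
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return (bool)(w & 1u);
}

/* Concurrent check: recompute the parity and compare it with the
   stored bit. A single flipped bit in the word (or in the stored
   parity bit itself) is detected, but neither located nor corrected. */
bool parity_check(uint32_t word, bool stored_parity) {
    return parity32(word) == stored_parity;
}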

Redundancy also belongs to this class of error detection. One may distinguish three types of redundancy: hardware, time, and information redundancy. To detect errors with hardware redundancy, we need at least two units whose results are finally compared. If they differ, an error has occurred. With time redundancy, the system processes the same inputs twice, and both results are compared after the second execution. Information redundancy uses additional information to detect errors (e.g., parity bits).
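A minimal C sketch of time redundancy, with compute() as a hypothetical placeholder for the protected operation: the same input is processed twice and both results are compared after the second execution.

#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the operation to be protected. */
static uint32_t compute(uint32_t x) {
    return x * x + 1u;
}

/* Time redundancy: execute the same computation twice on the same
   input and compare both results. A transient error that affects
   only one of the two executions is detected. */
bool compute_with_time_redundancy(uint32_t x, uint32_t *result) {
    uint32_t r1 = compute(x);  /* first execution  */
    uint32_t r2 = compute(x);  /* second execution */
    if (r1 != r2)
        return false;          /* results differ: error detected */
    *result = r1;
    return true;
}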

BISTs (Built-In Self-Tests) or start-up checks belong to the preemptive error detection class.


The next step is the recovery from the erroneous state. Recovery consists of two steps, namely error handling and fault handling. Error handling is usually accomplished by rollback or rollforward. Rollback uses error-free states, which are stored at certain checkpoints, to restore the state of the system to an older error-free state. Rollback entails delaying the operation, which might be a problem in case of real-time applications. Rollforward uses a new error-free state to recover the system.
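A hedged C sketch of rollback with checkpoints; the state structure and the function names are hypothetical simplifications:

#include <string.h>

/* Hypothetical system state: a few registers and a program counter. */
typedef struct {
    int regs[8];
    unsigned pc;
} state_t;

static state_t checkpoint;  /* last stored error-free state */

/* Store an error-free state at a checkpoint. */
void take_checkpoint(const state_t *s) {
    checkpoint = *s;
}

/* Rollback: restore the older error-free state after an error has
   been detected. The work done since the checkpoint must then be
   re-executed, which delays the operation. */
void rollback(state_t *s) {
    *s = checkpoint;
}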

If the cause of the error is a permanent or periodically occurring fault, we need fault handling to prevent the system from running into the same error state repeatedly. This is usually done by fault diagnosis, fault isolation, system reconfiguration, and system reinitialization. For example, in the case of permanent errors in memory structures, the faulty memory column is identified and switched over to a reserved spare column. After the switch-over, the column content must be reinitialized.

It is important to notice that fault tolerance is a recursive concept. The techniques and methods which provide fault tolerance should obviously themselves be resistant against faults. This can, for example, be done by means of replication.

2.3.3 Fault Removal

Fault removal might be performed during the development phase and during the operational runtime. In the development phase, fault removal consists of the following steps: verification, diagnostics, and correction [ALR01]. This is usually done by debugging and/or simulation of software and hardware. For the verification of fault-tolerant systems, fault injection can be used.

Fault removal during the operational runtime is usually done by maintenance. Here, faults can be removed by software updates or by the replacement of faulty system parts.

2.3.4 Fault Forecasting

Fault forecasting predicts possible faults in order to prevent or avoid them, or to decrease their effects. This can be done by performing an evaluation of the system behavior with respect to fault occurrence and effects. Modeling and simulation of the system and its faults are a common way to achieve this evaluation.

2.4 Security Flaws and Attacks

Faults affecting the security of a system are also called flaws [LBMC94]. In this work, the term flaw is used as a synonym for a fault which leads to a degradation of the security of a system. A flaw is therefore a weakness of the system which could be exploited to alter the system state (error).


A threat is a potential event which might lead to this alteration and therefore to a security failure. The process of exploiting a flaw by a threat is called an attack (see Figure 2.3). A security failure occurs when a security goal is violated. The main security goals are the dependability attributes integrity, availability, and confidentiality. The difference between a flaw and a threat is that a flaw is a system characteristic, whereas a threat is an external event.

Figure 2.3: Flaws are security faults, which lead to errors if they are exploited by attacks. The state alteration in case of an attack may lead to a security failure.

A flaw can be intentional or inadvertent. Intentional flaws can further be malicious or non-malicious. An intentional malicious flaw is, for example, a trojan horse [And72]. An intentional non-malicious flaw could be a communication path in a computer system which is not intended as such by the system designer [LBMC94]. An inadvertent flaw could be, for example, a bug in a software program which enables unauthorized persons to read protected data via specific attacks.

Like other faults, flaws can also be further categorized by their origin and persistence. The origin can lie in the development phase (e.g., a developer implements a back door into the system), or in operation or maintenance. Usually, flaws exist for a longer period of time (e.g., from when the flaw arises until it is removed by a security update). But there also exist special flaws which only appear in certain situations (e.g., the year 2000 problem: switching from the year 1999 to 2000).

Attacks can be classified by the security goal or objective of the attack into integrity, availability, and confidentiality attacks. Integrity attacks break into the system and change a part or the whole of the system (software or hardware) or of its data. The goal of availability attacks is to make a part or the whole system unavailable for user requests.


Finally, the goal of confidentiality attacks is to gather sensitive or protected data from the system.

Furthermore, if an attack is successful, new flaws can be generated as a result of the attack. For example, a flaw in software is exploited by a code injection attack (see Section 3), and the integrity of the system is violated by adding a malicious software routine. This software routine now opens intentional malicious flaws, which can be used by confidentiality attacks to gather sensitive data.

Describing all attacks using this terminology is not easy. Consider, for example, a copyright infringement where someone copies an IP core without authorization. The result of the attack is a loss of confidentiality. Here, the sensitive data is the core itself. The erroneous state is the unauthorized copy of the IP core. But what is the flaw which makes the attack possible? Here, we must assume that the ability to easily copy an IP core is the flaw. This example teaches us that flaws exist even in reasonably secure systems and cannot be easily removed. In every system, we must deal with flaws which might affect security, as well as with other faults which might affect the other areas of dependability.

2.5 Overhead

Methods for increasing security and reliability in embedded systems often have the drawback of additional overhead. To evaluate the additional costs of these methods, we can use the following criteria:

• Area overhead (hardware cost overhead),

• Memory overhead, and

• Execution time overhead (CPU time).

Analysis and quantification of the additional costs significantly depend on the given architecture and the implementation on a specific target technology.

2.5.1 Area Overhead

The straightforward method for measuring the area overhead of an additional core is to measure the chip area occupied by the core. Unfortunately, this method can only compare cores implemented in the same technology with exactly the same process (lateral dimensions). To become independent of the process, the transistor count may be used. However, information about the area of the signal routing is not included here. In most technologies and processes, signal routing requires little additional area because the routing layers are above the transistors (in the third dimension). This, however, also depends on the specific process.


The number of transistors, however, is a reasonable complexity indicator only if the compared cores use the same technology (e.g., CMOS, bipolar). To compare the hardware costs of a core mapped onto different technologies, the gate count can be used. Here, the number of logical (two-input) gates is used to describe the hardware costs. For comparing cores between ASIC and FPGA technologies, the count of primitive components or operations, like n-bit adders or multipliers, can be used.

Using primitive components or operations for the description of the overhead, one is independent of the underlying technology and process. Describing hardware costs at a higher hierarchy level, like primitive components or operations, is however less accurate with respect to the real hardware costs than describing the overhead in chip area. The resulting chip area of the used primitive components depends highly on the technology and process, and also on the knowledge of the chip designer and the quality of the tools.

2.5.2 Memory Overhead

The memory overhead of methods for increasing security and reliability can be measured by counting the additional RAM bits used by the corresponding method. Memories embedded on the chip, so-called internal memories, use hardware resources on the chip and thus contribute to the area overhead. Nevertheless, the content of memories can be shifted relatively easily to a cheaper external memory, for example an off-chip system DRAM. Therefore, we decided to handle the memory overhead separately. It must be taken into account that internal memory has higher hardware costs at the same size, but a lower latency. External memory is usually cheaper, but it involves additional hardware costs on the chip, for example a DRAM controller. If a DRAM with the corresponding controller already exists on the chip, these resources might be shared to reduce the hardware costs.

2.5.3 Execution Time Overhead

Some methods for increasing security and reliability introduce additional latency. This means that the result of the protected core or software appears later on the outputs than that of the unprotected one. For hardware cores, latency is usually counted in additional clock cycles. For software programs, latency can be expressed in additional instructions which must be executed by the processor, or in additional execution time of the processor. For example, some existing methods for control flow checking [GRRV03] generate additional instructions that are inserted into the original program running on the monitored processor. This might cause a timing impact on the user program, which can be measured as additional execution time of the processor. The execution time depends on the processor and the number of additional instructions executed.


Also, if no additional software is executed on the processor and the processor is enhanced with additional hardware, some methods can stall [ZT09] the processor pipeline, slow down the execution of the user program, or insert additional pipeline steps [Aus99] without executing additional instructions.

For processor architectures, the execution time overhead can be measured by counting the additional pipeline steps. If the processor architecture executes one instruction in one pipeline step (in the best case, one clock cycle), the number of additionally executed instructions is also given by the number of additional pipeline steps.

2.6 IP Cores and Design Flow

The reuse of IP cores is an important step to decrease the productivity gap, which emerges from the rapid increase of chip complexity and the slower growth of design productivity. Today, there is a huge market and repertoire of IP cores, which can be seen on special aggregation websites, for example [Reu] and [Est], which administer IP core catalogs.

The delivery format of IP cores is closely related to the design flow. The design flow consists of different design flow or abstraction levels which transfer the description of the core from the level where the core is specified to the final implementation. The design flow depends on the target technology. The FPGA and the ASIC design flow look similar; however, there exist differences at some steps.

Figure 2.4 shows a general design flow for electronic designs with FPGA and ASIC target technologies. This design flow view can be embedded into a higher system model view for hardware/software co-design, for example the double roof model introduced by Teich [TH07]. The depicted design flow implements the logic synthesis and the following steps in the double roof model. Furthermore, the different abstraction levels are derived from the Y-diagram, introduced by Gajski [GDWL92].

The different abstraction levels are the register-transfer level, the logic level, and the device level. Designs specified at the register-transfer level (RTL) are usually described in hardware description languages (HDLs) like VHDL or Verilog, whereas designs at the logic level are usually represented as netlists, for example, Electronic Design Interchange Format (EDIF) netlists. At the device level, FPGA designs are implemented in bitfiles, while ASIC designs are usually represented by their chip layout description. The transformation of an HDL model into a netlist model is called logic synthesis, whereas the transformation of a netlist model into a target-dependent circuit is called implementation. The implementation consists of the aggregation of the different netlist cores and a subsequent place-and-route step. The technology mapping can be done in the synthesis or in the implementation step, or in both.


Figure 2.4: A general design flow for FPGA and ASIC designs with the synthesis and implementation steps and the different abstraction levels.

For example, the Xilinx FPGA design flow maps the logic to device-dependent primitive cells (LUT2, FF, etc.) in the synthesis step, whereas the mapping of these primitive cells to slices and configurable logic blocks (CLBs) is done in the implementation step [Xilb].

IP cores can be delivered at all abstraction levels in the corresponding format: at the RTL as VHDL or Verilog code, at the logic level as an EDIF netlist, or at the device level as mask files for the ASIC flow or as FPGA-dependent (partial) bitfiles.

IP cores can be further categorized into soft and hard cores. Hard cores are ready to use and are offered as a target-dependent layout or bitfile. All IP cores which are delivered in an HDL or netlist format belong to the soft cores. These cores need the further transformations of the design flow to be usable. The advantages of soft cores are their flexibility for different target technologies and their parameterizability. However, the timing and the area overhead are less predictable compared to hard cores, due to the impact of the needed design tools. Analog or mixed-signal IP cores are usually delivered as hard cores.


3 Attacks on Embedded Systems

There exist two ways to categorize attacks. The first is to categorize attacks by the violated security goals. The other is to describe how the attack is realized and which way the attacker chooses to compromise the system [Rag06, RRKH04].

Using the first categorization scheme, the main security goals are integrity, availability, and confidentiality (see Figure 3.1 above, and Section 2.4). Attacks which compromise integrity can be further subdivided into manipulation of data, manipulation of software or IP cores, and forging of authorship. Attacks which may paralyze a system compromise availability. Attacks compromising the confidentiality of a system can be subdivided into gathering sensitive data, like passwords, keys, program code, or IP cores, and gaining access control to a system. Additionally, copyright infringement compromises the confidentiality of the author of the core.

The means used to launch attacks, or the ways in which an attack is realized, can be categorized into invasive and non-invasive attacks (see Figure 3.1 below). Both groups can further be subdivided into logical and physical attacks [RRKH04]. Physical attacks typically require relatively expensive equipment and infrastructure, especially invasive physical attacks, whereas for logical attacks, usually only a computer or the embedded system itself is needed.

3.1 Code Injection Attacks

Code injection attacks are attacks in which the code integrity of a system is compromised. This can be the integrity of software as well as the integrity of executed bitfiles in a reconfigurable system, such as an FPGA. The goals of code injection attacks are manifold: compromising the integrity of further program code or sensitive data, paralyzing the system, and gaining access to sensitive data are in the foreground for the attacker.

Code injection attacks bring the system under the control of the attacker. Programs inserted by the attacker may easily read or alter the sensitive data and forward the data to interfaces where it can be collected.

To gain control over a system, the attacker must first insert a routine, which performs the intended behavior, into memory.


Figure 3.1: Attacks can be categorized by the compromised security goals or the attack goals (above) and by the means used to launch the attack (below). The different means of attack can violate different security goals.


This routine may, for example, read out secured data, deactivate security protection, open gateways for the attacker, or load further infiltrated code from the Internet. The malicious code can be hidden inside processed input data which is loaded into the memory by the processor. The second step is bringing the processor into a state in which it executes the inserted attacker's code. This can be done by manipulating the program flow.

One way to achieve this is to utilize buffer overflows for smashing stacks. Most programs are structured into subroutines with their own local variables. These variables, as well as the arguments and the return address, are stored in memory segments called stacks. The return address is usually the first element on the stack, and the local variables are concatenated below it. Normally, as in the C programming language, the contents of array variables are written from the bottom to the top, and if the range is not checked, the return address can be overwritten (see Figure 3.2). The attacker can manipulate the input data in such a way that the return address is overwritten with the address of his malicious code. On the return, the malicious code is executed [Ale96, PB04]. Another possibility is to overwrite the frame pointer address instead of the return address [klo99].

Figure 3.2: On the left side, a part of the program memory is shown. Normally, the subroutine is called and, after its execution, the program counter jumps back to the main program after the call instruction. However, if the return address on the stack is overwritten by a buffer overflow of the vector a[] (see right side), the erroneous return destination may become the entry point of the malicious code (dashed line).
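A minimal C example of this vulnerability pattern (the function name and buffer size are hypothetical): the unchecked strcpy() lets input longer than the array overwrite the saved frame pointer and the return address on the stack.

#include <string.h>

/* Vulnerable subroutine: a[] is a local variable on the stack, and
   strcpy() performs no bounds check, so input longer than 16 bytes
   overwrites adjacent stack contents, including the return address. */
void subroutine(const char *input) {
    char a[16];
    strcpy(a, input);  /* classic stack smashing entry point */
}

int main(int argc, char **argv) {
    if (argc > 1)
        subroutine(argv[1]);  /* attacker-controlled input */
    return 0;
}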

Heap-based buffer overflows are another class of code injection attacks. The memory heap is the dynamically allocated memory, managed in C by malloc() and free(), and in C++ by new and delete.


The heap consists of many memory blocks which are usually chained together in a doubly linked list. These memory blocks are managed by the dynamic memory allocator, which can insert, unlink, or merge blocks. The information of the linked list (pointers to the previous and next blocks) is stored in a header for each block.

A heap-based buffer overflow may overwrite this header information in such a way that one pointer of the doubly linked list points to the stack segment just before the return address [Rag06]. If this block is then freed by the dynamic memory allocator, the header information of the previous and next blocks is updated. Because one pointer points to the stack segment due to the attack, the stack is updated in such a way that the return address is overwritten with the address of a heap block, which can then be executed after the control flow reaches a return [Rag06, PB04]. There exist many other possibilities to utilize heap-based buffer overflows [Con99, Dob03].
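The write primitive behind this attack can be sketched in C as follows; this is a simplified model of an older dlmalloc-style allocator, not any particular allocator's real code:

#include <stddef.h>

/* Simplified header of a free chunk in a dlmalloc-style allocator. */
typedef struct chunk {
    struct chunk *fd;  /* next free chunk     */
    struct chunk *bk;  /* previous free chunk */
} chunk;

/* Unlinking a chunk from the doubly linked free list performs two
   writes through pointers taken from the chunk header. If a heap
   overflow has set p->fd and p->bk, the first write below stores the
   attacker-chosen value p->bk at the attacker-chosen address
   p->fd + offsetof(chunk, bk): an arbitrary memory write, e.g., onto
   a saved return address on the stack. */
static void unlink_chunk(chunk *p) {
    chunk *FD = p->fd;
    chunk *BK = p->bk;
    FD->bk = BK;
    BK->fd = FD;
}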

Arc injection, or return-into-libc, is an attack in which a system function is exploited to spawn a new process which performs the attacker's desired operations. The name arc injection comes from inserting a new arc (control flow transfer) into the control flow graph (CFG) of a program. In the standard C library (libc on UNIX-based systems), there exists a function called system(). This function validates a process call given as its argument and, after successful validation, starts its execution as a new thread. The memory location of this standard function is usually known, and therefore also the starting address to bypass the validation of the argument. The return pointer on the stack can now be manipulated using a stack-based buffer overflow to jump to the desired destination in the system function and execute a malicious process. The name of the malicious process can be transferred to the system function by using registers [PB04]. This attack is useful if the architecture or operating system prevents the execution of the stack or heap memory areas.
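For a 32-bit x86 Linux target with the cdecl calling convention, the overflowing input of a return-into-libc attack is typically laid out as sketched below; the offset and all addresses are hypothetical placeholders:

#include <stdint.h>
#include <string.h>

#define BUF_TO_RET 28  /* assumed distance from the start of the
                          buffer to the saved return address */

/* Build the overflowing input: padding, then the address of libc's
   system() in place of the return address, then a fake return
   address for system() itself, then the address of a "/bin/sh"
   string as the argument that system() will find on the stack. */
void build_payload(unsigned char *p,
                   uint32_t system_addr,  /* address of system()      */
                   uint32_t fake_ret,     /* where system() "returns" */
                   uint32_t binsh_addr)   /* address of "/bin/sh"     */
{
    memset(p, 'A', BUF_TO_RET);
    memcpy(p + BUF_TO_RET,     &system_addr, 4);
    memcpy(p + BUF_TO_RET + 4, &fake_ret,    4);
    memcpy(p + BUF_TO_RET + 8, &binsh_addr,  4);
}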

Shacham generalized the return-into-libc attack to show that it is possible to perform malicious computation without injecting malicious code [Sha07]. The idea is that, due to shared libraries, e.g., libc, many analyzable instruction snippets known to the attacker reside in memory. Shacham shows that an attacker can build an arbitrary program from these snippets, which can perform arbitrary computation. This can be done by analyzing, for example, the libc library for code snippets which end with a return instruction. Moreover, Shacham shows that for the x86 architecture, it is possible to use only parts of instructions. The return instruction of the x86 architecture is a one-byte instruction encoded as 0xc3. However, longer instructions may also contain this byte value. By starting the sequence in the middle of an instruction, the original instruction alignment is bypassed, which gives the attacker additional new instruction sequences. From these building blocks, the attacker can build a program by chaining the snippets together via the overwritten return addresses. This so-called return-oriented programming has been successfully transferred to other processor architectures, e.g., SPARC.


In [BRSS08], a compiler is introduced which is able to construct return-oriented exploits from a general-purpose language. In summary, Shacham shows that preventing the injection of code is not sufficient to prevent malicious computation.

Pointer subterfuge is an attack technique where pointer values are changed. There exist four varieties: function pointer clobbering, data pointer manipulation, exception handler hijacking, and virtual pointer smashing [PB04].

Function pointer clobbering modifies a function pointer so that it points to the attacker's malicious code. When the control flow reaches the modified function call, the attacker's function is called and his code is executed.
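A hypothetical C example: if the function pointer lies directly above the buffer in memory, the unchecked copy can overwrite it, and the subsequent indirect call transfers control to the attacker.

#include <string.h>

struct request {
    char buf[16];
    void (*handler)(const char *);  /* may lie directly above buf */
};

void process(struct request *r, const char *input) {
    strcpy(r->buf, input);  /* overflow can clobber r->handler */
    r->handler(r->buf);     /* indirect call through the (possibly
                               attacker-controlled) function pointer */
}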

Data pointer modifications can be used for arbitrary memory writes. This technique can be combined with other code injection attacks to launch complex attacks.

Exception handler hijacking modifies the thread environment block (in MS Windows), which points to the list of registered exception handler functions. Because this list is stored on the stack, the entries can easily be manipulated by utilizing stack-based buffer overflows. This technique can be put to work to transfer the control flow to a malicious function. Within Linux, function pointers in the fnlist can be replaced to achieve a similar effect.

Virtual pointer smashing replaces the virtual function table used in the C++ implementation of virtual functions. The virtual function table is used in C++ at runtime to implement dynamic dispatch. Every C++ object has a virtual pointer which points to the appropriate virtual function table. By modifying the virtual pointer to point to an attacker's virtual function table, malicious functions can be called when the next virtual function is invoked.
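The dispatch mechanism that C++ compilers typically use can be modeled in C as follows (a sketch of the common implementation, not actual compiler output): each object starts with a pointer to a table of function pointers, so overwriting that pointer redirects the next virtual call.

/* Model of a C++ vtable: a table of function pointers. */
struct vtable {
    void (*vfunc)(void *self);
};

/* Model of a C++ object with virtual functions: the virtual pointer
   is commonly stored at the beginning of the object. An overflow in
   adjacent memory can overwrite vptr with the address of an
   attacker-prepared table. */
struct object {
    const struct vtable *vptr;
    char data[16];
};

void call_virtual(struct object *o) {
    o->vptr->vfunc(o);  /* with a smashed vptr, the attacker's
                           function is invoked here */
}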

3.2 Invasive Physical Attacks

Invasive physical attacks physically tamper with the embedded system using special equipment. Trivial physical attacks only compromise the availability of the system or damage a part of or the whole system by physical destruction. Switching off the power supply voltage or cutting wires also belongs to these trivial attacks.

Other invasive physical attacks aim at reading out confidential data or the implementation of IP cores, or at manipulating the circuit or data to get access to sensitive data. These attacks have in common that expensive special equipment is used. The realization of these attacks requires days or weeks in specialized laboratories. The first step is the de-packaging of the circuit chips. This is usually done with special acids [KK99, Hag].

After removing the packaging, the layout of the circuit can be discovered with optical microscopes and cameras. By removing each metal layer, the complete layout of the chip can be captured in a map [KK99]. The gathered information from the layer reconstruction can be used for reverse engineering the circuit, which gives competitors the possibility to optimize their product or to obtain more information about the implementation to launch other attacks.


Further information can be collected by micro-probing the circuit. This can be done by manual micro-probing, where metal probes make electrical contact with signal wires on the chip. This is usually done on a micro-probing workstation with an optical microscope [KK99].

Due to the decreased lateral structure dimensions in today's circuit technologies, manual micro-probing is nearly impossible. But there exist advanced technologies, like ion or electron beams, as well as infrared lasers, which make micro-probing possible also in today's chip manufacturing technologies. With a focused ion beam (FIB), the chip structure can be scanned in a very high resolution. The beam hits the chip surface where electrons and ions are sputtered off and can be detected by a mass spectrometer [DMW98]. With increased beam intensity, the beam can also remove chip material with high resolution (see Figure 3.3). This can be used to cut signal wires or drill holes to reach signal wires in underlying layers. These holes can be filled with platinum to bring the signal to the surface, where it can be easily micro-probed. With an electron beam tester, the activity on the signal wires can be recorded if the clock frequency is drastically reduced (under 100 kHz). Finally, with an infrared laser, the chip can be scanned from the rear, because the silicon substrate is transparent in the infrared wavelength range. With the photon current, internal states of transistors or activity on signal wires can be read out [AK96, Ajl95].

Figure 3.3: A secondary electron image recorded with a focused ion beam (FIB). The FIB has previously interrupted a signal wire [Fra].

These advanced technologies can be used to launch a variety of attacks. The focus is on smart cards with implemented cryptographic algorithms. Most of the time, it is the attacker's goal to read out a secret key. One example is to read out the memory content using bus probing.


The problem here is the generation of the successive addresses to get a linear memory trace. The attacker can bypass the software by using an FIB to destroy or deactivate the transistor gates which are responsible for branches and jumps. The result is a program counter which can only count up linearly, which fits perfectly for this attack [KK99]. Other attacks are reading ROM, reviving and using test modes, ROM overwriting by using a laser cutter, EEPROM overwriting, key retrieval using gate destruction, memory remanence, probing single bus bits, as well as EEPROM alteration [KK99, Hag].

3.3 Non-Invasive Logical Attacks

The non-invasive logical attacks include phishing, authenticity forging, attacks on cryptographic weaknesses, and copying. The goal of phishing is to gather sensitive information, like passwords, credit card numbers, or whole identities. By means of social engineering, such as fake web sites, emails, or instant messages which imitate a trustworthy person, the victim is led to give sensitive data away, believing that the attacker is not a harmful person. Phishing belongs to the authenticity forging attacks. Other authenticity forging attacks are DNS or certificate spoofing (manipulation). The difference to phishing is that systems and not persons are cheated.

Cryptographic attacks exploit weaknesses of cryptographic algorithms, e.g., symmetric ciphers, asymmetric ciphers, or hashing algorithms, as well as protocol stacks. The goals of these attacks are access to sensitive data or to break into a system. More about cryptographic attacks can be found in [FS03] or [RRKH04].

Finally, copying attacks are attacks where sensitive data, like health data and personal data, or works which are protected by copyright, such as music, texts, programs, or IP cores, are copied without authorization. These attacks, especially the gathering of sensitive data and copyright infringement, target the security goal confidentiality.

3.4 Non-Invasive Physical Attacks

Eavesdropping and side-channel attacks belong to the class of non-invasive physical attacks which normally do not impair the system. Eavesdropping is the interception of conversations or data transmissions by unintended recipients. The attacker can gather sensitive information which is transmitted using electronic media, e.g., email, instant messengers, or telephone. Sometimes a combination of eavesdropping and cryptographic weakness attacks is used to monitor sensitive data, for example, sniffing passwords in a WEP (Wired Equivalent Privacy) encrypted WLAN (Wireless Local Area Network).

Information about cryptographic operations in embedded systems can be gathered by side-channel attacks. Usually, the goal is to get the secret key or information about the implementation of the cryptographic algorithm.


Cryptographic embedded systems are particularly vulnerable to these attacks, because the attacker has full control over the power and clock supply lines. The different side-channel attacks are timing analysis, power analysis, electromagnetic analysis, fault analysis, and glitch analysis [Hag, KLMR04, RRKH04].

Timing analysis attacks are based on the correlation of output data timing behavior and internal data values. Kocher [Koc96] showed that it is possible to determine secret keys by analyzing small variations in the execution time of cryptographic computations. Different key bit values cause different execution times, which makes a read out and reconstruction of the key possible.
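
As an illustration of such key-dependent timing (a sketch, not Kocher's original attack target), consider a naive square-and-multiply implementation of modular exponentiation in C; real RSA implementations use big-number arithmetic, but the data-dependent branch is the same:

    #include <stdint.h>

    /* Naive square-and-multiply: computes base^exp mod m. The conditional
     * multiply makes the running time depend on the number and positions
     * of the 1-bits in the secret exponent -- exactly the kind of
     * data-dependent timing a timing-analysis attack exploits. */
    uint64_t modexp(uint64_t base, uint64_t exp, uint64_t m)
    {
        uint64_t result = 1;
        base %= m;
        while (exp > 0) {
            if (exp & 1)                      /* executed only for 1-bits: */
                result = (result * base) % m; /* extra, measurable work    */
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }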

Power analysis attacks are based on the fact that different instructions cause variations in the activities on the signal lines, which result in different power consumption. The power consumption can be easily measured by observing the current or the voltage over a shunt resistor. With simple power analysis (SPA) [KJJ99], the measured power consumption is directly mapped to the different operations of a cryptographic algorithm. With this technique, program parts in a microprocessor, for example DES rounds or RSA operations, can be identified. Because the execution of these program parts depends on key bits, the key bits can be read out. Differential power analysis (DPA) [KJJ99] is an enhanced method which uses statistical analysis, error correction, and correlation techniques to extract exact information about the secret key.

Similar to power analysis techniques, information about the key, data, or the cryptographic algorithm or implementation can be extracted by electromagnetic radiation analysis [AARR03].

During differential fault analysis (DFA) attacks, the system is exposed to harsh environmental conditions, like heat, vibrations, pressure, or radiation, to enforce faults which result in an erroneous output. Comparing this output to the correct one, the analyst gains insight into the implementation of the cryptographic algorithms as well as the secret key. With DFA attacks, it was possible to extract the key from a DES implementation [BS97] as well as from public key algorithm implementations [BDH+98, BS97]. The latter shows that a single bit fault can cause fatal leakage of information which can be used to extract the key. Pellegrini and others show that using fault analysis, where the supply voltage of a processor is lowered, it is possible to reconstruct several bits of a secret key of the RSA cryptographic algorithm [PBA10]. For this attack, they used the RSA implementation of the common OpenSSL cryptographic library and a SPARC Leon3 core, implemented on a Xilinx Virtex-II Pro FPGA. The supply voltage of the FPGA is lowered until sporadic bit errors occur in the calculation of the signature using the FPGA's hardcore multiplier.

Glitch attacks also belong to the class of DFA attacks. Here, additional glitches are inserted into the clock signal to prevent the registering of signal values on the critical path. The simplest attack is to increase the clock frequency. One goal of glitch attacks can be the prevention of branches in a microprocessor program, because the calculation and registering of the new branch target address is a long combinational path on many processor implementations [KK99].



4 Defenses Against Code Injection Attacks

In this section, we show measures against different code injection attacks, as introduced in Section 3.1. A good overview of defenses against code injection attacks is further given in [Rag06] and [Erl07]. The related work in this section is divided into six groups: methods using an additional return stack, software encryption, safe languages, code analyzers, anomaly detection, as well as compiler, library, and operating system support. Control flow checking methods, which combine security and reliability issues, are discussed in Section 4.7.

4.1 Methods using an Additional Return Stack

Almost all code injection attacks manipulate the memory-based return stack. The return stack can be protected by an additional hardware-based return stack. Hardware-based return stacks are usually used for indirect branch prediction [KE91]. Xu and others propose a secure return address stack (SRAS) which redundantly stores a copy of the memory-based stack [XKPI02]. If the return address from the SRAS differs from the processor-calculated address from the memory return stack, an exception is raised which is handled by the operating system to determine whether it stems from a misprediction or from a code injection attack.

Lee and others propose a similar SRAS approach [LKMS04, MKSL03]. Additionally, scenarios are considered where the control flow of a return from subroutine does not come back to the point the function was called from. Lee suggests either to prevent these situations or to introduce additional instructions which manipulate the SRAS to manually resolve such situations.

Ozdoganoglu and others [OVB+06] present another SRAS method called SmashGuard. In some situations, the behavior of the correct control flow differs from the last-in first-out (LIFO) return stack scheme. In these situations, which often result from setjmp or longjmp calls, the control flow mostly returns to a previous call which is deeper in the stack. Ozdoganoglu resolves these situations by searching for the target address of the currently executed return instruction in the complete stack.


Furthermore, Xu and others propose methods to divide the stack into a control and a data stack in [XKPI02]. Inside the control stack, the return addresses and stack pointers are stored, and inside the data stack, variables, e.g., buffers. This approach effectively solves the problem of buffer overflows. To achieve the stack split, Xu presents two different techniques: one modifies the compiler, and the other is a hardware technique which modifies the processor.

4.2 Methods using Address Obfuscation and Software Encryption

Bhatkar and others [BDS03] and Xu and others [XKI03] propose methods for address obfuscation. To exploit buffer overflows and achieve the execution of malicious code, the attacker must know the memory layout. Address obfuscation makes obtaining such information about the memory structure enormously complicated for the attacker. In these methods, the program code is modified so that each time the code is executed, the virtual addresses of the code and data are randomized. These approaches randomize the base address of the stack and heap, the starting address of the dynamically linked library, as well as the location of static data. Also, the order of local and static variables as well as the functions are permuted. For objects which cannot be rearranged, Bhatkar inserts random gaps by padding, e.g., in stack frames.

Shao and others propose hardware-assisted protection against function pointer and buffer smashing attacks [SZHS03, SXZ+04]. The function pointers are XORed with a randomly assigned key for each process which is hard for the attacker to reconstruct. This is a countermeasure against function pointer clobbering (see Section 3.1). Furthermore, Shao introduces a hardware-based boundary checking method to avoid stack smashing. On each memory write, it is checked if the write destination is outside the current stack frame and, if so, an exception is raised.

If the software is loaded encrypted into the memory and decrypted in the fetch stage, code injection attacks are impossible, because the attacker would need to inject his code encrypted. The key for de- and encryption is different for each process, hence it is impossible for the attacker to encrypt his code properly. Injection of unencrypted code produces data garbage after decryption and results in a crash of the process. Barrantes and others propose a method which uses an x86 processor emulator to simulate the decryption in the fetch stage [BAFS05]. The process is encrypted at load time, whereas Kc and others present an approach where the executable is stored encrypted on the hard disk [KKP03]. The proper key is stored in the executable header which is loaded into a special register for decryption. However, the key is also easily extractable by an attacker, which lowers the effectiveness of this approach.


4.3 Safe Languages

Many attacks can be launched due to the inherent security flaws which exist in the C and C++ languages. Programming in C or C++ allows programming close to the hardware and the memory layout, which makes these languages very flexible. However, the programmer must consider many facts to produce invulnerable code. Safe languages, like Java, avoid some of the implementation vulnerabilities which are discussed in Section 3.1. Nevertheless, C and C++ are the preferred languages for low-level and even for high-level programming, especially in embedded systems, which makes a safe implementation of these languages reasonable.

Cyclone is a C dialect which statically analyzes given C code at compile-time and inserts dynamic checks at places where it cannot ensure that the code is safe [JMG+02, GHJM05]. Cyclone is designed to avoid buffer overflows, format string attacks, and memory management errors. However, the C syntax and semantics as well as the capability of low-level programming are preserved. Insecure constructs are refused by the compiler until more information is provided to make these constructs secure. However, Cyclone programs need existing libraries, like the GNU C library libc, which are usually compiled with a standard C compiler [Rag06]. To secure library functions, the libraries should also be compiled with Cyclone.

Another approach is CCured, which is a source-to-source translator for C [NCH+05]. The used techniques are similar to Cyclone: static analysis, plus dynamic checks at those points where static analysis is not possible. CCured uses pointer and type analysis to make casts secure. However, these techniques make CCured programs incompatible with existing libraries. This obstacle can be solved by introducing library wrappers.

4.4 Code Analyzers

Code analyzers can be categorized into two groups: static code analyzers and dynamic code analyzers. Static code analyzer approaches check either the source code or the compiled object code for vulnerabilities without executing the program. It is impossible to detect all buffer overflows statically. Therefore, these tools use heuristics whose detection rates are never complete. The term analyzer covers a wide range of automatic tools, from those which only consider the behavior of simple code statements to those which consider the complete source code. Some static code analyzer approaches need annotations to the source code whereas other approaches need no annotations.

For example, an annotated static code analyzer is Splint [EL02]. It is a lightweight tool which uses annotations to check properties of objects, e.g., the range of a variable. Dor and others introduce the C String Static Verifier (CSSV) which is able to detect string manipulation errors [DRS03]. This tool also uses annotations that have pre-, post-, and side-effect conditions.


Furthermore, it analyzes pointer interaction and performs integer analysis. A non-annotated static code analyzer for the detection of buffer overflows is described by Wagner and others in [WFBA00]. The analysis is done by formulating buffer overflows as an integer range problem. Other non-annotated analyzer approaches are PREfix [BPS00] and its extension PREfast [LBD+04]. These methods build an execution model of the analyzed code which includes all possible execution paths of the program.

Other static approaches use lexical analysis, which can be implemented as an editor extension. The written source code is compared to database entries of vulnerable code snippets. If such an entry is found, the tool might examine it further and report the security impact. Approaches using such lexical analysis are ITS4 [VBKM00] and Flawfinder [Whe].

Dynamic code analyzers add further information to the source code and perform test runs in order to detect vulnerabilities. However, not all vulnerabilities might be detected, because the used input stimuli for the test runs might not cover all situations. Purify [HJ92] is a tool which tests software to detect memory errors like uninitialized memory accesses, buffer overflows, or improper freeing of memory, as well as memory leaks. The tool is commercially available from IBM, known as Rational Purify [IBM]. Haugh and Bishop introduce a dynamic buffer overflow detection tool for C programs, called STOBO [HB03]. This tool instruments program code in order to keep track of memory buffers and checks function arguments. If a buffer overflow occurs in a test run, a warning is printed. Ghosh and O'Conner present the Fault Injection Security Tool (FIST) in [GO98]. This tool injects malicious strings into buffers and observes the application response to detect vulnerabilities.

4.5 Anomaly Detection

Anomaly detection refers to methods which compare the actual application behavior to a specified application profile. Any deviation from the profile will raise an exception which triggers further measures. The application profile can be user-specified or learned from past executions of the program. A disadvantage of these methods is the high false-positive rate due to the identification of any unusual behavior as an attack.

Hofmeyr and others and Forrest and others propose a method for monitoring system calls of UNIX processes [HFS98, FHSL96]. If the system call pattern deviates from the previously recorded pattern, subsequent actions like program termination can be taken. A similar approach is presented by Wagner and Dean [WD01]. The occurrence of system calls is also monitored and checked against a system call model. The model is built statically from the control flow graph of a program, where the control flow graph is transformed into a system call graph, which models the sequence of the occurrence of system calls. Sekar and others also use a system call model for anomaly detection [SBDB01]. The model, however, is generated dynamically with system call recording in a learning phase.


Furthermore, techniques using sliding windows which analyze the system calls inside the window are presented by Forrest and others [FHSL96] and Wagner and others [WD01]. Forrest uses a dynamic learning phase whereas Wagner uses static information derived from the control flow graph.

Feng and others use, besides the system call information, the return addresses from the stack for anomaly detection [FKF+03]. Like in the other methods, the checks are done on system calls. During a learning phase, so-called virtual paths are recorded. A virtual path is built from all return addresses, gathered from the stack, on a system call. These return addresses correspond to all unreturned functions. During the execution of the detection phase, the virtual path is checked on every system call to detect anomalous behavior.

Zhang and others present a hardware approach for detecting anomalies in the program behavior [ZZPL04]. In this approach, the detection is done at the control flow instruction level, which has a finer granularity than the other system call-based approaches. Jump and branch information, like target addresses or favored conditional branch decisions, is stored additionally in the system memory. Fast memory access is assured through common cache structures of the processor. This method has some similarities with our method which will be introduced in Section 4.7.3. However, this approach does not store control flow graphs; rather, each branch or jump is separately looked up using a content addressable memory (CAM). Hereby, a hash of the branch or jump address is calculated which acts as index for the branch table. Although this approach needs more memory and has a higher latency than our approach, there is no need for synchronization with the control flow of the executed program. The aim of the method is to recognize attacks by detecting anomalous behavior. Therefore, during a learning phase, the decisions of conditional branches are recorded and stored in the branch table. If the recorded decisions differ from the control flow behavior in the detection phase, a warning signal is raised, whereas if the control flow diverges from the stored jump and branch information, a threat is signaled. The approach was extended with an anomalous path detection which compares sequences of branch decisions of the executed program with the decisions recorded in the learning phase [ZZPL05]. In the second approach, general indirect jumps (not returns from subroutine) are considered as well.

4.6 Compiler, Library, and Operating System Support

In this section, we discuss countermeasures against code injection attacks through enhancements of compilers, libraries, or the operating system.


4.6.1 Compiler Support

Compilers are the most convenient place to insert countermeasures against code injection attacks without changing the programming language. Most attacks exploit buffer overflows to overwrite stack-based content. Therefore, many approaches propose stack frame protection measures. In the stack, mainly the return address and frame pointers are in the focus of attackers. Other items which can be protected with security-enhanced compiler support are pointers in the program code. Buffer overflows occur if more data is written to a buffer than its capacity can hold (see Section 3.1). Therefore, boundary check methods are also in the focus of this section.

There exist many different methods to protect the return address inside the stack, for example StackGuard [CPM+98, CBD+99], Stack Shield [Ven00], or Return Address Defender (RAD) [CH01]. StackGuard places a so-called canary word1 between the return address and the local variables inside the stack. Before executing the return instruction, the canary word is checked to verify whether it is intact. When exploiting buffer overflows for return address alteration, the canary word is also overwritten, which can be detected. Stack Shield uses a redundant return address which is copied into the data segment at the beginning of the function. Before leaving the function with the return jump, the return address is compared to the copy. If the addresses differ, the program will be terminated. A similar technique is used by RAD. However, the redundant copy of the return address is stored in an array in the data segment which is called return address repository (RAR). The RAR is further protected by so-called mine zones or read-only techniques. Mine zones are read-only array boundaries which protect the RAR from buffer overflow attacks. All these return address protection methods can be applied as a compiler patch for the gcc compiler. However, attacks described in [BK00] and [Ric02] are able to cancel the Stack Shield and StackGuard protection. Foremost, StackGuard is vulnerable if the attacker uses a pointer which directly points to the return address, which allows the return address alteration without destroying the canary word.
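
The canary principle can be sketched in C as follows; real StackGuard protection is emitted by the compiler in the function prologue and epilogue, so this source-level model, including the canary value and variable placement, is only a conceptual assumption:

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative canary value; real implementations use a random or
     * terminator canary chosen at program start. */
    static unsigned long __stack_chk_value = 0xdeadc0deUL;

    void protected_function(const char *input)
    {
        unsigned long canary = __stack_chk_value; /* prologue: place canary */
        char buf[64];

        strcpy(buf, input);  /* an overflow toward the return address must
                                first overwrite 'canary' (conceptually;
                                actual stack layout is compiler-defined) */

        if (canary != __stack_chk_value)          /* epilogue: verify it */
            abort();         /* smashing detected: terminate before return */
    }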

Cowan and others introduce a compiler extension for pointer protection, known as PointGuard [CBJW03]. The technique protects pointers by encrypting them while they are in the memory. Additional en- and decryption operations are inserted into pointer read and write sequences at compile-time. For example, on accessing a pointer, the pointer is decrypted into a processor register which is safe against malicious overwriting. If a pointer is altered by overwriting the memory during a buffer overflow attack, the decrypted result points to a different location, which prevents the access of the malicious code.
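
A minimal sketch of this pointer encryption idea in C, assuming a hypothetical per-process key and helper names (PointGuard itself performs these steps in compiler-generated code):

    #include <stdint.h>

    static uintptr_t pg_key;   /* assumed: set once per process from a RNG */

    static inline void *pg_encrypt(void *p) {
        return (void *)((uintptr_t)p ^ pg_key);  /* before storing to memory */
    }
    static inline void *pg_decrypt(void *p) {
        return (void *)((uintptr_t)p ^ pg_key);  /* after loading from memory */
    }
    /* If an attacker overwrites the stored (encrypted) pointer with a raw
     * address, decryption garbles it, so the dereference will not land on
     * the attacker's chosen location. */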

1The term canary word corresponds to the miner's canary which was used in coal mines as an early warning system. If there were toxic gases in the mine, the birds died before the miners were affected. Canaries sing a lot, which made them very suitable for a visual and audible warning system. The last canaries in mines were phased out in 1986 in the UK [BBC05].


Lhee and others propose a compiler extension which inserts additional buffer size checks to prevent buffer overflows at runtime [LC02]. The buffer size information is read from a compilation of the program with debugging information. Using this information, additional checks are automatically inserted into the source code.

Erlingsson and others propose a fine-grained software-based memory access control technique called XFI [EAV+06, ABEL09]. This technique enriches the program code to grant access to an arbitrary number of memory regions. Furthermore, the entry and exit points of a program can be controlled using XFI. Budiu and others propose additional instructions to extend the instruction set architecture (ISA) for XFI hardware support [BEA06].

Jones and Kelly propose a method to identify out-of-bound pointers [JK97]. Every result of pointer arithmetic must reference the same object as the original pointer. If not, the pointer is out-of-bounds. Such pointers can be identified dynamically by additional instructions which are included at compile-time and a new object table which is maintained during the execution. If a pointer is out-of-bounds, this pointer value is set to '-2'. The problem of this approach is that out-of-bound accesses are not allowed in ANSI C; however, such pointers are used in many programs. Therefore, Ruwase and Lam extend this approach with an out-of-bound object and call their approach C Range Error Detector (CRED) [RL04]. If a pointer becomes out-of-bounds, it is redirected to a special out-of-bound object which keeps the original pointer value and the referenced data. This approach prevents buffer overflows, because all data written over the bounds of the buffer is automatically redirected to other memory locations managed by the out-of-bound object.

4.6.2 Library Support

Many buffer overflows are caused by mishandling vulnerable standard C library functions. Particularly, string handling functions are vulnerable to buffer overflow attacks. Therefore, the obvious solution is to design safer libraries. Safe string function replacements are strlcpy() and strlcat() [MdR99] and SafeStr [MV05], which are immune to buffer overflows and format string vulnerabilities. FormatGuard is a patch for the glibc library to protect the printf() function from format string vulnerabilities [CBB+01].
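
As a usage sketch: strlcpy() guarantees NUL-termination and reports the length it tried to create, so truncation can be detected. Note that strlcpy() is a BSD extension, not standard C, so on other platforms it may require an extra library:

    #include <stdio.h>
    #include <string.h>   /* strlcpy() is declared here on BSD systems;
                             elsewhere it may come from libbsd */

    void copy_name(char *dst, size_t dstsize, const char *src)
    {
        /* strlcpy never writes more than dstsize bytes and always
         * NUL-terminates; a return value >= dstsize signals truncation. */
        if (strlcpy(dst, src, dstsize) >= dstsize)
            fprintf(stderr, "input truncated\n");
    }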

Baratloo and others introduce two methods against buffer overflows which are completely transparent: libsafe and libverify [BST00]. Both approaches are implemented as dynamic link libraries under the Linux operating system. The library libsafe intercepts all calls to vulnerable functions of the glibc library and substitutes these calls with alternative functions which are not vulnerable to buffer overflow or format string attacks. The library libverify uses binary re-writing of the process memory to verify critical elements of the stack frame before they are used. The verification and protection against buffer overflows is similar to the StackGuard [CPM+98] approach; however, the implementations differ.


Whereas StackGuard is applied during the compilation, libverify embeds the verification code at the start of the process. The advantage is that the code does not have to be recompiled, which makes this approach completely transparent to the user.

Robertson and others [RKMV03] and Krennmair [Kre03] propose countermeasures against heap-based attacks, described in Section 3.1. The allocation and deallocation routines of the standard C library are modified to protect the header of the heap segment. Robertson includes a padding mechanism and a checksum in the header on frame allocation and verifies this information when the segment is to be freed. Krennmair's technique, called ContraPolice, protects the heap pointer in the header of each heap segment with randomly generated canaries, similar to the StackGuard approach for stack-based headers.

4.6.3 Operating System Support

Finally, the operating system can be enhanced to protect programs from code injection attacks. A non-executable stack prevents the execution of malicious code injected into the stack. However, this approach prevents some allowed situations where code is executed in the stack. Examples are functional programming languages which generate code in the stack during runtime, function trampolines for nested functions used by the gcc compiler, or stack-based signal handling which is used by Linux. A patch for a non-executable stack for the Linux operating system was provided in [Des97] which also handles the above mentioned executions by disabling the protection in these situations. However, this approach was defeated by Wojtczuk [Woj98].

Lately, processor vendors have introduced hardware support to prevent the execution of code from the stack. With a new flag, the so-called NX (No eXecute) bit, memory regions can be declared page-wise as non-executable areas which are excluded from execution by the hardware. Non-executable stack approaches for the Linux operating system, like PaX [PAX03] or Exec Shield [vdV04], are able to use this NX bit, or emulate it on processors which have no NX bit support. The technique can be combined with write protection to achieve that no memory location in the process can be marked as both writable ('W') and executable ('X'). This so-called W⊕X protection prevents attackers from injecting malicious code with subsequent execution. Nevertheless, Shacham demonstrated that it is not necessary to inject code in order to do malicious computations [Sha07] (see also Section 3.1).
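
The W⊕X discipline can be sketched with the POSIX-style mmap()/mprotect() interface; the function below is a hypothetical example of how code (e.g., from a just-in-time compiler) can be placed without ever holding write and execute rights simultaneously:

    #include <sys/mman.h>
    #include <string.h>

    /* A page is first writable (to place the code), then flipped to
     * executable -- but never both at once. MAP_ANONYMOUS is a common
     * extension available on Linux and the BSDs. */
    void *emit_code(const unsigned char *code, size_t len)
    {
        void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return NULL;
        memcpy(page, code, len);                /* write phase: W, not X */
        if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) /* X, not W */
            return NULL;
        return page;
    }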

StackGhost is an operating-system-based approach to protect the stack frame for systems running on the SPARC architecture [FS01]. This method utilizes special SPARC features like the windowed register file and provides a redundant copy of the return address. StackGhost is available as a patch for the OpenBSD kernel.


4.7 Control Flow Checking

Control flow checking (CFC) denotes the task of checking the control flow of a program according to a given specification. The specification is mostly derived statically at compile-time from the program. Control flow checking combines reliability and security issues and is a countermeasure against single event effects, degeneration faults, code injection, and invasive physical attacks.

Related work on control flow checking can be divided into completely software-based approaches and approaches using an additional hardware checker unit or a watchdog processor. Usually in these approaches, the program code is first structured into basic blocks2. Other approaches achieve control flow checking by redundantly decoding the control flow instructions in an additional checker unit (e.g., [RLC+07]).

4.7.1 Software-Based Methods

Software-based control flow checking techniques belong to the so-called software implemented hardware fault tolerance (SIHFT) methods. A well-known software-based CFC technique is called Control Flow Checking using Assertions (CCA) [KNKA96, MN98]. After the creation of the control flow graph (CFG, see also Section 4.7.3), special control instructions are inserted into the program code at the beginning and the end of each basic block (see Figure 4.1). The approach introduces two identifiers which are set and checked with these instructions. At the entrance of a basic block, a basic block identifier is assigned to a variable. Moreover, corresponding to the control flow graph, a special control flow identifier is checked and subsequently set for the next basic block. At the end of a basic block, both identifiers are verified, so erroneous jumps or branches from or into the middle of a basic block are detected. By checking the control flow identifier, the correct processing order of the basic blocks is ensured. The advantage is that no hardware modules are required; however, this approach has an impact on the performance of the program code, and erroneous jumps can only be detected at the transitions between basic blocks.
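
A hedged source-level model of such assertions is shown below; CCA inserts the checks automatically at compile-time, so the identifier values, variable names, and error handler here are illustrative assumptions:

    /* BID holds the identifier of the currently executing basic block,
     * CFID the identifier permitted next by the control flow graph. */
    volatile int BID, CFID;

    void check_or_halt(int cond) { if (!cond) for (;;); /* signal error */ }

    void example(int x)
    {
        /* --- entry of basic block 1 --- */
        BID = 1;                    /* set basic block identifier         */
        check_or_halt(CFID == 1);   /* CFG says block 1 must run now      */
        CFID = (x > 0) ? 2 : 3;     /* announce the legal successor       */
        /* ... original instructions of block 1 ... */
        check_or_halt(BID == 1);    /* exit check: no jump out of middle  */
    }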

The approach is enhanced for real-time distributed systems to achieve a lower performance overhead and faster error detection in [ANKA99]. The additional instructions are inserted at an intermediate level of the compiler which makes this technique, called Enhanced Control Flow Checking with Assertions (ECCA), language independent.

Another software-based control flow checking approach called Block Signature Self Checking (BSSC) is introduced by Miremadi and others [MKGT92]. The code is also structured into basic blocks and additional instructions are added at the entry and at the end of each block.

2A basic block is a sequence of code which is executed successively without any jumps or branches except, possibly, at the end. The basic block can only be left at the end of the block and can only be entered at the beginning. Only the last instruction can be a jump or branch and only the first instruction can be a jump or branch destination (see Section 4.7.3).



Figure 4.1: Control instructions are usually inserted before and after a basic block (BB1-BB3) in software-based control flow checking approaches. The additional control flow instructions check the executed control flow according to the control flow graph.

Checking is done by storing a signature, e.g., the current address, into a variable at the basic block entrance. Before leaving the basic block, this signature is verified. The method can verify the subsequent linear execution of the basic block. However, the processing of the correct basic block order cannot be verified.

Oh and others introduce a technique called Control Flow Checking by Software Signatures (CFCSS) [OSM02]. A unique signature is assigned to each basic block, and the difference between this signature and the signature of the predecessor block is embedded in the code. During the execution, a runtime signature is calculated and stored in a general purpose register. The signature of the last block and the stored signature difference are used to calculate and verify the current runtime signature at each basic block entrance. In other words, this approach checks if the correct predecessor of the current basic block, according to the CFG, was processed. However, if a basic block has more than one predecessor, the basic method is not directly applicable; in this case, an additional runtime variable is introduced to resolve the problem. Borin and others propose an error classification for control flow checking and analyze the error coverage of most existing software-based approaches [BWWA06]. Furthermore, they introduce two methods which are enhancements of the original CFCSS method. The first method is called Edge Control Flow Checking (EdgCF) and the second is called the Region Based Control Flow (RCF) technique.
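
The CFCSS signature update can be modeled in a few lines of C; the names are illustrative, and in the real method the update and check are inserted instructions operating on a dedicated general purpose register:

    #include <stdint.h>

    /* G is the runtime signature register; each basic block i has a
     * compile-time signature s_i and an embedded difference
     * d_i = s_i XOR s_pred. */
    static uint32_t G;

    /* Inserted at the entrance of a basic block with signature s and
     * difference d: after the update, G equals s exactly when the
     * CFG-correct predecessor was executed. */
    static inline int cfcss_check(uint32_t s, uint32_t d)
    {
        G ^= d;              /* G = s_pred XOR (s XOR s_pred) = s */
        return G == s;       /* mismatch: illegal control transfer */
    }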

Other similar approaches are SWIFT [RCV+05] and YACCA [GRRV03, GRRV05]. All these methods insert control instructions at the basic block borders into the program code, as depicted in Figure 4.1.


Bagchi and others introduce the Preemptive Control Signature (PECOS) checking, which is able to prevent jumps or branches in case of an error [BLW+01, BKIL03]. The program code is equipped with additional checker instructions before each control flow instruction. The runtime target address is determined and verified against the valid target addresses extracted from the compiled code. The list of valid target addresses is stored inside the code, whereas the runtime target address is determined by loading the control flow instruction into a register and decoding it in software. If the runtime target address is not in the list of valid addresses, an exception is triggered which prevents the execution of the erroneous jump or branch. The drawback of this approach is that only the integrity of the control flow instruction in the memory is checked. Transient faults, such as single event effects in the control path, cannot be detected by this method.

Abadi and others introduce a software-based CFC technique for security issues called Control Flow Integrity (CFI) [ABEL05, ABEL09]. This method focuses on indirect calls and returns. The destinations of indirect calls and returns are determined at compile-time, and each of these jump destinations is labeled with a unique identifier in the code. Instructions to check the identifier of the destination are added to the program code before an indirect jump. The jump is only executed if the identifier is correct. Budiu and others present an instruction set architecture (ISA) extension which introduces new instructions for CFI hardware support [BEA06].

4.7.2 Methods using Watchdog Processors

A watchdog processor is a simple coprocessor which is able to monitor the behavior of a main processor in order to detect errors [Lu82, MM88]. The predecessor of the watchdog processor is the watchdog timer [CPW74, OCK+75]. A watchdog timer is reset by the program running on the main processor at certain intervals. If the monitored program hangs, the timer is not reset anymore. Therefore, the timer produces an overflow which generates an interrupt. Inside the interrupt service routine, countermeasures can be started, e.g., the erroneous program can be terminated.

A watchdog processor is initialized with information about the main processor or the process which should be monitored. At runtime, the watchdog processor concurrently collects information about the execution of the program in the main processor and compares the gathered information with the initial information to detect errors. The information may include the memory access behavior, the control flow, the control signals, or the reasonability of results. Mahmood and McCluskey give a survey of error detection with watchdog processors in [MM88]. Traditionally, a watchdog processor is coupled with the main processor via a system bus. However, other approaches exist where the watchdog processor is directly attached by dedicated signals.

The advantages of watchdog processors are the lower overhead compared to the duplication of the main processor, the possibility of concurrently checking the execution, which results in no or only little performance degradation, and the detection of common or related design errors of the program or the processor due to the diversity of the processor architectures.


Watchdog processors, when used for control flow checking, have a watchdog program which is derived from the control flow of the checked program. The control flow of a program can be represented in a graph whose nodes represent sequences of code, e.g., basic blocks or whole functions, and whose edges represent the control flow. An identifier, often called a signature, which is known to the watchdog program, is attached to each node. The signatures can be assigned at random or they can be derived from the instructions inside a node. Techniques using arbitrarily assigned signatures are called assigned-signature control flow checking and techniques using derived signatures are called derived-signature control flow checking [MM88]. The different watchdog processor approaches can be further categorized by the storage of the watchdog signatures. Here, the methods can be divided into two groups, called Embedded Signature Monitoring (ESM) and Autonomous Signature Monitoring (ASM). ESM methods embed the watchdog signatures into the code of the checked program. To verify a signature, the corresponding signature must first be transferred to the watchdog or to the main processor, depending on the comparison location. The watchdog processors of the ASM methods have their own memory to store the signatures. Therefore, the watchdog must be initialized with all watchdog signatures before the program execution. In summary, there exist four categories for control flow checking with watchdog processors (see Table 4.1).

                               signature storage location
                               ESM                        ASM
    Assigned-Signature CFC     SEIS [PMHH93]              SIC [Lu82], ESIC [MH91]
    Derived-Signature CFC      PSA [Nam82], SIS [SS87]    Cerberus-16 [Nam83], RMP [ES84]

Table 4.1: Four different categories for control flow checking using watchdog processors with some example references. The methods are categorized by the different watchdog signature storage locations (embedded into the program: ESM; in additional memories for the watchdog processor: ASM) and the different types of signatures (derived, assigned) according to [MHPS96].

Watchdog-processor-based CFC approaches can be further categorized according to their error detection capability. The first category checks that the nodes are processed in an allowed sequence, whereas approaches of the second category verify the instruction sequence inside a node.


The third category includes schemes which do both [MM88].

Assigned-Signature Control Flow Checking

During the execution, the arbitrarily assigned signatures used for assigned-signature CFC are transferred to the watchdog processor for verification. Usually, the transferred signatures are compared by the watchdog processor to the watchdog signatures stored in a separate watchdog memory (ASM method). The advantages of these methods are the ease of implementation and the possibility to perform runtime checks asynchronously. However, the drawbacks are the performance impact, due to the program-based transfer of the signatures to the watchdog with additional control flow instructions, and the low error coverage, since only the sequence of the signatures is checked.

The first known method was introduced by Yau and Chen [YC80]; it assigns prime numbers to loop-free intervals which are checked at runtime. Lu proposes a method called Structural Integrity Checking (SIC) [Lu82]. The method assigns labels to high-level control flow structures which are verified by the watchdog processor. The approach is enhanced by Majzik and Hohl into the Extended Structural Integrity Checking (ESIC) [MH91].

An embedded signature monitoring approach for assigned-signature CFC is introduced by Pataricza and others, called Signature Encoded Instruction Stream (SEIS) [PMHH93, MHPS96]. In this approach, each basic block is assigned a unique signature which further encodes the successor basic block. The signatures are transferred to the watchdog processor during the execution, which verifies the control flow of the program only using the information encoded in the signatures. Therefore, the watchdog processor needs no signature storage memory and no initialization phase.

Derived-Signature Control Flow Checking

Derived-signature CFC uses a signature calculated from the properties of the executed instructions inside a node. To check all instructions, a signature, e.g., an XOR, hash, or CRC value, of all instructions of a basic block is calculated offline (at compile-time). At runtime, a checker unit calculates the signature of the executed instructions in a basic block. When leaving a basic block, both signatures can be compared and errors inside the basic block can be detected. The derived-signature CFC methods can also be categorized by the storage of the precalculated (golden) signature into ESM and ASM methods.

Embedded Signature Monitoring Derived-signature ESM methods store the offline calculated signature (golden signature) in the program code with additionally inserted instructions at the end or the beginning of each basic block.


During runtime, the signatures calculated by the watchdog processor are compared to these embedded signatures. The advantage of these methods is that all instructions can be checked and a new program already contains the corresponding signatures (see Figure 4.2). The disadvantages are the performance impact and that a fault can only be detected at the end of a basic block, which may be too late. Also, a single event upset during the execution of the additionally inserted instructions can lead to a false detection or spoofing of an error.


Figure 4.2: In the derived-signature ESM CFC approaches, a signature is calculated from the executed instructions by a watchdog processor. The golden signatures are calculated at compile-time and embedded with control instructions in the code. The additional control instructions read out the runtime calculated signature from the watchdog processor and compare it to the golden signature.

According to [MM88], the first derived-signature CFC method was introduced by Namjoo [Nam82] and is called Basic Path Signature Analysis (Basic PSA). The signatures are calculated by XORing all instructions inside each basic block. After the calculation of the signatures at compile-time, the signatures are stored at the beginning of each basic block. The watchdog processor monitors the instruction stream and identifies the loading of the signature from the instruction memory. During the execution of the basic block, the watchdog processor calculates the runtime signature by XORing the processed instructions and, at the end of the basic block, the signatures are compared. A very similar technique is proposed by Sridhar and Thatte in [ST82].

Other approaches use linear feedback shift registers (LFSRs) [DS90, DS91] or checksums [SM90] as signatures, or try to lower the number of used signatures by using larger blocks which include multiple branches [SS87, WS90]. Gaisler enhanced his ERC32 processor with an ESM CFC technique where the signatures are embedded into NOP instructions [Gai94].


Meixner and others store the signatures for the Argus-1 checker in unused instruction bits of the SPARC ISA to reduce the performance and memory overhead of their ESM method [MBS07, MBS08]. If insufficient unused bits are available, they also embed the signatures into NOP instructions.

Upadhyaya and Ramamurthy propose a derived-signature CFC technique using m-of-n codes [UR94]. An m-of-n code is an n-bit code whose value contains exactly m ones. At compile-time, the signature of a basic block is calculated, for example, by XORing the instructions. If the intermediate result is an m-of-n code, then this instruction is tagged. At runtime, the watchdog calculates the signature, recognizes the tagged instructions, and verifies at the tagged instructions whether the runtime signature is an m-of-n code. At the basic block borders, an additional byte is inserted which adjusts the current signature to an m-of-n code in order to force a check. The advantage is that the signature does not have to be stored in the program code. However, one additional byte per branch is necessary to force a check in order to restart the runtime signature calculation at a new basic block. A similar approach is presented by Ohlsson and Rimen, called Implicit Signature Checking (ISC) [OR95]. The implicit signatures are the current start addresses of the basic blocks. This is achieved by using additional justified signatures embedded into the code.
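
The m-of-n property test itself is just a population count; a minimal C sketch, using a GCC/Clang builtin and illustrative parameters (e.g., a 16-of-32 code), is:

    /* Returns nonzero when the running signature contains exactly m
     * one-bits. __builtin_popcount is a GCC/Clang intrinsic; a portable
     * bit-counting loop would do the same. */
    static inline int is_m_of_n(unsigned sig, int m)
    {
        return __builtin_popcount(sig) == m;
    }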

Autonomous Signature Monitoring The golden (compile-time calculated) signatures of derived-signature ASM methods are stored in a separate memory for the watchdog processor. The comparison between the golden and the runtime calculated signature is implemented in hardware (see Figure 4.3). If the control flow graph is mapped into the instruction memory of the watchdog processor, the jump and branch destinations can also be checked. The advantages are that the program code does not need to be altered and that there is no performance impact. Also, all instructions can be monitored. The disadvantages are that extra memory is required for the checker unit and that the synchronization between the CPU and the checker unit is difficult. Therefore, interrupts, multi-threading, and indirect jumps cannot be covered completely.

One of the first approaches using the ASM scheme is the Cerberus-16 watchdog processor [MM88, Nam83]. The control flow graph and the corresponding signatures are mapped into microinstructions which are stored in the watchdog processor memory. The Cerberus-16 processor only has control flow instructions with encoded signatures and instructions for the communication with the main processor. The processing of the control flow of the main and the watchdog processor is completely synchronized. The approach is extended by Michel and others by a branch address hashing (BAH) technique, now called Watchdog Direct Processing (WDP), which reduces the memory overhead for the watchdog processor memory [MLS91].

An asynchronous ASM approach is presented by Eifert and Shen [ES84, ST87] which is called Roving Monitor Processor (RMP). At compile-time, the control flow of the program is extracted and the signatures (CRC values) are calculated.



Figure 4.3: In the derived-signature ASM approach, the watchdog processor has a separate memory for storing the control instructions. The execution must be synchronized between the CPU and the watchdog processor.

During the execution, the runtime signatures are calculated with an additional signature generation unit and, at the end of a block, the calculated signature is sent to the watchdog processor. The watchdog compares the received signature to the signatures obtained from the control flow graph and stored in the watchdog memory. At branches, the received signature is compared with the two successor signatures of the current node in order to determine the next signature. This approach can also be used to check multi-processor systems where each processor has its own signature generation unit and sends the signature via a signature queue to the watchdog processor which is responsible for the whole system. The approach is extended by Madeira and Silva who introduce the Online Signature Learning and Checking (OSLC) technique. In this approach, the golden signatures are generated during a learning phase [MS91]. The learned signatures are stored in the watchdog memory of an RMP-like watchdog processor.

Arora and others describe an ASM approach for security applications in [ARRJ06]. This hardware approach consists of three parts: the Inter-Procedural CFC, the Intra-Procedural CFC, and the Instruction Stream Integrity Checker. The Inter-Procedural CFC verifies the function calls and returns by implementing the function call graph in hardware using content addressable memories (CAMs) and an FSM. The Intra-Procedural CFC checks the basic blocks with a compile-time generated control flow graph, implemented in checker memories. Finally, the Instruction Stream Integrity Checker is similar to those of other ASM methods; however, they use hash functions to generate and verify the signatures.

Our new control flow checking approach introduced in Section 4.7.3 belongs to the class of derived-signature ASM methods.


We propose the term checker unit for the watchdog processor, because our unit is very simple with only little hardware overhead. However, like in the Cerberus-16 or the WDP approach, the control flow graph is mapped into microinstructions which are stored in a separate memory. Further advantages of our control flow checker are the fast error detection due to the tight integration into the processor, the error recovery possibility, and the expandability with modules which support more control flow instructions or detect more errors. Moreover, like the other ASM methods, we have no performance impact in the error-free case and the program code need not be altered.

4.7.3 New Control Flow Checking

In this section, new methods for control flow checking in embedded processors and IP cores are presented. First, a classification of control flow instructions in embedded RISC processors is given. Subsequently, two different methodological concepts for control flow checking of direct jumps and branches are introduced and possibilities to check indirect jumps are discussed. Then, methods for repairing a corrupted program path are proposed. Finally, methods for checking Finite State Machines (FSMs) in general IP cores are analyzed.

Branches and Jumps

Control flow instructions (CFI) can be categorized into conditional branches and unconditional jumps. Conditional branches depend on the result of a logical or an arithmetic operation. On most processor architectures, the arithmetic operation affects a register, called integer condition codes (icc). This register consists of flags which describe properties of the result, for example whether the result is greater than zero, or negative. Conditional branches evaluate this register for the decision to take or not take the branch. The way of evaluation (e.g., branch if the zero flag is set) is statically coded in the instruction itself, whereas the evaluation of the condition is performed at runtime.

Both groups of control flow instructions can be further subdivided into direct (static) and indirect (dynamic) jumps or branches. The destination of direct branches or jumps is fixed at compile-time and is encoded into the jump or branch instruction as an absolute or relative address. For indirect jumps or branches, the destination address is determined during program execution. The destination address is given in an absolute or relative manner by either a register value, the result of an operation with registers, or the result of an operation with a register and a constant value which is encoded into the instruction.

In summary, four types of control flow instructions exist:

• (Unconditional) direct jumps (e.g., call, goto),

• (Conditional) direct branches (e.g., if .. then .. else),


• (Unconditional) indirect jumps (e.g., return from subroutine), and

• (Conditional) indirect branches3.

Furthermore, the class of unconditional indirect jumps can be subdivided into returns from subroutine, register indirect calls, and other jumps. A return from subroutine is an example of an indirect jump, because the program counter jumps to the address where the routine was called from, and this address is only known at runtime. Register indirect calls are calls where the address of the called subroutine is determined at runtime. This usually happens in C++ if a virtual function is called.

Finally, jumps which are not triggered by an instruction can occur, such as interrupts and traps. The destinations of interrupts are typically given by the start address of the main interrupt service routine, and so interrupts belong to the class of direct jumps. Traps occur on exception conditions (like divide by zero). Here, the program redirects to the address of an exception handler, and so traps can be treated as direct jumps, too.

Table 4.2 presents an analysis of the quantity of these different types of branches and jumps in the code on the SPARC V8 [SPA] architecture for the programs of the SPEC CINT2000 benchmark [SPE]. As can be seen, indirect calls and jumps occur relatively rarely as opposed to direct branches and jumps.

SPEC      all           direct            indirect
program   instructions  branches  jumps   returns  calls  other jumps
gzip          19979        1426     599       111      4            0
gcc          566280       54791   22446      2236    140          273
vpr           51771        2764    2012       269      2            7
mcf            3881         288      82        26      0            0
crafty        82891        4814    4074       108      0           13
parser        36862        3189    1701       320      0            2
gap          236181       18733    4158       828   1262            5
vortex       174567       12537    8491       913     15           21
bzip2         12162         748     380        73      0            0
twolf        102899        5701    2060       189      0            2

Table 4.2: Accumulated number of all and different kinds of control flow instructions of benchmarks of the SPEC CINT2000 test suite [SPE] when compiled to the SPARC V8 [SPA] architecture.



Methods for Checking Direct Jumps/Branches

In SoCs, a CPU often executes only a few specified programs over its lifetime. This holds true particularly for embedded applications, where the system is often only programmed once and the code is never changed during the lifetime of the product, except for updates of the SoC with new firmware and software. Furthermore, it is well known that in many computationally intensive problems, most of the execution time is spent in only a few subroutines. So, it is beneficial to analyze these subroutines for branches and jumps statically.

If we assume that only direct jumps and branches exist in a given code segment, we can verify the control flow of this code and guarantee the correct execution of each direct control flow instruction as well as the successive linear execution of all other instructions (the program counter value is incremented by one word address after each instruction). To verify the correct execution of control flow instructions, we need to check whether the address of the control flow instruction and the target address are correct. The program counter value before and after the execution of a control flow instruction can be compared to these addresses. If there is a mismatch, an error signal is raised. To check a non-control-flow instruction, the program counter before and after the execution of the instruction can be compared. If the second one is not an increment of the first one, the error signal is also raised.

In the following, we propose two alternative methods to obtain the correct addresses of control flow instructions of a given machine program and the corresponding targets. The first method is called the basic block or control flow method (CF). The second method is called the control flow instruction method (CFI).

Control Flow Method First, a given compiled machine code is separated into a set of basic blocks (BB). A basic block is a sequence of code which is executed successively without any jumps or branches except, possibly, at the end. A basic block can only be left at its end and can only be entered at its beginning. Only the last instruction can be a jump or branch, and only the first instruction can be a jump or branch destination. The following instructions define the beginning of a basic block [TH07]:

• the first instruction in a program or segment,

• the instruction following a control flow instruction,

• the instruction which is a destination of a control flow instruction.

From this information, the control flow graph CFG(BB, T) is built: Each node BBi ∈ BB of the control flow graph represents a basic block. The nodes are sorted by the start address of the corresponding basic block in ascending order. Each edge tj ∈ T represents a transition of the control flow from one basic block to another.


If the last instruction of a basic block BBi is a direct branch instruction, the basic block has two successors: one is the next basic block in the list, BBi+1 (if the branch is not taken), and the other is the basic block whose first instruction is the branch destination (if the branch is taken). Jumps have only one successor, and if the last instruction is not a control flow instruction, the successor basic block is always the next basic block BBi+1. An example program, separated into basic blocks, and the corresponding CFG are shown in Figure 4.4.


Figure 4.4: An example program code is given on the left hand side together with the corresponding assembler code. The CFIs are denoted A to C, and the CFI destination addresses a to c. D denotes the end of the program or segment to be checked. Furthermore, the code is divided into basic blocks BBi, i = 1, ..., 6. On the right hand side, the corresponding control flow graph (CFG) is shown.

With the given CFG, we have all information needed to check a sequence of program counter values for correctness, leading to the specification of a proper control flow checker unit as follows: The information of the CFG can either be used to directly define a finite state machine (FSM) that checks the correctness of a sequence of control flow instructions, or, alternatively, an implementation using micro-instructions of a micro-programmed circuit can be deduced from the CFG.

For an implementation of the checker unit as a micro-programmed circuit, the information of the CFG can be stored inside memories. For each basic block, we need to store the start and the end address and also the indices of the successor basic blocks. The start address of each basic block is the end address of the previous basic block incremented by one. To minimize the memory overhead, we can store only the end address and a global start address.


Also, we only need to store one successor of a basic block for branches, because if the branch is not taken, the basic block with the next index (BBi+1) is always executed.

With these memory overhead improvements, we need three memory items for each basic block inside the memory:

• One for the basic block end address (addr),

• one for the index of the branch-taken successor basic block (suc), and

• a flag (flag) which identifies the type of the last instruction of the basic block.

The flag is needed to choose the right transition to the next basic block (see Figure 4.5). Note that if the last instruction of a basic block is not a CFI, the successor basic block index (suc) is not needed.
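As an illustrative sketch, the three memory columns can be modeled as parallel arrays; the capacity and bit widths below are assumptions, since a real checker would use the minimal widths for the given program.

#include <stdint.h>

#define MAX_BB 1024                 /* illustrative capacity */

enum bb_flag {
    FLAG_N,                         /* last instruction is not a CFI            */
    FLAG_B,                         /* last instruction is a conditional branch */
    FLAG_J                          /* last instruction is an unconditional jump */
};

/* One entry per basic block, indexed by the basic block number i. */
static uint32_t     addr[MAX_BB];   /* word address of the last instruction */
static uint16_t     suc [MAX_BB];   /* index of the branch-taken successor  */
static enum bb_flag flag[MAX_BB];   /* type of the last instruction         */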


Figure 4.5: Three memory areas are necessary to store the required information for each CFG. In the first column (addr), the address of the last instruction of the basic block is stored. The successor basic block for a taken branch is stored in the second column (suc). In the third column (flag), a flag is stored which identifies the type of the last instruction of a basic block. An N denotes a non control flow instruction, whereas a B denotes a branch. This example memory stores the values for the example program in Figure 4.4.

The control flow checking algorithm is depicted in C in Listing 4.1. For checking the control flow, we need the current program counter (PC) and the following program counter (nPC). The algorithm, implemented as a C function, returns 0 if the control flow for the program counter and its successor is correct, and -1 if the control flow differs from its specification. Further, the index i of the current basic block and the three memories (addr, suc, and flag) are needed. The function addr(i) returns the entry with index i of the memory addr.


Listing 4.1 Control flow (CF) checking algorithm

 1 int check_cf(int PC, int nPC) {
 2   static int i;                         // index i
 3   if (addr(i) == PC) {                  // PC is BB end
 4     if (flag(i) == 'J') {               // uncond jump
 5       if ((addr(suc(i)-1)+1) == nPC) {  // correct?
 6         i = suc(i);
 7         return 0;
 8       } else return -1;
 9     } else if (flag(i) == 'B') {        // cond branch
10       if ((addr(suc(i)-1)+1) == nPC) {  // branch taken
11         i = suc(i);
12         return 0;
13       } else if (PC + 1 == nPC) {       // branch not taken
14         i++;
15         return 0;
16       } else return -1;
17     } else {                            // non CFI
18       if (PC + 1 == nPC) {              // correct?
19         i++;
20         return 0;
21       } else return -1;
22     }
23   } else {                              // non BB end
24     if (PC + 1 == nPC) return 0;        // correct?
25     else return -1;
26   }
27 }

By looking up the basic block end address in the addr memory (addr(i)), we know when the basic block end is reached (Line 3). If the basic block end is not reached, the address of the next program counter must be the current address incremented by one (Line 23); if not, an error has occurred. If the basic block end is reached, we must distinguish between the different types of the last instruction inside the basic block (Lines 4, 9, and 17). If this is an unconditional jump, like a call, only the jump target must be checked for correctness. The corresponding target address is the start address of the successor basic block, given by its index. To get this address, the end address of the basic block with the previous index is fetched and incremented (addr(suc(i)-1)+1, Line 5). Furthermore, the current index i must be updated to the successor basic block index (Line 6). If the last basic block instruction is a conditional branch, two possible successor basic blocks exist. If the branch is taken (Line 10), the handling is the same as for an unconditional jump. If the branch is not taken (Line 13), the next program counter value should be the current value incremented by one (nPC == PC + 1). Hence, the next instruction belongs to the successive basic block, and the basic block index i must be updated as well (Line 14).


Finally, if the last instruction of a basic block is not a CFI, the checking behavior is the same as for a conditional branch that is not taken (Line 18).

A very similar approach to the CF method is described in [ARRJ06]. Here, the CFG is implemented in hardware by a finite state machine and a lookup table for resolving the control flow instruction addresses and indices (in memory). The disadvantage of this approach is that the checker unit must be synthesized anew for each program. In our memory-based approach, only the contents of the memories need to be reconfigured in order to check a new program.

Control Flow Instruction Method In contrast to the control flow (CF) method, the control flow instruction (CFI) method is based on storing control flow instructions instead of basic blocks. In the case of direct branches and jumps, the start and target address are known at compile-time. So, it is possible to extract this information from the binary or the disassembled program code by decoding the instructions. The control flow instructions are then sorted in ascending address order.

Then, the control flow instruction graph CFIG(CFI, T) is built: Here, each control flow instruction in the code which should be checked represents a node (CFIi ∈ CFI). Directed edges tj ∈ T of the CFIG denote transitions to the next following control flow instruction in the given code.

Like in a CFG, each node can have at most two successors: two for a branch instruction and one in case of a jump instruction. For a branch instruction CFIi, one successor is CFIi+1 (branch is not taken). The other successor of a direct branch or jump instruction is CFIn, which is the next control flow instruction in the program code after the branch destination (branch is taken). The CFIG of the example program code from Figure 4.4 is shown on the left side of Figure 4.6. Note that D is not a CFI; rather, it refers to the end of the checking segment or function.

Like in the CF method, the information of the CFIG can be used as a specification of the correct branching behavior inside a control flow checker unit and implemented either directly by an FSM or by micro-instructions of a micro-programmed circuit. In case of a micro-programmed circuit implementation, we store for each CFI the start and the target address in memory (addr and target in Figure 4.6). Also, the index of the successor CFI must be stored inside this memory (suc in Figure 4.6). For direct branches, we store the successor CFI for taken branches. If the branch is not taken, the successor CFI is CFIi+1. Finally, we need a flag (flag) to distinguish between the different CFI types.
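Analogously, a sketch of the four CFI memory columns as parallel arrays (capacities and widths again illustrative assumptions):

#include <stdint.h>

#define MAX_CFI 1024                   /* illustrative capacity */

/* One entry per control flow instruction, indexed by the CFI number i. */
static uint32_t addr_cfi  [MAX_CFI];   /* address of the CFI itself          */
static uint32_t target_cfi[MAX_CFI];   /* direct jump/branch target address  */
static uint16_t suc_cfi   [MAX_CFI];   /* index of the taken-successor CFI   */
static uint8_t  flag_cfi  [MAX_CFI];   /* 'J' = uncond. jump, 'B' = branch   */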

A proper control flow instruction checking algorithm is shown in Listing 4.2. Like the CF algorithm, the inputs are the current program counter PC and the next program counter nPC, and the output is 0 in case of a correct control flow, or -1 in case of an error. The checking algorithm needs the four memory columns introduced in Figure 4.6 and the index i, which denotes the next CFI from the current program flow position.



Figure 4.6: For the example program code in Figure 4.4, the corresponding CFIG is shown on the left hand side. The nodes correspond to the control flow instructions, whereas the edges denote transitions. On the right hand side, the four memory areas are shown which are necessary to store the CFIG. In the addr memory, the address of the CFI is stored, and the corresponding target address is stored in the target column. In the third column (suc), the successor CFI index is stored. Finally, the kind of instruction is stored in the last column (flag).


The algorithm is quite similar to the CF method, with the difference of accessing the jump or branch targets and the missing check of basic block ends with a non-CFI. In Line 3, we check whether the currently executed instruction is a CFI. If it is not, the linear successive program flow is checked (Line 18). If the current program counter references a CFI, we must again distinguish between the different types of CFIs (Lines 4 and 9). If the CFI is an unconditional jump, the next program counter should be the value stored in the target memory column (target, Line 5). Also, we must update the index i to the index of the successive CFI (suc(i), Line 6). If the current CFI is a conditional branch, we must check whether the branch is taken or not (Lines 10 and 13). If the branch is taken, the same checking strategy as in the case of unconditional jumps is used. If the branch is not taken, the next program counter must be the current one, incremented by one (Line 14). In both cases, the index i must be updated accordingly.

Memory Overhead Discussion In the following, the memory overheads of both methods shall be compared. The example program in Figure 4.4 has 6 nodes in the CFG and 4 nodes in the CFIG. For the CF method, we need to store only one address for a CFG node (basic block end address addr) and the index of the successor block.


Listing 4.2 Control flow instruction (CFI) checking algorithm

 1 int check_cfi(int PC, int nPC) {
 2   static int i;                   // index i
 3   if (addr(i) == PC) {            // PC is CFI
 4     if (flag(i) == 'J') {         // uncond jump
 5       if (target(i) == nPC) {     // correct?
 6         i = suc(i);
 7         return 0;
 8       } else return -1;
 9     } else {                      // cond branch
10       if (target(i) == nPC) {     // branch taken
11         i = suc(i);
12         return 0;
13       } else if (PC + 1 == nPC) { // branch not taken
14         i++;
15         return 0;
16       } else return -1;
17     }
18   } else {                        // non CFI
19     if (PC + 1 == nPC) return 0;  // correct?
20     else return -1;
21   }
22 }

In the CFI method, we need to store two addresses (control flow instruction address addr and target address target) and the index of the successor CFI for each CFIG node. Both methods also need bits to store the flags for distinguishing the different CFI or basic block types. Usually, the index needs fewer bits than the addresses of instructions, so the CF method uses less memory than the CFI method for this example.

For measuring the memory overhead for standard user programs, we use the programs from the SPEC CINT2000 [SPE] benchmark in the following. Table 4.3 shows the memory overhead caused by implementing the CF and CFI methods for the SPEC CINT2000 benchmark when compiled to the 32-bit SPARC V8 [SPA] architecture. The smallest possible index bit width is chosen for the given program to calculate the memory overhead in bits.
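The values in Table 4.3 appear consistent with a simple per-entry cost model: one 30-bit word address plus the index per basic block for the CF method, and two 30-bit addresses plus the index per CFI for the CFI method (flag bits neglected). The following sketch reproduces the gzip row under these assumptions.

#include <stdio.h>

#define ADDR_BITS 30   /* assumed word-address width on 32-bit SPARC V8 */

static long cf_overhead_bits(long num_bb, int index_bits) {
    /* CF method: end address + successor index per basic block */
    return num_bb * (ADDR_BITS + index_bits);
}

static long cfi_overhead_bits(long num_cfi, int index_bits) {
    /* CFI method: CFI address + target address + successor index per CFI */
    return num_cfi * (2 * ADDR_BITS + index_bits);
}

int main(void) {
    /* gzip: 2615 basic blocks and 2140 CFIs, 12-bit indices (Table 4.3) */
    printf("CF:  %ld bits\n", cf_overhead_bits(2615, 12));   /* 109830 */
    printf("CFI: %ld bits\n", cfi_overhead_bits(2140, 12));  /* 154080 */
    return 0;
}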

Also, the memory overhead of the checking methods is compared with the memory usage of the test programs. The number of instructions of the test programs is presented in Table 4.2. On the SPARC V8 architecture, each instruction needs 32 bits of memory space. The additional memory overhead of the checker methods is shown in absolute values and as a percentage of the memory usage of the test programs in Table 4.3.

The results in Table 4.3 show that the CF method usually produces a lower memory overhead than the CFI method, typically in a range of less than 20%.


SPEC              CF method                        CFI method
prog.     # BB   Ind. w.   Overhead         # CFI  Ind. w.   Overhead
                 [bits]    [bits]    [%]           [bits]    [bits]    [%]
gzip      2615     12       109830   17.2     2140    12      154080   24.1
gcc      90819     17      4268493   23.6    79886    17     6151222   33.9
vpr       6029     13       259247   15.6     5054    13      368942   22.3
mcf        514     10        20560   16.6      398     9       27324   22.0
crafty   10205     14       449020   16.9     9009    14      666666   25.1
parser    6384     13       274512   23.3     5212    13      380476   32.3
gap      30484     15      1371780   18.1    24986    15     1873950   24.8
vortex   24978     15      1124010   20.1    21977    15     1648275   29.5
bzip2     1502     11        61582   15.8     1201    11       85271   21.9
twolf     9827     14       432388   13.1     7952    13      580496   17.6

Table 4.3: Required memory overhead of the programs of the SPEC CINT2000 benchmark in bits for the CF and CFI methods. Also, the number of basic blocks and control flow instructions and the corresponding index widths are shown. The memory overhead is given in absolute values and as a percentage of the memory usage of the corresponding test program.

Note that the shown overhead is for checking the whole program with all subroutines, which is not always the best approach. By restricting the checking to only a few subroutines which are executed very often and should have a high reliability and security, the memory overhead can be significantly reduced.

Instruction Integrity Checker The instruction integrity checker (IIC) is an extension of the control flow method to check all types of instructions, not only control flow instructions. A CRC (cyclic redundancy check) or hash value may be calculated offline for all instructions inside a basic block (at compile-time) and online inside the checker unit [MLS91]. The offline-calculated CRCs are stored inside an additional memory which extends the other checker memories (see Figure 4.5). For each basic block, we store the end address addr, the index of the next basic block suc (for a jump or a taken branch), the flag flag, and additionally the CRC or hash value in the memory iic (instruction integrity check).

The checker unit calculates a CRC from the instructions during the execution of a basic block. At the last instruction of the basic block, the calculated CRC can be compared with the offline-calculated CRC stored inside the iic memory. If the CRCs are not equal, one or more bits are corrupted in the instruction stream. This error can be signaled to the operating system by an interrupt, or the system might be rebooted by a hardware reset.
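A minimal software model of the per-block check might look as follows; the CRC-32 polynomial and the accessor functions addr(i) and iic(i) (the latter reading the new memory column) are illustrative assumptions in the style of Listing 4.1.

#include <stdint.h>

extern uint32_t addr(int i);  /* basic block end address (as in Listing 4.1) */
extern uint32_t iic(int i);   /* offline-calculated CRC of basic block i     */

/* Fold one 32-bit instruction word into a running CRC-32 (illustrative). */
static uint32_t crc32_step(uint32_t crc, uint32_t instr) {
    crc ^= instr;
    for (int b = 0; b < 32; b++)
        crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    return crc;
}

/* Called for every executed instruction of basic block i. */
int check_iic(uint32_t PC, uint32_t instr, int i) {
    static uint32_t crc = 0;
    crc = crc32_step(crc, instr);
    if (addr(i) == PC) {             /* end of the basic block reached */
        int ok = (crc == iic(i));
        crc = 0;                     /* restart for the next block */
        return ok ? 0 : -1;          /* -1: corrupted instruction stream */
    }
    return 0;
}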


Re-execution or correction is hardly possible, because an error inside a basic block can only be detected at the end of the block.

Instruction integrity checking is not applicable to the CFI method, because there may exist more than one path to a CFI node, whereas in the CF method a basic block is always traversed on the same path. As an example, consider the if clause in the program in Figure 4.4. In the CF method, if the branch (if) is not taken, only basic block 5 is traversed. If the branch is taken, basic blocks 4 and 5 are traversed, but basic block 5 is executed in the same way as if the branch was not taken. In the CFI method, by contrast, CFI C is always the successor, whether the branch is taken or not. However, on the way to C, the program flow takes different paths, depending on whether the branch is taken or not.

Walking through different paths to a node results in different CRC or hash values. With the IIC method, we are only able to store one CRC or hash value in the additional memory column for one node. Of course, we could extend the memory to store a value for each possible path, but this would result in a huge memory overhead, because we would have to reserve memory space for each path of each node. This shows that the instruction integrity checker is not practical for the CFI method.

Conclusions Both introduced methods can only check direct branches and jumps, where start and destination addresses can be extracted from the compiled code.

The advantage of the CF method is that in most cases, fewer additional memory resources are needed than for the CFI method. The disadvantage of the CF method is that the memory handling is more difficult. On many processor architectures, the fastest execution of one instruction is one clock cycle. Consider Listing 4.1, where we need to access the addr memory twice for each control flow instruction, once for the end address of the basic block (Line 3) and once for the start address of the successor basic block (Lines 5 and 10). To achieve this in a single clock cycle, we need a dual-port memory, which is more expensive than a single-port memory. Furthermore, for the second access to the memory, we first need the successor index from the suc memory. Doing both memory accesses in one clock cycle is nearly impossible on high-clocked processors. Furthermore, the access to the suc memory cannot be scheduled one clock cycle earlier, because if the current basic block consists of only one instruction and the previous basic block ends with a branch instruction, the current index i depends on the result of the executed branch (taken or not). This shows that we need at least two clock cycles to check a transition in the CFG. To ensure that at a basic block end the correct start and destination addresses are available, we might pre-read both values. This can also be done with a single-ported addr memory: on the first clock cycle, the basic block end address is read from the addr memory and the successor basic block index is read from the suc memory; on the second clock cycle, the target address is read from the addr memory. But this pre-read can only be done if the basic block consists of more than one instruction. If a basic block consists of only one instruction, we must stall the processor pipeline to verify the control flow instruction in order to prevent possible erroneous behavior. Fortunately, basic blocks with only one instruction are very rare.



The CFI method, on the other hand, requires only one access to each memory. In Line 3 of Listing 4.2, we access the addr memory to get the next CFI address. In the same clock cycle, we can access the target memory to fetch the correct destination address (Lines 5 and 10). With the CFI method, it is therefore possible to check a transition in the CFI graph within a single clock cycle, so the CFI method has no execution time overhead at all.

The advantages of the CFI method are that the checker unit is very simple and uses only few logic resources. Also, we have no performance impact, because the correct control flow instruction address and target address may be loaded from the memory in a single clock cycle. The disadvantages are that usually more memory resources are needed than for the CF method and that we are not able to check the integrity of non-control-flow instructions.

Finally, both introduced concepts for control flow checking have the big advantage over [ARRJ06] of being reprogrammable: only the memory of the control flow checker unit needs to be reprogrammed to check a different program, and no adjustments of the hardware are necessary. Moreover, unlike the software-based methods, we have no performance impact for verifying the control flow.

Methods for Checking Indirect Jumps/Branches

Checking indirect jumps or branches is more difficult than checking direct branches or jumps, because the jump destination cannot be determined from the compiled program code. In fact, according to the instruction specification of indirect jumps, all possible targets which are inside the reachable area of the jump are allowed. From the hardware side, even a falsified indirect CFI which jumps to a wrong address is in accordance with the processor specification. Almost all code injection attacks target this behavior by manipulating indirect jump targets (the return stack). However, from the software or logical side, there are certainly some restrictions on indirect jump targets: Compilers use indirect jumps in a stylized manner which can be analyzed [LB94]. Almost all indirect jumps which are compiled from a modern programming language, like C, C++, or Java, are returns from subroutine, belong to a switch case clause which is implemented using a jump table, or are indirect calls which are mainly used in object-oriented languages, like C++ or Java. Indirect jumps which appear in hand-written code are nearly impossible to analyze. Fortunately, hand-written assembler code is used more and more rarely today.

The results reported in Table 4.2 show that returns from subroutine are clearly the main usage of indirect jumps. Upon a call, the address of the back-jump is stored inside a register or a memory stack, and on a return from subroutine, a back-jump to this address is initiated.


Indirect jumps are also used in jump tables to efficiently implement switch case clauses. Here, the alternative case targets are assembled in a jump table which is indexed by a previously calculated value. The targets may be direct jumps which lead the control flow to the desired code segment (see Figure 4.7). Another way to use a jump table is to call different functions depending on an input. Here, the alternative function pointers are stored inside a jump table, and the index into the table is calculated from the input value. The address of the desired function is fetched from the table, and the function is called with an indirect jump. Note that jump tables are not often used by compilers. Usually, switch case clauses are translated to an if .. else if tree. But, depending on the compiler and optimization parameters, indirect jumps might nevertheless occur. Indirect jumps which result from jump table implementations are listed in Table 4.2 under the category "other jumps".


Figure 4.7: A pseudo-C example of a switch case clause is shown on the left side. On the right side, a possible implementation in assembler with a jump table and an indirect jump is depicted. The upper case letters (A-D) are direct jump instructions, and the corresponding targets are depicted in lower case letters (a-d).
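A concrete C fragment of this kind is sketched below; whether the compiler actually emits a jump table for it depends on the compiler and its optimization settings.

/* Dense, consecutive case values encourage a jump table: the compiler
   may emit a bounds check followed by an indirect jump into a table of
   code addresses indexed by x. */
int dispatch(int x) {
    switch (x) {
    case 0:  return 10;   /* jump table entry a */
    case 1:  return 20;   /* jump table entry b */
    case 2:  return 30;   /* jump table entry c */
    case 3:  return 40;   /* jump table entry d */
    default: return -1;   /* out-of-range values bypass the table */
    }
}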

Finally, indirect jumps are also used for indirect calls (see Table 4.2). In an indirect call, the address of the target function is loaded into a register, and the function is called with an indirect jump. This occurs mainly in object-oriented programming languages that support function pointers and virtual functions. However, functions called via a jump table also use indirect calls.

The methods for checking indirect jumps that will be described in the following can be categorized into methods using information which is gathered by analysis or simulation at compile-time and methods which use only runtime information.


Methods Using Compile-time Information If we are able to analyze the targets of indirect jumps at compile-time, we can extend our hardware checking units to support multiple jump destinations for monitoring program code that includes indirect jumps. Cifuentes and Emmerik [CE99] present a method to identify indirect branch targets if the indirect jump is used within a jump table. Furthermore, simulations with different input stimuli may also be helpful to identify indirect jump targets. However, obtaining the possible jump targets by simulation requires high effort.

Another approach is to convert all indirect jumps into direct jumps and branches. Bergstra and Middelburg [BM07] present a method to convert most indirect jumps in a compiled program, including jump tables and returns, into direct jumps and branches. However, this method may greatly increase the length of the program code and reduce the performance.

Methods Using Runtime Information

Methods using runtime information do not need information from the compiled code. Here, we monitor the control flow at runtime to decide whether the execution of an indirect jump is correct or not.

Most indirect jumps are returns from subroutine (see Table 4.2). By executing the return from subroutine instruction, the program counter jumps to the next address after the instruction where the subroutine was called. The return address is typically stored in a register inside the CPU, so the return instruction is a special indirect jump. Returns can be verified by implementing an additional hardware stack [KE91]. On a call (direct or indirect), the return address is stored on the stack, and when the return instruction is executed, the back-jump can be verified.
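A software model of such a return-address stack could look like this sketch; the stack depth and the call/return hooks are assumptions, not the design of [KE91].

#include <stdint.h>

#define RET_STACK_DEPTH 64               /* illustrative hardware depth */

static uint32_t ret_stack[RET_STACK_DEPTH];
static int sp = 0;

/* On a call (direct or indirect): remember the expected return address. */
void on_call(uint32_t return_addr) {
    if (sp < RET_STACK_DEPTH)
        ret_stack[sp++] = return_addr;
}

/* On a return: the actual back-jump target nPC must match the stored one. */
int on_return(uint32_t nPC) {
    if (sp == 0)
        return -1;                       /* underflow: suspicious return */
    return (ret_stack[--sp] == nPC) ? 0 : -1;
}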

Furthermore, indirect branch prediction units can be used to evaluate an indirect jump address. Branch prediction is used in pipelined processors to avoid pipeline stalls on branches. A prediction is made whether a branch will be taken or not, and the next instructions are fetched according to the prediction. If the prediction was right, no stall occurs; if not, the pipeline must be stalled and the right instructions must be fetched.

Indirect branch prediction units predict the destinations of indirect jumps. The predictions are made based on the jump behavior in the past [CHP97, SFF+02, JMKP07]. The result of an indirect branch prediction unit might be used to evaluate how trustworthy the jump destination is. If the prediction is correct, the probability that this jump is correct is high, but if the prediction is incorrect, the jump destination could be false. A non-predicted indirect jump target has a lower trustworthiness. With this method, no exact proposition can be made, but, for example, a higher-level autonomic operating system can evaluate the confidence of this jump to increase the reliability of the whole system.


Methods for Handling a Corrupt Control Flow

In the sections above, methods for autonomously monitoring the control flow were described. However, what can we do if an error is detected? There are three options:

• The faulty instruction can be re-executed.

• The CPU can be transferred into a secure state.

• The CPU can continue executing the code at a lower reliability level.

If an error in the control flow occurs, the faulty instruction might be re-executed as follows: The error should be detected fast enough to ensure that the state of the CPU is not altered by the erroneous instruction execution. To guarantee this, a possible checker unit must monitor the program counter in the first pipeline stage of a given RISC CPU. Unfortunately, in most architectures, the jump or branch instructions need more than one cycle to execute. So, until the error is detected, some other instructions after the jump might already have been executed. After error detection, the program counter is reset to a value previous to the error by looping back the program counter value from a subsequent pipeline stage or by a calculated value from the checker unit. The details of the re-execution process depend highly on the processor architecture and design.

For example, the SPARC V8 architecture allows one instruction after a branch instruction, or two instructions after a jump instruction, to be executed before the branch or jump is performed (see the SPARC Architecture Manual [SPA]). If an error is detected and the jump or branch instruction must be re-executed, these following instructions must be re-executed as well. It must also be ensured that these instructions cannot alter the state (e.g., register contents or memory operations) of the CPU before re-execution. If the retry also fails, the instruction cache can be invalidated to ensure that on the next re-execution, the instructions are transferred again from the memory. If a predefined number of retries fail, the checker unit can lead the CPU into a secure state. Also, the number of retries can be reported to the operating system as an indication of how reliable the CPU is.

Another possibility to react in the case of an error is to transfer the CPU into a secure state. This can be the reset state or any other state in which the program was still executed correctly. After reaching this state, the operating system can initialize the CPU with correct data, and the CPU can start to execute from this clean state. The invalidation of the data resulting from the erroneous task can also be done by the operating system.

However, the CPU might also continue executing the code at a lower reliability level with a deactivated checker, if the task has a low reliability requirement or is further checked by another process.


In all cases, the operating system should be informed about the error and update the internal reliability state of the CPU. If many errors occur, the CPU should only be allowed to execute tasks with low reliability requirements or unimportant tasks, or should finally be excluded by the dispatcher and shut down.

Additional Reading

In [ZT08a, ZT09, Zie10, SBE+07b, SBE+07a, MWB+10], more information about this control flow checker unit can be found.


5 IP Protection

Intellectual property (IP) denotes the absolute right on an intangible asset, like music, literature, artistic works, discoveries, inventions, words, phrases, symbols, designs, software, or IP cores. The owner of the IP can license his work to other people or companies. IPs are protected by law with patents, copyrights, trademarks, and industrial design rights.

Drimer defines the following protection or defense categories against IP theft or fraud [Dri09]: social, reactive, and active protections.

Social protection means that IP works are protected by laws, non-disclosure agreements, copyrights, trademarks, patents, contracts, and so on. The deterrents are conviction by a court of law and the loss of a good reputation. However, these deterrents are only effective if the misconduct can be proven and the appropriate laws exist. Furthermore, the laws must be enforced, which is handled differently from country to country.

Reactive protection means that the theft or fraud cannot be prevented; however, it can be detected, delivering evidence of the misconduct. Some reactive protection mechanisms deliver only suspicious facts, which, however, may be enough to trigger further investigations. Furthermore, the existence of reactive protection mechanisms might deter would-be attackers.

Active protection means that physical or cryptographic mechanisms prevent the theft or fraudulent usage of the protected work. This category has the highest deterrent degree. However, these mechanisms can be broken by attacks. Often, the attack can be proven if the misconduct is detected.

In this work, we concentrate on the protection of the IP of hardware cores. These so-called IP cores are distributed like software and can easily be copied. Some core suppliers encrypt their cores and deliver special development tools which can handle encrypted cores. The disadvantage is that common tools cannot handle encrypted cores and that the shipped tools can be cracked so that unlicensed cores can be processed. Another approach is to hide a signature in the core, a so-called watermark, which can be used as a reactive proof of the original ownership. There exist many concepts and approaches for integrating a watermark into a core.

In general, hiding a unique signature in user data, such as pictures, video, audio, text, program code, or IP cores, is called watermarking.


Embedding a watermark into multimedia data is achieved by altering the data slightly at points where human sense organs have lower perception sensitivity. For example, one can remove frequencies which cannot be perceived by the human ear when coding an audio sequence into an MP3 file. It is then possible to hide a signature in these frequencies without decreasing the quality of the coded audio sequence [BTH96].

One problem of watermarking is that for verification, the existence and the characteristics of a watermark must be disclosed, which enables possible attackers to remove the watermark. To overcome this obstacle, Adelsbach and others [ARS04] and Li and others [LC06] presented so-called zero-knowledge watermark schemes which enable the detection of the watermark without disclosing relevant information.

The watermarking of IP cores is different from multimedia watermarking, because the user data, which represents the circuit, must not be altered, since functional correctness must be preserved. A fingerprint denotes a watermark which is varied for individual copies of a core. This technique can be used to identify individual authorized users. In case of an unauthorized copy, the user from whom the copy originates can be identified, and the copyright infringement may be reconstructed. Watermarking procedures can be categorized into two groups of methods: additive methods and constraint-based methods.

In additive methods, the signature is added to the functional core, for example, by using unused lookup tables in an FPGA [LMSP98]. The constraint-based methods were originally introduced by [KLMS+01] and restrict the solution space of an optimization algorithm by setting additional constraints which are used to encode the signature.

A survey and analysis of watermarking techniques in the context of IP cores is provided by Abdel-Hamid and others [AHTA04]. Further, we refer to our own survey of watermarking techniques for FPGA designs [ZT05]. A survey of security topics for FPGAs is given by Drimer [Dri09], who also maintains the FPGA design security bibliography website: http://www.cl.cam.ac.uk/~sd410/fpgasec/.

In order to compare different watermarking strategies, some criteria are defined in the following [HP99]:

Functional correctness: This is the most important criterion. If the watermarking process destroys the functional correctness, it is useless to distribute the core.

Resource overhead: Many watermarking techniques need some extra resources, some to generate and store the watermark itself, some because of the degradation of the optimization results from the design tools. The ratio between the original and the watermarked core's resource demand is defined as the resource overhead.


Transparency: The watermarking procedure should be transparent to the design tools. It should be easy to integrate the watermarking step into the design flow without altering common design tools.

Verifiability: The watermark should be embedded in such a way that the authorship can be verified easily. It should be possible to read out the watermark with the given product alone, without any further information from the design flow which would have to be requested from a company suspected of IP fraud.

Difficulty of removal: The watermark should be resistant against removal. The effort to remove the watermark should be greater than the effort needed to develop a new core, or the removal of the watermark should corrupt the functionality of the core. Watermarks which are embedded into the function of the core are in general more robust against removal than additive watermarks.

Strong proof of authorship: The watermark should identify the author with a strong proof. It should be impossible for other persons to claim the ownership of the core. The watermarking procedure must be resistant against tampering.

In this section, we first discuss IP protection methods using core encryption. After that, related work using additive and constraint-based watermarking methods is presented.

5.1 Encryption of IP Cores

The goals of active IP protection for cores are, first, that the core cannot be used without a proper license and, second, that the core is protected from unauthorized modifications. The cores can be delivered in encrypted form and are decrypted by design tools. Other approaches for FPGAs use an encrypted configuration bitfile which is decrypted on the FPGA.

5.1.1 Encrypted HDL or Netlist Cores

One solution is to deliver encrypted IP cores to the customers and integrate de- and encryption functions into the EDA tools. The customer buys the encrypted core and obtains the appropriate key from the IP core developer or vendor. This technique is applicable for IP cores of all abstraction or technology levels (RTL – HDL cores, logic level – netlist cores, device level – bitfile/layout cores). However, if, e.g., an HDL core should be protected at all abstraction levels, the synthesis tool must produce an encrypted netlist. This must be done for all steps: decryption of the core, processing, and encryption. It is important that the customer only has access to the encrypted data, which means that the EDA tool routines must be protected against read-out attacks.



The problem is that no consistent industry standard exists for handling encrypted IP cores [Dau06]. This complicates the interoperability of IP cores and EDA tools.

Today, symmetric and asymmetric cryptographic approaches are used. In symmetric cryptographic approaches, the en- and decryption is done with the same key. The advantage of this approach is the reduced computational complexity compared to asymmetric approaches. One problem is the secure distribution and communication of the key. Furthermore, EDA tools must deal with different keys for different IP vendors, and if one key is cracked, usually all IP cores of the corresponding vendor lose their protection. Nevertheless, this approach is used, for example, by Xilinx to encrypt some of their parameterizable HDL IP cores, e.g., the Microblaze processor softcore [Xild].

Methods using asymmetric cryptography are also known as public key cryptography and need two keys, the private and the public key. The private key is used for decryption inside the EDA tools, whereas the encryption key is publicly available and is used by the IP core vendor. The EDA vendor creates the key pairs and embeds the private key in his tools. The IP core developer can then use the public key for the encryption. The advantage is that the private decryption key does not need to be transferred over untrusted communication channels and is only known by the EDA vendor. The disadvantage is that asymmetric approaches have a high computational complexity, which results in long decryption runtimes of up to several hours for IP cores [Dau06]. Another drawback is that the IP vendor must create a separate version for each EDA tool, encrypted with the corresponding public key of the EDA vendor.

Dauman, Vice President of Synopsys' Synplicity Business Group, introduced a hybrid approach [Dau06]. The IP core is encrypted with a symmetric cryptographic method, like Triple-DES or AES, using a key which is generated by the IP vendor. This key, referred to as the data key, is encrypted with an asymmetric cryptographic method, like RSA [RSA78], with the public key of the EDA vendor. This approach is similar to the PGP approach [Zim95] for cryptographic privacy and authentication of messages. The decryption is done with the (decrypted) data key and the cryptographic method which is specified by the IP vendor. Inside the EDA tools, different symmetric cryptographic routines exist for the decryption of the core. The advantage is that the decryption with a symmetric algorithm is very fast, and the computationally complex asymmetric method is only used for the data key, which is very small compared to the whole IP core. Synplicity suggested this approach as a future industry standard and included this method, called ReadyIP, in the product Synplify Premier [Syn].
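The flow can be summarized by the following toy sketch; this is not real cryptography, and the stand-in routines only mark where AES/Triple-DES and RSA would be applied.

#include <stdint.h>
#include <stdlib.h>

typedef uint8_t sym_key_t[16];

/* Stand-in for fast symmetric bulk encryption (AES or Triple-DES). */
static void sym_encrypt(uint8_t *buf, size_t len, const sym_key_t key) {
    for (size_t i = 0; i < 16 && i < len; i++) ;  /* no-op guard */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key[i % 16];           /* toy cipher, NOT secure */
}

/* Stand-in for RSA encryption of the data key with the EDA vendor's
   public key; note it operates on 16 bytes, not on the whole core. */
static void asym_wrap_key(sym_key_t data_key) {
    (void)data_key;
}

void package_ip_core(uint8_t *core, size_t core_len) {
    sym_key_t data_key;
    for (int i = 0; i < 16; i++)
        data_key[i] = (uint8_t)rand();     /* IP vendor generates the key  */
    sym_encrypt(core, core_len, data_key); /* fast: encrypts the bulk core */
    asym_wrap_key(data_key);               /* slow, but the input is tiny  */
}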

In 2007, an industry-wide panel discussion [Wil07] provided some insight into the EDA industry's perception of encrypted IP cores. The conclusion was that the current social-based protection works well for large corporations. A better solution is desirable, but not necessarily urgent.


However, the participants expressed reservations about small companies or startups which are not known in the community; vendors might not be willing to sell IP cores to such companies.

Barrick argued against the usage of encrypted netlist cores due to their hidden costs [Bar]. The disadvantages are the fixed constraints, the inability to reuse parts of the logic for other cores, slower simulation speed or inaccurate behavioral models, the restricted choice of EDA tools, and fewer debugging possibilities. However, encryption can sometimes be worthwhile due to reduced acquisition costs.

5.1.2 Encrypted FPGA Configurations

Another kind of IP protection is the encryption of the FPGA configuration or bitfile. The bitfile is stored in a non-volatile memory, e.g., a PROM, and transferred to the FPGA in encrypted form. The bitfile is decrypted inside the FPGA during the configuration. This approach prevents copy attacks on bitfile designs and protects the bitfile from reverse engineering.

This method was first suggested in 1995 in a patent by Austin [Aus95]. The first FPGA devices which offered configuration encryption were those of Actel's 60RS family. However, all FPGAs had the same permanent key, which did not prevent copy attacks. Furthermore, the key was also stored in the software. Consequently, it was easy for attackers to extract the key from the software. Xilinx introduced configuration decryption with a Triple-DES hardcore for Virtex-II devices in the year 2000. The user-defined key can be stored and updated in an FPGA-internal battery-backed SRAM. Today, bitfile encryption is supported by many high-end FPGA families. Some FPGA devices, such as the Altera Stratix II/III, can be configured to always perform decryption. This prevents configuration with bitfiles which are not encrypted with the proper key.

There exist two different key storage techniques: volatile and non-volatile. Volatile key storage uses low-power SRAMs which are powered by an external battery. Attackers must keep powering the key storage during the attack, which is more complicated. On an attacker's error, the key is cleared and the bitfile cannot be loaded. The disadvantage is the increased printed circuit board space and the cost of the external battery. Non-volatile key storage uses fuses, flash, or EEPROMs. The problem is that these technologies must be combined with the latest CMOS technology on the same chip, which requires a non-standard manufacturing step. The results are increased costs and more complex verification strategies.

An important aspect of methods using encrypted FPGA configuration bitfiles is the key management, which includes the generation and the distribution of keys. Kean suggests a method where the FPGA can encrypt and decrypt bitfiles with hardware cores and a permanent embedded key [Kea01]. The FPGA is able to encrypt the bitfile on the first programming and store this encrypted bitfile in a non-volatile memory.


Upon every FPGA configuration during the power-up cycle, the bitfile is loaded and decrypted in the FPGA. The advantage is that the key never leaves the FPGA.

Bosset and others [BGB06] propose a method for using partial reconfiguration for en- and decryption of FPGA bitfiles by user-defined soft cores. At power-up, the decryption core is initially loaded from the PROM and decrypts the bitfile with the user logic. Soudan and others [SAH] propose a method for the encryption of partially reconfigurable bitfiles using device-specific keys.

5.2 Additive Watermarking of IP Cores

Additive methods are watermarking procedures where a signature is added to the core. This means that the watermark is not embedded into the function of the core. Nevertheless, the watermark can be masked so that it appears to be part of the functional core. Additive watermarks can be embedded into HDL, netlist, bitfile, or layout cores.

5.2.1 HDL Cores

Additive watermarking of HDL cores seems to be very complicated because of the human-readable structure of the HDL code. Hiding a watermark there is very difficult: on the one hand, an attacker may easily detect the watermark, and on the other hand, subsequently used design tools might remove the watermark during circuit optimization. However, it is not impossible to include an additive HDL component in the core which is not removed by the design tools.

Castillo and others hide a signature in unused space of dedicated lookup-table-based memory [CPG+06]. To extract the signature, additional logic monitors the input stream for a special signature extraction sequence. If this sequence is detected, the signature is sent to the outputs of the core. This approach was later generalized for other memory structures in [CPG+08]. The drawback is that distribution as an HDL core is not possible, because the signature extraction logic is easy to detect and to remove.

Oliveira presents a general method for watermarking finite state machines (FSMs) in such a way that a specific property is exhibited on the occurrence of a certain input sequence [Oli01]. This input sequence corresponds to the signature, which is previously processed by cryptographic functions. A similar approach, which exploits unused transitions, is presented by Torunoglu and others in [TC00].

Fan provides a method where the watermark or signature is sent as a preamble of the output of the test mode [FT03]. Some ASIC circuits provide a special test mode which stimulates the core with special input patterns. To analyze the correctness of the core, the outputs for these input patterns are measured and compared to the correct patterns. The idea is to send the watermark sequence over the output port before the test sequence starts.


The disadvantage of these approaches is the usage of ports for signature verification. This works only if the ports are reachable. If the core is embedded into other cores, the ports of the watermarked core can be altered, which falsifies or prevents the detection of the signature in the output stream. This also applies to the signature extraction sequence in the input stream.

5.2.2 Netlist Cores

To the best of our knowledge, there exist no publications on the use of additive watermarking at the level of netlist cores. In [ZT05], we presented the first two examples of what additive watermarking for netlist cores can look like. The first idea is to apply redundant logic in some paths of the core according to a signature. To verify the watermark, one can optimize the core so that the redundant logic is removed, show the differences, and reconstruct the signature.

The second idea is to add false paths to the design which do not affect the following logic. The weakness of both ideas is that the design tools applied in subsequent steps use transformations which may destroy the watermark. Therefore, these ideas are not applicable.

Identification of Netlist Cores by Analysis of LUT Contents

In this approach, we do not add any signature or watermark. The core itself remains unchanged, so functional correctness is given and no additional resources are used. We compare the content of the used lookup tables from the registered core IL1 with the used lookup tables in an FPGA design IB from the product of the accused company. If a high percentage of identical content is detected, the probability that the registered core was used is very high.

The synthesis tool maps the combinatorial logic of an FPGA core to lookup tables and writes these values into a netlist. After the synthesis step, the content of the lookup tables of a core is known, so we can protect netlist cores which are delivered at the logic level. The protection of bitfile cores at the device level is also possible.

After the core IL1 is purchased, the customer can combine this core with other cores: IB = TL→B(IL1 ◦ IL2 ◦ IL3 ◦ ...). In the following CLB mapping step, it is possible that lookup tables are merged across the core boundaries or are removed by an optimizing transformation. This happens when different cores share logic or when outputs of the core are not used. These lookup tables cannot be found in the FPGA bitfile IB, but experimental results show that the percentage of these lookup tables compared to the number of all lookup tables in the core is typically low for the used mapping tool (Xilinx map).

If a company is accused of using unlicensed cores in a product, the bitfile of the used FPGA can be extracted. After reading out the content and the positions of the lookup tables from the bitfile and comparing them with the lookup table contents from the original core, the ownership of the core can be proven by evaluating a detector function D(IB, IL1).



Identifying the Core After the extraction of the content of the lookup tables from a bitfile, we can compare the obtained values with the information in the netlist. The extraction of all lookup table contents from a bitfile is done as described in [Zie10]: LB(IB) = {xB1, xB2, ..., xBq}. The content of the lookup tables can easily be read out from a netlist file: LL(IL1) = {xL1, xL2, ..., xLr}. For example, in an EDIF netlist for Xilinx FPGA devices, the lookup table contents appear after the INIT property of the lookup table instances. Unfortunately, the mapping tools do not necessarily adopt these values. The mapping tool may merge lookup tables from different cores, convert one-, two-, or three-input lookup tables to four-input lookup tables, and permute the inputs to achieve a better routing.

All lookup tables of an FPGA have nl inputs. On most FPGA architectures, lookup tables have nl = 4 inputs. In a core netlist, lookup tables with fewer than nl inputs may also exist. These lookup tables must be mapped onto nl-input lookup tables. If one input is unused, only half of the memory is needed to store the function, and the remaining space must be filled. In the case that a function uses fewer inputs than the underlying technology of the FPGA provides, it is desirable to turn the unused inputs into don't cares. Intuitively, this can be achieved rather easily by replicating the function table, as demonstrated in Figure 5.1.
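A sketch of this replication for a 4-input physical LUT (16 configuration bits) is given below; it assumes that the unused inputs are the high-numbered ones and that bit i of the table holds the function value for input vector i.

#include <stdint.h>

/* Expand the truth table of a k-input function (2^k valid bits) to the
   16 bits of a 4-input LUT so that the unused high inputs become don't
   cares: the small table is simply replicated. */
uint16_t replicate_lut(uint16_t table, int k) {
    int entries = 1 << k;                  /* 2^k valid truth table entries */
    uint16_t result = 0;
    for (int i = 0; i < 16; i++)           /* every 4-input address */
        if (table & (1u << (i % entries))) /* wrap into the small table */
            result |= (uint16_t)(1u << i);
    return result;
}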


Figure 5.1: Converting a two-input lookup table into a three-input lookup table with unused input i2.

The mapping tool can permute the inputs of the lookup tables, for example, to achieve a better routing. In most FPGA architectures, the routing resources for lookup table inputs are not equal, and so a permutation of the lookup table inputs can lower the amount of used routing resources.


Permutation of the inputs significantly alters the content of a lookup table. For nl inputs, nl! permutations exist, and thus up to nl! different lookup table values for one so-called unique function. To compare the contents of the lookup tables from the netlist and the bitfile, it must be checked whether one of these possible lookup table values for a unique function is equal to the value of the lookup table in the bitfile. This is done by creating a table with all possible lookup table values for all unique functions (see Figure 5.2).
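One way to implement this comparison is to map every 16-bit LUT content to a canonical representative of its unique function: apply all 4! = 24 input permutations and keep, say, the smallest resulting value. The sketch below assumes 4-input LUTs and the bit ordering from the previous example.

#include <stdint.h>

/* Rewire the inputs of a 4-input LUT: input b of the original becomes
   input perm[b] of the permuted LUT, so the truth value at address a
   moves to the permuted address p. */
static uint16_t permute_lut(uint16_t t, const int perm[4]) {
    uint16_t r = 0;
    for (int a = 0; a < 16; a++) {
        int p = 0;
        for (int b = 0; b < 4; b++)
            if (a & (1 << b))
                p |= 1 << perm[b];
        if (t & (1u << a))
            r |= (uint16_t)(1u << p);
    }
    return r;
}

/* Canonical form: minimum over all 24 input permutations. Two LUT
   contents implement the same unique function iff their canonical
   forms are equal. */
uint16_t canonical_lut(uint16_t t) {
    static const int perms[24][4] = {
        {0,1,2,3},{0,1,3,2},{0,2,1,3},{0,2,3,1},{0,3,1,2},{0,3,2,1},
        {1,0,2,3},{1,0,3,2},{1,2,0,3},{1,2,3,0},{1,3,0,2},{1,3,2,0},
        {2,0,1,3},{2,0,3,1},{2,1,0,3},{2,1,3,0},{2,3,0,1},{2,3,1,0},
        {3,0,1,2},{3,0,2,1},{3,1,0,2},{3,1,2,0},{3,2,0,1},{3,2,1,0}};
    uint16_t best = permute_lut(t, perms[0]);
    for (int i = 1; i < 24; i++) {
        uint16_t v = permute_lut(t, perms[i]);
        if (v < best)
            best = v;
    }
    return best;
}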


Figure 5.2: Before the lookup table contents of the bitfile and the netlist are compared, they are mapped into unique functions.

Summary We have presented a new method to identify IP cores in FPGA bitfiles. Possible transformations of the mapping tools and their effect on the robustness of the method were discussed. The experimental results show that it is possible to identify a core in the design with a high probability [ZAT06]. The identification process is based on two parameters, namely the number of found lookup tables of the core in the design and the mean distance to the core center. However, it must be taken into account that lookup tables of the core are removed by optimization tools if parts of the core are not used because outputs are unused or constant values are applied to inputs. More information can be found in [ZAT06].


Watermarks in Functional LUTs for Netlist Cores

After watermarking bitfile cores, we now watermark netlist cores. Netlist IP cores consist of primitive cells (e.g., LUT4, DFF, XORCY) of a certain FPGA family which covers many different FPGA devices, for example, the whole Xilinx Virtex-4 or Altera Stratix-II family with all its different FPGA sizes. This means that one netlist core can be deployed for the whole family without changing the file. Once again, we use the Virtex-II and Virtex-II Pro families to demonstrate this approach. However, using other FPGA families should also be possible by adapting the methods to their primitive cells. Another big advantage of netlist cores over bitfile cores is that the bitfile creator (e.g., a product developer) can combine different cores.

As mentioned before in Section 5.2.2, FPGAs usually consist of the same type of lookup tables with regard to the number of inputs. For example, the Xilinx Virtex-II uses lookup tables with four inputs, whereas the Virtex-5 has lookup tables with six inputs. However, in common netlist cores, many logical lookup tables exist which have fewer inputs than the FPGA type provides. These lookup tables are mapped to the physical lookup tables of the FPGA. If the logical lookup table of the netlist core has fewer inputs than the physical one, the memory space which cannot be addressed remains unused. We use this memory space to embed a watermark into functional lookup tables.

One problem of watermarking netlist cores is that the core further traverses the design flow, which includes different optimization steps. Additive watermarking methods which use redundant structures or logic as watermark have the problem that the global optimization steps may detect and remove this redundancy. Today's design tools are very sophisticated at finding redundant logic in a design. Even if a special redundant logic which can be used as a watermark is not removed by today's tools, it is not guaranteed that future versions or other tools will not detect and remove this logic. The challenge is to find an element or component which can be used as a watermark and is not altered by design tools. For Xilinx FPGAs, such elements are shift registers and memories which are implemented in lookup tables.

In some FPGA architectures (e.g., all Xilinx Virtex architectures), the lookup tables (LUTs) can also be used as a shift register or distributed memory [Xilf]. For example, a 4-input lookup table can also be used as a 16-bit shift register (see Figure 5.3). The content of such a shift register can still be addressed by the lookup table input ports, so the shift register can also be used as a functional lookup table. If the lookup table is used as a LUT primitive cell, the content is interpreted as logic by the design tools and is in the focus of optimization. However, if the same content is used as a shift register or memory primitive cell, the design tools do not touch the content. Using the unused memory space of functional lookup tables for watermarking without converting the lookup table to either a shift register or distributed memory turns out to be not applicable, because the design flow tools identify the watermark as redundant and remove the content during optimization. Converting the watermarked


functional lookup tables into shift registers or memory cells prevents the watermark from being deleted during optimization.


Figure 5.3: In the Xilinx Virtex architecture, a lookup table (LUT4) can also be configured as a 16-bit shift register (SRL16).

Embedding of the Watermark In this approach, we use Virtex-II Pro FPGAs and convert LUT1, LUT2, or LUT3 primitive cells, which can be found in netlists of IP cores, into the shift register primitive cell SRLC16E. Note that a LUT1 has one input, a LUT2 two, and so on. A LUT4 has four inputs and uses the whole lookup table memory for its function, which makes this type uninteresting for our approach. The content of the physical 4-input lookup table in an FPGA stores 16 bits. A LUT3 primitive cell uses only 8 bits, a LUT2 4 bits, and a LUT1 only 2 bits out of the 16 bits. The Xilinx mapping tool map duplicates the used memory area into the unused area if not all inputs are needed (see Section 5.2.2). Therefore, to use the unused memory space for embedding a watermark, we must restrict the memory reachability of the function by clamping the unused inputs to constant values. In Figure 5.4, we demonstrate this idea for an AND function implemented by a LUT2. By clamping inputs A3 and A4 to zero, we free 12 bits which can be used for carrying a watermark.

Another problem of watermarking netlist cores is that the published core is combined with other cores and undergoes further design flow steps, like the placement of the lookup tables in the FPGA. Therefore, at the extraction of the watermark, we do not know the locations of the watermarks. To reduce the effort for identifying the watermarks after the extraction, we can cascade the watermarks over the shift in (D) and shift out (Q15) ports of the shift register cells. We assume that design tools place these chains of watermarks close together, which greatly simplifies the extraction of the watermarks. Furthermore, for rebuilding the watermark from the individual extracted watermarked lookup tables, the sequence is important. To bring the different watermarks, which also have different sizes according to the used original functional lookup table cell, into the right order, we concatenate the watermark bits with a counter.



Figure 5.4: Example of implementing a two input AND gate using a four input lookup table. The addressable storage is restricted by connecting the unused inputs to zero [SZT08].

Due to limited space, only a few counter bits can be used, which results in a repetition of the counter values. Nevertheless, this method further simplifies the detection and extraction of the watermark during the verification process.
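The following Python sketch illustrates this packing; the bit layout (functional nibble at the addresses reachable with A3 = A4 = '0', followed by a 4-bit counter and 8 key-stream bits) is our assumption for illustration and not necessarily the exact layout of [SZT08]:

    def watermarked_srl16(func_bits, counter, wm_bits):
        # Pack a 2-input function (inputs A1, A2; A3 and A4 tied to '0'),
        # a 4-bit sequence counter and 8 watermark bits into the 16-bit
        # content of a LUT4 converted to an SRLC16E shift register.
        assert len(func_bits) == 4 and len(wm_bits) == 8
        ctr = [counter >> i & 1 for i in range(4)]
        bits = func_bits + ctr + wm_bits            # 16 bits in total
        return sum(b << i for i, b in enumerate(bits))

    # AND gate O <= A1 and A2, third chunk, 8 key-stream bits:
    init = watermarked_srl16([0, 0, 0, 1], 3, [1, 0, 1, 1, 0, 0, 1, 0])
    print(f"INIT = {init:04X}")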

The first step of embedding a watermark is to extract all lookup tables from a given netlist core IL: LL(IL) = {lutL1, lutL2, . . . , lutLr}, where L denotes the logic abstraction level used for netlist cores (see Figure 5.5). Each element lutLi denotes a lookup table primitive cell in the netlist (e.g., for Virtex-II devices, LUT1, LUT2, LUT3, or LUT4). A watermark generator GL(·, ·) must know the different lookup table cells with their functional content as well as the unique key K to generate the watermarks: GL(K, LL(IL)) = WL.

From the unique key K, a secure pseudo random sequence is generated. Some or all of the extracted lookup table primitive cells are chosen to carry a watermark. Usually, a core which is worth being watermarked consists of many markable lookup tables. Transforming all of these lookup tables into shift registers restricts the optimization freedom of the tools and results in non-optimal timing behavior. Therefore, only a small subset of all suitable lookup tables is chosen. Note that the shift registers must never be shifted, because this would alter their functional part. Nevertheless, we connect the clock input to the clock, but the shift enable input to ground. Now, the transformed shift registers are ordered, and the first 4 bits of the free space are used for the counter value. The other bits are initialized, according to their position, with values from the pseudo random stream generated from the key K. Note that the number of bits which can be used for the random stream depends on the original functional lookup table type.
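A minimal sketch of such a generator is shown below; SHA-256 in counter mode is an assumed construction, as the method only requires a cryptographically secure stream that is reproducible from K:

    import hashlib

    def signature_stream(key: bytes, nbits: int):
        # Derive a reproducible pseudo random bit stream from the unique
        # key K (hash-based counter mode; an assumption, not the exact
        # generator of the original method).
        bits, block = [], 0
        while len(bits) < nbits:
            digest = hashlib.sha256(key + block.to_bytes(4, "big")).digest()
            bits += [byte >> i & 1 for byte in digest for i in range(8)]
            block += 1
        return bits[:nbits]

    # For example, 12 free bits for each of eight marked LUT2 cells:
    stream = signature_stream(b"author-and-core-id", 8 * 12)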

The generated watermark WL consists of the transformed shift registers: WL = {srlL1, srlL2, . . . , srlLk} with k ≤ r.



Figure 5.5: The watermarking system for netlist cores. In the embedding system, the lookup tables are extracted from the netlist core; the watermark generator selects suitable lookup tables, transforms them into shift registers, and adds the watermark bits. The embedder inserts the watermark. A product developer may obtain this watermarked netlist core and combine it with other cores into a product. The lookup tables of the product can be extracted and transformed so that the detector can decide whether the watermark is present or not.


The watermark embedder EL inserts the watermarks into the netlist core IL by replacing the corresponding original functional lookup tables with the shift registers: EL(IL,WL) = ĨL. The watermarked work ĨL can now be published and sold.

Extraction of the Watermark The purchased core ĨL can now be combined by a product developer with other purchased or self-developed cores and implemented into an FPGA bitfile: IB = TL→B(ĨL ◦ I′L1 ◦ I′L2 ◦ . . .) (see Figure 5.5). An FPGA which is programmed with this bitfile IB may be part of a product. If the product developer is accused of using an unlicensed core, the product can be purchased and the bitfile can be read out, e.g., by wiretapping. The lookup table content and the content of the shift registers can be extracted from the bitfile: LB(IB) = {xB1, xB2, . . . , xBq}.

The lookup table or shift register elements xBi belong to the device abstraction level B. The representation can differ from the representation of the same content at the logic abstraction level L. For example, in Xilinx Virtex-II FPGAs, the encoding of the shift register differs from the encoding of lookup tables: for shift registers, the bit order is reversed compared to the lookup table encoding. Therefore, the bitfile elements must be transferred to the logic level by the corresponding decoding. This can be done by the reverse engineering operator: TL←B(LB(IB)) = {xL1, xL2, . . . , xLq}. Reverse engineering lookup table or shift register content is, however, very simple compared to reverse engineering the whole bitfile. Now, the lookup table or shift register content can be used by the watermark detector DL, which can decide whether the watermark WL is embedded in the work or not: DL(WL, {xL1, xL2, . . . , xLq}) = true/false.
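For a single cell, this decoding can be as simple as a bit reversal, as the following sketch shows (assuming the stated 16-bit, bit-reversed Virtex-II shift register encoding):

    def srl_to_lut(init16: int) -> int:
        # Reverse engineering operator for one cell: the shift register
        # encoding is assumed to be the bit-reversed 16-bit LUT encoding.
        out = 0
        for i in range(16):
            out |= (init16 >> i & 1) << (15 - i)
        return out

    assert srl_to_lut(srl_to_lut(0x4D38)) == 0x4D38   # reversal is an involution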

The detector DL searches for the content of the watermarked shift registers WL in the extracted lookup table contents from the bitfile. It might occur that certain watermarks are found in more than one location, because several watermarks with identical content exist, or a functional lookup table has, by chance, the value of a watermarked one. To simplify the extraction, the watermarks are chained together by the shift in and shift out ports. It is likely that these watermarks are placed close together. From the bitfile lookup table extraction LB, we also obtain the locations of the possible watermarks. Using these locations, we can in most cases identify the right watermark if duplicates exist. Note that this chaining approach is not mandatory, but it elevates the robustness of the approach against ambiguity attacks.

After the detection of the watermark WL inside the bitfile IB, the watermark must be verified, similar to the watermarking approach for bitfile cores proposed in Section 5.2.3.

Additional Reading More information about this method can be found in [SZT08].


Power Watermarking

This section introduces watermarking techniques where a signature is verified over the power consumption pattern of an FPGA. These techniques may also be suitable for ASIC designs; however, we concentrate on FPGA designs and develop several enhancements which are exclusively related to FPGA technology. The presented idea is new and differs from [KJJ99] and [AARR03], where the goal of using power analysis techniques is the detection of cryptographic keys and other security issues.

For power watermarking methods, the term signature refers to the part of the watermark which can be extracted and is needed for the detection and verification of the watermark. The signature is usually a bit sequence which is derived from the unique key for author and core identification.

First of all, a short introduction is given and the communication channel between the generation and the detection of the watermark is discussed. Next, the basic method is presented, and afterwards, several enhanced methods are introduced which increase the robustness of decoding the watermark in the case of external or internal disturbances. Finally, multiplex methods are discussed which enable the detection of more than one watermark if multiple watermarked cores are present in the design.

Verification over Power Consumption There is no way to measure the relative power consumption of an FPGA directly, only through measuring the relative supply voltage or current. We have decided to measure the voltage of the core as close as possible to the voltage supply pins, such that the smoothing from the plane and block capacities is minimal and no shunt is required. Most FPGAs have ball grid array (BGA) packages, and the majority of them have vias to the back of the PCB for the supply voltage pins. So, the voltage can be measured on the rear side of the PCB using an oscilloscope. The voltage can be sampled using a standard oscilloscope and then analyzed and decoded using a program developed to run on a PC. The decoded signature can be compared with the original signature and thus, the watermark can be verified. This method has the advantage of being non-destructive and requires no further information or aids than the given product (see Figure 5.6).

The consumed power of an FPGA can be divided into two parts, namely the static and the dynamic power. The static power consumption is caused by the leakage current of CMOS transistors and does not change over time if the temperature stays constant. The dynamic power consists of the power related to short circuit currents and the power required for reloading the capacities of transistors and wires. The short circuit current occurs when the PMOS and the NMOS transistors are both in conducting state for a short time during switching activity. As shown in [SKB02], the main part of an FPGA's dynamic power results from capacity reloading. Both parts of the dynamic power consumption depend on the switching frequency [CSB92].

What happens to the core voltage if many switching activities occur at the same time, at the rising edge of a clock signal?



Figure 5.6: Watermark verification using power signature analysis: From a signature (watermark), a power pattern is generated inside the core that can be probed at the voltage supply pins of the FPGA. From the trace, a detection algorithm verifies the existence of the watermark.

It is interesting to observe that the core supply voltage drops and rises (see Figure 5.7). In the frequency domain, the clock frequency with harmonics and even integer divisions is present (see Figure 5.8). The real behavior of the core voltage depends on the individual FPGA, the individual printed circuit board, and the individual voltage supply circuits.

In the following, we seek techniques to encode a watermark such that the core voltage is subject to change once the watermark is processed within a core. In the first method, the frequency of the voltage drops shall be influenced; in the second, the amplitude of the voltage drops shall be manipulated.

In the first case, a watermark can be identified if we produce another frequency line in the spectrum of the core voltage which is not an integral multiple or a rational fraction of the clock frequency. To achieve this, we need a circuit that consumes a considerable amount of power and generates a signature-specific power pattern, and a clock which can be identified in the spectrum. The power consumer can be, for example, an additional shift register. If we derived the clock source from the operational clock, we would not be able to distinguish the frequency line in the spectrum from the operational logic. Another opportunity is to generate a clock using



Figure 5.7: A measured voltage signal at the voltage supply pin of an FPGA. The core supply voltage drops and rises. Note that the DC component is filtered out.

combinatorial logic. This clock could be identified as a watermark, but the jitter of a combinatorial clock source might be very high, and no clean frequency line would be visible in the spectrum. This means that we would need a larger additional power consumer to make the watermark readable. Another drawback is that we have only limited possibilities to encode a signature reliably in these frequency lines.

In the following approaches, we alter the amplitude of the interferences in the core voltage. The basic idea is to add a power pattern generator (e.g., a set of shift registers) and clock it either with the operational clock or an integer division thereof. Further, we control these power pattern generators according to the characteristics of the data sequence which should be sent or detected, respectively. A logical '1' lets the power consumer operate for one cycle (e.g., perform a shift), a '0' causes no operation. We detect higher amplitudes in the voltage profile over time corresponding to the ones and smaller amplitudes corresponding to the zeros. Note that the amplitude for the no-operation state is not zero, because the operational logic and the clock tree are still active.



Figure 5.8: The spectrum of the measured signal in Figure 5.7. The clock frequency of 50 MHz and its harmonics can be seen. Also, a peak at half of the clock frequency is visible, which is caused by switching activities of the logic.

The advantage of power watermarking methods is that the signature can easily be read out from a given device. Only the core voltage of the FPGA must be measured and recorded. No bitfile is required which would need to be reverse-engineered. Furthermore, these methods also work for encrypted bitfiles, whereas methods where the signature is extracted from the bitfile fail. Moreover, we are able to sign netlist cores, because our watermarking algorithm does not need any placement information. So, cores at this level can also be watermarked and protected.

Basic Method In this section, we describe the basic method for power watermarking of netlist cores. The concept, the embedding of the watermark, as well as the detection and verification procedures are described. The encoding and decoding for sending the signature through the FPGA power communication channel are relatively simple and straightforward in the basic method and will be refined later on with the enhanced methods. However, the basic concepts of embedding and verification are very similar in all methods.


For power watermarking, two shift registers are used: a large one for causing a recognizable signature-dependent power consumption pattern, and a shift register storing the signature itself (see Figure 5.6 in Section 5.2.2). The signature shift register is clocked by the operational clock, and its output bit enables the power pattern generator. If the output bit is a '1', the power pattern register will be shifted at the next rising edge of the operational clock. At a '0', no shift is done. Therefore, the channel encoding is Z = {(γ,1,1),(γ,1,1)}. To avoid interference from the operational logic in the measured voltage, the signature is only generated during the reset phase of the core.

As mentioned before in Section 5.2.2, a shift register can also be used as a lookup table and vice versa in many FPGA architectures (see Figure 5.3 in Section 5.2.2). A conversion of functional lookup tables into shift registers does not affect the functionality if the new inputs are set correctly. This allows us to use functional logic for implementing the power pattern generator. The core operates in two modes, the functional mode and the reset mode. In the functional mode, the shift is disabled and the shift register operates as a normal lookup table. In the reset mode, the content is shifted according to the signature bits and consumes power which can be measured outside of the FPGA. To prevent the loss of the content of the lookup table, the output of the shift register is fed back to the input, so the content is shifted circularly. When the core changes to the functional mode, the content must be shifted to the proper position to restore a functional lookup table for the core.

The amplitude of the generated power signature depends on the number and content of the converted lookup tables. It is assumed that the transitions between zeros and ones in the bit pattern of the lookup table contents are sufficient to produce a recognizable pattern on the supply voltage. Experimental results in [Bau08] show that, on average, 8 of at most 16 transitions are generated in functional 4-input lookup tables of example cores if the content is shifted.

To increase the robustness against removal and ambiguity attacks, the content of the power consumption shift register, which is also part of the functional logic, can be initialized in a shifted state. Only during the reset state, when the signature is transmitted, is the content of the functional lookup table positioned correctly. So, normal core operation cannot start before the signature has been transmitted completely. The advantage is that the core is only able to work after sending the signature. Furthermore, to avoid a too short reset time in which the watermark cannot be detected exactly, the right functionality is only established if the reset state is longer than a predefined time. This prevents the user from leaving out or shortening the reset state with the result that the signature cannot be detected properly.

The signature itself can be implemented as a part of the functional logic in the same way. Some lookup tables are connected together, and their content, the function of the LUTs, represents the signature. Furthermore, the techniques described in Section 5.2.2 can be used to combine an additional watermark and the functional part in a single lookup table if not all lookup table inputs are used by the function. For example,


LUT2 primitives in Xilinx Virtex-II devices can be used to carry an additional 12-bit watermark by restricting the reachability of the functional lookup table through clamping certain signals to constant values. Therefore, the final sending sequence consists of the functional part and the additional watermark. This principle makes it almost impossible for an attacker to change the content of the signature shift register. Altering the signature would also affect the functional core and thus result in a corrupt core.

The advantages of using the functional logic of the core also as a shift register are a reduced resource overhead for watermarking and the robustness of this method, because these shift registers are embedded in the functional design and it is hard, if not impossible, to remove them without destroying the functionality of the core. Furthermore, our watermarking procedure is difficult to detect in a netlist file, because the main part of the logic required for signature creation belongs to the functional logic of the core itself. Another benefit is that our watermark cannot be removed by an optimization step during the mapping into CLBs (Configurable Logic Blocks). Nevertheless, if an attacker had special knowledge of the watermarking method and of the EDIF netlist format, he might reverse-engineer the alterations of the embedding algorithm and remove or disable the sending mechanism. This can be avoided by initializing the power pattern register with shifted lookup table contents (see above). If sending the signature is prevented, the core will not function properly.

Embedding of the Watermark In this section, we describe the procedure of watermarking a core. The first step is to generate the watermark WL for embedding at the logic abstraction level L. As described in the last section, the watermark is a bit sequence consisting either of random choice bits, of partly functional bits of lookup tables, or completely of functional bits. The watermark generation procedure depends on the sequence type.

If only random choice bits are used, the watermark generator needs only the unique key K which identifies the author of the core: GL(K) = WL. The pseudo random output can be split into different shift registers: WL = {wL1, wL2, . . . , wLm}. The number of used shift registers m depends on the strength of the generated signature and the FPGA architecture. For example, a 128-bit signature can be stored in the Virtex-II architecture in m = 8 shift registers.

If the content of functional lookup tables should be used as the signature, the first step is to extract all lookup tables from the netlist core: LL(IL) = {lutL1, lutL2, . . . , lutLr}. The watermark generator GL searches for suitable functional lookup tables, transforms these into shift registers, and either adds the watermark bits from the pseudo random sequence, GL(K, LL(IL)) = WL, or only uses the lookup table content as the signature: GL(LL(IL)) = WL.


The watermark embedder EL(IL,WL) = ĨL consists of two steps. First, the core IL must be embedded in a wrapper which contains the control logic for emitting the signature. This step is done at the register-transfer level before synthesis. The second step takes place at the logic level after synthesis. A program converts suitable lookup tables (for example, LUT4 for Virtex-II FPGAs) into shift registers for the generation of the power pattern and attaches the corresponding control signal from the control logic in the wrapper (see Figure 5.9).


Figure 5.9: The core and the wrapper before (above) and after (below) the netlist alteration step. The signal "wmne" is an enable signal for shifting the power pattern generator shift register.

The wrapper contains the control logic for emitting the watermark and the shift register holding the signature. If functional lookup tables are used for implementing the signature shift register, we add or convert this shift register in the second step, so that the wrapper contains only the control logic. Some control signals have no sink yet, because the sink will be added in the second step (e.g., the power pattern generator shift register). So, we must use synthesis constraints to prevent the synthesis


tool from optimizing these signals away. The ports of the wrapper are the same as for the core, so we can easily integrate this wrapper into the hierarchy. The control logic shifts the signature shift register while the core is in the reset state. Also, the power pattern shift register is shifted corresponding to the output of the signature shift register. If the reset input of the wrapper becomes inactive, the function of the core cannot start in the same cycle, because the contents of the shift registers are not yet in the correct state. The control logic shifts the register contents into the correct positions and then leaves the reset state to start the normal operation mode.

The translation of lookup tables of the functional logic into shift registers is done at the logic level. On Xilinx Virtex-II FPGAs, the usage of a LUT4 as a 16-bit shift register (SRL16) is only possible if the LUT4 is not part of a multiplexer logic, because the additional shift logic and the multiplexer share common resources in a slice. Also, if the lookup table is part of an adder, the mapping tool splits the lookup table and the carry chain. In these two cases, additional slices would be required, so we do not convert these lookup tables into shift registers.

The embedding procedure for Virtex-II netlist cores is done by a program which parses an EDIF netlist and writes back the modified EDIF netlist. First, the program reads all LUT4 instances and selects only those that are not combined with a "MUXF5", a "MUXCY", or an "XORCY". Then, the instances are converted to shift registers (SRL16), initialized with the shifted value if required, and connected to the clock and the watermark enable (wmne) signal according to Figure 5.9. Two shift registers are always connected together to rotate their contents. Finally, the modified netlist is created. The watermarked core ĨL is now ready for purchase or publication.

Detection Algorithm A company may obtain an unlicensed version of the core ĨL and embed this core in a product: IP = TL→B(ĨL ◦ I′L1 ◦ I′L2 ◦ . . .). If the core developer has a suspicion, he can buy the product and verify that his signature is inside the core using a detection function DP(IP,WL) = true/false.

To detect the basic power watermark, the measured voltage is probed, digitized, and decoded by a signature detection algorithm (see Figure 5.10). To decode the digitized voltage signal, the sampling rate, the clock frequency of the shifted signature, and the bit length of the signature are needed. The clock frequency can be extracted using the Fast Fourier Transform (FFT) of the measured signal. Our detection algorithm consists of five steps: down sampling, a differential step, an accumulation step, phase detection, and quantization (see Figure 5.10). After successful extraction, the decoded signature can be compared to the signature inside the watermark WL to establish the ownership. Furthermore, the signature must be verified by cryptographic methods with the author's unique key K.

As mentioned before, the main characteristic caused by a switching event is the drop of the voltage followed by a subsequent overshoot. This results in extreme slopes. The detection algorithm of the basic method finds each rising edge as follows:



Figure 5.10: The five steps of the watermark detection algorithm: down sampling, differential and accumulation steps, phase detection, and finally quantization.


First, the measured signal is sampled down from the recorded sample rate to the quadruple of the clock frequency, so each signature bit is represented by four samples. Then, the discrete derivative of the signal is calculated. This transforms the rising edges of the switching events into peaks. The easiest way to calculate the discrete derivative at a discrete point in time SD[k] is to take the difference of two subsequent samples over time (see Figure 5.11):

SD[k] = SDS[k] − SDS[k−1], (5.1)

where SDS is the down sampled probed voltage signal and k denotes the sample index.


Figure 5.11: An example voltage signal which represents the signature "0011" (above), and the same signal after the differentiation step (below).

Since the signature is repeated many times during the reset state, the signal can be accumulated and averaged to reduce the noise level. To accumulate the coherent pattern, we need to know the bit length of the signature. If we record a longer signal sequence, we can accumulate more patterns and reduce noise as well as switching events which do not belong to the power consumption register of the watermarking algorithm. The disadvantage is that we need a longer reset phase.


After this third step, we have a signal in which each signature bit is represented by four samples. But only one sample carries the information of the rising edge. Since the measurement is not synchronized with the FPGA clock, the phase (position) of the relevant sample of a bit is unknown. We divide the signal into four new signals, where one signature bit is represented by one sample. The four signals have a phase shift of 90° to each other. Let

SAS[k], k = 0, 1, . . . , 4m−1, (5.2)

denote the sampled voltage signal after the accumulation step, where m is the length of the signature. Then, we obtain the following four phase shifted signals:

S0[k] = SAS[4k], k = 0, 1, . . . , m−1, (5.3)
S90[k] = SAS[4k+1], (5.4)
S180[k] = SAS[4k+2], (5.5)
S270[k] = SAS[4k+3], (5.6)

where SAS is the accumulated signal and S0, S90, S180, and S270 are the phase signals (see Figure 5.12).

We are able to extract the right phase of the signal by calculating the mean value of each phase shifted signal. The maximal mean value corresponds to the correct phase, because the switching event should cause the greatest rising edge in the signal.

Now, we have a signal in which each sample represents the accumulated switching activities of one bit of the signature. The decision whether a sample corresponds to a signature bit '1' or '0' can be made by comparing the sample value with the mean value of the signal. If the sample value is higher than the mean value, the algorithm decides on a '1', otherwise on a '0'.
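A compact NumPy sketch of these five steps is given below; parameter names and the exact handling of the recorded trace are our assumptions:

    import numpy as np

    def detect_signature(trace, fs, fclk, m, reps):
        # trace: voltage probed at the supply pins, sampled at fs;
        # m: signature bit length; reps: recorded signature repetitions.
        step = int(fs // (4 * fclk))              # 1. down sample to 4 samples/bit
        s = np.asarray(trace, float)[::step][: 4 * m * reps]
        d = np.diff(s, prepend=s[0])              # 2. differential step (Eq. (5.1))
        acc = d.reshape(reps, 4 * m).sum(axis=0)  # 3. accumulate the repetitions
        phases = [acc[p::4] for p in range(4)]    # 4. four 90-degree phase signals
        best = max(phases, key=np.mean)           #    largest mean = correct phase
        return (best > best.mean()).astype(int)   # 5. quantize against the mean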

Robustness Analysis The most common attacks against watermarking are removal, ambiguity, key copy, and copy attacks. Once again, key copy attacks can be prevented by asymmetric cryptographic methods, and there is no protection against copy attacks.

Removal attacks most likely take place at the logic level instead of the device level, where it is really hard to alter the design. The signature and power shift registers as well as the watermark sending control logic in the wrapper are mixed with functional elements in the netlist. Therefore, they are not easy to detect. Even if an attacker is able to identify the sending logic, a deactivation is useless if the content of the power shift register is only shifted into the correct positions after sending the signature: by preventing the sending of the watermark, the core is unable to start. Another possibility is to alter the signature inside the shift register. The attacker may analyze the netlist to find the place where the signature is stored. This attack is only successful



Figure 5.12: The example voltage signal after the accumulation step (above) and the four phase shifted signals (below). Here, S180 corresponds to the right phasing.

if there is no functional logic part mixed with the signature. By mixing the random bits with functional bits, it is hard to alter the signature without destroying the correct functionality of the core. Therefore, this watermark technique can be considered resistant against removal attacks.

In the case of ambiguity attacks, an attacker analyzes the power consumption of the FPGA in order to find a fake watermark, or implements a core whose power pattern disturbs the detection of the watermark. In order to credibly fake watermarks inside the power consumption signal, the attacker must present the insertion and sending procedure, which should be impossible without using an additional core. Another possibility for the attacker is to implement a disturbance core which consumes a lot of power and makes the detection of the watermark impossible. In the next sections, enhanced robustness encoding methods are presented which increase the possibility of decoding the signature even if other cores are operating during the sending of the signature. Although a disturbance core might be successful, such a core needs area and most notably power, which increases the costs of the product. The presence of a disturbance core in a product is also suspicious and might lead to further investigation if a copyright infringement has occurred. Finally, the attacker may watermark another


core with his watermark and claim that all cores belong to him. This can be prevented by adding a hash value of the original core without the watermark to the signature, as in the bitfile watermarking method for netlist cores.

Enhanced Robustness Encoding Method Experimental results have shown that the decoding of the signature with the basic method works well, but on some targets, problems occur in the decoding of signatures with long runs of '1' followed by many zeros, like "1111111100000000". The problem is intersymbol interference, because the transmitting slot for one symbol in the basic method might be smaller than the symbol length. For the first eight bits, we see a huge amplitude in Figure 5.13. Then, a phase in which the amplitude fades out is observed. This phase can last many clock cycles and may lead to wrong detection results for the following bits.


Figure 5.13: Measured voltage supply signal when sending "FFFF0000" with a large power pattern generator shift register.

This fading amplitude belongs to an overlaid frequency which might be produced by a resonance circuit consisting of the capacitances and resistances of the power supply plane and its blocking capacitors. This behavior depends on the printed circuit board and the power supply circuit.

To avoid such a false detection, the transmission time of one symbol is extended by the swing-out time of the printed circuit board by sending the same signature


bit multiple times: Z = {(γ,1,ω),(γ,1,ω)}. The repetition rate for each signature bit is ω clock cycles. If we connect two SRL16 together, one period for this shift register needs 32 clock cycles. If the reset phase ends and we have finished sending one bit, the content of the shift register, which also represents a part of the logic of the core, is in the correct position.

The detection algorithm differs for this method. First, the signal is sampled down and the approximate derivative is calculated as in the original method (see Section 5.2.2). Then, we average the signal to suppress the noise. Here, the length of one signature word is the length of the signature (m) multiplied by the number of times each bit is sent (ω):

SD[k], k = 0, 1, . . . , Kmax−1, (5.7)

ns = ⌊Kmax / (4ω·m)⌋, (5.8)

S = (1/ns) · ∑_{i=0}^{ns−1} SD[4ω·m·i, . . . , 4ω·m·i + 4ω·m − 1]. (5.9)

Here, SD is the voltage signal after the differential step with index k, and ns is the number of repetitions of the pattern in SD.

The phase detection of the shift clock is the same as in the original method (see Section 5.2.2), but we also need the position p where a new signature bit starts. This position is detected in a loop. In the beginning, we assume that the starting position is the beginning of our trace (p = 0). First, we accumulate ω successive values, where ω is the repetition count of one bit:

Sp[j] = ∑_{i=0}^{ω−1} Sφ[i + p + ω·j], j = 0, 1, . . . , m−1. (5.10)

Here, Sφ denotes the signal after the phase detection step. Now, we subtract the mean value, take the absolute value, and calculate the sum of it:

Fp = ∑_{i=0}^{m−1} | Sp[i] − (1/m) · ∑_{j=0}^{m−1} Sp[j] |. (5.11)

Fp identifies how well our signature bit starting position p fits the real position. Now, we shift our trace by one value (p = 1) and calculate the fitting value again, and so on. This is done ω times. The starting position with the best fitting value is used.
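The starting-position search of Eqs. (5.10) and (5.11) can be sketched as follows, where we assume that the best fit corresponds to the largest value of Fp, since a correctly aligned accumulation maximizes the deviation of the bit sums from their mean:

    import numpy as np

    def find_bit_start(s_phi, m, omega):
        # s_phi: signal after phase detection; m: signature length;
        # omega: repetitions of each signature bit.
        s_phi = np.asarray(s_phi, float)
        best_p, best_f = 0, -1.0
        for p in range(omega):
            sp = np.add.reduceat(s_phi[p : p + omega * m],
                                 np.arange(0, omega * m, omega))   # Eq. (5.10)
            f = np.abs(sp - sp.mean()).sum()                       # Eq. (5.11)
            if f > best_f:
                best_p, best_f = p, f
        return best_p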

The decoding of the watermark signature is done as in the basic method (see Section 5.2.2) by comparing the sample values with the mean value of the samples.


BPSK Detection Method The enhanced robustness method introduced above works well, but if other cores with the same clock frequency have a very high toggle rate during the reset phase of the watermarked core, the quality of decoding may suffer. In the worst case, decoding is not possible, because the watermarked signal is too weak in contrast to the interferences of the same frequency generated by the high toggle rate of the other cores.

To enhance the robustness of decoding our transmit signal in the case of interferences with the same frequency, we combine a new sending scheme with a new detection algorithm. The basic idea is to shift the carrier frequency of our watermarking signal away from the clock frequency of the chip, where we expect most of the interferences to occur.

We introduce a new binary signal SBPSK with the frequency fBPSK, where the signature bits are modulated using Binary Phase Shift Keying (BPSK). Using BPSK modulation, each value of a signature bit (0, 1) is represented by a phase (usually 0° and 180°). In practice, when sending a '0', the carrier signal is not altered, and it is inverted when sending a '1' (see Figure 5.14).


Figure 5.14: A carrier signal Scarrier and the BPSK modulated signal SBPSK. The signature bit value '0' is encoded with 0° and the value '1' with 180°.

We generate the new frequency fBPSK by on-off keying (OOK) modulation, a binary amplitude modulation (AM) of the clock frequency fclk. So, the frequency fBPSK must be a rational fraction of the clock frequency fclk. However, interferences from working cores also have an impact here, because these frequencies could also be produced by working cores with different toggle rates. The measurements suggest that frequencies f = fclk/2^n may have high interference from working cores, whereas other frequencies have lower interference. The interferences also decrease at lower derived frequencies. In the following, we choose fBPSK = fclk/10 as our carrier frequency.


To generate the new watermark signal, the power pattern generator is driven by the signal SBPSK and performs the OOK modulation. The encoding scheme for the signal SBPSK is Z = {(γ,1,ω),(γ,1,ω)}, where ω is chosen as 10 in our case. To send the signal SBPSK for one period, we first send five ones (the power pattern shift register is shifted five times) and then five zeros (the power pattern shift register is not shifted) in case the signature bit is '1'. If the signature bit is '0', first five zeros and then five ones are sent (see Figure 5.15). For each signature bit, we repeat this period 32 times to ensure that the contents of the power pattern shift registers, which are also functional lookup tables, are in the correct positions after sending one signature bit. The repetition also allows detecting the signature with a higher probability. The decreased bit rate results in a smaller bandwidth for our watermarking signal. Using this method, we need more time to send the signature than with the previously presented methods. The signature bit rate fwm is:

fwm = fBPSK / 32 = fclk / (10 · 32) = fclk / 320. (5.12)


Figure 5.15: The signal SBPSK is the BPSK modulated signal of the signature above. The signal below is the voltage signal, which is the OOK modulated version of SBPSK. The figure also illustrates the different frequencies.
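The resulting enable sequence for the power pattern generator can be sketched as follows; a '1' means that the power pattern register is shifted in that clock cycle, and all names are ours:

    def bpsk_enable_sequence(signature, omega=10, reps=32):
        # Each signature bit selects the phase of one OOK period of omega
        # clock cycles (five ones then five zeros for '1', the inverse
        # for '0'), repeated reps times per bit.
        half = omega // 2
        one_period = [1] * half + [0] * half     # 0-degree phase
        zero_period = [0] * half + [1] * half    # 180-degree phase
        seq = []
        for bit in signature:
            seq += (one_period if bit else zero_period) * reps
        return seq

    # f_wm = f_clk / 320 for omega = 10 and 32 repetitions (Eq. (5.12)):
    assert len(bpsk_enable_sequence([1, 0, 1])) == 3 * 320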

The watermark control inside the wrapper (see Section 5.2.2) is altered to control the power pattern generator in this way. Only few additional resources are needed to implement this enhanced watermark protocol.

If we look at the spectrum of the recorded signal (see Figure 5.16), we detect the clock frequency fclk and two side bands from the OOK modulation, fclk − fBPSK and fclk + fBPSK.

The detection algorithm for this method is different from the previous methods. Only the first step (down sampling) and the last step (quantization) are identical (see Figure 5.17).



Figure 5.16: The spectrum of a measured signal. The clock frequency of 50 MHz and the two side bands of the modulated signal SBPSK at 45 MHz and 55 MHz are shown.


After down sampling, the two side bands of the carrier signal are mixed down into the base band (Ssb1 and Ssb2) and are combined (Scc) as follows:

SDS[k], k = 0, 1, . . . , Kmax−1, (5.13)

Ssb1[k] = SDS[k] · e^(−j2π·(1/4 − 1/40)·k), (5.14)

Ssb2[k] = SDS[k] · e^(−j2π·(1/4 + 1/40)·k), (5.15)

Scc[k] = Ssb1[k] + Ssb2[k], (5.16)

where SDS is the voltage signal after down sampling with index k. The clock frequency is fclk = (1/4)·fsample and the frequency fBPSK = (1/10)·fclk = (1/40)·fsample, where fsample is the sample frequency of the recorded voltage signal. After low pass filtering of Scc, we get the complex carrier signal SBPSK (see Figure 5.18).


Figure 5.17: The different steps of the BPSK detection algorithm.

Scc is filtered using a matched filter to obtain the limits of one signature bit and the correct sample point. All samples of SBPSK which belong to one signature bit are summed up into this sample point by the matched filter. In the down sampling step, only these points are used to represent the signature bits. Now, the angle of the



Figure 5.18: The constellation diagram of the down mixed complex signal SBPSK. The two different BPSK constellation points for the signature bit values '1' and '0' are shown.

signal is calculated from the signature bit with the highest amplitude, and the signal is rotated into the real plane. From the real valued signal, the values of the bits and the quality of the signal are determined similarly to the other detection algorithms (see Section 5.2.2).
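The core of this detector can be sketched as follows in NumPy; the simple per-bit summation stands in for the low pass and matched filter stages, and the sign-based decision only marks phase changes, so this is an illustration of the principle rather than the exact algorithm:

    import numpy as np

    def bpsk_detect(trace, m, spb):
        # trace: real voltage signal after down sampling (fclk = fsample/4,
        # fBPSK = fsample/40); m: signature length; spb: samples per bit.
        trace = np.asarray(trace, float)
        k = np.arange(len(trace))
        scc = trace * np.exp(-2j * np.pi * (0.25 - 0.025) * k) \
            + trace * np.exp(-2j * np.pi * (0.25 + 0.025) * k)  # Eqs. (5.14)-(5.16)
        bits = scc[: m * spb].reshape(m, spb).sum(axis=1)       # crude matched filter
        ref = bits[np.argmax(np.abs(bits))]        # strongest bit fixes the angle
        real = (bits * np.exp(-1j * np.angle(ref))).real
        return (real < 0).astype(int)              # phase flips mark bit changes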

The advantage of the BPSK method is its robustness with respect to interferences coupled with the clock frequency. The disadvantages are the longer reset phase and the fact that, due to the BPSK modulation, we can only detect bit value changes and not the signature bit values directly. Using proper encoding methods and preambles, the bit values can be reconstructed.

Additional Reading This section gave an overview of multiplexing methods suitable for power watermarking with many watermarked cores inside an FPGA sending simultaneous signatures. More details about these methods can be found in [ZT06, ZT08b, Zie10, Bau08].


5.2.3 Bitfile Cores

The approach of Lach and others watermarks bitfile cores by encoding the signature into unused lookup tables [LMSP98]. At first, the signature is hashed and coded with an error correction code (ECC) to be able to reconstruct the signature even if some lookup tables are lost, e.g., during tampering. After the initial place and route pass, the number of unused lookup tables is determined. The signature is split into chunks of the size of the lookup tables, and additional LUTs are added to the design. Then, the place and route process is started again with the watermarked design. Later, the approach was improved by using many small watermarks instead of a single large one [LMSP99]. The size of each watermark should be limited by the size of a lookup table. The advantage is that small watermarks are easier to search for, and for verification, only a part of all watermark positions must be published. With the knowledge of a published position, the watermark can easily be removed by an attacker. At the verification process, only a few positions of the watermark need to be used to establish the ownership. A second improvement is that a fingerprinting technology is added to the approach, which enables the owner to see which customer has given the core away [LMSP01]. The fingerprinting is achieved by dividing the FPGA into tiles. In each tile, one lookup table is reserved for the watermark. The position of the mark in the tile encodes the fingerprint. For verification, it is possible to read out the content of the lookup tables from a bitfile, so these methods are easy to verify. It is more difficult to determine the position of the watermark in a tile, but it is still generally possible. However, if an attacker knows the position of the watermark, it is easy to overwrite it.

Saha and others present a watermarking strategy for FPGA bitfiles by subdividing the lookup table locations into sets of 2×2 tiles [SSK07]. The number of used lookup tables in a set is used as the signature. Starting from an initial fill level, additional lookup tables are added to achieve the fill level according to the signature. The inputs and outputs are connected to the don't care inputs of the neighboring cells. Kahng and others show in [KLMS+98] that the configuration of the multiplexers of unused CLB outputs in FPGA bitfiles can carry a signature. The signature is embedded after the bitfile creation, requiring knowledge of the encoding of the bitfile. These configuration bits can later be extracted to verify the signature.

Van Le and Desmedt show that these additive watermark schemes for bitfile cores can be easily attacked by reverse engineering, watermark localization, and subsequent watermark removal [LD03]. A simple algorithm is introduced which identifies lookup tables or multiplexers whose outputs are not connected to any output pins. However, these attacks are only successful if reverse engineering of the bitfile is possible and the costs of reverse engineering are not too high.

Finally, Kean and others present a watermarking strategy where a signature is embedded into an FPGA bitfile core or design [KMM08]. The read out of the signature


is done by measuring the temperature of the FPGA. This approach is commercially available as the product DesignTag from Algotronix.

Watermarks in LUTs for Bitfile Cores

In this section, we introduce our first watermarking technique for IP cores. The easiest way to watermark an FPGA design is to place the watermarks into the bitfile. Bitfiles are very inflexible because they are specifically generated for a certain FPGA device type; however, it makes sense to sell bitfile IP cores for common development platforms which carry the same FPGA type. Usually, a bitfile core is a whole design which is completely placed and routed and therefore ready to use. There also exist partial bitfiles which carry only one core. These partial bitfile cores can be combined into one FPGA, which increases the flexibility of these cores and therefore may increase the trade possibilities.

In this approach, we hide our signature inside unused lookup tables. It is very unlikely that a design or bitfile core uses all available lookup tables of an FPGA. Before a design reaches this limit, the routing resources are exhausted and the timing degenerates rapidly. Therefore, many unused lookup tables exist in usual designs. On the other hand, lookup table content extraction is not difficult. Using lookup tables for hiding a watermark which are far away from the used ones makes it easier for an attacker to identify and remove them. Even if an attacker is able to extract all lookup tables from a bitfile core, the lookup tables which carry the watermark should not appear suspicious.

In Xilinx devices, lookup tables are grouped together with flip-flops into slices. A slice usually consists of more than one lookup table; e.g., the Virtex-II and Virtex-II Pro devices have two lookup tables in one slice. It is not unusual that only one lookup table of a slice is used while the other remains unused. Hiding a watermark in the unused lookup table of a used slice is less obvious than using lookup tables in unused slices. Even if the attacker is able to extract the lookup table contents and coordinates, the watermarks are hard to detect.

The extraction and verification of the watermark is rather easy. First of all, the content and the coordinates of all used lookup tables of the core are extracted. For the verification, two approaches exist: a blind approach and a non-blind approach. In the blind approach, the watermarks are searched for in all extracted lookup table contents, whereas in the non-blind approach, the locations of the watermarks are known. Having the right coordinates, the watermarked lookup table content can be directly compared to the watermarks of the core developer. The locations of the watermarks delivered by the core developer, however, should be kept secret, because otherwise it is very easy for an attacker to remove the marks.

Concept In the following, the watermark approach is described in detail. For watermarking a bitfile core, the watermarks which should be embedded into the unused


lookup tables must be generated. This is done by the watermark generator function GB(K) = WB. The generator needs as input a unique key K, which identifies the author as well as the core, and the author's private key. The output is a set of watermarks WB = (wB1, wB2, · · · , wBm). Each element wBi must fit into a single lookup table. For Xilinx Virtex-II and Virtex-II Pro FPGAs, which use 4-input lookup tables, the size is 16 bits.

Additionally, the number of usable lookup tables which can carry a watermark must be determined. This can be done by extracting all lookup table contents and coordinates: LB(IB) = {xB1, xB2, . . . , xBq}. The next step is to find suitable location candidates which can carry a watermark. For Xilinx Virtex-II and Virtex-II Pro FPGAs, possible candidates are unused lookup tables in used slices. Such candidates can easily be determined, because they carry the initialization value 0xFFFF, whereas unused lookup tables in unused slices have 0x0000 as initialization value. The higher the number of location candidates and therefore watermarked lookup tables, the more reliable the proof of authorship is. For example, if only one lookup table candidate were found, only 2^4 = 16 different watermark values would exist overall, which makes the proof of authorship contestable.
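A sketch of the candidate search and embedding under the stated INIT conventions is given below; the luts container, a mapping from slice coordinates to 16-bit INIT values, is hypothetical:

    def find_candidates(luts):
        # Unused lookup tables inside used slices are assumed to carry the
        # initialization value 0xFFFF (unused LUTs in unused slices: 0x0000).
        return [xy for xy, init in luts.items() if init == 0xFFFF]

    def embed(luts, watermarks):
        # Replace candidate contents by the 16-bit watermark words of W_B.
        for xy, wm in zip(find_candidates(luts), watermarks):
            luts[xy] = wm
        return luts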

The contents of the chosen locations of the bitfile core IB can be replaced by the watermarks WB with the embedder: ĨB = EB(IB,WB) (see Figure 5.19). The result is the watermarked bitfile core ĨB. The distance DistB(IB, ĨB) between the watermarked and the original core is low, because the functional correctness and all electrical properties of the core are preserved. Furthermore, if the watermarks are near the functional lookup tables, the watermarks cannot easily be distinguished from the functional lookup tables.

For extracting the watermarks, we need the bitfile ĨB from the accused company and the locations of the watermarks (see Figure 5.19). The first step is to extract the contents and coordinates of all lookup tables in ĨB: LB(ĨB) = {xB1, xB2, . . . , xBq}. Using the locations from the core developer, the watermarks can be identified. By comparing these watermarks to the watermarks WB of the core developer, the detection process DB(ĨB,WB) = true/false can be finished.

Robustness Analysis Attacks against this watermarking scheme are ambiguity, removal, and key copy attacks. The prevention of the copy attack, where an attacker watermarks an IP core which he illegally obtained with his own signature, is almost impossible. A possible solution to this dilemma is to watermark all published works or to register the core at a trusted third party institution.

In the case of removal attacks, the attacker tries to remove the watermarks. If he knows the locations of the watermarks, this task is easy. Therefore, it is of utmost importance that the locations of the watermarks are kept secret. If the attacker does not know the locations, he can try to analyze the bitfile. If he is only able to extract the lookup table contents and the locations of the lookup tables, it is almost impossible to detect the



Figure 5.19: The watermarking bitfile IP core approach consists of an embedding system and an authentication system. The embedder needs the author information and the bitfile core, and the result is the watermarked core ĨB which can be published. The authorship of the core can be established by extracting and comparing the watermark and verifying the authentication of the watermark with the author information.

If he is only able to extract the lookup table contents and the locations of the lookup tables, it is almost impossible to detect the watermark, because the locations are near the functional lookup tables and the content is not distinguishable from that of the other lookup tables. However, if the attacker is able to reverse engineer the bitfile core to the logic level (IL = TL←B(ĨB)), the watermarks are easy to detect and can be removed. This task is, however, very expensive if no reverse engineering tool is available. For Virtex-II devices, the Xilinx “reverse engineering” tool JBits [Xilc] is available, which is in fact able to remove the watermarks.

In the case of ambiguity attacks, the attacker may analyze the bitfile core and search for lookup table content which he can present as his own watermark. He can also use the inserted watermarks and assert that these watermarks belong to him. To be successful with such an attack, he must also present the procedure to generate the watermarks. Hereby, the attacker must generate a signature or key which identifies him as the author and fits the watermarks inside the core. This is very hard to achieve due to the usage of one-way cryptographic functions. Furthermore, the attacker can present some functional lookup tables as his watermarks. This should also be nearly impossible due to the characteristics of one-way cryptographic functions. Another possibility to check this attack is to remove the watermarks from the bitfile core. The correct watermarks are inserted after the implementation of the core, and therefore the core should keep its functional correctness, whereas the removal of the wrong watermarks, which are functional lookup table contents, destroys the core.

Using asymmetric public/private key cryptographic functions for the watermark generation and verification, and further storing information about the core in the unique key, successfully prevents key copy attacks.

Additional Reading More information about this method can be found in [SZT08].

5.3 Constraint-Based Watermarking of IP Cores

All optimization problems have constraints which must be satisfied to achieve a valid solution. The solutions which satisfy these constraints form the solution space. Constraint-based watermarking techniques represent a signature as a set of additional constraints which are applied to the hardware optimization and synthesis problem. These additional constraints reduce the solution space (see Figure 5.20), since the chosen solution must also satisfy the additional constraints. The same solution could also have been reached while neglecting these additional constraints, but only with probability Pc. The probability Pc of this event is given by the following formula:

Pc = nw / no (5.17)

where no is the number of solutions which satisfy the original constraints and nw is the number of solutions which satisfy both the original and the additional constraints [KLMS+01, KHPC98]. If Pc is very small, a solution that also satisfies the additional (watermarking) constraints is a strong proof of the existence of the watermark.
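As a worked illustration of Equation (5.17), with hypothetical solution counts:

```python
def p_c(n_w: int, n_o: int) -> float:
    """Equation (5.17): chance that a random valid solution also carries the mark."""
    return n_w / n_o

# Hypothetical counts: 10**6 solutions satisfy the original constraints,
# but only 10 of them also satisfy the additional watermarking constraints.
print(p_c(10, 10**6))   # 1e-05 -- a small P_c means a strong proof
```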

Qu proposes a methodology to make a part of the watermark – for constraint-based watermarking, some additional constraints – public, which should deter attackers [Qu02]. The other part, called the private watermark, is only known by the core author and is used to verify the authorship in case the public watermark is attacked. A similar approach is used by Qu and others to generate different fingerprints by dividing the additional constraints into two parts [QP00]: The first part is a set of relaxed constraints which denote the watermark. By applying distinct constraints to the second part, different independent solutions can be generated which may be used as diverse fingerprinted designs.

Charbon proposed a technique to embed watermarks on different abstraction levels, which he called hierarchical watermarking [Cha98].


Figure 5.20: The solution space of an original and a watermarked design. If a design satisfies the original and the additional constraints, then the design is protected by a watermark. The probability that the additional constraints are satisfied by chance should be low to have a strong proof of authorship.

The idea is that if an attacker is able to remove a watermark embedded, for example, into the layout of a circuit, the watermarks added at higher abstraction levels are still present. However, Charbon focused more on layout, net, and latch watermarking techniques, which are only applicable to ASIC layout cores.

The verification of a constraint-based watermark is usually done with the watermarked core as it is. This means the watermarked core can be purchased or published, and from the distributed cores the watermark can be verified. However, if the core is combined with other cores and traverses further design steps, the watermark information is usually lost or cannot be extracted.

Van Le and Desmedt [LD03] present an ambiguity attack on constraint-based watermarking techniques. The authors add further constraints to the watermarked solution while allowing only a minimal increase of the overhead. The result is a slightly degenerated solution which satisfies many additional constraints. This means that in this solution, a lot of different signatures can be found, which destroys the unique identification of the core developer. They demonstrate this, for example, on the constraint-based watermarking approach for graph coloring. Furthermore, this attack might be applicable to other constraint-based watermarking techniques.

As was the case with additive watermarking strategies, constraint-based watermarking strategies are applicable to HDL, netlist, and bitfile cores.

5.3.1 HDL Cores

HDL code is usually produced by human developers or high-level synthesis tools. Both can set additional constraints to watermark a design. One approach is to use a watermarked scan chain [KP98]. Scan chains are usually used in ASIC designs to access the internal registers for debugging purposes. The use of scan chains in FPGA designs is rather unusual, but might be helpful in some cases. At first, a number is assigned to each register, and the registers are sorted. Then, a pseudo-random sequence is generated from the signature, and registers are selected with an algorithm which uses this random sequence as input. For Kc scan chains, the first Kc selected registers are chosen as the first register in each chain. Depending on the signature, we obtain a variation of the scan chains which can be used to detect the watermark. It is possible that an unfortunately chosen start of a chain could result in the allocation of more routing resources. Moreover, the maximum clock frequency for the scan chain can be limited. This approach is easy to verify if the scan chains can be accessed from outside of the chip. Problems occur if the scan chain is only used internally or is not connected to any device. In such a case, there is no verification possibility.
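A minimal sketch of the selection step as described above (the seeding and data layout are assumptions, not details fixed by [KP98]):

```python
# Sketch of the scan-chain watermark selection: registers are enumerated and
# sorted, a pseudo-random sequence is seeded from the signature, and the
# first K_c drawn registers become the heads of the K_c scan chains.
import hashlib
import random

def select_chain_heads(registers: list[str], signature: bytes, k_c: int) -> list[str]:
    ordered = sorted(registers)                      # assign a stable ordering
    rng = random.Random(hashlib.sha256(signature).digest())
    return rng.sample(ordered, k_c)                  # first K_c selected registers

regs = ["ctrl_r", "addr_r", "data_r", "state_r", "flag_r"]
print(select_chain_heads(regs, b"author signature", 2))
```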

Some work was done on watermarking digital signal processing (DSP) functions [RAMSP99, CD00]. This kind of watermarking has more in common with media watermarking than with IP watermarking. Both approaches alter the function of the core slightly by embedding a watermark. In [RAMSP99], the coefficients of finite impulse response (FIR) filters are slightly varied according to the watermark. Additionally, the authors use different structures to build the FIR filter, which also correspond to the signature. In [CD00], these ideas are extended and proven correct by mathematical analysis.
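A sketch of the coefficient variation idea, assuming the watermark bit simply replaces the least significant bit of each quantized coefficient (the exact perturbation rule in [RAMSP99] is more elaborate):

```python
# Sketch: embed one signature bit per quantized FIR coefficient by forcing
# its least significant bit, which barely changes the frequency response.
def watermark_fir(coeffs: list[int], bits: list[int]) -> list[int]:
    """Force the LSB of each quantized coefficient to the corresponding bit."""
    return [(c & ~1) | b for c, b in zip(coeffs, bits)]

taps = [412, -98, 1024, -98, 412]      # 16-bit quantized coefficients
print(watermark_fir(taps, [1, 0, 1, 1, 0]))   # [413, -98, 1025, -97, 412]
```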

5.3.2 Netlist Cores

An approach to watermark netlist cores is to preserve certain nets during synthesis and mapping [KHPC98]. Synthesis tools merge signals or nets together and produce new nets. Only a few nets from the synthesis input will be visible in the synthesis result. The technology mapping tool also eliminates nets by assembling gates together in a lookup table. Kirovski's approach enumerates and sorts all nets in a design. The first Kc (see previous section) nets of the input are chosen according to a signature. These nets are prevented from elimination by the design tools by connecting them to temporary outputs of the core. The new outputs form additional constraints for the synthesis tool, and the corresponding result is related to the watermark. A disadvantage is that it is easy to remove the additional logic. If the content of the lookup table is synthesized again, the watermark will be removed.
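A sketch of the net selection and preservation step, with a hypothetical netlist model:

```python
# Sketch of the net-preservation constraint from [KHPC98]: all nets are
# enumerated and sorted, the signature selects K_c of them, and each selected
# net is tied to a temporary output so synthesis cannot optimize it away.
def protect_nets(nets: list[str], selected_idx: list[int]) -> dict[str, str]:
    """Map each signature-selected net to a temporary watermark output."""
    ordered = sorted(nets)
    return {ordered[i]: f"wm_out_{n}" for n, i in enumerate(selected_idx)}

nets = ["sum3", "carry2", "mux_sel", "acc_en"]
print(protect_nets(nets, [0, 2]))   # {'acc_en': 'wm_out_0', 'mux_sel': 'wm_out_1'}
```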

Meguerdichian and others presented a similar approach for netlist cores where additional constraints are added during the technology mapping step of the synthesis process [MP00]. In this approach, critical signals are not altered, which preserves the timing and the performance of the core. The signature is encoded into the number of allowed inputs of certain primitive cells, e.g., gates or lookup tables. The primitive cells which are not in the critical path are enumerated, and according to the signature, the number of usable inputs is constrained.

Khan and others watermark netlist cores by performing a rewiring after synthesis [KT05]. Rewiring means that redundant connections between primitive cells are added to the netlist, which makes other original connections redundant. These now-redundant original connections are then removed.

Bai and others introduce a method for watermarking transistor netlists for full custom designs [BGXC07]. The transistors are enumerated and sorted into a list like in the approach above. Corresponding to the pseudo-random stream generated from the signature, the width of the transistor gate is altered. If a transistor is assigned a '1' from the random stream, the transistor width is increased by a constant value.

5.3.3 Bitfile and Layout Cores

Additional placement, routing, or timing constraints can be added to watermark bitfile cores. To embed a watermark with placement constraints, Kahng and others place the configurable logic blocks (CLBs) in even or odd rows depending on the signature [KMM+98]. In this approach, the signature is transformed into even/odd row placement constraints. The placed core is tested on preserving the constraints and, if necessary, CLBs are swapped. This method has no logical resource overhead, and the additional costs of routing resources are very small or tend to zero, because the placement is altered only marginally. The problem for verification is to extract the CLB placement information. Only if it is known how the CLBs correspond to the signature can the watermark be verified. A strategy to achieve this is to uniquely enumerate the CLBs in an FPGA from the top left corner.
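The verification side of this scheme reduces to reading row parities back. A minimal sketch, assuming the CLBs are already enumerated from the top left corner:

```python
# Sketch of the even/odd-row placement check from [KMM+98]: signature bit i
# demands that the i-th enumerated CLB sits in an even (bit 0) or odd (bit 1)
# row; verification re-enumerates the CLBs and reads the parities back.
def verify_placement(rows: list[int], bits: list[int]) -> bool:
    """rows[i] is the row of the i-th CLB in the fixed enumeration order."""
    return all(row % 2 == bit for row, bit in zip(rows, bits))

print(verify_placement([4, 7, 2, 9], [0, 1, 0, 1]))   # True
```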

Kahng and others [KMM+98] propose a second approach by adding constraints to the router. The constraints ensure that a net selected by the signature is routed with some additional, unusual routing resources. These unusual resources can be, for example, wrong-way segments. A wrong-way segment is a segment in which the net goes in the wrong direction and then back in the right direction to form a backstrap. The authors claim that this is unlikely for a normal router, and so such a net can be verified as a watermarked net.

Furthermore, additional timing constraints can be used to watermark a core. Timing constraints limit the routing and logic delay between two registers. Kahng and others propose a technique to select paths which get timing constraints according to the signature. The timing constraints for these paths are split into two separate constraints. For example, let a path have six logic gates and a timing constraint of 10 ns. The new constraint is 4 ns for the first three cells and 6 ns for the rest [KLMS+01].

Another approach by Jain and others measures the delay on selected paths and adds new timing constraints on these paths [JYPQ03]. The new constraints are chosen based on the measured delay by setting the last digit to the value of a bit from the signature. For example, let a path have a delay of 5.73 ns. If the coded bit is a '1', the new constraint for this path is 5.71 ns; if the coded bit is a '0', the constraint is 5.70 ns.
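A sketch of this encoding rule (the rounding behavior is an assumption):

```python
# Sketch of the timing-constraint encoding from [JYPQ03]: the measured path
# delay is truncated to 0.1 ns and the final hundredth-of-a-ns digit is set
# to the signature bit (5.73 ns -> 5.71 ns for a '1', 5.70 ns for a '0').
def encode_constraint(measured_ns: float, bit: int) -> float:
    return round(int(measured_ns * 10) / 10 + 0.01 * bit, 2)

print(encode_constraint(5.73, 1))   # 5.71
print(encode_constraint(5.73, 0))   # 5.7 (i.e., 5.70 ns)
```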

Narayan and others present a watermarking approach for the layout by modifying the number of vias and bends of certain nets [NNC+01]. Like in other approaches, the nets are enumerated, and additional vias or bends are inserted according to the signature.

Saha and others present a watermarking scheme which alters the size of repeaters according to the signature [SSK07]. In high performance ASIC designs, repeaters (buffers for amplification of a signal) are inserted into critical nets to decrease the delay.

5.4 Other Approaches

Many other approaches exist for protecting IP cores or designs from unlicensed usage or alteration. For example, the VHDL Obfuscator & Watermarker [VIS] is able to obfuscate VHDL cores in a way that hides the algorithm, but leaves the core synthesizable. This approach makes reverse engineering and alteration of the core much harder. Further, by using different scrambling techniques, a watermark can be embedded in the obfuscated code. Clearly, this watermark can only be detected at the RTL in the HDL core and is lost once the core is synthesized.

Other approaches prevent the copying of bitfile designs by using a unique FPGA or board identification. If the obtained bitfile is programmed onto another board, the function will not work. The unique identification can be done by using a non-volatile external device, a unique key embedded into the FPGA, or physical unclonable functions (PUFs) [Dri09].

Kessner uses a non-volatile CPLD for board identification [Kes00]. A bit sequence is calculated by a cryptographic algorithm implemented on the FPGA and additionally on the CPLD. If the results of both implementations are the same, then the design “knows” that it is executed on the “right” board and starts its operation. Similar challenge-response approaches are published in an Altera white paper [Alt] and a Xilinx application note [Xila]. A challenge consisting of a sequence produced by a random generator is sent to a cryptographic algorithm implemented in a non-volatile device. The response of the device, calculated with a secret key, is compared with the result of the same algorithm and key implemented on the FPGA. The application is enabled if both results are the same. Couture and Kent propose a method where the IP core reads out a secure token, stored in a non-volatile memory, at periodic time intervals [CK06]. Inside the token, the type and the life span of the license are encoded. To prevent cloning of the bitfile, the token also includes a unique FPGA identification number.
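A behavioral sketch of such a challenge-response check, with HMAC-SHA256 standing in for the unspecified cryptographic algorithm and a shared key assumed to be provisioned on both sides:

```python
# Sketch of the challenge-response protection described above (key handling
# hypothetical): the FPGA sends a random challenge to the non-volatile device
# and enables the application only if both sides derive the same response.
import hashlib
import hmac
import os

SHARED_KEY = b"secret key provisioned at manufacturing"

def device_response(challenge: bytes) -> bytes:     # runs on the CPLD / secure device
    return hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()

def fpga_check() -> bool:                           # runs inside the FPGA design
    challenge = os.urandom(16)
    expected = hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, device_response(challenge))

print("application enabled:", fpga_check())
```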

In the Spartan-3A FPGA family, Xilinx implants a factory-set, read-only unique number called Device DNA in each device [Xile]. This 57-bit number can be used to develop cores which allow execution only on specified FPGA devices.

A physical unclonable function (PUF) returns a unique value which is extracted from physical properties of an object. Silicon PUFs (SPUFs) generate this device-dependent value from manufacturing-related variations of the timing and delay behavior of nets on the silicon device [GCvDD02]. Related work on SPUFs can be categorized into approaches using ring oscillators and approaches using so-called Arbiter-PUFs. Gassend and others propose a method using ring oscillators in FPGAs for generating device-dependent values [GCvDD02]. The ring oscillators swing with a certain frequency, and the output is used to enable a counter, clocked with the operational clock. Lee and others [LLG+04] and Lim and others [LLG+05] present SPUFs realized with an Arbiter-PUF in the IC fabric. An Arbiter-PUF, shown in Figure 5.21, consists of two identically designed delay lines, one for a data signal and one for a clock signal, which can be crossed with multiplexers. The challenge vector enables a route through the multiplexers for both signals. Edges are generated and propagate through the network according to the challenge vector. If the clock signal reaches the flip-flop, the current value of the data signal is registered. The result is a '1' if the clock signal is faster than the data signal; otherwise the result is '0'. Varying the challenge vector will cause different results, which can only be reproduced on the same device. Using another device, the achieved results are completely different. Majzoobi and others [MKP09] show an implementation of an Arbiter-PUF for Virtex-5 FPGAs, whereas Suh and Devadas [SD07] implement an Arbiter-PUF for Virtex-4. Holcomb and others [HBF07] and Guajardo and others [GKST07] present approaches where the initial state of SRAMs is used as a PUF. During the power-up of SRAMs, some memory cells switch to a '1', others to a '0', depending on the process variations. Guajardo and others reported that Block RAMs of some FPGAs can be used for generating a unique key which might be used for design protection.
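A simplified behavioral model of the arbiter race (the delay values and the crossing rule are illustrative assumptions, not the exact multiplexer structure):

```python
# Behavioral sketch of an Arbiter-PUF: each stage has two device-specific
# delays; the challenge bit selects which delay each racing signal
# accumulates, and the response is '1' if the clock signal arrives first.
def arbiter_response(stage_delays: list[tuple[float, float]], challenge: list[int]) -> int:
    t_data = t_clk = 0.0
    for (d0, d1), c in zip(stage_delays, challenge):
        # Per-stage delays picked by the challenge bit; crossed for the clock path.
        t_data += d0 if c == 0 else d1
        t_clk  += d1 if c == 0 else d0
    return 1 if t_clk < t_data else 0   # clock faster -> response '1'

# Manufacturing variation makes stage_delays unique per device.
device_a = [(1.00, 1.02), (0.99, 1.01), (1.03, 1.00)]
print(arbiter_response(device_a, [0, 1, 1]))
```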

Simpson and Schaumont propose an authentication system, using a PUF, for software running in a soft core on an FPGA [SS06]. Later, Guajardo and others enhanced this approach [GKST07].


Figure 5.21: An Arbiter-PUF consists of a flip-flop and two delay lines, the routing of which can be altered by different challenge values. An edge propagates through the multiplexer network to the flip-flop. The registered response is determined by which signal arrives first. The responses to different challenges are device dependent due to minimal, uncontrollable path delay variations between different devices.


Bibliography

[AARR03] Dakshi Agrawal, Bruce Archambeault, Josyula R. Rao, and Pankaj Rohatgi. The EM Side-Channel(s). In CHES '02: 4th International Workshop on Cryptographic Hardware and Embedded Systems, pages 29–45, London, UK, 2003. Springer-Verlag.

[ABEL05] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow Integrity. In CCS '05: Proceedings of the 12th ACM Conference on Computer and Communications Security, pages 340–353, New York, NY, USA, 2005. ACM Press.

[ABEL09] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow Integrity: Principles, Implementations, and Applications. ACM Trans. Inf. Syst. Secur., 13(1):1–40, 2009.

[ABMF04] Todd Austin, David Blaauw, Trevor Mudge, and Krisztian Flautner. Making Typical Silicon Matter with Razor. Computer, 37(3):57–65, 2004.

[AHTA04] Amr T. Abdel-Hamid, Sofiene Tahar, and El Mostapha Aboulhamid. A Survey on IP Watermarking Techniques. Design Automation for Embedded Systems, 9(3):211–227, 2004.

[Ajl95] Cheryl Ajluni. Two New Imaging Techniques Promise to Improve IC Defect Identification. Electronic Design, 43(14):37–38, 1995.

[AK96] Ross Anderson and Markus Kuhn. Tamper Resistance: A Cautionary Note. In WOEC '96: Proceedings of the Second USENIX Workshop on Electronic Commerce, pages 1–11, Berkeley, CA, USA, 1996. USENIX Association.

[Ale96] Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 49(7), 1996.

[All00] VSI Alliance. Intellectual Property Protection White Paper: Schemes, Alternatives and Discussion Version 1.1. Issued by the Intellectual Property Protection Development Working Group, Ver. 1.1, 2000.

[All07] Business Software Alliance. Fifth Annual BSA and IDC Global Software Piracy Study. Technical report, 2007.

[ALR01] Algirdas Avizienis, Jean-Claude Laprie, and Brian Randell. Fundamental Concepts of Dependability. Technical Report Series, University of Newcastle upon Tyne, Computing Science, 2001.

[Alt] Altera. FPGA Design Security Solution Using MAX II Devices. URL: http://www.altera.com/literature/wp/wp_m2dsgn.pdf.

[And72] James P. Anderson. Computer Security Technology Planning Study, 1972.

[ANKA99] Z. Alkhalifa, V. S. Sukumaran Nair, Narayanan Krishnamurthy, and Jacob A. Abraham. Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection. IEEE Trans. Parallel Distrib. Syst., 10(6):627–641, 1999.

[ARRJ06] Divya Arora, Srivaths Ravi, Anand Raghunathan, and Niraj K. Jha. Hardware-assisted Run-time Monitoring for Secure Program Execution on Embedded Processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(12):1295–1308, 2006.

[ARS04] Andre Adelsbach, Markus Rohe, and Ahmad-Reza Sadeghi. Overcoming the Obstacles of Zero-knowledge Watermark Detection. In MM&Sec '04: Proceedings of the 2004 Workshop on Multimedia and Security, pages 46–55, New York, NY, USA, 2004. ACM.

[Aus95] Kenneth Austin. Data Security Arrangements for Semiconductor Programmable Devices, February 7, 1995. US Patent 5,388,157.

[Aus99] Todd M. Austin. DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design. In MICRO 32: Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, pages 196–207, Washington, DC, USA, 1999. IEEE Computer Society.

[BAFS05] Elena Gabriela Barrantes, David H. Ackley, Stephanie Forrest, and Darko Stefanovic. Randomized Instruction Set Emulation. ACM Trans. Inf. Syst. Secur., 8(1):3–40, 2005.

[Bar] Scott Barrick. Designing Around an Encrypted Netlist: Is The Pain Worth the Gain? D&R Industry Articles. URL: http://www.design-reuse.com/articles/18205/encrypted-netlist.html.

[Bau08] Florian Baueregger. Identifikation von signierten Schaltungen anhand von Leistungsverbrauchsmessungen. Diplomarbeit, Department of Computer Science 12, University of Erlangen-Nuremberg, January 2008.

[BBC05] BBC. 1986: Coal Mine Canaries Made Redundant. URL: http://news.bbc.co.uk/onthisday/hi/dates/stories/december/30/newsid_2547000/2547587.stm, 2005.

[BDH+98] Feng Bao, Robert H. Deng, Yongfei Han, Albert B. Jeng, A. Desai Narasimhalu, and Teow-Hin Ngair. Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults. In Proceedings of the 5th International Workshop on Security Protocols, pages 115–124, London, UK, 1998. Springer-Verlag.

[BDS03] Sandeep Bhatkar, Daniel C. DuVarney, and R. Sekar. Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits. In SSYM '03: Proceedings of the 12th USENIX Security Symposium, pages 8–8, Berkeley, CA, USA, 2003. USENIX Association.

[BEA06] Mihai Budiu, Úlfar Erlingsson, and Martín Abadi. Architectural Support for Software-based Protection. In ASID '06: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, pages 42–51, New York, NY, USA, 2006. ACM.

[BGB06] Lilian Bossuet, Guy Gogniat, and Wayne Burleson. Dynamically Configurable Security for SRAM FPGA Bitstreams. International Journal of Embedded Systems, 2(1):73–85, 2006.

[BGXC07] Fujun Bai, Zhiqiang Gao, Yi Xu, and Xueyu Cai. A Watermarking Technique for Hard IP Protection in Full-custom IC Design. In International Conference on Communications, Circuits and Systems (ICCCAS 2007), pages 1177–1180, 2007.

[BK00] Bulba and Kil3r. Bypassing StackGuard and StackShield. Phrack Magazine, 2000.

[BKIL03] Saurabh Bagchi, Zbigniew Kalbarczyk, Ravishankar Iyer, and Y. Levendel. Design and Evaluation of Preemptive Control Signature (PECOS) Checking. IEEE Transactions on Computers, 2003.

[BLW+01] Saurabh Bagchi, Y. Liu, Keith Whisnant, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Y. Levendel, and Larry Votta. A Framework for Database Audit and Control Flow Checking for a Wireless Telephone Network Controller. In DSN '01: Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS), pages 225–234, Washington, DC, USA, 2001. IEEE Computer Society.

[BM07] Jan A. Bergstra and C. A. Middelburg. Instruction Sequences with Indirect Jumps. Electronic Report PRG0709, Programming Research Group, University of Amsterdam, 2007.

[BPS00] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A Static Analyzer for Finding Dynamic Programming Errors. Software – Practice and Experience, 30(7):775–802, 2000.

[BRSS08] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When Good Instructions Go Bad: Generalizing Return-oriented Programming to RISC. In CCS '08: Proceedings of the 15th ACM Conference on Computer and Communications Security, pages 27–38, New York, NY, USA, 2008. ACM.

[BS97] Eli Biham and Adi Shamir. Differential Fault Analysis of Secret Key Cryptosystems. In CRYPTO '97: Proceedings of the 17th Annual International Cryptology Conference on Advances in Cryptology, pages 513–525, London, UK, 1997. Springer-Verlag.

[BST00] Arash Baratloo, Navjot Singh, and Timothy Tsai. Transparent Run-time Defense against Stack Smashing Attacks. In ATEC '00: Proceedings of the Annual USENIX Technical Conference, pages 251–262, Berkeley, CA, USA, 2000. USENIX Association.

[BTH96] Laurence Boney, Ahmed H. Tewfik, and Khaled N. Hamdy. Digital Watermarks for Audio Signals. In International Conference on Multimedia Computing and Systems, pages 473–480, 1996.

[BWWA06] Edson Borin, Cheng Wang, Youfeng Wu, and Guido Araujo. Software-Based Transparent and Comprehensive Control-Flow Error Detection. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization, pages 333–345, Washington, DC, USA, 2006. IEEE Computer Society.

[CBB+01] Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike Frantzen, and Jamie Lokier. FormatGuard: Automatic Protection from printf Format String Vulnerabilities. In SSYM '01: Proceedings of the 10th USENIX Security Symposium, Berkeley, CA, USA, 2001. USENIX Association.

[CBD+99] Crispin Cowan, Steve Beattie, Ryan Finnin Day, Calton Pu, Perry Wagle, and Erik Walthinsen. Protecting Systems from Stack Smashing Attacks with StackGuard. In Linux Expo, 1999.

[CBJW03] Crispin Cowan, Steve Beattie, John Johansen, and Perry Wagle. PointGuard™: Protecting Pointers from Buffer Overflow Vulnerabilities. In SSYM '03: Proceedings of the 12th USENIX Security Symposium, Berkeley, CA, USA, 2003. USENIX Association.

[CD00] Roy Chapman and Tariq S. Durrani. IP Protection of DSP Algorithms for System on Chip Implementation. IEEE Transactions on Signal Processing, 48(3):854–861, 2000.

[CE99] Cristina Cifuentes and Mike Van Emmerik. Recovery of Jump Table Case Statements from Binary Code. In IWPC '99: Proceedings of the 7th International Workshop on Program Comprehension, pages 192–199, Washington, DC, USA, 1999. IEEE Computer Society.

[CH01] Tzi-cker Chiueh and Fu-Hau Hsu. RAD: A Compile-time Solution to Buffer Overflow Attacks. In International Conference on Distributed Computing Systems, volume 21, pages 409–420. IEEE Computer Society, 2001.

[Cha98] Edoardo Charbon. Hierarchical Watermarking in IC Design. In Proceedings of the IEEE Custom Integrated Circuits Conference, pages 295–298, 1998.

[CHP97] Po-Yung Chang, Eric Hao, and Yale N. Patt. Target Prediction for Indirect Jumps. In Proceedings of the 24th Annual International Symposium on Computer Architecture, 25(2):274–283, 1997.

[CK06] Nathaniel Couture and Kenneth B. Kent. Periodic Licensing of FPGA based Intellectual Property. In FPGA '06: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pages 234–234, New York, NY, USA, 2006. ACM.

[Con99] Matt Conover. w00w00 on Heap Overflows. URL: http://www.w00w00.org/files/articles/heaptut.txt, 1999.

[CPG+06] Encarnacion Castillo, Luis Parrilla, Antonio Garcia, Antonio Lloris, and Uwe Meyer-Baese. IPP Watermarking Technique for IP Core Protection on FPL Devices. In International Conference on Field Programmable Logic and Applications (FPL '06), pages 487–492, 2006.

[CPG+08] Encarnacion Castillo, Luis Parrilla, Antonio Garcia, Uwe Meyer-Baese, Guillermo Botella, and Antonio Lloris. Automated Signature Insertion in Combinational Logic Patterns for HDL IP Core Protection. In 4th Southern Conference on Programmable Logic, pages 183–186, 2008.

[CPM+98] Crispin Cowan, Calton Pu, Dave Maier, Heather Hintony, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-overflow Attacks. In SSYM '98: Proceedings of the 7th USENIX Security Symposium, Berkeley, CA, USA, 1998. USENIX Association.

[CPW74] J. R. Connet, E. J. Pasternak, and B. D. Wagner. Software Defenses in Real-time Control Systems. Digest of Papers, page 94, 1974.

[CSB92] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen. Low-power CMOS Digital Design. IEEE Journal of Solid-State Circuits, 27(4):473–484, 1992.

[Dau06] Andrew Dauman. An Open IP Encryption Flow Permits Industry-wide Interoperability. Synopsys, Inc. White Paper, June 2006.

[Des97] Solar Designer. Non-executable Stack Patch. URL: http://www.openwall.com/linux/, 1997.

[DMW98] J. H. Daniel, D. F. Moore, and J. F. Walker. Focused Ion Beams for Microfabrication. Engineering Science and Education Journal, 7(2):53–56, 1998.

[Dob03] Igor Dobrovitski. Exploit for CVS Double free() for Linux pserver. Neohapsis Archives (http://www.security-express.com/archives/fulldisclosure/2003-q1/0545.html), 2003.

[Dri09] Saar Drimer. Security for Volatile FPGAs. November 2009.

[DRS03] Nurit Dor, Michael Rodeh, and Mooly Sagiv. CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C. ACM SIGPLAN Notices, 38(5):155–167, 2003.

[DS90] X. Delord and Gabriele Saucier. Control Flow Checking in Pipelined RISC Microprocessors: the Motorola MC88100 Case Study. In Proceedings of the Euromicro '90 Workshop on Real Time, pages 162–169, 1990.

[DS91] X. Delord and Gabriele Saucier. Formalizing Signature Analysis for Control Flow Checking of Pipelined RISC Multiprocessors. In Proceedings of the IEEE International Test Conference on Test, pages 936–945, Washington, DC, USA, 1991. IEEE Computer Society.

[DSG05] Nij Dorairaj, Eric Shiflet, and Mark Goosman. PlanAhead Software as a Platform for Partial Reconfiguration. Xilinx XCELL Journal, 55:68–71, 2005.

[EAV+06] Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C. Necula. XFI: Software Guards for System Address Spaces. In OSDI '06: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 75–88, Berkeley, CA, USA, 2006. USENIX Association.

[EL02] David Evans and David Larochelle. Improving Security Using Extensible Lightweight Static Analysis. IEEE Software, 19(1):42–51, 2002.

[Erl07] Úlfar Erlingsson. Low-level Software Security: Attacks and Defenses. Foundations of Security Analysis and Design IV, 4677:92, 2007.

[ES84] James B. Eifert and John Paul Shen. Processor Monitoring Using Asynchronous Signatured Instruction Streams. In Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years', reprinted from FTCS-14 1984, pages 394–399, 1984.

[Est] Chip Estimate. ChipEstimate.com. URL: http://www.chipestimate.com/.

[Fed01] Federal Information Processing Standards Publication 197. Announcing the Advanced Encryption Standard (AES). URL: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf, 2001.

[FHSL96] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff. A Sense of Self for Unix Processes. In SP '96: Proceedings of the 1996 IEEE Symposium on Security and Privacy, page 120, Washington, DC, USA, 1996. IEEE Computer Society.

[FKF+03] Henry Hanping Feng, Oleg M. Kolesnikov, Prahlad Fogla, Wenke Lee, and Weibo Gong. Anomaly Detection Using Call Stack Information. In SP '03: Proceedings of the 2003 IEEE Symposium on Security and Privacy, page 62, Washington, DC, USA, 2003. IEEE Computer Society.

[FKK96] Alan O. Freier, Philip Karlton, and Paul C. Kocher. The SSL Protocol – Version 3.0. URL: http://www.mozilla.org/projects/security/pki/nss/ssl/draft302.txt, 1996.

[Fra] Fraunhofer IISB. FIB Focused Ion Beam - Anwendungsbeispiele, Spezifikationen und Prinzip. URL: http://www.iisb.fraunhofer.de/de/arb_geb/technologie_an_fib.htm.

[FS01] Mike Frantzen and Mike Shuey. StackGhost: Hardware Facilitated Stack Protection. In SSYM '01: Proceedings of the 10th USENIX Security Symposium, pages 55–66, Berkeley, CA, USA, 2001. USENIX Association.

[FS03] Niels Ferguson and Bruce Schneier. Practical Cryptography. John Wiley & Sons, Inc., New York, NY, USA, 2003.

[FT03] Y. C. Fan and H. W. Tsao. Watermarking for Intellectual Property Protection. Electronics Letters, 39(18):1316–1318, 2003.

[Gai94] Jiri Gaisler. Concurrent Error-detection and Modular Fault-tolerance in a 32-bit Processing Core for Embedded Space Flight Applications. In Twenty-Fourth International Symposium on Fault-Tolerant Computing (FTCS 1994), pages 128–130. IEEE Computer Society Press, 1994.

[GCvDD02] Blaise Gassend, Dwaine Clarke, Marten van Dijk, and Srinivas Devadas. Silicon Physical Random Functions. In CCS '02: Proceedings of the 9th ACM Conference on Computer and Communications Security, pages 148–160, New York, NY, USA, 2002. ACM.

[GDWL92] Daniel D. Gajski, Nikil D. Dutt, Allen C.-H. Wu, and Steve Y.-L. Lin. High-level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, Norwell, MA, USA, 1992.

[Ger91] Anne Geraci. IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, Piscataway, NJ, USA, 1991.

[GHJM05] Dan Grossman, Michael Hicks, Trevor Jim, and Greg Morrisett. Cyclone: A Type-safe Dialect of C. C/C++ Users Journal, 23(1):112–139, 2005.

[GKST07] Jorge Guajardo, Sandeep S. Kumar, Geert-Jan Schrijen, and Pim Tuyls. FPGA Intrinsic PUFs and Their Use for IP Protection. In CHES '07: Proceedings of the 9th International Workshop on Cryptographic Hardware and Embedded Systems, pages 63–80, Berlin, Heidelberg, 2007. Springer-Verlag.

[GO98] Anup K. Ghosh and Tom O'Connor. Analyzing Programs for Vulnerability to Buffer Overrun Attacks. In Proceedings of the 21st National Information Systems Security Conference, Crystal City, VA, pages 274–382, 1998.

[GRRV03] Olga Goloubeva, Maurizio Rebaudengo, Matteo Sonza Reorda, and Massimo Violante. Soft-Error Detection Using Control Flow Assertions. In DFT '03: Proceedings of the 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pages 581–588, Washington, DC, USA, 2003. IEEE Computer Society.

[GRRV05] Olga Goloubeva, Maurizio Rebaudengo, Matteo Sonza Reorda, and Massimo Violante. Improved Software-based Processor Control-flow Errors Detection Technique. In Reliability and Maintainability Symposium, pages 583–589, 2005.

[Hag] Hagai Bar-El, Discretix Technologies Ltd. Known Attacks Against Smartcards. URL: http://www.discretix.com/PDF/KnownAttacksAgainstSmartcards.pdf.

[HB03] Eric Haugh and Matt Bishop. Testing C Programs for Buffer Overflow Vulnerabilities. In Proceedings of the Network and Distributed System Security Symposium, volume 2. Citeseer, 2003.

[HBF07] Daniel E. Holcomb, Wayne P. Burleson, and Kevin Fu. Initial SRAM State as a Fingerprint and Source of True Random Numbers for RFID Tags. In Proceedings of the Conference on RFID Security. Citeseer, 2007.

[HFS98] Steven A. Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion Detection Using Sequences of System Calls. Journal of Computer Security, 6(3):151–180, 1998.

[HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proceedings of the Winter USENIX Conference, volume 136, 1992.

[HP99] Inki Hong and Miodrag Potkonjak. Behavioral Synthesis Techniques for Intellectual Property Protection. In Design Automation Conference, pages 849–854, 1999.

[IBM] IBM. Rational Purify. URL: http://www-01.ibm.com/software/awdtools/purify/.

[IFI90] IFIP WG. Dependability: Basic Concepts and Terminology: in English, French, German, Italian and Japanese: IFIP WG 10.4. Dependable Computing and Fault Tolerance, 1990.

[ISO05] ISO JTC 1/SC 27. Information Technology – Security Techniques – Code of Practice for Information Security Management, 2005.

[ITR07] ITRS. International Technology Roadmap for Semiconductors, 2007 Edition. URL: http://www.itrs.net/Links/2007ITRS/Home2007.htm. Technical report, 2007.

[ITS91] ITSEC. Information Technology Security Evaluation Criteria (ITSEC). Office for Official Publications of the European Communities, 1991.

[Jal94] Pankaj Jalote. Fault Tolerance in Distributed Systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994.

[JK97] Richard W. M. Jones and Paul H. J. Kelly. Backwards-compatible Bounds Checking for Arrays and Pointers in C Programs. Automated and Algorithmic Debugging, pages 13–26, 1997.

[JMG+02] Trevor Jim, J. Greg Morrisett, Dan Grossman, Michael W. Hicks, James Cheney, and Yanling Wang. Cyclone: A Safe Dialect of C. In ATEC '02: Proceedings of the General Track of the Annual USENIX Technical Conference, pages 275–288, Berkeley, CA, USA, 2002. USENIX Association.

[JMKP07] Jose A. Joao, Onur Mutlu, Hyesoon Kim, and Yale N. Patt. Dynamic Predication of Indirect Jumps. IEEE Computer Architecture Letters, 6(2):25–28, 2007.

[JYPQ03] Adarsh K. Jain, Lin Yuan, Pushkin R. Pari, and Gang Qu. Zero Overhead Watermarking Technique for FPGA Designs. In GLSVLSI '03: Proceedings of the 13th ACM Great Lakes Symposium on VLSI, pages 147–152. ACM Press, 2003.

[KBT08] Dirk Koch, Christian Beckhoff, and Jürgen Teich. ReCoBus-Builder - a Novel Tool and Technique to Build Statically and Dynamically Reconfigurable Systems for FPGAs. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL 08), pages 119–124, Heidelberg, Germany, September 2008.

[KC08] Kris Kaspersky and Alice Chang. Remote Code Execution through Intel CPU Bugs. In Hack In The Box (HITB) 2008 Malaysia Conference, 2008.

[KE91] David R. Kaeli and Philip G. Emma. Branch History Table Prediction of Moving Target Branches due to Subroutine Returns. SIGARCH Computer Architecture News, 19(3):34–42, 1991.

[Kea01] Tom Kean. Secure Configuration of a Field Programmable Gate Array. In FCCM '01: Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pages 259–260, Washington, DC, USA, 2001. IEEE Computer Society.

[Kes00] David Kessner. Copy Protection for SRAM based FPGA Designs. Application Note, Free IP Project. URL: http://web.archive.org/web/20031010002149/http://free-ip.com/copyprotection.html, 2000.

[KHPC98] Darko Kirovski, Yean-Yow Hwang, Miodrag Potkonjak, and Jason Cong. Intellectual Property Protection by Watermarking Combinational Logic Synthesis Solutions. In ICCAD '98: Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, pages 194–198, New York, NY, USA, 1998. ACM.

[KHT08] Dirk Koch, Christian Haubelt, and Jürgen Teich. Efficient Reconfigurable On-Chip Buses for FPGAs. In 16th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2008), pages 287–290. IEEE Computer Society, April 2008.

[KJJ99] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In CRYPTO '99: Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, pages 388–397, London, UK, 1999. Springer-Verlag.

[KK99] Oliver Kömmerling and Markus G. Kuhn. Design Principles for Tamper-Resistant Smartcard Processors. In USENIX Workshop on Smartcard Technology (Smartcard '99), pages 9–20, 1999.

[KKP03] Gaurav S. Kc, Angelos D. Keromytis, and Vassilis Prevelakis. Countering Code-injection Attacks with Instruction-set Randomization. In CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 272–280, New York, NY, USA, 2003. ACM.

[KLMR04] Paul Kocher, Ruby Lee, Gary McGraw, and Anand Raghunathan. Security as a New Dimension in Embedded System Design. In DAC '04: Proceedings of the 41st Annual Design Automation Conference, pages 753–760, New York, NY, USA, 2004. ACM. Moderator: Srivaths Ravi.

[KLMS+98] Andrew Byun Kahng, John Lach, William Henry Mangione-Smith, Stefanus Mantik, Igor Leonidovich Markov, Miodrag M. Potkonjak, Paul Askeland Tucker, Huijuan Wang, and Gregory Wolfe. Watermarking Techniques for Intellectual Property Protection. In DAC '98: Proceedings of the 35th Annual Design Automation Conference, pages 776–781, New York, NY, USA, 1998. ACM.

[KLMS+01] Andrew Byun Kahng, John Lach, William Henry Mangione-Smith, Stefanus Mantik, Igor Leonidovich Markov, Miodrag M. Potkonjak, Paul Askeland Tucker, Huijuan Wang, and Gregory Wolfe. Constraint-Based Watermarking Techniques for Design IP Protection. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(10):1236–1252, 2001.

[klo99] klog. The Frame Pointer Overwrite. Phrack Magazine, 55(9), 1999.

[KMM+98] Andrew Byun Kahng, Stefanus Mantik, Igor Leonidovich Markov, Miodrag M. Potkonjak, Paul Askeland Tucker, Huijuan Wang, and Gregory Wolfe. Robust IP Watermarking Methodologies for Physical Design. In DAC '98: Proceedings of the 35th Annual Design Automation Conference, pages 782–787, New York, NY, USA, 1998. ACM.

[KMM08] Tom Kean, David McLaren, and Carol Marsh. Verifying the Authenticity of Chip Designs with the DesignTag System. In HOST '08: Proceedings of the 2008 IEEE International Workshop on Hardware-Oriented Security and Trust, pages 59–64, Washington, DC, USA, 2008. IEEE Computer Society.

[KNKA96] Ghani A. Kanawati, V. S. Sukumaran Nair, Narayanan Krishnamurthy, and Jacob A. Abraham. Evaluation of Integrated System-level Checks for On-line Error Detection. In IPDS '96: Proceedings of the 2nd International Computer Performance and Dependability Symposium, pages 292–301, Washington, DC, USA, 1996. IEEE Computer Society.

[Koc96] Paul C. Kocher. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In CRYPTO '96: Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, pages 104–113, London, UK, 1996. Springer-Verlag.

[Kot06] Arun Kottolli. The Economics of Structured- and Standard-cell-ASIC Designs. EDN Magazine, 51(6):61–68, 2006.

[KP98] Darko Kirovski and Miodrag Potkonjak. Intellectual Property Protection Using Watermarking Partial Scan Chains for Sequential Logic Test Generation. In ICCAD '98: Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998.

[KR06] Ian Kuon and Jonathan Rose. Measuring the Gap between FPGAs and ASICs. In FPGA '06: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pages 21–30, New York, NY, USA, 2006. ACM.

[Kre03] Andreas Krennmair. ContraPolice: A libc Extension for Protecting Applications from Heap-smashing Attacks, 2003.

[KT05] M. Moiz Khan and Spyros Tragoudas. Rewiring for Watermarking Digital Circuit Netlists. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(7):1132–1137, 2005.

[LB94] James R. Larus and Thomas Ball. Rewriting Executable Files to Measure Program Behavior. Software – Practice and Experience, 24(2):197–218, 1994.

[LBD+04] James R. Larus, Thomas Ball, Manuvir Das, Robert DeLine, Manuel Fähndrich, Jon Pincus, Sriram K. Rajamani, and Ramanathan Venkatapathy. Righting Software. IEEE Software, 21(3):92–100, 2004.

[LBMC94] Carl E. Landwehr, Alan R. Bull, John P. McDermott, and William S. Choi. A Taxonomy of Computer Program Security Flaws. ACM Computing Surveys (CSUR), 26(3):211–254, 1994.

[LC02] Kyung-suk Lhee and Steve J. Chapin. Type-Assisted Dynamic Buffer Overflow Detection. In Proceedings of the 11th USENIX Security Symposium, pages 81–88, Berkeley, CA, USA, 2002. USENIX Association.

[LC06] Qiming Li and Ee-Chien Chang. Zero-knowledge Watermark Detection Resistant to Ambiguity Attacks. In MMSec '06: Proceedings of the 8th Workshop on Multimedia and Security, pages 158–163, New York, NY, USA, 2006. ACM.

[LD03] Tri Van Le and Yvo Desmedt. Cryptanalysis of UCLA Watermarking Schemes for Intellectual Property Protection. In IH '02: Revised Papers from the 5th International Workshop on Information Hiding, pages 213–225, London, UK, 2003. Springer-Verlag.

[LKMS04] Ruby B. Lee, David K. Karig, John P. McGregor, and Zhijie Shi. Enlisting Hardware Architecture to Thwart Malicious Code Injection. Security in Pervasive Computing, pages 237–252, 2004.

[LLG+04] Jae W. Lee, Daihyun Lim, Blaise Gassend, G. Edward Suh, Marten van Dijk, and Srini Devadas. A Technique to Build a Secret Key in Integrated Circuits for Identification and Authentication Applications. In Proceedings of the IEEE VLSI Circuits Symposium, pages 176–179. Citeseer, 2004.

[LLG+05] Daihyun Lim, Jae W. Lee, Blaise Gassend, G. Edward Suh, Marten van Dijk, and Srini Devadas. Extracting Secret Keys from Integrated Circuits. IEEE Transactions on Very Large Scale Integration Systems, 13(10):1200, 2005.

[LMSP98] John Lach, William H. Mangione-Smith, and Miodrag Potkonjak. Signature Hiding Techniques for FPGA Intellectual Property Protection. In ICCAD '98: Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, pages 186–189, New York, NY, USA, 1998. ACM.

[LMSP99] John Lach, William H. Mangione-Smith, and Miodrag Potkonjak. Robust FPGA Intellectual Property Protection through Multiple Small Watermarks. In DAC '99: Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, pages 831–836, New York, NY, USA, 1999. ACM.

[LMSP01] John Lach, William H. Mangione-Smith, and Miodrag Potkonjak. Fingerprinting Techniques for Field-Programmable Gate Array Intellectual Property Protection. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, volume 20, 2001.

[Lu82] David J. Lu. Watchdog Processors and Structural Integrity Checking. IEEE Transactions on Computers, 31(7):681–685, 1982.

[MBS07] Albert Meixner, Michael E. Bauer, and Daniel Sorin. Argus: Low-Cost, Comprehensive Error Detection in Simple Cores. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 210–222, Washington, DC, USA, 2007. IEEE Computer Society.

[MBS08] Albert Meixner, Michael E. Bauer, and Daniel Sorin. Argus: Low-Cost, Comprehensive Error Detection in Simple Cores. IEEE Micro, 28(1):52–59, 2008.

[MdR99] Todd C. Miller and Theo de Raadt. strlcpy and strlcat: Consistent, Safe, String Copy and Concatenation. In ATEC '99: Proceedings of the Annual USENIX Technical Conference, pages 41–41, Berkeley, CA, USA, 1999. USENIX Association.

[MH91] Edgar Michel and Wolfgang Hohl. Concurrent Error Detection Using Watchdog Processors in the Multiprocessor System MEMSY. In Fault-Tolerant Computing Systems: Tests, Diagnosis, Fault Treatment: 5th International GI/ITG/GMA Conference, Nürnberg, September 25–27, 1991: Proceedings, page 54. Springer, 1991.

[MHPS96] István Majzik, Wolfgang Hohl, András Pataricza, and Volker Sieh. Multiprocessor Checking Using Watchdog Processors. Computer Systems Science and Engineering, 11(5):301–310, 1996.

[MKGT92] Ghassem Miremadi, Johan Karlsson, Ulf Gunneflo, and Jan Torin. Two Software Techniques for On-line Error Detection. In Digest of Papers, Twenty-Second International Symposium on Fault-Tolerant Computing (FTCS-22), pages 328–335, 1992.

[MKP09] Mehrdad Majzoobi, Farinaz Koushanfar, and Miodrag Potkonjak. Techniques for Design and Implementation of Secure Reconfigurable PUFs. ACM Transactions on Reconfigurable Technology and Systems, 2(1):1–33, 2009.

[MKSL03] John P. McGregor, David K. Karig, Zhijie Shi, and Ruby B. Lee. A Processor Architecture Defense against Buffer Overflow Attacks. In Proceedings of the IEEE International Conference on Information Technology: Research and Education (ITRE 2003), pages 243–250. Citeseer, 2003.

[MLS91] Thierry Michel, Régis Leveugle, and Gabriele Saucier. A New Approach to Control Flow Checking Without Program Modification. In Digest of Papers of the Twenty-First International Symposium on Fault-Tolerant Computing, pages 334–343, 1991.

[MM88] Aamer Mahmood and Edward J. McCluskey. Concurrent Error Detection Using Watchdog Processors – A Survey. IEEE Transactions on Computers, 37(2):160–174, 1988.

[MN98] Lee D. McFearin and V. S. Sukumaran Nair. Control Flow Checking Using Assertions. Dependable Computing and Fault Tolerant Systems, 10:183–200, 1998.

[MP00] Seapahn Meguerdichian and Miodrag Potkonjak. Watermarking while Preserving the Critical Path. In DAC '00: Proceedings of the 37th Annual Design Automation Conference, pages 108–111, New York, NY, USA, 2000. ACM.

[MS91] Henrique Madeira and João G. Silva. On-line Signature Learning and Checking: Experimental Evaluation. In CompEuro '91: Proceedings of the 5th Annual European Computer Conference on Advanced Computer Technology, Reliable Systems and Applications, pages 642–646, 1991.

[MV05] Matt Messier and John Viega. Safe C String Library v1.0.3. URL: http://www.zork.org/safestr/, 2005.

[MWB+10] Matthias May, Norbert Wehn, Abdelmajid Bouajila, Johannes Zeppenfeld, Walter Stechele, Andreas Herkersdorf, Daniel Ziener, and Jürgen Teich. A Rapid Prototyping System for Error-Resilient Multi-Processor Systems-on-Chip. In Proceedings of DATE'10, pages 375–380, March 2010.

[Nam82] Masood Namjoo. Techniques for Concurrent Testing of VLSI Processor Operation. In Proceedings of the International Test Conference, pages 461–468, 1982.

[Nam83] Masood Namjoo. CERBERUS-16: An Architecture for a General Purpose Watchdog Processor. In Proceedings of the Symposium on Fault-Tolerant Computing, pages 216–219, 1983.

[NCH+05] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. CCured: Type-safe Retrofitting of Legacy Software. ACM Transactions on Programming Languages and Systems (TOPLAS), 27(3):477–526, 2005.

[NNC+01] Naveen Narayan, Rexford D. Newbould, Jo Dale Carothers, Jeffrey J. Rodriguez, and W. Timothy Holman. IP Protection for VLSI Designs via Watermarking of Routes. In Proceedings of the 14th Annual IEEE International ASIC/SOC Conference, pages 406–410, 2001.

[OCK+75] Severo M. Ornstein, William R. Crowther, M. F. Kraley, R. D. Bressler, A. Michel, and Frank E. Heart. Pluribus: A Reliable Multiprocessor. In AFIPS '75: Proceedings of the May 19–22, 1975, National Computer Conference and Exposition, pages 551–559, New York, NY, USA, 1975. ACM.

[Oli01] Arlindo L. Oliveira. Techniques for the Creation of Digital Watermarks in Sequential Circuit Designs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(9):1101–1117, 2001.

[OR95] Joakim Ohlsson and Marcus Rimén. Implicit Signature Checking. In FTCS '95: Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing, pages 218–227, Washington, DC, USA, 1995. IEEE Computer Society.

[OSM02] Nahmsuk Oh, Philip P. Shirvani, and Edward J. McCluskey. Control-flow Checking by Software Signatures. IEEE Transactions on Reliability, 51(1):111–122, 2002.

[OVB+06] Hilmi Ozdoganoglu, T. N. Vijaykumar, Carla E. Brodley, Benjamin A. Kuperman, and Ankit Jalote. SmashGuard: A Hardware Solution to Prevent Security Attacks on the Function Return Address. IEEE Transactions on Computers, 55(10):1271–1285, 2006.

[PAX03] PAX Team. Non Executable Data Pages. URL: http://pax.grsecurity.net/docs/pax.txt, 2003.

[PB04] Jonathan Pincus and Brandon Baker. Beyond Stack Smashing: Recent Advances in Exploiting Buffer Overruns. IEEE Security and Privacy, 2(4):20–27, 2004.

[PBA10] Andrea Pellegrini, Valeria Bertacco, and Todd Austin. Fault-Based Attack of RSA Authentication. In Proceedings of Design, Automation and Test in Europe (DATE'10), pages 855–860, March 2010.

[PMHH93] András Pataricza, István Majzik, Wolfgang Hohl, and Joachim Honig. Watchdog Processors in Parallel Systems. Microprocessing and Microprogramming, 39(2–5):69–74, 1993.

[QP00] Gang Qu and Miodrag Potkonjak. Fingerprinting Intellectual Property Using Constraint-addition. In DAC '00: Proceedings of the 37th Annual Design Automation Conference, pages 587–592, New York, NY, USA, 2000. ACM.

[Qu02] Gang Qu. Publicly Detectable Watermarking for Intellectual Property Authentication in VLSI Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(11):1363–1367, 2002.

[Rag06] Roshan G. Ragel. Architectural Support for Security and Reliability in Embedded Processors. PhD thesis, University of New South Wales, Sydney, Australia, 2006.

[RAMSP99] Azra Rashid, Jeet Asher, William H. Mangione-Smith, and Miodrag Potkonjak. Hierarchical Watermarking for Protection of DSP Filter Cores. In Proceedings of the Custom Integrated Circuits Conference, Piscataway, NJ, pages 39–45. IEEE Press, 1999.

[RBD+01] Rob A. Rutenbar, Max Baron, Thomas Daniel, Rajeev Jayaraman, Zvi Or-Bach, Jonathan Rose, and Carl Sechen. (When) Will FPGAs Kill ASICs? (panel session). In DAC '01: Proceedings of the 38th Annual Design Automation Conference, pages 321–322, New York, NY, USA, 2001. ACM.

[RCV+05] George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, and David I. August. SWIFT: Software Implemented Fault Tolerance. In CGO '05: Proceedings of the International Symposium on Code Generation and Optimization, pages 243–254, Washington, DC, USA, 2005. IEEE Computer Society.

[Reu] Design & Reuse. Catalyst of Collaborative IP Based SoC Design. URL: http://www.design-reuse.com/.

[Ric02] Gerardo Richarte. Four Different Tricks to Bypass StackShield and StackGuard Protection. URL: http://www1.corest.com/files/files/11/StackGuardPaper.pdf, 2002.

[Ric08] Robert Richardson. CSI Computer Crime and Security Survey. Technical report, 2008.

[RKMV03] William Robertson, Christopher Kruegel, Darren Mutz, and Fredrik Valeur. Run-time Detection of Heap-based Overflows. In LISA '03: Proceedings of the 17th USENIX Conference on Large Installation Systems Administration, pages 51–60, Berkeley, CA, USA, 2003. USENIX Association.

[RL04] Olatunji Ruwase and Monica S. Lam. A Practical Dynamic Buffer Overflow Detector. In Proceedings of the 11th Annual Network and Distributed System Security Symposium, pages 159–169, 2004.

[RLC+07] Eduardo L. Rhod, Calisboa A. Lisboa, L. Carro, Massimo Violante, and Matteo Sonza Reorda. A Non-intrusive On-line Control Flow Error Detection Technique for SoCs. In IEEE Latin-American Test Workshop (LATW), volume 8, 2007.

[RRKH04] Srivaths Ravi, Anand Raghunathan, Paul Kocher, and Sunil Hattangady. Security in Embedded Systems: Design Challenges. ACM Transactions on Embedded Computing Systems, 3(3):461–491, 2004.

[RSA78] Ronald Linn Rivest, Adi Shamir, and Leonard Max Adleman. A Method for Obtaining Digital Signatures and Public-key Cryptosystems. Communications of the ACM, 21(2):120–126, 1978.

[SAH] Bassel Soudan, Wael Adi, and Abdulrahman Hanoun. Enabling Se-cure Integration of Multiple IP Cores in the Same FPGA. D&R Indus-try Articles. URL: http://www.design-reuse.com/articles/21638/secure-integration-ip-cores-fpga.html.

[SBDB01] R. Sekar, M. Bendre, D. Dhurjati, and P. Bollineni. A Fast Automaton-Based Method for Detecting Anomalous Program Behaviors. In SP’01: Proceedings of the 2001 IEEE Symposium on Security and Pri-vacy, pages 144–155, Washington, DC, USA, 2001. IEEE ComputerSociety.

[SBE+07a] Walter Stechele, Oliver Bringmann, Rolf Ernst, Andreas Herkersdorf,Katharina Hojenski, Peter Janacik, Franz Rammig, Jurgen Teich, Nor-bert Wehn, Johannes Zeppenfeld, and Daniel Ziener. Autonomic MP-SoCs for Reliable Systems. In Proceedings of Zuverlassigkeit und En-twurf (ZuD 2007), pages 137–138, Munich, Germany, March 2007.

[SBE+07b] Walter Stechele, Oliver Bringmann, Rolf Ernst, Andreas Herkersdorf,Katharina Hojenski, Peter Janacik, Franz Rammig, Jurgen Teich, Nor-bert Wehn, Johannes Zeppenfeld, and Daniel Ziener. Concepts for Au-tonomic Integrated Systems. In Proceedings of edaWorkshop07, Mu-nich, Germany, June 2007.

[SD07] G. Edward Suh and Srinivas Devadas. Physical Unclonable Functionsfor Device Authentication and Secret Key Generation. In DAC ’07:Proceedings of the 44th annual Design Automation Conference, pages9–14, New York, NY, USA, 2007. ACM.

[SFF+02] Oliverio J. Santana, Ayose Falcón, Enrique Fernández, Pedro Medina, Alex Ramírez, and Mateo Valero. A Comprehensive Analysis of Indirect Branch Prediction. In ISHPC ’02: Proceedings of the 4th International Symposium on High Performance Computing, pages 133–145, London, UK, 2002. Springer-Verlag.

[Sha07] Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). In CCS ’07: Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 552–561, New York, NY, USA, 2007. ACM.

[SKB02] Li Shang, Alireza S. Kaviani, and Kusuma Bathala. Dynamic Power Consumption in Virtex-II FPGA Family. In FPGA ’02: Proceedings of the 2002 ACM/SIGDA Tenth International Symposium on Field-Programmable Gate Arrays, pages 157–164, New York, NY, USA, 2002. ACM Press.

[SM90] Nirmal Raj Saxena and Edward J. McCluskey. Control-Flow Checking Using Watchdog Assists and Extended-Precision Checksums. IEEE Transactions on Computers, 39(4):554–559, 1990.

[SPA] SPARC International, Inc. The SPARC Architecture Manual V8. URL: http://www.sparc.com/standards/V8.pdf.

[SPE] SPEC (Standard Performance Evaluation Corporation). SPEC CPU2000 V1.3. URL: http://www.spec.org.

[SS87] Michael A. Schuette and John Paul Shen. Processor Control Flow Monitoring Using Signatured Instruction Streams. IEEE Transactions on Computers, 36(3):264–277, 1987.

[SS06] Eric Simpson and Patrick Schaumont. Offline Hardware/Software Authentication for Reconfigurable Platforms. In Cryptographic Hardware and Embedded Systems (CHES 2006), pages 311–323, 2006.

[SSK07] Debasri Saha and Susmita Sur-Kolay. Fast Robust Intellectual Property Protection for VLSI Physical Design. In ICIT ’07: Proceedings of the 10th International Conference on Information Technology, pages 1–6, Washington, DC, USA, 2007. IEEE Computer Society.

[ST82] Thirumalai Sridhar and Satish M. Thatte. Concurrent Checking of Program Flow in VLSI Processors. In Proceedings of the International Test Conference (ITC 1982), pages 191–199, Philadelphia, PA, USA, 1982.

[ST87] John Paul Shen and Stephen P. Tomas. A Roving Monitoring Processor for Detection of Control Flow Errors in Multiple Processor Systems. Microprocessing and Microprogramming, 20(4-5):249–269, 1987.

[SXZ+04] Zili Shao, Chun Xue, Qingfeng Zhuge, Edwin Hsing Mean Sha, and Bin Xiao. Security Protection and Checking in Embedded System Integration Against Buffer Overflow Attacks. In ITCC ’04: Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’04), Volume 2, pages 409–412, Washington, DC, USA, 2004. IEEE Computer Society.

[Syn] Synopsys. Synplify Premier. URL: http://www.synopsys.com/Tools/Implementation/FPGAImplementation/CapsuleModule/syn_prem_ds.pdf.

[SZHS03] Zili Shao, Qingfeng Zhuge, Yi He, and Edwin Hsing Mean Sha. Defending Embedded Systems Against Buffer Overflow via Hardware/Software. In ACSAC ’03: Proceedings of the 19th Annual Computer Security Applications Conference, pages 352–361, Washington, DC, USA, 2003. IEEE Computer Society.

[SZT08] Moritz Schmid, Daniel Ziener, and Jürgen Teich. Netlist-Level IP Protection by Watermarking for LUT-Based FPGAs. In Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT 2008), pages 209–216, Taipei, Taiwan, December 2008.

[TC00] Ilhami Torunoglu and Edoardo Charbon. Watermarking-based Copyright Protection of Sequential Functions. IEEE Journal of Solid-State Circuits, 35(3):434–440, 2000.

[TH07] Jürgen Teich and Christian Haubelt. Digitale Hardware/Software-Systeme: Synthese und Optimierung. Springer, 2007.

[UR94] Shambhu J. Upadhyaya and Bina Ramamurthy. Concurrent Process Monitoring with No Reference Signatures. IEEE Transactions on Computers, 43(4):475–480, 1994.

[US-08] US-CERT. Vulnerability Notes Database, CERT Coordination Center. URL: http://www.kb.cert.org/vuls/, 2008.

[VBKM00] John Viega, J. T. Bloch, Yoshi Kohno, and Gary E. McGraw. ITS4: A Static Vulnerability Scanner for C and C++ Code. In ACSAC ’00: Proceedings of the 16th Annual Computer Security Applications Conference, page 257, Washington, DC, USA, 2000. IEEE Computer Society.

[vdV04] Arjan van de Ven. New Security Enhancements in Red Hat Enterprise Linux v.3, Update 3. Red Hat, August 2004.

[Ven00] Vendicator. Stack Shield: A Stack Smashing Technique Protection Tool for Linux. URL: http://www.angelfire.com/sk/stackshield/info.html, 2000.

[VIS] VISENGI. VHDL Obfuscator & Watermarker. URL: http://www.visengi.com/en/products/software/vhdl_obfuscator.

[WD01] David Wagner and Drew Dean. Intrusion Detection via Static Analysis. In SP ’01: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pages 156–169, Washington, DC, USA, 2001. IEEE Computer Society.

[WFBA00] David Wagner, Jeffrey S. Foster, Eric A. Brewer, and Alexander Aiken. A First Step towards Automated Detection of Buffer Overrun Vulnerabilities. In Network and Distributed System Security Symposium, pages 3–17, 2000.

[Whe] David A. Wheeler. Flawfinder Home Page. URL: http://www.dwheeler.com/flawfinder.

[Wil07] Ron Wilson. Panel Unscrambles Intellectual Property Encryption Issues. EDN Magazine. URL: http://www.edn.com/article/CA6412249.html, 2007.

[Woj98] Rafal Wojtczuk. Defeating Solar Designer Non-executable Stack Patch. Bugtraq mailing list, 1998.

[Wri08] Craig Wright. Hacking Coffee Makers. URL: http://www.securityfocus.com/archive/1/493387, 2008.

[WS90] Kent Wilken and John Paul Shen. Continuous Signature Monitoring: Low-cost Concurrent Detection of Processor Control Errors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(6):629–641, 1990.

[Xila] Xilinx Inc. FPGA IFF Copy Protection Using Dallas Semiconductor/Maxim DS2432 Secure EEPROMs. URL: http://www.xilinx.com/support/documentation/application_notes/xapp780.pdf.

[Xilb] Xilinx Inc. ISE Design Suite Software Manuals and Help - PDF Collection. URL: http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/manuals.pdf.

[Xilc] Xilinx Inc. JBits 3.0 SDK for Virtex-II. URL: www.xilinx.com/labs/projects/jbits/.

[Xild] Xilinx Inc. MicroBlaze Processor Reference Guide. URL: http://www.xilinx.com/support/documentation/sw_manuals/mb_ref_guide.pdf.

[Xile] Xilinx Inc. Protect Your Brand with Extended Spartan-3A FPGAs. URL: http://www.xilinx.com/products/design_resources/security/devicedna.htm.

[Xilf] Xilinx Inc. Virtex-II Platform FPGAs: Complete Data Sheet. URL: http://www.xilinx.com/support/documentation/data_sheets/ds031.pdf.

[Xil03] Xilinx Inc. Next-Generation Virtex Family from Xilinx to Top One Billion Transistor Mark. URL: http://www.xilinx.com/prs_rls/silicon_vir/03131_nextgen.htm, 2003.

[XKI03] Jun Xu, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. Transparent Runtime Randomization for Security. In Proceedings of the Symposium on Reliable Distributed Systems, pages 260–272, 2003.

[XKPI02] Jun Xu, Zbigniew Kalbarczyk, Sanjay Patel, and Ravishankar K. Iyer. Architecture Support for Defending against Buffer Overflow Attacks. In Workshop on Evaluating and Architecting Systems for Dependability, 2002.

[YC80] Stephen S. Yau and Fu-Chung Chen. An Approach to Concurrent Control Flow Checking. IEEE Transactions on Software Engineering, 6(2):126–137, 1980.

[ZAT06] Daniel Ziener, Stefan Aßmus, and Jürgen Teich. Identifying FPGA IP-Cores Based on Lookup Table Content Analysis. In Proceedings of the 16th International Conference on Field Programmable Logic and Applications (FPL 2006), pages 481–486, Madrid, Spain, August 2006.

[Zie10] Daniel Ziener. Techniques for Increasing Security and Reliability of IP Cores Embedded in FPGA and ASIC Designs. PhD thesis, University of Erlangen-Nuremberg, July 2010. Verlag Dr. Hut, Munich, Germany.

[Zim95] Philip R. Zimmermann. The official PGP user’s guide. MIT Press,Cambridge, MA, USA, 1995.

[ZT05] Daniel Ziener and Jürgen Teich. Evaluation of Watermarking Methods for FPGA-Based IP-Cores. Technical Report 01-2005, University of Erlangen-Nuremberg, Department of CS 12, Hardware-Software-Co-Design, Am Weichselgarten 3, D-91058 Erlangen, Germany, March 2005.

[ZT06] Daniel Ziener and Jürgen Teich. FPGA Core Watermarking Based on Power Signature Analysis. In Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT 2006), pages 205–212, Bangkok, Thailand, December 2006.

[ZT08a] Daniel Ziener and Jürgen Teich. Concepts for Autonomous Control Flow Checking for Embedded CPUs. In Proceedings of the 5th International Conference on Autonomic and Trusted Computing (ATC 2008), pages 234–248, Oslo, Norway, June 2008.

[ZT08b] Daniel Ziener and Jürgen Teich. Power Signature Watermarking of IP Cores for FPGAs. Journal of Signal Processing Systems, 51(1):123–136, April 2008.

[ZT09] Daniel Ziener and Jürgen Teich. Concepts for Run-time and Error-resilient Control Flow Checking of Embedded RISC CPUs. International Journal of Autonomous and Adaptive Communications Systems, 2(3):256–275, July 2009.

[ZZPL04] Tao Zhang, Xiaotong Zhuang, Santosh Pande, and Wenke Lee. Hardware Supported Anomaly Detection: Down to the Control Flow Level. Technical report, Georgia Institute of Technology, 2004.

[ZZPL05] Tao Zhang, Xiaotong Zhuang, Santosh Pande, and Wenke Lee. Anomalous Path Detection with Hardware Support. In CASES ’05: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pages 43–54, New York, NY, USA, 2005. ACM.
