Home

Chapter 5 Lockstep

image

Contents

1. SI FI 22 PMU 2 a GH ee 23 GI0 t B 2 G ae 24 HETI Level 1 24 a EE iia 25 HTU1 Level 1 25 0 m an gy Se 26 MIBSPI1 Level 1 26 D m nn rar 27 LIN1 Level 1 27 a Erm 28 ADC1 Group 2 28 a E ca 29 CANI Level 1 E A A EE cm Figure F 4 Activation of ESM Low Interrupt in VIM Module Appendix G Requirements for Multi core Proces SOrs Requirement The applicant shall analyze determine and document the configuration of the MCP settings for required unused and dynamic features that will be set either in hardware or in software during start up and during operation The applicant shall verify that the use of those settings enables the MCP to execute the applications hosted on its cores in a deterministic manner with the software architecture and operating system s used in the intended installation For undocumented features the applicant shall contact the MCP manufacturer to identify configuration settings used to control the undocumented features Comment The Lockstep safety feature is always enabled 6 CPU clock cycles after a reset of the controller It is not possible to disable it programmatically via the application code For the synchronization of the two cores running in Lock step the application ha
2. AIRBUS E amp AIRBU SPACE G Lockstep Analysis for Safety Critical Embedded Systems Stylianos Ganitis M N AALBORG UNIVERSITY wba MASTER THESIS FOR MSC IN EMBEDDED SOFTWARE SYSTEMS GUST lt A i5 Electronics and IT Aalborg University http www aau dk AALBORG UNIVERSITY STUDENT REPORT Title Abstract Lockstep Analysis for Safety Critical Lockstep is a state of the art architec Embedded Systems ture of microcontrollers employed in safety critical systems It is a safety Theme feature for detecting core level faults Safety in Embedded Systems Complemented with other safety features for memory peripherals and Project Period communication buses they constitute Spring Semester 2015 systems for use in fields that require high functional safety Project Group The current project was conducted in cooperation with Airbus Defence and Space Thus one of the objectives is to determine if Lockstep could be beneficial to use in Avionic Systems Participant s Stylianos Ganitis ee which is the field with the strictest Ulrik Nyman safety requirements Copies 5 f The project focuses mainly on the Page Numbers 115 Texas Instruments Lockstep archi tecture which was the object under Date of Completion investigation Nevertheless patents August 1 2015 of other semiconductor companies are presented leading the reader to realize the challenges in a Lockstep ar
3. CORE 0 CORE 1 LOCK STEP 25 LOGIC MEMORY CORE 1 STATE Figure 6 6 Prior art for prevention of loss of lockstep 23 This prior art method is not efficient as it makes the cores unavailable for an amount of time with burden on the performance of the system Additionally if this amount of time that is required to recover from the loss of lockstep is long it may lead to the crash of the whole computer system The invention of Hewlett Packard employs an apparatus comprising of at least two processor units operating in lockstep at least one idle processing unit and a controller unit that is capable of copying an architected state of an operational pro cessor unit to the idle one The processor unit that experiences the loss of lockstep generates a signal and after the transfer of its architected state to the idle process ing unit it goes offline In Figure 6 8 is depicted a system that contains three pairs of CPUs each of them operating in lockstep CPU0 in lockstep with CPU1 CPU5 in lockstep with CPU4 CPU2 in lockstep with CPU3 In lockstep mode each pair is connected to a lock step logic see Figure 6 7 62 Chapter 6 Lockstep Related Patents 26 LOCK STEP LOGIC Figure 6 7 Pair of processors in lockstep mode connected to a lockstep logic 23 The node controller is coupled in between and from its perspective each pair ap pears as a single processor In lockstep mode the pairs are first connected to a
4. Figure 5 4 Compare Mismatch Test Vectors 42 5 4 4 Error Forcing Mode In error forcing module two non identical signals are forced to the input ports of the CCM R4F module More specifically a hardcoded test pattern with a value of 0x5 is applied to the first input and OxA to the second input respectively expecting a CCM R4F compare error to be propagated to the ESM As is described in the 1oo1D Lockstep mode see section 5 4 2 the CCM R4F self test is also raised as a safety mechanism in case of a path failure of the CCM R4F compare signal This test ensures that an error between the CCM R4F and ESM path is successfully detected If a fault is not detected in this mode a hardware failure is present The duration of this mode is 1 CPU clock cycle and after its completion the mode is automatically switched to 1001D Lockstep mode 5 4 5 Self Test Error Forcing Mode In this mode an error is injected during the self test of the module In a normal operation of the module a CCM R4F self test flag is expected to be raised to ESM The error can be cleared through the application after the right operation of the mode is verified 5 4 6 Mitigation of common cause failures in CPU In order to prevent common cause errors that can affect both of the pair of the CPU cores Texas Instruments employs some special techniques that at least can reduce this possibility of simultaneously identical error appearance in both of t
5. The current research has focused primarily on the systems with Cortex R4F CPU cores This section aims to emerge the main differences of the Lockstep implemen tation in the most recent microcontoller of Texas Instruments that is targeting to safety critical systems with the Lockstep implementation discussed in section 5 4 5 5 Lockstep implementation on TMS570LC43x by Texas Instruments 49 Outputs CPU2 tothe Outputs from CPU1 to cpulclk system the system CCM R5F CPU Bus Compare 2 cycle delay PD Inactivity Monitor Compare errors Checker CPU ESM Inactivity Monitor VIM Bus Compare tt VIM1 VIM2 Lockstep mode Safe values values that will force the CPU1 checker CPU s outputs Main CPU to inactive states CPU2 Checker CPU Lockstep mode 2 cycle delay cpu2clk Inputs to CPU1 Inputs to CPU2 Figure 5 7 Block Diagram of CCM R5F 40 Regarding the comparison of the CPU outputs the concept is the same with the implementation of TMS570LS31x 21x microcontrollers A 2 cycles delay is applied to the output of the Main CPU and an identical amount of delay to the input of the Checker CPU Before the delay stage the output of the Master CPU propagates to the system while after the delay stage is fed to the CPU Compare Module for Cortex R5F CCM R5F module rece
6. e High voltage should operate better than low e Low temperature should operate better than high 4 DOWNIa DOWN1a is an additional test algorithm for CPU memory read write op erations that is targeting row column decode in the SRAM array the sense amp and sense amp multiplexers the memory array output buffers and with aggressive writes checks potential at speed write failures 5 DTXN2a The DTXN2a algorithm is targeting to insure the correct functionality of the global column decode logic 4 3 Error Correction Code ECC on Flash and SRAM As already mentioned Error Correction Code ECC is employed in the R4F CPUs In microcontrollers a potential corruption of the stored data in the memory is possible due to several reasons Error detection techniques are employed by the semiconductor companies to verify the correctness of the data before or during the execution of the software in the memory modules 8 Adding extra ECC bits to the original data is a technique that facilitates the error detection insuring the data integrity to the system The implementation on the TMSx570 devices is capable of detecting up to 2 bit errors and correct single bit errors The ECC controllers are located inside each of the CPUs This placement of the ECC controller provides two basic advantages The diagnostic covers a potential error detection between the CPU and the memory Moreover the ECC logic is tested itself for potential errors as produced
7. 4 2 Programmable Built In Self Test PBIST Module The Programmable Built In Self Test Module integrated on the Hercules microcon trollers facilitates the testing of on chip memories via a host processor interface A dedicated on chip PBIST ROM is employed to store information regarding on chip memories memory groupings memory background patterns and test algorithms Moreover a test of the memories at their maximum possible speed in application is provided and an intelligent clock gating is implemented in order to conserve power Finally it provides the ability to test its own PBIST ROM itself 40 Regarding the architecture a small co processor with a dedicate instruction set specialized in memory testing is employed which executes routines that are stored in the PBIST ROM In comparison to software based techniques that are used for memory testing from the main Cortex R4F processor the PBIST provides signifi cant advantages e A dedicated memory path is used specifically for the self test of the mem ories resolving the problem of the long access paths that embedded CPUs have to face e The dedicated instruction set that is specifically formulated for memory test ing facilitates the development of such test routines The generic instruction set of an embedded CPU may do the development and execution of such self tests a complex task e The algorithm code on a conventional embedded CPU for this purpose is significantly larger tha
8. the TMS570LS3137 microcontroller by Texas Instruments Initially we investigated the term Lockstep and its relation to safety critical em bedded systems The experiments demonstrate how to configure the Lockstep in the desired mode and how to perform the only potential experiment by setting the CCM R4F module in Error Forcing mode causing artificially an error Addition ally the HiteX SafeTI Kit was employed while it is advertised as a great tool for the assessment of the safety features of the Hercules family of microcontrollers It utilizes a clever concept of a run time injection of a fault with a simple button click via the provided GUI Although the time measurements are not realistic while the errors are produced by software which adds overhead to the real fault detection time Most likely it is more realistic for the assessment of other safety features that are out of the scope of this project The current project contributes to the company as information material for a po tential future use of Lockstep architecture in the innovative avionics systems that it 83 84 Chapter 9 Retrospective produces Moreover it highlights an obscure in terms of literature but interesting safety feature that is already a state of the art in automotive medical and industrial applications The following research question that was thoroughly investigated was the reasons to employ a Lockstep based MCU A simplified answer would be tha
9. toolchain gcc arm none eabi 4_9 2015q1 20150306 pkg and find the built toolchain source code packages and the md5 checksum file Appendix F HALCoGen Configuration Screen shots The following figures depict the configuration that is required in the GUI of HALCo Gen for the first experiment see Section 7 1 1 After setting the appropriate op tions the code is generated to proceed further with the development of the appli cation i File Edit View Tools Window Help G agdaxoaoe QRajiajees aay Start Page TMDX570LS31HDK PINMUX RTI GIO SCI LIN SC MIBSPIL SPI2 MIBSPB SP4 MIBSPIS General Driver Enable SAFETVINIT R4 MPU PMU Interrupts VIM General VIMRAM VIM Channel0 31 VIP r Enable Driver Compilation Se Click and mark the required modules for driver compilation from below 7 Enable RTI driver Mark Unmark all drivers Enable GIO driver Enable SCI driver Enable LIN driver Enable SCI2 driver SCILIN LIN in SCI mode Enable SPI drivers M Enable SPI2 driver Enable SPI4 driver Enable MIBSPI drivers Enable MIBSPI1 driver Enable SPI1 driver Enable MIBSPI3 driver T Enable SP13 driver Enable MIBSPI5 driver Enable SPI5 driver Enable CAN drivers Enable CAN1 driver Enable CAN2 driver Enable CAN3 driver Enable ADC drivers Enable ADC1 driver Enable ADC2 driver Figure F 1 Enable RTI GIO Drivers 103 104 Appendix F HALCoGen Config
10. IEC 61508 is used by safety equipment vendor companies to verify and document that their devices are suitable for use in SIL rated systems 34 2 1 2 ISO 26262 ISO 26262 is a sector specific standard for the automotive industry that focuses on functional safety It is an extension of the IEC 61508 which covers industrial sys tems in general In this standard the safety systems are divided in two categories e Active safety systems that react proactively to prevent an accident such as ACC ABS and ESP e Passive safety systems that refer to the reactive actions after an accident occur rence i e safety belts airbags belt tensioners These systems and especially the electronics must also be secure because a mal function could lead to personal injury The safety lifecycle that is followed in order to ensure functional safety according to ISO 26262 starts with a definition of the system at vehicle level For instance investigating an airbag system a hazard analysis and risk assessment have to be initially performed A potential hazard for this system could be an unexpected inflation of the airbag For each of these hazards a safety goal has to be set For instance how to prevent the unintended inflation of the airbag Similarly to the IEC 61508 each of these safety goals is categorized in Automotive Safety Integrity Level ASIL terms There are 4 available ASILs from ASIL A to ASIL D with an increasing safety classification level
11. International Electrotechnical Commission IEC Initially they were developed for application on system level but they were extended on product and component level They consist of seven parts e IEC 61508 1 General requirements e IEC 61508 2 Requirements for E E PE safety related systems e IEC 61508 3 Software requirements e IEC 61508 4 Definitions and abbreviations e IEC 61508 5 Examples and methods for the determination of safety integrity levels e IEC 61508 6 Guidelines on the application of IEC 61508 2 and IEC 61508 3 e IEC 61508 7 Overview of techniques and measures The IEC 61508 include a series of actions that are necessary to be followed during the design implementation operation and maintenance of the system to comply with the required Safety Integrity Level SIL There are four available SILs each corresponding to a range of target likelihood of failures of a safety function So the safety integrity level is a factor dependent on a safety function rather than on the system or any part of it 18 The SIL4 is considered the strictest while it targets to the protection against the highest severity level risks 10 Chapter 2 Functional Safety and Certification in Embedded Systems For a given process SILs are measures of the safety risk The assignment of a SIL reveals to what extent a process is expected to perform safely and in case of a fail ure to what extent it can react by failing safely 43 On a device level
12. a European company The EADS was founded the 10th of July 2000 by the fusion of the German aerospace and defence company Daimler Chrysler the French missile and aircraft company Aerospatiale Matra and the Spanish aircraft company Construcciones Aeronauti cas SA This new company became immediately the second largest global aerospace com pany after its American competitor Boeing and the second largest European de fence company after the British company BAE Systems 1 1 2 The Electronics Business Line The entity of Electronics is headed by Thomas Mueller and is responsible for pro viding equipment for system integrators that serve both Airbus Defence and Space within the Airbus Group and external customers all over the world The prod ucts are basically addressing the civil defence and security markets by providing ground maritime airborne and space applications The products of the Electronics business line include radars and IFF systems electronic warfare devices avionics space platform electronics space payload electronics as well as optronic sensors 1 1 3 TEOHD5 Department The current research was conducted on behalf of TEOHD5 department of Avionics Center in Friedrichshafen in Germany The title of the department describes what it has to deal with TEOHD5 Platform Software Avionics Computer The mission of the department is the development delivery and maintenance of Basis Software The definitions of Basis Softwa
13. both CPU signal ports are fed with Os e 3rd cycle both CPU signal ports are fed with As e 4th cycle both CPU signal ports are fed with 5s Compare Mismatch Test In the compare mismatch test patterns with one different bit are applied to the input ports of the CCM R4F module see Figure 5 4 For instance when a sig nal with all 1s is applied to the first line a sequence of signals are applied to the second starting by flipping the least significant bit and on Cycle 0 1 2 3 The comparison of the unequal vectors is expected to generate a mismatch in the pro portional bit i e bit 0 in cycle 0 bit 1 in cycle 1 to indicate a correct functionality without though propagating an error signal to the ESM 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments 45 In case that a match is detected in any of the unequal comparisons the self test er ror flag is set the self test error signal is raised and the test is terminated revealing an unintended behavior of the module CPU 1 Signal Position CPU 2 Signal Position n 1 8 5 4 1s 1s 1s 1s 2 2 1 1 1 ejeje eje ejejejej s ejeje ejo oje je efw ejoje ejn ejejojeje eje jejojo 1 1 1 2 1 1 1 1 ejeje efjejeje 1 1 1 1 1 1 1s N 2 h e H v e e e e e e e is 1 1 1 1 1 1 1 2n 0 1 ds 121 1 2 SEE E N E E E 1s
14. in both of the cases an RM48L952 3 1 Hardware 25 Figure 3 2 SafeTI Hitex Safety Kit 17 Additionally a TPS65381 Q1 companion chip is integrated to monitor the opera tion of the SDUT device This companion chip is a power source that contains a watchdog which requires specific messages from the SDUT in a predetermined amount of time in order to detect potential hardware faults The software stack of the system consists of four layers from bottom to top 1 The hardware abstraction layer 2 The Texas Instruments SafeTI Diagnostic Library 3 The SAFERTOS operating system 4 Example user application SAFERTOS SafeTl Diagnostic Lib SafeTI HSK Board Host Control GUI Hercules CME VCC SPI Reset MCU Companion M Safety MCU OAA RIOT sN ip Texas INSTRUMENTS Fault Injection and Monitoring SPI Fault Injection and Monitoring Hercules cu Host Board Communication Power Supply Se Control MCU 4i TEXAS INSTRUMENTS Figure 3 3 SafeTI HSK Block Diagram 33 26 Chapter 3 Embedded Software Development Essentially the purpose of the kit is to monitor the reactions of the SDUT device when faults are injected in real time by the C amp M The C amp M is responsible for monitoring the behavior of the device under test and provide the results and the measurement of system response times in a GUI based application in a graphical manner 3 2 Software The software tools that were used during t
15. lockstep logic Figure 6 7 and then to the node controller Figure 6 8 In one im plementation the node controller is aware of the architected state of all the pairs that are connected to it Another case may be that the node controller is used just as means for communication between the processors In this way it could store internally or to an external component of the system the architected state of the processors A further scenario could be that the node controller will facilitate the storage of its architected state to another processor of the system Finally one pro cessor could have the role hot standby that will be idle and in case of a failure loss of lockstep on another processor it could recover it by adapting its archi tected state with the help of the node controller 100 111 113 CPU 0 NODE CONTROLLER CPU 4 125 gt 121 123 Figure 6 8 Architectural details for recovery from loss of lockstep 23 A flow diagram for the recovery of lockstep via the hot standby processor is 6 5 System and method to increase lockstep core availability Infineon Technologies AG3 demonstrated in Figure 6 9 An example case would be a process depicted in the first step of the flow diagram is executed in lockstep on the pair of CPU0 and CPU1 of the Figure 6 8 An error event is detected by CPU0 indicating an impending loss of lockstep and it signals the node controller for this failure The node control
16. 18 Figure 6 2 Error signaling Interface in a lockstep architecture by Freescale 47 A simplified model of the current patent is illustrated in Figure 6 2 The system in cludes two substantially identical processors operating in lockstep two memories that are bi directional connected to the system interconnect and may be static ran dom access memory SRAM or dynamic random access memory DRAM or flash an error logic coupled in between of the two processors which bi directionally sends and receives control signals to the processors and the lockstep logic includ ing a bus interface unit BIU that is employed for the connection of the lockstep logic to system interconnect The lockstep logic includes the comparator logic that compares the outputs of the two or more processors and reports any potential error indicating that the pro cessors do not operate in lockstep mode anymore Furthermore it may acts as a coordinator to couple the non failed processor to the system interconnect via BIU or trigger an exception or directly reset one or both processors The current embodiment is referring to lockstep operation of the processors where both execute the same program code simultaneously accessing the same addresses and anticipating identical outputs A read request accesses the same data saving the content in their respective cache memories exactly on the same clock cycle and in the same location On the other hand they can also operate in
17. Tools on Ubuntu 8 10 E 1 1 Install Ubuntu Ubuntu 8 10 E 1 2 Tune environment and install required software E 1 3 Build GNU Tools for ARM Embedded Processors Appendix F HALCoGen Configuration Screenshots Appendix G Requirements for Multi core Processors Bibliography vii 87 89 91 95 97 99 99 99 99 101 103 107 113 List of Figures 1 1 2 1 2 2 2 3 2 4 2 5 2 6 2 7 3 1 3 2 3 3 3 4 4 1 5 1 5 2 5 3 5 4 5 5 5 6 5 7 5 8 5 9 6 1 6 2 6 3 Airbus Defence and Space logo ooa 3 ASIL classification examples 28 2504 soe scan oh ee kee s 11 Criticality Level Pyramid for Avionic Systems 45 13 lool Block Diagram 19 4 d 3 otis ee vee eB Dao De eee a 15 Two Out Of Three Architecture 2003 46 16 One out of One with Diagnostics 1001D Two out of Two 2002 Architectures 46 o i000 amp aie See aoe 8 ee Gee Bat eo 16 The Hercules ARM Family of safety MCUs 30 18 SafeTl Devices 32 2 2 2 4 etn 2 Avid owe BP Ai Wie aye ee Ate E 19 TMS570LS3137 Hercules Development Kit 23 SafeTI Hitex Safety Kit 17 ih a eck os le ee ee es Ge 25 SafeTI HSK Block Diagram 33 3 084 o2b 2a whe aoe ee a 25 Hardware Abstraction Layer in an Embedded system 36 27 Test Coverage and Duration 40 04 31 MPC564xL Block Diagram 12 acc lt 2 ce ale le ee ae we ee 41 In
18. a coun terpart of the well established DO 178 addressing electronic hardware issues to reach the certification compliance DO 254 follows a top down approach on electronic hardware starting from the system level The scope of this document is to address certification issues and to ensure the functional safety on electronic hardware components where the com plexity of these devices is increasing and their size is shrinking 4 It relates ex clusively to airborne systems and it is worth mentioning that electronic hardware is not certified as a stand alone entity There is a distinction that derives from an analysis of a hardware component It may be either a Simple Electronic Hardware SEH or a Complex Electronic Hard ware CEH If it is determined that a component belongs to the former category after the conduction of the appropriate analysis i e the component is fully testable or it does not contain silicon based logic the DO 254 document is not required More specifically a simple component is one that provides the capability of per forming exhaustive testing on it examining all the potential inputs Further steps in an abstract form could be defined as follows e Systems aspects are considered first System or function Development As surance Level DAL is determined for a hazard that may occur at aircraft level Additionally a hazard assessment should be performed System func tions and DAL are allocated to hardware Requireme
19. and per manently The developer is not able to disable it The current experiment is kept as simple as possible in terms of functionality in order to avoid the complexity in code that a more sophisticated application could involve The validity of the experiment is not affected while the error detection is independent of the application complexity and the size of the code The errors that may occur are exclusively related to the hardware and the conditions that the system could be physically exposed to What is visually perceived during the execution of the code is that when the pro grammable push button is pressed it turns two LEDs on on the evaluation board e The red LED indicating that a hardware error is present on the system and is turned automatically on e The white LED that is set by the code when the button is pressed If the button gets pressed for a second time it turns both of the LEDs off The behavior of this application was designed with the purpose to artificially gener ate an error when the user presses the programmable button and monitor if it is efficiently detected by the comparator module When the error is detected the ex ecution of the software is halted With a second press of the button the error is cleared from the ESM module and the execution of the software continues 7 1 Experiments 71 yN wv gt KA A v mia o oreo omemene xe mammi Figure 7 1 Entering Error forcing mode
20. chapter 66 Chapter 6 Lockstep Related Patents is to give an insight about the potential problems and the suggested solutions for the lockstep architecture and not to analyze all the details of each invention 6 6 Computing with both lockstep and free step processor modes Hewlett Packard Development Company L P The current invention demonstrates a method for systems comprised of a set of two or more processors with capability to operate in two step modes 1 lockstep or free step mode Strongly depending on the severity in the safety requirements of a task it can be assigned to a processor to work in a free step mode or to be coupled in a set of two processors for execution in lockstep mode The benefit is an optimal way to trade off between high performance and system integrity by using free step or lockstep mode respectively a j INTERFACE a E i aerial OPERATING evsiem bate hele ea Figure 6 12 A computer system including four processors with the suggested architecture by Hewlett Packard 1 The example system in Figure 6 12 includes four processors placed in pairs P11 P12 P21 22 and each pair interfaces to the rest of the system by the Core Elec tronics Component CEC The rest of the system consists of memory RAM or hard disk etc and I O channels The CEC includes an interface logic for the communication of the processors with the rest system and a loss of lockstep logic that controls the error detecti
21. cruise control up to 100m RADAR and LI DAR Vehicle dynamic and chassis control ABS braking systems Electronic stability program ESP Blind spot detection Pre crash detection 40 Chapter 5 Lockstep e Hybrid electric vehicles The implementation of this MCU is dual core dual issue To increase instruction throughput the processor can issue certain pairs of instructions simultaneously two instructions per clock cycle 6 Instead of using two MCUs with a primary core executing the software and a checker core executing some safety diagnostic software MPC564xL family contains two channels each consisting of a core bus interrupt controller memory controller and other core related modules This MCU can be pre configured to operate in two distinct modes 1 Decoupled parallel mode DPM independent core operation 2 Lockstep mode LSM parallel core operation Depending on the nature of the application the developer has to employ the right operating mode The trade off between performance and software transparency has to be considered under the choice of the mode In LSM the two cores result in the performance of one while high diagnostic coverage and short detection in tervals without software interaction is achieved On the other hand in DPM the two channels work independently providing higher performance but lacking of diagnostic coverage without software interaction at the hardware level 3 More spe
22. e acti vate a redundant engine the system sets the defective engine to idle state In high altitude the level of radiation that occurs from cosmic rays is significantly high As is already mentioned such a radiation can cause a bit flip in the core of the system that executes this temperature calculation In case of such a bit flip during the conversion from a single core MCU without any diagnostic an erroneous fire alarm could be raised The erroneous calculation could end up in a non realistic over limit temperature value leading to an undesirable deactivation of the engine and getting the flight crew into trouble In presence of a Lockstep MCU such an error would be detected and could temporarily set the system into a safe state The two cores operating in Lockstep most likely would not experience the same bit flip and they would produce divergent outputs This discrepancy in the comparison of the output signals would generate an error From this point and on everything would depend on the system requirements For instance if the system could handle to wait the time that is required for a soft reset of the MCU it could be determined if the error is temporal or permanent In case of permanent error the system could finally set the engine idle and enable a redundant one In case of a soft error a reset of the system could be enough to continue operating efficiently In this example we tried to eliminate the complexity of such a syste
23. fault injection in the Hardware an ESM error signal is raised 5 In the ESM IRQ handler another pin is raised indicating to the monitor that the fault is detected The steps that occur in a fault injection in the C amp M and SDUT devices are demon strated in Figures 7 7 and 7 8 respectively NE a IRQ handler for Signal message IRQ handler for Signal message recognize F iai PEER profile profile signal with sampled profile signal with sampled a a message samples SS ee is sent samples SS W i sent to SDUT timestamp timestamp Figure 7 7 Dataflow of C amp M Device In a Fault Injection Operation 33 NJ TEN N eA E N ie N N end test N ome application part a bide rary sib gt S safety library application part set Figure 7 8 Dataflow of SDUT Device In a Fault Injection Operation 33 Asa result during the pre work for the error injection peripherals or OS interrupts may occur and they add overhead to the detection time This overhead makes the measurement of the detection time non deterministic In simple words the reason is that the fault is injected by software and is not a realistic case where a hardware error generates the highest priority interrupt More specifically the error detection time in Error forcing mode would expected to be 1 CPU clock cycle as is the duration of the Error Forcing test see Section 5 4 4 HSK integrated software utilizes the functions of the SafeTI Diagnostic Library provided by Tex
24. faults are detected by the CCM R4F module 4 3 Error Correction Code ECC on Flash and SRAM 35 The Single Error Correction Double Error Detection SECDED ECC logic is a di agnostic for the on chip SRAM and Flash memory modules Accesses to the mem ory modules are protected with this embedded in the CPU logic 8 bits of ECC code are provided for 64 bits of data or instructions fetched from the memory The expected ECC code is calculated by CPU and compared with the one received from the memory This logic is capable of correcting a single bit error Therefore in case of a single bit error it is flagged and corrected by the CPU In case of multi bit errors they are only flagged by the CPU and the application has to act appropri ately in such a case The logic is different for each memory type and depending on the CPU In case of flash memory the ECC check bits are programmed along the program data In RAM modules the ECC code is generated by hardware with each write operation and is stored in specific ECC memory location The detection of an error is per formed during a read operation where the ECC bits from the special ECC space are read along the data and if it is a single bit error it is corrected directly Chapter 5 Lockstep The term lockstep is used in a variety of circumstances In the military the march ing where the legs and the arms of a drilling soldier were absolutely in sync with the corresponding of the soldier in f
25. for humans and or for the environment As a result appropriate measures must be applied during the design implementation installation and maintenance of safety critical systems both in terms of software and hardware 37 At the beginning of a safety critical system development a hazard and risk analysis is required to determine the safety integrity level for each component individually and for the whole system Therefore different standards exist for each sector that provide guidelines for the lifecycle of hardware and software development As a result safety critical systems should not only be safe but they need to look safe in order to be in compliance with the respective safety standard by the certification authority A vast majority of safety critical systems is in the form of embedded systems where a microprocessor is employed and it is not visible from the outside envi ronment The highest volume production of safety critical embedded systems is observed in automotive and domestic product development companies Safety is an abstract term Although in the field of the embedded systems the following definition could give a good interpretation Safety is a property of a system that it will not endanger human life or the environment 2 Chapter 1 Introduction and a safety related or safety critical system may be defined as The system by which the safety of equipment or plant is assured 37 The field of avionic
26. in the development of avionic systems The initial version DO 178 was addressing the software life cycle in avionics A later version the DO 178A was enhanced with the classification of the system into criticality levels and component testing methods to increase the quality of the overall system The next version DO 178B was developed from scratch Software quality was increased by the addition of more planning methods more application development practices the integra tion of COTS products and tools and the integration of techniques for continuous testing and monitoring of the system in real world conditions The latest DO 178C version addresses known issues of the previous version while it provides new tool qualification guidance The benefits derived from following the DO 178 series are not restricted only at the certification The guidelines facilitate the verifiable software quality higher re liability greater re usability decreased maintenance lower life cycle costs faster integration to the hardware and increased probability of error detection during the development phase 14 Chapter 2 Functional Safety and Certification in Embedded Systems Design Assurance Guidance for Airborne Electronic Hardware DO 254 DO 254 was released in 2000 and was formally required by FAA from 2005 The Eu ropean commission EUROCAE European Commission for Civil Aviation Equip ment provides the ED 80 containing the same content with DO 254 It is
27. of Hercules Family Texas Instruments provides a series of safety features for the Hercules family of mi crocontrollers that target to the functional safety of the applications The current research focuses on the Lockstep architecture and the benefits that it may provide in a safety critical system but this section aims to give an overview of the other provided features that make Hercules a competent family of microcontrollers in a variety of safety applications currently in the transportation industrial and medi cal markets Instead of using software mechanisms to detect hardware errors Hercules micro controllers implement safety in hardware that reduces the software overhead while it increases the performance The Lockstep architecture is presented thoroughly in Chapter 5 and eliminates redundant system requirements and consequently ad ditional cost In absence of the Lockstep architecture redundant subsystems are employed to comply with functional safety requirements This redundancy of the subsystems increases the cost and the space of the system significantly Except of Lockstep as a safety feature of the Hercules family the CPU hardware built in self test BIST is employed for detection of latent defects without extra software overhead Furthermore for the protection of memories and buses Error Correction Code ECC logic is integrated in the CPU Moreover high diagnostic coverage is achieved while all Random Access Memories RAM
28. should be treated during the certification process Moreover an example use case of Lockstep archi tecture in an avionic system is presented 8 1 Certification Actions One of the research questions for this project is How Lockstep should be treated in the certification process of an avionic system As was already mentioned in chapter 2 1 3 the certification process is applied at a system level As a result we cannot conclude if the Lockstep architecture can be employed in avionic systems or not and under which DAL is suitable to use such an MCU It is strongly dependent on the nature of the application and the hazards that could derive For a COTS hardware component as is the TMS570LS3137 MCU a series of analysis is required i e SEH or CEH component hazard analysis allocation of hardware component to each function and so on Although such an analysis is out of the scope of this project and impossible in absence of a specific system and application Instead we focused on how this processor should be treated in a certification pro cess Following a list of requirements that are included in the CAST 32 Multi core Processors paper provided by FAA the analysis regarding the Lockstep architec ture is attached in Appendix G we are confident to claim that the TMS570LS3137 MCU should be treated as a single core processor Beside the fact that the CAST 32 paper mentions that This paper does not apply to the following MCP archite
29. the comparator detects a mismatch it generates a control signal notifying the state control logic of the CPU cores see Figure 6 10 In another implementation a control register is employed as shown in Figure 6 10 that is notified by the comparator for a mismatch and it generates the control signals for controlling the state control logic of the CPU cores In any of the aforementioned scenarios the state control logic of the CPU cores receive a control signal that there is a discrepancy in the comparison of the outputs Therefore it generates a read signal to the state buffer the saved state is retrieved and the execution of the task restarts To sum up a known good state of the master CPU core is saved in the state buffer before the execution of a task In case of a comparison error in the lock step execution the core states are reverted back to the known good state and the instructions are re executed The benefit of this method is that in an erroneous situation only the CPU cores are winding back to a safe state and not the entire system i e peripherals As a result the core availability meaning the time that the core is operational and able to execute instruction sets is significantly increased Several enhancements are proposed in this patent such as using a counter to pre vent an endless loop of reverting to the same state when executing the same task which could derive from a permanent error However the purpose of this
30. the several semiconductor companies that have already integrated it in their microcontrollers The available hardware for this research was the TMS570LS3137 Hercules Devel opment Kit provided by Texas Instruments that has an integrated delayed lockstep mechanism and we focused on this specific implementation The following re search questions were the primary concern of this project 1 2 What is Lockstep in terms of embedded systems What are the reasons for using Lockstep as an error detection mechanism What kind of errors can it detect How to use Lockstep specifically for TMS570LS31x 21x How Lockstep should be treated in the certification process of an avionic system What are the limitations and drawbacks of a Lockstep architecture Chapter 2 Functional Safety and Certification in Embedded Systems The goal of this chapter is to describe the terms around safety in embedded sys tems and to highlight the differences among them Additionally safety standards and their objectives are briefly presented While the object under investigation is the Hercules family of microcontrollers we will point out the safety standards that it complies with and the approach that Texas Instruments follows for this purpose Finally we focus on the avionics standards that is the sector of the department that supported the current project 2 1 Functional Safety and Fault Tolerance An increasing amount of electronic devices inv
31. 0 984 B2 2012
32. 1 Block Diagram 19 Anthony Vaughan the manager of Hercules safety group of Microcontrollers in North America describes the usability of this 1001D architecture in the aerospace area and how it can be utilized to meet the strict safety criteria that are required in this industry with the required highest level functional safety 46 A common approach to ensure the functional safety in embedded aerospace elec tronic systems is the employment of the 2003 architecture see Figure 2 4 Sys tems following this architecture comprise of 3 probably dissimilar embedded controllers and a voting circuit In case of a fault occurrence in one of the three controllers the system availability remains unaffected while the other two control the system This fail operational architecture is proven to be efficient for flight critical applications where is required high system availability and extremely low failure rate Beyond the fact that 1001D is already used in a variety of safety critical applica 16 Chapter 2 Functional Safety and Certification in Embedded Systems gt Output Circuit Two out of Three 2003 Figure 2 4 Two Out Of Three Architecture 2003 46 tions is not enough for the safety requirements of the aerospace area Instead a Two out of Two with Diagnostics 2002D architecture may be utilized to cre ate fail safe systems This proposal that is analyzed in the article of A Vaughan deploys two separate channels each con
33. 3 systems They have been independently assessed by Exida and they were developed to address the risks of MCU devel opment and to restrict the random failures that have been noticed in safety critical systems during operation lExida is one of the leading authorities in the field of functional safety 18 Chapter 2 Functional Safety and Certification in Embedded Systems Hercules Platform Value Line High Performance High Performance Transportation and Transportation and Industrial and Medical Safety MCUs Safety MCUs Safety MCUs Transportation Applications Transportation Applications Industrial Applications Automotive Q100 Qualification Automotive Q100 Qualification Medical Applications 40 to 125 C Operation 40 to 125 C Operation TMS Qualification LIN CAN Connectivity FlexRay CAN Connectivity 40 to 85 105 C Operation Supports Safety for Developed to Safety Standards Ethernet USB Connectivity IEC 61508 Systems IS026262 ASIL D Developed to Safety Standards Cortex M3 to 100 MIPS IEC 61508 SIL 3 IEC 61508 SIL 3 Cortex R4F over 280 MIPs Cortex R4F over 320 MIPs Figure 2 6 The Hercules ARM Family of safety MCUs 30 Combining such an MCU that is specifically developed to comply with a SIL and a RTOS that ensures the requested requirements for this safety level simplifies the certification process for both the certification authority and the development com pany In this dir
34. 3137 has 3MB flash and 256KB RAM which are protected with Single bit Error Correction and Double bit Error Detec tion SECDED The flash is an Electrically Erasable and Programmable Memory EEPROM with a 64 bit wide data bus Furthermore program and read opera tions are performed with a 3 3V supply input to the flash memory Same input supply is used for the I O operations The TMS570LS3137 device has 3MB of integrated flash and 256KB of data RAM Both the flash and RAM have single bit error correction and double bit error de tection The flash memory on this device is a nonvolatile electrically erasable and programmable memory implemented with a 64 bit wide data bus interface The flash operates on a 3 3 V supply input same level as I O supply for all read program and erase operations When in pipeline mode the flash operates with a system clock frequency of up to 180 MHz The SRAM supports single cycle read and write accesses in byte halfword word and double word modes Additionally two Next Generation High End Timer N2HET co processors and two Analog to Digital Converters ADC with up to 24 inputs are integrated in the MCU The N2HET module is an intelligent timer that is used in real time applica tions while it provides sophisticated timing functions A High End Time Transfer Unit HTU with a built in Memory Protection Unit MPU facilitates the DMA type transactions of N2HET and main memory Moreover two Multi Buffered A
35. 4 4 6 2 are measures to prevent the common mode failure in the redundant CPU architecture The approach of Texas Instruments in the TMS570 family of processors for facilitat ing the application developers to achieve functional safety is called Safe Island This term describes the combination of hardware and software diagnostics while 20 Chapter 2 Functional Safety and Certification in Embedded Systems at the same time concerning the cost In this philosophy elements like power clock reset CPU Flash SRAM and associated interconnect to Flash and SRAM operate continuously hardware safety mechanisms to insure the right execution of software Therefore after ensuring the correct operation of the aforementioned elements software based diagnostics can be executed to verify the correctness of other device elements such as peripherals 21 Chapter 3 Embedded Software Development In this section we present the hardware and software that was used during this project The hardware selection was predetermined by the company while the TMS570LS3137 fulfills all the safety and communication features that are needed to be employed as a coordinator between two modules of a large radar project that is currently in progress Regarding the software the recommendations of Texas Instruments were used to provide stability and flexibility regarding a potential certification process such as the Code Composer Studio IDE the HALCoGen for automatic co
36. F they try to transfer it to a simpler component with lower failure rate where the error detection is easier and the self test to ensure the right operation of the component itself is faster Definitely the CCM comparator module is much simpler than the core Additionally the time execution of CCM self test is much less than the CPU self test 10 405 and 364 000 usec respectively Along the current report we point out several advantages of the Lockstep architec ture It is a run time safety mechanism to detect errors without adding any burden in the development process From the developer s point of view a single core pro gramming model is required avoiding the complexity in programming multi core processors Additionally while it is a hardware implemented diagnostic the error detection is faster than implementing any software that checks for errors The only disadvantage we can notice is the increased power consumption that such an MCU require in comparison to a single core Obviously in any safety related system the designer has to decide on a trade off between either performance or cost in terms of silicon area or power consumption and safety related functions 9 2 Conclusion To sum up the objectives of the current project were accomplished giving to the reader an overview of the Lockstep architecture in general and detailed informa tion about the Lockstep implementation of Texas Instruments using the Cortex R4F processor The res
37. NCE Insert a delay of 500ms to avoid button bounce rtiStartCounter rtiCOUNTER_BLOCK0 Reset the button pressed indicator flag buttonPressed BUTTON_LOW break 94 USER CODE END USER CODE BEGIN 4 USER CODE END Appendix B Main function of the first experiment Appendix C CCMR4F Compare Function x file ccmr4 c brief CCMR4 Driver Source File to check CCMR4 compare error date 15 June 2015 x version 1 00 000 c Texas Instruments 2009 All rights reserved include ccmr4 h include gio h include het h include lt stdio h gt extern unsigned int ESM_High_Int_Flag x fn void CCM_R4_Compare void brief CCMR4 Compare fail Error creation and check routines void CCM_R4_Compare void Setting the Error forcing mode CCMR4Reg gt CCMKEYR 0x00000009 Waiting for interrupt ESM Interrupt flag while ESM_High_Int_Flag Clear the Interrupt Flag ESM_High_Int_Flag 0 95 Appendix D ARM GCC 4 9 2015q1 Big Endian Patch home steliosgan gcc arm none eabi 4_9 2015q1 20150306 src gcc gcc config arm t rmprofile orig 2015 02 27 05 16 50 000000000 0100 home steliosgan gcc arm none eabi 4_9 2015q1 20150306 src gcc gcc config arm t rmprofile 2015 03 20 10 59 56 870119629 0100 13 8 13 10 MULTILIB_OPTIONS mflo
38. S3137 The SafeTI Hitex Kit is an evaluation board implemented by Hitex in cooperation with Texas Instruments for assessing the safety features of the MCU The kit has 2 Hercules MCUs on it One is the primary MCU and the second is used to inject faults The software tool can be used to choose and profile the reactions of the main MCU The purpose of this experiment is to highlight the capabilities that this evaluation kit provides related to the Lockstep architecture and the CCM R4F module Measurement of Test Execution Time The profiling window in the HSK Monitor GUI provides the capability to perform time measurements for certain tests In the drop down menu when selecting the 7 1 Experiments 73 CCMR4 module which is the main concept of this project 3 tests are available and they are executed when the user presses the Profiling button e CPU Lockstep The CPU Lockstep performs a self test of the CCM R4F module see Section 5 4 3 and the result of this test is depicted in the Figure 7 2 The amount of time that is required for a self test of the CCM R4F module is 14 usec as is shown on the right bottom corner of the HSK Monitor GUI ES c Figure 7 2 CPU Lockstep Test Time Measurement e CPU Error Forcing Test The execution of the CPU Error Forcing test reveals the duration of an error de tection that is applied on the compare error output signal of the compare unit see Section 5 4 4 The de
39. S570 LS and TMS570 LC differences Available http e2e ti com support microcontrollers hercules f 312 p 357489 1254554 Accessed 13 06 2015 TMS570 wiki page Available http processors wiki ti com index php Category TMS570 Accessed 07 06 2015 TMS570LC43x 16 32 Bit RISC Flash Microcontroller Technical Reference Man ual Texas Instruments May 2014 TMS570LS3137 page Available http www ti com product tms5701s3137 Ac cessed 04 06 2015 TMS570LS31x 21x 16 32 Bit RISC Flash Microcontroller Technical Reference Man ual Texas Instruments November 2012 Revised 2013 Understanding Safety Integrity Level User s Manual Magnetrol William C Moyer Dripping Springs TX US Dynamic Lockstep Cache memory Replacement Logic Pat US2012 0272006 A1 2012 T Baghai V Hilderman Avionics Certification A Complete Guide to DO 178 Software DO 178C Update DO 254 Hardware Second Edition Avionics Communications Inc 2013 Anthony Vaughan Lockstep microcontrollers advance aerospace electronics safety Available http www ecnmag com articles 2013 03 lockstep microcontrollers advance aerospace electronics safety Accessed 10 07 2015 Michael J Rochford Round Rock TX US Davlde M Santo Munich DE William C Moyer Dripping Springs TX US Error detection and communi cation of an error location in multi processor data processing system having processors operating in lockstep Pat US 8 09
40. Tl QM TMS320F2802x SafeTl 60730 Class B TMS320F2803x SafeTI 60730 Class B TMS320F2806x SafeTI 60730 Class B TMS320F2833x SafeTl 60730 Class B TMS320 2823x SafeTl 60730 Class B TPS65381 SafeTl 26262 ASIL D Power management SafeTI 61508 SIL 3 TPS65310 SafeTl 26262 ASIL B DRV3201 SafeTI 26262 ASIL D SAATELE SafeTl 61508 SIL 3 SN65HVDA1040 SafeTl QM QM CAN transceiver 4950 54x 251 Power and sensor TPIC7218 SafeTl QM QM interface ASSP Sensor signal PGA400 SafeTl QM QM conditioner Figure 2 7 SafeTI Devices 32 temporary failure of SRAM or shorting of adjacent signals in the MCU and so on For the prevention of such errors semiconductor companies utilize safety mech anisms that execute during the operation of the system within the control loop Because of the temporal constraints in real time embedded systems the burden of the continuously or even periodically execution of these diagnostic tests became a significant issue for the semiconductor companies As a result the need for parallel execution of such tests became vital The Hercules MCU platform provide inte grated into hardware diagnostics facilitating tight control loops while the checks run in parallel The application developer has to be aware of common mode failure when imple menting such safety mechanisms Common mode failure describes the case that faults occur in commonly shared signals The temporal and physical diversity as is presented in Sections 4 4 6 1 and
41. U1 is di 53 54 Chapter 6 Lockstep Related Patents rectly fed to the system before experiencing the delay stage and reach the CPU compare unit The role of the CPU2 is to feed the CPU compare unit with a second signal that under normal conditions without an error detection should be identi cal to the delayed output signal of the CPU1 Therefore the output of the CPU2 is not propagated to the system and it does not affect the internal states of memories or registers SYS_OUT ourtd DEL 1 m k ccu OUTe OUTI OUTI SYS_OUTI gt ai OUT N m1 CPU 1 CPU 2 SYS_0UT2 M IN2d DEL 2 IN17 IN2 n SYS_IN SYS_IN Figure 6 1 Prior art and Texas Instruments invention 29 In the prior art as shown in Figure 6 1 on the left side the CPU1 and CPU2 are executing the same program code that are coupled via the input SYS_IN line at the same time The output signals of the CPUs OUT1 OUT2 are fed to the CCU CPU Compare Unit and they are checked for match or mismatch Moreover they are propagated to the system via the output buses SYS_OUT1 and SYS_OUT2 On the right side of Figure 6 1 is depicted the current invention of Texas Instruments The input bus SYS_IN is directly connected to the CPU1 that is known as the master CPU while a delay stage DEL2 interferes before feeding the checker CPU CPU2 with the input signal The output of the CPU1 OUT1 is propagated directly to the system via the SYS_OUT bus but it
42. acteristics as reliability and availability Reliability describes the likelihood for a system to exe cute efficiently an assigned task in a specified amount of time Availability is the percentage of the entire system service life during it can be used to execute the assigned task 31 With the term Fault Tolerance is described the ability of a functional unit to con tinue the execution of a required task even in presence of faults or errors 19 In 2 1 Functional Safety and Fault Tolerance 9 terms of hardware the fault tolerance is its ability to continue performing a re quired task even when faults occur For instance hardware fault tolerance 0 is interpreted as the inability of the system to continue operating efficiently even in presence of a single error Generally fault tolerance N means that the system will lead to the loss of a safety function in occurrence of N 1 errors 43 2 1 1 TEC 61508 The challenge in electrical electronic or programmable electronic systems E E PE is to utilize a variety of safety functions to prevent or mitigate in a safe way a fault that may occur which would lead to an unexpected behavior of the system that could be catastrophic for the surrounding environment The assessment of risk to minimize this kind of failures in any sector is supported by the IEC 61508 standard series They are international standards for E E PE safety related systems including both hardware and software developed by the
43. are Texas Instruments The lockstep invention of Texas Instruments addresses a dual CPU architecture microcontroller where the outputs of the CPUs are compared 29 The two CPUs execute the same program code where a delay stage is applied to the output bus of the first CPU1 and a second delay stage is coupled to the input of the second CPU CPU2 The CPU compare unit is responsible for the comparison of the two signals the delayed output of the CPU1 and the output of the CPU2 The amount of delay stages must be predetermined and equal and it can be a number of clock cycles or fractions of clock cycles of the system clock The signals that are com pared in the CPU compare unit are data that belong to the same operation of the program code while the time shift is compensated before reaching the comparator The slight time difference in the execution of an instruction by the two CPUs re ferred as execution in a delayed lockstep ensure the detection of a common cause errors such as a short voltage drop or a glitch in the clock signal Practical im plementations have shown that the appropriate delay to detect the majority of the common cause errors is between 0 5 and 2 cycles of the system clock The CPU compare unit is adapted to report a match or mismatch in the comparison of the two signals In case of a mismatch the system should handle the error detection appropriately The system performance remains unaffected while the output of the CP
44. as Instruments for the operation of the safety features The profil ing time measurements of our experiments deviate from the timestamps that are presented in the manual of the SafeTI library 78 Chapter 7 Results TestType Profiling Time in us CCMR4F_SELF_TEST CCMR4F_ERROR_FORCING_TEST CCMR4F_SELF_TEST_ERROR_FORCING Figure 7 9 SafeTI CCM R4F Related Functions Time in usec 20 The CCMR4F_SELF_TEST in Figure 7 9 corresponds to the CPU Lockstep profil ing test 14 usec the CCMR4F_ERROR_FORCING_TEST to the CPU Error Forcing Test 15 usec and the CCMR4F_SELF_TEST_ERROR_FORCING to CPU Self test Error Forcing 6 usec respectively To sum up the current project covers all the potential experiments that could be conducted related to Lockstep architecture for the available hardware The restric tions in Texas Instruments implementation i e Lockstep mode is disabled in Debug Mode do not give the opportunity for more sophisticated experiments Finally the Hitex SafeTI Kit provides an innovative way of run time fault injection to the main MCU using a helper MCU but the fault detection time measurement is not a useful feature for our experiments while the values are not realistic and may vary in different consequent fault injections Chapter 8 Lockstep In Avionics This chapter focuses on the process of verifying a COTS hardware component in the special field of Avionics and how the Lockstep architecture
45. at abi softfp mfloat abi hard MULTILIB_DIRNAMES softfp fpu MULTILIB_OPTIONS mfpu fpv4 sp d16 mfpu vfpv3 d16 mfpu fpv5 sp d16 mfpu fpv5 d16 MULTILIB_DIRNAMES fpv4 sp d16 vfpv3 d16 fpv5 sp d16 fpv5 d16 MULTILIB_OPTIONS mbig endian MULTILIB_DIRNAMES big endian MULTILIB MATCHES march armv6s m mcpu cortex m0 MULTILIB MATCHES march armv6s m mcpu cortex m0plus MULTILIB MATCHES march armv6s m mcpu cortex ml 89 11 91 20 ifneq filter armv7 armv7 r armv7 a subst comma space with_multilib_list MULTILIB_REQUIRED mthumb march armv7 MULTILIB_REQUIRED mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 97 98 Appendix D ARM GCC 4 9 2015q1 Big Endian Patch MULTILIB_REQUIRED mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 MULTILIB_REQUIRED mthumb march armv7 mbig endian MULTILIB_REQUIRED mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 mbig endian MULTILIB_REQUIRED mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 mbig endian MULTILIB_OSDIRNAMES mthumb march armv7 armv7 ar thumb MULTILIB_OSDIRNAMES mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 armv7 ar thumb fpu MULTILIB_OSDIRNAMES mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 armv7 ar thumb softfp MULTILIB_OSDIRNAMES mthumb march armv7 mbig endian armv7 ar bi
46. avior The three main characteristics of the CCM R4F module are 42 e Run time detection for errors e Self test capability e Error forcing capability 5 4 1 Architecture and CCM R4F module operation The lockstep architecture and the interconnection diagram of the CCM R4F mod ule is depicted in Figure 5 2 Two Cortex R4F CPUs are fed with identical input and their outputs are driven to the CCM R4F for a match check There is a delay of 2 cycles before the entrance of the input signal to the Checker CPU and an equal amount delay is coupled to the output of the Master CPU after it is propagated to the system This delay serves temporal diversity purposes that are described in Section 5 4 6 In case of a mismatch in the comparison of the outputs from master and checker CPU a compare error signal is generated to notify the Error Signaling Module After a reset 6 CPU clock cycles are required till the start of the compari son in order to ensure that CPU output signals have been set to a known state 42 The CCM R4F can operate in four distinct operating modes 1 1001D lockstep 2 self test 3 error forcing 4 self test error forcing 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments 43 CCM R4F 2 cycles delay CPU1CLK CPU 1 Master CPU CCM R4F or ESM compare Error Signaling Module CPU 2 Checker CPU 2 cycles delay CPU2CLK Figure 5 2 Interconnectio
47. bjectives of the Lockstep as a safety concept have been accomplished One aspect of extending this project could be the investigation of the low level architecture of such an MCU on transistor level In fact it is not possible to obtain such information from the semiconductor com panies while they protect their intellectual property and they do not provide so detailed information to the public In practical terms the next step for Lockstep MCUs is their integration in real world projects in other sectors that require functional safety More specifically this research was conducted under the supervision and support of Airbus Defence and Space in order to obtain deeper knowledge regarding the Lockstep architecture to highlight the pros and cons and how it could be integrated in avionic systems that need to fulfill extremely stringent safety requirements From the semiconductor companies point of view we would expect over the next few years a Lockstep architecture with two dissimilar cores It would be more sus ceptible to common cause failures while it is unlikely for differently implemented cores to fail in the same way Obviously there are challenges for this step basically due to the complexity of synchronizing dissimilar components i e synchronize two cores of different architectures to execute the same instructions at the same time Appendix A CCM R4F Register Tables Bit Field Value Description 31 17 Reserve
48. by pressing the pushbutton More specifically in HALCoGen the RTI and GIO drivers need to be set active see Appendix F for the HALCoGen configuration screenshots The RTI driver is utilized to integrate a small delay configured for 500 ms delay in the Compare 0 Period of RTI1 Compare after the button is pressed to avoid the button bouncing The GIO driver is enabled in order to use the GIOA7 pin of the evaluation board that is assigned to the programmable push button Additionally the following interrupt channels should be activated e 0 for ESM High e 20 for ESM Low e 2 for RTI Compare 0 e 9 for GIO Int A Finally the interrupt for the GIOA7 pin Bit 7 in GIO PortA tab should be enabled After setting correctly this configuration to the HALCoGen the code should be generated and we are ready to develop the application in CCS The main advantage of the generated code by HALCoGen is that it generates all the necessary start up code for the initialization of the controller and driver specific code that the user can easily configure via its graphical user interface For example a file sys_core asm is generated automatically to set both of the cores to an identical state i e their 72 Chapter 7 Results register values as is a prerequisite for the efficient use of the Lockstep diagnostic see Section 5 4 2 The scope of this experiment is the CCM R4F configuration It is a simple mod ule in terms of configuring and manipulati
49. can be tested using HW BIST and a Memory Protection Unit is employed to assure protection against deterministic errors in the application software 16 29 30 Chapter 4 Safety Features of Hercules Family 4 1 CPU Logic Built In Self Test LBIST Self Test Controller STC Due to the high complexity of a Cortex R4F CPU TI does not recommend several CPU software based tests that are performed by other safety critical microcon trollers Though a hardware based CPU Logic Built In Self Test LBIST Self Test Controller STC scheme is proposed for providing high diagnostic coverage on the CPUs operating in Lockstep at a transistor level The STC is utilized to test the CPU core using the deterministic LBIST controller as the test engine This hardware based solution is more efficient in terms of required memory detection capability and execution time of the tests 35 The LBIST tests are triggered by the software where the developer may select to run all or part of them based on the execution time The execution time that the application sets for the LBIST tests in each safety critical loop can be allocated to the LBIST diagnostic The version used in TMS570LS31x 21x family is capable of running the LBIST tests on both cores simultaneously 40 Running the LBIST STC is much more costly in terms of power than a normal ap plication due to the increase of level of transistor switching per clock cycle that is required The user is able to redu
50. ce the CPU clock for the duration of the tests It is a trade off between current consumption and time to complete the execution of the diagnostic tests Before the execution of the LBIST test a context save is performed and the CPU is isolated from the rest of the system The remainder logic of the system does not interrupt its operation After the completion of the test execution a CPU reset is required the system is notified that the reset was performed due to the com pletion of this test through the SYSER register and the normal operation of the system continues by restoring the CPU context Essentially a context saving is not required when the STC runs on start up while the start up configurations will be performed The LBIST provides the option to test a proper operation of the diagnostic The testing process is deterministic and in order to ensure that a test is performed within expected time a timeout counter is employed to detect a potential non deterministic situation that indicates an error Moreover the STC can operate in an error forcing mode where an input error is injected to verify the right functionality of the module in the error detection and propagation to the system level The STC supports 24 intervals that the test process can be divided into and run a 4 1 CPU Logic Built In Self Test LBIST Self Test Controller STC 31 few intervals at a time The user is able to run either all the intervals together or in sli
51. ces specifying if the STC will continue from the next interval or restarting from interval 0 The table in Figure 4 1 depicts the coverage percent and time that is required for each interval in a typical case of the device running at HCLK 180 MHz VCLK 90 MHz and STCCLK 90 MHz Intervals Test Coverage Test Cycle Test Time us 0 0 0 0 1 62 13 1365 15 17 2 70 09 2730 30 33 3 74 49 4095 45 50 4 77 28 5460 60 67 5 79 28 6825 75 83 6 80 90 8190 91 00 z 82 02 9555 106 17 8 83 10 10920 121 33 9 84 08 12285 136 50 10 84 87 13650 151 67 11 85 59 15015 166 83 12 86 11 16380 182 00 13 86 67 17745 197 17 14 87 16 19110 212 33 15 87 61 20475 227 50 16 87 98 21840 242 67 17 88 38 23205 257 83 18 88 69 24570 273 00 19 88 98 25935 288 17 20 89 28 27300 303 33 21 89 50 28665 318 50 22 89 76 30030 333 67 23 90 01 31395 348 83 24 90 21 32760 364 00 Figure 4 1 Test Coverage and Duration 40 e HCLK Clock domain used by the high speed system modules Flash mem ory interfaces TCRAM interface Error Signaling Module ESM DMA e VCLK Clock domain used by some system modules VIM peripheral mod ules accessed via the Peripheral Central Resource PCR controller and all other register interfaces also accessed via the PCR e STCCLK Determines the self test execution speed by dividing HCLCK 32 Chapter 4 Safety Features of Hercules Family
52. chitecture Lockstep can be beneficial for avionic product solutions but due to the na ture of the certification process in this field it cannot be stated that such an MCU is targeting systems of a specific DAL criticality level The content of this report is freely available but publication with reference may only be pursued due to agreement with the author Contents List of Figures ix Preface xvii 1 Introduction 1 1 1 Airbus Defence and Space 05 4465 oes da lee eee ee ees 3 1 1 1 Creation of the Company 0 0 4 1 1 2 The Electronics Business Line 044 4 1 1 3 TEOHDS5 Department tives 2 shee ese eae Ss By Bie 4 1 2 Problem Statement 0 0 0000 eee eee eee 5 2 Functional Safety and Certification in Embedded Systems 7 2 1 Functional Safety and Fault Tolerance 2 eee ae a 4 7 SLI TECOS etd oo Sedu ie a Bae ik E E al ee S 9 2a ISO26262 e ren dh ee Sn le we Se Re ok he Y 10 2 1 3 Avionics Certification DO 178C and DO 254 11 2 1 4 The Meaning of 1O0LD 2 16 4 8 5 2 Ade g OPA ome 15 2 1 5 Texas Instruments Approach To Functional Safety 17 3 Embedded Software Development 21 3 Hard ware s est oe ose ge wae Ee ee EE PS e h 21 3 1 1 Hercules TMS570LS3137 microcontroller by Texas Instruments 21 3 1 2 SafeTI Hitex Safety Kit TMS570LS31x 21x 24 3 2 Sofware see ns ec Re en Ge ne ccs Sy is ese a ok B 26 3 2 1 Code Composer Stu
53. cifically the redundancy in the architecture of MPC564xL family of freescale is extended to more modules than only the core The term in the safety manual of the MPC5643LSM is called Sphere of Replication SoR and stands for the repli cation of the following components in order to detect permanent dormant latent and transient faults e e200z4 core including Memory Management Unit e Enhanced Direct Memory Access eDMA e Interrupt Controller INTC e Crossbar Switch XBAR e Memory Protection Unit MPU e Flash Memory Controller PFlashC e Static RAM Controller GRAMC e System Timer Module STM e Software Watchdog Timer SWT e Peripheral Bridge PBRIDGE 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments 41 MPC564xL Block Diagram E Cores Figure 5 1 MPC564xL Block Diagram 12 In LSM mode all the aforementioned components operate the same operations and transactions and the supervisor module for the smooth and equal execution in sync is called Redundancy Control Checker Units RCCU The role of the RCCUs is the detection but not the prevention of failures at the point where the outputs of the modules in sync are merged The isolation of the fault is performed with an error signaling by the Fault Collection and Control Unit FCCU which therefore allows the device or the application to perform the appropriate operations Nevertheless in the current research the main concern is the r
54. component level requirements of IEC 61508 and ISO 26262 For DO 254 the situation is more complicated It requires evidence of the internal component design including source code of embedded IP components As a re sult confidentiality challenges for the semiconductor companies exist while they cannot expose these information to the public Thus most DO 254 compliant sys tems consist of non compliant components The certification is applied on system level An important point to note for the Hercules family of microcontrollers is the miti gation of a detected error In previous chapters we presented the actions for detect ing an error but not how to deal with a presence of it The main reason is that there is not a certain answer how to deal with a detected error The Hercules devices are not supposed to be fault tolerant The documentation of these devices including the technical reference manuals and safety analysis documents have an indication of FT 0 Fault Tolerance Essentially the main focus of the Hercules family is the efficient detection of an error without though maintaining the operation of the system afterwards Then it is strongly dependent on the safety requirements of the system how each error should be handled Thus the real functional safety is implemented at system level Several safety features of these MCUs are provided but the system design and the implementation of the safety functions are the key elements to achieve
55. ctures e Two core processors in which both cores host the same software and execute that 79 80 Chapter 8 Lockstep In Avionics same software in Lockstep so that their outputs based on identical input data can be compared for use in a safety critical application we followed thoroughly the list of requirements that are provided because as we have already noticed each Lockstep implementation may be different in terms of shared resources system availability and so on This analysis contributes to a potential certification process in a system that will employ such an MCU Certification experts from Airbus Defence and Space have the knowledge and the expertise to follow the guidelines dealing with single core processor systems Therefore this project may be the spark for the integration of Lockstep architecture MCUs in avionic systems 8 2 Example Lockstep Use Case It should have already become clear that it is not the optimal way to present a specific example of a project without having performed a functional hazard anal ysis and clear requirements Nevertheless a simplified example application of a Lockstep architecture presented in a very abstract way could help the reader to understand how such an MCU could be employed in a safety critical system and the extra benefits that this could offer Beside the fact that the rate of aircraft engine failures have dramatically decreased at the point that flight crews is most like
56. d Reads return zeros and writes have no effect 16 CMPE Compare Error Read in User and Privileged mode Write in Privileged mode only Read CPU signals are identical Write Leaves the bit unchanged Read CPU signal compare mismatch Write Clears the bit 15 9 Reserved Reads return zeros and writes have no effect STC Self test Complete Note This bit is always 0 when not in self test mode Once set switching from self test mode to other modes will clear this bit Read Write in User and Privileged mode Read Self test on going if self test mode is entered Write Writes have no effect Read Self test is complete Write Writes have no effect 7 2 Reserved Reads return zeros and writes have no effect STET Self test Error Type Read Write in User and Privileged mode Read Self test failed during Compare Match Test if STE 1 Write Writes have no effect Read Self test failed during Compare Mismatch Test if STE 1 Write Writes have no effect STE Self test Error Note This bit gets updated when the self test is complete or an error is detected Read Write in User and Privileged mode Read Self test passed Write Writes have no effect Read Self test failed Write Writes have no effect Figure A 1 CCMSR Resgister 42 89 90 Appendix A CCM R4F Register Tables l Bit Field Value Description 31 4 Reserved 0 Read
57. d as a driver code generator tool that insures a stable HAL code 3 2 3 GNU ARM Toolchain As an alternative toolchain to the CCS integrated tools the GNU ARM was em ployed The GNU ARM toolchain provides GCC compiler support for develop ment on embedded ARM processors with the contribution of ARM employees Specifically it supports Cortex R Cortex M processor families covering Cortex MO Cortex M3 Cortex M4 Cortex M0 Cortex M7 Cortex R4 Cortex R5 and Cortex R7 14 28 Chapter 3 Embedded Software Development The executables for the GNU ARM toolchain that are officially provided do not support the big endian BE32 word invariant format that is required for the Cortex R4F CPU of the TMS570LS3137 Building manually the binaries that are also pro vided from GNU ARM community was necessary applying first a diff patch see Appendix D on the source code After applying the patch to the binaries that are provided from the GNU ARM webpage 14 the process for building them is as described in the Appendix E The reason for the investigation of an alternative toolchain is a potential future need of the company to use the GCC compiler in a project that employs such a microcontroller Additionally the type of license for the CCS integrated toolchain varies regarding the specific model of the microcontroller For some models is pro vided free while for others it has code size limitations in a free version Chapter 4 Safety Features
58. d free step processor modes Pat US 8 826 288 B2 2014 Andre Roger Munlch DE Antonio Vilela Mering DE Method and system for fault containment Pat US 8 819 485 B2 2014 M Baumeister Using Decoupled Parallel Mode for Safety Applications User s Manual Freescale T Ferrell C Spitzer U Ferrell ed Digital Avionics Handbook 2014 Code Composer Studio wiki page Available http processors wiki ti com index php GSG CCSv5 Overview Accessed 08 06 2015 Cortex R4 and Cortex R4F Technical Reference Manual r1p3 ARM Dhrystone Benchmarking for ARM Cortex Processors User s Manual ARM ECC Handling in TMSx70 Based Microcontrollers User s Manual Texas Instru ments February 2011 Flight Operations Briefing Notes Handling Engine Malfunctions User s Manual Airbus Defence and Space Christopher Temple Munich DE Florian Bogenberger Poing DE Method And Apparatus For Handling An Ouput Mismatch Pat US 8 373 435B2 2013 Freescale MPC564xL Family 32 bit Architecture MCUs Brochure Freescale MPC564xL Page Available http www freescale com webapp sps site prod summary jsp code MPC564xL Accessed 12 05 2015 Functional Safety User s Manual IEC International Electrotechnical Commis sion GNU Tools for ARM Embedded Processors Available https launchpad net gcc arm embedded Accessed 14 06 2015 HALCoGen wiki page Available http processors wiki ti com index ph
59. de The following two tests are generated by CCM R4F in this order to verify its proper functioning e Compare Match Test e Compare Mismatch Test 44 Chapter 5 Lockstep Each of these test patterns is applied on both of the inputs of CCM R4F and clocked for one cycle with a total duration of the self test at 3615 CPU clock cycles During the self test which is a non interruptible task both CPUs can work independently of the test execution but the comparison of their output signals is skipped while the module is busy with its self diagnostic test Furthermore the self test checks only the integrity of the CCM R4F module and not its memory mapped register controls Compare Match Test Four different test patterns are fed to the CCM R4F in the compare match test These patterns contain identical vectors that are fed to both of the input ports of the module simultaneously to check for output equality In case of discrepancy the test is terminated the self test error flag is set and the self test error signal is generated The table in Figure 5 3 depicts the vectors that fed to the input ports CPU 1 Signal Position CPU 2 Signal Position 7 Jafo ns 7 6 54 2 ace oe ae ele i a 2 ofo ofo ofo o fo os o afo 1 o a o 1 o oa 1 os olijof ifo ij o 1 os o Figure 5 3 Compare Match Test vectors 42 The duration of this test is 4 cycles where e 1st cycle both CPU signal ports are fed with 1s e 2nd cycle
60. de generation and the TI compiler 3 1 Hardware The hardware that was available for the conduction of the experiments in this project is described in this section The TMS570LS3137HDK evaluation board from Texas Instruments and the SafeTI Hitex Safety Kit TMS570LS3137 by Hitex in cooperation with Texas Instruments are briefly demonstrated 3 1 1 Hercules TMS570LS3137 microcontroller by Texas Instruments TMS570 are the first Microprocessor Control Units MCU based on ARM Cortex R4F based floating point that meet the IEC61508 SIL3 standards This family of mi croprocessors is designed specifically to comply with automotive and transporta tion safety standards while it consists of dual Cortex R4F processors operating in Lockstep System wide protection is insured by the error detection in the proces sor and the memory modules Both single and double precision is achieved with the integrated high performance Floating Point Unit FPU They are targeting to safety applications of transportation industry including 39 21 22 Chapter 3 Embedded Software Development e Automotive chassis and stability control Electric power steering Hybrid and electric vehicles Aerospace e Railway communications e Off road vehicle engine controlp The CPU offers 1 66 DMIPS MHz and can provide up to 298 DMIPS with con figurations that can run at 180MHz The big endian BE32 word invariant format is supported on this device The TMS570LS
61. dependently exe cuting different instructions and operating as a multi core system To conclude this patent presents a solution in the case of an internal soft error in one of the processors that operate in lockstep In case of an exception signaling in one of the processors as a response for example in a cache error the proces sor will execute a different set of instructions i e interrupt service routine The simultaneous occurrence of such an error in both of the processors is rare and as a result the lockstep operation will fall out of the correct functionality The imi 6 3 Method and system for fault containment Infineon Technologies AG 57 tation of the erroneous state in the processor that operates without experiencing the error through an error signaling interface in order to keep all the utilized pro cessors in a sync manner is the proposal of Freescale that is analyzed in this patent 6 3 Method and system for fault containment Infineon Tech nologies AG The current invention by Infineon Technologies AG enhances the prior art delayed lockstep implementation with a method to prevent the propagation of the detected error to the system 2 As shown in Figure 6 3 on the upper side which is substantially the implementa tion of Texas Instruments delayed lockstep mechanism see Section 6 1 an error is detected after the amount of delay unit2 usually equal to delay unit1 and amounts between 0 5 and 2 5 clock cycles when th
62. dio CCS and Tools Suite 26 32 2 HAL COG odie a lacs Sol eek Seed eed A ak eee ee 27 3 2 3 GNU ARM Toolchain 00504 27 4 Safety Features of Hercules Family 29 4 1 CPU Logic Built In Self Test LBIST Self Test Controller STC 30 4 2 Programmable Built In Self Test PBIST Module 32 4 3 Error Correction Code ECC on Flash and SRAM 34 vi Contents Lockstep 37 51 W a tis Lockstep eraa Ne So a Shi el tet Sa ull Be ote See eae tees tee 38 5 2 Reasons to use Lockstep 2 v 04 gids s 6p Stal fe SL Ro Pie ld 38 5 3 Lockstep implementation by Freescale 0 39 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments 41 5 4 1 Architecture and CCM R4F module operation 42 5 4 2 1oo01D Lockstep Mode onnaa eS Ghee nae bs OG eR ceed 43 543 aeletest modes s onepu ee te aces Da ese Soe Celik 43 5 4 4 Error Forcing Mode 2 2684 s be ea ew eho Se wes 45 5 4 5 Self Test Error Forcing Mode 04 45 5 4 6 Mitigation of common cause failures in CPU 45 5 4 7 Lockstep behavior in CPU Debug mode 47 5 4 8 Error Signaling Module ESM 47 5 5 Lockstep implementation on TMS570LC43x by Texas Instruments 48 5 5 1 Vectored Interrupt Manager VIM 50 Lockstep Related Patents 53 6 1 Delayed lockstep CPU compare Texas Instruments 53 6 2 Error detection and communication of an error l
63. e The second experiment presents the behavior of the CPU through a GUI that is provided with the HiTex SafeTI kit It calculates the time that is required from an error occurrence till the error detection The purpose of the experiments is to demonstrate how the Lockstep architecture can be employed in a safety critical application on the TMS570LS3137 microcon troller and to calculate the time that an error will be detected The hardware that was used is the TMS570LS3137 Hercules Development Kit see Section 3 1 1 and the SafeTI Hitex Safety Kit TMS570LS3137 see Section 3 1 2 7 1 1 Error Forcing Experiment on TMS570LS3137HDK The purpose of the experiment is to give to the reader an overview of how the TMS570LS3137 microcontroller has to be configured in order to operate efficiently in Lockstep mode Regarding the hardware the TMS570LS3137 Hercules Develop ment Kit was used to create a simplified application with the programmable push button and the LEDs that are integrated on it In terms of software the HALCoGen 04 03 00 was used to configure and generate the initialization code for the micro controller and the peripherals that were employed in this experiment and the Code Composer Studio IDE 6 1 0 for the development and debugging partially the ap plication 69 70 Chapter 7 Results From the aforementioned in the Lockstep implementation of TMS570LS3137 and CCM R4F the Lockstep is active in loo1D Lockstep Mode by default
64. e companies globally with revenues of around 14 billion per year and approx 40 000 employees operating in more than 170 locations worldwide The Chief Executive Officer CEO of Airbus Defence and Space is Bernhard Gerwert Airbus Defence and Space focuses on core businesses Space Military Aircraft Missiles and related systems and services AIRBUS DEFENCE amp SPACE Figure 1 1 Airbus Defence and Space logo The duty of Airbus Defence and Space is to develop cutting edge reliable products in the field of defence and space The main concern of the company is to help governments and institutions to protect natural resources societies and individual freedom The aircrafts satellites and services help to monitor climate and crops and to secure borders Airbus Defence and Space consists of four Business Lines Military Aircraft Space Systems Communications Intelligence amp Security CIS and Electronics Among the innovative products are the transport aircraft A400M the military jet Eurofighter and in the framework of the Airbus Safran Launcher joint venture the Ariane launcher 4 Chapter 1 Introduction 1 1 1 Creation of the Company In 1997 America appeared as the global flagship of the Aeronautics Space and Defence industry Thus in order to counterbalance the American leadership the French German and Spanish governments asked their space and defence manu facturers to present a project of fusion into a single entity
65. e of the puzzle of the safety features that modern micro controllers provide to help the application designers to achieve functional safety and to comply with the required certification standards However currently it is not broadly known in the avionics community and one aspect of this project is to investigate how it could be employed in the production of avionic systems This report contributes to the academic community by describing a safety feature that is used in the industry but lacks of reliable scientific documentation sources Furthermore while this project was conducted under the support and supervision of the Airbus Defence and Space it was easier for the author to obtain information by semiconductor companies due to the established partnership of the company with the majority of them This project is intended for electrical and embedded system students and for en gineers interested in safety around embedded systems It could be used as infor mation material and guideline to employ a Lockstep architecture microcontroller 1 1 Airbus Defence and Space 3 in any embedded system 1 1 Airbus Defence and Space Airbus Group is a European global leader in the aeronautics space and defence industry Founded in July 2000 under the name European Aeronautics Defence and Space EADS is currently Europe s leading company in the defence and space industry It is the second largest space company in the world and one of the top 10 defenc
66. e output signals of the CPUs are checked in the comparator unit and in case of a mismatch the system is getting noticed As a result till the detection of the error it is possible that erroneous data in a write transaction are propagated to a system interconnect or a local SRAM The embod iments of this patent are targeting to the prevention or containment to this error propagation to the system 58 Chapter 6 Lockstep Related Patents 10 14 m 30 DELAY aha UNIT 2 40 wae COMPARATOR _ i3 3 UNIT DELAY UNT cpus L 12 EI 7 m2 5 14 ERROR 10 CPU 1 UT gt BLOCKING gt UNIT 30 16 40 EA COMPARATOR UNIT 13 DELAY UNIT 1 CPU Re 12 Figure 6 3 Prior art delayed lockstep implementation and Infineon s enhancement 2 In a first approach for corruption protected delayed lockstep CPUs a method is utilized as is demonstrated on the bottom of Figure 6 3 A further delay unit2 is placed directly to the output signal of CPU1 and this delayed version is fed to the comparator unit for a match check with the output of the CPU2 before it is fed to the system in subsequent stages In this regard the comparator unit has the time to generate an error signal in case of a mismatch before the corrupted data propagate to the system In this way the delay unit2 may delay a signal that is related to a write transaction and prevent an erroneous wri
67. earch questions that we called to answer covered the Lockstep concept drawing all the available information that after a big effort we managed to collect vV What is Lockstep in terms of embedded systems see Chapter 5 1 V What are the reasons for using Lockstep as an error detection mechanism What kind of errors can it detect see Chapter 5 2 y How to use Lockstep see Chapter 7 1 1 9 3 Future Work 87 Y How Lockstep should be treated in the certification process of an avionic product see Chapter 8 1 V What are the limitations and drawbacks of a Lockstep architecture see Chapter 9 1 Due to the kind of errors that Lockstep is targeting the experiments were restricted while we do not own the equipment to artificially produce such errors Addition ally as is already mentioned in Section 5 4 7 the Lockstep is automatically disabled in debugging mode Thus it was not possible even to check the behavior of the system i e register values when an error forcing was performed Nevertheless the current project is a good base for a future researcher or a designer of safety critical systems to comprehend the Lockstep as a safety concept and to identify the advantages and disadvantages of such an architecture Every potential aspect of safety should be thoroughly considered while the protection of environment and human lives should be a top priority for everyone 9 3 Future Work From the academic point of view the main o
68. ection Texas Instruments provides the SafeTI design package that helps their customers to build applications capable of easily achieving compliance to safety functional standards The SafeTI design package consists of 5 components 32 1 Safety related semiconductor components 2 Safety related documents tools and software 3 Complementary embedded processing and analog components 4 Quality manufacturing process 5 Safety development process In Figure 2 7 is depicted a list of the currently available SafeTI devices and the integrity levels according to what they are developed respectively The two categories of errors that the functional safety has to deal with are known as systematic and random faults Random faults are inherent to the application and the metrics that already discussed to determine the failure rate are not ap plicable to them Random errors could occur by a permanent failure of a circuit 2 1 Functional Safety and Fault Tolerance 19 SafeTl SafeTl SafeTl Safety integrity products devices functional safety level TMS570LS31x 21x SafeTl 26262 ASIL D SafeTI 61508 SIL 3 TMS570LS12x 11x SafeTl 26262 ASIL D SafeTI 61508 SIL 3 TMS570LS04x 03x SafeTl 26262 ASIL D SafeTl 61508 SIL 3 TMS570LS20x 10x SafeTl 61508 SIL 3 RM48x SafeTI 61508 SIL 3 Microcontrollers RM46x SafeTl 61508 SIL 3 RM42x SafeTI 61508 F28M35x SafeTI 61508 TMS470M Safe
69. endix G Requirements for Multi core Processors Processor Errata The applicant shall assess the processor errata Today 25th of May 2015 the last errata document that is data provided by the manufacturer provided by Texas Instruments website http www ti com product TMS570LS3137 technicaldocum ents was published on March 2015 spn222 pdf Silicon Errata Revision D The applicant shall document their processes for Texas Instruments website provides an Alert Me continuing to obtain errata from the manufacturer mechanism where the user is notified whenever there is an throughout the development and service life of update to any of the technical documents including errata the MCP installation of the microcontroller of interest The applicant shall document their processes for In case of an upcoming error in an updated errata document resolving those problems in the same manner as the development team should instantly meet and test if the any other reported problems error that is described could affect the application Testing cases or other documents of proving the reliability of the system regardless the error presence should be filled and submitted to the certification office Software Hypervisors and MCP Hardware Hypervisors Z Z O Z O Z o o O The applicant shall state in their software AEH Not Applicable plans or other deliverable documents whether or not they intend to use a software hypervisor or a MCP hard
70. eplication of the core that is a common point between the various semiconductor companies and the purpose is to reveal how this redundancy contributes in cooperation with the other safety features that each microcontroller provides to comply with the stan dards for the safety critical systems 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments In this section we describe the lockstep implementation on TMS570LS31x 21x of the Hercules family of microcontrollers The current research was based on a Texas Instruments TMS570LS3137 Hercules Development Kit and focused on the imple 42 Chapter 5 Lockstep mentation the architecture and the capabilities that are presented in this section Safety critical systems require a run time detection of errors that are derived from a faulty behavior on the processor level Two instances of Cortex R4F processors operate in a delayed lockstep see Section 6 1 in order to prevent errors that may lead to unsafe operating conditions The CPU Compare Module for Cortex R4F CCM R4F is employed in the Texas Instruments architecture to compare the out put signals of the two processors that run in a 1001D one out of one with diag nostics In case of a mismatch an error is signaled to the Error Signaling Module Furthermore a self test method for the CCM R4F module is applied at boot time to detect any potential hardware errors on the module itself that could lead the module to an unintended beh
71. er Automotive Safety Integrity Level Built In Self Test Control Area Network Control and Monitoring device Error Correction Code Logic Built In Self Test Programmable Built In Self Test CPU Self Test Controller Module CCMR4F Key Register CCMR4F Status Register Code Composer Studio Core Electronics Component Complex Electronic Hardware Chief Executive Office Communications Intelligence and Security Commercial Off The Shelf Direct Memory Access Decoupled Parallel Mode xi DRAM EADS eDMA EEPE EEPROM EMIF ESD ESM ESP EUROCAE FAA FCCU FMPLL FT GPIO HalCoGen HAL HTU I2C IDE IEC INTC ISR LED LIN LSM Direct Random Access Memory European Aeronautical Defence and Space enhanced Direct Memory Access Electrical Electronic or Programmable Electronic Electrically Erasable Programmable Read Only Memory External Memory Interface ElectroStatic Discharge Error Signaling Module Electronic Stability Program European Commission for Civil Aviation Equipment Federal Aviation Administration Fault Collection and Control Unit Frequency Modulated Phase Locked Loop Fault Tolerant General Purpose Input Output HAL Code Generator Hardware Abstraction Layer High end Timer Transfer Unit Inter Integrated Circuit Integrated Development Environment International Electrotechnical Commission INterrupt Controller Interrupt Service Routine Light Emitting Diode Local Interconnect Network LockStep Mode MCU M
72. f Lockstep by Freescale is presented with significant differences than the one by Texas Instruments The purpose is to highlight a different way that another leading company in the world of safety critical systems targets the functional safety of such applications 37 38 Chapter 5 Lockstep 5 1 What is Lockstep The term Lockstep in the field of the embedded systems describes an error detection mechanism employed by semiconductor vendor companies that con struct systems for safety critical applications Safety reliability and functional safety requirements that are prerequisites in avionic and automotive electronic sys tems are often satisfied with a hardware redundant architecture 10 Lockstep is a method to monitor and verify the correctness in the operation of a system 44 by employing at least two processing modules performing identical operations Typically the processors are synchronized to the same state during the system start up The same tasks are independently processed by the processors operating in lockstep and a real time comparison of the output signals is performed In case of discrepancy in the output signals an error signal is generated and further exe cution of the task is inhibited Several mitigation techniques can be employed to recover from the detected error by leading the system to a safe state where poten tially dangerous operations cannot be executed 10 5 2 Reasons to use Lockstep The lockstep c
73. f one more CPU three CPUs running in lockstep mode and with a majority voting system to detect the defect CPU and recover while the functional CPUs continue with the execution of the 64 Chapter 6 Lockstep Related Patents software Although this method is robust and efficient the cost the power con sumption and area on the silicon are increased The invention of Infineon involves two CPUs operating in lockstep master and checker in a delayed lockstep or not and a state buffer where the state of the mas ter CPU is stored In case of a mismatch in the comparison of the output signals of the two CPUs a control signal is generated and the stored state is loaded to both of the CPUs from the state buffer Control signals 7 Eror 110 102 signals ap eee 114 114d y CPU Cor y l i y qg d Main e an LEN Jaa AS lT a Aj e t gt bagel ae i S tod 5 E 106 ss E gs BS te E DAE On EOL __ State Butler e S E amp igi x hae gt Z Contro signals Oo 15 oo assssnnansnnnnnstinsnnainnsiicnmnnmnsinnannaninassscoratsansnsnanseressee aaiding oi io i 112b oo ae Fi E q e pit 28 State Control Logic a ui S a Checker CPU Core Dt t ay rT lt 3 f y i E i Control q 4a 114b 104 signals 116 Figure 6 10 Schematic diagram of the Infineon s patent introducing the state buffer 27 As shown in Figure 6 10 each of the CPU cores include a state control logic Var ious i
74. gE thumb MULTILIB_OSDIRNAMES mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 mbig endian armv7 ar bigE thumb fpu MULTILIB_OSDIRNAMES mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 mbig endian armv7 ar bigE thumb softfp MULTILIB_REUSE mthumb march armv7 marm march armv7 MULTILIB_REUSE mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 marm march armv7 mfloat abi softfp mfpu vfpv3 d16 MULTILIB_REUSE mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 marm march armv7 mfloat abi hard mfpu vfpv3 d16 MULTILIB_REUSE mthumb march armv7 mbig endian marm march armv7 mbig endian MULTILIB_REUSE mthumb march armv7 mfloat abi softfp mfpu vfpv3 d16 mbig endian marm march armv7 mfloat abi softfp mfpu vfpv3 d16 mbig endian MULTILIB_REUSE mthumb march armv7 mfloat abi hard mfpu vfpv3 d16 mbig endian marm march armv7 mfloat abi hard mfpu vfpv3 d16 mbig endian endif Appendix E How to Build the GNU ARM Toolchain on Ubuntu E 1 Build GNU Tools on Ubuntu 8 10 The following instructions are located in the GNU ARM Toolchain website 14 and are integrated into this report due to the frequent changes that occur by the active GNU ARM community E 1 1 Install Ubuntu Ubuntu 8 10 ISO image is available from http old releases ubuntu com releases 8 10 ubuntu 8 10 desktop i386 iso You can install it as a native s
75. gcc arm embedded toolchain The command to decompress it looks like tar xf prebuilt native tools tar lzma lzma Please be noted that those prebuilt tools are for Ubuntu 8 10 32 bit and not suitable for any other build platforms For those working on other build platforms please either prepare your own build tools and use them through option build tools of build script or just use the ones from your system by not specifying the build tools option E 1 3 Build GNU Tools for ARM Embedded Processors If you download and decompress the prebuilt tools successfully you have set up the building environment You can now start to build the toolchain by yourself with below commands Copy the src release package into toolchain directory cp gcc arm none eabi 4_9 2015q1 20150306 src tar bz2 toolchain Prepare source codes cd toolchain tar xjf gcc arm none eabi 4_9 2015q1 20150306 sre tar bz2 cd gcc arm none eabi 4_9 2015q1 20150306 sre find name tar xargs I tar xf cd 102 Appendix E How to Build the GNU ARM Toolchain on Ubuntu Start building the toolchain Can specify skip_steps mingw32 option to skip building windows host toolchain and if specify that option when building prerequisites you have to specify it when building toolchain too build prerequisites sh build_tools YOUR_PATH build toolchain sh build_tools YOUR_PATH After this you can cd into
76. ge to the die NORTH FLIP_WEST NJ Fl r N Figure 5 5 Example of orientation of the cores 35 In Figure 5 5 a placement example of two cores on the silicon is illustrated where the one is north oriented and the other flipped west The common cause failures that physical diversity is called to restrict might be due to manufacturing flaws or even susceptibility to radiation due to the direction of some of the components used in the core design A simple example for a potential manufacturing flaw that might be common to both cores without the physical diversity suppose that there is a geometry of a certain component that was prone to collecting moisture and it 5 4 Lockstep implementation on TMS570LS31x 21x by Texas Instruments 47 was processed in a way that this component was traveling parallel to some chem ical spray in the process This might lead to trapped materials at this component in the silicon but if the components were rotated and flipped it would make the likelihood of trapping contaminates less likely In this way making the two cores physically diverse can keep both cores from having the same defective component 5 4 7 Lockstep behavior in CPU Debug mode During a debug session where the execution of the code is halted several asyn chronous halting events occur with a possibility to lead to a loss of Lockstep in dication of non existing errors For this reason when halting debug events are detected the CCM R4F is a
77. hapter 5 Lockstep Low Priority Low Priority Interrupt Handling Interrupt a error_group1 O fo lt Interrupt Enable 2 ae High Priority High Priority 5 Interrupt Priority Interrupt Handling temp T 3 a 3 El T error_group2 Q g s 3 5 ERROR Pin Enable 9 g amp 3 e 2 Error Signal 8 error_group3 Handling Figure 5 6 ESM Block Diagram 42 As already mentioned in the different operating modes of the CCM R4F all errors that are related to this module fall into a Group2 category in the ESM This means that the interrupt and error pin behavior is fixed i e an interrupt will always be generated and the ERROR pin will be asserted 5 5 Lockstep implementation on TMS570LC43x by Texas In struments TMS570LC43x is currently the most recent microcontroller of the Hercules family The fist major difference is that the core has been upgraded to Cortex R5F which provides some enhancements to improve the performance of the device The main advantage of this migration to the Cortex R5F is a lower latency in accessing pe ripherals and a hardware support for flushing lines in cache that have been mod ified in the main memory The Hercules products with an integrated Cortex R4F core are built with only tightly coupled memory and by default no cache i e cache memory is an option in these devices With the integration of cache into the recent microcontrollers a higher frequency is achieved 38
78. he conduction of the experiments and the evaluation of the Lockstep architecture are presented in this section Code Composer Studio CCS for developing and debugging HALCoGen for configura tion of the peripherals and automatic code generation and the GNU ARM toolchain that was used as an alternative to the CCS integrated toolchain are briefly pre sented 3 2 1 Code Composer Studio CCS and Tools Suite Code Composer Studio is an Eclipse base integrated development environment IDE supporting Texas Instrument s portfolio of microcontrollers and embedded processors It is used for developing and debugging of embedded applications It provides features such as source code editor project build environment debugger profiler and simulators 5 Additionally an optimizing C C compiler is inte grated The debugger provides conditional or hardware breakpoints that are based on lo cal variables registers or C expressions Moreover a memory window is provided for all levels of memory to inspect potential issues such as cache coherency prob lems The debugger even supports complex multi core or multi processor systems providing synchronous operations and global breakpoints Measurement of code performance and efficient use of resources in debugging or development mode is achieved through the profiler Instruction cycles of a func tion cache misses hits pipeline stalls and branches are events that can be mea sured and monitored by the
79. he cores Temporal and physical diversity are practices to mitigate this common mode fail ure and the details of their implementation on the TMS570LS31x 21x family of TI 46 Chapter 5 Lockstep microcontrollers are described in this section 35 Temporal diversity The first measure that is already presented in the Texas Instruments patent see Section 6 1 is related to timing and guarantees a temporal diversity between the cores The term delayed lockstep is used to describe the Lockstep implementa tion with this timing enhancement The clock input to one of the cores is delayed 2 cycles on the input side while the output from the other is delayed 2 cycles on the output see Figure 5 2 Therefore each core is executing a different operation at any given point in time This way if there is a disturbance that might result in errant operation of the core such as radiation a voltage glitch a voltage dip etc both cores will again fail in different ways since they are executing different instructions at the point of the disturbance Physical diversity Another consideration that is given to the design are placement orientation of the cores on the silicon The 2 cores are rotated 90 degrees and flipped in relation to each other with at least 100 um space in between This is to give a physical di versity Physical diversity gives some assurance of a different failure mechanism if there is a manufacturing flaw causing physical dama
80. hecker core receives the data that the master read Both cores use the same data bus without risk of deadlocks data starvation or live locks while they execute the same instrunctions at a time The cores are not able to execute different instrunctions at a time The applicant shall ensure that the implemented means provides uninterruptible access to the shared memory locations from either core of the MCP and prevents either core being locked out from accessing the shared memory The applicant shall state in their software plans whether or not they intend to use shared cache between the processing cores If shared cache is used between the processing core the applicant shall describe in their plans their strategy for managing and verifying cache usage If the applicant uses shared cache between the processing cores the applicant shall conduct analyses and tests to determine the worst case effects that the use of shared cache and memory can have on the execution of the specific software applications hosted on the two cores of the MC The applicant shall describe the effects of the shared cache usage on the WCET to the certification authority The cores cannot independently access the shared resources L1 amp L2 Cache are exclusive resources for each core Not Applicable Not Applicable Not Applicable The applicant shall implement and verify a means to mitigate the effects of using shared cache Not App
81. hecker CPU s AXI master port for read transaction ARVALIDM When asserted indicates address and control are valid on the Checker CPU s AXI master port for for write transaction AWVALIDP When asserted indicates address and control are valid on the Checker CPU s AXI peripheral port for read transaction ARVALIDP When asserted indicates address and control are valid on the Checker CPU s AXI peripheral port port for write transaction BVALIDS When asserted indicates that a valid write response is available on the Checker CPU s AXI slave read transaction RVALIDS When asserted indicates address and control are valid on the Checker CPU s AXI slave port for Figure 5 8 List of the signals that must be inactive in Checker CPU 40 The latter diagnostic follows the same principle with the Checker CPU Inactivity Monitor but in terms of power domains The Power Domain Inactivity Monitor monitors the key bus signals of the inactive power domains that in a correct op eration must be inactive and isolated from the rest of the system Any activity of these signals indicates a transaction onto the interconnect which is an unintended behavior in case of a turned of power domain and an error is flagged 5 5 1 Vectored Interrupt Manager VIM As aforementioned the Vectored Interrupt Manager VIM Module is operating in lockstep in the same manner that the CPU cores do VIM is the responsible module for controlli
82. high reliability and comply with the application requirements A typical case for a CCM error detection could be to run the LBIST test to ensure the correctness of the core s if the LBIST test passes it means that it was a soft error In a successful LBIST result it is safe to continue the execution of the appli cation If the CCM error continues a soft reset is probably required to re sync the cores All these actions are strongly related to safety requirements of the system The aforementioned scenario though could not be applicable to a system that ex ecutes safety critical real time operations and the temporal burden of LBIST test execution could lead to a disaster Actions like turning the system off setting it to a degraded mode i e stop executing safety critical operations while transferring them to another operational module or notifying a human could be just a few scenarios of CCM error mitigation 86 Chapter 9 Retrospective As was already mentioned in this project it is not possible to eliminate risks in any system The real goal of functional safety is to detect and correct a potential fault or to lead the system to fail safely In any system there is a Single Point of Failure SPOF In the case of the Lockstep architecture what is essentially achieved is to transfer the SPOF from the CPU core s to the CCM comparator module This is always a concern of safety system designers Accepting that it is not feasible to avoid a SPO
83. hronous asyn chronous memory devices peripherals or FPGAs can be achieved with the External Memory Interface EMIF module 41 Defence amp Security Netwogkipss 4 Figure 3 1 TMS570LS3137 Hercules Development Kit The TMS570LS3137 Hercules Development Kit was employed in this project to evaluate and start the development with the Hercules platform of these safety se 24 Chapter 3 Embedded Software Development ries of MCUs by Texas Instruments The development board provides the following features e 337p BGA TMS570LS3137 MCU on board e On Board USB XDS100vs JTAG e On Board SCI to USB Serial 20 pin ARM and MIPI JTAG connectors 6 White NHET LEDs 2 Tri Color RGB NHET LEDs e 8MB SDRAM EMIF Ambient light and temperature sensor e 2 CAN transceivers 1 RJ 45 ethernet port 1 Micro SD card slot SPI mode 3 1 2 SafeTI Hitex Safety Kit TMS570LS31x 21x The SafeTI is a tool developed by Hitex in cooperation with Texas Instruments and facilitates the evaluation of safety features of TI s Hercules safety microcontrollers 33 The kit consists of two Hercules microcontrollers placed on the same eval uation board where the first acts as a Safety Device Under Test SDUT and the second as a Control and Monitoring Device C amp M There are two variations re garding the model of the SDUT device it may be either an TMS570LS3137 or an RM48L952 MCU In the current project was used the former The C amp M device is
84. ibADC MibSPI MPU N2HET PBIST PCR PD PBRIDGE RAM RCCU RTI SCI SDUT SECDED SEH SIL SoR SPI SPOF SRAMC SRAM STC SWT TC Microprossecor Control Unit Multi Buffered Analog to Digital Converter Multi Buffered Serial Peripheral Interface Memory Protection Unit Next High End Timer Programmable Built In Self Test Peripheral Central Resource Power Domain Peripheral Bridge Random Access Memory Redundancy Control Checker Unit Real Time Interrupt Serial Communication Interface Safety Device Under Test Single Error Correction Double Error Detection Simple Electronic Hardware Safety Integrity Level Sphere of Replication Serial Peripheral Interface Single Point Of Failure Static Random Access Memory Controller Static Random Access Memory Supplemental Type Certificate Software Watchdog Timer Type Certificate xiv List of Figures TRM Technical Reference Manual TSOA Technical Standard Order Authorization VandV Verification and Validation Acknowledgments First of all I would like to thank from the bottom of my heart my father Geor gios Ganitis and my brother Giannis Ganitis for their financial but mainly for the emotional support that were providing during my whole period of studies and especially during the conduction of this Master Thesis Moreover I would like to thank my professor Dr Ulrik Nyman for his continuous support and the frequent meetings and for his willingness to propose solutio
85. in this order The assignment of ASIL very similar to SIL assignment in IEC 61508 to each safety goal is dependent on 3 parameters e Exposure how often the situation is risky for the passengers or other road users e Controllability how involved people can handle the violation of the safety goal 2 1 Functional Safety and Fault Tolerance 11 e Severity how serious could a missed safety goal become ASIL Impact of Failure Controllability Exposure In Car Examples Lag in display from Failure of collision i Fatal Survival Anti lock Braking c uncertain Difficult to control Medium Probability system wheel lock up Fatal Survival Airbag deployment uncertain Difficult to control High Probability while driving Figure 2 1 ASIL classification examples 28 In this example the unexpected inflation of the airbag would most likely belong to ASIL D level A functional safety concept is required for each potential hazard For the airbag example the functional concept could rely on a redundancy concept where the system would consist of a control and monitoring channel The airbag would only inflate if both of the channels produce identical output The technical aspects of these functional safety concepts i e sufficient number of independent sensors and the method to independently enable the trigger circuit compromise the technical safety concept Software and hardware requirements are determined then according to the technical
86. is also coupled to a delay stage DEL1 which should be equal with the delay stage used in the CPU2 DEL2 The delayed output of CPU1 OUT1d and the output of the CPU2 are fed to CCU to be checked if they are iden tical In case of discrepancy the system is noticed via the compare output OUTc Regarding the read and write operations both CPUs read data from the common system memory but only the master CPU is permitted to write and modify the system state 6 2 Error detection and communication of an error location in multi processor data processing system having processors operating in lockstep Freescale Semiconductor Ind 6 2 Error detection and communication of an error location in multi processor data processing system having pro cessors operating in lockstep Freescale Semiconductor Inc The current invention of Freescale Semiconductor Inc facilitates the prevention of a wrong lockstep operation due to some internal core errors More specifically some internal errors such as soft errors in a cache are not replicated to both of the cores that operate in lockstep As a result the behavior of each core will be different leading the cores to fall out of lockstep 47 The patent is referring to multi core systems gt 2 cores that operate in lockstep The principle is that when a soft error is detected internally and an exception is triggered the other cores are forced to enter in the same altered state This imitation of the behavio
87. iseconds In case of detection of a hard error handle those errors or failures in a fail safe though further execution of the application is inhibited manner that contains the effects of any errorsor Endless detection of the same error A proposal for this failures within the system in which the MCP is ocassion would be to employ a back up sub system that will installed be placed in the longest possible distance away from the primary to assure physical diversity and will execute the same operations If part of the safety critical functionality hosted by The employment of a redundant MCP that is described in the the applicant s MCP must continue to be available REQ45 will achieve to continue the execution of safety critical even after errors or failures are detected in the processes even if an error is detected The defected MCP will MCP or its hosted software the applicant shall send a signal to the operational and will go idle The design implement and verify a means to provide operational MCP is set as the Master now Alternatively that functionality that is external to the MCP three or more sub systems can be employed to execute the same operations and to compare their outputs A voting technique shall be used to verify the correctness of the sub systems Bibliography 1 10 11 12 13 14 15 Ken G Pomaranski Roseville CA US Andrew H Barr Roseville CA US Computing with both lock step an
88. ives the output signals of both CPUs and checks for a mismatch In case of a discrepancy a compare error signal is raised to the ESM module see Figure 5 7 The new lockstep related feature of this microcontroller is the implementation of the VIM see Section 5 5 1 in Lockstep mode Two VIMs are running in lockstep and their outputs are fed for comparison to the CCM R5F In case of a mismatch a compare error is signaled to the ESM in the same way as with a core compare error Additionally two new run time diagnostic features are integrated 40 e Checker CPU Inactivity Monitor e Power Domain PD Inactivity Monitor The former measure monitors the key bus signals of the Checker CPU to the in terconnect The functionality of the Checker CPU is restricted to the execution of 50 Chapter 5 Lockstep the same instructions with the Master CPU and propagation of the output signal to the CCM R5F module It is prohibited to interact with the rest of the system while only the Master CPU is allowed to perform the transactions with memory and peripherals For this reason a number of key bus signals are monitored and must be inactive If an activity is indicated on them an unintended interaction of the Checker CPU occurred to the system In case of detection of activity in any of these signals see Figure 5 8 is flagged as an error Signals Remark write transaction AWVALIDM When asserted indicates address and control are valid on the C
89. l avionics In avionics after a safety assessment of the system that is driven by SAE stan dard ARP4754 Certification Considerations For Highly Integrated or Complex Aircraft Systems a criticality level is determined It might be from Level A to Level E Informally speaking the Level A is the most stringent where a potential failure might lead to loss of life of all passengers at Level B a few people passengers or crew may die at Level C there are major injuries at Level D minor injuries and finally at Level E no impact on the people and the environment 2 1 Functional Safety and Fault Tolerance 13 Level A Catastrophic Level B Hazardous Severe Level C Major Level D Minor Level E No Effect Figure 2 2 Criticality Level Pyramid for Avionic Systems 45 It should be noticed that even appliances that are not critical for the safety of the flight require at least a Level D certification A real life example is an aircraft crash that was caused by a seat back video screen for the entertainment of the passengers The crash analysis revealed that these appliances were sharing a bus for power supply with critical devices in the cockpit of the aircraft The malfunction of these safety unrelated devices affected the safety of the whole aircraft Software Considerations in Airborne Systems and Equipment Certification DO 178 As was already mentioned the DO 178 is aiming at establishing guidelines in terms of software
90. ler copies the architected state of the pair CPU0 CPU 1 to the idle pair CPU2 CPU3 At this point the system operates without the existence of a hot standby pair un til the appropriate actions execute on the first infected pair i e all caches are flushed on the CPU0 CPU 1 Finally the node controller reboots the CPU0 CPU1 pair of processors and set them as the new hot standby pair K START Noa 205 DETECT ERROR EVENT 210 SIGNAL LOSS OF L TI TO NODE CONTROLLER S i X a m v X a COPY ARCHITECTED S STATE TO HOT 220 STANDBY PROCESSOR x EXECUTE RECONERY 225 O 5 lt a ee SS REBOOT FAILED PROCESSOR AND DESIGNATE AS HOT 230 STANDBY PROCESSOR a END Figure 6 9 Flow diagram for recovery from loss of lockstep hot standby processor 23 6 5 System and method to increase lockstep core availabil ity Infineon Technologies AG An alternative method for increasing lockstep core availability is presented by Infi neon Technologies AG 27 In a simple implementation of a lockstep architecture in prior art when a core related error occurs the entire system both CPUs and peripherals is placed in a reset state to recover by the error This method has the drawback of a delay of tens of milliseconds that could not satisfy the temporal re quirements of a real time safety critical system Another technique that efficiently increases the core availability is the employment o
91. licable 110 Appendix G Requirements for Multi core Processors er REQ29 The applicant shall describe in their software AEH Not Applicable plans or other deliverable documents how they intend to allocate manage and measure the use of resources and the use of the interconnect by the applications hosted on the MCP and by other MCP peripherals so as to avoid contention for MCP resources and to prevent the capacity of the interconnect and the resources of the MCP from being exceeded The applicant shall allocate the usage of the MCP Not Applicable resources to the software applications hosted on the MCP and shall verify that the total of the resource demands when all applications are executing in the worst case situation does not exceed the total of the resources available The applicant shall determine the maximum Not Applicable capacity of any interconnect mechanism of their MCP to sustain transactions in a deterministic manner The applicant shall verify that the demands made _ Not Applicable on that mechanism by the software hosted on the MCP or by any peripherals of the MCP do not exceed its maximum capacity during any phase of operation of the system The applicant shall identifiy the non deterministic Not Applicable effects that any coherency mechanism of their MCP could cause such as excessive jitter in data arrival times or transactions being lost or serviced in an undesirable order The applicant sha
92. ll design implement and verify Not Applicable means to deactivate the features concerned to cause non deterministic effects or to mitigate these effects REQ35 In the software plans the applicant shall identify Not Applicable their MCP software architecture and operating system s and has provided details of how they will develop and verify all the software loaded onto the MCP so as to ensure that the software applications hosted by the MCP execute deterministically including details of any special methods or tools that are necessary due to the use of an MCP or the selected MCP architecture Software Verification The applicant shall describe in their Software Not Applicable Verification Plan SVP the environment to be used for each software test activity sufficiently representative of the target MCP REQ38 The applicant shall verify that each operating system and or software interface with the MCP hardware when installed on the MCP target complies with applicable objectives in DO 178B ED REQ37 For any testing that will not be conducted using the target MCP the applicant shall describe the environment that they intend to use their rationale for using a different test environment and why they consider that test environment to be 12B or DO 178C ED 12C e g DO 178C reference A 5 numbers 1 through 9 A 6 numbers 1 through 5 The applicant shall verify that each individual software application that is hosted
93. ly that will not ever meet one in their whole career it still remains a possibility Because of this infrequent occurrence of an engine failure the crew is not always able to identify and to handle such a malfunction 9 As a result erroneous operations of the crew in such a case could lead to devastating consequences for them and for the passengers The engine is likely the most complex and crucial component on the aircraft It is responsible for hundred calculations and as a result a main part of the safety relies on this component In our example we will focus on the fire detection scenario of the engine In avionics there is the principle of keeping the aircraft trajectory as the highest priority duty 9 Thus in case of an engine malfunction the system should stabilize the aircraft trajectory first before proceeding with further actions to resolve the engine problem This could probably cause a larger damage to the engine but it would help to ensure the safety of humans and the aircraft itself In our scenario we introduce a lockstep processor into the aircraft s engine One of the functions of this MCU is to receive the raw values from temperature sensors to convert them to Celsius degrees and in case of exceeding the acceptable limit 8 2 Example Lockstep Use Case 81 to notify the crew and the system controller via the Ethernet interface to prevent a fire After taking the appropriate actions to stabilize the aircraft trajectory i
94. m and fo cus only on one functionality and one hazard How the radiation could affect the conversion of the temperature values and how the Lockstep architecture could facilitate the system designer to mitigate this malfunction The general safety ar chitecture i e a voting system compromising of multiple instances of the Lockstep MCUs or multiple dissimilar temperature sensors needed to comply with DAL A requirements that such a system would most probably require is purposefully skipped The purpose of this Lockstep use case is to introduce the reader in think ing of the usability of this architecture in complex systems that need to comply with stringent standards and fulfill safety requirements to prevent hazards that could affect humans and the environment Chapter 9 Retrospective In this chapter we summarize the project objectives highlighting what has been achieved and how both company and scientific community could benefit by this research The discussion section describes the challenges and restrictions during this project and the philosophy of Texas Instruments for functional safety from the Hercules family MCUs point of view Moreover the Future Work section includes ideas how this current project could be extended in scientific and practical terms 9 1 Discussion The objectives of the current project were successfully covered providing evidence for each aspect of a Lockstep architecture and especially for the available hardware
95. mpany Airbus Defence and Space on the topic of Contin uous Integration Implementation for in house Applications in Friedrichshafen of Germany This Master Thesis is a product of six months work beginning on February of 2015 The topic is considered state of the art in the field of safety critical embed ded systems Nevertheless the main challenge was to find study material for the Lockstep concept while the majority of the literature is well protected from the semiconductor companies to protect their Intellectual Property This project is intended for electrical and embedded system students and for en gineers interested in safety around embedded systems It could be used as infor mation material and guideline to employ a Lockstep architecture microcontroller in any embedded system Aalborg University August 1 2015 fe S Stylianos Ganitis lt sganit13 student aau dk gt xvii Chapter 1 Introduction The increasing integration of computer systems into our daily routine and espe cially in systems where their behavior is vital has strengthen the need to contin uously investigate the means to make such systems more reliable and safe The safe operation of systems that may affect the integrity of human lives is essential However it is a prohibitively expensive process to ensure the functional correct ness of a system Hardware and software errors individually or in combination may lead to devastating consequences
96. mplementations that are proposed in the current patent propose different state control logic functionalities In one case they control the operation of their respective CPU core being capable of pausing and restarting them and writing their current state to the state buffer The saved state from the state buffer is also loaded with the help of this state control logic In other implementations it sam ples periodically the master CPU saving its state to the state buffer or a sampling is triggered on an event occurence such as the start of a task execution 6 5 System and method to increase lockstep core availability Infineon Technologies AG5 200 Task Start 201 Save State to 002 212 Buffer 204 Reload State from Buffer Error Detected during ft Taek Evaritinn be 206 task execution ie tots 210 Figure 6 11 Flow diagram where the save state is triggered on a task start 27 In the case that the flow diagram in Figure 6 11 demonstrates the state control logic of the master CPU saves the current state to the state buffer by generating a control signal The current state is interpreted in the voltages and or current values or their logical high and low values of the circuit components that the CPU contains or the values of the local registers of the main CPU After the completion of the storage of the current state the first instruction of the task is received by the memory module and is executed in lockstep mode If
97. n diagram of CCM R4F 35 5 4 2 1001D Lockstep Mode The 1001D is the default mode of the module after a start up where the afore mentioned comparison of the output signals from master and checker CPUs are compared and an error flag is raised to the ESM Upon a reset the values of inter nal registers for the Cortex R4F CPUs may vary Therefore to avoid an erroneous computation in comparison the application software must initialize registers of both master and checker CPU with identical values before they are used In this operating mode the CPU compare error module generates two signals as a safety mechanism to interface with the ESM in case of a mismatch both CCM R4F compare and CCM R4F self test error flags are raised to the ESM to ensure that the error is detected even if one of the paths fail 5 4 3 Self test mode In self test mode test patterns are automatically generated by the CCM R4F to determine its correct functionality In case of an error detection that indicates a hardware fault on the module itself a CCM R4F selftest flag will be raised to the ESM After the completion or termination of the self test if no error occurred the self test complete flag is set to notify the system to proceed appropriately Moreover after the completion of this integrity test the module remains in the self test mode until the appropriate register MKEY Mode Key Register is set to indicate the operation in a different mo
98. n the code of PBIST e The size of a PBIST controller is significantly smaller than the size of an em bedded CPU The execution of the tests is triggered by the application where the developer has the option to select the algorithms that will run against the memory modules of the system PBIST tests are destructive for the memory contents and if they execute during operation of the system the application may copy the data of the memory under test to a non tested memory and restore the data after the completion of the current test 4 2 Programmable Built In Self Test PBIST Module 33 A potential error is indicated in PBIST status registers and a further feature is the capability of keeping log of the errors Similarly to the LBIST controller see Section 4 1 PBIST is a deterministic diag nostic method Therefore a timeout counter could be programmably implemented using the Real Time Interrupt RTI module to indicate potential failure in test com pletion within expected amount of time 35 Some of the test algorithms that are used for PBIST are following 40 1 March13N March13N is the algorithm for SRAM testing that provides the highest overall coverage The following algorithms are complementary to March13N which is the baseline The basic concept is to check if a bit cell can be written and read as 1 and 0 and if a bit cell remains unaffected by the bits around it The operation is simple Firstly the array is initialized
99. nalog to Digital Converters MibADCs of 24 chan nels with 64 words and parity protection are integrated A conversion can be per formed on an individual channel or a group of them that is set by the software 1DMIPS Dhrystone Million Instructions Per Second is a benchmark to measure the integer per formance of processors and compilers 7 3 1 Hardware 23 Three Multi Buffered Serial Peripheral Interfaces MibSPIs two Serial Peripheral Interfaces SPI one Local Interconnect Network LIN one Serial Communica tion Interface SCI one Inter Integrated Circuit I2C three Control Area Network DCAN one Ethernet interface and one Flexray controller are the communication interfaces integrated into the device Two Frequency Modulated Phase Locked Loop FMPLL clock modules are em ployed These modules are capable of multiplying the external frequency reference when higher frequency is required for internal use A Direct Memory Access DMA controller with 16 channels 32 control packets and parity protected is integrated with a built in MPU to protect the rest of the memory in case of a malfunction of the DMA the DMA is restricted to a prescribed area of the memory All the device errors are controlled by the Error Signaling Module ESM while it determines if an interrupt will be triggered or an external pin ERROR is asserted for different kinds of errors Finally expansion of the device with external modules such as sync
100. ng and prioritizing the several interrupt sources that may occur during the program flow It is the same module for interrupt handling that is used on TMS570Ls31x 21x microcontrollers but it is enhanced with a redundant one to operate in lockstep for increasing safety Temporal diversity is also supported with a delay stage of 2 CPU clock cycles same as in the delayed lockstep architecture of the CPU cores 40 5 5 Lockstep implementation on TMS570LC43x by Texas Instruments 51 VIM Interrupt Vector Table 1 f MMR I F1 ViM a gt To CPU1 Core1 INT_REQ CCM VIM To CPU2 Core2 i VIM Interrupt Vector Table 2 Figure 5 9 Dual VIM for Safety 40 The scope of this research is restricted to the lockstep operation For more in formation about the kinds and the priorities of the generated interrupts and the interrupt handling in the Hercules family of microcontrollers someone can refer to the Technical Reference Manual TRM of the specific microcontroller Chapter 6 Lockstep Related Patents Various semiconductor companies have settled patents for distinct lockstep archi tectures and several features for enhancement of its functionality and capabilities In this chapter the lockstep related patents are presented emphasizing on the dis parity and the special characteristics that each invention has to demonstrate 6 1 Delayed lockstep CPU comp
101. ng the detected errors It is controlled by two 32 bit registers The CCM R4F Status Register CCMSR and the CCM R4F Key register CCMKEYR The former contains information about whether the self test is completed or an error is detected and what was the specific pattern that the self test failed The latter is used to set or read the mode of the module For more information regarding the registers of the CCM R4F module please refer to Ap pendix A The CCM R4F module is set to Error forcing Mode by setting the CCMKEYR register to 0x09 see Appendix A The input patterns of OxAs and 0x5s are fed to the CCM R4F module and a compare error is generated The current example application has employed a flag ESM_High_Int_Flag that is set to the ESM Interrupt Service Routine ISR The HALCoGen generated code determines the priority group that each detected error belongs to see Section 5 4 8 for error categories and executes the corresponding code regarding the severity of the error This is the point where the application developer has to deal with the detected error depending on the safety requirements and the temporal constraints of the application For the sake of this experiment the ISR is empty while the purpose is to wait for the user to press the button and clear the error Therefore with a second button press the ERROR pin is cleared and the execution of the application continues 7 1 2 CCM R4F Experiment with Hitex Safety Kit TMS570L
102. ns for each problem with the experience of his strong academic background Thank to Airbus Defence and Space the Head of TEOHD5 department Mr An dreas Schacht that trusted me to conduct this research and my supervisor Mr Frank Glaser that was always there for any technical problems and questions Furthermore thanks to Texas Instruments support and especially to Mr Chuck Davenport from TI forum for the long conversations around several issues and questions I admired his professionalism that demonstrated with the detailed and simplified answers for all of my questions Last but not least I would like to thank my lovely mother Despoina Ganiti that passed away one month before I start this academic journey for my Master studies She sacrificed everything for me and my brother She will be always with me in my heart and my mind She was physically absent during these two years of my studies but the thought of her was the largest source of power for me in any challenge that I met Preface This thesis is made as a completion of the Master education in Embedded Software Systems for Stylianos Ganitis The author after the completion of his Bachelor s studies on Informatics and Communications at the Technological Education Insti tute of Serres in Greece continued his academic journey at Aalborg University of Denmark in September 2013 In July of 2014 he had the honor to conduct his six months internship in the leading European Aerospace co
103. nt Integration 5 Basis Software Engineering Requirements Engineering according to develop ment standard Software Development according to RTCA DO178B Level A D Software Development according to standard processes ABD0100 VM XT and so on The permanent employees of the department are equipped with a multi role ed ucation following a continuous training on new technologies new development methodologies and certification issues 1 2 Problem Statement The current research focuses on the Lockstep architecture that is employed in safety critical systems mainly in the automotive medical and industrial markets The motivation behind this topic is that the Lockstep architecture is not broadly known in the scientific community It is a state of the art feature in a variety of safety critical applications Furthermore it has already a history of use in a variety of safety critical applications but the aforementioned domains that it is employed in are very special Consequently it lacks documentation in terms of a safety feature architecture and the only source of information is the safety manuals of semiconductor companies for their own specific implementation Chapter 1 Introduction This research focuses on clarifying what Lockstep as a broad term means high lighting the benefits and potential drawbacks of Lockstep as a safety mechanism for error detection in embedded systems and investigating the different implemen tations from
104. nts of the system are allocated to the hardware e Identification of functions and their effects For each function that is ex pressed by a requirement and has an assigned DAL hardware components are designed to fulfill the respective requirement e Design assurance approach for hardware Requirements based verification is performed to determine the criteria of the assigned DAL 2 1 Functional Safety and Fault Tolerance 15 2 1 4 The Meaning of 1001D For the Hercules family of microcontrollers the default Lockstep mode is described as 1001D one out of one with diagnostics A 1001D system is comprised of a sin gle channel with a diagnostic channel The terminology of these architectures is mentioned in the IEC 61508 standard Generally the definition of MooN in IEC 61508 4 is given as M out of N channels architecture where either of the two channels can perform the safety function and the definition of MooND is M out of N channel architecture with Diagnos tics 19 As is also mentioned in the same safety standard in case of a dangerous failure the safety function fails as well In simple words a MooN system contains N identical components and if at least M out of N components are operating cor rectly the system is error free Beyond the hardware this redundancy concept may also be applied at software time or information level Channel see bat age Wi sean gg Diagnostics 1 IEC 324 2000 Figure 2 3 100
105. ocation in multi processor data processing system having processors operating in lockstep Freescale Semiconductor Inc 2 0 004 55 6 3 Method and system for fault containment Infineon Technologies AG 57 6 4 Method and apparatus for recovery from loss of lock step Hewlett Packard Development Company L P 0 60 6 5 System and method to increase lockstep core availability Infineon Technologies AG Jet oe ak a ldnoed ane a re nk Beg a ek os 63 6 6 Computing with both lockstep and free step processor modes Hewlett Packard Development Company L P 004 66 Results 69 7s EXperiments e Sees haa at Maat et SS hehe te Uk i hho git 69 7 1 1 Error Forcing Experiment on TMS570LS3137HDK 69 7 1 2 CCM R4F Experiment with Hitex Safety Kit TMS570LS3137 72 FA ORCOMCIMSIONS ai oad Sate he oh Boe BS ok SN RRA RS 76 Lockstep In Avionics 79 8 1 Certification Actions 0 46 4 erie Oak a 79 8 2 Example Lockstep Use Case s o oo aa a 80 Retrospective 83 9T Discussion 8 2 4 ose ety wid i ae BoE A A ae Wea ad ae Wed Sy ned Mo 83 92y Conclusio e a 38 ett dh ay ds Mi Sa Ede woe Bate aeolian 86 Contents 953 F t re Work sogea te Sad ate a he NP EN ate dey eee 8 Appendix A CCM R4F Register Tables Appendix B_ Main function of the first experiment Appendix C CCMR4F Compare Function Appendix D ARM GCC 4 9 2015q1 Big Endian Patch Appendix E How to Build the GNU ARM Toolchain on Ubuntu E 1 Build GNU
106. oducts of the discussed microcontroller families need to comply with them in order to be per mitted for use in safety critical applications While the company under the current project was supported belongs to avionics industry an introduction to the related certification standards is provided One of the crucial points for the company research is How Lockstep should be treated in the certification process of an avionic system As is described in Chapter 2 1 3 the certification process in avionic systems varies from other sectors Beyond the most stringent requirements to ensure the functional safety the certification process is done on a system level As a result we cannot say that an MCU employing a Lock step architecture is certifiable for an avionic system or not It is always dependent on the application and the system and consequently on the DAL level that will be 9 1 Discussion 85 derived from the appropriate analysis Nonetheless we performed an initial task that is inevitable for a piece of COTS hardware in an avionic system certification We conducted an analysis concluding that the TMS570LS3137 beyond a dual core processor system it should be treated as a single core The redundancy exists due to the diagnostic channel that monitors and facilitates the error detection in one of the two cores 1001D Safety Architecture As is already mentioned the Hercules family of microcontrollers was developed targeting the relevant
107. olved in our daily routine is in evitable in the modern world and their faulty operation may harm humans and or animals and the environment 13 Functional safety is a fundamental concept for any field involving safety related systems For instance in automotive it ensures the right operation of the airbags at the right time in transportation it ensures that the doors will open and close at the right time in medical sector it ensures the right operation of a pump that could lead to an over dose of a medicine to a patient and in manufacturing it could ensure the right mix of chemicals through an automatic valve closure mechanism The employment of safety mechanisms focusing on electronics and software of the system facilitates the reduction of risks to a tolerable level It is not possible to eliminate the risk but what the functional safety is targeting to is to reduce its negative impact Additionally functional safety is aiming to bring the system or the device in a tolerable level while it provides the means to measure the probabil 8 Chapter 2 Functional Safety and Certification in Embedded Systems ity of an event occurrence and the harm that it may cause As a result a system is functional safe only if it contains tolerable risks In IEC 61508 Standard the Safety is defined as The freedom from unacceptable risk of physical injury or of damage to the health of people either directly or indirectly as a result of damage to property o
108. on by the comparison of the outputs of the processors 6 6 Computing with both lockstep and free step processor modes Hewlett Packard Development Company L P 67 when the system is operating in lockstep mode In case of operation in free step mode the output data from the processors propagate to the system via the interface logic bypassing the LOL In memory beside the data processes and the operating system is stored a configuration database that provide information for the opera tion of the system e Distinguishing the step mode of each CPU free step or lockstep mode e Some criteria that the above mode assignment could change dynamically e A list of processes that must run in lockstep mode By default processes run in free step mode if they do not exist in this list In each process call a look up operation is performed in the database to check if the process requires execution on a single processor in free step mode or on a pair of processors in lockstep mode Moreover some special rules could determine the running mode of the process such as the time of execution or resource require ments of the process For instance in a nightly execution could be set a rule for forcing a lockstep mode as the performance is not as important as the error de tection at that time In one of the variations of the invention resource utilization by operating system could also be employed to determine if a switch to free step mode of an overloaded pai
109. on the MCP complies with the applicable objectives in DO 178B ED 12B or DO 178C ED 12C e g DO 178C reference A 5 numbers 1 through 9 A 6 numbers 1 through 5 when all the applications hosted on the MCP are executing in the intended final configuration of the processor and its hosted software REQ40 The applicant shall verify that the data and control coupling between all software components hosted on the MCP has been exercised during software requirement based testing including exercising any implicit e g through interconnect features or explicit interfaces between the applications via shared memory and any mechanisms to control the access to shared memory and that the data and control coupling is correct REQ41 The applicant shall conduct robustness testing of the interfaces and the features of the MCP both when software applications are executing individually and when all the software hosted on the MCP is executing and has verified the compliance of all the hosted software with the applicable objectives in DO 178B ED 12B or DO 178C ED 12C e g DO 178C reference A 6 numbers 1 through 5 and with the resource allocation to each application under these conditions Discovery of Additional Features or Problems Discovery of Additional Features or Problems Additional Features or Problems REQ42 The applicant shall identify any features of the MCP not specifically mentioned in this paper that could cause non determinis
110. oncept is used in safety relevant systems which require the detec tion of temporary or permanent errors during the execution of a task or a program Data corruptions occurring in a processor are more often undetected silent er rors 1 In the literature the kind of detected errors by the lockstep architecture are catego rized in several different ways One approach is the soft and hard errors and these terms are used to describe the persistence of the error Soft errors describe these that occur by a temporary anomaly i e Electrostatic Discharge ESD hit noise glitch and lead to a bit or a logic gate flip in the core Soft errors may be derived from transient events such as cosmic radiation radioactive decay power supply variations or cross talking 29 leading to unintended transient signals or condi tions in CPU cores or in the peripherals and their lifetime is typically 2 milliseconds or less This kind of errors can affect an instruction execution or computation but the next time that the same operation will be performed the error will not be re producible On the other hand hard errors are permanent and are caused by a long term degradation of the silicon logic over time Hard errors may also occur by corrupted memory cells or other circuit components due to ionizing radiation manufacturing inconsistencies 27 or exposure to high current which may cause metal migration Generally they are caused by physical defect of the ha
111. onsisting of a variety of components with different SILs is the weakest link As mentioned before the hardware and system can be assessed regarding the fail ure rate metrics On the other hand in the software there are not such metrics while there are not random failure modes The IEC 61508 3 describes the require ments for the software life cycle to comply with a determined SIL The highest SIL that a broad range of application is targeting is the SIL3 For SIL4 which is an ultra high safety level redundancy architectures are used i e multiple channels such as two versions of software serving the same purpose For instance in a single system two RTOSs from different suppliers can be employed to achieve compliance with SIL4 Developing a system with use of a certified COTS Commercial Off The Self soft ware such as a RTOS does not necessarily mean that the final product complies with the COTS s SIL It ensures that the RTOS provides the mechanisms for several tasks that are required for achieving functional safety such as memory allocation task synchronization maintenance of temporal constraints and so on In essence the compliance depends on the loyalty to the specific SIL development and testing requirements The artifacts of following these rules should be well documented and they are provided to the safety assessor The TMS570LSxxx and RM48x are two processor families developed by Texas In struments that are suitable for use in SIL
112. or signals 2 One issue that is mentioned in the current patent is the protection of the error blocking unit Specifically to integrate a safety mechanism to resolve a potential dangerous behavior of the error blocking unit where it could generate an error sig nal in absence of a lockstep error The full compliance with the ISO26262 standard requires the protection of this unit without analyzing though a specific method for this enhancement 6 4 Method and apparatus for recovery from loss of lock step Hewlett Packard Development Company L P Hewlett Packard demonstrates a method for recovery from loss of lockstep 23 In case of a delayed correction of this loss the whole system may crash In a dual core system in case of one failed processor the whole system may halt processing if the other core is still full functional As was already discussed in Section 6 2 that is described the proposal of Freescale for the same issue the loss of lockstep may occur from a data cache error A prior art is also presented for the loss of lockstep see Figure 6 6 where in case of malfunction of the one of the cores the architected state of the other core considered as good is saved to the memory Then both of the cores are reset and reinitialized and the architected state is copied from the memory 6 4 Method and apparatus for recovery from loss of lock step Hewlett Packard Development Company L P 61 DUAL CORE PROCESSOR 20
113. p HALCoGen Accessed 09 06 2015 113 114 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Bibliography Hercules Microcontrollers Brochure Texas Instruments 2014 Hitex SafeTI Hitex Safety Kit http www ti com tool safeti hsk rm48 Ac cessed 10 06 2015 IEC FAQ Available http iec ch functionalsafety faq ed2 page5 htm Ac cessed 17 06 2015 IEC61508 International Electrotechnical Commission Texas Instruments SafeTI Library User s Manual D Pradhan K Greb Hercules Microcontrollers Real Time MCUs for safety critical products User s Manual Texas Instruments 2011 Anthony Seely Karl Greb Design of Microcontrollers for Safety Critical Opera tion User s Manual Texas Instruments Fort Collins CO US Kevin David Safford Method and apparatus for re covery from loss of lock step Pat US 7 085 959 B2 2006 Lockstep Available http wikipedia org wiki Lockstep Accessed 2 05 2015 Lockstep Compensation Available http en wikipedia org wiki Lockstep compensation Accessed 02 05 2015 Lockstep protocol Available http en wikipedia org wiki Lockstep protocol Accessed 02 05 2015 Simon Brewerton Trowbridge GB Neil Hastie Gloucestershire GB Sys tem and method to increase lockstep core availability Pat US 2014 0258684 Al 2014 News about Autosar and Iso26262 A ne
114. profiler The profiling facilities are active through the whole development cycle and the goal is to end up in a finely tuned code when it is used efficiently by developers Testing and performance benchmarking are time consuming tasks that should be 3 2 Software 27 performed continuously during the development process CCS supports a script ing environment suitable for the automation of such processes The flexibility of the Eclipse based environment that provides the capability of adding plugins and extensions the aforementioned features plus others such as image analysis and visualization simulation and support for TI s real time oper ating systems make the CCS an ideal IDE for embedded application development for TI s systems 3 2 2 HALCoGen The lowest software layer of an embedded system that directly interacts with the hardware is called Hardware Abstraction Layer HAL It is usually responsible for the system initialization and the direct access to the several components of the MCU Application Layer Real Time OS RTOS interfaces Hardware Adaptation Layer HAL Hardware Figure 3 4 Hardware Abstraction Layer in an Embedded system 36 Texas Instruments provides a GUI tool called HAL Code Generator HALCoGen that the user can configure and generate device specific code for initialization of the system and configuration even for the safety related functions 15 It could also be describe
115. r of processors or a reallocation of the process to an other pair should be performed MONITOR LOCK STEP VS FREE STEP RESOURCE UTILIZATION BALANCED N S22 Y CHANGE STEP MODE 23 MAINTAIN STEP MODES S24 Figure 6 13 Flow graph of a potential scenario using the current invention 1 CALL PROCESS Sit N LOCK STEP REQUIRED Si2 N Y ASSIGN TO LOCK STEP PPROCESSORS 13 ASSIGN TO FREE STEP PROCESSORS S14 In the scenario presented in the flow graph of Figure 6 13 the steps 21 S24 are executed in parallel with 11 S14 As aforementioned the steps S11 S14 check the configuration database and assign the process to a single or a pair of processors for free or lockstep mode respectively At the same time resource allocation of 68 Chapter 6 Lockstep Related Patents the processors is monitored by the operating system In step S22 if one of the processors is stressed a reallocation is performed in terms of step mode in 23 which may be performed dynamically or after a restart of the system Chapter 7 Results 7 1 Experiments Two experiments were conducted in the current project related to the Lockstep ar chitecture of Texas Instruments TMS570LS3137 microcontroller The purpose of the first is to demonstrate how the Lockstep architecture can be employed in a safety critical application in order to detect soft and hard errors that may occur during the execution of the softwar
116. r of the first core from the others is achieved with an error signaling interface It receives information regarding the altered internal state from the core that experiences it and provides this information to the other cores that they have to emulate the same altered state The information that is propagated may be the type and the location of the error facilitating to distinguish it from different predetermined types of errors When a machine check exception is generated in one core internally indicating a hardware failure the cross signaling interface forces all the other cores to generate the same exception Potential causes of such an exception could be a parity error a decode error logic error and single and multi bit errors discovered using error correction code ECC parity or other error detection mechanisms An alternative implementation to the machine check exception could be a cache control opera tion like a miss condition to be forced by the error signaling interface where all the cores perform an auto invalidation of the cache location and the location of the cache is reloaded In case of simultaneous errors a logic placed in the error signaling interface can decide which errors will be imitated by the other cores or which cores should be reset 56 Chapter 6 Lockstep Related Patents PROCESSOR 0 PROCESSOR 1 CROSS SIGNALING PROCESSOR PROCESSOR 0 2 1 6 i MEMORY MEMORY F ADDR A
117. r to the environment and the Functional Safety as It is a part of the overall safety that depends on a system or equipment operating correctly in response to its inputs In ISO 26262 the definition of Functional Safety is described as Ab sence of unacceptable risk due to hazards caused by mal functional behavior of electrical and or electronic systems 22 A variety of Standards exists for specific or general markets to facilitate the compli ance of the systems to the functional safety principles The standard for application specific functional safety development in electronics is the IEC 61508 Various domain specific standards have been derived from that 30 e IEC 60730 for white goods ISO 26262 for automotive passenger vehicles EN50128 for software development of railway applications e JEC61513 for nuclear power plants e JEC61511 for the process industry and associated instrumentation IEC62061 and ISO 13849 for machinery electrical control systems TEC62304 for medical systems These standards offer guidelines to assess risk and assign safety goals for safety related systems they provide methods to reduce systematic failures frameworks for quantitative analysis of random failure rates and effectiveness of diagnostics to detect them and finally they provide guidelines for the maintenance of functional safety after the deployment of the product The functional safety term should not be confused with product char
118. rdware and they are not recoverable 5 3 Lockstep implementation by Freescale 39 In other source literature the errors that lockstep architecture detects are classified as follows 10 1 Permanent faults 2 Intermittent faults 3 Transient faults In this classification permanent and transient faults are alternative terms to de scribe the hard and soft errors respectively However intermittent faults are a special case that describe errors that occur either periodically or at irregular inter vals under circumstances that several events performed simultaneously This kind of errors are difficult to detect as all the factors that contribute for their occurrence should be present to reproduce them The lockstep architecture is an efficient way to detect all the aforementioned faults and in some cases to isolate and or correct them using a variety of enhancements that are presented in the following section 5 3 Lockstep implementation by Freescale Freescale Semiconductor Inc one of the leading companies in the world in the construction of safety systems for automotive industrial and consumer systems has introduced the Lockstep concept into the dual core MPC564xL MCU family The safety features of this family is targeting to applications that need to comply with IEC61508 SIL3 and 15026262 ASILD safety standards 11 This level of safety is needed for systems such as e Electric power steering e Short and mid range adaptive
119. re Platform Support Applications Functions as they are presented in the internal documentation of the company are following e The Basis Software provides a Platform and additional optional Support Applications or Support Functions for the integration and installation of ap plication software through standard interfaces and standard communication methods e The Platform provides a standard interface to the Application Software Cov ers all Parts for abstraction of the Hardware The Platform includes the Oper 1 2 Problem Statement 5 ating System Board Support Package Device Drivers Libraries amp Commu nication Interfaces comparable to Windows e Support Applications Functions are utilized to support and or to main tain a complete system 1 X processors in a box e g Loader Maintenance Software Redundancy Communication Concepts etc The expertise of the department can be categorized under the following 5 sections 1 Equipment Engineering Support Including system design selection of proces sors and hardware modules software architecture 2 Certification Activities Development of Certification artifacts Support Verifi cation amp Validation V amp V Process Support Safety Analysis 3 Tools Development Environment Requirement Analysis and Software Design with UML RTOS Configuration 4 Test amp Integration Support Hardware Bring Up Hardware Software Inte gration and Test Support Equipme
120. rived time that is displayed in the HSK Monitor is 15 usec 74 Chapter 7 Results Ve amp Profiling Global Settings Diagnostic Settings CCMR4 Profiling Profiling full safety task oe coma Prttng _ Prfng tan safety CPU lockstep User Commands CPU or orcng est Ga CPU sel et eror forcing ETEEN Parameterized CPU RA test o J Figure 7 3 CPU Error Forcing Test Time Measurement e CPU Self test Error Forcing This test forces an error at the self test error signal see Section 5 4 5 Regarding the HSK Monitor GUI the duration of this test is 6 usec e ex SafgTl HSK 4 Validating amp Profiling Global Settings Diagnostic Settings Profiling oe Profiling Profiling full safety task PE CPU lockstep User Commands CPU Error forcing test os CPU self test error forcing User Manual Parameterized CPU RA tests 7 out V Stop recording Figure 7 4 CPU Self Test Error Forcing Time Measurement Run time Injection of Error and Time Detection Measurement The application that is integrated into the HSK monitors and implements the ap propriate conversions of the temperature and acceleration values that are derived 7 1 Experiments 75 from the corresponding sensors that are integrated on the board In Figure 7 5 we can observe the behavior of the application when shaking the evaluation board and the acceleration values change rapidly The tab Application displa
121. ront of was called lockstep marching 24 Ad ditionally it is also referred to a protocol in multiplayer computer games that is used to solve partially the look ahead problem This problem came up when the latency of one client was used as a cheat method while after this delay she knew the steps of the other players The lockstep protocol synchronizes the actions of all the clients relatively to the higher latency client 26 Moreover a compensation system is called lockstep where for example all the graduates of a law school that are hired by a company receive the same salary bonuses promotions regardless their skills and working experience 25 Generally the term in all the aforemen tioned cases is used to indicate a synchronous or imitating movement It is usually expressed that something is in lockstep with to show that it follows the latter In this research and in the embedded systems world the Lockstep is a safety feature that is integrated from semiconductor companies in families of microcon trollers that are used especially in safety critical systems The idea of the concept is the same but the implementation varies In this section the Lockstep architecture is described in a general manner although the implementation the architecture the placement of the cores and the details of its usability are specific for the Her cules family of Texas Instruments microcontrollers Additionally an alternative implementation o
122. s comprises of safety critical systems that need to fulfill the stringent safety requirements Reliability in avionic is required to ensure the effi cient management and control of aircraft and space systems A fault occurrence in such systems may lead to a range of consequences from disruption of schedule or aircraft operations to accident conditions 4 Electronics is a major part of avionic systems Thus reliability of hardware electronic components is a main concern of the avionic system designers Redundancy of components or systems is a common approach in the field of avion ics In redundant architectures the main goal is that when a part of the system fails the rest of the system is able to continue operating efficiently Beside the fact that redundancy is an efficient way to mitigate faults there are factors that can affect the reliability of such architectures i e common cause failures load sharing The current research approaches a safety feature architecture of microcontrollers that is already established in automotive medical and industrial applications Lockstep is a fault detection mechanism to prevent errors that may occur at the core level The microcontroller designers of Lockstep architecture were inspired by the concept of redundancy and came up with a multi core processor while at the same time attempting to deal with common cause failures that could affect the efficiency of such a diagnostic Lockstep is just a piec
123. s in the silicon The aforementioned testing methods were not possible to be performed in this project due to lack of the special required equipment in the company s laborato ries and the high cost of acquiring this Similar experiments are performed by the semiconductor companies to ensure the right operation of their Lockstep architec tures The idea of run time fault injection by a secondary MCU is the innovation of the Hitex SafeTI Kit and is useful to assess the behavior of the primary MCU via the well structured user interface that is provided What was observed during the conduction of the experiments for the time measurement is variations in the fault detection time For instance 2 subsequent CPU Lockstep error injections result in different fault detection time Initially we did not expect a variation while this 7 1 Experiments 77 time should be deterministic A Lockstep error occurrence is critical and generates the highest priority interrupt After investigating the way that an error is injected in the primary CPU the variation in time detection was justified When the safety application gets the command to INJECT a fault the following sequence of steps happens 1 An I O pin is raised to signal the fault injection request to the monitor 2 A few instructions are executed to check which kind of failure to inject 3 A function from the SafeTI Diagnostic Library is called to cause the fault injection 4 After the
124. s invention for avoiding error propagation in multi CPU systems 2 Variety of implementations are also proposed for the error blocking unit alterna tive methods for step 65 in Figure 6 4 The first that is already mentioned is to block a write transaction that is performed by the first CPU A second case may be to let a write transaction to propagate to the system and in case of a mismatch in the comparator unit to send a command to abort it Moreover it may modify the address of the write transaction in a predetermined location that is considered as a safe storage for this purpose Finally an implementation could be to utilize error detection correction code in a write operation and an abort will be generated by the error blocking unit by corrupting the address signature of the write transaction Another approach is presented by employing two extra comparator units to verify said protection condition that the output of the comparator unit1 error signal and the output of the comparator unit2 negated error signal correspondingly agree 60 Chapter 6 Lockstep Related Patents 23 50 ERROR gt cpus Sy ART gt slockne f gt UNIT UNIT 2 COMPARATOR T gt 13 DELAY gt NTH CPU 2 gt Lao 12 COMPARATOR COMPARATOR UNIT 2 Figure 6 5 Two extra comparator units for comparison of the error and the negated err
125. s return to zeros and writes have no effect 3 0 MKEY Mode Key Read in User and Privileged mode Write in Privileged mode only 0 Read Returns current value of the MKEY Write Lockstep mode 6h Read Returns current value of the MKEY Write Self test mode 9h Read Returns current value of the MKEY Write Error Forcing mode Fh Read Returns current value of the MKEY Write Note switching operation to lockstep mode Self test Error Forcing mode It is recommended to not write any other key combinations Invalid keys will result in Figure A 2 CCMKEYR Register 42 Appendix B Main function of the first experi ment file sys_main c brief Application main file date 15 Jun 2015 version 04 03 00 This file contains the main function for setting the CCM R4F module in Error forcing mode when the programmable pushbutton is pressed USER CODE BEGIN 0 USER CODE END x Include Files include sys_common h USER CODE BEGIN 1 include ccmr4 h include esm_demo h include rti h include gio h define BUTTON_LOW 0 define ERROR ABSENCE 0 91 92 Appendix B Main function of the first experiment define ERROR_PRESENCE 1 define BUTTON_BIT_NO 7 define LED_BIT_NO 17 define LED_OFF 0 define LED_ON 1 USER CODE END USER CODE BEGIN 2 uint16 buttonPressed USER CODE END
126. s to initialize the registers of both with the same values to prevent an erroneous fault detection Each core has its own set of registers but the two cores are not independently accessible When you write to one core s configuration registers the write is sent to both Determinism has to be checked as for a single core Lockstep architecture is not a special case while from a user and programmer point of view it is like having a single core Both cores access the full Flash and RAM address space The undocumented features should get disabled as in a single core MCP Documentation is sufficient for the use of Lockstep The initialization process that is described in the REQ1 of this document is thorougly analyzed in http www ti com lit an spna119 spna119 pdf The applicant shall set those registers and pins to disable those features Lockstep mode cannot be disabled The applicant shall plan develop document and verify a means that ensures that in the event of any safety critical configuration settings of the MCP being inadvertently altered an appropriate mitigation is implemented In case of a Lockstep error detection an interrupt is triggered to the Error Signaling Module ESM and the ERROR pin is asserted without having the option to restrict any of these actions In case of an inadvertently altered state of one of the cores of the MCP the error will be detected by the Core Comparator Module CCM 107 108 App
127. safety concept 2 1 3 Avionics Certification DO 178C and DO 254 The significance of the correct operation of avionic systems due to the conse quences of the hazards that they may cause in a potential malfunction makes the avionic certification a complex process with stricter criteria than the aforemen tioned standards The DO 178B and DO 254 are documents that provide guidelines for dealing with respectively software and hardware in the process of avionic systems development These two documents are written by people with formal software expertise and are quite similar Initially the hardware was not requiring the strict standards as the software did The evolution in the field of aviation revealed the need for a set of rules that have to be followed in the hardware development life cycle as well because the majority of the modern avionic systems consist of both hardware and software components 45 The series of the DO 178 DO 178A DO 178B DO 178C and DO 254 were devel oped by the commercial avionics community in cooperation with RTCA a Federal Advisory Committee in an attempt to establish common guidelines for system de velopers and government certifying agencies for the certification of avionic prod 12 Chapter 2 Functional Safety and Certification in Embedded Systems ucts It is worth to note at this point what is certified in the avionics world The Federal Aviation Administration FAA the national aviation a
128. sisting of the 1001D controller Lockstep architecture and a signal that is sent bidirectional between the controllers to no tify that a fault occurred maintaining the system availability Two main advantages are pointed out by this implementation It is simpler than the 2003 architecture and could be less susceptible to random failures while the system consists of fewer controllers Two out of Two with Diagnostics 2002 Figure 2 5 One out of One with Diagnostics 1001D Two out of Two 2002 Architectures 46 2 1 Functional Safety and Fault Tolerance 17 2 1 5 Texas Instruments Approach To Functional Safety One of the initial tasks in functional safety development is a hazard and risk analy sis of the target application Depending on this assessment the level or the accept able risk is determined regarding the SILs as already discussed in Section 2 1 1 Then the process of software hardware and system development must comply with the requirements of the derived SIL in order to handle efficiently systematic failures Hardware and systems can be assessed via quantitative metrics to reveal the likelihood of failure per hour and failure fraction in the final application prod uct 30 A SIL is only achieved by evaluating the whole system and at component level we could only say that a component is developed according to SILx or this com ponent is suitable for use in SILx systems The derived SIL of a system c
129. t a functional interference analysis to identify all the interference channels between the software hosted on the cores of the MCP There are no interference channels between the two cores as the system behave as a single core The secondary core is truly only a checker for the first It is not connected to the data instruction buses directly It will still do everything that the primary core does in regard to instantiating reads and writes but only the reads and writes from the primary core will reach the rest of the system The applicant shall design implement and verify a means of mitigation for each of those interference channels Shared Memory and Cache The applicant shall state in their software plans whether or not they intend to use shared memory between the processing cores The applicant shall describe in their software plans the means they intend to use to control access to shared memory locations and to prevent the disruptions to deterministic software execution caused by problems such as race conditions data starvation deadlocks or live locks If the applicant uses shared memory between the processing cores the applicant shall test the means that they have designed to control the access to shared memory The application is not able to handle the shared memory Both of the cores share the same address space RAM and FLASH simultaneously The read write operations are performed by the master core The c
130. t it is a di agnostic for core level error detection Both soft and hard errors are detected or even corrected depending on each implementation that occur by exposure of the system to high radiation levels or a malfunction that occurred during the manu facturing process Furthermore the main available literature that is not strongly dependent on the Hercules family of Texas Instrument was derived from the patents of other semi conductor companies The demonstration of alternative methods either in terms of hardware architecture or in error detection correction mechanisms reveal the two main challenges in such a system e Synchronization of two cores Different methods are utilized for the efficient in Lockstep operation of the two cores In case of running out of sync erro neous fault detection would be signaled by the comparator module e System availability Some systems require the continuation of the application execution even when an error is detected Due to the high criticality of the executed function methods were employed to continue the operation of the system after a fault detection by the Lockstep architecture While the current project is strongly related to safety critical aspect of embedded systems it was necessary to include a chapter introducing the reader to the termi nology around safety Moreover a brief overview of the functional safety related certification standards and processes was crucial while the final pr
131. te operation For this purpose the error blocking unit is employed and with this architecture write transactions can be blocked or totally be aborted The corruption of the subsequent stages of the system can also be prevented by operating these faulty write transaction to known memory ranges while part of the system will still be functional The invention suggests different implementations of the delay unit2 instead of de laying all the signals only the write operations may be delayed or only a category of write transactions i e write operations to peripherals and to a memory attached to a shared internal bus 6 3 Method and system for fault containment Infineon Technologies AG 59 60 receiving an input signal by a first CPU n 61 e4 feeding the input signal by a first delay unit to a second CPU E 62 delaying at least one output signal _ of the first CPU related to write transactions of the first CPU by a second delay unit comparing at least one output signal of the first CPU with at least one output signal of the second CPU by a comparator determining if an error occurred _ during processing the input signal by the first or the second CPU to joutput a corresponding error signal using the error signal by an error _ blocking unit to modify a write transaction of the first CPU such that the error is kept from propagating through the system Figure 6 4 Flow diagram of the Infineon
132. terconnection diagram of CCM R4F 35 onana aaa aaa 43 Compare Match Test vectors 42 oona 44 Compare Mismatch Test Vectors 42 noo 45 Example of orientation of the cores 35 ooon 46 ESM Block Diagram 42 ooa KIB EE EES 48 Block Diagram of CCM R5F 40 oonan aaa 49 List of the signals that must be inactive in Checker CPU 40 50 Dial VIM fot Safety 40 rre i ok ae e oe eel ae 4 51 Prior art and Texas Instruments invention 29 54 Error signaling Interface in a lockstep architecture by Freescale 47 56 Prior art delayed lockstep implementation and Infineon s enhance ment 2r 2 as ate Sota a Sheedy tee tes Gu 2a Meteo ed 58 ix 6 4 6 5 6 6 6 7 6 8 6 9 6 10 6 11 6 12 6 13 7 1 7 2 7 3 7 4 7 5 7 6 7 7 7 8 7 9 A 1 A 2 Fl F 2 F 3 F4 Flow diagram of the Infineon s invention for avoiding error propa gation in multi CPU systems 2 wey he ee Pes See eS 59 Two extra comparator units for comparison of the error and the negated error signals 2 Oo aad Gk See Sled See Se ER 60 Prior art for prevention of loss of lockstep 23 61 Pair of processors in lockstep mode connected to a lockstep logic 23 62 Architectural details for recovery from loss of lockstep 23 62 Flow diagram for recovery from loss of lockstep hot standby processor 23 inca aa se Oo ae ey aes he Sale 63 Schematic diagram of the Infineon s patent introd
133. tic behavior of the software hosted on the MCP has designed Not Applicable Not Applicable Not Applicable Not Applicable Not Applicable Not Applicable The applicant shall implement and test means to deactivate those features or mitigate their effects Not Applicable 111 112 Appendix G Requirements for Multi core Processors REQ44 The applicant shall inform the certification Not Applicable authority of the features and the means of deactivation or mitigation in the applicable AEH or software plans or subsequent documentation if the problems are discovered at a later stage Error Detection and Handling and Safety Nets REQ44 The applicant shall identify the various types of Soft and Hard errors i e a bit or a logic gate flip in the core errors or failures that may occur within the MCP or cause by Electrostatic Discharge ESD hit noise glitch the software hosted upon it cosmic radiation radioactive decay power supply variations ionizing radiation cross talking manufacturing inconsistencies or exposure to high current which may cause metal migration All other kinds of errors have to be handled as ona single core The applicant shall plan design implement and _ n case of soft errors the application can recover after a verify means including a safety net that is system reset while the duration of such an error is not more external to the MCP by which to detect and than 2 mill
134. ucing the state butter P27 arae eo Ae ee fie eis Be MB ee eho 64 Flow diagram where the save state is triggered on a task start 27 65 A computer system including four processors with the suggested architecture by Hewlett Packard 1 0 66 Flow graph of a potential scenario using the current invention 1 67 Entering Error forcing mode by pressing the pushbutton 71 CPU Lockstep Test Time Measurement 73 CPU Error Forcing Test Time Measurement 74 CPU Self Test Error Forcing Time Measurement 74 Temperature and Acceleration Values in HSK Monitor GUI 75 Lockstep Compare Run time Fault Injection 76 Dataflow of C amp M Device In a Fault Injection Operation 33 77 Dataflow of SDUT Device In a Fault Injection Operation 33 77 SafeTI CCM R4F Related Functions Time in usec 20 78 CCMSE Reseister 42 3 2 2 pis aioa doi Boe ea A ae 89 COCMKEN RResister A2 T ice otek oan svete ab Oa dees 90 Enable RTI GIO Drivers once we akong a eas Be ee aR we S 103 Configure RTI Compare 0 to generate 500 usec delay 104 Activation of RTI Compare 0 and GIO Int Ain VIM 104 Activation of ESM Low Interrupt in VIM Module 105 List of Abbrevations 1oo1D ADC ASIL BIST CAN CandM ECC LBIST PBIST STC CCMKEYR CCMSR CCS CEC CEH CEO CIS COTS DMA DPM One out of One with Diagnostics Analog to Digital Convert
135. uration Screenshots SREP KORY CARAS oeoaaas Compare 0 Peri Update Compare 0 Compare 0 5000000 i a Actual Period ms 10 000000000 500 000 Compare 1 Period 5 000 Update Compare 1 50000 Compare 1 50000 poner mt Sors Actual Period ms unter O Courter E Ho 000000000 5 000 Interrupt DMA Compare 2 Period 8 000 Update Compare 2 80000 Compare 2 80000 Counter Ope ues i Actual Period ms Comer 1 __ 10 000000000 8 000 Compare 3 Period 10 000 Update Compare 2 100000 Compare 2 100000 Comp 3 Source At Maen masua saa oe Qea fo VIM Channel 0 31 Configuration Interupt Assignment CHANMAPO CHANMAP31 0 ESM High o ig m C m tS ie 1 Reserved o o H O O pi 2 RTI Compare 0 2 E palia 3 RTI Compare 1 g g ma B a me eet ind 4 RTI Compare 2 a a ef es 5 RTI Compare 3 B oH A TEDT 6 RTI Overflow 0 6 A F cee 7 RTI Overfow 1 Poo A A oe ca 8 RTI Timebase 1 g 0 F AAS 9 GI0 tA g E ane Figure F 3 Activation of RTI Compare 0 and GIO Int A in VIM 105 aSdG saa eaealieleoaads TMDXS570LS31HDK N TI
136. urces list replace all http ubuntu com with http old releases ubuntu com in that file save and exit Run following command to update package list It should not fail or else something has been wrong sudo apt get update Install common tools and libraries Install common tools and libraries needed by build process with below command sudo apt get install sudo apt get install apt src p zip full gawk gzip perl autoconf m4 automake libtool libncurses5 dev gettext gperf dejagnu expect tel autogen guile 1 6 flex flip bison tofrodos E 1 Build GNU Tools on Ubuntu 8 10 101 texinfo g gcc multilib libgmp3 dev libmpfr deo debhelper texlive texlive extra utils Note that the package management software might complain that several packages cannot be installed properly while installing texlive and texliveextra utils It won t harm our building process please just ignore it now Some of those tools might be unnecessary but it won t hurt if installed Download and deploy prebuilt native tools In order to save effort to prepare the native build tools we provide prebuilt ones at website https launchpad net gcc arm embedded misc native build tools 20140701 The related source package and script are also provided Please download the tool and decompress it to a proper place it will be used in subsequent steps to build
137. uthority of USA is responsible for ensuring that all air travel is safe There is a commonly erroneous notion about what is certifiable in the avionics world The FAA does not specifically certify software or hardware but systems The way and the standard that a system will be certified varies The required certificate depends on the product and its usability e Technical Standard Order Authorization TSOA Is used to approve that a specific device i e GPWS Radio Weather Radar can be used on a different aircraft e Type Certificate TC Certifies systems for new aircraft e Supplemental Type Certificate STC Certifies changes of design not requir ing anew TC The real intention of the certification process and the DO documents is to help the industry to prove that each software or hardware product meets its intended functionality The terminology that is used in the field of avionics reveals the different levels of compliance for a system 45 1 Certified is an entire system in which each component may have different certification level 2 Certifiable is a component of a system that achieves its higher certification status before certifying it within the whole system 3 Qualified is used in terms of a tool Since it is not participating in the fly operations does not require to be certified 4 Compliant is a certified system from a different entity than FAA i e mili tary or non commercia
138. utomatically disabled In order to set the Lockstep mode on and the CCM R4F in operational state a CPU reset is required 42 5 4 8 Error Signaling Module ESM A brief overview of the Error Signaling Module is required in this chapter as it is strongly related to the CCM R4F module The ESM collects and categorize the errors that are derived by the hardware diagnostic tools regarding their severity Low priority interrupt high priority interrupt or an external pin action are poten tial responses to the hardware detected errors depending on their severity level The severity levels are divided into 3 groups supporting up to 128 error channels in total 42 Group No of Channels Severity Level Interrupt Generation ERROR pin Behavior Group 1 64 Low Severity Configurable Configurable Group 2 32 High Severity Predefined Predefined Group 3 32 High Severity No Interrupt Generation Predefined Table 5 1 Groups of errors in ESM The errors belonging to Group 1 indicating low severity have a configurable in terrupt response and ERROR pin behavior The Group 2 are considered as high severity errors and an interrupt is always generated as well as the output on the ERROR pin Finally the detection of the errors in Group 3 are high severity errors that have already generated a CPU abort response and therefore they do not gen erate an interrupt However they always generate an ERROR pin output 48 C
139. void main void x USER CODE BEGIN 3 Initialize ESM driver esmInit Enable the Interrupts _enable_interrupt_ Initialize the flag that indicates if the button is pressed buttonPressed BUTTON_LOW Initialize the flag that indicates a presence of an error uint32 uiError ERROR_ABSENCE x Initialize RTI driver x rtilnit Set high end timer GIO port hetPort pin direction to all output gioSetDirection hetPORT1 OxFFFFFFFF Enable RTI Compare 0 interrupt notification rtiEnableNotification rtiNOTIFICATION_COMPAREO x Enable GIO driver giolnit while 1 93 Enable Notification on the 7th pin of GPIO PortA that is assigned to the programmable pushbutton gioEnableNotification gioPORTA BUTTON_BIT_NO switch buttonPressed case 1 if uiError else Clears the ERROR pin esmClearErrPin Turns the white LED off gioSetBit hetPORT1 LED_BIT_NO LED_OFF Disables the button notification gioDisableNotification gioPORTA BUTTON_BIT_NO Resets the error indicator flag uiError ERROR_ABSENCE Turns the white LED on gioSetBit hetPORT1 LED_BIT_NO LED_ON Sets the CCM R4 Module in Error forcing Mode CCM_R4_Compare Disables the button notification gioDisableNotification gioPORTA BUTTON_BIT_NO Sets the error indicator flag uiError ERRO_PRESE
140. w Approach To Vehicle Network Design And Automotive Safety Available http www automotive eetimes com en autosar and iso26262 a new approach to vehicle network design and automotive safety html Accessed 21 06 2015 KranZberg DE Bernard Fuessl MooSburg Aich DE Rainer Troppmann Delayed Lock step Cpu Compare Pat US 2008 0244305 A1 2008 K Greb S Cozart A Longhurst An Introduction to Software Development for Functional Safety on TI Processors User s Manual Texas Instruments B Arends P Metz B Enser S Kriso C Temple Functional Safety in accordance with ISO 26262 User s Manual ZVEI German Electrical and Electronic Man ufacturers Association e V SafeTI System Design Packages for Functional Manual User s Manual Texas In struments 2014 SafetTI Hitex Safety Kit HSK User s Manual v1 2 hitex Safety Integrity Level Quick Guide Sorinc Bibliography 115 35 36 37 38 39 40 41 42 43 44 45 46 47 Safety Manual for TMS570LS31x and TMS570LS21x Hercules ARM Based Safety Critical Microcontrollers User s Guide Texas Instruments November 2014 Revised March 2015 W Rothermel G Tingting Yu Srisa an Testing Inter layer and Inter task In teractions in RTES Applications In Software Engineering Conference APSEC 2010 17th Asia Pacific Neil Storey Safety Critical Computer Systems Addison Wesle Longman 1996 TM
141. ware based hypervisor in their MCP The applicant shall describe for each part of the Not Applicable functionality of the hypervisor whether they intend to activate it deactivate it or mitigate any undesirable behavior it may cause If the applicant intends to use a software Not Applicable hypervisor the applicant shall state in the software plans how they intend to show compliance of the software hypervisor with the certification authority s applicable guidance and has successfully conducted those activities that they planned The applicant shall describe in their plans any Not Applicable hardware features used by the software ie hypervisor The applicant shall verify any hardware features Not Applicable used by the software hypervisor If the applicant intends to use an MCP hardware Not Applicable based hypervisor the applicant shall describe in their AEH plans or other deliverable documents how they intend to verify the activated functionality of the MCP hardware based hypervisor The applicant shall verify any hardware features Not Applicable used by the hardware hypervisor The applicant shall identify any parts ofthe MCP Not Applicable hardware based hypervisor that are deactivated The applicant shall verify that any parts of the Not Applicable MCP hardware based hypervisor that are deactivated have been deactivated MCP Interference Channel 109 REQ18 The applicant shall conduc
142. with a known pattern and then a different pattern through the memory is marched This algorithm detects the following types of errors e Address decoder faults e Stuck At faults e Coupled faults e State coupling faults e Parametric faults e Write recovery faults e Read Write logic faults 2 Map Column Line sensitivities of the memory array are detected by the Map Column algo rithm The first row is filled with all 0s the second with all 1s and repeated through the whole array in this way Afterwards the values of each column is read on consecutive cycles and the pattern is inverted to proceed with the complementary check This algorithm is targeting to the detection of the following errors e Leakage due to a low resist path in a bit e An Open in the bit cell e Leakage on a BIT or BITN line e Miss balance in the sense amp e Leakage in the sense 34 Chapter 4 Safety Features of Hercules Family e High resist in the sense amp e Failure of the pre charge circuits after read operations 3 Pre Charge A similar algorithm to Map Column is employed to insure the correct pre charge capability of the SRAM array The same pattern with Os and 1s is used in this algorithm but a write operation is performed between two reads to force the pre charge circuits in the array to the worst case conditions A failure occurs when an increasing frequency of the system approaches the minimum access time of the SRAM array at this boundary
143. ys contin uously these values as shown in Figure 7 5 HSK Monitor amp Temperature Acceleration X Y z Figure 7 5 Temperature and Acceleration Values in HSK Monitor GUI This part of the experiment is targeting to demonstrate the basic feature of the HSK that is the run time injection of errors In Validating amp Profiling tab when Lock Step Compare is selected and the user presses the INJECT button an er ror is injected from the C amp M MCU to the SDUT MCU see Section 3 1 2 The error detection time of the injected error is displayed in the right bottom corner of the HSK Monitor GUI and in the measurement that is depicted at Figure 7 6 is 17 usec 76 Chapter 7 Results Fault parameter 0x00000000 Figure 7 6 Lockstep Compare Run time Fault Injection 7 1 3 Conclusions The first experiment is a demonstration of setting the CCM R4F module in Error Forcing mode The fault injection forcing is the only mechanism to make an error occur purposefully and reliably In order to conduct more sophisticated ex periments it would require to create a hard fault in the silicon by disassembling the MCU and create a hard fault in To create a soft error it would be possible by sub jecting the MCU to a concentrated amount of radiation alpha and beta particles without ensuring that it would create only a core compare error without affecting other element
144. ystem or a virtual machine The command lines provided in this document are all using user id build as an example so please create a new user called build in the system Otherwise you have to replace user id build with your own one E 1 2 Tune environment and install required software Change bin sh to bash Some shell scripts in gcc and other packages are incompatible with the dash shell which is the default bin sh for Ubuntu 8 10 You must make bin sh a symbolic link to one of the supported shells saying bash Here on Ubuntu 8 10 system this can be done by running following command firstly sudo dpkg reconfigure plow dash Then choose No in the Configuring dash popup dialog and press enter You can run following command and check that bin sh points to bash Is l bin sh 99 100 Appendix E How to Build the GNU ARM Toolchain on Ubuntu ies bin sh gt bash Change software sources to Main server On Ubuntu 8 10 system click System gt Administration gt Software Sources to open Software Sources dialog choose Main server in Download from list box then click close You will be prompted by a window saying The information about available soft ware is out of date please click Reload And then there will be a warning mes sage box popped up which can be just ignored by clicking Close Edit the file using command line sudo vi etc apt so

Download Pdf Manuals

image

Related Search

Related Contents

PROGRAMA CURSOS - Ingenieros de Caminos, Canales y Puertos  Mode d`emploi Gebrauchsanweisung Istruzioni per l  Manual de instalación y funcionamiento  Bria iPhone Edition User Guide    Sistemas SYNCHRON CX CA Ficha de Informação Química Cálcio  CONCEPTRONIC CHD2WLANU  ソフィー ウォッシャブルタイプ チェーン式 取扱説明書    自己表現塾 - 岐阜県パーソナル・サポート・センター  

Copyright © All rights reserved.
Failed to retrieve file