    @         1 Ensuring Proper Use of Interlocked Memory Instructions  D               The Alpha Architecture Reference Manual, Third EditionH               (AARM) describes strict rules for using interlocked memoryG               instructions. The forthcoming Alpha 21264 (EV6) processor E               and all future Alpha processors are more stringent than F               their predecessors in their requirement that these rulesH               be followed. As a result, code that has worked in the pastI               despite noncompliance may now fail when executed on systems E               featuring the new 21264 processor. Occurrences of these G               noncompliant code sequences are believed to be rare. Note H               that the 21264 processor is not supported prior to OpenVMS"               Alpha Version 7.1-2.  A               The result can be a loss of synchronization between A               processors when interprocessor locks are used or an F               infinite loop when an interlocked sequence always fails.B               This has occurred in some code sequences in programsB               compiled on old versions of the BLISS compiler, some@               versions of the MACRO-32 compiler and the MACRO-64<               assembler, and in some DEC C and C++ programs.  G               The affected code sequences use LDx_L/STx_C instructions, D               either directly in assembly language source or in codeB               generated by a compiler. Applications most likely toE               use interlocked instructions are complex, multithreaded D               applications or device drivers using highly optimized,B               hand-crafted locking and synchronization techniques.           2 Required Action   E               OpenVMS recommends that code that will run on the 21264 B               processor be checked for these sequences. 
Particular attention should be paid to any code that does
interprocess locking, multithreading, or interprocessor
communication.

The SRM_CHECK tool (named after the System Reference Manual, which
defines the Alpha architecture) has been developed to analyze Alpha
executables for noncompliant code sequences. The tool will detect
sequences that may fail, report any errors, and display the machine
code of the failing sequence.

2.1 Using the Code Analysis Tool

The SRM_CHECK tool is in the following location on both of the
OpenVMS Freeware V4.0 CD-ROM volumes:

    [000TOOLS]SRM_CHECK.EXE

To run the SRM_CHECK tool, define it as a foreign image (or use the
DCL$PATH mechanism) and invoke it with the name of the image to
check. If a problem is found, the machine code will be displayed and
some image information will be printed. The following example
illustrates how to use the tool to analyze an image called
myimage.exe:

    $ define DCL$PATH []
    $ srm_check myimage.exe

The tool supports wildcard searches. Use the following command line
to initiate a wildcard search:

    $ srm_check [*...]* -log

Use the -log qualifier to generate a list of which images have been
checked.
The -output qualifier can be used to write the output to a data
file, as in the following example that writes to a file named
check.dat:

    $ srm_check 'file' -output check.dat

The output from the tool can be used to find the module that
generated the sequence by looking in the image's MAP file. The
addresses shown correspond directly to the addresses that can be
found in the MAP file.

The following example illustrates the output from using the analysis
tool on an image named system_synchronization.exe:

    ** Potential Alpha Architecture Violation(s) found in file...
    ** Found an unexpected ldq at 00003618
    0000360C   AD970130     ldq_l          R12, 0x130(R23)
    00003610   4596000A     and            R12, R22, R10
    00003614   F5400006     bne            R10, 00003630
    00003618   A54B0000     ldq            R10, (R11)
    Image Name:    SYSTEM_SYNCHRONIZATION
    Image Ident:   X-3
    Link Time:      5-NOV-1998 22:55:58.10
    Build Ident:   X6P7-SSB-0000
    Header Size:   584
    Image Section: 0, vbn: 3, va: 0x0, flags: RESIDENT EXE (0x880)

The MAP file for system_synchronization.exe contains the following:

    EXEC$NONPAGED_CODE       00000000 0000B317 0000B318 (      45848.) 2 **  5
    SMPROUT                  00000000 000047BB 000047BC (      18364.) 2 **  5
    SMPINITIAL               000047C0 000061E7 00001A28 (       6696.) 2 **  5

The address 360C is in the SMPROUT module (which contains the
addresses from 0-47BB).
By looking at the machine code output from the module, you can
locate the code and use the listing line number to identify the
corresponding source code. If SMPROUT had a nonzero base, it would
be necessary to subtract the base from the address (360C in this
case) to find the relative address in the listing file.

Note that the tool reports potential violations in its output.
Although SRM_CHECK can normally identify a code section in an image
by the section's attributes, it is possible for OpenVMS images to
contain data sections with those same attributes. As a result,
SRM_CHECK may scan data as if it were code, and occasionally, a
block of data may look like a noncompliant code sequence. This has
also been found to be quite rare. This circumstance can be detected
in the same way the noncompliant source code is found, by examining
the MAP and listing files.

3 Characteristics of Noncompliant Code

The areas of noncompliance detected by the SRM_CHECK tool can be
grouped into the following four categories. Most of these can be
fixed by recompiling with new compilers. In rare cases, the source
code may need to be modified. See Section 5 for information about
compiler versions.

o  Some versions of OpenVMS compilers introduce noncompliant code
   sequences during an optimization called "loop rotation."
   This problem can only be triggered in C or C++ programs that use
   LDx_L/STx_C instructions in assembly language code that is
   embedded in the C/C++ source using the ASM function, or in
   assembly language written in MACRO-32 or MACRO-64. In some cases,
   a branch was introduced between the LDx_L and STx_C instructions.

   This can be addressed by recompiling.

o  Some code compiled with very old BLISS and MACRO-32 compilers may
   contain noncompliant sequences. Early versions of these compilers
   contained a code scheduling bug where a load was incorrectly
   scheduled after a load_locked.

   This can be addressed by recompiling.

o  The MACRO-32 compiler may generate a noncompliant code sequence
   for a BBSSI or BBCCI instruction in rare cases where there are
   too few free registers.

   This can be addressed by recompiling.

o  Incorrectly coded MACRO-64 or MACRO-32 and incorrectly coded
   assembly language embedded in C or C++ source using the ASM
   function.

   This requires source code changes. The new MACRO-32 compiler will
   flag noncompliant code at compile time.

If the SRM_CHECK tool finds a violation in an image, the image
should be recompiled with the appropriate compiler (see Section 5).
After recompiling, the image should be analyzed again. If violations
remain after recompiling, the source code must be examined to
determine why the code scheduling violation exists.
Modifications should then be made to the source code.

4 Coding Requirements

The Alpha Architecture Reference Manual describes how an atomic
update of data between processors must be formed. The Third Edition,
in particular, has expanded greatly on this topic. In this edition,
Section 5.5, "Data Sharing", and Section 4.2.4, which describes the
LDx_L instructions, detail the conventions of the interlocked memory
sequence.

The following two requirements are the source of all known
noncompliant code:

o  There cannot be a memory operation (load or store) between the
   LDx_L (load locked) and STx_C (store conditional) instructions in
   an interlocked sequence.

o  There cannot be a branch taken between a LDx_L and a STx_C
   instruction. Rather, execution must "fall through" from the LDx_L
   to the STx_C without taking a branch.

   Any branch whose target is between a LDx_L and matching STx_C
   creates a noncompliant sequence.
   For example, any branch to "label" in the following would result
   in noncompliant code, regardless of whether the branch
   instruction itself was within or outside of the sequence:

               LDx_L  Rx, n(Ry)
               ...
        label: ...
               STx_C  Rx, n(Ry)

Therefore, the SRM_CHECK tool looks for the following:

o  Any memory operation (LDx/STx) between a LDx_L and a STx_C.

o  Any branch that has a destination between a LDx_L and STx_C.

o  STx_C instructions that do not have a preceding LDx_L
   instruction.

   This typically indicates that a backward branch is taken from a
   LDx_L to the STx_C. Note that hardware device drivers that do
   device mailbox writes are an exception, and use the STx_C to
   write the mailbox. This is only found on early Alpha systems, and
   not on PCI-based systems.

o  Excessive instructions between a LDx_L and STx_C.

   The AARM recommends that no more than 40 instructions appear
   between a LDx_L and STx_C. In theory, more than 40 instructions
   can cause hardware interrupts to keep the sequence from
   completing.
   There are no known occurrences of this.

To illustrate, the following are examples of code flagged by
SRM_CHECK.

    ** Found an unexpected ldq at 0008291C
    00082914   AC300000     ldq_l          R1, (R16)
    00082918   2284FFEC     lda            R20, 0xFFEC(R4)
    0008291C   A6A20038     ldq            R21, 0x38(R2)

In the above example, a LDQ instruction was found after a LDQ_L
before the matching STQ_C. The LDQ must be moved out of the
sequence, either by recompiling or by source code changes. (See
Section 3.)

    ** Backward branch from 000405B0 to a STx_C sequence at 0004059C
    00040598   C3E00003     br             R31, 000405A8
    0004059C   47F20400     bis            R31, R18, R0
    000405A0   B8100000     stl_c          R0, (R16)
    000405A4   F4000003     bne            R0, 000405B4
    000405A8   A8300000     ldl_l          R1, (R16)
    000405AC   40310DA0     cmple          R1, R17, R0
    000405B0   F41FFFFA     bne            R0, 0004059C

In the above example, a branch was discovered between the LDL_L and
STL_C.
In this case, there is no "fall through" path between the LDx_L and
STx_C, which the architecture requires.

________________________ Note ________________________

This branch backward from the LDx_L to the STx_C is
characteristic of the noncompliant code introduced by
the "loop rotation" optimization.

______________________________________________________

The following MACRO-32 source code has a fall-through path but is
still noncompliant, because the lock sequence contains both a
potential branch and a memory reference.

    getlck: evax_ldql  r0, lockdata(r8)  ; get the lock data
            movl       index, r2         ; and the current index
            tstl       r0                ; If the lock is zero,
            beql       is_clear          ; skip ahead to store
            movl       r3, r2            ; Else, set special index
    is_clear:
            incl       r0                ; increment lock count
            evax_stqc  r0, lockdata(r8)  ; and store it
            tstl       r0                ; did store succeed?
            beql       getlck            ; retry if not

To correct this code, the memory access to read the value of INDEX
must first be moved outside the LDQ_L/STQ_C sequence. Next, the
branch between the LDQ_L and STQ_C, to the label IS_CLEAR, must be
eliminated. In this case, it could be done using a CMOVEQ
instruction. The CMOVxx instructions are frequently useful for
eliminating branches around simple value moves.
The following example shows the corrected code.

            movl       index, r2         ; Get the current index
    getlck: evax_ldql  r0, lockdata(r8)  ; and then the lock data
            evax_cmoveq r0, r3, r2       ; If zero, use special index
            incl       r0                ; increment lock count
            evax_stqc  r0, lockdata(r8)  ; and store it
            tstl       r0                ; did write succeed?
            beql       getlck            ; retry if not

5 Compiler Versions

This section contains information about versions of compilers that
may generate noncompliant code sequences and the recommended
versions to be used when recompiling.

Table 1 contains information for OpenVMS compilers.

Table 1 OpenVMS Compilers
__________________________________________________________

Old Version           Recommended Minimum Version
__________________________________________________________

BLISS V1.1            BLISS V1.3

DEC C V5.x            DEC C V6.0

DEC C++ V5.x          DIGITAL C++ V6.0

MACRO-32 V3.0         V3.1 for OpenVMS Version 7.1-2
                      V4.1 for OpenVMS Version 7.2

MACRO-64 V1.2         See below
__________________________________________________________

Current versions of the MACRO-64 Assembler may still encounter the
loop rotation issue.
However, MACRO-64 does not perform code optimization by default, and
this problem can only arise when optimization is enabled. If
SRM_CHECK indicates a noncompliant sequence in MACRO-64 code, it
should first be recompiled without optimization. If the sequence is
still flagged when retested, the source code itself contains a
noncompliant sequence and must be corrected.
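
As an informal illustration of the checks listed in Section 4, the
following Python sketch applies them to a decoded instruction list.
It is not the SRM_CHECK implementation. The tuple layout (address,
mnemonic, branch target), the function name check_sequence, and the
set of mnemonics treated as memory operations are all assumptions
made for this example.

```python
# Illustrative sketch only: NOT how SRM_CHECK is implemented.
# Each instruction is modelled as (address, mnemonic, branch_target);
# branch_target is None for non-branches. This layout is an assumption.

LOAD_LOCKED = {"ldl_l", "ldq_l"}
STORE_COND = {"stl_c", "stq_c"}
# Assumed subset of ordinary Alpha loads and stores, for illustration.
MEMORY_OPS = {"ldl", "ldq", "stl", "stq", "ldbu", "ldwu", "stb", "stw"}
MAX_SEQUENCE_LENGTH = 40  # AARM recommendation quoted in Section 4


def check_sequence(instructions):
    """Return (address, message) findings for the Section 4 checks."""
    findings = []
    locked_ranges = []  # (LDx_L address, STx_C address) pairs
    open_lock = None    # address of the LDx_L still awaiting its STx_C
    count = 0           # instructions seen since that LDx_L

    for addr, mnemonic, _target in instructions:
        if mnemonic in LOAD_LOCKED:
            open_lock, count = addr, 0
        elif mnemonic in STORE_COND:
            if open_lock is None:
                findings.append((addr, "STx_C without preceding LDx_L"))
            else:
                if count > MAX_SEQUENCE_LENGTH:
                    findings.append((addr, "more than 40 instructions in sequence"))
                locked_ranges.append((open_lock, addr))
                open_lock = None
        elif open_lock is not None:
            count += 1
            if mnemonic in MEMORY_OPS:
                findings.append((addr, f"unexpected {mnemonic} in sequence"))

    # A branch is flagged when its target lies after an LDx_L and at or
    # before the matching STx_C, wherever the branch itself is located.
    for addr, _mnemonic, target in instructions:
        if target is None:
            continue
        for start, end in locked_ranges:
            if start < target <= end:
                findings.append((addr, f"branch into sequence at {target:08X}"))
    return findings
```

Applied to the first flagged example in Section 4 (ldq_l, lda, ldq),
the sketch reports the ldq at 0008291C. Applied to the second, it
reports the stl_c at 000405A0 as having no preceding LDx_L, which is
the symptom Section 4 associates with a backward branch from the
LDx_L to the STx_C.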