================================================================================
DISCLAIMER:

This document makes reference to products developed by Intel. This section
highlights restrictions on how these products may be used, and what information
may be disclosed to others. Contact your Intel field representative for more
information.

Intel is making no claims of usability, efficacy or warranty. The end-user
license agreement contained herein completely defines the license and use of
this software except in the cases of the GPL components. This document contains
information on products in the design phase of development. The information here
is subject to change without notice. Do not finalize a design with this
information.

The code contained in these modules may be specific to the Intel(R) Xeon Phi(TM)
processor, and is not backward compatible with other Intel(R) products.
Additionally, Intel makes no commitments for support of the code or instruction
set in future products.

*Other names and brands may be claimed as the property of others.

Copyright(c) 2017, Intel Corporation. All rights reserved.
================================================================================
Intel(R) Xeon Phi(TM) processor software readme.

Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S.
and other countries.

Notes:
  o Consult the Intel(R) Xeon Phi(TM) processor software User's Guide
    (xpps_users_guide.pdf) for detailed configuration information and
    procedures.
  o In this document, lines preceded by:
    - "[host]$" denote a command entered on the host with user privileges.
    - "[host]#" denote a command entered on the host with root privileges.

================================================================================
  Table of Contents
================================================================================

  1.  Introduction
  2.  Hardware and software requirements
  3.  Installation
  4.  Virtualization

================================================================================
  1.  Introduction
================================================================================

Intel(R) Xeon Phi(TM) processor software is a set of software and utilities
that enable functionalities of the Intel(R) Xeon Phi(TM) processor. This
document provides quick references to hardware and software requirements and
installation instructions.

================================================================================
  2.  Hardware and software requirements
================================================================================

This document pertains to systems containing at least one Intel(R) Xeon Phi(TM)
processor.

Note: Some packages that will be installed require access to the standard
      distribution packages and repositories. If you disabled any of the
      standard repositories, re-enable them to prevent dependency failures.
      For more information, refer to your operating system vendor's
      documentation.

  2.1  Supported operating systems
 ______________________________________________________________________________
|           Supported Host OS Version       |         Kernel Version           |
|-------------------------------------------|----------------------------------|
| CentOS* Linux* 7 (1708)                   | 3.10.0-693.5.2.el7.x86_64        |
|-------------------------------------------|----------------------------------|
| Red Hat* Enterprise Linux* Server 7.4     | 3.10.0-693.5.2.el7.x86_64        |
|-------------------------------------------|----------------------------------|
| SUSE* Linux* Enterprise Server 12 SP3     | 4.4.73-5-default                 |
|-------------------------------------------|----------------------------------|
| Ubuntu* 17.10                             | 4.13.0-16-generic                |
|___________________________________________|__________________________________|

Note: If your host runs Red Hat* Enterprise Linux* Server 7.4 or CentOS* Linux*
      7 (1708), ensure your OS kernel is updated to version
      3.10.0-693.2.1.el7.x86_64 or later, which contains critical errata:
      https://access.redhat.com/errata/RHBA-2017:2581

       To obtain the host's running kernel version, execute:

        [host]$ uname -r

  2.2  Root access

       The installation process requires root access. Verify that you have such
       privileges to the machines you will configure.

       Use sudo to acquire root privileges with care, because it may cause
       subtle and undesirable side effects. Sudo might not retain the non-root
       environment of the caller; this could, for example, result in a
       different PATH variable than expected, and thus in execution of the
       wrong code.

       When su is used to become root, the non-root environment is mostly
       retained. To retain HOME, SHELL, USER, and LOGNAME, use su with the -m
       switch. See the su man page for details.

  2.3  Distribution packages replacement

       Packages included in Intel(R) Xeon Phi(TM) processor software enable core
       functionalities of the processor. If packages listed below are already
       installed on your host OS but do not provide sufficient support for the
       processor, they will be replaced.

         o cpuid (delivered for SLES*)
         o hwloc (delivered for RHEL* and CentOS*)
         o memkind
         o micperf
         o sysdiag
         o zonesort

================================================================================
  3.  Installation
================================================================================

  3.1  Get the Intel(R) Xeon Phi(TM) processor software distribution package

       The latest Intel(R) Xeon Phi(TM) processor software distribution can
       be downloaded from https://premiersupportscft.intel.com/. Download the
       appropriate package for your operating system (RHEL* and CentOS* share a
       release package).

       Extract the downloaded package:

       [host]$ tar xvf xpps-<release>-<os>.tar

       Change to the directory containing the extracted packages:

       [host]$ cd xpps-<release>/<os_version>/packages/x86_64/core

  3.2  Installation process
--------------------------------------------------------------------------------
       RHEL*:
--------------------------------------------------------------------------------
       o Installing memkind:

         [host]# yum install memkind-<version>-<release>.x86_64.rpm \
         memkind-devel-<version>-<release>.x86_64.rpm

       o Installing sysdiag:

         [host]# yum install sysdiag-<version>-<release>.x86_64.rpm

       o Installing micperf:

         [host]# yum install micperf-<version>-<release>.x86_64.rpm

       o Installing hwloc:

Note: Enable rhel-7-server-optional-rpms repository to install the hwloc-devel
      package. For more information consult your OS documentation.

         [host]# yum install hwloc-<version>-<release>.x86_64.rpm \
         hwloc-devel-<version>-<release>.x86_64.rpm \
         hwloc-gui-<version>-<release>.x86_64.rpm \
         hwloc-libs-<version>-<release>.x86_64.rpm \
         hwloc-sbin-<version>-<release>.x86_64.rpm

       o Installing zonesort:

         [host]# yum install kmod-zonesort-<version>-<release>.x86_64.rpm
--------------------------------------------------------------------------------
       CentOS*:
--------------------------------------------------------------------------------
       o Installing memkind:

         [host]# yum install memkind-<version>-<release>.x86_64.rpm \
         memkind-devel-<version>-<release>.x86_64.rpm

       o Installing sysdiag:

         [host]# yum install sysdiag-<version>-<release>.x86_64.rpm

       o Installing micperf:

         [host]# yum install micperf-<version>-<release>.x86_64.rpm

       o Installing hwloc:

         [host]# yum install hwloc-<version>-<release>.x86_64.rpm \
         hwloc-devel-<version>-<release>.x86_64.rpm \
         hwloc-gui-<version>-<release>.x86_64.rpm \
         hwloc-libs-<version>-<release>.x86_64.rpm \
         hwloc-sbin-<version>-<release>.x86_64.rpm

       o Installing zonesort:

         [host]# yum install kmod-zonesort-<version>-<release>.x86_64.rpm
--------------------------------------------------------------------------------
       SLES*:
--------------------------------------------------------------------------------
       o Installing memkind:

         [host]# zypper install memkind-<version>-<release>.x86_64.rpm \
         memkind-devel-<version>-<release>.x86_64.rpm

       o Installing sysdiag:

         [host]# zypper install sysdiag-<version>-<release>.x86_64.rpm

       o Installing micperf:

         [host]# zypper install micperf-<version>-<release>.x86_64.rpm

       o Installing cpuid:

         [host]# zypper install cpuid-<version>-<release>.x86_64.rpm

       o Installing zonesort:

          [host]# zypper install zonesort-kmp-default-<version>_\
          <kernel_version>-<release>.x86_64.rpm
--------------------------------------------------------------------------------
       Ubuntu*:
--------------------------------------------------------------------------------
       o Installing memkind:

         [host]# apt install memkind_<version>-<release>_amd64.deb

       o Installing sysdiag:

         [host]# apt install sysdiag_<version>-<release>_amd64.deb

       o Installing micperf:

         [host]# apt install micperf_<version>-<release>_amd64.deb

       o Installing zonesort:

         [host]# apt install zonesort_<version>-<release>_amd64.deb

  3.3  Software upgrade process

       Intel(R) Xeon Phi(TM) processor software supports automated updates.
       Use yum, zypper or apt to perform an update as described above.

  3.4  Software uninstallation process

       Execute the command below to list installed Intel(R) Xeon Phi(TM)
       processor software packages.

       RHEL*/CentOS*/SLES*:  [host]$ rpm -qa | grep +xpps

       Ubuntu*:  [host]$ apt list --installed | grep +xpps

       Uninstall the listed packages.

       RHEL*:  [host]# yum remove <package-name>

       SLES*:  [host]# zypper rm <package-name>

       CentOS*:  [host]# yum remove <package-name>

       Ubuntu*:  [host]# apt remove <package-name>

================================================================================
  4. Virtualization
================================================================================

QEMU version 2.9 or newer is required to fully support virtualization on the
Intel(R) Xeon Phi(TM) processor. Older versions may work; however, they support
at most 255 virtual CPUs.
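As a quick sanity check, the installed QEMU version can be compared against the
2.9 threshold. The snippet below is a sketch; the binary path varies by
distribution (see 4.1), and the helper name `qemu_ver_ok` is illustrative, not
part of the release.

```shell
# Sketch: compare a QEMU "major.minor[.patch]" version string against
# the 2.9 threshold required for more than 255 virtual CPUs.
qemu_ver_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 9 ]; }; then
        echo "ok"
    else
        echo "too old: at most 255 virtual CPUs will be supported"
    fi
}

# On a live host, feed it the version reported by the binary, e.g.:
#   qemu_ver_ok "$(/usr/libexec/qemu-kvm --version | \
#                  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)"
qemu_ver_ok 2.9.0    # -> ok
```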

  4.1 QEMU installation remarks

      Please note that some operating systems keep legacy names for the QEMU
      binaries for backward compatibility. On such systems, if QEMU must be
      invoked manually, use /usr/libexec/qemu-kvm instead of
      /usr/bin/qemu-system-x86_64. All functionality remains unchanged.

      The convention mentioned above can be disregarded if QEMU is used through
      the libvirt API. If your installation of QEMU was upgraded to version 2.9
      or later, you must restart the libvirt daemon with the following command:

          [host]# systemctl restart libvirtd

        RHEL*:

        QEMU version 2.9 is provided by Red Hat* Virtualization packages.
        Installing QEMU from standard repositories may result in installing an
        older version which lacks pass-through support for all available virtual
        CPUs.

        SLES*:

        SLES* provides all components required for virtualization. Install QEMU
        using the command below.

            [host]# zypper install qemu qemu-kvm libvirt

        CentOS*:

        Intel(R) Xeon Phi(TM) processor software provides QEMU in its release
        package. Extract the xpps-virt-<version>-<os>.tar package and install
        the software.

  4.2 Configuration

      To fully support more than 255 virtual CPUs, configure QEMU according to
      the instructions below.

        o Set the "-machine" QEMU option to "q35" and "kernel_irqchip" to
          "split".
        o Enable the Intel IOMMU with the -device intel-iommu,intremap=on,eim=on
          option.
        o Ensure the Intel IOMMU driver is present on the host OS.
        o Enable virtualization in the BIOS. You can check whether it is active
          with the "lsmod" command, which should list the "kvm_intel" module.
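The last check above can be scripted. This is a sketch: the helper
`has_kvm_intel` is illustrative and inspects an lsmod-style listing; on a live
host you would run it against the real "$(lsmod)" output.

```shell
# Sketch: check an lsmod-style module listing for kvm_intel, whose
# presence indicates that virtualization is enabled and usable.
has_kvm_intel() {
    printf '%s\n' "$1" | grep -q '^kvm_intel[[:space:]]' && \
        echo "yes" || echo "no"
}

# On a live host: has_kvm_intel "$(lsmod)"
has_kvm_intel "kvm_intel  245760  0"    # -> yes
```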

      To get best VM performance, consider using the following optimizations:

       o In BIOS, change the cluster mode to "quadrant" and memory mode to
         "flat".

       o Boot the host kernel with options that enable adaptive-ticks on
         specific CPUs. Additionally, to achieve best performance, reserve
         some CPUs for sole use of the host OS. We recommend leaving two cores
         (eight threads) for the host OS while isolating and setting
         adaptive-ticks on the remaining cores.

          Choose cores for adaptive-tick operation by specifying the cores'
          thread numbers. Thread numbers within each core are offset by the
          number of physical cores; e.g., on a 68-core processor the first
          physical core contains threads 0, 68, 136 and 204, the second core
          contains threads 1, 69, 137 and 205, and so on.

       o Boot the guest kernel with options that enable adaptive-ticks
         on specific CPUs. Since we would like to use all virtual CPUs
         assigned to the machine, we enable adaptive-ticks on each core, except
         the boot CPU.
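The thread-numbering scheme described above can be sketched as a small helper;
`core_threads` and its invocations are illustrative, not part of the release.

```shell
# Sketch: list the four thread (logical CPU) numbers belonging to one
# physical core, given the total number of physical cores.
core_threads() {
    core=$1
    ncores=$2
    echo "$core $((core + ncores)) $((core + 2 * ncores)) $((core + 3 * ncores))"
}

core_threads 0 68    # first core:  0 68 136 204
core_threads 1 68    # second core: 1 69 137 205
```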

  4.2.1 Enabling adaptive-ticks optimization

      The optimization details may vary depending on the number of cores
      available in your hardware. The examples below are intended for a system
      with 68 physical cores (272 threads).

       o Host and guest kernel must be compiled with CONFIG_NO_HZ_FULL=y
         configuration option. To check this, you can execute the
         following command:

          [host]$ grep CONFIG_NO_HZ_FULL "/boot/config-$(uname -r)"

          If this option is not set, or is set to "n", the kernel must be
          rebuilt with CONFIG_NO_HZ_FULL=y.

        o To enable adaptive-ticks on the host, add the arguments below to the
          host's kernel command line. As mentioned in section 4.2, this enables
          adaptive-tick operation on cores 2-67, leaving cores 0-1 for the sole
          use of the host OS.

            nohz_full=2-67,70-135,138-203,206-271 \
            isolcpus=2-67,70-135,138-203,206-271

        o To enable adaptive-ticks on the guest OS, add the arguments below to
          the guest's kernel command line:

            isolcpus=1-263 nohz_full=1-263
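How these arguments are made persistent depends on the bootloader. On
grub2-based systems (RHEL*/CentOS*/SLES*), a sketch assuming the default
configuration paths (Ubuntu* uses update-grub instead):

```shell
# Sketch: append the adaptive-tick arguments to the kernel command line
# in /etc/default/grub (host example from 4.2.1), then regenerate the
# grub configuration and reboot. Paths are the grub2 defaults and may
# differ on your system.
#
#   GRUB_CMDLINE_LINUX="... nohz_full=2-67,70-135,138-203,206-271 \
#                           isolcpus=2-67,70-135,138-203,206-271"
#
# [host]# grub2-mkconfig -o /boot/grub2/grub.cfg
# [host]# reboot
```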

  4.2.2 Configure VM with libvirt xml

        The examples below are suitable for a system with 68 physical cores.
        They assume that the adaptive-tick and BIOS settings described above
        have been applied.

       o To set the virtual machine to q35 model add the lines below to the
         "<os>" section in the XML configuration file:

         <os>
           <type machine='q35' arch='x86_64'>hvm</type>
           (...)
         </os>

        o To enable the Intel IOMMU, add the lines below to the "<devices>"
          section in the XML configuration file.

         <devices>
           <iommu model='intel'>
             <driver intremap='on' eim='on'/>
           </iommu>
           (...)
         </devices>

        o Currently, libvirt (version 3.2.0) does not support the
          "kernel_irqchip=split" machine option in its XML schema. Work around
          this by passing the option directly to QEMU: add the lines below to
          the "<domain>" section in the XML configuration file. Note that
          "<qemu:commandline>" requires the QEMU XML namespace
          (xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0') to be
          declared on the "<domain>" element.

         <qemu:commandline>
           <qemu:arg value='-machine'/>
           <qemu:arg value='kernel_irqchip=split'/>
         </qemu:commandline>

        o To achieve the best performance, the virtual CPU topology should
          match the host's, but with two cores fewer (see 4.2.1).

          The virtual socket and thread counts should match your processor
          (one socket and four threads per core).

         <!-- mimic host CPU model and capabilities -->
         <cpu mode='host-passthrough'>
           <!-- define desired cpu topology -->
           <topology sockets="1" cores="66" threads="4" />
           (...)
         </cpu>

       o Libvirt has an internal limitation that prohibits creation of NUMA
         nodes with no vCPUs assigned. Work around this by creating four
         additional vCPUs and assigning them to the MCDRAM NUMA node. To disable
         unwanted vCPUs execute the following commands on the guest OS:

            echo "0" > /sys/devices/system/cpu/cpu260/online
            echo "0" > /sys/devices/system/cpu/cpu261/online
            echo "0" > /sys/devices/system/cpu/cpu262/online
            echo "0" > /sys/devices/system/cpu/cpu263/online

         If additional vCPUs are not disabled, and if a thread is run on one of
         them, by default it will use MCDRAM for memory allocations. This can
         negatively impact performance of processes using MCDRAM.

         Virtual NUMA nodes are defined in the XML configuration file using the
         commands below:

         <cpu mode='host-passthrough'>
           (...)
           <numa>
              <!-- assign vCPUs 0-255 to NUMA node 0 by default -->
             <cell id='0' cpus='0-255' memory='40' unit='GiB'/>
             <!-- libvirt requires nonempty cpus argument; if amount of vCPUs
                  on NUMA node is not equal to number of threads per core, CPU
                  topology is incorrectly recalculated -->
             <cell id='1' cpus='256-263' memory='16' unit='GiB'/>
           </numa>
         </cpu>

          Next, the virtual NUMA nodes are mapped to the actual NUMA nodes
          exposed by the processor. For a processor configured in quadrant
          cluster mode with memory in flat mode, there are two NUMA nodes: the
          first with DDR and the second with MCDRAM memory:

         <numatune>
           <!-- pin virtual NUMA nodes to physical NUMA nodes -->
           <memnode cellid='0' mode='strict' nodeset='0'/>
           <memnode cellid='1' mode='strict' nodeset='1'/>
         </numatune>

        o Pinning virtual CPUs to physical threads is done in the XML
          configuration file using the commands below. Note that the numbering
          of CPUs on the host differs from the numbering of virtual CPUs on
          the guest:

         <cputune> <!-- pin vCPUs to physical CPUs -->
           <!-- omit first 2 cores that are reserved for host -->
           <vcpupin vcpu='0' cpuset='2'/>
           <!-- VM topology is different from host.
                Instead of core0:thread0, core1:thread0, core2:thread0....
                it has core0:thread0,core0:thread1,core0:thread2...
                Need to assign accordingly for optimal performance -->
           <vcpupin vcpu='1' cpuset='70' />
           <vcpupin vcpu='2' cpuset='138' />
           <vcpupin vcpu='3' cpuset='206' />
           <vcpupin vcpu='4' cpuset='3' /> <!-- next physical core -->
           <vcpupin vcpu='5' cpuset='71' />
           (...)
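The full pinning list is tedious to write by hand, so it can be generated. The
sketch below targets a 68-core host with cores 0-1 reserved, following the
thread-offset scheme from 4.2; the helper `gen_vcpupins` is illustrative, not
part of the release.

```shell
# Sketch: emit <vcpupin> lines mapping guest vCPUs 0-263 onto the host
# threads of cores 2-67 (threads are offset by 68, the core count).
gen_vcpupins() {
    vcpu=0
    for core in $(seq 2 67); do
        for t in 0 1 2 3; do
            echo "<vcpupin vcpu='${vcpu}' cpuset='$((core + t * 68))'/>"
            vcpu=$((vcpu + 1))
        done
    done
}

gen_vcpupins | head -n 4    # first pinned core: cpusets 2, 70, 138, 206
```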

  4.2.3 QEMU raw command

        The code snippet below presents an example QEMU command that creates a
        virtual machine for a processor with 68 physical cores.

/usr/libexec/qemu-kvm \
-machine q35,accel=kvm,usb=off,dump-guest-core=off,kernel_irqchip=split \
-device intel-iommu,intremap=on,eim=on \
-m 56G -cpu host \
-smp cpus=264,cores=66,threads=4,sockets=1 \
-drive format=qcow2,file=/tmp/OS_images/OS_image.qcow2,index=0,media=disk \
-object memory-backend-ram,size=40G,\
prealloc=yes,host-nodes=0,policy=bind,id=node0 \
-numa node,nodeid=0,cpus=0-263,memdev=node0 \
-object memory-backend-ram,size=16G,\
prealloc=yes,host-nodes=1,policy=bind,id=node1 \
-numa node,nodeid=1,memdev=node1 \
-netdev user,id=network0 -device e1000,netdev=network0 \
-name vmm,process=vmm,debug-threads=on \
-nographic

  4.2.4 Example libvirt xml configuration

        Review the ./examples/virt_xpps_<hw_cores>_pin.xml
        files included in the Intel(R) Xeon Phi(TM) processor software
        release package.
