Installing TensorFlow 2.4.1 on Huawei Kunpeng 920 - A Complete Memo
My company required me to run TensorFlow on aarch64, but it was hard to find a suitable build, so I had to compile it myself. The compilation was challenging and full of problems and frustrations. (English version translated by GPT-3.5.)
Notes
- The entire installation is done inside a Docker container (CentOS 7), without Conda. I use Python 3.8 (I have also tested it with Python 3.6).
- Most of the compilation is done with gcc 5.5.
- Every error I encountered is documented in this article. Note that this article is a log, not a tutorial.
- The CentOS 7 system is the official centos:7 image from Docker Hub.
Creating Docker Container and Preparing for Installation
Creating the Docker Container
To create a new container from the official CentOS 7 Docker image, run:
    docker run -it --name centos7 centos:7 /bin/bash
This drops you into a shell inside the container. To re-enter the container later (for example, after exiting it), start it again and attach to it:
    docker start -i centos7
Here are the executed commands and their corresponding outputs:
    # Command:
Installing the Necessary Components
Next, inside the container, install the necessary components with yum. Because a freshly created CentOS 7 container is a minimal system, this includes the SSH server:
    yum install -y openssh-server
Wait for the installation to complete, and don't forget to change the root password and enable OpenSSH:
    passwd root
Compiling and Installing Python 3.8
Compilation time: about 8 minutes
Downloading and Compiling Python 3.8 Source Code
Download the Python 3.8.7 source code from the following link: https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz
    wget https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz
Here is the output of the execution:
    (Excerpted for brevity)
Extracting and Compiling Python
- Extract the Python-3.8.7.tgz file:
    tar -xvf Python-3.8.7.tgz
- Install and enable GCC 7 (using the scl method, although most of the later compilation uses GCC 5.5):
    scl enable devtoolset-7 bash
- Verify the GCC version:
    gcc --version
- Start compiling Python:
    cd Python-3.8.7
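The Python build commands are cut off after cd Python-3.8.7. A typical CPython source build looks roughly like the sketch below; it is an assumption, not the author's exact commands. The yum package list, the default /usr/local prefix, and the helper names run and build_python are illustrative. libffi-devel is included because the _ctypes module is only built when the libffi headers are present, and its absence is the usual cause of "ModuleNotFoundError: No module named '_ctypes'". Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of a typical CPython source build (ASSUMPTION: not the author's
# exact commands). The default configure prefix is /usr/local.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Body runs in a subshell (note the parentheses) so the cd does not leak out.
build_python() (
  src_dir="${1:-Python-3.8.7}"
  # Development headers for optional stdlib modules; libffi-devel is the
  # one the _ctypes extension needs. The package list is an assumption.
  run yum install -y make zlib-devel bzip2-devel openssl-devel \
      libffi-devel sqlite-devel xz-devel readline-devel &&
  run cd "$src_dir" &&
  run ./configure &&
  run make -j"$(nproc)" &&
  run make install
)
```

For example, DRY_RUN=1 build_python Python-3.8.7 lists the five steps; running it without DRY_RUN (as root, from the directory that contains the extracted source) performs them.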
Once the installation succeeds, verify it by running python3 on the command line. Python is now compiled and installed.
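Starting the interpreter does not prove that its optional C extensions were built: when a header is missing at configure time, the build skips that module with only a warning instead of failing. A small check like the one below can catch this early. The function name and the module list in the example are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: check that an interpreter can import the given modules.
# Which modules to check is the caller's choice (hypothetical helper).
check_py_modules() {
  local py="$1"
  shift
  local missing=()
  local m
  for m in "$@"; do
    # Try each import separately so every missing module gets reported.
    if ! "$py" -c "import $m" >/dev/null 2>&1; then
      missing+=("$m")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all modules present"
}

# Example:
# check_py_modules python3 _ctypes ssl zlib bz2 lzma sqlite3 readline
```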
Preparing for TensorFlow
Refer to the official website: Building from Source
Updating pip
To speed up pip downloads, switch the package index to the Aliyun mirror:
    pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/
The output will be as follows:
    (Excerpted for brevity)
Installing the Required Dependencies
    pip3 install -U pip six numpy wheel setuptools mock 'future>=0.17.1'
The console output will be as follows:
    (Excerpted for brevity)
Downloading the TensorFlow Source Code
- Download the TensorFlow 2.4.1 source archive (v2.4.1.zip, 67 MB) from the TensorFlow GitHub repository. Download the latest release version rather than the master branch.
    wget https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
The console output will be as follows:
    (Excerpted for brevity)
- Follow the "Install Bazel" section of the official documentation at https://www.tensorflow.org/install/source?hl=en#install_bazel. It suggests installing Bazel via Bazelisk for a faster setup.
    wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
Manually Compiling Bazel
Errors in this step can be ignored. Note that this is not the official gcc 5 build. Compilation time: about 3 minutes
Downloading the Appropriate Version of Bazel
Visit the official TensorFlow website - Installing Bazel - and follow the instructions.
Please note the sentence on the webpage: “Be sure to install a supported Bazel version, which can be any version between _TF_MIN_BAZEL_VERSION and _TF_MAX_BAZEL_VERSION specified in tensorflow/configure.py.”
Check configure.py in the TensorFlow source directory to see which Bazel versions are required. The relevant contents are as follows:
    (Excerpted for brevity)
The minimum version required is 3.1.0, so download the file bazel-3.1.0-dist.zip (257 MB) from Release 3.1.0 · bazelbuild/bazel.
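The version check described above can be scripted. This sketch reads _TF_MIN_BAZEL_VERSION and _TF_MAX_BAZEL_VERSION from configure.py and tests a candidate version against that range with GNU sort -V. It assumes each variable is assigned on a single line in the form NAME = 'x.y.z'; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: read the supported Bazel range from TensorFlow's configure.py
# and check a candidate version against it.
# ASSUMPTION: lines look like  _TF_MIN_BAZEL_VERSION = '3.1.0'

read_tf_var() {
  # $1: path to configure.py, $2: variable name
  sed -n "s/^$2 = '\([^']*\)'.*/\1/p" "$1" | head -n1
}

version_le() {
  # True if $1 <= $2 in version order (GNU sort -V).
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

bazel_version_ok() {
  local cfg="$1" candidate="$2" min max
  min="$(read_tf_var "$cfg" _TF_MIN_BAZEL_VERSION)"
  max="$(read_tf_var "$cfg" _TF_MAX_BAZEL_VERSION)"
  # Return 2 if either variable could not be found.
  [ -n "$min" ] && [ -n "$max" ] || return 2
  version_le "$min" "$candidate" && version_le "$candidate" "$max"
}

# Example (path is an assumption about where the archive was extracted):
# bazel_version_ok tensorflow-2.4.1/configure.py 3.1.0 && echo supported
```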
Extract the bazel-3.1.0-dist.zip file:
    unzip bazel-3.1.0-dist.zip
Compiling Bazel
- Install the required software (Java), choosing Java 11:
    yum install java-11-openjdk-devel
- Execute the compilation:
    export PATH=/usr/lib/jvm/java-11/bin:$PATH
Downloads in this step can be very slow, so it is highly recommended to use a proxy or to download the files manually into the corresponding directory.
The console output will be as follows:
    (Excerpted for brevity)
If you encounter an error such as "Error downloading", there is a network problem; running the command several more times should resolve it. (The download is slow because a slow network is combined with many files being fetched at the same time. When a download times out, its checksum is verified, and the check fails for any incomplete file. Even if the files have already been downloaded to a specific directory, running the script again re-downloads them.)
For example, here is the error:
    (Excerpted for brevity)
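Rerunning the command until every download succeeds can be automated with a small retry wrapper. This is a generic sketch: the attempt count, the delay, and the function name retry are assumptions, and the command to wrap is whichever one failed with "Error downloading".

```bash
#!/usr/bin/env bash
# Sketch: run a command up to $1 times, sleeping $2 seconds between
# failed attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example: retry 5 30 <the command that failed with "Error downloading">
```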
- After a few retries, all the files should download successfully. Then proceed with the compilation. It takes a long time (approximately 1-2 hours), so run it inside screen or with nohup so that a dropped SSH connection does not kill the build.
    bazel build -c opt --local_ram_resources=2048 --host_crosstool_top=@local_config_host//third_party/toolchains/host_x86_64:toolchain //tensorflow/tools/pip_package:build_pip_package
The console output will be as follows:
    (Excerpted for brevity)
In the middle of the process, the following error may appear:
    (Excerpted for brevity)
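For a build this long, nohup keeps the process running after the SSH session closes. A minimal sketch of launching a command that way, with all of its output captured in a log file; the helper name run_detached and the log file name are assumptions, and screen works just as well.

```bash
#!/usr/bin/env bash
# Sketch: start a command under nohup with stdout/stderr sent to a log
# file, so it keeps running if the SSH session drops. Prints the PID.
run_detached() {
  local log="$1"
  shift
  nohup "$@" >"$log" 2>&1 </dev/null &
  echo "$!"
}

# Example (log name is an assumption; pass the full bazel build command
# from this section after the log file argument):
# run_detached tf_build.log bazel build -c opt <remaining flags and target>
# tail -f tf_build.log
```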
Compiling Successfully
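After a successful build, the official "Build from source" guide packages the bazel output into a pip wheel with build_pip_package and then installs that wheel with pip. The sketch below follows the guide; /tmp/tensorflow_pkg is the guide's example output directory, and the helper names find_tf_wheel and package_and_install are hypothetical, so adjust them to your layout.

```bash
#!/usr/bin/env bash
# Sketch following the official "Build from source" packaging step.
# /tmp/tensorflow_pkg is the guide's example directory; helper names
# are hypothetical.

# Print the newest tensorflow-*.whl in a directory; fail if none exists.
find_tf_wheel() {
  local dir="$1" whl
  whl="$(ls -t "$dir"/tensorflow-*.whl 2>/dev/null | head -n1)"
  [ -n "$whl" ] || return 1
  echo "$whl"
}

# Run from the TensorFlow source root after the bazel build has finished.
package_and_install() {
  local out_dir="${1:-/tmp/tensorflow_pkg}" whl
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package "$out_dir" || return 1
  whl="$(find_tf_wheel "$out_dir")" || return 1
  pip3 install "$whl"
}
```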
Finally, it is time to install TensorFlow:
    (Excerpted for brevity)
Unexpected error encountered:
    (Excerpted for brevity)
From the error, we can see that there are errors related to grpcio and h5py.
    (Excerpted for brevity)
If grpcio fails to install automatically, install it manually:
    pip3 install grpcio
If installing grpcio returns an error:
    (Excerpted for brevity)
Set up the environment variables for the installed gcc 5.5 (or simply add /usr/local/gcc5.5/bin to PATH), then rerun python3 setup.py install:
    (Excerpted for brevity)
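Before rerunning python3 setup.py install, it is worth confirming that the gcc actually picked up is 5.5 and not the system compiler or the devtoolset-7 one. This sketch prepends a compiler prefix to PATH for the current shell and checks the major version reported by gcc -dumpversion. Exporting CC and CXX is an extra assumption (distutils-based setup.py builds generally honor them), and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: put a gcc install prefix first on PATH for this shell and verify
# the major version of the gcc that is now found.
# ASSUMPTION: exporting CC/CXX helps setup.py builds pick the same compiler.
use_gcc_prefix() {
  local prefix="$1" want_major="$2" got
  export PATH="$prefix/bin:$PATH"
  hash -r   # forget any previously cached location of gcc
  export CC="$prefix/bin/gcc" CXX="$prefix/bin/g++"
  got="$(gcc -dumpversion)" || return 1
  if [ "${got%%.*}" != "$want_major" ]; then
    echo "expected gcc $want_major.x, got $got from $(command -v gcc)" >&2
    return 1
  fi
  echo "using gcc $got from $(command -v gcc)"
}

# Example: use_gcc_prefix /usr/local/gcc5.5 5 && python3 setup.py install
```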
Next, try installing h5py. Version 2.10.0 is recommended because it is the version downloaded during the TensorFlow installation:
    pip3 install h5py==2.10.0
The console output will be as follows:
    (Excerpted for brevity)
It seems that HDF5 needs to be installed first. Download hdf5-1.12.0.tar.gz from the official HDF5 website. It is also worth adding gcc 5.5 to the environment permanently, since typing /usr/local/gcc5.5/bin every time gets tiring.
    (Excerpted for brevity)
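To keep gcc 5.5 on PATH in every new shell, add the export line to a shell startup file once. The sketch below appends the line only if that exact line is not already present, so running it repeatedly does not pile up duplicate entries. Using ~/.bashrc as the target file and the helper name add_line_once are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: append a line to a file only if the exact line is not there yet.
add_line_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x: whole-line match, -F: fixed string (no regex), --: end of options
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH literal so it expands at shell startup):
# add_line_once 'export PATH=/usr/local/gcc5.5/bin:$PATH' ~/.bashrc
# source ~/.bashrc
```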
Continue installing HDF5:
    (Excerpted for brevity)
Return to the h5py directory and attempt the installation again:
    (Excerpted for brevity)
Another error:
    (Excerpted for brevity)
According to an answer on Stack Overflow (Why is HDF5 giving a “too few arguments” error here?), the HDF5 version is too new. Downgrade to version 1.10.7 by downloading hdf5-1.10.7.tar.gz.
    (Excerpted for brevity)
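To avoid building h5py against the wrong HDF5 again, the installed version can be checked first. This sketch reads H5_VERS_MAJOR, H5_VERS_MINOR and H5_VERS_RELEASE from include/H5public.h under the HDF5 install prefix and rejects 1.12 or newer, matching the 1.12.0 failure and the 1.10.7 fix described here. The prefix layout and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: report the HDF5 version of an install prefix and check that it
# is older than 1.12.0. ASSUMPTION: headers live in <prefix>/include.
hdf5_version() {
  local hdr="$1/include/H5public.h" major minor release
  [ -f "$hdr" ] || return 1
  major="$(sed -n 's/^#define H5_VERS_MAJOR[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$hdr")"
  minor="$(sed -n 's/^#define H5_VERS_MINOR[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$hdr")"
  release="$(sed -n 's/^#define H5_VERS_RELEASE[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$hdr")"
  echo "$major.$minor.$release"
}

hdf5_ok_for_h5py_210() {
  local v
  v="$(hdf5_version "$1")" || return 2
  # Accept only versions that sort strictly before 1.12.0.
  [ "$(printf '%s\n' "$v" 1.12.0 | sort -V | head -n1)" = "$v" ] && [ "$v" != 1.12.0 ]
}

# Example (prefix is hypothetical; use your --prefix):
# hdf5_version /usr/local/hdf5 && hdf5_ok_for_h5py_210 /usr/local/hdf5
```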
Return to the h5py directory and proceed with the installation:
    (Excerpted for brevity)
Error encountered again:
    (Excerpted for brevity)
To resolve this issue, link the two .so files that h5py cannot find (libhdf5.so and libhdf5_hl.so) into directories that h5py searches, /opt/local/lib and /usr/local/lib:
    (Excerpted for brevity)
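The linking step can be written as a small function. This is a sketch: the HDF5 lib directory passed as the first argument is hypothetical (use wherever make install put the libraries), and the links are created with ln -sfn. As an alternative, h5py's build may also accept an HDF5_DIR environment variable pointing at the HDF5 prefix, which avoids touching system directories.

```bash
#!/usr/bin/env bash
# Sketch: symlink libhdf5.so and libhdf5_hl.so from an HDF5 lib directory
# ($1) into each target directory ($2...). Fails if a library is missing.
link_hdf5_libs() {
  local src_lib="$1"
  shift
  local dest lib
  for dest in "$@"; do
    mkdir -p "$dest" || return 1
    for lib in libhdf5.so libhdf5_hl.so; do
      [ -e "$src_lib/$lib" ] || { echo "not found: $src_lib/$lib" >&2; return 1; }
      ln -sfn "$src_lib/$lib" "$dest/$lib" || return 1
    done
  done
}

# Example (HDF5 prefix is an assumption; writing to /usr/local/lib needs root):
# link_hdf5_libs /usr/local/hdf5/lib /opt/local/lib /usr/local/lib
```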
Rerun the command:
    (Excerpted for brevity)
The console output will be as follows:
    (Excerpted for brevity)
Congratulations! The installation is now complete.