Installing TensorFlow 2.4.1 on Huawei Kunpeng 920: complete notes from download to install

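My company needs TensorFlow running on aarch64. A prebuilt package was really hard to find, so I had no choice but to compile it myself. The build was painful, with one problem after another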

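Notes

  1. The entire installation takes place inside Docker (CentOS 7). conda is not installed, and Python 3.8 is used (I have also verified that the install works under Python 3.6)
  2. GCC 5.5 is used for the build
  3. The errors along the way are recorded in full. This article is a log and is not recommended as a tutorial
  4. The CentOS 7 in Docker is the stock CentOS 7 image from Docker Hub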

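Create the Docker container and prepare for installation

Create the Docker container first

I use the following command to create a brand-new container from the centos:7 image, which is the official image
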
docker run -d -p 9222:22 --name=tensorflow-test  -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init

Then enter the container, using the container ID that docker run printed

docker exec -it 48097947e31b bash

The commands and their output are as follows (in this logged run the container was named hanlp-tester)

[[email protected] ~]# docker run -d -p 9222:22 --name=hanlp-tester  -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init
48097947e31b9950af3e5252001d66f927b2d05b2546d70582db6ffbce3c0813
[[email protected] ~]# docker exec -it 48097947e31b bash
[[email protected] /]#

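Install the required components

Next, inside the container, install the required components with yum, including the SSH tools, because a freshly created centos7 container is very bare
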
yum install wget curl telnet make net-tools initscripts sudo su openssh-server openssh-clients openssl-devel openssl zlib-devel -y

After a short wait it is done. Do not forget to change the root password and start openssh

[[email protected] download]# passwd root
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[[email protected] download]# systemctl start sshd
[[email protected] download]# systemctl enable sshd
[[email protected] download]#

Compile and install Python 3.8

Build time: about 8 minutes

Download the Python 3.8 source and compile it

Download Python 3.8.7 from https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz

wget -c https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz

The output is as follows

[[email protected] download]# https_proxy= wget -c https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz
--2021-02-05 05:21:28-- https://www.python.org/ftp/python/3.8.7/Python-3.8.7.tgz
Resolving www.python.org (www.python.org)... 151.101.228.223, 2a04:4e42:1a::223
Connecting to www.python.org (www.python.org)|151.101.228.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24468684 (23M) [application/octet-stream]
Saving to: 'Python-3.8.7.tgz'

100%[=======================================================>] 24,468,684 1.79MB/s in 12s

2021-02-05 05:21:07 (1.90 MB/s) - 'Python-3.8.7.tgz' saved [24468684/24468684]
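
The wget log above reports a length of 24468684 bytes for Python-3.8.7.tgz. Since truncated downloads cause trouble later in this build, a quick size check before extracting can be sketched as below. check_size is a name I made up for illustration; the expected byte count is the one printed in the log.

```shell
# check_size FILE EXPECTED_BYTES
# Succeeds only if FILE exists and is exactly EXPECTED_BYTES long.
# (check_size is a hypothetical helper, not part of wget.)
check_size() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || return 1
  actual=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$actual" = "$expected" ]
}

# Usage on the tarball from the log above (only runs if the file is present):
if [ -f Python-3.8.7.tgz ]; then
  if check_size Python-3.8.7.tgz 24468684; then
    echo "size OK"
  else
    echo "size mismatch, download again"
  fi
fi
```

A size check only catches truncated files; it says nothing about corrupted content.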

Extract and compile

  1. Extract Python-3.8.7.tgz

    tar -zxvf Python-3.8.7.tgz
  2. Install and enable GCC 7 (via scl; in practice most of the later builds use GCC 5.5)

    yum install centos-release-scl -y
    yum install devtoolset-7 -y
    scl enable devtoolset-7 bash
  3. Verify the GCC version

    [[email protected] Python-3.8.7]# gcc --version
    gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
    Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
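  4. Start compiling Python

    make -j6 runs up to 6 build jobs in parallel to speed up compilation, at the cost of heavy CPU and memory use. A machine with many cores but little memory may report running out of memory.
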
    ./configure --with-ssl-default-suites=openssl --enable-optimizations
    make -j6
    make install

    The install succeeded. Typing python3 on the command line starts the new interpreter, so the compile and install are done

    [[email protected] Python-3.8.7]# python3 
    Python 3.8.7 (default, Feb 5 2021, 05:34:43)
    [GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
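
The trade-off in step 4 between parallel jobs and memory can be sketched as a small helper that caps the job count by both CPU cores and available memory. pick_jobs is a hypothetical name, and the 1024 MB budget per job is my assumption, not a measured figure.

```shell
# pick_jobs CORES MEM_MB [PER_JOB_MB]
# Prints the smaller of CORES and MEM_MB / PER_JOB_MB, but never less than 1.
# PER_JOB_MB defaults to 1024, an assumed memory budget per compile job.
pick_jobs() {
  local cores="$1" mem_mb="$2" per_job_mb="${3:-1024}" by_mem n
  by_mem=$(( mem_mb / per_job_mb ))
  n=$cores
  if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Usage sketch: 8 cores with 4096 MB free would give "make -j4".
#   make -j"$(pick_jobs 8 4096)"
```

On a real machine the two inputs could come from nproc and /proc/meminfo.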

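Prepare TensorFlow

The official reference is "Build from source | TensorFlow"

Update pip. From here on pip3 has to be used. To speed up pip downloads, first switch the index to the Aliyun mirror; otherwise the wait is endless
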
pip3 install -i https://mirrors.aliyun.com/pypi/simple/ pip -U
pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/

The output is as follows

[[email protected] Python-3.8.7]# pip3 install -i https://mirrors.aliyun.com/pypi/simple/ pip -U
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting pip
Downloading https://mirrors.aliyun.com/pypi/packages/fe/ef/60d7ba03b5c442309ef42e7d69959f73aacccd0d86008362a681c4698e83/pip-21.0.1-py3-none-any.whl (1.5 MB)
|████████████████████████████████| 1.5 MB 1.4 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.2.3
Uninstalling pip-20.2.3:
Successfully uninstalled pip-20.2.3
Successfully installed pip-21.0.1
[[email protected] Python-3.8.7]# pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/
Writing to /root/.config/pip/pip.conf
[[email protected] Python-3.8.7]#
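
The pip3 config set command above writes a small INI file, /root/.config/pip/pip.conf according to the log. When the container setup is scripted, the same setting can be written by hand. A minimal sketch follows; write_pip_conf is a hypothetical helper name.

```shell
# write_pip_conf DIR
# Writes DIR/pip.conf with the Aliyun mirror as the global index-url.
# (write_pip_conf is a hypothetical helper; pip only reads the resulting file.)
write_pip_conf() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/pip.conf" <<'EOF'
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
EOF
}

# Usage for root inside the container, matching the path in the log above:
#   write_pip_conf /root/.config/pip
```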

Install the dependencies as the official guide requires

pip3 install -U --user pip numpy wheel
pip3 install -U --user keras_preprocessing --no-deps

The console output is as follows

[[email protected] Python-3.8.7]# pip install -U --user pip numpy wheel
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: pip in /usr/local/lib/python3.8/site-packages (21.0.1)
Collecting numpy
Downloading https://mirrors.aliyun.com/pypi/packages/3d/e3/56781e03ba3f7eb713af03ad8050957d357fd31685b356c446626436ff3e/numpy-1.20.0-cp38-cp38-manylinux2014_aarch64.whl (12.7 MB)
|████████████████████████████████| 12.7 MB 238 kB/s
Collecting wheel
Downloading https://mirrors.aliyun.com/pypi/packages/65/63/39d04c74222770ed1589c0eaba06c05891801219272420b40311cd60c880/wheel-0.36.2-py2.py3-none-any.whl (35 kB)
Installing collected packages: wheel, numpy
WARNING: The script wheel is installed in '/root/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts f2py, f2py3 and f2py3.8 are installed in '/root/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-1.20.0 wheel-0.36.2
[[email protected] Python-3.8.7]# pip install -U --user keras_preprocessing --no-deps
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting keras_preprocessing
Downloading https://mirrors.aliyun.com/pypi/packages/79/4c/7c3275a01e12ef9368a892926ab932b33bb13d55794881e3573482b378a7/Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
|████████████████████████████████| 42 kB 863 kB/s
Installing collected packages: keras-preprocessing
Successfully installed keras-preprocessing-1.1.2
[[email protected] Python-3.8.7]#
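
The warnings in the output above say that /root/.local/bin is not on PATH; that is where --user installs put scripts such as wheel and f2py. A sketch for adding the directory without duplicating it on repeated runs follows. path_with is a hypothetical helper name.

```shell
# path_with DIR PATHLIST
# Prints PATHLIST with DIR prepended, unless DIR is already one of its entries.
path_with() {
  local dir="$1" list="$2"
  case ":$list:" in
    *":$dir:"*) printf '%s\n' "$list" ;;
    *)          printf '%s\n' "$dir:$list" ;;
  esac
}

# Make scripts installed with --user reachable in the current shell.
PATH=$(path_with "${HOME:-/root}/.local/bin" "$PATH")
export PATH
```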

Download the TensorFlow source package first

  1. Download the TensorFlow 2.4.1 zip (67 MB) from the TensorFlow GitHub repository and extract it. I use the latest release rather than cloning master

    wget -c https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
    unzip v2.4.1.zip

    Console output

    [[email protected] download]# wget -c https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
    --2021-02-05 06:02:22-- https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
    Connecting to 127.0.0.1:7890... connected.
    Proxy request sent, awaiting response... 302 Found
    Location: https://codeload.github.com/tensorflow/tensorflow/zip/v2.4.1 [following]
    --2021-02-05 06:02:23-- https://codeload.github.com/tensorflow/tensorflow/zip/v2.4.1
    Connecting to 127.0.0.1:7890... connected.
    Proxy request sent, awaiting response... 200 OK
    Length: unspecified [application/zip]
    Saving to: 'v2.4.1.zip'

    [ <=> ] 16,283,488 1.99MB/s
    2021-02-05 06:03:53 (765 KB/s) - 'v2.4.1.zip' saved [69346072]
    [[email protected] download]# unzip v2.4.1.zip
    Archive: v2.4.1.zip
    85c8b2a817f95a3e979ecd1ed95bff1dc1335cff
    creating: tensorflow-2.4.1/
    inflating: tensorflow-2.4.1/.bazelrc
    extracting: tensorflow-2.4.1/.bazelversion
    creating: tensorflow-2.4.1/.github/
    creating: tensorflow-2.4.1/.github/ISSUE_TEMPLATE/
    ...
    inflating: tensorflow-2.4.1/tools/tf_env_collect.sh
    finishing deferred symbolic links:
    tensorflow-2.4.1/.pylintrc -> tensorflow/tools/ci_build/pylintrc
    [[email protected] download]#
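  2. The "Install Bazel" section of the official documentation says that installing Bazelisk is a quick way to get a suitable Bazel

    Original text: To build TensorFlow, you will need to install Bazel. Bazelisk is an easy way to install Bazel and automatically downloads the correct Bazel version for TensorFlow. For ease of use, add Bazelisk as the bazel executable in your PATH.

    I looked and found no Bazelisk binary for the aarch64 architecture, so I chose to compile Bazel manually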

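Compile Bazel manually

Not counting the failed attempts or the GCC 5 build, compile time: about 3 minutes

Download a suitable Bazel version

Go to the "Install Bazel" section of the TensorFlow website and follow the instructions step by step

Note this sentence on the page: Make sure to install a supported Bazel version: any version between _TF_MIN_BAZEL_VERSION and _TF_MAX_BAZEL_VERSION as specified in tensorflow/configure.py.

Then open that file to see which versions it requires. The relevant content is as follows
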
cat /root/download/tensorflow-2.4.1/configure.py

...
_TF_BAZELRC_FILENAME = '.tf_configure.bazelrc'
_TF_WORKSPACE_ROOT = ''
_TF_BAZELRC = ''
_TF_CURRENT_BAZEL_VERSION = None
_TF_MIN_BAZEL_VERSION = '3.1.0'
_TF_MAX_BAZEL_VERSION = '3.99.0'
...

The minimum is 3.1.0 and the maximum is 3.99.0, so downloading 3.1.0 will do
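
The version choice can also be checked mechanically. The sketch below compares dotted versions component by component and tests whether a candidate lies within the two bounds from configure.py. ver_cmp and version_in_range are hypothetical helper names, and only purely numeric MAJOR.MINOR.PATCH versions are handled.

```shell
# ver_cmp A B: prints -1, 0 or 1 when A is lower than, equal to or higher than B.
# Only numeric MAJOR.MINOR.PATCH versions are handled.
ver_cmp() {
  local a="$1" b="$2" x y i
  for i in 1 2 3; do
    x=${a%%.*}; y=${b%%.*}
    if [ "$x" -lt "$y" ]; then printf '%s\n' "-1"; return 0; fi
    if [ "$x" -gt "$y" ]; then printf '%s\n' "1"; return 0; fi
    # Drop the component just compared; pad with 0 when none are left.
    case "$a" in *.*) a=${a#*.} ;; *) a=0 ;; esac
    case "$b" in *.*) b=${b#*.} ;; *) b=0 ;; esac
  done
  printf '%s\n' "0"
}

# version_in_range VERSION MIN MAX: succeeds when MIN <= VERSION <= MAX.
version_in_range() {
  [ "$(ver_cmp "$1" "$2")" -ge 0 ] && [ "$(ver_cmp "$1" "$3")" -le 0 ]
}

# Bounds taken from configure.py in tensorflow-2.4.1:
if version_in_range 3.1.0 3.1.0 3.99.0; then
  echo "Bazel 3.1.0 is within the supported range"
fi
```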

Download the file bazel-3.1.0-dist.zip (257 MB) from the GitHub release page Release 3.1.0 · bazelbuild/bazel

Download and extract bazel-3.1.0-dist.zip

wget -c https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
unzip -d bazel-dist bazel-3.1.0-dist.zip

Compile Bazel

  1. Install the Java dependency; Java 11 is chosen here

    yum install java-11-openjdk java-11-openjdk-devel -y
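  2. Run the build

    Using a proxy here is strongly recommended, or download the files into the corresponding directories yourself; otherwise it is very slow
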
    cd bazel-dist
    ./compile.sh

    Console output

    [[email protected] bazel-dist]# ./compile.sh 
    🍃 Building Bazel from scratch.. (this step takes about 3 minutes)
    🍃 Building Bazel with Bazel. (this is where it becomes very time-consuming)
    DEBUG: /tmp/bazel_ZlOoVY5r/out/external/bazel_toolchains/rules/rbe_repo/version_check.bzl:59:9:
    ...
    bazel_tools/tools/jdk/include/linux' resolves to 'external/bazel_tools/tools/jdk/include/linux' not below the relative path of its package 'src/main/java/com/google/devtools/build/lib/syntax'. This will be an error in the future
    Analyzing: target //src:bazel_nojdk (275 packages loaded, 8010 targets configu\
    red)
    Fetching @remotejdk11_linux_aarch64; fetching 36s
    Fetching https://mirror.bazel.build/...nux_aarch64.tar.gz; 67,991,108B 34s (depends on the network; 412 MB; download URL: https://mirror.bazel.build/openjdk/azul-zulu11.37.48-ca-jdk11.0.6/zulu11.37.48-ca-jdk11.0.6-linux_aarch64.tar.gz)
    ...
    [300 / 1,450] 8 actions, 7 running (this step takes a fairly long time)
    JavacBootstrap .../buildjar/libskylark-deps.jar [for host]; 14s local
    @com_google_protobuf//:protobuf; 2s local
    @com_google_protobuf//:protobuf; 1s local
    @com_google_protobuf//:protobuf; 1s local
    @com_google_protobuf//:protobuf; 0s local
    @com_google_protobuf//:protobuf; 0s local
    @com_google_protobuf//:protobuf; 0s local
    [Scann] @com_google_protobuf//:protobuf

    An error occurred

    ERROR: /tmp/bazel_ZlOoVY5r/out/external/bazel_tools/third_party/ijar/BUILD:72:1: Linking of rule '@bazel_tools//third_party/ijar:zipper' failed (Exit 1): gcc failed: error executing command 
    (cd /tmp/bazel_ZlOoVY5r/out/execroot/io_bazel && \
    exec env - \
    LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib:/opt/rh/devtoolset-7/root/usr/lib64/dyninst:/opt/rh/devtoolset-7/root/usr/lib/dyninst:/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib \
    PATH=/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin \
    PWD=/proc/self/cwd \
    /opt/rh/devtoolset-7/root/usr/bin/gcc @bazel-out/host/bin/external/bazel_tools/third_party/ijar/zipper-2.params)
    Execution platform: //:default_host_platform
    bazel-out/host/bin/external/bazel_tools/src/main/cpp/util/_objs/filesystem/path_posix.o:path_posix.cc:function blaze_util::SplitPath(std::string const&): error: undefined reference to 'std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&, unsigned long, std::allocator<char> const&)'
    bazel-out/host/bin/external/bazel_tools/src/main/cpp/util/_objs/filesystem/path_posix.o:path_posix.cc:function blaze_util::SplitPath(std::string const&): error: undefined reference to 'std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&, unsigned long, std::allocator<char> const&)'
    collect2: error: ld returned 1 exit status
    Target //src:bazel_nojdk failed to build
    INFO: Elapsed time: 365.761s, Critical Path: 54.37s
    INFO: 659 processes: 659 local.
    FAILED: Build did NOT complete successfully

    I searched for a long time without finding a solution and could not fix it, so I tried installing GCC 5.5 instead (compile time: about 25 minutes)

    wget -c http://ftp.gnu.org/gnu/gcc/gcc-5.5.0/gcc-5.5.0.tar.gz
    tar -zxvf gcc-5.5.0.tar.gz
    cd gcc-5.5.0
    yum install gmp-devel mpfr-devel libmpc-devel -y
    ./configure --prefix=/usr/local/gcc5.5
    make -j
    make install

    Compile again

    PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin LD_LIBRARY_PATH=/usr/local/gcc5.5/lib64:/usr/local/gcc5.5/lib ./compile.sh

    Console output

    [[email protected] bazel-dist]# PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin LD_LIBRARY_PATH=/usr/local/gcc5.5/lib64:/usr/local/gcc5.5/lib ./compile.sh
    🍃 Building Bazel from scratch......
    🍃 Building Bazel with Bazel.
    ...
    Analyzing: target //src:bazel_nojdk (275 packages loaded, 8010 targets configu\
    red)
    Fetching @remotejdk11_linux_aarch64; fetching 11s
    Fetching https://mirror.bazel.build/...linux_aarch64.tar.gz; 8,101,840B 9s (412 MB; frustratingly it is downloaded all over again, about 5 minutes at 1.5 MB/s)
    [104 / 1,430] 8 actions running
    JavacBootstrap .../buildjar/libskylark-deps.jar [for host]; 22s local
    @com_google_protobuf//:protobuf; 7s local
    @com_google_protobuf//:protobuf; 4s local
    @com_google_protobuf//:protobuf_lite; 3s local
    @com_google_protobuf//:protobuf; 2s local
    @com_google_protobuf//:protobuf_lite; 0s local
    @com_google_protobuf//:protobuf_lite; 0s local
    @com_google_protobuf//:protobuf; 0s local

    Another error

    ERROR: /root/download/bazel-dist/src/main/protobuf/BUILD:202:1: Action src/main/protobuf/command_server.grpc.pb.h failed (Exit 1): protoc failed: error executing command
    (cd /tmp/bazel_fsjZQa4T/out/execroot/io_bazel && \
    exec env - \
    bazel-out/host/bin/external/com_google_protobuf/protoc '--plugin=protoc-gen-PLUGIN=bazel-out/host/bin/third_party/grpc/cpp_plugin' '--PLUGIN_out=bazel-out/aarch64-opt/bin' '--proto_path=.' '--proto_path=.' '--proto_path=bazel-out/aarch64-opt/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto' '--proto_path=bazel-out/aarch64-opt/bin' src/main/protobuf/command_server.proto)
    Execution platform: //:default_host_platform
    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)
    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)
    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)
    Target //src:bazel_nojdk failed to build
    ERROR: /root/download/bazel-dist/src/main/cpp/BUILD:98:1 Action src/main/protobuf/command_server.grpc.pb.h failed (Exit 1): protoc failed: error executing command
    (cd /tmp/bazel_fsjZQa4T/out/execroot/io_bazel && \
    exec env - \
    bazel-out/host/bin/external/com_google_protobuf/protoc '--plugin=protoc-gen-PLUGIN=bazel-out/host/bin/third_party/grpc/cpp_plugin' '--PLUGIN_out=bazel-out/aarch64-opt/bin' '--proto_path=.' '--proto_path=.' '--proto_path=bazel-out/aarch64-opt/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto' '--proto_path=bazel-out/aarch64-opt/bin' src/main/protobuf/command_server.proto)
    Execution platform: //:default_host_platform
    INFO: Elapsed time: 370.840s, Critical Path: 58.40s
    INFO: 616 processes: 589 local, 27 worker.
    FAILED: Build did NOT complete successfully

    Good, this time the error points clearly at the cause

    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)
    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)
    /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /tmp/bazel_fsjZQa4T/out/execroot/io_bazel/bazel-out/host/bin/external/com_google_protobuf/protoc)

    The cause is that /lib64/libstdc++.so.6 does not provide GLIBCXX_3.4.20. Check which GLIBCXX versions the freshly built /usr/local/gcc5.5/lib64/libstdc++.so.6 supports

    strings /usr/local/gcc5.5/lib64/libstdc++.so.6  | grep ^GLIBCXX

    Console output

    [[email protected] bazel-dist]# strings /usr/local/gcc5.5/lib64/libstdc++.so.6  | grep ^GLIBCXX
    GLIBCXX_3.4
    ...
    GLIBCXX_3.4.18
    GLIBCXX_3.4.19
    GLIBCXX_3.4.20 #### target version
    GLIBCXX_3.4.21 #### target version
    ...
    GLIBCXX_3.4.4

    Then check whether CXXABI_1.3.8 is present

    strings /usr/local/gcc5.5/lib64/libstdc++.so.6  | grep ^CXXABI

    Console output

    [[email protected] bazel-dist]# strings /usr/local/gcc5.5/lib64/libstdc++.so.6  | grep ^CXXABI
    ...
    CXXABI_1.3.8 #### target version
    ...
    CXXABI_1.3.3

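    So the GCC 5.5 library provides CXXABI_1.3.8, GLIBCXX_3.4.20 and GLIBCXX_3.4.21. As the log shows, the failing protoc action runs under "exec env -" with an empty environment, so the LD_LIBRARY_PATH given on the command line never reaches it and the loader falls back to /lib64/libstdc++.so.6. Pointing that symlink at the GCC 5.5 library fixes it
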
    unlink /lib64/libstdc++.so.6
    ln -s /usr/local/gcc5.5/lib64/libstdc++.so.6 /lib64/libstdc++.so.6

    Console output

    [[email protected] bazel-dist]# ll /lib64/libstdc++.so.6
    lrwxrwxrwx 1 root root 19 Nov 13 01:54 /lib64/libstdc++.so.6 -> libstdc++.so.6.0.19
    [[email protected] bazel-dist]# unlink /lib64/libstdc++.so.6
    [[email protected] bazel-dist]# ln -s /usr/local/gcc5.5/lib64/libstdc++.so.6 /lib64/libstdc++.so.6
    [[email protected] bazel-dist]# ll /lib64/libstdc++.so.6
    lrwxrwxrwx 1 root root 38 Feb 5 08:29 /lib64/libstdc++.so.6 -> /usr/local/gcc5.5/lib64/libstdc++.so.6
    [[email protected] bazel-dist]#

    Compile a third time. The command is the same as above, so I will not repeat it (the 412 MB file is downloaded yet again; I was tempted to set up a local HTTP server). Console output

    [[email protected] bazel-dist]# PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin LD_LIBRARY_PATH=/usr/local/gcc5.5/lib64:/usr/local/gcc5.5/lib https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 ./compile.sh
    🍃 Building Bazel from scratch......
    🍃 Building Bazel with Bazel.
    ....
    Fetching https://mirror.bazel.build/...linux_aarch64.tar.gz; 8,101,840B 9s (412 MB, rest omitted)
    INFO: Found 1 target...
    [0 / 1,066] [Prepa] Writing file src/embedded_tools_nojdk.params
    ....
    [323 / 1,483] 8 actions running
    [326 / 1,483] 8 actions running
    INFO: From JavacBootstrap src/java_tools/buildjar/java/com/google/devtools/build/buildjar/libskylark-deps.jar [for host]:
    .....
    DEBUG: /tmp/bazel_3S4oONKX/out/external/build_bazel_rules_nodejs/internal/common/check_bazel_version.bzl:49:9: Make sure that you are running at least Bazel 0.21.0.

    Build successful! Binary is here: /root/download/bazel-dist/output/bazel

    The build succeeded. Copy the bazel binary to /usr/local/bin

    cp /root/download/bazel-dist/output/bazel /usr/local/bin/

    Console output

    [[email protected] bazel-dist]# cp /root/download/bazel-dist/output/bazel /usr/local/bin/
    [[email protected] bazel-dist]# bazel --version
    bazel 3.1.0- (@non-git)
    [[email protected] bazel-dist]#
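
The diagnosis in step 2 has two parts: read the version tags the loader complains about, then check whether a candidate libstdc++ provides them. That can be sketched as two small helpers. required_versions and missing_versions are hypothetical names, and build.log and provided.txt in the usage comment are placeholder files.

```shell
# required_versions: reads loader error text on stdin and prints each
# missing version tag (for example GLIBCXX_3.4.20) once.
required_versions() {
  sed -n "s/.*version \`\([A-Za-z0-9_.]*\)' not found.*/\1/p" | sort -u
}

# missing_versions AVAILABLE_FILE: reads required tags on stdin and prints
# those that do not appear as a whole line in AVAILABLE_FILE.
missing_versions() {
  local available="$1" tag
  while IFS= read -r tag; do
    grep -qxF -- "$tag" "$available" || printf '%s\n' "$tag"
  done
}

# Usage sketch (build.log holds the failed build output):
#   strings /usr/local/gcc5.5/lib64/libstdc++.so.6 | grep -E '^(GLIBCXX|CXXABI)' > provided.txt
#   required_versions < build.log | missing_versions provided.txt
```

Empty output from the pipeline means the candidate library covers every tag named in the errors.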

Start compiling TensorFlow 2.4.1 (very time-consuming)

Not counting the failed attempts or the GCC 5 build, compile time: about 2 hours 40 minutes
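
Bazel reports elapsed time in seconds; the successful build further down ends with 9638.762s. The conversion to hours, minutes and seconds is simple arithmetic, sketched here with a hypothetical fmt_duration helper that takes whole seconds and leaves rounding to the caller.

```shell
# fmt_duration SECONDS: prints whole SECONDS as <h>h<m>m<s>s.
fmt_duration() {
  local total="$1"
  printf '%dh%dm%ds\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# 9638.762 s rounded to the nearest second:
fmt_duration 9639   # prints 2h40m39s
```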

Configure the build

Run the following command and answer the prompts according to your own needs

./configure

My choices

[[email protected] tensorflow-2.4.1]# ./configure 
You have bazel 3.1.0- (@non-git) installed.
Please specify the location of python. [Default is /usr/local/bin/python3]:


Found possible Python library paths:
/usr/local/lib/python3.8/site-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.8/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=mkl_aarch64 # Build with oneDNN support for Aarch64.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
[[email protected] tensorflow-2.4.1]#
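
As far as I know, the answers given interactively above can also be supplied through environment variables that configure.py reads, which helps when the build is scripted. PYTHON_BIN_PATH and PYTHON_LIB_PATH appear in the build logs later in this article. The other variable names below are my understanding of the TensorFlow 2.4 configure script and should be checked against configure.py in the source tree. tf_configure_env is a hypothetical helper name.

```shell
# tf_configure_env: prints environment assignments that mirror the answers
# given interactively above. The TF_* and CC_OPT_FLAGS names are assumptions
# to be verified against configure.py.
tf_configure_env() {
  cat <<'EOF'
PYTHON_BIN_PATH=/usr/local/bin/python3
PYTHON_LIB_PATH=/usr/local/lib/python3.8/site-packages
TF_NEED_ROCM=0
TF_NEED_CUDA=0
TF_DOWNLOAD_CLANG=0
CC_OPT_FLAGS=-Wno-sign-compare
TF_SET_ANDROID_WORKSPACE=0
EOF
}

# Usage sketch, run from the tensorflow-2.4.1 directory:
#   env $(tf_configure_env) ./configure
```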

Build the pip package

  1. Install the git dependency

    yum install git -y
  2. Build with bazel build

    The simplest command is

    bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

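    To make sure the right GCC is picked up, I put GCC 5 on PATH so that this build also uses GCC 5, and added --local_ram_resources=2048 (Bazel then assumes 2048 MB of RAM for local actions) so that memory does not blow up
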
    PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin LD_LIBRARY_PATH=/usr/local/gcc5.5/lib64:/usr/local/gcc5.5/lib bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --verbose_failures --local_ram_resources=2048

    Console output

    [[email protected] tensorflow-2.4.1]# PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --verbose_failures --local_ram_resources=2048
    ...
    open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --
    ...
    Analyzing: target //tensorflow/tools/pip_package:build_pip_package (16 packages loaded, 439 targets configured)
    Fetching @go_sdk; fetching 49s
    Fetching @boringssl; fetching 49s
    Fetching @llvm-project; fetching 49s
    Fetching @aws; fetching 49s
    Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/boringssl/archive/80ca9f9f6ece29ab132cce4cf807a9465a18cfac.tar.gz; 15,210,540B 48s
    Fetching https://dl.google.com/go/go1.12.5.linux-arm64.tar.gz; 13,590,432B 46s
    Fetching https://github.com/aws/aws-sdk-cpp/archive/1.7.336.tar.gz; 11,919,054B 44s
    Fetching https://github.com/llvm/llvm-project/archive/f402e682d0ef5598eeffc9a21a691b03e602ff58.tar.gz; 12,222,637B 44s ... (16 fetches)

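    Errors may occur along the way. If the error is of the "Error downloading" kind, the network failed, and running the command a few more times is enough. (P.S. This download logic feels like it was written by an intern. The network inside China is slow to begin with, yet more than ten files are downloaded at once, so timeouts become very likely. After a timeout there is no retry; it goes straight to the checksum, which an incomplete file is bound to fail. As a result, one failed file out of the dozen stops every download with an error, even while the others are still in progress. I also tried downloading the files into the target directory beforehand, but the build still downloaded them again.)

    For example, an error like the one below
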
    ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: Traceback (most recent call last):
    File "/root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/external/io_bazel_rules_go/go/private/sdk.bzl", line 50
    _remote_sdk(ctx, <3 more arguments>)
    File "/root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/external/io_bazel_rules_go/go/private/sdk.bzl", line 120, in _remote_sdk
    ctx.download(url = urls, <2 more arguments>)
    java.io.IOException: Error downloading [https://dl.google.com/go/go1.12.5.linux-arm64.tar.gz] to /root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/external/go_sdk/go_sdk.tar.gz: Checksum was aac5b83aa2838dcc69817d928b4921ad1db945da9bbc70dd3e55a48ad7259505 but wanted ff09f34935cd189a4912f3f308ec83e4683c309304144eae9cf60ebc552e7cd8
    INFO: Elapsed time: 31.488s
    INFO: 0 processes.
    FAILED: Build did NOT complete successfully (202 packages loaded, 3829 targets configured)
    Fetching @io_bazel_rules_docker; Cloning 251f6a68b439744094faff800cd029798edf9faa of https://github.com/bazelbuild/rules_docker.git 27s
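  3. Fuming, I ran the same command nearly 10 times before all the files finally downloaded, and the build moved on to compilation. This stage is very long, an estimated 1 to 2 hours. Running it under screen or nohup is recommended so that a dropped console does not kill the build
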
    INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (177 packages loaded, 26606 targets configured).
    INFO: Found 1 target...
    [103 / 556] 8 actions, 7 running # do not be misled by this total; there are about 15,000 actions in all and the number keeps growing
    Compiling external/com_googlesource_code_re2/re2/set.cc [for host]; 1s local
    Compiling external/com_googlesource_code_re2/re2/parse.cc [for host]; 0s local
    Compiling external/com_googlesource_code_re2/re2/re2.cc [for host]; 0s local
    Compiling external/com_google_absl/absl/time/duration.cc [for host]; 0s local
    Compiling external/com_googlesource_code_re2/re2/compile.cc [for host]; 0s local
    Compiling external/com_googlesource_code_re2/re2/bitstate.cc [for host]; 0s local
    Compiling external/com_googlesource_code_re2/re2/onepass.cc [for host]; 0s local
    [Scann] Compiling external/com_googlesource_code_re2/re2/dfa.cc [for host]
  4. Halfway through the build, another error

    ERROR: /root/download/tensorflow-2.4.1/tensorflow/python/keras/api/BUILD:111:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen failed (Exit 1): bash failed: error executing command
    (cd /root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/execroot/org_tensorflow && \
    exec env - \
    PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin \
    PYTHON_BIN_PATH=/usr/local/bin/python3 \
    PYTHON_LIB_PATH=/usr/local/lib/python3.8/site-packages \
    TF2_BEHAVIOR=1 \
    TF_CONFIGURE_IOS=0 \
    /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen --apidir=bazel-out/aarch64-opt/bin/tensorflow/python/ke
    ras/api --apiname=keras --apiversion=1 --loading=default --package=tensorflow.python,tensorflow.python.keras,tensorflow.python.keras.activations,tensorflow.python.keras.applications.densenet,tensorflow.python.keras.applications.
    efficientnet,tensorflow.python.keras.applications.imagenet_utils,tensorflow.python.keras.applications.inception_resnet_v2,tensorflow.python.keras.applications.inception_v3,tensorflow.python.keras.applications.mobilenet,tensorflow
    ...
    scikit_learn/__init__.py')
    Execution platform: @local_execution_config_platform//:platform
    Traceback (most recent call last):
    File "/root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 26, in <module>
    from tensorflow.python.tools.api.generator import doc_srcs
    File "/root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 28, in <module>
    import ctypes
    File "/usr/local/lib/python3.8/ctypes/__init__.py", line 7, in <module>
    from _ctypes import Union, Structure, Array
    ModuleNotFoundError: No module named '_ctypes'
    Target //tensorflow/tools/pip_package:build_pip_package failed to build
    ERROR: /root/download/tensorflow-2.4.1/tensorflow/python/tools/BUILD:82:1 Executing genrule //tensorflow/python/keras/api:keras_python_api_gen failed (Exit 1): bash failed: error executing command
    (cd /root/.cache/bazel/_bazel_root/6dc75189e5d225c7bfaf488577a89e58/execroot/org_tensorflow && \
    exec env - \
    PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin \
    PYTHON_BIN_PATH=/usr/local/bin/python3 \
    PYTHON_LIB_PATH=/usr/local/lib/python3.8/site-packages \
    TF2_BEHAVIOR=1 \
    TF_CONFIGURE_IOS=0 \
    /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen --apidir=bazel-out/aarch64-opt/bin/tensorflow/python/keras/api --apiname=keras --apiversion=1 --loading=default --package=tensorflow.python,tensorflow.python.keras,tensorflo
    ...
    python/keras/api/keras/wrappers/__init__.py bazel-out/aarch64-opt/bin/tensorflow/python/keras/api/keras/wrappers/scikit_learn/__init__.py')
    Execution platform: @local_execution_config_platform//:platform
    INFO: Elapsed time: 9034.655s, Critical Path: 5345.41s
    INFO: 13843 processes: 13843 local.
    FAILED: Build did NOT complete successfully

    The error is clear: ModuleNotFoundError: No module named '_ctypes'

    After some searching I learned that libffi-devel has to be installed, and then Python has to be compiled once more

    So

    yum install libffi-devel -y
    cd ../Python-3.8.7
    ./configure --with-ssl-default-suites=openssl --enable-optimizations
    make -j
    make install

    When that finishes, check that Python can run from _ctypes import Union, Structure, Array

    [[email protected] Python-3.8.7]# python3
    Python 3.8.7 (default, Feb 5 2021, 13:40:10)
    [GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from _ctypes import Union, Structure, Array
    >>>

    OK, continue building TensorFlow

    PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --verbose_failures --local_ram_resources=2048

    After a long wait, the console finally prints

    INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (177 packages loaded, 26606 targets configured).
    ......
    INFO: Found 1 target...
    [15,493 / 15,513] 8 actions, 4 running Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2; 7s local
    Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
      bazel-bin/tensorflow/tools/pip_package/build_pip_package
    INFO: Elapsed time: 9638.762s, Critical Path: 7750.25s
    INFO: 14845 processes: 14845 local.
    INFO: Build completed successfully, 14992 total actions
  1. The build finally succeeded, taking 2 hours 40 minutes 39 seconds in total (9638.762s).
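
The first attempt above ran for 9034 seconds before the missing _ctypes module surfaced. A quick pre-flight check of the interpreter might catch that kind of problem before a multi-hour build starts. The following is only a sketch: the module list is my own illustrative selection of standard-library extension modules that are skipped when their -devel packages are absent at Python build time, not something taken from the TensorFlow build scripts.

```python
import importlib.util

# Illustrative list (an assumption, not exhaustive): extension modules that a
# source build of Python silently skips when the matching headers are missing,
# e.g. _ctypes needs libffi-devel and _ssl needs openssl-devel.
OPTIONAL_MODULES = ["_ctypes", "_ssl", "zlib", "_bz2", "_lzma", "_sqlite3"]


def missing_modules(names=OPTIONAL_MODULES):
    """Return the names that the running interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing extension modules:", ", ".join(missing))
    else:
        print("All checked extension modules are available.")
```

Running this with /usr/local/bin/python3 right after make install would have reported _ctypes as missing in the situation described above.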

Generate the whl file and install TensorFlow

See the official guide: Build from source | TensorFlow - Build the package

Run the following command:

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Console output:

[[email protected] tensorflow-2.4.1]# ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Fri Feb 5 16:37:59 UTC 2021 : === Preparing sources in dir: /tmp/tmp.xc3LmhLdsV
~/download/tensorflow-2.4.1 ~/download/tensorflow-2.4.1
~/download/tensorflow-2.4.1
~/download/tensorflow-2.4.1/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/download/tensorflow-2.4.1
~/download/tensorflow-2.4.1
/tmp/tmp.xc3LmhLdsV/tensorflow/include ~/download/tensorflow-2.4.1
~/download/tensorflow-2.4.1
Fri Feb 5 16:38:19 UTC 2021 : === Building wheel
warning: no files found matching 'README'
...
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
Fri Feb 5 16:38:56 UTC 2021 : === Output wheel file is in: /tmp/tensorflow_pkg

This completes the TensorFlow build.

Install TensorFlow

pip3 install /tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl
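
The wheel filename follows the standard name-version-python tag-ABI tag-platform tag layout, which is why a build done with Python 3.8 on this machine comes out as cp38-cp38-linux_aarch64. The sketch below splits the filename into those fields and prints a rough equivalent for the running interpreter. Both function names are hypothetical, and interpreter_tags is a simplification of what pip actually computes.

```python
import platform
import sys


def parse_wheel_filename(filename):
    """Split 'name-version-pytag-abitag-platform.whl' into its five fields.

    Assumes there is no optional build tag, as in the wheel built above.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}


def interpreter_tags():
    """Rough CPython and platform tags for the running interpreter."""
    py_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    plat_tag = "{}_{}".format(platform.system().lower(), platform.machine())
    return py_tag, plat_tag


if __name__ == "__main__":
    info = parse_wheel_filename("tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl")
    print(info)
    print("this interpreter:", interpreter_tags())
```

If the python or platform field does not match the interpreter that pip3 belongs to, pip refuses the wheel as unsupported, so the wheel has to be rebuilt for each Python version it is meant for.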

The install then failed with another error:

[[email protected] tensorflow-2.4.1]# pip3 /tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl 
ERROR: unknown command "/tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl"
[[email protected] tensorflow-2.4.1]# pip3 install /tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing /tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl
...
creating None
creating None/tmp
creating None/tmp/tmprhi4x77s
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c /tmp/tmprhi4x77s/a.c -o None/tmp/tmprhi4x77s/a.o
Traceback (most recent call last):
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 117, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
...
File "/usr/local/lib/python3.8/distutils/spawn.py", line 157, in _spawn_posix
raise DistutilsExecError(
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 264, in build_extensions
build_ext.build_ext.build_extensions(self)
...
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py", line 448, in <module>
setuptools.setup(
...
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 268, in build_extensions
raise CommandError(
commands.CommandError: Failed `build_ext` step:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 117, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
...
File "/usr/local/lib/python3.8/distutils/spawn.py", line 157, in _spawn_posix
raise DistutilsExecError(
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 264, in build_extensions
build_ext.build_ext.build_extensions(self)
...
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

----------------------------------------
ERROR: Failed building wheel for grpcio
Running setup.py clean for grpcio
Building wheel for h5py (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kzx34i5t/h5py_e6d17e17103d4499aed3a8bb670913cc/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kzx34i5t/h5py_e6d17e17103d4499aed3a8bb670913cc/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-5mhhdi83
cwd: /tmp/pip-install-kzx34i5t/h5py_e6d17e17103d4499aed3a8bb670913cc/
Complete output (64 lines):
...
Loading library to get version: libhdf5.so
error: libhdf5.so: cannot open shared object file: No such file or directory
----------------------------------------
ERROR: Failed building wheel for h5py
Running setup.py clean for h5py
Building wheel for termcolor (setup.py) ... done
Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4829 sha256=c661e70e3028f244fac6d4cfa199d6599a89995531ce30945d91f622ea9bfda4
Stored in directory: /root/.cache/pip/wheels/b6/0a/af/edcc2d17bf3441ecfa19393aa5cfbc60af7cb00db2b9763c10
Building wheel for wrapt (setup.py) ... done
Created wheel for wrapt: filename=wrapt-1.12.1-cp38-cp38-linux_aarch64.whl size=72096 sha256=aba1fa72f3efc17dcb841450a0cec14bd9ee5f267aefb1dd72e2f087bc81a6f8
Stored in directory: /root/.cache/pip/wheels/51/11/73/14504528339206a620d8c024472c3073d06f56f1714ce9ae83
Successfully built termcolor wrapt
Failed to build grpcio h5py
Installing collected packages: urllib3, pyasn1, idna, chardet, certifi, six, rsa, requests, pyasn1-modules, oauthlib, cachetools, requests-oauthlib, google-auth, werkzeug, tensorboard-plugin-wit, protobuf, numpy, markdown, grpcio, google-auth-oauthlib, absl-py, wrapt, typing-extensions, termcolor, tensorflow-estimator, tensorboard, opt-einsum, h5py, google-pasta, gast, flatbuffers, astunparse, tensorflow
Attempting uninstall: numpy
Found existing installation: numpy 1.20.0
Uninstalling numpy-1.20.0:
Successfully uninstalled numpy-1.20.0
Running setup.py install for grpcio ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-qz54i1sf/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/grpcio
cwd: /tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/
Complete output (847 lines):
/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py:104: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if exit_code is not 0:
ASM Builds for BoringSSL currently not supported on: linux-aarch64
Cython-generated files are missing...
...
creating None/tmp/tmpo7bvr3zb
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c /tmp/tmpo7bvr3zb/a.c -o None/tmp/tmpo7bvr3zb/a.o
Traceback (most recent call last):
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 117, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
...
File "/usr/local/lib/python3.8/distutils/spawn.py", line 157, in _spawn_posix
raise DistutilsExecError(
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 264, in build_extensions
build_ext.build_ext.build_extensions(self)
...
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py", line 448, in <module>
setuptools.setup(
...
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 268, in build_extensions
raise CommandError(
commands.CommandError: Failed `build_ext` step:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 117, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
...
File "/usr/local/lib/python3.8/distutils/spawn.py", line 157, in _spawn_posix
raise DistutilsExecError(
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/src/python/grpcio/commands.py", line 264, in build_extensions
build_ext.build_ext.build_extensions(self)
...
File "/usr/local/lib/python3.8/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python3.8 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kzx34i5t/grpcio_1c63fd6dfd80475d95e24de0055b36c1/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-qz54i1sf/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/grpcio Check the logs for full command output.

The output shows two failures, one in grpcio and one in h5py:

...
ERROR: Failed building wheel for grpcio
Running setup.py clean for grpcio
Building wheel for h5py (setup.py) ... error
ERROR: Command errored out with exit status 1
...
error: libhdf5.so: cannot open shared object file: No such file or directory
...
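
Picking these lines out of several hundred lines of pip output by eye is error-prone. As a rough sketch, the failing package names can be pulled from a saved log with a regular expression. The sample text is copied from the output above; the function name and the idea of saving the log to a string are my own assumptions.

```python
import re

# Matches the summary line pip printed above for each wheel that failed.
FAILED_WHEEL = re.compile(r"ERROR: Failed building wheel for (\S+)")


def failed_packages(log_text):
    """Return package names from 'Failed building wheel' lines, without repeats."""
    seen = []
    for match in FAILED_WHEEL.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


SAMPLE_LOG = """\
ERROR: Failed building wheel for grpcio
Running setup.py clean for grpcio
Building wheel for h5py (setup.py) ... error
ERROR: Failed building wheel for h5py
Successfully built termcolor wrapt
Failed to build grpcio h5py
"""

if __name__ == "__main__":
    print(failed_packages(SAMPLE_LOG))  # prints ['grpcio', 'h5py']
```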

First, fix the grpcio failure. Since the automatic install does not work, download the source package from PyPI and install it manually:

wget -c https://files.pythonhosted.org/packages/0e/5f/eeb402746a65839acdec78b7e757635f5e446138cc1d68589dfa32cba593/grpcio-1.32.0.tar.gz
tar -zxvf grpcio-1.32.0.tar.gz
cd grpcio-1.32.0
python3 setup.py install

Installing grpcio returned an error:

[[email protected] ~]# cd grpcio-1.32.0
[[email protected] grpcio-1.32.0]# python3 setup.py install
Traceback (most recent call last):
File "setup.py", line 229, in <module>
if check_linker_need_libatomic():
File "setup.py", line 176, in check_linker_need_libatomic
cpp_test = subprocess.Popen([cxx, '-x', 'c++', '-std=c++11', '-'],
File "/usr/local/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'c++'

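The c++ executable is not on PATH. Prefix the command with a PATH that includes the gcc 5.5 installed earlier (alternatively, add /usr/local/gcc5.5/bin to PATH permanently) and run python3 setup.py install again: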

PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin python3 setup.py install

Console output:

Installed /usr/local/lib/python3.8/site-packages/grpcio-1.32.0-py3.8-linux-aarch64.egg
Processing dependencies for grpcio==1.32.0
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/site-packages
Finished processing dependencies for grpcio==1.32.0
[[email protected] grpcio-1.32.0]#
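
The FileNotFoundError for 'c++' above came from setup.py spawning a compiler that was not on PATH. A check along these lines could confirm which compilers a given PATH resolves before running setup.py. This is a sketch: find_tools is a hypothetical helper, and /usr/local/gcc5.5/bin is the gcc 5.5 prefix used in this post.

```python
import os
import shutil


def find_tools(names=("gcc", "c++"), path=None):
    """Map each tool name to the first match on the search path, or None."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Same idea as prefixing the command with PATH=/usr/local/gcc5.5/bin:...
    extended = "/usr/local/gcc5.5/bin" + os.pathsep + os.environ.get("PATH", "")
    print("current PATH :", find_tools())
    print("extended PATH:", find_tools(path=extended))
```

A None entry for c++ under the current PATH reproduces the condition that made grpcio's setup.py fail.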

Next, try installing h5py. Version 2.10.0 is used here because that is the version pip downloaded while installing TensorFlow:

wget -c https://files.pythonhosted.org/packages/5f/97/a58afbcf40e8abecededd9512978b4e4915374e5b80049af082f49cebe9a/h5py-2.10.0.tar.gz
tar -zxvf h5py-2.10.0.tar.gz
cd h5py-2.10.0
PATH=/usr/local/gcc5.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin python3 setup.py install

Console output:

...
copying h5py/tests/test_vds/test_virtual_source.py -> build/lib.linux-aarch64-3.8/h5py/tests/test_vds
running build_ext
Loading library to get version: libhdf5.so
error: libhdf5.so: cannot open shared object file: No such file or directory

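It looks like HDF5 has to be installed. hdf5-1.12.0.tar.gz can be downloaded from the official site. First, though, I put gcc 5.5 on PATH permanently, because prefixing every command with PATH gets tedious: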

echo 'export PATH=/usr/local/gcc5.5/bin:$PATH' >> /etc/profile

# This appends export PATH=/usr/local/gcc5.5/bin:$PATH to the end of /etc/profile

Console output:

[[email protected] download]# echo 'export PATH=/usr/local/gcc5.5/bin:$PATH' >> /etc/profile
[[email protected] download]# source /etc/profile
[[email protected] download]# gcc --version
gcc (GCC) 5.5.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[[email protected] download]#
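
One caveat with echo ... >> /etc/profile is that every repeat of the command appends the same line again. A sketch of an append that is safe to repeat is shown below. ensure_line is a hypothetical helper, and the demo writes to a scratch file rather than the real /etc/profile.

```python
def ensure_line(path, line):
    """Append line to the file at path unless an identical line already exists.

    Returns True if the file was modified, False otherwise.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            text = fh.read()
    except FileNotFoundError:
        text = ""
    if line in text.splitlines():
        return False
    with open(path, "a", encoding="utf-8") as fh:
        # Avoid gluing the new line onto an unterminated last line.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True


if __name__ == "__main__":
    import os
    import tempfile

    # Demonstrated on a scratch file, not the real /etc/profile.
    scratch = os.path.join(tempfile.mkdtemp(), "profile")
    line = "export PATH=/usr/local/gcc5.5/bin:$PATH"
    print(ensure_line(scratch, line))  # first call appends the line
    print(ensure_line(scratch, line))  # second call leaves the file unchanged
```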

Continue with the HDF5 install:

wget -c https://hdf-wordpress-1.s3.amazonaws.com/wp-content/uploads/manual/HDF5/HDF5_1_12_0/source/hdf5-1.12.0.tar.gz # it is best to confirm this URL on the official site first
tar -zxvf hdf5-1.12.0.tar.gz
cd hdf5-1.12.0
./configure --prefix=/usr/local/hdf5
make -j
make install

Go back to the h5py directory:

LD_LIBRARY_PATH=/usr/local/hdf5/lib C_INCLUDE_PATH=/usr/local/hdf5/include python3 setup.py install

It fails with an error...

In file included from /usr/local/hdf5/include/H5public.h:32:0,
from /usr/local/hdf5/include/hdf5.h:22,
from /root/h5py-2.10.0/h5py/api_compat.h:27,
from /root/h5py-2.10.0/h5py/defs.c:654:
/root/h5py-2.10.0/h5py/defs.c: In function '__pyx_f_4h5py_4defs_H5Sencode':
*** /root/h5py-2.10.0/h5py/defs.c:34523:15: error: too few arguments to function 'H5Sencode2' ***
__pyx_t_1 = H5Sencode(__pyx_v_obj_id, __pyx_v_buf, __pyx_v_nalloc); if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 3303, __pyx_L1_error)

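According to the answers to the question "Why is HDF5 giving a 'too few arguments' error here?", the HDF5 version seems to be too new. Drop down one release series and try hdf5-1.10.7.tar.gz from the website: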

https_proxy=http://127.0.0.1:7890/ wget -c https://hdf-wordpress-1.s3.amazonaws.com/wp-content/uploads/manual/HDF5/HDF5_1_10_7/src/hdf5-1.10.7.tar.gz
tar -zxvf hdf5-1.10.7.tar.gz
cd hdf5-1.10.7
./configure --prefix=/usr/local/hdf5-1.10
make -j
make install
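
The "too few arguments to function 'H5Sencode2'" error earlier fits this explanation: with HDF5 1.12 headers H5Sencode resolves to H5Sencode2, which expects one more argument than the three h5py 2.10.0 passes. A version gate like the sketch below could flag the mismatch before building. The 1.12.0 cutoff is my assumption based on that error, not a limit taken from h5py's documentation, and both function names are hypothetical.

```python
def parse_version(text):
    """Turn '1.10.7' into (1, 10, 7) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Assumption for illustration: h5py 2.10.0 builds against HDF5 releases
# older than 1.12.0, where the default H5Sencode signature changed.
FIRST_UNSUPPORTED = (1, 12, 0)


def builds_with_h5py_2_10(hdf5_version):
    """Return True if the HDF5 version is below the assumed cutoff."""
    return parse_version(hdf5_version) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    for version in ("1.12.0", "1.10.7"):
        print(version, builds_with_h5py_2_10(version))
```

Comparing tuples of integers rather than raw strings matters here, because as strings "1.10.7" sorts before "1.8.0" even though it is the newer release.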

Go back to the h5py directory and install h5py again:

LD_LIBRARY_PATH=/usr/local/hdf5-1.10/lib C_INCLUDE_PATH=/usr/local/hdf5-1.10/include python3 setup.py install

Console error:

[[email protected] h5py-2.10.0]# python3 setup.py install
running install
running bdist_egg
...
running build_ext
Loading library to get version: libhdf5.so
Autodetected HDF5 1.10.7
********************************************************************************
Summary of the h5py configuration
Path to HDF5: None
HDF5 Version: '1.10.7'
MPI Enabled: False
Rebuild Required: True
********************************************************************************
Executing cythonize()
[ 1/23] Cythonizing /root/h5py-2.10.0/h5py/_conv.pyx
...
[23/23] Cythonizing /root/h5py-2.10.0/h5py/utils.pyx
building 'h5py.defs' extension
...
npy_1_7_deprecated_api.h:17:2:
#
^
/root/h5py-2.10.0/h5py/defs.c: In function '__pyx_f_4h5py_4defs_H5Pget_driver_info':
/root/h5py-2.10.0/h5py/defs.c:21763:13:
__pyx_t_1 = H5Pget_driver_info(__pyx_v_plist_id); if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2016, __pyx_L1_error)
^
In file included from /usr/local/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:21:0,
from /usr/local/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /root/h5py-2.10.0/h5py/api_compat.h:26,
from /root/h5py-2.10.0/h5py/defs.c:654:
/root/h5py-2.10.0/h5py/defs.c: At top level:
/usr/local/lib/python3.8/site-packages/numpy/core/include/numpy/__multiarray_api.h:1463:1:
_import_array(void)
^
gcc -pthread -shared build/temp.linux-aarch64-3.8/root/h5py-2.10.0/h5py/defs.o -L/opt/local/lib -L/usr/local/lib -Wl,--enable-new-dtags,-R/opt/local/lib -Wl,--enable-new-dtags,-R/usr/local/lib -lhdf5 -lhdf5_hl -o build/lib.linux-aarch64-3.8/h5py/defs.cpython-38-aarch64-linux-gnu.so
/usr/bin/ld: cannot find -lhdf5
/usr/bin/ld: cannot find -lhdf5_hl
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

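In short, the linker cannot find hdf5 and hdf5_hl. LD_LIBRARY_PATH does not help at this step, because it is consulted when shared libraries are loaded at run time, not when gcc resolves -l options at link time. The gcc command above only passes /opt/local/lib and /usr/local/lib as -L directories, so symlinking the two .so files into a directory it does search should be enough: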

ln -s /usr/local/hdf5-1.10/lib/libhdf5.so /usr/local/lib/
ln -s /usr/local/hdf5-1.10/lib/libhdf5_hl.so /usr/local/lib/
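
To see why the symlinks help, here is a simplified model of how -lhdf5 is resolved against the -L directories from the failing gcc command. It is a sketch only: find_link_library is a hypothetical helper, and the real linker also searches its built-in system directories.

```python
import os


def find_link_library(name, search_dirs):
    """Return the first lib<name>.so or lib<name>.a in search_dirs, else None.

    Simplified model of -l<name> lookup; os.path.exists follows symlinks,
    so a symlink to an existing library counts as found.
    """
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.exists(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # The two -L directories from the failing gcc command above.
    link_dirs = ["/opt/local/lib", "/usr/local/lib"]
    for lib in ("hdf5", "hdf5_hl"):
        print(lib, "->", find_link_library(lib, link_dirs))
```

Before the ln -s commands both lookups come back empty on this machine, which matches the "cannot find -lhdf5" messages. As far as I know, h5py's build can also be pointed at a custom prefix through the HDF5_DIR environment variable, but that route was not tried here.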

Run it once more:

python3 setup.py install

Console output:

...
Using /usr/local/lib/python3.8/site-packages
Finished processing dependencies for h5py==2.10.0
[[email protected] h5py-2.10.0]#

That worked. Now reinstall TensorFlow:

pip3 install /tmp/tensorflow_pkg/tensorflow-2.4.1-cp38-cp38-linux_aarch64.whl

Console output:

Installing collected packages: grpcio, google-auth-oauthlib, absl-py, wrapt, typing-extensions, termcolor, tensorflow-estimator, tensorboard, opt-einsum, google-pasta, gast, flatbuffers, astunparse, tensorflow
Attempting uninstall: grpcio
Found existing installation: grpcio 1.35.0
Uninstalling grpcio-1.35.0:
Successfully uninstalled grpcio-1.35.0
Successfully installed absl-py-0.11.0 astunparse-1.6.3 flatbuffers-1.12 gast-0.3.3 google-auth-oauthlib-0.4.2 google-pasta-0.2.0 grpcio-1.32.0 opt-einsum-3.3.0 tensorboard-2.4.1 tensorflow-2.4.1 tensorflow-estimator-2.4.0 termcolor-1.1.0 typing-extensions-3.7.4.3 wrapt-1.12.1
[[email protected] h5py-2.10.0]#

That completes the installation...
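
As a final sanity check, the installed versions can be listed without importing the packages themselves. The sketch below uses importlib.metadata, which ships with Python 3.8 and later. The package list is just the set of packages that caused trouble in this post, and installed_version is a hypothetical helper.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("tensorflow", "grpcio", "h5py", "numpy"):
        print(package, installed_version(package))
```

A real smoke test would still be to run import tensorflow as tf in python3 and print tf.__version__, since the metadata only shows that the files were installed, not that the native libraries load.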