Building x265 with NEON Support on macOS with Brew
This issue has now been fixed, but the fix has not made it into a release yet. All you have to do now is install from HEAD: brew install x265 --HEAD
Original Post #
With Brew, the current release of x265 doesn’t support NEON, so encodes on Apple Silicon are much slower than expected. If you try to build from HEAD, you’ll run into this issue. Someone has posted a patch to fix this, but it hasn’t even made it into a PR yet, so we have to do some manual work to get it installed.
Modifying the Brew Formula #
Luckily for us, Brew has a system in place for patching formulas. To start editing the formula, run brew edit x265. Here’s the patch which details what we’ll be changing:
--- a/x265.rb 2023-02-28 16:35:34
+++ b/x265.rb 2023-02-28 15:34:27
@@ -25,6 +25,8 @@
depends_on "nasm" => :build
end
+ patch :DATA
+
def install
# Build based off the script at ./build/linux/multilib.sh
args = std_cmake_args + %W[
@@ -79,3 +81,34 @@
assert_equal header.unpack("m"), [x265_path.read(10)]
end
end
+
+__END__
+diff --git a/source/common/aarch64/asm.S b/source/common/aarch64/asm.S
+index 399c37cf2..b81fb254e 100644
+--- a/source/common/aarch64/asm.S
++++ b/source/common/aarch64/asm.S
+@@ -28,6 +28,11 @@
+ #define PFX2(prefix, name) PFX3(prefix, name)
+ #define PFX(name) PFX2(X265_NS, name)
+
++
++#define UPFX3(prefix, name) _ ## prefix ## _ ## name
++#define UPFX2(prefix, name) UPFX3(prefix, name)
++#define UPFX(name) UPFX2(X265_NS, name)
++
+ #ifdef __APPLE__
+ #define PREFIX 1
+ #endif
+diff --git a/source/common/aarch64/pixel-util.S b/source/common/aarch64/pixel-util.S
+index fba9a90d5..49c6b0492 100644
+--- a/source/common/aarch64/pixel-util.S
++++ b/source/common/aarch64/pixel-util.S
+@@ -2407,7 +2407,7 @@ function PFX(costCoeffNxN_neon)
+ // x5 - scanFlagMask
+ // x6 - baseCtx
+ mov x0, #0
+- movrel x1, x265_entropyStateBits
++ movrel x1, UPFX(entropyStateBits)
+ mov x4, #0
+ mov x11, #0
+ movi v31.16b, #0
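The assembly half of this patch is easier to follow with the macro expanded by hand. As a rough reading of the diff (not something spelled out in the issue): macOS prefixes C symbols with an underscore, so a hard-coded x265_entropyStateBits in the assembly does not name the symbol the C++ side exports, while UPFX builds the name from a leading underscore plus the X265_NS namespace. The Ruby sketch below only mimics that token pasting; the namespace values are assumptions for illustration, since X265_NS is chosen by the build.

```ruby
# Mimics UPFX3 from the patch: _ ## prefix ## _ ## name
# (the C preprocessor pastes the pieces into a single identifier).
def upfx(namespace, name)
  "_#{namespace}_#{name}"
end

# Hypothetical namespace values; the real one comes from X265_NS at build time.
%w[x265 x265_10bit x265_12bit].each do |namespace|
  puts upfx(namespace, "entropyStateBits")
end
```

The point is simply that the symbol name now follows whatever namespace the build uses instead of being fixed in the source.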
First, we add patch :DATA just before the install method. This tells Brew to apply the patch that is embedded in the formula file after the __END__ marker. We then add __END__ at the very end of the file and paste the patch from the issue below it. For reference, here is my entire formula, since that wasn’t the best explanation:
class X265 < Formula
  desc "H.265/HEVC encoder"
  homepage "https://bitbucket.org/multicoreware/x265_git"
  url "https://bitbucket.org/multicoreware/x265_git/get/3.5.tar.gz"
  sha256 "5ca3403c08de4716719575ec56c686b1eb55b078c0fe50a064dcf1ac20af1618"
  license "GPL-2.0-only"
  head "https://bitbucket.org/multicoreware/x265_git.git", branch: "master"

  bottle do
    rebuild 1
    sha256 cellar: :any, arm64_ventura: "fc0bf01af954762a85e8b808d5b03d28b9e36e8e71035783e39bb9dc0307abea"
    sha256 cellar: :any, arm64_monterey: "e60559191a9aba607e512ad33ac9f66688b12837df7e6a3cf57ceae26968235b"
    sha256 cellar: :any, arm64_big_sur: "adc617eed2e065af669994fb5b538195fd46db4ac7b13c7ca2490dc8abaf6466"
    sha256 cellar: :any, ventura: "42bac1c3760905fc0f6c8ee2af2b97c5ef371d6135f6822357afe91f4014a2dd"
    sha256 cellar: :any, monterey: "be446f5c7cb4872205f260b8821fc7ebd5bd7c4b8837888c98c08e051dff2e3f"
    sha256 cellar: :any, big_sur: "55bb46a5dc1924e59b7fa7bc800a21c0cf21355e48cb38b941d8e786427c70a0"
    sha256 cellar: :any, catalina: "5e5bc106e1cf971a176dd5b37a61d28769e353f81102c011b4230cc8732eca7a"
    sha256 cellar: :any, mojave: "c61ebdf9dcd4aedf5da2a7eb2b3a5154fd355c105a19a0471d43a3aa67f3cb88"
    sha256 cellar: :any_skip_relocation, x86_64_linux: "c80f18988caea25e95ca87dd648f5ff8b0856e24d26adc8d68ca68cc6d4faabf"
  end

  depends_on "cmake" => :build

  on_intel do
    depends_on "nasm" => :build
  end

  patch :DATA

  def install
    # Build based off the script at ./build/linux/multilib.sh
    args = std_cmake_args + %W[
      -DLINKED_10BIT=ON
      -DLINKED_12BIT=ON
      -DEXTRA_LINK_FLAGS=-L.
      -DEXTRA_LIB=x265_main10.a;x265_main12.a
      -DCMAKE_INSTALL_RPATH=#{rpath}
    ]
    high_bit_depth_args = std_cmake_args + %w[
      -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF
      -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
    ]

    (buildpath/"8bit").mkpath

    mkdir "10bit" do
      system "cmake", buildpath/"source", "-DENABLE_HDR10_PLUS=ON", *high_bit_depth_args
      system "make"
      mv "libx265.a", buildpath/"8bit/libx265_main10.a"
    end

    mkdir "12bit" do
      system "cmake", buildpath/"source", "-DMAIN12=ON", *high_bit_depth_args
      system "make"
      mv "libx265.a", buildpath/"8bit/libx265_main12.a"
    end

    cd "8bit" do
      system "cmake", buildpath/"source", *args
      system "make"
      mv "libx265.a", "libx265_main.a"

      if OS.mac?
        system "libtool", "-static", "-o", "libx265.a", "libx265_main.a",
               "libx265_main10.a", "libx265_main12.a"
      else
        system "ar", "cr", "libx265.a", "libx265_main.a", "libx265_main10.a",
               "libx265_main12.a"
        system "ranlib", "libx265.a"
      end

      system "make", "install"
    end
  end

  test do
    yuv_path = testpath/"raw.yuv"
    x265_path = testpath/"x265.265"
    yuv_path.binwrite "\xCO\xFF\xEE" * 3200
    system bin/"x265", "--input-res", "80x80", "--fps", "1", yuv_path, x265_path
    header = "AAAAAUABDAH//w=="
    assert_equal header.unpack("m"), [x265_path.read(10)]
  end
end
__END__
diff --git a/source/common/aarch64/asm.S b/source/common/aarch64/asm.S
index 399c37cf2..b81fb254e 100644
--- a/source/common/aarch64/asm.S
+++ b/source/common/aarch64/asm.S
@@ -28,6 +28,11 @@
#define PFX2(prefix, name) PFX3(prefix, name)
#define PFX(name) PFX2(X265_NS, name)
+
+#define UPFX3(prefix, name) _ ## prefix ## _ ## name
+#define UPFX2(prefix, name) UPFX3(prefix, name)
+#define UPFX(name) UPFX2(X265_NS, name)
+
#ifdef __APPLE__
#define PREFIX 1
#endif
diff --git a/source/common/aarch64/pixel-util.S b/source/common/aarch64/pixel-util.S
index fba9a90d5..49c6b0492 100644
--- a/source/common/aarch64/pixel-util.S
+++ b/source/common/aarch64/pixel-util.S
@@ -2407,7 +2407,7 @@ function PFX(costCoeffNxN_neon)
// x5 - scanFlagMask
// x6 - baseCtx
mov x0, #0
- movrel x1, x265_entropyStateBits
+ movrel x1, UPFX(entropyStateBits)
mov x4, #0
mov x11, #0
movi v31.16b, #0
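To make the patch :DATA idea a bit more concrete: in Ruby, everything after a line containing only __END__ is ignored by the interpreter and treated as data, and Brew reads that trailing section of the formula file as the patch. The sketch below is not Brew's actual implementation; it is a minimal illustration that splits a trimmed, hypothetical formula source at the marker. The FORMULA_SOURCE text and the split_at_end_marker helper are made up for the example.

```ruby
# Hypothetical, heavily trimmed formula source; only the layout matters here.
FORMULA_SOURCE = <<~RUBY
  class X265 < Formula
    patch :DATA

    def install
      # build steps elided
    end
  end

  __END__
  diff --git a/source/common/aarch64/asm.S b/source/common/aarch64/asm.S
  --- a/source/common/aarch64/asm.S
  +++ b/source/common/aarch64/asm.S
RUBY

# Split a script's text into the Ruby code and whatever follows the first
# line that consists solely of __END__. Returns [source, nil] if there is
# no marker.
def split_at_end_marker(source)
  code, marker, data = source.partition(/^__END__\n/)
  marker.empty? ? [source, nil] : [code, data]
end

code, embedded_patch = split_at_end_marker(FORMULA_SOURCE)
puts "Ruby part: #{code.lines.count} lines"
puts "Embedded patch starts with: #{embedded_patch.lines.first}"
```

This is also why the patch has to be the last thing in the file: nothing after __END__ is parsed as Ruby.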
Building The Brew Formula #
Now that we’ve edited the formula, we can build it. First, unlink your current installation of x265 with brew unlink x265. Then, run HOMEBREW_NO_INSTALL_FROM_API=1 brew install x265 -vd --HEAD. The pieces of that command do the following:
HOMEBREW_NO_INSTALL_FROM_API=1 ensures that Brew doesn’t fetch the formula, and instead uses our edited local copy.
-vd makes Brew show verbose and debug output while building, which can be helpful when it breaks.
--HEAD makes Brew install from master instead of the (very old) release.
If everything goes to plan, x265 should build, along with a few other things such as ffmpeg, whose existing linking against x265 is no longer valid.
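If you would rather script the two steps than type them, here is a small Ruby sketch of the same sequence. It only prints the commands by default; the RUN_BREW environment variable is a made-up switch for this example (not a Brew setting) that makes it actually run them.

```ruby
require "shellwords"

# Environment and arguments taken from the steps described above.
BREW_ENV = { "HOMEBREW_NO_INSTALL_FROM_API" => "1" }.freeze
STEPS = [
  [{}, %w[brew unlink x265]],
  [BREW_ENV, %w[brew install x265 -vd --HEAD]],
].freeze

# Render a step the way you would type it into a shell.
def render(env, argv)
  assignments = env.map { |key, value| "#{key}=#{Shellwords.escape(value)}" }
  (assignments + [Shellwords.join(argv)]).join(" ")
end

STEPS.each do |env, argv|
  puts render(env, argv)
  # Only touch the real Brew installation when explicitly asked to
  # (RUN_BREW is a hypothetical switch for this sketch).
  next unless ENV["RUN_BREW"] == "1"
  system(env, *argv) || abort("step failed: #{render(env, argv)}")
end
```

Passing the environment as a hash to system keeps HOMEBREW_NO_INSTALL_FROM_API scoped to the install step instead of exporting it for the whole session.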
Results #
So, is doing this worth it? Yes! I observed a roughly 2x increase in H.265 encode speed after doing this. When encoding a 1080p Dolby Vision ProRes source, I got 7.01 fps without NEON and 14.80 fps with it. The specific command I used was:
ffmpeg -i 1080p.MOV -c:v libx265 -c:a copy -x265-params "repeat-headers=1:hrd=cbr:colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1000,400" -tag:v hvc1 neon-1080p.mp4
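For the curious, the speedup works out as follows from the two frame rates quoted above. This is just the arithmetic on those two numbers, not an additional benchmark.

```ruby
# Frame rates reported above for the same 1080p encode.
fps_without_neon = 7.01
fps_with_neon = 14.80

# Speedup is the ratio of the two frame rates.
speedup = fps_with_neon / fps_without_neon

# For a fixed number of frames, encode time scales with 1 / fps.
time_saved = 1.0 - fps_without_neon / fps_with_neon

puts format("Speedup: %.2fx", speedup)                       # Speedup: 2.11x
puts format("Encode time reduced by %.1f%%", time_saved * 100) # Encode time reduced by 52.6%
```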