cpp-httplib/test/run_issue_2431_repro.sh
yhirose d14e4fc05f Reproducer test for #2431 (getaddrinfo_a use-after-free) (#2433)
* Add reproducer for #2431 (getaddrinfo_a use-after-free)

On Linux/glibc, getaddrinfo_with_timeout() runs DNS asynchronously via
getaddrinfo_a(GAI_NOWAIT) using a stack-local gaicb. When gai_suspend()
hits the connection timeout, gai_cancel() is called and the function
returns immediately — but gai_cancel() is non-blocking and can return
EAI_NOTCANCELED, leaving the resolver worker thread alive and still
referencing the destroyed stack frame.
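
The hazard can be modelled outside glibc. The sketch below is an analogy in Python, not cpp-httplib code: `concurrent.futures.Future.cancel()` returns False for a task that is already running, much as gai_cancel() can return EAI_NOTCANCELED, and the worker then writes into the caller's record after the caller has returned. The names `slow_lookup` and `lookup_with_timeout` are hypothetical. In Python the dict stays alive because the worker still holds a reference; in the C++ code the equivalent storage is a stack frame that no longer exists.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_lookup(record, delay):
    """Stand-in for the resolver worker: finishes late, then writes back."""
    time.sleep(delay)
    record["addr"] = "192.0.2.1"  # late write-back into caller-provided storage


def lookup_with_timeout(timeout, delay):
    """Return (timed_out, cancelled, record) for one simulated lookup."""
    record = {}  # plays the role of the stack-local gaicb
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_lookup, record, delay)
    try:
        future.result(timeout=timeout)
        timed_out, cancelled = False, False
    except FutureTimeout:
        timed_out = True
        # cancel() only succeeds for tasks that have not started yet.
        cancelled = future.cancel()
    pool.shutdown(wait=False)  # return without waiting, like the buggy path
    return timed_out, cancelled, record
```

Calling `lookup_with_timeout(timeout=0.05, delay=0.4)` returns with `cancelled` set to False and an empty record; the record is filled in roughly 0.4s later, after the caller has moved on.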

Adds three opt-in gtest cases (GetAddrInfoAsyncCancelTest.*) that
exercise the cancel path repeatedly. They are gated on Linux/glibc +
CPPHTTPLIB_USE_NON_BLOCKING_GETADDRINFO at compile time, and on the
CPPHTTPLIB_TEST_ISSUE_2431=1 env var at runtime, so a normal `make
test` run is unaffected.

Also adds a dedicated CI job (issue-2431-repro) and a Docker-based
local runner (test/run_issue_2431_repro.sh) that sinkhole UDP/53 so
the timeout branch is taken, and run the test under ASAN/LSAN. With
the bug present these runs are expected to fail; with a fix applied
they should pass.

Refs: https://github.com/yhirose/cpp-httplib/issues/2431

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix split build for #2431 reproducer tests

The new GetAddrInfoAsyncCancelTest cases call detail::getaddrinfo_with_timeout
directly. In split builds (make test_split) split.py moves the definition into
httplib.cc and strips `inline`, so the symbol is not declared in the public
httplib.h and test.cc fails to compile -- breaking the ubuntu/test-no-exceptions
CI jobs that the PR description says should be unaffected.

Add a forward declaration in test.cc, gated by the same #if as the tests
themselves, so it links against the split-build symbol without changing the
header-only build.

* Cap issue-2431 repro job at 5 minutes

The bug manifests as orphan getaddrinfo_a resolver workers that keep the
runner from completing job teardown -- the previous run had all steps
succeed in ~1m37s but then hung in "Cleaning up orphan processes" for
~57m before GitHub force-killed the job.

A job-level timeout-minutes makes the failure signal fast and predictable:
bug present -> killed at 5 min, bug fixed -> ~2 min pass. Step-level timeout
isn't enough since the hang is in post-job cleanup, not the test step.

* Enable ASAN detect_stack_use_after_return for #2431 repro

The bug is a textbook stack-use-after-return: a stack-local struct gaicb
is destroyed when getaddrinfo_with_timeout returns after gai_cancel()
yields EAI_NOTCANCELED, then the still-live resolver worker thread writes
back into the freed frame. ASAN's detect_stack_use_after_return is the
direct detector for exactly this pattern -- enabling it lets the failure
surface as a clear ASAN diagnostic during the test run instead of as an
orphan-process hang at job teardown.

* Revert ASAN detect_stack_use_after_return for #2431 repro

The option did not detect the bug in CI -- the resolver worker write
likely lands on the heap (via the gaicb's pai pointer) or happens after
the test process exits, neither of which stack-use-after-return can
catch. Roll back to relying on the job-level timeout: bug present ->
post-cleanup hangs ~8min then job-level timeout cancels at 10min total;
bug fixed -> job completes in ~2min.

* Switch issue-2431 repro to a delayed loopback DNS test fixture

The previous repro setup dropped UDP/53 outright, which made glibc's
resolver hang forever on every lookup -- the worker never actually
received a response and so never reached the buggy write-back path
that #2431 is about. As a result, neither the broken HEAD nor the
fix made any visible difference in CI: both produced "tests pass +
post-cleanup hangs ~10min" because the orphan resolver thread is a
structural property of *any* getaddrinfo path on a hung resolver,
not a property of the bug.

Replace the sinkhole with a small loopback test fixture
(test/dns_test_fixture.py, ~50 lines, stdlib only) that answers DNS
queries after a 3s delay -- longer than the test's 1s timeout. An
iptables NAT rule routes the test job's lookups to the fixture
without touching /etc/resolv.conf, so the rest of the runner's DNS
behaviour is unaffected.
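
test/dns_test_fixture.py itself is not shown here. As an illustration of the idea only, a delayed UDP responder could look roughly like the sketch below. It is a stand-in built on assumptions, not the actual fixture: the function names, the NXDOMAIN reply, and the 0.0.0.0 bind address are all hypothetical choices.

```python
import socket
import struct
import threading
import time


def build_nxdomain(query: bytes) -> bytes:
    """Build an NXDOMAIN reply that echoes the query ID and first question."""
    if len(query) < 12:
        raise ValueError("short DNS packet")
    txid, flags = struct.unpack("!HH", query[:4])
    # Walk the QNAME labels to find where the first question ends.
    pos = 12
    while query[pos] != 0:
        pos += 1 + query[pos]
    pos += 1 + 4  # root label, then QTYPE and QCLASS
    question = query[12:pos]
    # QR=1 (response), RA=1, RD copied from the query, RCODE=3 (NXDOMAIN).
    reply_flags = 0x8000 | 0x0080 | (flags & 0x0100) | 0x0003
    header = struct.pack("!HHHHHH", txid, reply_flags, 1, 0, 0, 0)
    return header + question


def serve(sock: socket.socket, delay: float, stop: threading.Event) -> None:
    """Answer each UDP query after `delay` seconds without blocking the loop."""
    sock.settimeout(0.2)  # wake up regularly to notice `stop`
    while not stop.is_set():
        try:
            query, peer = sock.recvfrom(4096)
        except socket.timeout:
            continue
        except OSError:
            break
        try:
            reply = build_nxdomain(query)
        except (ValueError, IndexError):
            continue  # ignore malformed packets
        timer = threading.Timer(delay, sock.sendto, args=(reply, peer))
        timer.daemon = True
        timer.start()


def run(port: int, delay: float) -> None:
    """Hypothetical entry point: bind a UDP socket and serve until killed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    serve(sock, delay, threading.Event())
```

With a 3s delay and a 1s client timeout, the reply arrives after the caller has already given up, which is the window the reproducer needs. Sending each reply from a timer keeps one slow answer from delaying the receipt of the next query.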

With ASAN's detect_stack_use_after_return enabled, the worker's
late write-back into the destroyed gaicb stack frame is now caught
as a stack-use-after-return diagnostic, so the broken HEAD fails
fast at the test step (clear red) and the fix turns the same job
green in well under a minute.

The same fixture is wired into both the GitHub Actions job and the
Docker-based test/run_issue_2431_repro.sh script, so a local repro (for
example on macOS, via Docker) and the CI repro on Linux exercise the
identical path.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:17:19 +09:00

#!/usr/bin/env bash
# Reproducer runner for Issue #2431
# (https://github.com/yhirose/cpp-httplib/issues/2431).
#
# Spins up an Ubuntu container, runs the loopback DNS test fixture
# (test/dns_test_fixture.py), routes the container's DNS lookups to
# that fixture via an iptables NAT rule, builds the test suite with
# g++ + ASAN, and runs the GetAddrInfoAsyncCancelTest cases.
#
# Expected outcomes:
#   - HEAD prior to the fix: ASAN reports stack-use-after-return inside
#     getaddrinfo_with_timeout's getaddrinfo_a path during one of the
#     GetAddrInfoAsyncCancelTest cases.
#   - HEAD with the fix applied: all three cases PASS.
#
# Usage:
#   bash test/run_issue_2431_repro.sh
#
# Requirements: Docker (Linux container support). The container needs
# --privileged because the script runs the test binary under `setarch -R`
# to disable ASLR for ASAN compatibility, and because it manages iptables
# rules inside the container.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
docker run --rm --privileged \
  -v "$REPO_ROOT:/work" \
  -w /work/test \
  ubuntu:24.04 bash -c '
set -euo pipefail
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y -qq --no-install-recommends \
  ca-certificates g++ make pkg-config iptables iproute2 util-linux coreutils file \
  python3 \
  libssl-dev zlib1g-dev libbrotli-dev libzstd-dev libcurl4-openssl-dev \
  >/dev/null
# Force DNS-only resolution: Ubuntu defaults nsswitch.conf to
# "hosts: files mdns4_minimal [NOTFOUND=return] dns ...", which
# short-circuits to NOTFOUND before reaching glibc DNS code, so the
# gai_cancel() branch never gets exercised.
sed -i "s/^hosts:.*/hosts: dns/" /etc/nsswitch.conf
# Start the loopback DNS test fixture (delayed UDP responder).
DNS_FIXTURE_PORT=15353
DNS_FIXTURE_DELAY=3
python3 /work/test/dns_test_fixture.py "$DNS_FIXTURE_PORT" "$DNS_FIXTURE_DELAY" \
  >/tmp/dns_fixture.log 2>&1 &
FIXTURE_PID=$!
# Route the container DNS lookups to the fixture; conntrack handles the
# reply path automatically. /etc/resolv.conf is left untouched.
iptables -t nat -I OUTPUT -p udp --dport 53 \
  -j REDIRECT --to-port "$DNS_FIXTURE_PORT"
trap '"'"'iptables -t nat -F OUTPUT 2>/dev/null || true; kill "$FIXTURE_PID" 2>/dev/null || true'"'"' EXIT
# Wait for the fixture to start listening.
for _ in $(seq 1 50); do
  if ss -lun "( sport = :$DNS_FIXTURE_PORT )" | grep -q ":$DNS_FIXTURE_PORT"; then
    break
  fi
  sleep 0.1
done
ss -lun "( sport = :$DNS_FIXTURE_PORT )" | grep -q ":$DNS_FIXTURE_PORT" || {
  echo "ERROR: dns_test_fixture failed to start" >&2
  cat /tmp/dns_fixture.log >&2 || true
  exit 1
}
# Sanity check: a DNS lookup should take about as long as the fixture
# delay, which shows the NAT rule routes the query to the fixture. The
# 2s threshold below sits under the 3s DNS_FIXTURE_DELAY.
start=$(date +%s)
getent hosts unresolvable-host.invalid >/dev/null 2>&1 || true
elapsed=$(( $(date +%s) - start ))
if [ "$elapsed" -lt 2 ]; then
  echo "ERROR: lookup returned in ${elapsed}s; fixture not in DNS path" >&2
  exit 1
fi
echo "[ok] DNS lookups are routed to the test fixture (took ${elapsed}s)"
cd /work/test
echo "=== building test binary (g++ + ASAN) ==="
make CXX=g++ test 2>&1 | tail -5
ARCH=$(uname -m)
echo "=== running GetAddrInfoAsyncCancelTest with CPPHTTPLIB_TEST_ISSUE_2431=1 ==="
set +e
CPPHTTPLIB_TEST_ISSUE_2431=1 \
ASAN_OPTIONS=detect_stack_use_after_return=1 \
setarch "$ARCH" -R \
  ./test --gtest_filter="GetAddrInfoAsyncCancelTest.*" 2>&1
rc=$?
set -e
echo "=== test exit: $rc ==="
exit $rc
'