mirror of
https://github.com/yhirose/cpp-httplib.git
synced 2026-06-10 16:47:14 +00:00
* Add reproducer for #2431 (getaddrinfo_a use-after-free)

  On Linux/glibc, getaddrinfo_with_timeout() runs DNS asynchronously via getaddrinfo_a(GAI_NOWAIT) using a stack-local gaicb. When gai_suspend() hits the connection timeout, gai_cancel() is called and the function returns immediately, but gai_cancel() is non-blocking and can return EAI_NOTCANCELED, leaving the resolver worker thread alive and still referencing the destroyed stack frame.

  Adds three opt-in gtest cases (GetAddrInfoAsyncCancelTest.*) that exercise the cancel path repeatedly. They are gated on Linux/glibc + CPPHTTPLIB_USE_NON_BLOCKING_GETADDRINFO at compile time, and on the CPPHTTPLIB_TEST_ISSUE_2431=1 env var at runtime, so a normal `make test` run is unaffected.

  Also adds a dedicated CI job (issue-2431-repro) and a Docker-based local runner (test/run_issue_2431_repro.sh) that sinkhole UDP/53 so the timeout branch is taken, and run the test under ASAN/LSAN. With the bug present these runs are expected to fail; with a fix applied they should pass.

  Refs: https://github.com/yhirose/cpp-httplib/issues/2431

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix split build for #2431 reproducer tests

  The new GetAddrInfoAsyncCancelTest cases call detail::getaddrinfo_with_timeout directly. In split builds (make test_split) split.py moves the definition into httplib.cc and strips `inline`, so the symbol is not declared in the public httplib.h and test.cc fails to compile -- breaking the ubuntu/test-no-exceptions CI jobs that the PR description says should be unaffected.

  Add a forward declaration in test.cc, gated by the same #if as the tests themselves, so it links against the split-build symbol without changing the header-only build.

* Cap issue-2431 repro job at 5 minutes

  The bug manifests as orphan getaddrinfo_a resolver workers that keep the runner from completing job teardown -- the previous run had all steps succeed in ~1m37s but then hung in "Cleaning up orphan processes" for ~57m before GitHub force-killed the job.

  A job-level timeout-minutes makes the failure signal fast and predictable: bug present -> killed at 5 min, bug fixed -> ~2 min pass. Step-level timeout isn't enough since the hang is in post-job cleanup, not the test step.

* Enable ASAN detect_stack_use_after_return for #2431 repro

  The bug is a textbook stack-use-after-return: a stack-local struct gaicb is destroyed when getaddrinfo_with_timeout returns after gai_cancel() yields EAI_NOTCANCELED, then the still-live resolver worker thread writes back into the freed frame.

  ASAN's detect_stack_use_after_return is the direct detector for exactly this pattern -- enabling it lets the failure surface as a clear ASAN diagnostic during the test run instead of as an orphan-process hang at job teardown.

* Revert ASAN detect_stack_use_after_return for #2431 repro

  The option did not detect the bug in CI -- the resolver worker write likely lands on the heap (via the gaicb's pai pointer) or happens after the test process exits, neither of which stack-use-after-return can catch.

  Roll back to relying on the job-level timeout: bug present -> post-cleanup hangs ~8min then job-level timeout cancels at 10min total; bug fixed -> job completes in ~2min.

* Switch issue-2431 repro to a delayed loopback DNS test fixture

  The previous repro setup dropped UDP/53 outright, which made glibc's resolver hang forever on every lookup -- the worker never actually received a response and so never reached the buggy write-back path that #2431 is about. As a result, neither the broken HEAD nor the fix made any visible difference in CI: both produced "tests pass + post-cleanup hangs ~10min" because the orphan resolver thread is a structural property of *any* getaddrinfo path on a hung resolver, not a property of the bug.

  Replace the sinkhole with a small loopback test fixture (test/dns_test_fixture.py, ~50 lines, stdlib only) that answers DNS queries after a 3s delay -- longer than the test's 1s timeout. An iptables NAT rule routes the test job's lookups to the fixture without touching /etc/resolv.conf, so the rest of the runner's DNS behaviour is unaffected.

  With ASAN's detect_stack_use_after_return enabled, the worker's late write-back into the destroyed gaicb stack frame is now caught as a stack-use-after-return diagnostic, so the broken HEAD fails fast at the test step (clear red) and the fix turns the same job green in well under a minute.

  Same fixture is wired into both the GitHub Actions job and the docker-based test/run_issue_2431_repro.sh script, so local repro on macOS and CI repro on Linux exercise the identical path.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
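
For illustration only, a delayed loopback responder of the kind described in the last entry might look roughly like the sketch below. It is not the contents of test/dns_test_fixture.py, which this page does not show. The helper names (`question_end`, `build_response`, `serve`), the choice to answer with NXDOMAIN, and the 127.0.0.1 bind address are all assumptions made for the example. Only the port and delay command-line arguments mirror how the runner script invokes the real fixture.

```python
import socket
import struct
import sys
import threading

# Assumed reply flags: QR=1 (response), RD=1, RA=1, RCODE=3 (NXDOMAIN).
NXDOMAIN_FLAGS = 0x8183


def question_end(query: bytes) -> int:
    """Return the offset just past the first question (QNAME, QTYPE, QCLASS)."""
    pos = 12  # the fixed DNS header is 12 bytes long
    while query[pos] != 0:  # labels are length-prefixed; a zero byte ends the name
        pos += 1 + query[pos]
    return pos + 1 + 4  # terminating zero byte + QTYPE (2 bytes) + QCLASS (2 bytes)


def build_response(query: bytes) -> bytes:
    """Build an NXDOMAIN reply that echoes the query ID and question section."""
    (txid,) = struct.unpack_from("!H", query, 0)
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, NXDOMAIN_FLAGS, 1, 0, 0, 0)
    return header + query[12:question_end(query)]


def serve(port: int, delay: float) -> None:
    """Answer every UDP query on 127.0.0.1:port after `delay` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))  # assumption: redirected lookups arrive on loopback
    while True:
        query, addr = sock.recvfrom(4096)
        try:
            reply = build_response(query)
        except (IndexError, struct.error):
            continue  # ignore packets too short or malformed to parse
        # One timer per query, so concurrent lookups are each delayed independently.
        threading.Timer(delay, sock.sendto, args=(reply, addr)).start()


# Only start serving when both arguments are given, e.g. `python3 sketch.py 15353 3`.
if __name__ == "__main__" and len(sys.argv) == 3:
    serve(int(sys.argv[1]), float(sys.argv[2]))
```

The property that matters for the reproducer is that a reply does arrive, but only after the caller's timeout has expired, so the resolver worker is still active when the calling function has already returned.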
102
test/run_issue_2431_repro.sh
Executable file
@@ -0,0 +1,102 @@
#!/usr/bin/env bash
# Reproducer runner for Issue #2431
# (https://github.com/yhirose/cpp-httplib/issues/2431).
#
# Spins up an Ubuntu container, runs the loopback DNS test fixture
# (test/dns_test_fixture.py), routes the container's DNS lookups to
# that fixture via an iptables NAT rule, builds the test suite with
# g++ + ASAN, and runs the GetAddrInfoAsyncCancelTest cases.
#
# Expected outcomes:
#   - HEAD prior to the fix: ASAN reports stack-use-after-return inside
#     getaddrinfo_with_timeout's getaddrinfo_a path during one of the
#     GetAddrInfoAsyncCancelTest cases.
#   - HEAD with the fix applied: all three cases PASS.
#
# Usage:
#   bash test/run_issue_2431_repro.sh
#
# Requirements: Docker (Linux container support). The container needs
# --privileged because the test binary uses `setarch -R` to disable ASLR
# for ASAN compatibility, and because the test job manages iptables
# rules inside the container.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"

docker run --rm --privileged \
  -v "$REPO_ROOT:/work" \
  -w /work/test \
  ubuntu:24.04 bash -c '
    set -euo pipefail
    export DEBIAN_FRONTEND=noninteractive

    apt-get update -qq
    apt-get install -y -qq --no-install-recommends \
      ca-certificates g++ make pkg-config iptables iproute2 util-linux coreutils file \
      python3 \
      libssl-dev zlib1g-dev libbrotli-dev libzstd-dev libcurl4-openssl-dev \
      >/dev/null

    # Force DNS-only resolution: Ubuntu defaults nsswitch.conf to
    # "hosts: files mdns4_minimal [NOTFOUND=return] dns ...", which
    # short-circuits to NOTFOUND before reaching glibc DNS code, so the
    # gai_cancel() branch never gets exercised.
    sed -i "s/^hosts:.*/hosts: dns/" /etc/nsswitch.conf

    # Start the loopback DNS test fixture (delayed UDP responder).
    DNS_FIXTURE_PORT=15353
    DNS_FIXTURE_DELAY=3
    python3 /work/test/dns_test_fixture.py "$DNS_FIXTURE_PORT" "$DNS_FIXTURE_DELAY" \
      >/tmp/dns_fixture.log 2>&1 &
    FIXTURE_PID=$!

    # Route the container DNS lookups to the fixture; conntrack handles the
    # reply path automatically. /etc/resolv.conf is left untouched.
    iptables -t nat -I OUTPUT -p udp --dport 53 \
      -j REDIRECT --to-port "$DNS_FIXTURE_PORT"

    trap '"'"'iptables -t nat -F OUTPUT 2>/dev/null || true; kill "$FIXTURE_PID" 2>/dev/null || true'"'"' EXIT

    # Wait for the fixture to start listening.
    for _ in $(seq 1 50); do
      if ss -lun "( sport = :$DNS_FIXTURE_PORT )" | grep -q ":$DNS_FIXTURE_PORT"; then
        break
      fi
      sleep 0.1
    done
    ss -lun "( sport = :$DNS_FIXTURE_PORT )" | grep -q ":$DNS_FIXTURE_PORT" || {
      echo "ERROR: dns_test_fixture failed to start" >&2
      cat /tmp/dns_fixture.log >&2 || true
      exit 1
    }

    # Sanity check: a DNS lookup must take at least the fixture delay
    # (proving the NAT rule routes the query to the fixture).
    start=$(date +%s)
    getent hosts unresolvable-host.invalid >/dev/null 2>&1 || true
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -lt 2 ]; then
      echo "ERROR: lookup returned in ${elapsed}s; fixture not in DNS path" >&2
      exit 1
    fi
    echo "[ok] DNS lookups are routed to the test fixture (took ${elapsed}s)"

    cd /work/test
    echo "=== building test binary (g++ + ASAN) ==="
    make CXX=g++ test 2>&1 | tail -5

    ARCH=$(uname -m)
    echo "=== running GetAddrInfoAsyncCancelTest with CPPHTTPLIB_TEST_ISSUE_2431=1 ==="
    set +e
    CPPHTTPLIB_TEST_ISSUE_2431=1 \
    ASAN_OPTIONS=detect_stack_use_after_return=1 \
    setarch "$ARCH" -R \
      ./test --gtest_filter="GetAddrInfoAsyncCancelTest.*" 2>&1
    rc=$?
    set -e
    echo "=== test exit: $rc ==="
    exit $rc
  '