Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.

Oct 30 01:22:14 ubuntu dockerd[2293083]: time="2024-10-30T01:22:14.752476083Z" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:59546" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:59546->127.0.0.53:53: i/o timeout" question=";sqs.ap-northeast-2.amazonaws.com.\tIN\t A" spanID=0e95ec0f4aa8fcbc traceID=c69346a57036fa48d3850134bb60b134
Oct 30 01:24:37 ubuntu newrelic-infra-service[3023646]: time="2024-10-30T01:24:37Z" level=warning msg="[engine] failed to flush chunk '3024031-1730251471.397479652.flb', retry in 9 seconds: task_id=0, input=tail.9 > output=newrelic.0 (out_id=0)" component=integrations.Supervisor output=stderr process=log-forwarder

The error above is an exception that can occur when the SQS client in the AWS SDK for Java performs HTTP communication to process messages in a queue. Even if a developer understands DNS concepts at the level of "the DNS that developers should know", could they find the cause and respond quickly in a situation like this? And why did this network problem occur in the first place?

/etc/resolv.conf

On Linux, NetworkManager manages the local and external DNS server information through /etc/resolv.conf. On the office machine where the problem occurred, the router's IP address and Cloudflare (1.1.1.1) were configured as the DNS servers.
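As a rough illustration of what that file contains, here is a minimal sketch that extracts the `nameserver` entries. The class name `ResolvConf` is hypothetical. Note that the dockerd log above shows queries going to 127.0.0.53, the address the systemd-resolved stub resolver usually listens on, so on such a host the file may list only that address rather than the upstream servers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
final class ResolvConf {
    private ResolvConf() {}

    // Extracts the addresses from "nameserver <ip>" lines, skipping comment lines.
    static List<String> nameservers(List<String> lines) {
        List<String> servers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#") || trimmed.startsWith(";")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals("nameserver")) {
                servers.add(parts[1]);
            }
        }
        return servers;
    }

    public static void main(String[] args) throws IOException {
        // Prints the DNS servers configured on the current machine.
        System.out.println(nameservers(Files.readAllLines(Path.of("/etc/resolv.conf"))));
    }
}
```

Parsing is kept separate from file access so the logic can be checked against sample lines without touching the real file.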

The JVM's default DNS cache TTL is 30 seconds

The Java virtual machine (JVM) caches DNS name lookups. When the JVM resolves a hostname to an IP address, it caches the IP address for a specified period of time, known as the time-to-live (TTL). Because AWS resources use DNS name entries that occasionally change, we recommend that you configure your JVM with a TTL value of 5 seconds.

The AWS SDK for Java uses InetAddress.getAllByName, so it depends on the JVM's DNS TTL setting. Below is the comment from the java.security file of Amazon Corretto 17. According to it, because no Security Manager is set, DNS results are cached for 30 seconds by default.

/usr/lib/jvm/java-17-amazon-corretto.aarch64/conf/security/java.security
#
# The Java-level namelookup cache policy for successful lookups:
#
# any negative value: caching forever
# any positive value: the number of seconds to cache an address for
# zero: do not cache
#
# default value is forever (FOREVER). For security reasons, this
# caching is made forever when a security manager is set. When a security
# manager is not set, the default behavior in this implementation
# is to cache for 30 seconds.
#
# NOTE: setting this to anything other than the default value can have
#       serious security implications. Do not set it unless
#       you are sure you are not exposed to DNS spoofing attack.
#
#networkaddress.cache.ttl=-1

This explains why an application that had been running normally suddenly issued a DNS request: the cached entry had expired. The "failed to query external DNS server" message in turn shows that the DNS server could not answer at the moment that request was made.
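For reference, the TTL can also be set from code instead of editing java.security. The following is a minimal sketch, assuming it runs before the JVM performs its first hostname lookup. The class name `DnsCacheTtl` is hypothetical and the value of 5 seconds is simply the AWS recommendation quoted above.

```java
import java.security.Security;

// Hypothetical helper; the class name and the chosen TTL are illustrative.
final class DnsCacheTtl {
    // TTL in seconds for successful lookups (value taken from the AWS recommendation).
    static final String TTL_SECONDS = "5";

    private DnsCacheTtl() {}

    // Intended to be called early, e.g. as the first statement of main,
    // so the value is in place before any address lookup happens.
    static void configure() {
        Security.setProperty("networkaddress.cache.ttl", TTL_SECONDS);
    }

    public static void main(String[] args) {
        configure();
        System.out.println("networkaddress.cache.ttl="
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

A shorter TTL makes the application pick up endpoint changes sooner, but it also means more frequent DNS queries, so a resolver outage like the one in this post is more likely to coincide with a lookup.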

Why the DNS request failed

dig sqs.ap-northeast-2.amazonaws.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.13.8 <<>> sqs.ap-northeast-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45612
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;sqs.ap-northeast-2.amazonaws.com. IN   A

;; ANSWER SECTION:
sqs.ap-northeast-2.amazonaws.com. 16 IN A       3.34.228.79

;; Query time: 0 msec
;; SERVER: 192.168.0.2#53(192.168.0.2)
;; WHEN: Sun Nov 03 05:53:46 UTC 2024
;; MSG SIZE  rcvd: 77

You can run a DNS query for sqs.ap-northeast-2.amazonaws.com with the dig (or nslookup) command. In the normal case, you should receive an answer over UDP, as in the output above. In the error message shown earlier, however, the DNS query timed out. I learned only later that, at around that time, someone in the office had uploaded and downloaded about 60 GB of documents to and from Google Drive to share project material, which presumably saturated the office network.
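The same check can be made from inside the JVM, along the lookup path the SDK uses. This is a minimal sketch, assuming a 2-second wait is acceptable. The class name `DnsProbe` and its status strings are hypothetical, and the result may be served from the JVM's own address cache rather than from a fresh query.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; class name, method name, and status strings are illustrative.
final class DnsProbe {
    private DnsProbe() {}

    // Resolves the host on a background thread and stops waiting after timeoutMillis.
    // The underlying lookup is not cancelled; it simply finishes unobserved.
    static String probe(String host, long timeoutMillis) {
        CompletableFuture<InetAddress[]> lookup = CompletableFuture.supplyAsync(() -> {
            try {
                return InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                // Rethrow unchecked so the failure surfaces through the future.
                throw new IllegalStateException(e);
            }
        });
        try {
            return "OK " + Arrays.toString(lookup.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return "TIMEOUT after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            return "FAILED " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("sqs.ap-northeast-2.amazonaws.com", 2000));
    }
}
```

Returning a status string instead of throwing keeps the probe usable from a health check or a scheduled log line.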

In fact, not a critical problem from the application's point of view

The application that processes SQS messages through the AWS SDK receives and handles the delivery results of Kakao AlimTalk messages sent to users. So even if it is temporarily unable to process the AlimTalk result messages stored in SQS, this is not a critical problem, as long as SQS does not stay unreachable. Even so, it is still necessary to check alerts for DNS errors and to monitor the application's behavior periodically.
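Since the error message says to "see cause" for the failing endpoint, one way to treat such failures as transient is to walk the exception's cause chain and retry only when an UnknownHostException is found. The following is a minimal sketch under that assumption. The class `DnsFailureRetry`, its method names, and the backoff values are hypothetical, and the SDK applies its own retry behavior independently of this.

```java
import java.net.UnknownHostException;
import java.util.function.Supplier;

// Hypothetical helper; names and backoff values are illustrative.
final class DnsFailureRetry {
    private DnsFailureRetry() {}

    // Returns true if any exception in the cause chain is an UnknownHostException.
    static boolean isDnsFailure(Throwable error) {
        for (Throwable current = error; current != null; current = current.getCause()) {
            if (current instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    // Retries the task with doubling backoff, but only for DNS failures.
    // Any other exception, or the last failed attempt, is rethrown unchanged.
    static <T> T callWithRetry(Supplier<T> task, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (!isDnsFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                System.err.println("DNS failure on attempt " + attempt
                        + ", retrying in " + backoff + " ms");
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                backoff *= 2;
            }
        }
    }
}
```

A polling loop could wrap its receive call in `callWithRetry` and raise an alert only when the final attempt fails, which fits the point above that a short outage is tolerable while a sustained one is not.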

Anyway, it was just a minor incident!