[Go] A strange problem with http.Server graceful shutdown

The code is as follows:

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync"
	"time"

	"github.com/gin-gonic/gin"
)

var wg sync.WaitGroup

// apiQuit carries the shutdown request from the /quit handler to main.
var apiQuit = make(chan bool)

func apiQuitSignal() {
	log.Println("quit signal")
	apiQuit <- true
}

func main() {
	router := gin.New()
	router.GET("/quit", func(c *gin.Context) {
		log.Println("GET /quit")
		apiQuitSignal()
		//time.AfterFunc(5*time.Second, apiQuitSignal)
		c.String(200, "quit")
		//c.String(200, "quit in 5 seconds")
	})
	router.GET("/hello", func(c *gin.Context) {
		log.Println("GET /hello")
		c.String(200, "hello")
	})
	srv := &http.Server{
		Addr:    ":8888",
		Handler: router,
	}
	// register wraps f so that wg.Wait() below blocks until f has finished.
	register := func(f func()) {
		wg.Add(1)
		srv.RegisterOnShutdown(func() {
			defer wg.Done()
			f()
		})
	}
	register(func() {
		time.Sleep(10 * time.Second)
	})
	go func() {
		// Start serving connections.
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %s\n", err)
		}
	}()
	// signal.Notify does not block when sending, so the channel is buffered.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, os.Interrupt)
	select {
	case <-quit:
		log.Println("quit from os.Signal")
	case <-apiQuit:
		log.Println("quit from api")
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	//ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatal("Server Shutdown:", err)
	}
	wg.Wait()
	log.Println("Server Exit")
}

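The service above uses the apiQuit channel to receive a shutdown signal when the quit endpoint is requested, and then performs a graceful shutdown.

While sending requests to it, some strange behavior showed up, described below: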

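1. After the server starts, requesting only the quit endpoint produces the error
Server Shutdown:context deadline exceeded
Clearly the context timed out during srv.Shutdown(ctx).

2. After the server starts, requesting the hello endpoint first and then the quit endpoint lets the server exit normally.

3. After the server starts, requesting the quit endpoint first and then immediately requesting the hello endpoint lets the server exit normally.

4. With the context timeout increased to 10 seconds, the server exits normally.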

This strongly suggests that the Shutdown code has some odd feature (or bug).

Look at the code:

// file: net/http/server.go
func (srv *Server) Shutdown(ctx context.Context) error {
	srv.inShutdown.setTrue()
	srv.mu.Lock()
	lnerr := srv.closeListenersLocked()
	srv.closeDoneChanLocked()
	for _, f := range srv.onShutdown {
		go f()
	}
	srv.mu.Unlock()
	pollIntervalBase := time.Millisecond
	nextPollInterval := func() time.Duration {
		// Add 10% jitter.
		interval := pollIntervalBase + time.Duration(rand.Intn(int(pollIntervalBase/10)))
		// Double and clamp for next time.
		pollIntervalBase *= 2
		if pollIntervalBase > shutdownPollIntervalMax {
			pollIntervalBase = shutdownPollIntervalMax
		}
		return interval
	}
	timer := time.NewTimer(nextPollInterval())
	defer timer.Stop()
	for {
		if srv.closeIdleConns() && srv.numListeners() == 0 {
			return lnerr
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-timer.C:
			timer.Reset(nextPollInterval())
		}
	}
}

func (s *Server) closeIdleConns() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	quiescent := true
	for c := range s.activeConn {
		st, unixSec := c.getState()
		// Issue 22682: treat StateNew connections as if
		// they're idle if we haven't read the first request's
		// header in over 5 seconds.
		if st == StateNew && unixSec < time.Now().Unix()-5 {
			st = StateIdle
		}
		if st != StateIdle || unixSec == 0 {
			// Assume unixSec == 0 means it's a very new
			// connection, without state set yet.
			quiescent = false
			continue
		}
		c.rwc.Close()
		delete(s.activeConn, c)
	}
	return quiescent
}
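
The polling loop in Shutdown does not check the connections continuously: it starts at 1 millisecond and doubles the interval up to a cap. The following is a rough sketch of that schedule with the jitter left out. It assumes the cap shutdownPollIntervalMax is 500 milliseconds, which is my understanding of the constant in net/http/server.go, and pollSchedule is a made-up helper name.

```go
package main

import (
	"fmt"
	"time"
)

// Assumption: mirrors the constant of the same name in net/http/server.go.
const shutdownPollIntervalMax = 500 * time.Millisecond

// pollSchedule (hypothetical helper) returns the first n base poll intervals
// used by the loop quoted above, without the 10% jitter.
func pollSchedule(n int) []time.Duration {
	base := time.Millisecond
	out := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, base)
		// Double and clamp for next time, as in Shutdown.
		base *= 2
		if base > shutdownPollIntervalMax {
			base = shutdownPollIntervalMax
		}
	}
	return out
}

func main() {
	fmt.Println(pollSchedule(12))
}
```

Under that assumption the interval reaches the cap after ten polls, so any condition that becomes true during a long shutdown is noticed up to about half a second late.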

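Start with the Shutdown function: the timeout error is returned from the ctx.Done() case, which points to srv.closeIdleConns().
In closeIdleConns, the core of the method is a single loop:
for c := range s.activeConn
Then note the following code: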

// Issue 22682: treat StateNew connections as if
// they're idle if we haven't read the first request's
// header in over 5 seconds.
if st == StateNew && unixSec < time.Now().Unix()-5 {
	st = StateIdle
}

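When a connection is in StateNew, closeIdleConns treats it as StateIdle only after it has been in that state for more than 5 seconds (in practice not exactly 5 seconds; see the timer mechanism in Shutdown). Because the context timeout in the code above is also 5 seconds, the context expires before such a connection can be closed, which matches the error in the first case and explains why the 10-second timeout in the fourth case works.
This problem has already been raised by others; see Issue 22682:
https://github.com/golang/go/issues/22682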
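
The effect can be reproduced with the standard library alone. The following is a minimal sketch, not code from the original service: it assumes a server on a random local port, and the helper names shutdownWithFreshConn and freshConnTimesOut are made up for illustration. It opens one TCP connection that never sends a request, so the connection stays in StateNew, and then calls Shutdown with a 1-second timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// shutdownWithFreshConn (hypothetical helper) starts a server, opens one
// connection that sends nothing, and returns the error from Shutdown.
func shutdownWithFreshConn(timeout time.Duration) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	tracked := make(chan struct{}, 1)
	srv := &http.Server{
		Handler: http.NotFoundHandler(),
		// ConnState reports state changes; StateNew means the server has
		// accepted the connection but has not read a request yet.
		ConnState: func(_ net.Conn, st http.ConnState) {
			if st == http.StateNew {
				select {
				case tracked <- struct{}{}:
				default:
				}
			}
		},
	}
	go srv.Serve(ln)
	defer srv.Close() // force-close whatever Shutdown left behind

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer conn.Close()
	<-tracked // wait until the server has registered the connection

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// freshConnTimesOut reports whether Shutdown gave up because of the deadline.
func freshConnTimesOut() bool {
	err := shutdownWithFreshConn(time.Second)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("Shutdown hit the deadline:", freshConnTimesOut())
}
```

On Go versions that contain the check quoted above, Shutdown should give up with context.DeadlineExceeded here, because the silent connection is less than 5 seconds old when the 1-second context expires.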

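The fix is therefore simple:
Method 1: send the quit signal 5 seconds after the quit request has finished (the commented-out time.AfterFunc line in the quit handler), and only then call Shutdown
Method 2: after the quit signal has been sent, block for 5 seconds before calling Shutdown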
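
Method 2 could look roughly like the sketch below. It is an illustration under assumptions, not the original service: shutdownAfter and shutdownWithSilentConn are hypothetical names, the server listens on a random local port, and the demo waits 6 seconds rather than 5 to leave a margin for the whole-second comparison in closeIdleConns.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// shutdownAfter (hypothetical helper) blocks for delay and then calls
// Shutdown with the given timeout.
func shutdownAfter(srv *http.Server, delay, timeout time.Duration) error {
	time.Sleep(delay)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// shutdownWithSilentConn starts a server, opens one connection that never
// sends a request, and shuts down after delaySeconds with a 1-second timeout.
func shutdownWithSilentConn(delaySeconds int) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	tracked := make(chan struct{}, 1)
	srv := &http.Server{
		Handler: http.NotFoundHandler(),
		ConnState: func(_ net.Conn, st http.ConnState) {
			if st == http.StateNew {
				select {
				case tracked <- struct{}{}:
				default:
				}
			}
		},
	}
	go srv.Serve(ln)
	defer srv.Close()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer conn.Close()
	<-tracked // the server now tracks the connection in StateNew

	delay := time.Duration(delaySeconds) * time.Second
	return shutdownAfter(srv, delay, time.Second)
}

func main() {
	fmt.Println("no delay:", shutdownWithSilentConn(0))
	fmt.Println("6s delay:", shutdownWithSilentConn(6))
}
```

One caveat with this approach: the listener stays open during the delay, so a connection accepted in that window is again younger than 5 seconds when Shutdown runs. Choosing a Shutdown timeout comfortably above 5 seconds, as in the fourth case, avoids that.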

Original article: https://segmentfault.com/a/1190000043788014
