druid活跃线程数量持续增长问题

2023/06/05 Mybatis

druid活跃线程数量持续增长问题

1、问题现象

前一阵子,在一个老项目里面加了一个接口,分页查询数据库里面的记录,用于前端展示。

(嗯,先别急,我要说的不是分页查询的性能导致的问题。)

需求很easy,三两下就搞定了,结果上线后过不了多久就收到告警druid活跃线程数量超过90%。

image-20230605150201914

以前活跃线程数平均下来几乎为0,这次就加了一个分页查询接口,竟然导致活跃线程数量飙升?奇怪,我也没改特别的呀。

分页查询嘛,需要查询总记录数 + 当前页的记录。一般习惯用PageHelper之类的分页插件,但需要额外引入PageHelper这个jar,我看原来项目里已经有mybatis-paginator。想着还是少引入一个jar吧,就基于mybatis-paginator干吧。

2、问题分析

突然收到这个报警,也是觉得挺莫名其妙的,毕竟改动很小,就加了一个分页查询。

仔细确认了一下分页的代码,嗯,没啥毛病!

那咋弄呢?好在druid有monitor页面,先把monitor监控页面整出来,借助它来分析一下。

2.1、配置druid-monitor

首先,需要在web.xml里加上一些配置

	<filter>
        <filter-name>DruidWebStatFilter</filter-name>
        <filter-class>com.alibaba.druid.support.http.WebStatFilter</filter-class>
        <init-param>
            <param-name>exclusions</param-name>
            <param-value>/static/*,*.js,*.gif,*.jpg,*.png,*.css,*.ico,/druid/*</param-value>
        </init-param>
        <init-param>
            <param-name>profileEnable</param-name>
            <param-value>true</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>DruidWebStatFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
    <servlet>
        <servlet-name>DruidStatView</servlet-name>
        <servlet-class>com.alibaba.druid.support.http.StatViewServlet</servlet-class>
        <init-param>
            <param-name>loginUsername</param-name>
            <param-value>druid</param-value>
        </init-param>
        <init-param>
            <param-name>loginPassword</param-name>
            <param-value>druid</param-value>
        </init-param>
    </servlet>
    <servlet-mapping>
        <servlet-name>DruidStatView</servlet-name>
        <url-pattern>/druid/*</url-pattern>
    </servlet-mapping>

然后,需要在DruidDataSource对象申明的地方,配置上

<property name="filters" value="config,stat,slf4j" />

配置好之后,就可以打开druid-monitor页面了。没什么意外的话,就可以看到数据源、连接数之类的信息了。

没意外的话,意外发生了—页面是出来了,数据源信息死活没出来。

image-20230605152305683

这是什么情况?难道是配置哪里不对?查找了各种配置说明之后,才发现这可能是druid的bug。升级一下druid就解决了。

于是把druid从1.0.12升级到 1.0.29,数据源tab页就正常了。

2.2、观察数据指标

通过druid-monitor,重点关注【数据源】页的三个参数:

  • 活跃连接数
  • 逻辑连接打开次数
  • 逻辑连接关闭次数

在测试环境试了一段时间,发现有些时候逻辑连接打开数 略大于 关闭数,这说明有些连接确实没关闭了。只不过测试环境请求量比较小,问题不容易暴漏。

那有啥办法定位是哪里没关闭连接吗?

2.3、连接泄露检测

百度谷歌了下,找到了温少的wiki-连接泄露检测

  <bean id="dataSource" class="com.alibaba.druid.pool.DruidDataSource" destroy-method="close">
  	... ...
  	<property name="removeAbandoned" value="true" /> <!-- 打开removeAbandoned功能 -->
  	<property name="removeAbandonedTimeout" value="1800" /> <!-- 1800秒,也就是30分钟 -->
  	<property name="logAbandoned" value="true" /> <!-- 关闭abanded连接时输出错误日志 -->
  	... ...
  </bean>

当removeAbandoned=true之后,可以在内置监控界面datasource.html中的查看ActiveConnection StackTrace属性的,可以看到未关闭连接的具体堆栈信息,从而方便查出哪些连接泄漏了。

比如下面这样:

image-20230605154338376

这些堆栈信息貌似也没啥特殊的,没有一行是项目的代码

java.lang.Thread.getStackTrace(Thread.java:1552)
com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1140)
com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4544)
com.alibaba.druid.filter.logging.LogFilter.dataSource_getConnection(LogFilter.java:854)
com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:670)
com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
com.alibaba.druid.filter.FilterAdapter.dataSource_getConnection(FilterAdapter.java:2723)
com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1064)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1056)
com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:104)
org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:81)
org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:315)
org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:75)
org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:61)
org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:303)
org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:154)
org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:102)
org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:82)
sun.reflect.GeneratedMethodAccessor300.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:497)
org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor$1.call(OffsetLimitInterceptor.java:87)
com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor$1.call(OffsetLimitInterceptor.java:85)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

不过,多来几次之后,就发现一个类名出现的次数比较多—-OffsetLimitInterceptor!难道是这玩意导致的?

OffsetLimitInterceptor是mybatis-paginator实现分页查询的类。

于是乎尝试不使用mybatis-paginator,而是换了最原始的分页查询。结果问题解决了~

是的,就这么解决了! 这谁能想的到!

3、原因分析

前面猜测是mybatis-paginator的OffsetLimitInterceptor引起的,没想到蒙对了。改了之后问题就消失了。

那来看看OffsetLimitInterceptor到底做了什么

@Intercepts({@Signature(
		type= Executor.class,
		method = "query",
		args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class})})
public class OffsetLimitInterceptor implements Interceptor{
    private static Logger logger = LoggerFactory.getLogger(OffsetLimitInterceptor.class);
	static int MAPPED_STATEMENT_INDEX = 0;
	static int PARAMETER_INDEX = 1;
	static int ROWBOUNDS_INDEX = 2;
	static int RESULT_HANDLER_INDEX = 3;

    static ExecutorService Pool;
    String dialectClass;
    boolean asyncTotalCount = false;
	
	public Object intercept(final Invocation invocation) throws Throwable {
        final Executor executor = (Executor) invocation.getTarget();
        final Object[] queryArgs = invocation.getArgs();
        final MappedStatement ms = (MappedStatement)queryArgs[MAPPED_STATEMENT_INDEX];
        final Object parameter = queryArgs[PARAMETER_INDEX];
        final RowBounds rowBounds = (RowBounds)queryArgs[ROWBOUNDS_INDEX];
        final PageBounds pageBounds = new PageBounds(rowBounds);

        if(pageBounds.getOffset() == RowBounds.NO_ROW_OFFSET
                && pageBounds.getLimit() == RowBounds.NO_ROW_LIMIT
                && pageBounds.getOrders().isEmpty()){
            return invocation.proceed();
        }

        final Dialect dialect;
        try {
            Class clazz = Class.forName(dialectClass);
            Constructor constructor = clazz.getConstructor(MappedStatement.class, Object.class, PageBounds.class);
            dialect = (Dialect)constructor.newInstance(new Object[]{ms, parameter, pageBounds});
        } catch (Exception e) {
            throw new ClassNotFoundException("Cannot create dialect instance: "+dialectClass,e);
        }

        final BoundSql boundSql = ms.getBoundSql(parameter);

        queryArgs[MAPPED_STATEMENT_INDEX] = copyFromNewSql(ms,boundSql,dialect.getPageSQL(), dialect.getParameterMappings(), dialect.getParameterObject());
        queryArgs[PARAMETER_INDEX] = dialect.getParameterObject();
        queryArgs[ROWBOUNDS_INDEX] = new RowBounds(RowBounds.NO_ROW_OFFSET,RowBounds.NO_ROW_LIMIT);

        Boolean async = pageBounds.getAsyncTotalCount() == null ? asyncTotalCount : pageBounds.getAsyncTotalCount();
        // 查记录
        Future<List> listFuture = call(new Callable<List>() {
            public List call() throws Exception {
                return (List)invocation.proceed();
            }
        }, async);


        // 查总数
        if(pageBounds.isContainsTotalCount()){
            Callable<Paginator> countTask = new Callable() {
                public Object call() throws Exception {
                    Integer count;
                    Cache cache = ms.getCache();
                    if(cache != null && ms.isUseCache() && ms.getConfiguration().isCacheEnabled()){
                        CacheKey cacheKey = executor.createCacheKey(ms,parameter,new PageBounds(),copyFromBoundSql(ms,boundSql,dialect.getCountSQL(), boundSql.getParameterMappings(), boundSql.getParameterObject()));
                        count = (Integer)cache.getObject(cacheKey);
                        if(count == null){
                            count = SQLHelp.getCount(ms,executor.getTransaction(),parameter,boundSql,dialect);
                            cache.putObject(cacheKey, count);
                        }
                    }else{
                        count = SQLHelp.getCount(ms,executor.getTransaction(),parameter,boundSql,dialect);
                    }
                    return new Paginator(pageBounds.getPage(), pageBounds.getLimit(), count);
                }
            };
            Future<Paginator> countFutrue = call(countTask, async);
            return new PageList(listFuture.get(),countFutrue.get());
        }

        return listFuture.get();
	}

    // 这里可以同步 or 异步 去查
    private <T> Future<T> call(Callable callable, boolean async){
        if(async){
             return Pool.submit(callable);
        }else{
            FutureTask<T> future = new FutureTask(callable);
            future.run();
            return future;
        }
    }
    // ......
}

从源码看,这里支持同步或者异步去执行2个sql。然后通过Future.get()返回结果。

异步使用的pool线程池是

public void setPoolMaxSize(int poolMaxSize) {
    if(poolMaxSize > 0){
        logger.debug("poolMaxSize: {} ", poolMaxSize);
        Pool = Executors.newFixedThreadPool(poolMaxSize);
    }else{
        Pool = Executors.newCachedThreadPool();
    }
}

至此,问题应该比较清楚了。Pool连接池线程终止后,数据库连接未释放,导致活跃数持续增长。

(为啥使用这个Pool线程池就会出现数据库连接未回收问题呢?我跟着源码 也没找到答案,o(╥﹏╥)o)

看网上很多类似的问题,但最后都是配置removeAbandoned=true就完了。但

RemoveAbandanded功能不建议在生产环境中使用,仅用于连接泄露检测诊断

4、其他

1、为啥以前就没出问题呢?

项目里面的mybatis.xml中配置的是:

<plugins>
	<plugin interceptor="com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor">
		<property name="dialectClass" value="com.github.miemiedev.mybatis.paginator.dialect.MySQLDialect"/>
		<property name="asyncTotalCount" value="false"/>
	</plugin>
</plugins>

asyncTotalCount=false,那为啥实际情况是使用的Pool线程池去异步执行的呢?

原来是依赖的一个二方包里面,包含了一个mybatis的配置文件,里面配置的asyncTotalCount为true,而且它先被先一步加载!

2、druid为什么不打印日志?

druid不依赖任何的log组件,但支持多种log组件,会检测当前环境,选择一种合适的log实现。选择日志的优先级依次是:

log4j > log4j2 > slf4j > commons-logging > jdk-logging

具体检测方法是尝试加载对应的class,如果加载成功就认为当前环境支持此种log组件,就使用它来处理日志打印。

如果想要指定特定的log组件,你可以通过JVM启动参数来配置,比如:

-Ddruid.logType=slf4j
-Ddruid.logType=log4j
-Ddruid.logType=log4j2
-Ddruid.logType=commonsLog
-Ddruid.logType=jdkLog

源码见:

// com.alibaba.druid.support.logging.LogFactory
public class LogFactory {

    private static Constructor logConstructor;

    static {
        String logType= System.getProperty("druid.logType");
        if(logType != null){
            if(logType.equalsIgnoreCase("slf4j")){
                tryImplementation("org.slf4j.Logger", "com.alibaba.druid.support.logging.SLF4JImpl");
            }else if(logType.equalsIgnoreCase("log4j")){
                tryImplementation("org.apache.log4j.Logger", "com.alibaba.druid.support.logging.Log4jImpl");
            }else if(logType.equalsIgnoreCase("log4j2")){
                tryImplementation("org.apache.logging.log4j.Logger", "com.alibaba.druid.support.logging.Log4j2Impl");
            }else if(logType.equalsIgnoreCase("commonsLog")){
                tryImplementation("org.apache.commons.logging.LogFactory",
                        "com.alibaba.druid.support.logging.JakartaCommonsLoggingImpl");
            }else if(logType.equalsIgnoreCase("jdkLog")){
                tryImplementation("java.util.logging.Logger", "com.alibaba.druid.support.logging.Jdk14LoggingImpl");
            }
        }
        // 优先选择log4j,而非Apache Common Logging. 因为后者无法设置真实Log调用者的信息
        tryImplementation("org.apache.log4j.Logger", "com.alibaba.druid.support.logging.Log4jImpl");
        tryImplementation("org.apache.logging.log4j.Logger", "com.alibaba.druid.support.logging.Log4j2Impl");
        tryImplementation("org.slf4j.Logger", "com.alibaba.druid.support.logging.SLF4JImpl");
        tryImplementation("org.apache.commons.logging.LogFactory",
                          "com.alibaba.druid.support.logging.JakartaCommonsLoggingImpl");
        tryImplementation("java.util.logging.Logger", "com.alibaba.druid.support.logging.Jdk14LoggingImpl");

        if (logConstructor == null) {
            try {
                logConstructor = NoLoggingImpl.class.getConstructor(String.class);
            } catch (Exception e) {
                throw new IllegalStateException(e.getMessage(), e);
            }
        }
    }
    // ...
}

配置上启动参数后【-Ddruid.logType=slf4j】,有连接未正常关闭,就会打印如下日志:

2023-05-25 10:19:49.153 [TID:] [trace ] [userName ] [eid ] [Druid-ConnectionPool-Destroy-1392407412] ERROR com.alibaba.druid.pool.DruidDataSource - abandon connection, owner thread: pool-1-thread-2, connected at : 1684980819318, open stackTrace
	at java.lang.Thread.getStackTrace(Thread.java:1559)
	at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1140)
	at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4544)
	at com.alibaba.druid.filter.logging.LogFilter.dataSource_getConnection(LogFilter.java:854)
	at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
	at com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:670)
	at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
	at com.alibaba.druid.filter.FilterAdapter.dataSource_getConnection(FilterAdapter.java:2723)
	at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540)
	at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1064)
	at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1056)
	at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:104)
	at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
	at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
	at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:81)
	at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
	at com.github.miemiedev.mybatis.paginator.support.SQLHelp.getCount(SQLHelp.java:56)
	at com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor$2.call(OffsetLimitInterceptor.java:105)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
ownerThread current state is TERMINATED, current stackTrace

Search

    欢迎关注我的微信公众号

    钢仔

    Table of Contents