release branch 0.24.0

2026-03-22 00:17:35 +08:00 · 2024-12-22 15:18:09 +08:00
parent acd58efb60
commit f0e67b75ed
18 changed files with 44341 additions and 124 deletions
--- a/CHANGE_LOG.md
+++ b/CHANGE_LOG.md
@@ -385,4 +385,11 @@

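 | No. | Change type | Description | Time | Notes |
 |:---|:-----|---------------------------------------------|:--------------------|:------------|
-| 1  | O    | Unified formatting when WordResultHandlerWordTags and tag lookups fetch tags | 2024-12-22 12:45:53 | Keeps behavior consistent and simplifies dictionary data |
+| 1  | O    | Unified formatting when WordResultHandlerWordTags and tag lookups fetch tags | 2024-12-22 12:45:53 | Keeps behavior consistent and simplifies dictionary data |
+
+# release_0.24.0
+
+| No. | Change type | Description | Time | Notes |
+|:---|:-----|----------------|:--------------------|:--------------|
+| 1  | A    | Built-in support for multiple word-tag strategies | 2024-12-22 14:08:20 | Strengthens word-tag capabilities and eases reuse |
+| 2  | O    | Upgrade the heaven dependency | 2024-12-22 14:08:20 |  |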
--- a/README.md
+++ b/README.md
@@ -14,13 +14,17 @@

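 ## Motivation

-Build an easy-to-use sensitive-word tool.
+Hi everyone, I'm Lao Ma.
+
+I had long wanted a simple, easy-to-use sensitive-word tool, so I built this one and open-sourced it.

 It is based on the DFA algorithm. The dictionary currently holds 60,000+ entries (pruned once from a source file of 180,000+).

 The dictionary will keep being refined and extended, and the algorithm's performance will be improved further.

-Finer-grained classification of sensitive words would be nice, but it looks like a lot of work and has not been started yet.
+Starting with v0.24.0, fine-grained classification of sensitive words is built in. It is a large effort, so omissions are inevitable.
+
+PRs are welcome, as are feature requests on GitHub. You can also join the tech discussion group to chat.

 ## Features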

@@ -54,17 +58,13 @@

 [CHANGE_LOG.md](https://github.com/houbb/sensitive-word/blob/master/CHANGE_LOG.md)

-### V0.22.0
-
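-- Fix format handling when a single sensitive word is modified
-
 ### V0.23.0

 - Result conditions extended to support wordTags and chains

-### V0.23.1
+### V0.24.0

-- Unified formatting optimization for sensitive-word tags
+- Initial built-in word tags, with a richer set of built-in word-tag strategies

 ## More resources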

@@ -82,6 +82,8 @@

 > [v0.11.0-敏感词新特性及对应标签文件](https://mp.weixin.qq.com/s/m40ZnR6YF6WgPrArUSZ_0g)

+As of v0.24.0, word tags are built in. If you need them, upgrading to the latest version is recommended.
+
 # Quick start

 ## Prerequisites
@@ -96,7 +98,7 @@
 <dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
-    <version>0.23.1</version>
+    <version>0.24.0</version>
 </dependency>
 ```

@@ -825,11 +827,11 @@ Assert.assertEquals("[傻@冒, 狗+东西]", wordList2.toString());

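 Supported since: v0.10.0

-## Getting started
+Main features supported since: v0.24.0

-### Interface
+## Tag interface

-This is only an abstract interface; users can provide their own implementation, for example one backed by a database query.
+This is only an abstract interface; users can provide their own implementation, for example one backed by a database query, a file, or an API call.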

 ```java
 public interface IWordTag {
@@ -844,49 +846,113 @@ public interface IWordTag {
 }
 ```

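-### Configuration file
+## Built-in implementations

-We can define a custom dict tag file and create a WordTag implementation via WordTags.file().
+### Method list

-- dict_tag_test.txt
+For convenience in common cases, several built-in strategies are provided in the `WordTags` class:
+
+| Method                                                             | Description                                                       | Notes         |
+|:-------------------------------------------------------------------|:------------------------------------------------------------------|:--------------|
+| none()                                                             | No-op implementation                                              | since v0.10.0 |
+| file(String filePath)                                              | Load from a file path                                             | since v0.10.0 |
+| file(String filePath, String wordSplit, String tagSplit)           | Load from a file path with custom word and tag separators         | since v0.24.0 |
+| map(final Map<String, Set<String>> wordTagMap)                     | Initialize from a map                                             | since v0.24.0 |
+| lines(Collection<String> lines)                                    | Initialize from a collection of lines                             | since v0.24.0 |
+| lines(Collection<String> lines, String wordSplit, String tagSplit) | Initialize from lines with custom word and tag separators         | since v0.24.0 |
+| system()                                                           | Built-in system file, consolidating categories found online       | since v0.24.0 |
+| defaults()                                                         | Default strategy, currently system                                | since v0.24.0 |
+| chains(IWordTag wordTag, IWordTag... others)                       | Chaining method that lets users combine several strategies        | since v0.24.0 |
+
+### Format convention
+
+By default, a tag line has the form `word tag1,tag2`, meaning that `word` carries the tags tag1 and tag2.
+
+For example:

 ```
 五星红旗 政治,国家
 ```

-The format is:
+Lines read from files and lines passed in as strings should follow this format as well. If it does not fit your needs, provide a custom implementation.

-```
-word tag1,tag2
-```
+## Built-in system implementation (default behavior)

-### Implementation
+Since v0.24.0, the default word tag is `WordTags.system()`.

-The effect is shown below; just set it on the bootstrap class.
-
-The default wordTag is empty.
+Note: the data was collected from the internet and still has many gaps. Corrections are welcome; it is being improved continuously.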

 ```java
-String filePath = "dict_tag_test.txt";
-IWordTag wordTag = WordTags.file(filePath);
+SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+        .wordTag(WordTags.system())
+        .init();
+Set<String> tagSet = sensitiveWordBs.tags("博彩");
+Assert.assertEquals("[3]", tagSet.toString());
+```

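+To keep the file small, each category is stored as a numeric code.
+
+The codes map to categories as follows:
+
+```
+0 政治 (politics)
+1 毒品 (drugs)
+2 色情 (pornography)
+3 赌博 (gambling)
+4 违法犯罪 (illegal activity and crime)
+```
+
+## File-based getting-started example
+
+This example uses a file to demonstrate the usage.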
+
+```java
+final String path = "~\\test\\resources\\dict_tag_test.txt";
+
+// Default separators (word separator " ", tag separator ",")
+IWordTag wordTag = WordTags.file(path);
 SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
         .wordTag(wordTag)
         .init();

-Assert.assertEquals("[政治, 国家]", sensitiveWordBs.tags("五星红旗").toString());;
+Set<String> tagSet = sensitiveWordBs.tags("零售");
+Assert.assertEquals("[广告, 网络]", tagSet.toString());
+
+
+// Explicit separators
+IWordTag wordTag2 = WordTags.file(path, " ", ",");
+SensitiveWordBs sensitiveWordBs2 = SensitiveWordBs.newInstance()
+        .wordTag(wordTag2)
+        .init();
+Set<String> tagSet2 = sensitiveWordBs2.tags("零售");
+Assert.assertEquals("[广告, 网络]", tagSet2.toString());
 ```

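-A built-in tag-file strategy may be introduced later.
+The custom `dict_tag_test.txt` contains:

-### Sensitive-word tag file
+```
+零售 广告,网络
+```

-A large collection of sensitive-word tag files has been compiled, which makes working with sensitive words more convenient.
+## Combining word tags with sensitive-word detection

-Both resources are available in the article below:
+When finding sensitive words, you can set a result handler so that each match is returned together with its tags.

-> [v0.11.0-敏感词新特性及对应标签文件](https://mp.weixin.qq.com/s/m40ZnR6YF6WgPrArUSZ_0g)
+```java
+// Custom tag source for this example
+IWordTag wordTag = WordTags.lines(Arrays.asList("天安门 政治,国家,地址"));

+// Initialize with the custom tag source
+SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+        .wordTag(wordTag)
+        .init()
+        ;
+
+List<WordTagsDto> wordTagsDtoList1 = sensitiveWordBs.findAll("天安门", WordResultHandlers.wordTags());
+Assert.assertEquals("[WordTagsDto{word='天安门', tags=[政治, 国家, 地址]}]", wordTagsDtoList1.toString());
+```
+
+Here we define tags for the keyword `天安门`, then pass `WordResultHandlers.wordTags()` as the result handler to findAll, so each sensitive word is returned along with its tag list.

 # Dynamic loading (user-defined)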

@@ -1129,7 +1195,7 @@ ps: 不同环境会有差异，但是比例基本稳定。

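 - [x] Remove single-character sensitive words; in Chinese, a phrase should be treated as one word, which lowers the false-positive rate.

-- [ ] Support changes to individual sensitive words?
+- [x] Support changes to individual sensitive words?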

 remove、add、edit?

--- a/pom.xml
+++ b/pom.xml
@@ -6,7 +6,7 @@

    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
-    <version>0.23.1</version>
+    <version>0.24.0</version>

    <properties>
        <!--============================== All Plugins START ==============================-->
@@ -25,7 +25,7 @@
        <project.compiler.level>1.7</project.compiler.level>

        <!--============================== INTER ==============================-->
-        <heaven.version>0.11.0</heaven.version>
+        <heaven.version>0.13.0</heaven.version>
        <opencc4j.version>1.8.1</opencc4j.version>

        <!--============================== OTHER ==============================-->
--- a/release.bat
+++ b/release.bat
@@ -10,9 +10,9 @@ ECHO "============================= RELEASE START..."

 :: Version info (must be set manually)
 :::: Old version name
-SET version=0.23.1
+SET version=0.24.0
 :::: New version name
-SET newVersion=0.24.0
+SET newVersion=0.25.0
 :::: Group name
 SET groupName=com.github.houbb
 :::: Project name
--- a/src/main/java/com/github/houbb/sensitive/word/api/IWordTag.java
+++ b/src/main/java/com/github/houbb/sensitive/word/api/IWordTag.java
@@ -15,6 +15,6 @@ public interface IWordTag {
     * @param word the sensitive word
     * @return the tags of the word
     */
-    Set<String> getTag(String word);
+    Set<String> getTag(final String word);

 }
--- a/src/main/java/com/github/houbb/sensitive/word/bs/SensitiveWordBs.java
+++ b/src/main/java/com/github/houbb/sensitive/word/bs/SensitiveWordBs.java
@@ -168,7 +168,7 @@ public class SensitiveWordBs implements ISensitiveWordDestroy {
     * Word tag
     * @since 0.10.0
     */
-    private IWordTag wordTag = WordTags.none();
+    private IWordTag wordTag = WordTags.defaults();

    /**
     * Strategy for ignored characters
--- a/src/main/java/com/github/houbb/sensitive/word/constant/enums/WordTagType.java
+++ b/src/main/java/com/github/houbb/sensitive/word/constant/enums/WordTagType.java
@@ -0,0 +1,41 @@
+package com.github.houbb.sensitive.word.constant.enums;
+
+/**
+ * Word tag category
+ *
+ * @since 0.24.0
+ */
+public enum WordTagType {
+    ZHENGZHI("0", "政治"),
+    DUPIN("1", "毒品"),
+    SEQING("2", "色情"),
+    DUBO("3", "赌博"),
+    FANZUI("4", "违法犯罪"),
+    ;
+
+    private final String code;
+    private final String desc;
+
+    WordTagType(String code, String desc) {
+        this.code = code;
+        this.desc = desc;
+    }
+
+    public String getCode() {
+        return code;
+    }
+
+    public String getDesc() {
+        return desc;
+    }
+
+    public static String getDescByCode(final String code) {
+        for(WordTagType tagType : WordTagType.values()) {
+            if(tagType.code.equals(code)) {
+                return tagType.desc;
+            }
+        }
+        return code;
+    }
+
+}
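Because `WordTags.system()` returns numeric codes such as `[3]`, callers will usually want to turn them back into readable category names. The following is a minimal sketch of that step. The enum is repeated from the diff above only so that the example compiles on its own; `TagCodeDescriber` and `describe` are hypothetical names and are not part of the library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of sensitive-word): maps tag codes to descriptions.
final class TagCodeDescriber {

    // Converts every code in the set; unknown codes are kept unchanged,
    // mirroring the fallback in WordTagType.getDescByCode.
    static Set<String> describe(Set<String> codes) {
        Set<String> result = new LinkedHashSet<>();
        for (String code : codes) {
            result.add(WordTagType.getDescByCode(code));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> codes = new LinkedHashSet<>(Arrays.asList("3", "9"));
        System.out.println(describe(codes)); // prints [赌博, 9]
    }
}

// Copy of the enum added in this commit, repeated here for self-containment.
enum WordTagType {
    ZHENGZHI("0", "政治"),
    DUPIN("1", "毒品"),
    SEQING("2", "色情"),
    DUBO("3", "赌博"),
    FANZUI("4", "违法犯罪");

    private final String code;
    private final String desc;

    WordTagType(String code, String desc) {
        this.code = code;
        this.desc = desc;
    }

    static String getDescByCode(final String code) {
        for (WordTagType tagType : values()) {
            if (tagType.code.equals(code)) {
                return tagType.desc;
            }
        }
        return code;
    }
}
```

Returning the code itself for unknown values means user-defined tags that are already readable pass through untouched.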
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/AbstractWordTagInit.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/AbstractWordTagInit.java
@@ -0,0 +1,44 @@
+package com.github.houbb.sensitive.word.support.tag;
+
+import com.github.houbb.heaven.support.pipeline.Pipeline;
+import com.github.houbb.heaven.support.pipeline.impl.DefaultPipeline;
+import com.github.houbb.heaven.util.util.CollectionUtil;
+import com.github.houbb.sensitive.word.api.IWordTag;
+
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Abstract bootstrap class for initializing word tags
+ *
+ * @since 0.24.0
+ */
+public abstract class AbstractWordTagInit extends AbstractWordTag {
+
+    /**
+     * Initializes the list of tag strategies
+     *
+     * @param pipeline the pipeline that collects the tag strategies
+     * @since 0.24.0
+     */
+    protected abstract void init(final Pipeline<IWordTag> pipeline);
+
+    @Override
+    public Set<String> doGetTag(String word) {
+        Pipeline<IWordTag> pipeline = new DefaultPipeline<>();
+        this.init(pipeline);
+
+        Set<String> resultSet = new HashSet<>();
+        List<IWordTag> wordTagList = pipeline.list();
+        for (IWordTag wordTag : wordTagList) {
+            Set<String> tempTagSet = wordTag.getTag(word);
+            if(CollectionUtil.isNotEmpty(tempTagSet)) {
+                resultSet.addAll(tempTagSet);
+            }
+        }
+
+        return resultSet;
+    }
+
+}
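The merge that `doGetTag` performs can be sketched without the library. In the example below, `TagSource` is a hypothetical stand-in for `IWordTag`, and `ChainedTagSource` is an illustrative name. Unlike the class above, this variant assembles the source list once in the constructor rather than on every lookup; that is a choice made for the sketch, not something this commit does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Demo entry point for the sketch.
final class ChainedTagDemo {
    public static void main(String[] args) {
        TagSource none = word -> null;
        TagSource ads = word -> Collections.singleton("ads");
        TagSource both = word -> new HashSet<>(Arrays.asList("web", "ads"));
        System.out.println(new ChainedTagSource(none, ads, both).getTag("retail")); // prints [ads, web]
    }
}

// Hypothetical stand-in for IWordTag, so the sketch has no library dependency.
interface TagSource {
    Set<String> getTag(String word);
}

// Queries every source in order and returns the union of their tags.
final class ChainedTagSource implements TagSource {

    private final List<TagSource> sources;

    ChainedTagSource(TagSource first, TagSource... others) {
        List<TagSource> list = new ArrayList<>();
        list.add(first);
        if (others != null) {
            list.addAll(Arrays.asList(others));
        }
        // Built once here instead of on each lookup.
        this.sources = Collections.unmodifiableList(list);
    }

    @Override
    public Set<String> getTag(String word) {
        // TreeSet gives a stable, sorted result order.
        Set<String> result = new TreeSet<>();
        for (TagSource source : sources) {
            Set<String> tags = source.getTag(word);
            if (tags != null && !tags.isEmpty()) {
                result.addAll(tags);
            }
        }
        return result;
    }
}
```

Sources that return `null` or an empty set are skipped, so a no-op source can sit anywhere in the chain.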
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/FileWordTag.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/FileWordTag.java
@@ -2,10 +2,10 @@ package com.github.houbb.sensitive.word.support.tag;

 import com.github.houbb.heaven.util.common.ArgUtil;
 import com.github.houbb.heaven.util.io.FileUtil;
-import com.github.houbb.heaven.util.lang.StringUtil;
-import com.github.houbb.heaven.util.util.CollectionUtil;
+import com.github.houbb.sensitive.word.api.IWordTag;

-import java.util.*;
+import java.util.List;
+import java.util.Set;

 /**
 * File-based tags
@@ -15,20 +15,10 @@ import java.util.*;
 */
 public class FileWordTag extends AbstractWordTag {

-    /**
-     * 文件路径
-     */
-    protected final String filePath;
    /**
     * Separator between the word and its tags
     */
-    protected final String wordSplit;
-    /**
-     * 标签的分隔符
-     */
-    protected final String tagSplit;
-
-    protected Map<String, Set<String>> wordTagMap = new HashMap<>();
+    protected final IWordTag wordTag;

    public FileWordTag(String filePath) {
        this(filePath, " ", ",");
@@ -39,51 +29,13 @@ public class FileWordTag extends AbstractWordTag {
        ArgUtil.notEmpty(wordSplit, "wordSplit");
        ArgUtil.notEmpty(tagSplit, "tagSplit");

-        this.wordSplit = wordSplit;
-        this.tagSplit = tagSplit;
-        this.filePath = filePath;
-
-        this.initWordTagMap();
-    }
-
-
-    /**
-     * 初始化
-     */
-    protected synchronized void initWordTagMap() {
        List<String> lines = FileUtil.readAllLines(filePath);
-        if(CollectionUtil.isEmpty(lines)) {
-            return;
-        }
-
-        for(String line : lines) {
-            if(StringUtil.isEmpty(line)) {
-                continue;
-            }
-
-            // 处理每一行
-            handleInitLine(line);
-        }
-    }
-
-    protected synchronized void handleInitLine(String line) {
-        String[] strings = line.split(wordSplit);
-        if(strings.length < 2) {
-            return;
-        }
-
-        String word = strings[0];
-        String tagText = strings[1];
-
-
-        String[] tags = tagText.split(tagSplit);
-        Set<String> tagSet = new HashSet<>(Arrays.asList(tags));
-        wordTagMap.put(word, tagSet);
+        wordTag = WordTags.lines(lines, wordSplit, tagSplit);
    }

    @Override
    protected Set<String> doGetTag(String word) {
-        return wordTagMap.get(word);
+        return wordTag.getTag(word);
    }

 }
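After this change `FileWordTag` only reads the file and hands the lines to `WordTags.lines`. The layering can be illustrated with the JDK alone; in the sketch below, `FileTagLoader`, `load`, and `fromLines` are hypothetical names, and `java.nio.file.Files` stands in for the library's `FileUtil`. The default separators (a space and a comma) are the ones used in the diff.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: file reading is kept separate from line parsing.
final class FileTagLoader {

    // Reads the file as UTF-8 and delegates to the line-based builder.
    static Map<String, Set<String>> load(Path file) {
        try {
            return fromLines(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses "word tag1,tag2" lines; lines without a tag part are skipped.
    static Map<String, Set<String>> fromLines(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2);
            if (parts.length < 2) {
                continue;
            }
            map.put(parts[0], new TreeSet<>(Arrays.asList(parts[1].split(","))));
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("dict_tag", ".txt");
        Files.write(file, Arrays.asList("retail ads,web"), StandardCharsets.UTF_8);
        System.out.println(load(file).get("retail")); // prints [ads, web]
        Files.delete(file);
    }
}
```

Keeping the parser independent of the file means the same code can serve lines that come from a database or a network call.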
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagLines.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagLines.java
@@ -0,0 +1,66 @@
+package com.github.houbb.sensitive.word.support.tag;
+
+import com.github.houbb.heaven.util.common.ArgUtil;
+import com.github.houbb.heaven.util.lang.StringUtil;
+import com.github.houbb.sensitive.word.api.IWordTag;
+
+import java.util.*;
+
+/**
+ * Builds word tags from lines in the standard format
+ *
+ * Line format:
+ *
+ * word tag1,tag2
+ *
+ * @since 0.24.0
+ */
+public class WordTagLines extends AbstractWordTag {
+
+    private final IWordTag wordTag;
+
+    /**
+     * Separator between the word and its tags
+     */
+    private final String wordSplit;
+    /**
+     * Separator between tags
+     */
+    private final String tagSplit;
+
+    public WordTagLines(Collection<String> lines,
+                        final String wordSplit,
+                        final String tagSplit) {
+        ArgUtil.notNull(lines, "lines");
+        ArgUtil.notEmpty(wordSplit, "wordSplit");
+        ArgUtil.notEmpty(tagSplit, "tagSplit");
+
+        this.wordSplit = wordSplit;
+        this.tagSplit = tagSplit;
+
+        Map<String, Set<String>> wordTagMap = buildWordTagMap(lines);
+        wordTag = WordTags.map(wordTagMap);
+    }
+
+    public WordTagLines(Collection<String> lines) {
+        this(lines, " ", ",");
+    }
+
+    private Map<String, Set<String>> buildWordTagMap(final Collection<String> lines) {
+        Map<String, Set<String>> wordTagMap = new HashMap<>();
+
+        for(String line : lines) {
+            String[] strings = line.split(wordSplit);
+            String key = strings[0];
+            Set<String> tags = new HashSet<>(StringUtil.splitToList(strings[1], tagSplit));
+            wordTagMap.put(key, tags);
+        }
+        return wordTagMap;
+    }
+
+    @Override
+    protected Set<String> doGetTag(String word) {
+        return wordTag.getTag(word);
+    }
+
+}
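`buildWordTagMap` above indexes `strings[1]` directly and passes the separator to `String.split`, which treats it as a regular expression. The removed `FileWordTag` code skipped empty lines and lines with fewer than two parts. A more defensive parser might look like the sketch below. `TagLineParser` is a hypothetical name; the quoting of separators and the merging of duplicate words are choices made for this sketch, not behavior of the library.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical defensive variant of the "word tag1,tag2" line parser.
final class TagLineParser {

    static Map<String, Set<String>> parse(Collection<String> lines, String wordSplit, String tagSplit) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line == null || line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Pattern.quote treats the separator literally; limit 2 splits at the first one only.
            String[] parts = line.trim().split(Pattern.quote(wordSplit), 2);
            if (parts.length < 2) {
                continue; // no tag part on this line
            }
            String word = parts[0].trim();
            if (word.isEmpty()) {
                continue;
            }
            Set<String> tags = new LinkedHashSet<>();
            for (String tag : parts[1].split(Pattern.quote(tagSplit))) {
                String trimmed = tag.trim();
                if (!trimmed.isEmpty()) {
                    tags.add(trimmed);
                }
            }
            if (tags.isEmpty()) {
                continue;
            }
            // Merge with earlier lines for the same word instead of overwriting them.
            Set<String> existing = result.get(word);
            if (existing == null) {
                result.put(word, tags);
            } else {
                existing.addAll(tags);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("retail ads,web", "", "broken", "retail spam"), " ", ","));
        // prints {retail=[ads, web, spam]}
    }
}
```

With quoting in place, separators such as `|` or `.` behave as plain characters rather than as regex operators.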
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagMap.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagMap.java
@@ -0,0 +1,30 @@
+package com.github.houbb.sensitive.word.support.tag;
+
+import com.github.houbb.heaven.util.common.ArgUtil;
+
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Word tags backed by a map
+ *
+ * key: word
+ * value: tag set
+ *
+ * @since 0.24.0
+ */
+public class WordTagMap extends AbstractWordTag {
+
+    private final Map<String, Set<String>> wordTagMap;
+
+    public WordTagMap(Map<String, Set<String>> wordTagMap) {
+        ArgUtil.notNull(wordTagMap, "wordTagMap");
+        this.wordTagMap = wordTagMap;
+    }
+
+    @Override
+    protected Set<String> doGetTag(String word) {
+        return wordTagMap.get(word);
+    }
+
+}
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagSystem.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTagSystem.java
@@ -0,0 +1,28 @@
+package com.github.houbb.sensitive.word.support.tag;
+
+import com.github.houbb.heaven.util.io.StreamUtil;
+import com.github.houbb.sensitive.word.api.IWordTag;
+
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Built-in system strategy, loaded from the bundled tag file
+ *
+ * @since 0.24.0
+ */
+public class WordTagSystem extends AbstractWordTag {
+
+    private final IWordTag wordTag;
+
+    public WordTagSystem() {
+        List<String> lines = StreamUtil.readAllLines("/sensitive_word_tags.txt");
+        this.wordTag = WordTags.lines(lines);
+    }
+
+    @Override
+    protected Set<String> doGetTag(String word) {
+        return wordTag.getTag(word);
+    }
+
+}
--- a/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTags.java
+++ b/src/main/java/com/github/houbb/sensitive/word/support/tag/WordTags.java
@@ -1,7 +1,14 @@
 package com.github.houbb.sensitive.word.support.tag;

+import com.github.houbb.heaven.support.pipeline.Pipeline;
+import com.github.houbb.heaven.util.common.ArgUtil;
+import com.github.houbb.heaven.util.util.ArrayUtil;
 import com.github.houbb.sensitive.word.api.IWordTag;

+import java.util.Collection;
+import java.util.Map;
+import java.util.Set;
+
 /**
 * 单词标签
 *
@@ -9,12 +16,109 @@ import com.github.houbb.sensitive.word.api.IWordTag;
 */
 public class WordTags {

+    /**
+     * No-op implementation
+     * @return no-op implementation
+     * @since 0.10.0
+     */
    public static IWordTag none() {
        return new NoneWordTag();
    }

+    /**
+     * File-based implementation
+     * @param filePath file path
+     * @return file-based implementation
+     * @since 0.10.0
+     */
    public static IWordTag file(String filePath) {
        return new FileWordTag(filePath);
    }

+    /**
+     * File-based implementation
+     *
+     * @param filePath file path
+     * @param wordSplit separator between the word and its tags
+     * @param tagSplit separator between tags
+     * @return implementation
+     * @since 0.24.0
+     */
+    public static IWordTag file(String filePath, final String wordSplit, final String tagSplit) {
+        return new FileWordTag(filePath, wordSplit, tagSplit);
+    }
+
+    /**
+     * Map strategy
+     * @param wordTagMap map from word to tag set
+     * @return implementation
+     * @since 0.24.0
+     */
+    public static IWordTag map(final Map<String, Set<String>> wordTagMap) {
+        return new WordTagMap(wordTagMap);
+    }
+
+    /**
+     * Builds tags from lines in the standard format
+     * @param lines lines
+     * @return result
+     * @since 0.24.0
+     */
+    public static IWordTag lines(final Collection<String> lines) {
+        return new WordTagLines(lines);
+    }
+
+    /**
+     * Builds tags from lines in the standard format, with custom separators
+     * @param lines lines
+     * @return result
+     */
+    public static IWordTag lines(final Collection<String> lines, final String wordSplit, final String tagSplit) {
+        return new WordTagLines(lines, wordSplit, tagSplit);
+    }
+
+    /**
+     * System file strategy
+     * @return standard strategy
+     * @since 0.24.0
+     */
+    public static IWordTag system() {
+        return new WordTagSystem();
+    }
+
+    /**
+     * Default strategy
+     * @return standard strategy
+     * @since 0.24.0
+     */
+    public static IWordTag defaults() {
+        return system();
+    }
+
+    /**
+     * Chained strategy
+     *
+     * @param wordTag first tag strategy
+     * @param others additional tag strategies
+     * @return result
+     * @since 0.24.0
+     */
+    public static IWordTag chains(final IWordTag wordTag,
+                                   final IWordTag... others) {
+        ArgUtil.notNull(wordTag, "wordTag");
+
+        return new AbstractWordTagInit() {
+            @Override
+            protected void init(Pipeline<IWordTag> pipeline) {
+                pipeline.addLast(wordTag);
+
+                if(ArrayUtil.isNotEmpty(others)) {
+                    for(IWordTag other : others) {
+                        pipeline.addLast(other);
+                    }
+                }
+            }
+        };
+    }
+
 }
--- a/src/main/java/com/github/houbb/sensitive/word/utils/InnerWordTagUtils.java
+++ b/src/main/java/com/github/houbb/sensitive/word/utils/InnerWordTagUtils.java
@@ -1,7 +1,9 @@
 package com.github.houbb.sensitive.word.utils;

 import com.github.houbb.heaven.util.lang.StringUtil;
+import com.github.houbb.heaven.util.util.CollectionUtil;
 import com.github.houbb.sensitive.word.api.IWordContext;
+import com.github.houbb.sensitive.word.api.IWordTag;

 import java.util.Collections;
 import java.util.Set;
@@ -27,7 +29,14 @@ public class InnerWordTagUtils {
            return Collections.emptySet();
        }

-        // Is formatting needed? v0.24.0
+        final IWordTag wordTag = wordContext.wordTag();
+        // Try the raw word first
+        Set<String> actualSet = wordTag.getTag(word);
+        if(CollectionUtil.isNotEmpty(actualSet)) {
+            return actualSet;
+        }
+
+        // Fall back to the formatted word
        String formatWord = InnerWordFormatUtils.format(word, wordContext);
        return wordContext.wordTag().getTag(formatWord);
    }
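The lookup order introduced here (raw word first, formatted word second) can be shown in isolation. In the sketch below, `FallbackTagLookup` is a hypothetical name, and `normalize` is a simple stand-in for `InnerWordFormatUtils.format(word, wordContext)`; the library's real formatting rules are not reproduced.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "raw lookup first, formatted lookup as fallback".
final class FallbackTagLookup {

    // Stand-in formatter (assumption): trims and lower-cases the word.
    static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    static Set<String> tags(Map<String, Set<String>> source, String word) {
        if (word == null || word.isEmpty()) {
            return Collections.emptySet();
        }
        // 1. Exact match on the word as given.
        Set<String> direct = source.get(word);
        if (direct != null && !direct.isEmpty()) {
            return direct;
        }
        // 2. Fall back to the formatted form.
        Set<String> formatted = source.get(normalize(word));
        return formatted == null ? Collections.<String>emptySet() : formatted;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> source = new HashMap<>();
        source.put("casino", Collections.singleton("gambling"));
        source.put("VIP", Collections.singleton("ads"));
        System.out.println(tags(source, "VIP"));      // prints [ads]
        System.out.println(tags(source, " Casino ")); // prints [gambling]
    }
}
```

One consequence of this order is that a tag source keyed by unformatted words keeps working, and a direct hit wins even when the formatted key carries different tags.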
--- a/src/main/resources/sensitive_word_tags.txt
+++ b/src/main/resources/sensitive_word_tags.txt
--- a/src/test/java/com/github/houbb/sensitive/word/bs/SensitiveWordBsTagTest.java
+++ b/src/test/java/com/github/houbb/sensitive/word/bs/SensitiveWordBsTagTest.java
@@ -1,16 +1,15 @@
 package com.github.houbb.sensitive.word.bs;

-import com.github.houbb.heaven.util.lang.StringUtil;
 import com.github.houbb.sensitive.word.api.IWordDeny;
 import com.github.houbb.sensitive.word.api.IWordTag;
 import com.github.houbb.sensitive.word.support.result.WordResultHandlers;
 import com.github.houbb.sensitive.word.support.result.WordTagsDto;
-import com.github.houbb.sensitive.word.support.tag.AbstractWordTag;
 import com.github.houbb.sensitive.word.support.tag.WordTags;
 import org.junit.Assert;
 import org.junit.Test;

-import java.util.*;
+import java.util.Arrays;
+import java.util.List;

 /**
 * <p> project: sensitive-word-SensitiveWordBsTest </p>
@@ -21,25 +20,10 @@ import java.util.*;
 */
 public class SensitiveWordBsTagTest {

-    private void addLine(String line,
-                         Map<String, Set<String>> wordTagMap) {
-        String[] strings = line.split(" ");
-        String key = strings[0];
-        Set<String> tags = new HashSet<>(StringUtil.splitToList(strings[1]));
-        wordTagMap.put(key, tags);
-    }
-
    @Test
    public void wordResultHandlerWordTagsTest() {
        // Custom tag source for the test
-        final Map<String, Set<String>> wordTagMap = new HashMap<>();
-        addLine("0售 广告", wordTagMap);
-        IWordTag wordTag = new AbstractWordTag() {
-            @Override
-            protected Set<String> doGetTag(String word) {
-                return wordTagMap.get(word);
-            }
-        };
+        IWordTag wordTag = WordTags.lines(Arrays.asList("0售 广告"));

        // Initialize with the custom tag source
        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
@@ -59,19 +43,24 @@ public class SensitiveWordBsTagTest {
        Assert.assertEquals("[WordTagsDto{word='0售', tags=[广告]}]", wordTagsDtoList2.toString());
    }

+    @Test
+    public void wordResultHandlerWordTags2Test() {
+        // Custom tag source for the test
+        IWordTag wordTag = WordTags.lines(Arrays.asList("天安门 政治,国家,地址"));
+
+        // Initialize with the custom tag source
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(wordTag)
+                .init()
+                ;
+        List<WordTagsDto> wordTagsDtoList1 = sensitiveWordBs.findAll("天安门", WordResultHandlers.wordTags());
+        Assert.assertEquals("[WordTagsDto{word='天安门', tags=[政治, 国家, 地址]}]", wordTagsDtoList1.toString());
+    }
+
    @Test
    public void wordTagsTest() {
        // Custom tag source for the test
-        final Map<String, Set<String>> wordTagMap = new HashMap<>();
-        addLine("0售 广告", wordTagMap);
-        addLine("天安门 政治,国家,地址", wordTagMap);
-        IWordTag wordTag = new AbstractWordTag() {
-            @Override
-            protected Set<String> doGetTag(String word) {
-                return wordTagMap.get(word);
-            }
-        };
-
+        IWordTag wordTag = WordTags.lines(Arrays.asList("0售 广告", "天安门 政治,国家,地址"));
        // Initialize with the custom tag source
        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
                .wordTag(wordTag)
--- a/src/test/java/com/github/houbb/sensitive/word/support/tag/WordTagTest.java
+++ b/src/test/java/com/github/houbb/sensitive/word/support/tag/WordTagTest.java
@@ -0,0 +1,113 @@
+package com.github.houbb.sensitive.word.support.tag;
+
+import com.github.houbb.sensitive.word.api.IWordTag;
+import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.*;
+
+/**
+ * Word tag tests
+ * @since 0.24.0
+ */
+public class WordTagTest {
+
+    public static void main(String[] args) {
+        final String path = "D:\\github\\sensitive-word\\src\\test\\resources\\dict_tag_test.txt";
+        IWordTag wordTag = WordTags.file(path);
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(wordTag)
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("零售");
+        Assert.assertEquals("[广告, 网络]", tagSet.toString());
+
+
+        IWordTag wordTag2 = WordTags.file(path, " ", ",");
+        SensitiveWordBs sensitiveWordBs2 = SensitiveWordBs.newInstance()
+                .wordTag(wordTag2)
+                .init();
+        Set<String> tagSet2 = sensitiveWordBs2.tags("零售");
+        Assert.assertEquals("[广告, 网络]", tagSet2.toString());
+    }
+
+    @Test
+    public void noneTest() {
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.none())
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("博彩");
+        Assert.assertEquals("[]", tagSet.toString());
+    }
+
+    @Test
+    public void defaultsTest() {
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.defaults())
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("博彩");
+        Assert.assertEquals("[3]", tagSet.toString());
+    }
+
+    @Test
+    public void systemTest() {
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.system())
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("博彩");
+        Assert.assertEquals("[3]", tagSet.toString());
+    }
+
+    @Test
+    public void linesTest() {
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.lines(Arrays.asList("博彩 赌博")))
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("博彩");
+        Assert.assertEquals("[赌博]", tagSet.toString());
+    }
+
+    @Test
+    public void lines2Test() {
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.lines(Arrays.asList("博彩:赌博,网络"), ":", ","))
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("博彩");
+        Assert.assertEquals("[网络, 赌博]", tagSet.toString());
+    }
+
+    @Test
+    public void mapTest() {
+        Map<String, Set<String>> wordTagMap = new HashMap<>();
+        Set<String> initTagSet = new HashSet<>();
+        initTagSet.add("广告");
+        initTagSet.add("网络");
+        wordTagMap.put("零售", initTagSet);
+
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.map(wordTagMap))
+                .init()
+                ;
+
+        Set<String> tagSet = sensitiveWordBs.tags("零售");
+        Assert.assertEquals("[广告, 网络]", tagSet.toString());
+    }
+
+    @Test
+    public void chainsTest() {
+        IWordTag wordTag = WordTags.lines(Arrays.asList("零售 广告,网络"));
+        SensitiveWordBs sensitiveWordBs = SensitiveWordBs.newInstance()
+                .wordTag(WordTags.chains(WordTags.none(), wordTag))
+                .init();
+
+        Set<String> tagSet = sensitiveWordBs.tags("零售");
+        Assert.assertEquals("[广告, 网络]", tagSet.toString());
+    }
+
+}
--- a/src/test/resources/dict_tag_test.txt
+++ b/src/test/resources/dict_tag_test.txt
@@ -1,4 +1,4 @@
 五星红旗 政治,国家
 毛主席 政治,国家,伟人
 天安门 政治,国家,地址
-0售 广告
+0售 广告,网络