且构网

How to use a Hindi model in RASA NLU?

Updated: 2023-12-02 21:56:16

It seems that you have successfully trained the hi model with spaCy. The next step is to write a config file like this:

language: "hi"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"

If the hi model you just trained also includes a tokenizer, you can replace tokenizer_whitespace with tokenizer_spacy.

I should mention that the new Rasa intent classifier, which is based on TensorFlow, does not need the word vectors of your hi model; it learns the word vectors from scratch, see here. For entity extraction you don't need the hi model either; the tokenizer alone does the job. So, overall, you can build your bot even without the hi model.
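To make the "from scratch" point concrete, here is a minimal sketch of count-vector featurization over whitespace tokens. It is not Rasa's actual implementation, and the Hindi sentences and helper names are made up for illustration. It only shows that features like these can be built from the training sentences alone, with no pretrained hi word vectors involved.

```python
from collections import Counter


def whitespace_tokenize(text):
    """Split on whitespace, roughly what a whitespace tokenizer does."""
    return text.split()


def build_vocabulary(sentences):
    """Map each distinct token to a column index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in whitespace_tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def count_vector(sentence, vocab):
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    counts = Counter(whitespace_tokenize(sentence))
    return [counts.get(token, 0) for token in vocab]


# Hypothetical Hindi training sentences (illustrative only).
sentences = ["मुझे दिल्ली का मौसम बताओ", "मुझे मुंबई का मौसम बताओ"]
vocab = build_vocabulary(sentences)
vector = count_vector("दिल्ली का मौसम", vocab)
```

Since the vocabulary comes only from your own examples, the quality of the features depends on how well the training sentences cover the words your users will type.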

The training data file can be JSON or Markdown, as fully explained in the docs. I think the names of your intents and entities should be in English, but the sample queries can clearly be in any UTF-8 language, such as Hindi.
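As a rough illustration of such a file, the sketch below builds a small JSON training set with English intent and entity names and Hindi sample queries. The `rasa_nlu_data` / `common_examples` layout is what I remember from the legacy rasa_nlu JSON format, so check it against the docs for your version. The intents, entities, sentences, and the `make_example` helper are all hypothetical.

```python
import json


def make_example(text, intent, entities=()):
    """Build one training example; entity offsets are found by substring search."""
    spans = []
    for value, entity in entities:
        start = text.index(value)  # raises ValueError if the value is absent
        spans.append({"start": start, "end": start + len(value),
                      "value": value, "entity": entity})
    return {"text": text, "intent": intent, "entities": spans}


# Hypothetical examples: English names, Hindi sample queries.
examples = [
    make_example("मुझे दिल्ली का मौसम बताओ", "ask_weather", [("दिल्ली", "city")]),
    make_example("नमस्ते", "greet"),
]
training_data = {"rasa_nlu_data": {"common_examples": examples}}

# ensure_ascii=False keeps the Hindi text readable instead of \u escapes.
serialized = json.dumps(training_data, ensure_ascii=False, indent=2)
with open("YOUR_TRAIN_DATA.json", "w", encoding="utf-8") as f:
    f.write(serialized)
```

Writing the file with an explicit `encoding="utf-8"` avoids depending on the platform default encoding, which matters for Devanagari text.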

Then you can train your model using any of the methods explained in the docs. For example:

python3 -m rasa_nlu.train \
    --config YOUR_CONFIG_FILE.yml \
    --data YOUR_TRAIN_DATA.json \
    --path PATH_TO_SAVE_MODEL
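The same training step can also be driven from Python. The following is only a sketch, assuming a legacy rasa_nlu release (roughly 0.12 to 0.15) in which `config.load`, `load_data`, `Trainer`, and `Interpreter` are available under these module paths; newer Rasa versions are organized differently. The function name `train_and_parse` is my own.

```python
def train_and_parse(config_path, data_path, model_dir, text):
    """Train a model, save it, reload it, and parse one message.

    Imports are done lazily so this file can be loaded even where
    rasa_nlu is not installed.
    """
    from rasa_nlu import config
    from rasa_nlu.model import Interpreter, Trainer
    from rasa_nlu.training_data import load_data

    training_data = load_data(data_path)
    trainer = Trainer(config.load(config_path))
    trainer.train(training_data)
    # persist() returns the directory the trained model was written to.
    saved_dir = trainer.persist(model_dir)

    interpreter = Interpreter.load(saved_dir)
    return interpreter.parse(text)


# Example call (placeholders as in the command above):
# result = train_and_parse("YOUR_CONFIG_FILE.yml", "YOUR_TRAIN_DATA.json",
#                          "PATH_TO_SAVE_MODEL", "मुझे दिल्ली का मौसम बताओ")
```

The returned value should be a dictionary containing the predicted intent and any extracted entities for the Hindi query.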

You can find a good quick start in the docs.