Replace the translate GitHub workflow with a Docker container.
To translate an English doc:
Set up the environment
There are three environment variables that need to be set in the file starrocks/docs/translation/.env:
OPENAI_API_KEY
WANDB_API_KEY
GIT_PYTHON_REFRESH
GIT_PYTHON_REFRESH should be set to quiet because we are not interacting with Git within the container. The other two environment variables will be provided by the Documentation Team leader.
The file uses this format:
OPENAI_API_KEY=sk-proj-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
WANDB_API_KEY=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
GIT_PYTHON_REFRESH=quiet
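Before building or running the container, it can help to confirm that the .env file defines all three variables. The following is a minimal sketch, not part of this PR; the function name check_env_file is hypothetical, and it only checks that each variable is present and non-empty, not that the keys are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): report which of the three
# required variables are missing or empty in an env file.
check_env_file() {
  local env_file="$1" missing=0 name
  for name in OPENAI_API_KEY WANDB_API_KEY GIT_PYTHON_REFRESH; do
    # Require "NAME=" at the start of a line followed by at least one character.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty: ${name}"
      missing=1
    fi
  done
  # Return 0 when all three variables are set, 1 otherwise.
  return "$missing"
}
```

From the starrocks folder this could be called as check_env_file docs/translation/.env.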
Files to translate
Provide the paths of the files to translate in the file starrocks/docs/translation/files.txt
The entries in the file should be relative to the starrocks/docs/translation/ folder, for example:
../en/quick_start/quick_start.md
../en/deployment/helm.md
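Because the entries are relative to the translation folder, a typo in a path is easy to miss. The following is a minimal sketch, not part of this PR, that resolves each entry against the folder containing files.txt and reports any that do not exist. The function name check_files_list is hypothetical, and skipping blank lines is a choice made for this sketch, not a statement about how translate.sh reads the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): list entries in files.txt
# that do not resolve to an existing file.
check_files_list() {
  local list="$1" base entry bad=0
  # Entries are resolved relative to the folder that contains files.txt.
  base="$(dirname "$list")"
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r entry || [ -n "$entry" ]; do
    [ -z "$entry" ] && continue   # skip blank lines
    if [ ! -f "${base}/${entry}" ]; then
      echo "not found: ${entry}"
      bad=1
    fi
  done < "$list"
  # Return 0 when every entry exists, 1 otherwise.
  return "$bad"
}
```

From the starrocks folder this could be called as check_files_list docs/translation/files.txt.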
Build the Docker image
This usually needs to be done only once; rebuild the image if the folks at Weights & Biases update the gpt_translate Python package.
cd docs/translation
docker build -f translation.Dockerfile -t translate .
Translate the docs
Change directory back up to the starrocks folder so that you can mount the docs/ folder in the container.
cd ../../
Translate the files:
docker run -v ./docs:/docs \
--env-file ./docs/translation/.env \
translate \
bash /docs/translation/scripts/translate.sh
Check the files
Once the translation is complete, the container exits. Run git status to see which files changed, then review the translated file(s).
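To narrow the git status output to the docs tree, something like the following sketch may be useful. It is not part of this PR; the function name changed_docs is hypothetical, and it assumes the translated files are written somewhere under docs/ in the mounted folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print paths under docs/ that
# Git reports as new or modified. Expects the root of the working tree.
changed_docs() {
  local repo_root="$1"
  # --porcelain prints stable "XY path" lines; cut drops the two status
  # columns and the separating space, leaving only the path.
  git -C "$repo_root" status --porcelain --untracked-files=all -- docs | cut -c4-
}
```

From the starrocks folder this could be called as changed_docs . to get one path per line.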
Signed-off-by: DanRoscigno <dan@roscigno.com>
This PR adds a workflow that translates docs from English to Chinese. I can provide the GPT and W&B secrets.
Note:
This requires three repository secrets, one of which holds a PAT. The details are below.
Detect changed Markdown or MDX files using tj-actions/changed-files
Translate Docusaurus Markdown from English to Chinese using tcapelle/gpt_translate
Automatically open a PR with the Chinese docs using peter-evans/create-pull-request
Setup
GitHub Personal Access Token
I used a fine-grained token limited to this repo. Here are the permissions I gave:
Read and Write access to code (contents)
Read and Write access to pull requests
Additionally, GitHub automatically assigned:
Read access to metadata
Repository secrets
I created three repository secrets:
TRANSLATE_PAT
This contains the PAT created above.
OPENAI_API_KEY
This contains an OpenAI API key with access to GPT-4.
WANDB_API_KEY
This contains the API key provided by W&B authorize.
Workflow file notes
The workflow runs when a pull request targeting main is closed by being merged. It does not run on every commit pushed to a PR, as that would be wasteful.
The paths and filters in the workflow are specific to the StarRocks docs; you will need to change them for your own repo.
gpt_translate configuration
The configuration is in the configs dir in this repo. At StarRocks we use a modified prompt and temperature in addition to a custom glossary in configs/language_dicts/zh.yaml. You can compare our prompt with the default from tcapelle/gpt_translate.
About Weights and Biases
You should read about Weights & Biases on their site; I am no AI expert. I do think the Weave feature is going to be useful as I tune the prompt and glossary I am using, because Weave keeps track of the changes and their impact. Additionally, W&B automatically validates the translations and provides scores.
Signed-off-by: DanRoscigno <dan@roscigno.com>