Signed-off-by: DanRoscigno <dan@roscigno.com>
Translating docs with GPT
This README describes using GPT-4o to translate from Chinese to English, or from English to Chinese or Japanese. The system used is specific to Docusaurus Markdown and MDX. We are using code provided by Weights and Biases, as they also use Docusaurus and have expertise with LLMs.
1. Build the Docker image
See the README in the DanRoscigno/gpt-translate repo. Build the Docker image from there.
2. Set up the environment
There are three environment variables that need to be set in the file starrocks/docs/translation/.env:
Tip
Copy the file starrocks/docs/translation/.env.sample to starrocks/docs/translation/.env and edit it with your API keys, or contact the Doc team leader for keys.
- OPENAI_API_KEY
- WANDB_API_KEY
- GIT_PYTHON_REFRESH
GIT_PYTHON_REFRESH should be set to quiet because we are not interacting with Git within the container. The other two environment variables will be provided by the Documentation Team leader.
The file starrocks/docs/translation/.env uses this format:
OPENAI_API_KEY=sk-proj-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
WANDB_API_KEY=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
GIT_PYTHON_REFRESH=quiet
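Before starting the container, it can help to confirm that all three variables are present in the file. The following is a minimal sketch, assuming a POSIX shell; check_env is a hypothetical helper written for this README and is not part of the translation tooling.

```shell
# Hypothetical helper (not part of gpt-translate): report which of the three
# required variables are missing or empty in an env file.
check_env() {
  env_file="$1"
  missing=0
  for var in OPENAI_API_KEY WANDB_API_KEY GIT_PYTHON_REFRESH; do
    # A variable counts as set only if "NAME=" is followed by at least one character
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_env .env from the starrocks/docs/translation/ folder prints one line per missing variable and returns a non-zero status if any are missing.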
3. Identify the files to translate
List the paths of the files to translate in the file starrocks/docs/translation/files.txt.
The entries in the file should be relative to the starrocks/docs/translation/ folder, for example:
Tip
The only difference in the files.txt content below is the name of the directory, zh or en. Add the paths to the files that need to be translated.
From English to Chinese or Japanese
From Chinese to English
Tip:
Use find to get all of the Markdown files under a directory. For example, to translate all of the include files under docs/en/_assets/:
find ../en/_assets -type f -name "*\.md*" > files.txt
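Because the entries in files.txt are relative paths, a quick check that each one resolves to a real file can catch typos before any API calls are made. This is a minimal sketch, assuming a POSIX shell and that it is run from the starrocks/docs/translation/ folder; check_files_list is a hypothetical helper, not part of the translation tooling.

```shell
# Hypothetical helper (not part of gpt-translate): report entries in a file
# list that do not exist on disk. Paths are resolved against the current
# directory, so run it from the folder the entries are relative to.
check_files_list() {
  list="$1"
  status=0
  # The "|| [ -n ... ]" clause also handles a last line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    if [ -z "$path" ]; then
      continue  # skip blank lines
    fi
    if [ ! -f "$path" ]; then
      echo "not found: $path"
      status=1
    fi
  done < "$list"
  return "$status"
}
```

For example, check_files_list files.txt prints nothing and returns zero when every listed file exists.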
Known issues:
Pydantic error about serialization: An error is raised after the translation is complete. I have not debugged this yet. I will see if the author of the translation package can help, as he knows way more about Python than I do.
4. Translate the docs
Change directory back up to the starrocks folder so that you can mount the docs/ folder in the container.
cd ../../
Translate the files:
From English to Chinese
From English to Japanese
From Chinese to English
Sample output
You should see output similar to this if you are translating a single file:
Logged in as Weights & Biases user: danroscigno.
View Weave data at https://wandb.ai/danroscigno-starrocks/gpt-translate/weave
[14:56:11] INFO config_folder: ./configs cli.py:70
debug: false
do_translate_header_description: true
input_file: ./files.txt
input_folder: ../en
language: zh
limit: null
max_openai_concurrent_calls: 7
max_tokens: 4096
model: gpt-4o
out_file: ' intro_ja.md'
out_folder: ../zh
remove_comments: true
replace: true
silence_openai: true
temperature: 0.2
weave_project: gpt-translate
INFO Reading ./files.txt translate.py:195
INFO Translating 1 file translate.py:202
Translating files: 0%| | 0/1 [00:00<?, ?it/s][14:56:34] INFO ✅ Translated file saved to translate.py:169
../zh/developers/trace-tools/Trace.md
Translating files: 100%|██████████| 1/1 [00:22<00:00, 22.63s/it]
📦 Published to https://wandb.ai/danroscigno-starrocks/gpt-translate/weave/objects/Translation-zh/versions/UaI7t2Vtn8iI2Zcw7bQVHV1BMPUHYDY8cqBfcTW3QYQ
🍩 https://wandb.ai/danroscigno-starrocks/gpt-translate/r/call/0192dded-553f-7492-a82b-11539ad42bfa
Check the files
Once the translation is complete, the container exits. Run git status to see which files changed, then review the translated file(s).
Sample git status
My starrocks/docs/translation/files.txt contained a single path to the English Trace.md file, so a translated file was produced at docs/zh/developers/trace-tools/Trace.md:
$ git status
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: docs/zh/developers/trace-tools/Trace.md
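When many files are translated at once, it may be easier to list only the changes under the output folder. This is a minimal sketch, assuming a POSIX shell and that it is run from the root of the starrocks repository; changed_under is a hypothetical helper, while the git status options it uses (--porcelain, --untracked-files=all) are standard Git.

```shell
# Hypothetical helper: print the paths under a folder that git reports as
# modified or untracked. Run from the repository root.
changed_under() {
  git status --porcelain --untracked-files=all -- "$1" |
    while IFS= read -r line; do
      # Porcelain v1 lines look like "XY <path>"; drop the two status
      # columns and the space that follows them
      printf '%s\n' "${line#???}"
    done
}
```

For example, changed_under docs/zh lists the Chinese files touched by an English-to-Chinese run, one path per line.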