Advanced Usage
- Full / partial translation
- Specify source and target languages
- Translate with different services
- Translate with exceptions
- Multi-threads
- Custom prompt
Full / partial translation

- Entire document
pdf2zh example.pdf
- Part of the document
pdf2zh example.pdf -p 1-3,5
Specify source and target languages

Use -li to set the source language and -lo to set the target language; see the Google language codes and DeepL language codes:
pdf2zh example.pdf -li en -lo ja
Translate with different services

We've provided a detailed table of the environment variables required by each translation service. Make sure to set them before using the respective service.
Translator | Service | Environment Variables | Default Values | Notes |
---|---|---|---|---|
Google (Default) | google | None | N/A | None |
Bing | bing | None | N/A | None |
DeepL | deepl | DEEPL_AUTH_KEY | [Your Key] | See DeepL |
DeepLX | deeplx | DEEPLX_ENDPOINT | https://api.deepl.com/translate | See DeepLX |
Ollama | ollama | OLLAMA_HOST, OLLAMA_MODEL | http://127.0.0.1:11434, gemma2 | See Ollama |
OpenAI | openai | OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | https://api.openai.com/v1, [Your Key], gpt-4o-mini | See OpenAI |
AzureOpenAI | azure-openai | AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_MODEL | [Your Endpoint], [Your Key], gpt-4o-mini | See Azure OpenAI |
Zhipu | zhipu | ZHIPU_API_KEY, ZHIPU_MODEL | [Your Key], glm-4-flash | See Zhipu |
ModelScope | ModelScope | MODELSCOPE_API_KEY, MODELSCOPE_MODEL | [Your Key], Qwen/Qwen2.5-Coder-32B-Instruct | See ModelScope |
Silicon | silicon | SILICON_API_KEY, SILICON_MODEL | [Your Key], Qwen/Qwen2.5-7B-Instruct | See SiliconCloud |
Gemini | gemini | GEMINI_API_KEY, GEMINI_MODEL | [Your Key], gemini-1.5-flash | See Gemini |
Azure | azure | AZURE_ENDPOINT, AZURE_API_KEY | https://api.translator.azure.cn, [Your Key] | See Azure |
Tencent | tencent | TENCENTCLOUD_SECRET_ID, TENCENTCLOUD_SECRET_KEY | [Your ID], [Your Key] | See Tencent |
Dify | dify | DIFY_API_URL, DIFY_API_KEY | [Your DIFY URL], [Your Key] | See Dify. Three variables, lang_out, lang_in, and text, need to be defined in Dify's workflow input. |
AnythingLLM | anythingllm | AnythingLLM_URL, AnythingLLM_APIKEY | [Your AnythingLLM URL], [Your Key] | See anything-llm |
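For example, to use the DeepL service you would set its key before running pdf2zh. A minimal sketch for bash (on Windows cmd, use set instead of export; the key value is a placeholder):

export DEEPL_AUTH_KEY=[Your Key]
pdf2zh example.pdf -s deepl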
Use -s service or -s service:model to specify the translation service:
pdf2zh example.pdf -s openai:gpt-4o-mini
Or specify model with environment variables:
set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
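The set command above is Windows cmd syntax; the bash equivalent on Linux or macOS would be:

export OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai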
Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved:
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
Preserve Latex, Mono, Code, Italic, Symbol and Math fonts by default:
pdf2zh example.pdf -f "(CM[^R]|(MS|XY|MT|BL|RM|EU|LA|RS)[A-Z]|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
Multi-threads

Use -t to specify how many threads to use for translation:
pdf2zh example.pdf -t 1
Custom prompt

Use --prompt to specify a custom prompt file for LLM-based translators:
pdf2zh example.pdf -pr prompt.txt
Example prompt.txt:
[
    {
        "role": "system",
        "content": "You are a professional, authentic machine translation engine."
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:"
    }
]
In the custom prompt file, three variables can be used.

Variable | Comment |
---|---|
lang_in | input language |
lang_out | output language |
text | text to be translated |
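These variables are referenced with the ${...} syntax shown in the example above. As a purely hypothetical variant, a user message could also include the source language:

"content": "Translate the following text from ${lang_in} to ${lang_out}. Output the translation only.\nSource Text: ${text}\nTranslated Text:"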