Advanced Usage



Full / partial translation

  • Entire document

    pdf2zh example.pdf
  • Part of the document

    pdf2zh example.pdf -p 1-3,5
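
As a rough illustration of how a page specification such as 1-3,5 can be read, here is a minimal Python sketch. The helper name parse_page_spec is hypothetical and this is not pdf2zh's own parser; it only assumes that ranges are inclusive and that items are comma-separated.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1-3,5" into a list of page numbers.

    Hypothetical helper for illustration; pdf2zh's real parsing may differ.
    """
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))  # inclusive range
        else:
            pages.append(int(part))
    return pages


print(parse_page_spec("1-3,5"))  # [1, 2, 3, 5]
```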


Specify source and target languages

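See the Google language codes and the DeepL language codes for the supported values. Use -li for the source language and -lo for the target language: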

pdf2zh example.pdf -li en -lo ja


Translate with different services

The table below lists the environment variables each translation service requires. Set them before using the corresponding service.

| Translator | Service | Environment Variables | Default Values | Notes |
|---|---|---|---|---|
| Google (Default) | google | None | N/A | None |
| Bing | bing | None | N/A | None |
| DeepL | deepl | DEEPL_AUTH_KEY | [Your Key] | See DeepL |
| DeepLX | deeplx | DEEPLX_ENDPOINT | https://api.deepl.com/translate | See DeepLX |
| Ollama | ollama | OLLAMA_HOST, OLLAMA_MODEL | http://127.0.0.1:11434, gemma2 | See Ollama |
| OpenAI | openai | OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | https://api.openai.com/v1, [Your Key], gpt-4o-mini | See OpenAI |
| AzureOpenAI | azure-openai | AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_MODEL | [Your Endpoint], [Your Key], gpt-4o-mini | See Azure OpenAI |
| Zhipu | zhipu | ZHIPU_API_KEY, ZHIPU_MODEL | [Your Key], glm-4-flash | See Zhipu |
| ModelScope | ModelScope | MODELSCOPE_API_KEY, MODELSCOPE_MODEL | [Your Key], Qwen/Qwen2.5-Coder-32B-Instruct | See ModelScope |
| Silicon | silicon | SILICON_API_KEY, SILICON_MODEL | [Your Key], Qwen/Qwen2.5-7B-Instruct | See SiliconCloud |
| Gemini | gemini | GEMINI_API_KEY, GEMINI_MODEL | [Your Key], gemini-1.5-flash | See Gemini |
| Azure | azure | AZURE_ENDPOINT, AZURE_API_KEY | https://api.translator.azure.cn, [Your Key] | See Azure |
| Tencent | tencent | TENCENTCLOUD_SECRET_ID, TENCENTCLOUD_SECRET_KEY | [Your ID], [Your Key] | See Tencent |
| Dify | dify | DIFY_API_URL, DIFY_API_KEY | [Your DIFY URL], [Your Key] | See Dify. Three variables, lang_out, lang_in, and text, need to be defined in Dify's workflow input. |
| AnythingLLM | anythingllm | AnythingLLM_URL, AnythingLLM_APIKEY | [Your AnythingLLM URL], [Your Key] | See anything-llm |

Use -s service or -s service:model to select a service and, optionally, a model:

pdf2zh example.pdf -s openai:gpt-4o-mini

Or specify the model with an environment variable (set is Windows syntax; use export on Linux or macOS):

set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
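
To make the two forms concrete, the following Python sketch shows one way a service:model value could be split, with a fallback to an environment variable when no model is given. The function names split_service_spec and resolve_model are hypothetical, and the precedence shown (explicit model first, then environment variable, then default) is an assumption for illustration, not a statement about pdf2zh's internals.

```python
from typing import Mapping, Optional, Tuple


def split_service_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "service" or "service:model" on the first colon only."""
    service, sep, model = spec.partition(":")
    return service, (model if sep and model else None)


def resolve_model(
    spec: str, env: Mapping[str, str], env_var: str, default: str
) -> Tuple[str, str]:
    """Pick the model from the spec, else from env, else the default (assumed order)."""
    service, model = split_service_spec(spec)
    return service, model or env.get(env_var, default)


print(split_service_spec("openai:gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
print(split_service_spec("openai"))              # ('openai', None)
# Model names may contain a colon themselves, so only the first one splits.
print(split_service_spec("ollama:gemma2:9b"))    # ('ollama', 'gemma2:9b')
print(resolve_model("openai", {}, "OPENAI_MODEL", "gpt-4o-mini"))
```

Splitting on the first colon only keeps model tags that contain a colon intact.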


Translate with exceptions

Use regular expressions to specify the formula fonts (-f) and characters (-c) that should be preserved rather than translated:

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"

LaTeX, Mono, Code, Italic, Symbol and Math fonts are preserved by default. The command below spells out that default pattern:

pdf2zh example.pdf -f "(CM[^R]|(MS|XY|MT|BL|RM|EU|LA|RS)[A-Z]|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
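
Before passing a font pattern to -f, it can help to check which font names it would catch. The sketch below tests the default pattern above against a few sample font names in Python. It assumes the pattern is anchored at the start of the font name (re.match); the helper is_preserved_font and the sample names are illustrative only.

```python
import re

# The default font pattern from the command above, split across two lines.
FONT_PATTERN = re.compile(
    r"(CM[^R]|(MS|XY|MT|BL|RM|EU|LA|RS)[A-Z]|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary"
    r"|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
)


def is_preserved_font(name: str) -> bool:
    """Return True if the font name matches the pattern from its first character."""
    return FONT_PATTERN.match(name) is not None


samples = ["CMMI10", "CMR10", "MSBM10", "DejaVuSansMono", "Times-Italic", "Helvetica"]
for name in samples:
    print(f"{name}: {is_preserved_font(name)}")
```

Under this assumption, CMMI10 (Computer Modern math italic) matches, while CMR10 (Computer Modern roman) does not, because CM[^R] excludes names whose third letter is R.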


Multi-threads

Use -t to specify how many threads to use in translation:

pdf2zh example.pdf -t 1
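
For intuition about what a thread count controls, here is a minimal Python sketch of translating several paragraphs concurrently with a fixed-size thread pool. It is not pdf2zh's implementation: translate_all and fake_translate are hypothetical stand-ins, and a real translator would make a network request where fake_translate formats a string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def translate_all(
    paragraphs: List[str], translate: Callable[[str], str], threads: int
) -> List[str]:
    """Translate paragraphs with up to `threads` workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the order of the inputs.
        return list(pool.map(translate, paragraphs))


def fake_translate(text: str) -> str:
    """Stand-in for a real translation request (hypothetical)."""
    return f"[ja] {text}"


print(translate_all(["Hello", "World"], fake_translate, threads=2))
# ['[ja] Hello', '[ja] World']
```

More threads mainly help when each translation waits on a remote service. Many services also enforce rate limits, so a higher value is not always faster.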


Custom prompt

Use --prompt to specify the prompt file to use with LLM-based services:

pdf2zh example.pdf --prompt prompt.txt

Example prompt.txt:

[
    {
        "role": "system",
        "content": "You are a professional, authentic machine translation engine."
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:"
    }
]

Three variables can be used in a custom prompt file:

| Variable | Description |
|---|---|
| lang_in | input language |
| lang_out | output language |
| text | text to be translated |
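
The ${name} placeholders follow the same syntax as Python's string.Template. The sketch below shows how such placeholders could be filled in for each message. It assumes Template-style substitution and uses a shortened version of the example prompt; fill_prompt is a hypothetical helper, not part of pdf2zh.

```python
from string import Template
from typing import Dict, List

# Shortened version of the example prompt, written as Python data.
MESSAGES: List[Dict[str, str]] = [
    {
        "role": "system",
        "content": "You are a professional, authentic machine translation engine.",
    },
    {
        "role": "user",
        "content": (
            "Translate the following markdown source text to ${lang_out}. "
            "Keep the formula notation {{v*}} unchanged.\n"
            "Source Text: ${text}\nTranslated Text:"
        ),
    },
]


def fill_prompt(
    messages: List[Dict[str, str]], lang_in: str, lang_out: str, text: str
) -> List[Dict[str, str]]:
    """Substitute the three variables into every message's content."""
    values = {"lang_in": lang_in, "lang_out": lang_out, "text": text}
    return [
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        {"role": m["role"], "content": Template(m["content"]).safe_substitute(values)}
        for m in messages
    ]


filled = fill_prompt(MESSAGES, lang_in="en", lang_out="ja", text="Hello, world.")
print(filled[1]["content"])
```

Only $-prefixed names are replaced, so the {{v*}} formula marker passes through unchanged.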
