Effortless mass text translation with PowerShell and ChatGPT (OpenAI API)


Recently, I faced the monumental task of translating a mountain of text files from English into French, German, Spanish, and Italian.
The endless copy/paste routine was driving me bonkers, so I had a lightbulb moment: why not let OpenAI's API handle it?
The best part? With their LLM (ChatGPT), I can tune the prompt so the translations nail the context every time and stick to a specific language. And it’s fully automated.
Effortless and enjoyable translation!

Alright, let’s conjure up a minion to do the work for me!

Let's start with the process flow diagram:

The challenges ahead are:

  • Managing a high quantity of different text files.
  • Handling text files with huge numbers of words and sentences.
  • Staying under the token limit on each API call.
  • Maintaining context accuracy while translating sentences.
  • Keeping translations concise and close to the original text.
  • Ensuring the ability to translate into any language.
  • Automating the entire process to handle large numbers of text files without human intervention.

First, fire up PowerShell and define the variables the script needs:

$folderPath = "C:\Users\grego\Downloads\Speak2text"
$apiToken = "your OpenAI API key here"
$lang = "French"
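
If you would rather not paste the key into the script, one option is to read it from an environment variable. This is only a sketch: Get-ApiTokenFromEnvironment and the OPENAI_API_KEY variable name are assumptions of mine, not part of the original script.

```powershell
# Hypothetical helper: read the API key from an environment variable
# instead of hard-coding it in the script.
function Get-ApiTokenFromEnvironment {
    param (
        [string]$VariableName = "OPENAI_API_KEY"  # assumed variable name
    )

    $value = [System.Environment]::GetEnvironmentVariable($VariableName)
    if ([string]::IsNullOrWhiteSpace($value)) {
        throw "Environment variable '$VariableName' is not set."
    }
    return $value
}
```

With the variable set, `$apiToken = Get-ApiTokenFromEnvironment` would replace the hard-coded line above.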

Next, let's define the function that splits the text into sentences while keeping track of line breaks:

function Split-TextIntoSentences {
    param (
        [string]$text
    )
    
    # Split on line breaks first (handles both Windows and Unix line endings)
    $lines = $text -split '\r?\n'
    $sentences = @()

    foreach ($line in $lines) {
        if ($line -ne "") {
            # Then split each line after every period that is followed by whitespace
            $lineSentences = [System.Text.RegularExpressions.Regex]::Split($line, "(?<=\.)\s+")
            $sentences += $lineSentences
            $sentences += "`n"  # Preserve line break
        } else {
            $sentences += "`n"  # Preserve empty line
        }
    }
    
    return $sentences
}
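
To get a feel for what the function hands back, here is a minimal sketch of the same two steps on a sample string: a plain `\r?\n` line split, then a period lookbehind like the function's. The sample text is made up for illustration.

```powershell
# Made-up sample: two sentences on the first line, one on the second
$sample = "Backups run nightly. Restores take minutes.`r`nAlerts are sent by email."

# Step 1: split into lines. Step 2: split each line after a period followed by whitespace.
$pieces = foreach ($line in ($sample -split '\r?\n')) {
    [regex]::Split($line, '(?<=\.)\s+')
}

# Show each piece between brackets to make the boundaries visible
$pieces | ForEach-Object { "[$_]" }
```

Each piece is one sentence with its period still attached, which is what gets sent to the API one call at a time.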

Then, define the function that calls ChatGPT through the OpenAI API, one sentence per request:

function Translate-TextWithChatGPT {
    param (
        [string]$inputFilePath,
        [string]$apiToken,
        [string]$lang
    )

    # Read the content of the input file
    $inputText = Get-Content -Path $inputFilePath -Raw

    # Define the API endpoint and headers
    $apiUrl = "https://api.openai.com/v1/chat/completions"
    $headers = @{
        "Content-Type" = "application/json"
        "Authorization" = "Bearer $apiToken"
    }

    # Define the pre-prompt with the target language
    $prePrompt = "You're a native $lang speaker. You will translate the sentences from US English to $lang. The text is about some features from Acronis Cyber Protect Cloud. Keep the sentences in the same order and structure as the original. Don't make the translated sentences more than 25% longer than the original ones."

    # Split the input text into sentences
    $sentences = Split-TextIntoSentences -text $inputText

    $translatedText = ""
    foreach ($sentence in $sentences) {
        if ($sentence -eq "`n") {
            $translatedText += "`r`n"
            continue
        }

        if (-not [string]::IsNullOrWhiteSpace($sentence)) {
            $messages = @(
                @{
                    "role" = "system"
                    "content" = $prePrompt
                },
                @{
                    "role" = "user"
                    "content" = $sentence
                }
            )

            $body = @{
                "model" = "gpt-4o"
                "messages" = $messages
                "max_tokens" = 4096
            } | ConvertTo-Json

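            # Send the body as UTF-8 bytes so accented or other non-ASCII characters are not mangled in transit
            $response = Invoke-RestMethod -Uri $apiUrl -Method Post -Headers $headers -Body ([System.Text.Encoding]::UTF8.GetBytes($body))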
            $translatedText += $response.choices[0].message.content -replace "`n", "`r`n"
        }
    }

A few knobs can be adjusted as needed: "max_tokens" caps the length of each reply, "model" can be gpt-3.5-turbo, gpt-4 or gpt-4o, and $prePrompt can be reworded as desired. Keep in mind that the target language comes from $lang.

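Still inside the function, build the output path by replacing "_todo" with "_translated" at the end of the file name (the save and cleanup calls that close the function appear in the complete script at the end):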

    $outputFileName = [System.IO.Path]::GetFileNameWithoutExtension($inputFilePath) -replace "_todo$", "_translated"
    $outputFilePath = [System.IO.Path]::Combine([System.IO.Path]::GetDirectoryName($inputFilePath), "$outputFileName.txt")
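
As a quick check of the renaming rule, here is a small sketch on a hypothetical file name (chapter1_todo.txt is made up for the example):

```powershell
# Hypothetical input file name, for illustration only
$inputFileName = "chapter1_todo.txt"

# Drop the extension, then swap the trailing "_todo" for "_translated"
$baseName = [System.IO.Path]::GetFileNameWithoutExtension($inputFileName)
$outputFileName = ($baseName -replace "_todo$", "_translated") + ".txt"

$outputFileName  # chapter1_translated.txt
```

The `$` in the pattern anchors the match to the end of the name, so a "_todo" in the middle of a file name is left alone.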

Finally, once every function is defined, process each file in the folder whose name ends with "_todo":

Get-ChildItem -Path $folderPath -Filter *_todo.txt | ForEach-Object {
    Translate-TextWithChatGPT -inputFilePath $_.FullName -apiToken $apiToken -lang $lang
}
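
Each file processed this way triggers one API call per sentence, so a transient failure (a rate limit, a network hiccup) can cost a sentence. One possible safeguard is a small retry wrapper with a growing delay. This is a sketch under my own assumptions: Invoke-WithRetry, its parameters, and the default delays are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: run a script block, retrying with a doubling delay on failure
function Invoke-WithRetry {
    param (
        [scriptblock]$Action,
        [int]$MaxAttempts = 3,
        [int]$InitialDelaySeconds = 2
    )

    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        } catch {
            # Give up and rethrow once the last attempt has failed
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay s."
            Start-Sleep -Seconds $delay
            $delay *= 2
        }
    }
}
```

Inside Translate-TextWithChatGPT, the API call could then be wrapped as `$response = Invoke-WithRetry -Action { Invoke-RestMethod -Uri $apiUrl -Method Post -Headers $headers -Body $body }`.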

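Define the two cleanup functions: one removes empty lines from the output file, the other restores the space after each period, since the translated sentences are joined back together without one. Translate-TextWithChatGPT calls both right after saving the translated file (see the complete script below):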

function Remove-EmptyLines {
    param (
        [string]$filePath
    )

    $content = Get-Content -Path $filePath
    $cleanedContent = $content | Where-Object { $_ -ne "" }
    Set-Content -Path $filePath -Value $cleanedContent

    Write-Host "Empty lines removed from: $filePath"
}

function AddSpaceAfterPeriod {
    param (
        [string]$filePath
    )

    $content = Get-Content -Path $filePath -Raw
    $modifiedContent = [System.Text.RegularExpressions.Regex]::Replace($content, "\.(?![\r\n])", ". ")
    Set-Content -Path $filePath -Value $modifiedContent

    Write-Host "Periods followed by space added in: $filePath"
}
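
To see what these two cleanup steps do, here is a minimal sketch that applies the same filter and the same regex to in-memory strings instead of files. The sample text is made up.

```powershell
# Same filter as Remove-EmptyLines: keep only the non-empty lines
$lines = @("First line.", "", "Second line.")
$nonEmpty = $lines | Where-Object { $_ -ne "" }

# Same regex as AddSpaceAfterPeriod: add a space after every period
# that is not directly followed by a line break
$joined = "Version 3.5 is out.Next sentence.`r`nLast line.`r`n"
$spaced = [regex]::Replace($joined, "\.(?![\r\n])", ". ")

$nonEmpty
$spaced
```

Note that the regex treats every period the same way, so "3.5" comes out as "3. 5". If your texts contain decimal numbers, version strings or URLs, the pattern would need tightening.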

And just like that:

Every text file in the folder whose name ends with "_todo" is translated into the language set in $lang, using the pre-prompt defined in $prePrompt. The results are cleaned up, stripping out any pesky empty lines, and saved in the same folder with "_todo" replaced by "_translated" in their filenames.

Et voilà!
Sit back, relax, and chill while the minion takes care of everything!


Here’s the complete script:

# Define the function to split text into sentences while preserving line structure
function Split-TextIntoSentences {
    param (
        [string]$text
    )
    
    # Split on line breaks first (handles both Windows and Unix line endings)
    $lines = $text -split '\r?\n'
    $sentences = @()

    foreach ($line in $lines) {
        if ($line -ne "") {
            # Then split each line after every period that is followed by whitespace
            $lineSentences = [System.Text.RegularExpressions.Regex]::Split($line, "(?<=\.)\s+")
            $sentences += $lineSentences
            $sentences += "`n"  # Preserve line break
        } else {
            $sentences += "`n"  # Preserve empty line
        }
    }
    
    return $sentences
}

# Define the function to translate the text using ChatGPT API
function Translate-TextWithChatGPT {
    param (
        [string]$inputFilePath,
        [string]$apiToken,
        [string]$lang
    )

    # Read the content of the input file
    $inputText = Get-Content -Path $inputFilePath -Raw

    # Define the API endpoint and headers
    $apiUrl = "https://api.openai.com/v1/chat/completions"
    $headers = @{
        "Content-Type" = "application/json"
        "Authorization" = "Bearer $apiToken"
    }

    # Define the pre-prompt with the target language
    $prePrompt = "You're a native $lang speaker. You will translate the sentences from US English to $lang. The text is about some features from Acronis Cyber Protect Cloud. Keep the sentences in the same order and structure as the original. Don't make the translated sentences more than 25% longer than the original ones."

    # Split the input text into sentences
    $sentences = Split-TextIntoSentences -text $inputText

    $translatedText = ""
    foreach ($sentence in $sentences) {
        if ($sentence -eq "`n") {
            $translatedText += "`r`n"
            continue
        }

        if (-not [string]::IsNullOrWhiteSpace($sentence)) {
            $messages = @(
                @{
                    "role" = "system"
                    "content" = $prePrompt
                },
                @{
                    "role" = "user"
                    "content" = $sentence
                }
            )

            $body = @{
                "model" = "gpt-4o"
                "messages" = $messages
                "max_tokens" = 4096  # Adjust as needed
            } | ConvertTo-Json

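            # Send the body as UTF-8 bytes so accented or other non-ASCII characters are not mangled in transit
            $response = Invoke-RestMethod -Uri $apiUrl -Method Post -Headers $headers -Body ([System.Text.Encoding]::UTF8.GetBytes($body))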
            $translatedText += $response.choices[0].message.content -replace "`n", "`r`n"
        }
    }

    # Define the output file path
    $outputFileName = [System.IO.Path]::GetFileNameWithoutExtension($inputFilePath) -replace "_todo$", "_translated"
    $outputFilePath = [System.IO.Path]::Combine([System.IO.Path]::GetDirectoryName($inputFilePath), "$outputFileName.txt")

    # Save the translated text to the output file
    Set-Content -Path $outputFilePath -Value $translatedText.TrimEnd("`r`n") 

    Write-Host "Translated text saved to: $outputFilePath"

    # Remove empty lines from the translated file
    Remove-EmptyLines -filePath $outputFilePath

    # Modify the output file to replace "." with ". " except at the end of lines
    AddSpaceAfterPeriod -filePath $outputFilePath
}

# Define the function to remove empty lines from a file
function Remove-EmptyLines {
    param (
        [string]$filePath
    )

    $content = Get-Content -Path $filePath
    $cleanedContent = $content | Where-Object { $_ -ne "" }
    Set-Content -Path $filePath -Value $cleanedContent

    Write-Host "Empty lines removed from: $filePath"
}

# Define the function to add a space after each period except at the end of lines
function AddSpaceAfterPeriod {
    param (
        [string]$filePath
    )

    $content = Get-Content -Path $filePath -Raw
    $modifiedContent = [System.Text.RegularExpressions.Regex]::Replace($content, "\.(?![\r\n])", ". ")
    Set-Content -Path $filePath -Value $modifiedContent

    Write-Host "Periods followed by space added in: $filePath"
}

# Specify the folder path, API token and target language
$folderPath = "C:\Users\grego\Downloads\Speak2text"
$apiToken = "your OpenAI API key here"
$lang = "French"

# Process each *_todo.txt file in the folder
Get-ChildItem -Path $folderPath -Filter *_todo.txt | ForEach-Object {
    Translate-TextWithChatGPT -inputFilePath $_.FullName -apiToken $apiToken -lang $lang
}