@inneroot
Created March 24, 2025 11:50

Revisions

  1. inneroot revised this gist Mar 24, 2025. No changes.
  2. inneroot created this gist Mar 24, 2025.
    15 changes: 15 additions & 0 deletions convertWrongEncode.py
    @@ -0,0 +1,15 @@
    def fix_double_encoded_cyrillic(garbled_text):
        """Repair Windows-1251 text that was mistakenly decoded as Latin-1."""
        # Step 1: Recover the original bytes. Latin-1 maps code points 0-255
        # one-to-one onto byte values, so encoding undoes the wrong decode.
        raw_bytes = garbled_text.encode('latin1')

        # Step 2: Convert bytes to a hex list (for debugging only)
        hex_bytes = [hex(b) for b in raw_bytes]  # ['0xc2', '0xe2', '0xe5', ...]

        # Step 3: Reinterpret the bytes as Windows-1251 (Cyrillic)
        fixed_text = raw_bytes.decode('windows-1251')
        return fixed_text

    # Test it
    garbled = """Ââåäåíèå"""
    fixed = fix_double_encoded_cyrillic(garbled)
    print(fixed)  # Output: Введение
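
The function above assumes the text was garbled by a Latin-1 decode. If the wrong decode was Windows-1252 instead, the garbled string can contain characters such as `…` that Latin-1 cannot encode, and `encode('latin1')` raises `UnicodeEncodeError`. A minimal sketch of a more forgiving variant follows. The name `fix_mojibake` and its parameters are hypothetical and not part of the gist; it tries each candidate encoding in turn and returns the input unchanged if none of them round-trips.

```python
def fix_mojibake(garbled_text,
                 wrong_encodings=("latin1", "cp1252"),
                 right_encoding="windows-1251"):
    """Try to undo a wrong single-byte decode (hypothetical helper).

    Assumes the text was originally encoded as `right_encoding` and then
    decoded with one of `wrong_encodings`.
    """
    for wrong in wrong_encodings:
        try:
            # Recover the original bytes, then decode them correctly.
            return garbled_text.encode(wrong).decode(right_encoding)
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This candidate cannot represent the text; try the next one.
            continue
    # Nothing matched: leave the text as it was rather than guess.
    return garbled_text


# How the garbling arises in the first place:
garbled = "Введение".encode("windows-1251").decode("latin1")
print(garbled)                     # Ââåäåíèå
print(fix_mojibake(garbled))       # Введение

# The ellipsis is not in Latin-1, so the cp1252 fallback is used here.
print(fix_mojibake("Ïðèâåò…"))     # Привет…

# Text that is already correct passes through unchanged.
print(fix_mojibake("Введение"))    # Введение
```

Returning the input on failure keeps the helper safe to apply to mixed data, at the cost of hiding cases where neither candidate encoding was the real culprit.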