Last active
June 19, 2023 21:01
Revisions
a-chen revised this gist
Jun 19, 2023 . 1 changed file with 10 additions and 6 deletions.
@@ -55,12 +55,16 @@
    output += "Lyrics:\n"
    verses = root.findall('.//ns:verse', ns)
    for verse in verses:
        lines_in_verse = verse.findall('.//ns:lines', ns)
        for lines in lines_in_verse:
            if lines is not None:
                lines_text = ET.tostring(lines, method='text', encoding='utf-8').decode('utf-8')
                lines_text = lines_text.replace('<br/>', '\n').strip()
                lines_text = ' '.join(lines_text.split())
                output += lines_text + "\n"
        output += "\n"

    # Write output to a text file in output directory
    txt_file_name = os.path.join(output_directory, os.path.splitext(os.path.basename(file))[0] + '.txt')
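The README's sample output shows the lines of each verse run together ("world,And to my listening ears"). That is consistent with how the loop above works: `ET.tostring(..., method='text')` emits only text nodes, so no literal `<br/>` is left for `.replace('<br/>', '\n')` to match, and the whitespace join would collapse any newline anyway. One possible way to keep the breaks is sketched below. It is not the author's code: `lines_to_text` and `SAMPLE` are hypothetical names, and it uses the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OpenLyrics files shown in the README.
NS = "{http://openlyrics.info/namespace/2009/song}"


def lines_to_text(lines):
    """Return the text of a <lines> element with each <br/> turned into a newline.

    Hypothetical helper: formatting <tag> elements are flattened to their text.
    """
    parts = []

    def walk(element):
        if element.tag == NS + "br":
            parts.append("\n")
            return
        if element.text:
            parts.append(element.text)
        for child in element:
            walk(child)
            # Text that follows a child element lives in its .tail.
            if child.tail:
                parts.append(child.tail)

    walk(lines)
    # Collapse runs of whitespace inside each line, but keep the line breaks.
    rows = "".join(parts).split("\n")
    return "\n".join(" ".join(row.split()) for row in rows).strip()


# Shortened excerpt of verse v1 from the README's input file.
SAMPLE = (
    '<lines xmlns="http://openlyrics.info/namespace/2009/song">'
    '<tag name="su">1/3</tag> This is my Father’s world,<br/>'
    'And to my listening ears<br/><br/>'
    '<tag name="y">這 是 天 父 世 界 ,<br/>我 們 側 耳 要 聽 ,</tag>'
    '</lines>'
)

if __name__ == "__main__":
    print(lines_to_text(ET.fromstring(SAMPLE)))
```

In the loop above, a helper like this would take the place of the three `lines_text = ...` statements; whether the same traversal behaves identically on `lxml` elements is an assumption that should be checked against real exports.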
a-chen revised this gist
Jun 16, 2023 . 1 changed file with 1 addition and 0 deletions.
@@ -0,0 +1 @@
Gist name
a-chen revised this gist
Jun 16, 2023 . 1 changed file with 144 additions and 0 deletions.
@@ -0,0 +1,144 @@
# OpenLP song export XML stripper

Strips most of the OpenLP song export XML data so that it's human-readable. The script is written in Python and uses the `lxml` library for XML parsing.

## Example

### Input file `H012 This Is My Father's World (Maltbie D. Babcock).xml`

```
<?xml version='1.0' encoding='UTF-8'?>
<song xmlns="http://openlyrics.info/namespace/2009/song" version="0.8" createdIn="OpenLP 2.4.6" modifiedIn="OpenLP 2.4.6" modifiedDate="2020-07-05T16:12:57">
  <properties>
    <titles>
      <title>H012 This Is My Father's World</title>
      <title>H012 這是天父世界</title>
    </titles>
    <verseOrder>v1 o1 v2 o2 v3 o3 v4</verseOrder>
    <authors>
      <author>Maltbie D. Babcock</author>
    </authors>
    <songbooks>
      <songbook name="Hymnary" entry="12"/>
    </songbooks>
  </properties>
  <format>
    <tags application="OpenLP">
      <tag name="su">
        <open><sup></open>
        <close></sup></close>
      </tag>
      <tag name="y">
        <open><span style="-webkit-text-fill-color:yellow"></open>
        <close></span></close>
      </tag>
    </tags>
  </format>
  <lyrics>
    <verse name="v1">
      <lines><tag name="su">1/3</tag> This is my Father’s world,<br/>And to my listening ears<br/>All nature sings, and round me rings<br/>The music of the spheres.<br/><br/><tag name="y">這 是 天 父 世 界 ,<br/>我 們 側 耳 要 聽 ,<br/>宇 宙 歌 唱 , 四 圍 響 應 ,<br/>星 辰 作 樂 同 聲 .</tag></lines>
    </verse>
    <verse name="o1">
      <lines>This is my Father’s world;<br/>I rest me in the thought<br/>Of rocks and trees, of skies and seas―<br/>His hand the wonders wrought.<br/><br/><tag name="y">這 是 天 父 世 界 ,<br/>我 心 滿 有 安 寧 ;<br/>樹 木 花 草 蒼 天 碧 海<br/>述 說 天 父 全 能</tag></lines>
    </verse>
    <verse name="v2">
      <lines><tag name="su">2/3</tag> This is my Father’s world,<br/>The birds their carols raise,<br/>The morning light, the lily white,<br/>Declare their Maker’s praise.<br/><br/><tag name="y">這 是 天 父 世 界<br/>小 鳥 展 翅 飛 鳴<br/>清 晨 明 亮 好 花 美 麗<br/>證 明 天 理 精 深</tag></lines>
    </verse>
    <verse name="o2">
      <lines>This is my Father’s world:<br/>He shines in all that’s fair;<br/>In the rustling grass I hear Him pass, <br/>He speaks to me everywhere.<br/><br/><tag name="y">這 是 天 父 世 界<br/>祂 愛 普 及 萬 千<br/>風 吹 之 草 將 祂 表 現 <br/>天 父 充 滿 世 間</tag></lines>
    </verse>
    <verse name="v3">
      <lines><tag name="su">3/3</tag> This is my Father’s world,<br/>O let me ne’er forget<br/>That tho’ the wrong seems oft so strong,<br/>God is the Ruler yet.<br/><br/><tag name="y">這 是 天 父 世 界<br/>求 主 叫 我 不 忘<br/>罪 惡 雖 然 好 像 得 勝<br/>天 父 卻 仍 掌 管</tag></lines>
    </verse>
    <verse name="o3">
      <lines>This is my Father’s world:<br/>Why should my heart be sad?<br/>The Lord is King: let the heavens ring!<br/>God reigns: let earth be glad!<br/><br/><tag name="y">這 是 天 父 世 界<br/>我 心 不 必 憂 傷<br/>我 主 作 王 天 地 同 唱<br/>歌 聲 充 滿 萬 方</tag></lines>
    </verse>
    <verse name="v4">
      <lines/>
    </verse>
  </lyrics>
</song>
```

### Output file `H012 This Is My Father's World (Maltbie D. Babcock).txt`

```
Title(s):
H012 This Is My Father's World
H012 這是天父世界
Verse Order: v1 o1 v2 o2 v3 o3 v4
Author(s): Maltbie D. Babcock
Lyrics:
1/3 This is my Father’s world,And to my listening earsAll nature sings, and round me ringsThe music of the spheres.這 是 天 父 世 界 ,我 們 側 耳 要 聽 ,宇 宙 歌 唱 , 四 圍 響 應 ,星 辰 作 樂 同 聲 .

This is my Father’s world;I rest me in the thoughtOf rocks and trees, of skies and seas―His hand the wonders wrought.這 是 天 父 世 界 ,我 心 滿 有 安 寧 ;樹 木 花 草 蒼 天 碧 海述 說 天 父 全 能

2/3 This is my Father’s world,The birds their carols raise,The morning light, the lily white,Declare their Maker’s praise.這 是 天 父 世 界小 鳥 展 翅 飛 鳴清 晨 明 亮 好 花 美 麗證 明 天 理 精 深

This is my Father’s world:He shines in all that’s fair;In the rustling grass I hear Him pass, He speaks to me everywhere.這 是 天 父 世 界祂 愛 普 及 萬 千風 吹 之 草 將 祂 表 現 天 父 充 滿 世 間

3/3 This is my Father’s world,O let me ne’er forgetThat tho’ the wrong seems oft so strong,God is the Ruler yet.這 是 天 父 世 界求 主 叫 我 不 忘罪 惡 雖 然 好 像 得 勝天 父 卻 仍 掌 管

This is my Father’s world:Why should my heart be sad?The Lord is King: let the heavens ring!God reigns: let earth be glad!這 是 天 父 世 界我 心 不 必 憂 傷我 主 作 王 天 地 同 唱歌 聲 充 滿 萬 方
```

## Getting Started

### Prerequisites

- Python 3.x (tested with Python 3.7; newer versions should work as well)
- pip (Python package installer)

You can check your Python version with:

```bash
python3 --version
```

### Installing pip

If you don't already have pip installed, you can install it using the script provided by Python's package maintainers:

```bash
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
```

You can check your pip version with:

```bash
pip --version
```

### Installing Dependencies

This project depends on the `lxml` library. You can install it using pip:

```bash
python3 -m pip install lxml
```

If you have multiple Python versions installed, replace `python3` with the interpreter you intend to run the script with, e.g., `python3.7`.

## Running the Script

1. Create an `input` directory in the directory you will run the script from, and put your XML files in it.
2. The script reads XML files from the `input` directory and writes the parsed information into text files in the `output` directory. The `output` directory is created automatically, and any files already in it are deleted on each run.

After installing the dependencies, you can run the script with:

```bash
python3 strip-openlp-song-export-xml.py
```
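Before running the script, it can help to preview which files it would read and where each result would go. The following is a minimal sketch under stated assumptions, not part of the gist: `plan_outputs` is a hypothetical helper that mirrors the naming rule the script uses (same base name, `.txt` extension) and touches nothing on disk.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir):
    """Map every *.xml file in input_dir to the .txt path it would be written to.

    Hypothetical dry-run helper: it only lists paths and creates or deletes nothing.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return {
        source: output_dir / (source.stem + ".txt")
        for source in sorted(input_dir.glob("*.xml"))
    }


if __name__ == "__main__":
    # Assumes the same layout as the README: ./input and ./output under the
    # current working directory.
    for source, target in plan_outputs("input", "output").items():
        print(f"{source} -> {target}")
```

Files in `input` that do not end in `.xml` are ignored, which matches the `*.xml` pattern the script passes to `glob`.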
a-chen revised this gist
Jun 16, 2023 . 1 changed file with 0 additions and 144 deletions.
@@ -1,144 +0,0 @@
a-chen revised this gist
Jun 16, 2023 . No changes.
a-chen revised this gist
Jun 15, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -139,6 +139,6 @@
After installing the dependencies, you can run the script with:

```bash
python3 strip-openlp-song-export-xml.py
```
a-chen revised this gist
Jun 15, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -4,7 +4,7 @@ Strips most of the OpenLP song export XML data so that it's human-readable. The script is

## Example

### Input file `H012 This Is My Father's World (Maltbie D. Babcock).xml`

```
<?xml version='1.0' encoding='UTF-8'?>
<song xmlns="http://openlyrics.info/namespace/2009/song" version="0.8" createdIn="OpenLP 2.4.6" modifiedIn="OpenLP 2.4.6" modifiedDate="2020-07-05T16:12:57">
```
a-chen revised this gist
Jun 15, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -1,4 +1,4 @@
# OpenLP song export XML stripper

Strips most of the OpenLP song export XML data so that it's human-readable. The script is written in Python and uses the `lxml` library for XML parsing.
a-chen created this gist
Jun 15, 2023
@@ -0,0 +1,144 @@
# Python XML Parser

Strips most of the OpenLP song export XML data so that it's human-readable. The script is written in Python and uses the `lxml` library for XML parsing.

## Example

### Input file `H012 This Is My Father's World (Maltbie D. Babcock).txt`

```
<?xml version='1.0' encoding='UTF-8'?>
<song xmlns="http://openlyrics.info/namespace/2009/song" version="0.8" createdIn="OpenLP 2.4.6" modifiedIn="OpenLP 2.4.6" modifiedDate="2020-07-05T16:12:57">
  <properties>
    <titles>
      <title>H012 This Is My Father's World</title>
      <title>H012 這是天父世界</title>
    </titles>
    <verseOrder>v1 o1 v2 o2 v3 o3 v4</verseOrder>
    <authors>
      <author>Maltbie D. Babcock</author>
    </authors>
    <songbooks>
      <songbook name="Hymnary" entry="12"/>
    </songbooks>
  </properties>
  <format>
    <tags application="OpenLP">
      <tag name="su">
        <open><sup></open>
        <close></sup></close>
      </tag>
      <tag name="y">
        <open><span style="-webkit-text-fill-color:yellow"></open>
        <close></span></close>
      </tag>
    </tags>
  </format>
  <lyrics>
    <verse name="v1">
      <lines><tag name="su">1/3</tag> This is my Father’s world,<br/>And to my listening ears<br/>All nature sings, and round me rings<br/>The music of the spheres.<br/><br/><tag name="y">這 是 天 父 世 界 ,<br/>我 們 側 耳 要 聽 ,<br/>宇 宙 歌 唱 , 四 圍 響 應 ,<br/>星 辰 作 樂 同 聲 .</tag></lines>
    </verse>
    <verse name="o1">
      <lines>This is my Father’s world;<br/>I rest me in the thought<br/>Of rocks and trees, of skies and seas―<br/>His hand the wonders wrought.<br/><br/><tag name="y">這 是 天 父 世 界 ,<br/>我 心 滿 有 安 寧 ;<br/>樹 木 花 草 蒼 天 碧 海<br/>述 說 天 父 全 能</tag></lines>
    </verse>
    <verse name="v2">
      <lines><tag name="su">2/3</tag> This is my Father’s world,<br/>The birds their carols raise,<br/>The morning light, the lily white,<br/>Declare their Maker’s praise.<br/><br/><tag name="y">這 是 天 父 世 界<br/>小 鳥 展 翅 飛 鳴<br/>清 晨 明 亮 好 花 美 麗<br/>證 明 天 理 精 深</tag></lines>
    </verse>
    <verse name="o2">
      <lines>This is my Father’s world:<br/>He shines in all that’s fair;<br/>In the rustling grass I hear Him pass, <br/>He speaks to me everywhere.<br/><br/><tag name="y">這 是 天 父 世 界<br/>祂 愛 普 及 萬 千<br/>風 吹 之 草 將 祂 表 現 <br/>天 父 充 滿 世 間</tag></lines>
    </verse>
    <verse name="v3">
      <lines><tag name="su">3/3</tag> This is my Father’s world,<br/>O let me ne’er forget<br/>That tho’ the wrong seems oft so strong,<br/>God is the Ruler yet.<br/><br/><tag name="y">這 是 天 父 世 界<br/>求 主 叫 我 不 忘<br/>罪 惡 雖 然 好 像 得 勝<br/>天 父 卻 仍 掌 管</tag></lines>
    </verse>
    <verse name="o3">
      <lines>This is my Father’s world:<br/>Why should my heart be sad?<br/>The Lord is King: let the heavens ring!<br/>God reigns: let earth be glad!<br/><br/><tag name="y">這 是 天 父 世 界<br/>我 心 不 必 憂 傷<br/>我 主 作 王 天 地 同 唱<br/>歌 聲 充 滿 萬 方</tag></lines>
    </verse>
    <verse name="v4">
      <lines/>
    </verse>
  </lyrics>
</song>
```

### Output file `H012 This Is My Father's World (Maltbie D. Babcock).txt`

```
Title(s):
H012 This Is My Father's World
H012 這是天父世界
Verse Order: v1 o1 v2 o2 v3 o3 v4
Author(s): Maltbie D. Babcock
Lyrics:
1/3 This is my Father’s world,And to my listening earsAll nature sings, and round me ringsThe music of the spheres.這 是 天 父 世 界 ,我 們 側 耳 要 聽 ,宇 宙 歌 唱 , 四 圍 響 應 ,星 辰 作 樂 同 聲 .

This is my Father’s world;I rest me in the thoughtOf rocks and trees, of skies and seas―His hand the wonders wrought.這 是 天 父 世 界 ,我 心 滿 有 安 寧 ;樹 木 花 草 蒼 天 碧 海述 說 天 父 全 能

2/3 This is my Father’s world,The birds their carols raise,The morning light, the lily white,Declare their Maker’s praise.這 是 天 父 世 界小 鳥 展 翅 飛 鳴清 晨 明 亮 好 花 美 麗證 明 天 理 精 深

This is my Father’s world:He shines in all that’s fair;In the rustling grass I hear Him pass, He speaks to me everywhere.這 是 天 父 世 界祂 愛 普 及 萬 千風 吹 之 草 將 祂 表 現 天 父 充 滿 世 間

3/3 This is my Father’s world,O let me ne’er forgetThat tho’ the wrong seems oft so strong,God is the Ruler yet.這 是 天 父 世 界求 主 叫 我 不 忘罪 惡 雖 然 好 像 得 勝天 父 卻 仍 掌 管

This is my Father’s world:Why should my heart be sad?The Lord is King: let the heavens ring!God reigns: let earth be glad!這 是 天 父 世 界我 心 不 必 憂 傷我 主 作 王 天 地 同 唱歌 聲 充 滿 萬 方
```

## Getting Started

### Prerequisites

- Python 3.x (tested with Python 3.7; newer versions should work as well)
- pip (Python package installer)

You can check your Python version with:

```bash
python3 --version
```

### Installing pip

If you don't already have pip installed, you can install it using the script provided by Python's package maintainers:

```bash
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
```

You can check your pip version with:

```bash
pip --version
```

### Installing Dependencies

This project depends on the `lxml` library. You can install it using pip:

```bash
python3 -m pip install lxml
```

If you have multiple Python versions installed, replace `python3` with the interpreter you intend to run the script with, e.g., `python3.7`.

## Running the Script

1. Create an `input` directory in the directory you will run the script from, and put your XML files in it.
2. The script reads XML files from the `input` directory and writes the parsed information into text files in the `output` directory. The `output` directory is created automatically, and any files already in it are deleted on each run.

After installing the dependencies, you can run the script with:

```bash
python3 script.py
```
@@ -0,0 +1,70 @@
import os
import glob
from lxml import etree as ET

# Directory path
input_directory = os.path.join(os.getcwd(), 'input')
output_directory = os.path.join(os.getcwd(), 'output')

# Create output directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

# Remove all existing files in output directory
for filename in os.listdir(output_directory):
    os.remove(os.path.join(output_directory, filename))

# Get all XML files in the input directory
files = glob.glob(os.path.join(input_directory, '*.xml'))

ns = {'ns': 'http://openlyrics.info/namespace/2009/song'}

for file in files:
    # Parse XML file
    parser = ET.XMLParser(remove_blank_text=True)
    tree = ET.parse(file, parser)
    root = tree.getroot()

    # Prepare output
    output = "Title(s):\n"
    titles = root.findall('.//ns:title', ns)
    for title in titles:
        output += (title.text or '').strip() + "\n"

    copyright = root.find('.//ns:copyright', ns)
    if copyright is not None and copyright.text:
        output += "Copyright: " + copyright.text.strip() + "\n"

    verseOrder = root.find('.//ns:verseOrder', ns)
    if verseOrder is not None and verseOrder.text:
        output += "Verse Order: " + verseOrder.text.strip() + "\n"

    ccliNo = root.find('.//ns:ccliNo', ns)
    if ccliNo is not None and ccliNo.text:
        output += "CCLI No.: " + ccliNo.text.strip() + "\n"

    authors = root.findall('.//ns:author', ns)
    if authors:
        output += "Author(s): "
        for i, author in enumerate(authors):
            output += (author.text or '').strip()
            if i < len(authors) - 1:
                output += ", "
            else:
                output += "\n"

    output += "Lyrics:\n"
    verses = root.findall('.//ns:verse', ns)
    for verse in verses:
        lines = verse.find('.//ns:lines', ns)
        if lines is not None:
            lines_text = ET.tostring(lines, method='text', encoding='utf-8').decode('utf-8')
            lines_text = lines_text.replace('<br/>', '\n').strip()
            lines_text = ' '.join(lines_text.split())
            output += lines_text + "\n\n"

    # Write output to a text file in output directory
    txt_file_name = os.path.join(output_directory, os.path.splitext(os.path.basename(file))[0] + '.txt')
    with open(txt_file_name, 'w') as f:
        f.write(output)

print("All files have been processed and saved!")
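The script writes each result with `open(txt_file_name, 'w')`, which uses the platform's default text encoding. On systems where that default is not UTF-8, writing the Chinese titles and lyrics may fail with `UnicodeEncodeError` or produce a file that other tools misread. A minimal sketch of an explicit UTF-8 write follows; `write_song_text` is a hypothetical helper name, not something the gist defines.

```python
from pathlib import Path


def write_song_text(path, text):
    """Write text to path as UTF-8, creating the parent directory if needed.

    Hypothetical helper: the encoding is stated explicitly so the result does
    not depend on the platform's default locale encoding.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" keeps line endings identical across platforms.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    # Example call with a title taken from the README's sample song.
    write_song_text("output/example.txt", "Title(s):\nH012 這是天父世界\n")
```

In the script above, the equivalent minimal change would be to pass `encoding='utf-8'` to the existing `open` call.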