Unitarian Hymns Lyrics Extraction

PDF to Text programs do not extract the lyrics of PDF music files in verse order, but MusicXML files contain coding that does allow extraction in verse order. Thanks to NoteTab clippers of great ability, particularly Flo Gerkhe (overtaking and lapping myself, of rather less ability), the following clip will extract both the titles and the lyrics from music files with the file extension .XML and, furthermore, make a good attempt at arranging each verse into lines. If there are no lyrics, it should still produce the titles. This saves a lot of typing, although some tidying-up editing may still be necessary.

It is best to obtain the clip as pure text (click here to open the pure text clip for copying or saving). It is also necessary to open the wanted .XML file in the text and HTML editor NoteTab rather than copy it from a browser.

Nevertheless, the clip is reproduced below for viewing. It starts at ^!Continue and ends at ^!ClearVariables. With the text of the opened pure text file copied, open NoteTab with the clipbook visible, right-click on the clipbook (an appropriate book; I have 'Processing') and choose Add from Clipboard from the menu. Save the clipbook when prompted (perhaps when closing NoteTab). By the way, do change C:\Adrian's Documents\Music\ to your own main music folder; my computer's My Documents is on a smaller D drive.

In addition, you might like a clip that opens all available .XML files in one folder, perhaps a download folder, so edit the clip to point to it (click here for the clip to view and copy or save). Again, the text needs copying and adding as above.

If all that is too difficult, then you might want to download a separate clipbook for MusicXML and other operations (click here to download the zipped .CLB file). Once the unzipped .CLB file (click here to view the clipbook) is dropped into the NoteTab Libraries folder, it will appear for use the next time NoteTab is opened.

To operate, once the clip is ready, download a MusicXML file and open it in NoteTab, where it displays rather as an HTML file does. With this as the active editing window, click on the MusicXML extract clip and the extraction will happen automatically. The original file is protected, a temporary file exists mid-operation, and the result should save under the main title of the hymn followed by .txt (with manual operation needed in cases of error).

As a reminder, MusicXML is the standard generic file type for transferring scores between music composing software. Basically, it is a text file with its own extensive coding, understood by such software, and it looks like HTML. Other uses of XML include editing .PDF files, where the saved file stores the edited elements as XML-tagged overlay text within the otherwise formatted .PDF file.
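
To make the structure concrete, here is a minimal Python sketch (not part of the NoteTab clip, and not how the clip itself works internally) of why verse-order extraction is possible: each lyric element carries a verse number, so the words can be regrouped by verse. The trimmed fragment, the sample words, and the function name lyrics_by_verse are all made up for illustration; a real file has far more markup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed MusicXML fragment: two notes, two verses.
SAMPLE = """
<score-partwise>
  <part id="P1">
    <measure number="1">
      <note>
        <lyric number="1"><syllabic>begin</syllabic><text>Morn</text></lyric>
        <lyric number="2"><syllabic>single</syllabic><text>Night</text></lyric>
      </note>
      <note>
        <lyric number="1"><syllabic>end</syllabic><text>ing</text></lyric>
        <lyric number="2"><syllabic>single</syllabic><text>falls</text></lyric>
      </note>
    </measure>
  </part>
</score-partwise>
"""

def lyrics_by_verse(xml_text):
    """Return {verse number: text}, joining syllables into whole words."""
    verses = defaultdict(str)
    root = ET.fromstring(xml_text)
    for lyric in root.iter("lyric"):
        # Assumption: verse numbers are plain integers, as in this sample.
        number = int(lyric.get("number", "1"))
        text = lyric.findtext("text", default="")
        syllabic = lyric.findtext("syllabic", default="single")
        # A word is complete on a 'single' or 'end' syllable, so add a space there.
        verses[number] += text + (" " if syllabic in ("single", "end") else "")
    return {n: words.strip() for n, words in sorted(verses.items())}

if __name__ == "__main__":
    for number, words in lyrics_by_verse(SAMPLE).items():
        print(number, words)
```

In the file the verses are interleaved note by note, which is why a plain text dump reads across the verses rather than down them; grouping on the number attribute restores the verse order.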

 

Now don't be naughty: get the clip via pure text (click here to open the pure text clip for copying from or saving).

^!Continue Extracts Titles and Lyrics. Proceed with a MusicXML file open.
^!SetScreenUpdate Off
^!Replace "&apos;" >> "'" WAS
^!Replace "\R(?=</text>)" >> "" WARS
^!Find "(?s)^\x20*<credit page="1">.+</credit>" WRS
^!Set %Credits%=^$GetSelection$
^!Set %Credits%=^$StrReplace("<[^>]+>";"";^%Credits%;AR)$
^!Set %Credits%=^$StrReplace("^\x20+";"";^%Credits%;AR)$
^!Set %Credits%=^$StrReplace("^\R";"";^%Credits%;AR)$
^!Replace "\x20(?=</text>)" >> "" WARS
^!Replace "<syllabic>(?:single|end)\X+?<text>[^<]+?\K(?=</text>)" >> "\x20" WARS
^!SetClipboard ^$GetDocListAll("^\x20*<lyric number="(\d+)\X+?<text>([^<]+?)</text>";$1|$2\r\n)$
^!Select All
^!Paste
^!Set %Nr%=1

:Loop
^!Set %Text%=^$GetDocListAll("^^%Nr%\|(.+)";"$1")$
^!IfEmpty ^%Text% Out
^!Append %All%=^%Text%^P^P
^!Inc %Nr%
^!Goto Loop

:Out
^!Select All
^!InsertText ^%All%
^!Replace "(?<=[,;.\x20])\x20?(?=[[:upper:]])" >> "\r\n" WARS
^!Jump Doc_Start
^!InsertText ^%Credits%^P
^!Jump Doc_Start
^!Set %FileName%=^$GetLine$
^!Save as "C:\Adrian's Documents\Music\^%FileName%.txt"
^!ClearVariables
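
The line-splitting step near the end of the clip (the ^!Replace just before the first ^!Jump Doc_Start) breaks each verse into lines wherever a capital letter follows punctuation or a space. A rough Python analogue, not a port of the clip's regular expression, shows the idea and why some tidying-up may still be needed afterwards. The function name split_lines and the sample text are illustrative assumptions.

```python
import re

def split_lines(verse):
    """Start a new line before each capital letter that follows punctuation or a space."""
    # First alternative: capital after , ; or . (with or without a space between).
    # Second alternative: capital after a plain space.
    return re.sub(r"(?<=[,;.]) ?(?=[A-Z])| (?=[A-Z])", "\n", verse)

if __name__ == "__main__":
    print(split_lines("Morning has broken, Like the first morning, Blackbird has spoken"))
```

Because the only clue is the capital letter, a capitalised word in the middle of a line (a name, or "Lord") also starts a new line, and those breaks have to be rejoined by hand.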