Revisions

  1. Matthew Jackowski created this gist Jul 28, 2016.
    35 changes: 35 additions & 0 deletions remove-translation-segmentation.rb
    @@ -0,0 +1,35 @@
    #!/usr/bin/env ruby
    require 'optparse'
    require 'nokogiri'

    # This will hold the options we parse
    options = {}

    # Build command line parser
    OptionParser.new do |p|
      # Take a filename for the input
      p.on("-i", "--infile INFILE", "The name of the file to process") do |v|
        options[:infile] = v
      end

      # Take a filename for the output file
      p.on("-o", "--outfile OUTFILE", "The name of the file to save") do |u|
        options[:outfile] = u
      end
    end.parse!

    abort "Usage: #{$PROGRAM_NAME} -i INFILE -o OUTFILE" unless options[:infile] && options[:outfile]

    # Open file with Nokogiri and set file encoding to UTF-8
    doc = File.open(options[:infile]) { |f| Nokogiri::XML(f) }
    doc.encoding = 'utf-8'

    # For each trans-unit, flatten source and target to plain text and remove seg-source
    doc.xpath('//trans-unit').each do |t|
      # Assigning content to itself replaces inline markup (e.g. <mrk>) with plain text
      t.xpath('source | target').each { |node| node.content = node.content }
      t.xpath('seg-source').each(&:remove)
    end

    # Write the final file
    File.write(options[:outfile], doc.to_xml)