@rupeshtiwari
Forked from lolobosse/blogspot_to_jekyll.rb
Created February 26, 2021 22:16
Migrate your blogger blog posts to jekyll.
# Convert blogger (blogspot) posts to jekyll posts
#
# What it does
# ------------
#
# 1) Fetches the blog's feed
# 2) For each post, creates a file named
# "YYYY-MM-DD-{post-title}.html" with the following structure:
#
# ---
# layout: post
# title: {post-title}
# date: {YYYY-mm-dd HH:MM}
# comments: false
# categories:
# ---
#
# #{blog_post_content_in_html_format}
#
# 3) Writes each file to a directory named `_posts`
#
# Requirements
# ------------
#
# * feedzirra: https://github.com/pauldix/feedzirra
#
# How to use
# ----------
#
# ruby blogger_to_jekyll.rb [feed_url]
#
require 'feedzirra'
require 'date'
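# Builds one hash per feed entry: file name, title, timestamp, and HTML content.
def parse_post_entries(feed)
  posts = []
  feed.entries.each do |post|
    obj = Hash.new
    created_datetime = post.last_modified
    creation_date = Date.strptime(created_datetime.to_s, "%Y-%m-%d")
    title = post.title
    # Split on runs of spaces (`/ */` also matches the empty string and would
    # split the title into single characters), then drop slashes and backslashes.
    # The ".html" extension matches the file name documented in the header.
    file_name = creation_date.to_s + "-" + title.split(/ +/).join("-").delete('\/') + ".html"
    obj["file_name"] = file_name
    obj["title"] = title
    obj["creation_datetime"] = created_datetime
    obj["content"] = post.content
    posts.push(obj)
  end
  return posts
end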
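# Writes each post to _posts/ as Jekyll front matter followed by the HTML content.
def write_posts(posts)
  Dir.mkdir("_posts") unless File.directory?("_posts")
  posts.each do |post|
    file_name = "_posts/".concat(post["file_name"])
    # Front matter must start at column 0, so this literal is left unindented.
    header = %{---
layout: post
title: #{post["title"]}
date: #{post["creation_datetime"]}
comments: false
categories:
---

}
    # File.open with a block closes the file automatically; no explicit close needed.
    File.open(file_name, "w") do |f|
      f.write(header)
      f.write(post["content"])
    end
  end
end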
def main(feed_url="http://feeds.feedburner.com/Kennys/dev/null?format=xml")
  puts "Fetching feed..."
  feed = Feedzirra::Feed.fetch_and_parse(feed_url)
  puts "Parsing feed..."
  posts = parse_post_entries(feed)
  puts "Writing posts..."
  write_posts(posts)
end

# Use the feed URL given on the command line, if any; otherwise use the default.
ARGV.empty? ? main() : main(ARGV[0])
@talonx

talonx commented Mar 7, 2023

xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
This should take the feed_url and not be hardcoded?
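
One way to avoid the hard-coded URL is to read it from the command line and fall back to a default. The following is only a minimal sketch: `feed_url_from` and `DEFAULT_FEED_URL` are hypothetical names that are not part of the gist, and the default URL is a placeholder.

```ruby
# Hypothetical default; replace it with your own blog's feed URL.
DEFAULT_FEED_URL = "http://example.blogspot.com/feeds/posts/default"

# Hypothetical helper: returns the first command-line argument when it is
# present and non-blank, otherwise the default URL.
def feed_url_from(argv, default_url = DEFAULT_FEED_URL)
  url = argv.first
  (url.nil? || url.strip.empty?) ? default_url : url.strip
end

feed_url = feed_url_from(ARGV)
puts "Using feed: #{feed_url}"
```

The resulting `feed_url` can then be passed to whichever fetch call the script uses, in place of the literal string.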

@nikhilsilveira

xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
This should take the feed_url and not be hardcoded?

Indeed it should. Users please note:

  • line 104: xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
    change this to xml = HTTParty.get(feed_url).body

  • lines 39 and 40: replace the block:

    title = post.title
    file_name = creation_date.to_s + "-" + title.split(/  */).join("-").delete('\/') + ".html"

    with:

    title = post.title
    safe_title = title.gsub(/[^0-9A-Za-z.\- ]/, '').strip.gsub(/\s+/, '-')
    file_name = "#{creation_date}-#{safe_title}.html"
    

    My import was failing because of special characters in the blog titles, such as '?'. This edit sanitizes the file names.

Thank you Rupesh, Talonx, and ChatGPT.
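
The file-name sanitization suggested above can be tried in isolation before editing the script. This is a rough sketch rather than the gist's own code: `jekyll_file_name` is a hypothetical helper, and the "untitled" fallback for titles that sanitize to an empty string is an added assumption.

```ruby
require 'date'

# Hypothetical helper wrapping the sanitization from the comment above.
# Keeps ASCII letters, digits, dots, hyphens, and spaces, then turns each
# run of whitespace into a single hyphen.
def jekyll_file_name(creation_date, title, extension = ".html")
  safe_title = title.gsub(/[^0-9A-Za-z.\- ]/, '').strip.gsub(/\s+/, '-')
  # Assumption: fall back to a fixed name when nothing survives sanitization.
  safe_title = "untitled" if safe_title.empty?
  "#{creation_date}-#{safe_title}#{extension}"
end

puts jekyll_file_name(Date.new(2021, 2, 26), "What is Jekyll? A quick/short intro")
# prints "2021-02-26-What-is-Jekyll-A-quickshort-intro.html"
```

Note that this character class also drops non-ASCII letters (for example, "Café" becomes "Caf"), so blogs with non-English titles may want a wider whitelist.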
