@mikaelz
Last active April 30, 2023 12:53
Revisions

  1. mikaelz revised this gist Aug 1, 2022. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion generate-sitemap-xml.sh
    @@ -1,7 +1,7 @@
    #!/bin/bash
    cd /tmp
    echo "Wget $1"
    -wget --spider --recursive --level=2 --no-verbose --output-file=sitemap.txt $1
    +wget --spider --recursive --level=3 --no-verbose --output-file=sitemap.txt $1
    echo "Grep URLs"
    grep -i URL /tmp/sitemap.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' | sort -u | sed '/^$/d' > /tmp/sitemap-urls.txt
    header='<?xml version="1.0" encoding="UTF-8"?><urlset
  2. mikaelz created this gist Aug 1, 2022.
    22 changes: 22 additions & 0 deletions generate-sitemap-xml.sh
    @@ -0,0 +1,22 @@
    #!/bin/bash
    cd /tmp
    echo "Wget $1"
    wget --spider --recursive --level=2 --no-verbose --output-file=sitemap.txt $1
    echo "Grep URLs"
    grep -i URL /tmp/sitemap.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' | sort -u | sed '/^$/d' > /tmp/sitemap-urls.txt
    header='<?xml version="1.0" encoding="UTF-8"?><urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
    echo $header > /tmp/sitemap.xml
    while read url; do
    case "$url" in
    http:* | https:*)
    echo '<url><loc>'$url'</loc></url>' >> sitemap.xml
    ;;
    *)
    ;;
    esac
    done < /tmp/sitemap-urls.txt
    echo "</urlset>" >> sitemap.xml
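
The loop in the script writes each URL into `<loc>` verbatim, so a URL containing `&` (for example a query string) would produce XML that is not well-formed. Below is a minimal sketch of the same loop with escaping added. The function names `xml_escape` and `urls_to_sitemap` are hypothetical and not part of the gist, and the `urlset` header is shortened to the namespace attribute only; treat it as one possible variant, not the author's method.

```shell
#!/bin/bash

# Hypothetical helper (not in the gist): escape XML special characters
# read from stdin. "&" must be handled first so that the entities
# produced by the later rules are not escaped a second time.
xml_escape() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&apos;/g"
}

# Hypothetical helper (not in the gist): read one URL per line from stdin
# and print a sitemap to stdout. As in the original loop, only lines
# starting with http: or https: are kept.
urls_to_sitemap() {
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while IFS= read -r url; do
    case "$url" in
      http:* | https:*)
        printf '<url><loc>%s</loc></url>\n' "$(printf '%s' "$url" | xml_escape)"
        ;;
    esac
  done
  printf '%s\n' '</urlset>'
}
```

Assuming the URL list has already been written to `/tmp/sitemap-urls.txt` as in the script, it could be used as `urls_to_sitemap < /tmp/sitemap-urls.txt > /tmp/sitemap.xml`. Reading with `IFS= read -r` keeps backslashes and leading whitespace in a URL intact.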