Skip to content

Instantly share code, notes, and snippets.

@huksley
Last active September 3, 2025 09:11
Show Gist options
  • Save huksley/bc3cb046157a99cd9d1517b32f91a99e to your computer and use it in GitHub Desktop.
Save huksley/bc3cb046157a99cd9d1517b32f91a99e to your computer and use it in GitHub Desktop.
This script decodes Google News generated encoded, internal URLs for RSS items
/**
* This magically uses batchexecute protocol. It's not documented, but it works.
*
* Licensed under: MIT License
*
* Copyright (c) 2024 Ruslan Gainutdinov
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
const fetchDecodedBatchExecute = (id: string) => {
const s =
'[[["Fbv4je","[\\"garturlreq\\",[[\\"en-US\\",\\"US\\",[\\"FINANCE_TOP_INDICES\\",\\"WEB_TEST_1_0_0\\"],null,null,1,1,\\"US:en\\",null,180,null,null,null,null,null,0,null,null,[1608992183,723341000]],\\"en-US\\",\\"US\\",1,[2,3,4,8],1,0,\\"655000234\\",0,0,null,0],\\"' +
id +
'\\"]",null,"generic"]]]';
return fetch("https://news.google.com/_/DotsSplashUi/data/batchexecute?" + "rpcids=Fbv4je", {
headers: {
"Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
Referrer: "https://news.google.com/"
},
body: "f.req=" + encodeURIComponent(s),
method: "POST"
})
.then(e => e.text())
.then(s => {
const header = '[\\"garturlres\\",\\"';
const footer = '\\",';
if (!s.includes(header)) {
throw new Error("header not found: " + s);
}
const start = s.substring(s.indexOf(header) + header.length);
if (!start.includes(footer)) {
throw new Error("footer not found");
}
const url = start.substring(0, start.indexOf(footer));
return url;
});
};
/**
* Google News started generate encoded, internal URLs for RSS items
* https://news.google.com/rss/search?q=New%20York%20when%3A30d&hl=en-US&gl=US&ceid=US:en
*
* This script decodes URLs into original one, for example URL
* https://news.google.com/__i/rss/rd/articles/CBMiSGh0dHBzOi8vdGVjaGNydW5jaC5jb20vMjAyMi8xMC8yNy9uZXcteW9yay1wb3N0LWhhY2tlZC1vZmZlbnNpdmUtdHdlZXRzL9IBAA?oc=5
*
* contains this
* https://techcrunch.com/2022/10/27/new-york-post-hacked-offensive-tweets/
*
* In path after articles/ goes Base64 encoded binary data
*
* Format is the following:
* <prefix> <len bytes> <URL bytes> <len bytes> <amp URL bytes> [<suffix>]
*
* <prefix> - 0x08, 0x13, 0x22
* <suffix> - 0xd2, 0x01, 0x00 (sometimes missing??)
* <len bytes> - formatted as 0x40 or 0x81 0x01 sometimes
*
*
* https://news.google.com/rss/articles/CBMiqwFBVV95cUxNMTRqdUZpNl9hQldXbGo2YVVLOGFQdkFLYldlMUxUVlNEaElsYjRRODVUMkF3R1RYdWxvT1NoVzdUYS0xSHg3eVdpTjdVODQ5cVJJLWt4dk9vZFBScVp2ZmpzQXZZRy1ncDM5c2tRbXBVVHVrQnpmMGVrQXNkQVItV3h4dVQ1V1BTbjhnM3k2ZUdPdnhVOFk1NmllNTZkdGJTbW9NX0k5U3E2Tkk?oc=5
* https://news.google.com/rss/articles/CBMidkFVX3lxTFB1QmFsSi1Zc3dLQkpNLThKTXExWXBGWlE0eERJQ2hLRENIOFJzRTlsRnM1NS1Hc2FlbjdIMlZ3eWNQa0JqeVYzZGs1Y0hKaUtTUko2dmJabUtVMWZob0lNSFNCa3NLQ05ROGh4cVZfVTYyUDVxc2c?oc=5
* https://news.google.com/rss/articles/CBMiqwFBVV95cUxNMTRqdUZpNl9hQldXbGo2YVVLOGFQdkFLYldlMUxUVlNEaElsYjRRODVUMkF3R1RYdWxvT1NoVzdUYS0xSHg3eVdpTjdVODQ5cVJJLWt4dk9vZFBScVp2ZmpzQXZZRy1ncDM5c2tRbXBVVHVrQnpmMGVrQXNkQVItV3h4dVQ1V1BTbjhnM3k2ZUdPdnhVOFk1NmllNTZkdGJTbW9NX0k5U3E2Tkk?oc=5
*
* FIXME: What will happen if URL more than 255 bytes??
*
* Licensed under: MIT License
*
* Copyright (c) 2022 Ruslan Gainutdinov
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
export const decodeGoogleNewsUrl = async (sourceUrl: string) => {
const url = new URL(sourceUrl);
const path = url.pathname.split("/");
if (
url.hostname === "news.google.com" &&
path.length > 1 &&
path[path.length - 2] === "articles"
) {
const base64 = path[path.length - 1];
let str = atob(base64);
const prefix = Buffer.from([0x08, 0x13, 0x22]).toString("binary");
if (str.startsWith(prefix)) {
str = str.substring(prefix.length);
}
const suffix = Buffer.from([0xd2, 0x01, 0x00]).toString("binary");
if (str.endsWith(suffix)) {
str = str.substring(0, str.length - suffix.length);
}
// One or two bytes to skip
const bytes = Uint8Array.from(str, c => c.charCodeAt(0));
const len = bytes.at(0)!;
if (len >= 0x80) {
str = str.substring(2, len + 2);
} else {
str = str.substring(1, len + 1);
}
if (str.startsWith("AU_yqL")) {
// New style encoding, introduced in July 2024. Not yet known how to decode offline.
const url = await fetchDecodedBatchExecute(base64);
return url;
}
return str;
} else {
return sourceUrl;
}
};
@vincenzon
Copy link

@sviatoslav-lebediev the code I am using is below. It starts to 429 pretty quick.

The main loop is:

def get_news_urls(news, proxies):
    prx_at = 0
    for n in news:
        try:
            prx_at, proxy = get_next_proxy(proxies, prx_at)
            if proxy is not None:
                url = decode_google_news_url(n['url'], proxy['url'])
                n['actual_url'] = url
                print(url)
                sleep(15)
        except Exception as e:
            if str(e).startswith('429 Client Error'):
                proxy['valid'] = False

The get_next_proxy is returning the next proxy from a list in the case of webshare, and just the brightdata it is the bd url which does the rotation for you under the hood (at least that's my understanding). The list of news objects are obtained using the gnews package and look like:

https://news.google.com/rss/articles/CBMimwFBVV95cUxQTFNqdl84NzFwZkxYNy1lbnVVWnN4X2xqdDlFVzJwNjIxdFI5c0trUnJ3R3prMlg1cVBERVNERDFDNExtQVlVVzBaUjluZ25uUk10NDk3QTBJbmpjbnA1X0tZTUlIaFlQVEJ1eHR4WFlaYkpfaVFUOFg4YlVnZm1RcFlZWDF3Q1d0M3MtLVlkMVpveW1oY244d19wQdIBowFBVV95cUxNUGJwWDRIRHdCRVZtSkVBTnRVelQ1NlVOSjQtSjRlblBOb280NE1mQ2VHYmZlZkF1LXY0TGNUNGxTS2RsaUotT3NKdWVseXdfY2M5c1JmNDNWdXpZMTRSb3RJV0l0VDNGRHJHTjRzNF9iYjNBRlBOMXEtUXlzTnVmYmxNVFZJSmlzOVFzZWpIWjdsX0lIZUFicThZS1cyd05EXzV3?oc=5&hl=en-US&gl=US&ceid=US:en
https://news.google.com/rss/articles/CBMikAFBVV95cUxNOVVNUjcxVS1uajdnZV9wUWxyaVB2S2kxQXdhTkNtdm1qcllkVVczUEp4MjZtSUl5eGJMdm9haC1xWWlPUUNpWndDeDE3NmcwYnR4UUV5TDZfdHNNM1dGbjdOU0o1YlViM1c1MDNMdXp6b21yZ2dqaE1jNnJudjZvYWVGSzVQdWtNdXREcmVSOVjSAaIBQVVfeXFMTU1wUUJTclp5Q2VNVVRGTlJFNW9hbkdjTThuWVFQeklFWFZWY2RvbHctYjVCTWxjZVBrWVZxYzQtUWhKM2lGUzdackZUS01lRVdPbS1vaWwzcjJZX3Q4bnFUbnloSjhhTjVRX1dzbXdiWHo3RzJXN2cwS2hGSERPR25hN3kyck96WWpzNThLbzJSOGlIYTQzRkZhVmRNdkktTXpR?oc=5&hl=en-US&gl=US&ceid=US:en

The decode_google_news is taken from above. I think my only change was to add a header in an attempt to get it to work.

def decode_google_news_url(source_url, proxy_url):
    article = get_decoding_params(urlparse(source_url).path.split("/")[-1])
    articles_req = [
        "Fbv4je",
        f'["garturlreq",[["X","X",["X","X"],null,null,1,1,"US:en",null,1,null,null,null,null,null,0,1],"X","X",1,[1,1,1],1,1,null,0,0,null,0],"{article["gn_art_id"]}",{article["timestamp"]},"{article["signature"]}"]',
    ]

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1"
    }
    
    response = requests.post(
        url="https://news.google.com/_/DotsSplashUi/data/batchexecute",
        headers={ 'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8' },
        data=f"f.req={quote(json.dumps([[articles_req]]))}",
        proxies={'http': proxy_url, 'https': proxy_url} 
    )
    response.raise_for_status()

    return json.loads(json.loads(response.text.split("\n\n")[1])[:-2][0][2])[1]

def get_decoding_params(gn_art_id):
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
    }
    response = requests.get(
        f"https://news.google.com/articles/{gn_art_id}", headers=headers
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    div = soup.select_one("c-wiz > div")

    return {
        "signature": div.get("data-n-a-sg"),
        "timestamp": div.get("data-n-a-ts"),
        "gn_art_id": gn_art_id,
    }

@iamatef
Copy link

iamatef commented Sep 10, 2024

PYTHON SOLUTION

I don't know the exact algorithm used by Google to encode/decode the URLs, but I found a way to decode them using reverse engineering by inspecting the requests made by the browser in the redirection chain.

pip install beautifulsoup4 lxml

import json
from urllib.parse import quote, urlparse

import requests
from bs4 import BeautifulSoup


def get_decoding_params(gn_art_id):
    response = requests.get(f"https://news.google.com/articles/{gn_art_id}")
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    div = soup.select_one("c-wiz > div")
    return {
        "signature": div.get("data-n-a-sg"),
        "timestamp": div.get("data-n-a-ts"),
        "gn_art_id": gn_art_id,
    }


def decode_urls(articles):
    articles_reqs = [
        [
            "Fbv4je",
            f'["garturlreq",[["X","X",["X","X"],null,null,1,1,"US:en",null,1,null,null,null,null,null,0,1],"X","X",1,[1,1,1],1,1,null,0,0,null,0],"{art["gn_art_id"]}",{art["timestamp"]},"{art["signature"]}"]',
        ]
        for art in articles
    ]
    payload = f"f.req={quote(json.dumps([articles_reqs]))}"
    headers = {"content-type": "application/x-www-form-urlencoded;charset=UTF-8"}
    response = requests.post(
        url="https://news.google.com/_/DotsSplashUi/data/batchexecute",
        headers=headers,
        data=payload,
    )
    response.raise_for_status()
    return [json.loads(res[2])[1] for res in json.loads(response.text.split("\n\n")[1])[:-2]]

# Example usage
encoded_urls = [
    "https://news.google.com/rss/articles/CBMipgFBVV95cUxPWV9fTEI4cjh1RndwanpzNVliMUh6czg2X1RjeEN0YUctUmlZb0FyeV9oT3RWM1JrMGRodGtqTk1zV3pkNEpmdGNxc2lfd0c4LVpGVENvUDFMOEJqc0FCVVExSlRrQmI3TWZ2NUc4dy1EVXF4YnBLaGZ4cTFMQXFFM2JpanhDR3hoRmthUjVjdm1najZsaFh4a3lBbDladDZtVS1FMHFn?oc=5",
    "https://news.google.com/rss/articles/CBMi3AFBVV95cUxOX01TWDZZN2J5LWlmU3hudGZaRDh6a1dxUHMtalBEY1c0TlJSNlpieWxaUkxUU19MVTN3Y1BqaUZael83d1ctNXhaQUtPM0IyMFc4R3VydEtoMmFYMWpMU1Rtc3BjYmY4d3gxZHlMZG5NX0s1RmR2ZXI5YllvdzNSd2xkOFNCUTZTaEp3b0IxZEJZdVFLUDBNMC1wNGgwMGhjRG9HRFpRZU5BMFVIYjZCOWdWcHI1YzdoVHFWYnZSOEFwQ0NubGx3Rzd0SHN6OENKMXZUcHUxazA5WTIw?hl=en-US&gl=US&ceid=US%3Aen",
]
articles_params = [get_decoding_params(urlparse(url).path.split("/")[-1]) for url in encoded_urls]
decoded_urls = decode_urls(articles_params)
print(decoded_urls)

Works great and to lower/get rid of 429 errors, I changed

response = requests.get(f"https://news.google.com/articles/{gn_art_id}")

to

response = requests.get(f"https://news.google.com/rss/articles/{gn_art_id}")

@SSujitX
Copy link

SSujitX commented Sep 10, 2024

PYTHON SOLUTION

I don't know the exact algorithm used by Google to encode/decode the URLs, but I found a way to decode them using reverse engineering by inspecting the requests made by the browser in the redirection chain.

pip install beautifulsoup4 lxml

import json
from urllib.parse import quote, urlparse

import requests
from bs4 import BeautifulSoup


def get_decoding_params(gn_art_id):
    response = requests.get(f"https://news.google.com/articles/{gn_art_id}")
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    div = soup.select_one("c-wiz > div")
    return {
        "signature": div.get("data-n-a-sg"),
        "timestamp": div.get("data-n-a-ts"),
        "gn_art_id": gn_art_id,
    }


def decode_urls(articles):
    articles_reqs = [
        [
            "Fbv4je",
            f'["garturlreq",[["X","X",["X","X"],null,null,1,1,"US:en",null,1,null,null,null,null,null,0,1],"X","X",1,[1,1,1],1,1,null,0,0,null,0],"{art["gn_art_id"]}",{art["timestamp"]},"{art["signature"]}"]',
        ]
        for art in articles
    ]
    payload = f"f.req={quote(json.dumps([articles_reqs]))}"
    headers = {"content-type": "application/x-www-form-urlencoded;charset=UTF-8"}
    response = requests.post(
        url="https://news.google.com/_/DotsSplashUi/data/batchexecute",
        headers=headers,
        data=payload,
    )
    response.raise_for_status()
    return [json.loads(res[2])[1] for res in json.loads(response.text.split("\n\n")[1])[:-2]]

# Example usage
encoded_urls = [
    "https://news.google.com/rss/articles/CBMipgFBVV95cUxPWV9fTEI4cjh1RndwanpzNVliMUh6czg2X1RjeEN0YUctUmlZb0FyeV9oT3RWM1JrMGRodGtqTk1zV3pkNEpmdGNxc2lfd0c4LVpGVENvUDFMOEJqc0FCVVExSlRrQmI3TWZ2NUc4dy1EVXF4YnBLaGZ4cTFMQXFFM2JpanhDR3hoRmthUjVjdm1najZsaFh4a3lBbDladDZtVS1FMHFn?oc=5",
    "https://news.google.com/rss/articles/CBMi3AFBVV95cUxOX01TWDZZN2J5LWlmU3hudGZaRDh6a1dxUHMtalBEY1c0TlJSNlpieWxaUkxUU19MVTN3Y1BqaUZael83d1ctNXhaQUtPM0IyMFc4R3VydEtoMmFYMWpMU1Rtc3BjYmY4d3gxZHlMZG5NX0s1RmR2ZXI5YllvdzNSd2xkOFNCUTZTaEp3b0IxZEJZdVFLUDBNMC1wNGgwMGhjRG9HRFpRZU5BMFVIYjZCOWdWcHI1YzdoVHFWYnZSOEFwQ0NubGx3Rzd0SHN6OENKMXZUcHUxazA5WTIw?hl=en-US&gl=US&ceid=US%3Aen",
]
articles_params = [get_decoding_params(urlparse(url).path.split("/")[-1]) for url in encoded_urls]
decoded_urls = decode_urls(articles_params)
print(decoded_urls)

Works great and to lower/get rid of 429 errors, I changed

response = requests.get(f"https://news.google.com/articles/{gn_art_id}")

to

response = requests.get(f"https://news.google.com/rss/articles/{gn_art_id}")

I wanted to do the same. Can you check how many links we can get before we receive a 429 error?

@sviatoslav-lebediev
Copy link

@vincenzon How many links do you need to process? How many proxies do you have? How many requests per second/minute do you make?

@Noman654
Copy link

Noman654 commented Oct 5, 2024

Yes, the new challenge is to hack the requests limits...

so it’s making multiple requests? seems like google really doesn’t want us to decode urls. what are the chances they block us?

We can use proxy for somewhat hack this there is so many free proxies is available in the market I am using webshare for proxies I implement the solution in python you can try just insert the api key
code in python

@Ronkiro
Copy link

Ronkiro commented Oct 7, 2024

A lot slower, but this is what i used for a while (uses selenium to access the url and get the redirection, ignore the usage of ValueError, it's just laziness lol)

def get_correct_url(url):
  if not url.startswith("https://news.google.com"):
    return url
  # Setup Selenium WebDriver
  chrome_options = Options()
  chrome_options.add_argument("--no-sandbox")  # Disable sandboxing for Docker
  chrome_options.add_argument('--ignore-certificate-errors')
  chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)
  chrome_options.add_argument("--disable-dev-shm-usage")  # Overcome limited resource problems
  chrome_options.add_argument("--disable-gpu")
  chrome_options.add_argument("--disable-features=VizDisplayCompositor")
  chrome_options.add_argument("user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
  driver = webdriver.Chrome(service=Service(), options=chrome_options)
  final_url = url
  try:
      # Open the URL with Selenium
      driver.get(url)
      WebDriverWait(driver, 60).until(lambda driver: not driver.current_url.startswith("https://news.google.com"))
      final_url = driver.current_url
      if("google.com/sorry" in final_url):
        raise ValueError("Caught 429 into google")
      if("news.google" in final_url):
        raise ValueError("Couldn't parse the new url")
  finally:
      # Close the browser
      driver.quit()
  return final_url

@SSujitX
Copy link

SSujitX commented Oct 7, 2024

Enjoy new updates. Thanks for the solution @iamatef

Repo

  • You can install this package using pip:
pip install googlenewsdecoder
  • You can upgrade this package using pip (upgrade to latest version):
pip install googlenewsdecoder  --upgrade
from googlenewsdecoder import new_decoderv1

def main():

    interval_time = 5 # default interval is None, if not specified

    source_urls = ["https://news.google.com/read/CBMilgFBVV95cUxOM0JJaFRwV2dqRDk5dEFpWmF1cC1IVml5WmVtbHZBRXBjZHBfaUsyalRpa1I3a2lKM1ZnZUI4MHhPU2sydi1nX3JrYU0xWjhLaHNfU0N6cEhOYVE2TEptRnRoZGVTU3kzZGJNQzc2aDZqYjJOR0xleTdsemdRVnJGLTVYTEhzWGw4Z19lR3AwR0F1bXlyZ0HSAYwBQVVfeXFMTXlLRDRJUFN5WHg3ZTI0X1F4SjN6bmFIck1IaGxFVVZyOFQxdk1JT3JUbl91SEhsU0NpQzkzRFdHSEtjVGhJNzY4ZTl6eXhESUQ3XzdWVTBGOGgwSmlXaVRmU3BsQlhPVjV4VWxET3FQVzJNbm5CUDlUOHJUTExaME5YbjZCX1NqOU9Ta3U?hl=en-US&gl=US&ceid=US%3Aen","https://news.google.com/read/CBMiiAFBVV95cUxQOXZLdC1hSzFqQVVLWGJVZzlPaDYyNjdWTURScV9BbVp0SWhFNzZpSWZxSzdhc0tKbVlHMU13NmZVOFdidFFkajZPTm9SRnlZMWFRZ01CVHh0dXU0TjNVMUxZNk9Ibk5DV3hrYlRiZ20zYkIzSFhMQVVpcTFPc00xQjhhcGV1aXM00gF_QVVfeXFMTmtFQXMwMlY1el9WY0VRWEh5YkxXbHF0SjFLQVByNk1xS3hpdnBuUDVxOGZCQXl1QVFXaUVpbk5lUGgwRVVVT25tZlVUVWZqQzc4cm5MSVlfYmVlclFTOUFmTHF4eTlfemhTa2JKeG14bmNabENkSmZaeHB4WnZ5dw?hl=en-US&gl=US&ceid=US%3Aen"]

    for url in source_urls:
        try:
            decoded_url = new_decoderv1(url, interval=interval_time)
            if decoded_url.get("status"):
                print("Decoded URL:", decoded_url["decoded_url"])
            else:
                print("Error:", decoded_url["message"])
        except Exception as e:
            print(f"Error occurred: {e}")

    # Output: decoded_url - {'status': True, 'decoded_url': 'https://healthdatamanagement.com/articles/empowering-the-quintuple-aim-embracing-an-essential-architecture/'}


if __name__ == "__main__":
    main()

@maks-outsource
Copy link

Enjoy new updates. Thanks for the solution @iamatef

Wow... thank you.
Do we have the possibility to use it via PHP?
How can we handle such a possibility?

@Ronkiro
Copy link

Ronkiro commented Oct 10, 2024

@maks-outsource You just need to re-code it with same logic... It can work with any languages.

According to LLM

<?php

function get_decoding_params($gn_art_id) {
    $url = "https://news.google.com/articles/$gn_art_id";
    
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    
    if (curl_errno($ch)) {
        throw new Exception('Curl error: ' . curl_error($ch));
    }
    
    curl_close($ch);
    
    // Load the response into DOMDocument
    $dom = new DOMDocument();
    @$dom->loadHTML($response);
    $xpath = new DOMXPath($dom);
    $div = $xpath->query("//c-wiz/div")->item(0);
    
    return [
        "signature" => $div->getAttribute("data-n-a-sg"),
        "timestamp" => $div->getAttribute("data-n-a-ts"),
        "gn_art_id" => $gn_art_id,
    ];
}

function decode_urls($articles) {
    $articles_reqs = [];

    foreach ($articles as $art) {
        $articles_reqs[] = [
            "Fbv4je",
            json_encode([
                ["garturlreq", [
                    ["X", "X", ["X", "X"], null, null, 1, 1, "US:en", null, 1, null, null, null, null, null, 0, 1],
                    "X", "X", 1, [1, 1, 1], 1, 1, null, 0, 0, null, 0
                ]],
                $art["gn_art_id"],
                $art["timestamp"],
                $art["signature"]
            ])
        ];
    }

    $payload = "f.req=" . urlencode(json_encode([$articles_reqs]));
    $headers = [
        "Content-Type: application/x-www-form-urlencoded;charset=UTF-8",
    ];

    $ch = curl_init("https://news.google.com/_/DotsSplashUi/data/batchexecute");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

    $response = curl_exec($ch);
    
    if (curl_errno($ch)) {
        throw new Exception('Curl error: ' . curl_error($ch));
    }
    
    curl_close($ch);
    
    $responseParts = explode("\n\n", $response);
    $decoded = json_decode($responseParts[1], true);
    
    return array_map(function($res) {
        return json_decode($res[2], true)[1];
    }, array_slice($decoded, 0, -2));
}

// Example usage
$encoded_urls = [
    "https://news.google.com/rss/articles/CBMipgFBVV95cUxPWV9fTEI4cjh1RndwanpzNVliMUh6czg2X1RjeEN0YUctUmlZb0FyeV9oT3RWM1JrMGRodGtqTk1zV3pkNEpmdGNxc2lfd0c4LVpGVENvUDFMOEJqc0FCVVExSlRrQmI3TWZ2NUc4dy1EVXF4YnBLaGZ4cTFMQXFFM2JpanhDR3hoRmthUjVjdm1najZsaFh4a3lBbDladDZtVS1FMHFn?oc=5",
    "https://news.google.com/rss/articles/CBMi3AFBVV95cUxOX01TWDZZN2J5LWlmU3hudGZaRDh6a1dxUHMtalBEY1c0TlJSNlpieWxaUkxUU19MVTN3Y1BqaUZael83d1ctNXhaQUtPM0IyMFc4R3VydEtoMmFYMWpMU1Rtc3BjYmY4d3gxZHlMZG5NX0s1RmR2ZXI5YllvdzNSd2xkOFNCUTZTaEp3b0IxZEJZdVFLUDBNMC1wNGgwMGhjRG9HRFpRZU5BMFVIYjZCOWdWcHI1YzdoVHFWYnZSOEFwQ0NubGx3Rzd0SHN6OENKMXZUcHUxazA5WTIw?hl=en-US&gl=US&ceid=US%3Aen",
];

$articles_params = [];
foreach ($encoded_urls as $url) {
    $gn_art_id = basename(parse_url($url, PHP_URL_PATH));
    $articles_params[] = get_decoding_params($gn_art_id);
}

$decoded_urls = decode_urls($articles_params);
print_r($decoded_urls);
?>

@Ronkiro
Copy link

Ronkiro commented Oct 18, 2024

I'm having a problem with order of the urls.

Basically, i get the URL from a previous array of Article objects. then when decoding, i need i to be at the same order of previous Article array. But when returning from /batchexecute it just loses the order. Anyone has a solution for this? For now i may need to not "batch run it" but run one by one...

fyi my current implementation, tips are welcome

def get_decoding_params(url):
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    div = soup.select_one("c-wiz > div")
    gn_art_id = urlparse(url).path.split("/")[-1]
    return {
        "signature": div.get("data-n-a-sg"),
        "timestamp": div.get("data-n-a-ts"),
        "gn_art_id": gn_art_id,
    }


def decode_urls(articles):
    articles_reqs = [
        [
            "Fbv4je",
            f'["garturlreq",[["X","X",["X","X"],null,null,1,1,"US:en",null,1,null,null,null,null,null,0,1],"X","X",1,[1,1,1],1,1,null,0,0,null,0],"{art["gn_art_id"]}",{art["timestamp"]},"{art["signature"]}"]',
        ]
        for art in articles
    ]
    payload = f"f.req={quote(json.dumps([articles_reqs]))}"
    headers = {"content-type": "application/x-www-form-urlencoded;charset=UTF-8"}
    response = requests.post(
        url="https://news.google.com/_/DotsSplashUi/data/batchexecute",
        headers=headers,
        data=payload,
    )
    response.raise_for_status()
    return [json.loads(res[2])[1] for res in json.loads(response.text.split("\n\n")[1])[:-2]]


def decode_batch_urls(urls):
    articles_params = [get_decoding_params(url) for url in urls]
    decoded_urls = decode_urls(articles_params)
    return decoded_urls


def decode_single_url(url):
    articles_params = [get_decoding_params(url)]
    decoded_urls = decode_urls(articles_params)
    return decoded_urls[0]

I didn't find any key or id-like that i could use to re-map the ordering, sadly

@yudataguy
Copy link

A lot slower, but this is what i used for a while (uses selenium to access the url and get the redirection, ignore the usage of ValueError, it's just laziness lol)

playwright version

from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright

def get_correct_url(url: str) -> str:
    """
    Convert Google News URL to its original source URL using Playwright.

    Args:
        url (str): The input URL to process

    Returns:
        str: The resolved URL pointing to the original news source

    Raises:
        ValueError: If URL resolution fails or hits Google's rate limit
        Exception: If browser automation fails
        TimeoutError: If URL resolution timed out after 60 seconds
    """
    if not url.startswith("https://news.google.com"):
        return url

    with sync_playwright() as p:
        # Launch browser with specific configurations
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-gpu",
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ],
        )

        try:
            context = browser.new_context(
                user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
            )
            page = context.new_page()

            # Navigate to URL and wait for redirect
            page.goto(url, wait_until="networkidle")
            try:
                page.wait_for_url(
                    lambda url: not url.startswith("https://news.google.com"),
                    timeout=60000,
                )
            except PlaywrightTimeoutError:
                raise ValueError("URL resolution timed out after 60 seconds")

            final_url = page.url
            if "google.com/sorry" in final_url:
                raise ValueError("Rate limited by Google (HTTP 429)")
            if "news.google" in final_url:
                raise ValueError("Failed to resolve original news URL")
            return final_url

        except Exception as e:
            raise Exception(f"Browser automation failed: {str(e)}")
        finally:
            browser.close()

google_news_url = "https://news.google.com/rss/articles/CBMipgFBVV95cUxPWV9fTEI4cjh1RndwanpzNVliMUh6czg2X1RjeEN0YUctUmlZb0FyeV9oT3RWM1JrMGRodGtqTk1zV3pkNEpmdGNxc2lfd0c4LVpGVENvUDFMOEJqc0FCVVExSlRrQmI3TWZ2NUc4dy1EVXF4YnBLaGZ4cTFMQXFFM2JpanhDR3hoRmthUjVjdm1najZsaFh4a3lBbDladDZtVS1FMHFn?oc=5"

print(get_correct_url(google_news_url))

@HopiumCurrency
Copy link

HopiumCurrency commented Feb 17, 2025

sadly this method as of 17/02/2025 no longer works.. specifically decode_urls() part.

eg using php, php test-google-decode.php
Array
(
[signature] => AV_R3eBjl9LmyZGQgthZngaDg8y_
[timestamp] => 1739777504
[gn_art_id] => CBMipgFBVV95cUxPWV9fTEI4cjh1RndwanpzNVliMUh6czg2X1RjeEN0YUctUmlZb0FyeV9oT3RWM1JrMGRodGtqTk1zV3pkNEpmdGNxc2lfd0c4LVpGVENvUDFMOEJqc0FCVVExSlRrQmI3TWZ2NUc4dy1EVXF4YnBLaGZ4cTFMQXFFM2JpanhDR3hoRmthUjVjdm1najZsaFh4a3lBbDladDZtVS1FMHFn
)
Array
(
[signature] => AV_R3eAbCwyvr0UU5JgU_ZZW4E9h
[timestamp] => 1739777504
[gn_art_id] => CBMi3AFBVV95cUxOX01TWDZZN2J5LWlmU3hudGZaRDh6a1dxUHMtalBEY1c0TlJSNlpieWxaUkxUU19MVTN3Y1BqaUZael83d1ctNXhaQUtPM0IyMFc4R3VydEtoMmFYMWpMU1Rtc3BjYmY4d3gxZHlMZG5NX0s1RmR2ZXI5YllvdzNSd2xkOFNCUTZTaEp3b0IxZEJZdVFLUDBNMC1wNGgwMGhjRG9HRFpRZU5BMFVIYjZCOWdWcHI1YzdoVHFWYnZSOEFwQ0NubGx3Rzd0SHN6OENKMXZUcHUxazA5WTIw
)
)]}'

[["wrb.fr","Fbv4je",null,null,null,[3],""],["wrb.fr","Fbv4je",null,null,null,[3],""],["di",16],["af.httprm",16,"5177547088804027398",9]]Array
(
[0] =>
[1] =>
)

array elements are now blank via decode_urls($articles_params,$user_agent); - I added a user agent string to php code version, also handling ssl certificates.

@sviatoslav-lebediev
Copy link

@HopiumCurrency dunno, but my JS implementation is still working.

@Ronkiro
Copy link

Ronkiro commented Feb 22, 2025

It's still working in my JS implementation too, and will probably give me a lot of headaches if it stops working lol

@eriffire
Copy link

I'm having a problem with order of the urls.

Basically, i get the URL from a previous array of Article objects. then when decoding, i need i to be at the same order of previous Article array. But when returning from /batchexecute it just loses the order. Anyone has a solution for this? For now i may need to not "batch run it" but run one by one...

fyi my current implementation, tips are welcome


I didn't find any key or id-like that i could use to re-map the ordering, sadly

did you find any tips? thank you

@Ronkiro
Copy link

Ronkiro commented Mar 25, 2025

I'm having a problem with order of the urls.
Basically, i get the URL from a previous array of Article objects. then when decoding, i need i to be at the same order of previous Article array. But when returning from /batchexecute it just loses the order. Anyone has a solution for this? For now i may need to not "batch run it" but run one by one...
fyi my current implementation, tips are welcome


I didn't find any key or id-like that i could use to re-map the ordering, sadly

did you find any tips? thank you

I literally gave up and used proxies, running one by one rotating proxies and user agents. It was the only way i found to guarantee order

@rommmy
Copy link

rommmy commented Apr 2, 2025

Hey guys,

Has the process been impacted by google's recent update ? https://www.seroundtable.com/google-news-automatically-generated-publications-39149.html

@huksley
Copy link
Author

huksley commented Apr 2, 2025

seems like RSS works still not sure about url decoding https://news.google.com/rss?query=google

@Ronkiro
Copy link

Ronkiro commented Apr 4, 2025

Yes, it's still working

@hakuna-matatta
Copy link

hakuna-matatta commented May 18, 2025

It doesn't seem to work anymore, the response doesn't have the header/footer so it errors out :(
the issue seems to be with body being passed into the request.

Any other solutions?

@hakuna-matatta
Copy link

@Ronkiro is your implementation same as what author posted or is it different?

@Ronkiro
Copy link

Ronkiro commented May 19, 2025

@hakuna-matatta my implementation is a python equivalent, i posted above.
The only difference from now is that i'm using proxy to avoid 429 errors. That is probably the reason you are receiving those errors

https://gist.github.com/huksley/bc3cb046157a99cd9d1517b32f91a99e?permalink_comment_id=5241199#gistcomment-5241199

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment