vheidari created this gist on Apr 9, 2023.
# Puppeteer

In this article I want to introduce `Puppeteer`, a tool that helps us do cool things such as `Web Scraping` or the `Automation` of tasks. `Puppeteer` lets a developer launch and control a Chromium browser from a Node.js script. By default this Chromium instance is headless (it has no visible window), yet it behaves like a real-world browser. The `Puppeteer` API lets a developer do almost anything a user could do in their browser, for example:

- open a new page or a new tab
- select any element in the DOM through its API
- type into input elements, select them and change their values
- select a button and click it
- create a PDF of the current page
- take a screenshot of the current page
- ...

Here is the official description of `Puppeteer`:

> Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.

## Install `Puppeteer`

To install it, follow the instructions below.

First, create a `package.json` file with this command:

```bash
npm init
```

Then install `Puppeteer` with this command:

```bash
npm install puppeteer --save
```

When `npm` has installed all dependencies, open `package.json` and add `"type": "module"` to it as a key/value pair. This tells Node.js to treat `.js` files as ES modules, which the `import` statements in the examples rely on:
```json
{
  "name": "project-name",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "puppeteer": "^19.8.3"
  }
}
```

With this `start` script, you can save an example as `app.js` and run it with `npm start` (or directly with `node app.js`).

Once all the steps above are done, we are ready to implement our first example.

## `Puppeteer` Example 01 -> Take a screenshot

In this example we will learn how to take a screenshot of a web page and save it on the hard disk.

```javascript
// import the puppeteer package
import puppeteer from 'puppeteer';

(async () => {
  // launch a browser (headless by default)
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set the viewport size (width and height) before loading the page
  await page.setViewport({width: 1980, height: 1080});
  // go to https://developer.mozilla.org/en-US/
  await page.goto('https://developer.mozilla.org/en-US/');
  // take a full-page screenshot and save it in the current working directory
  await page.screenshot({path: 'mozilla-dev-center.png', fullPage: true});
  // close the browser
  await browser.close();
})();
```

## `Puppeteer` Example 02 -> how to read the Bitcoin price from CMC

In this example we read the Bitcoin price from the `CoinMarketCap` website. We will learn how to select an element and how to extract data from it.
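The script below returns the price exactly as the page displays it, that is, as a string. If you need a number (for comparisons or arithmetic), a small parsing helper is one option. This is a minimal sketch, assuming US-style formatting such as `$27,123.45` (comma as thousands separator, dot as decimal point); `parsePrice` is a hypothetical helper, not part of Puppeteer:

```javascript
// Hypothetical helper (not part of Puppeteer): turn a display price such as
// "$27,123.45" into a number. Assumes US-style formatting.
function parsePrice(text) {
  // Drop everything except digits, '.' and '-' (currency symbols, commas, spaces)
  const cleaned = text.trim().replace(/[^0-9.-]/g, '');
  const value = Number.parseFloat(cleaned);
  if (Number.isNaN(value)) {
    throw new Error(`Could not parse a price from "${text}"`);
  }
  return value;
}

console.log(parsePrice('$27,123.45')); // 27123.45
```

In the script, the string stored in `bitcoinPrice` could be passed to it, e.g. `parsePrice(bitcoinPrice)`. Prices formatted for other locales (such as `27.123,45`) would need a different cleaning rule.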
```javascript
// import the puppeteer package
import puppeteer from 'puppeteer';

(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set the viewport size, width and height
  await page.setViewport({width: 1980, height: 1080});
  // go to https://coinmarketcap.com/currencies/bitcoin/
  await page.goto('https://coinmarketcap.com/currencies/bitcoin/');
  // wait for the price element and store it in bitcoinElement
  // (this selector depends on CoinMarketCap's markup and may stop matching if the site changes)
  const bitcoinElement = await page.waitForSelector('.priceValue>span');
  // extract the price text with evaluate: the callback runs inside the browser
  // and its return value is sent back to Node.js
  const bitcoinPrice = await bitcoinElement.evaluate(el => el.textContent);
  // print the Bitcoin price
  console.log('Bitcoin price on CMC: ' + bitcoinPrice);
  // close the browser
  await browser.close();
})();
```

## `Puppeteer` Example 03 -> how to select a form input and type into it

Example 03 shows how to interact with an HTML form and type text into it.

```javascript
// import the puppeteer package
import puppeteer from 'puppeteer';

(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set the viewport size, width and height
  await page.setViewport({width: 1980, height: 1080});
  // go to https://github.com/
  await page.goto('https://github.com/');
  // wait for the search input, selected via input[name="q"]
  // (this selector depends on GitHub's markup and may stop matching if the site changes)
  const searchBox = await page.waitForSelector('input[name="q"]');
  // type "puppeteer" into the input element with the type method
  await searchBox.type('puppeteer');
  // take a screenshot to check that the text was typed
  await page.screenshot({path: 'github-searchbox.png', fullPage: true});
  // close the browser
  await browser.close();
})();
```

## `Puppeteer` Example 04 -> how to download an image with Puppeteer

In this example we will try to extract the first product image from the amazon.com website and save it on the hard disk.
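The script below names the saved file after everything following the last `/` in the image URL. If that URL ever carries a query string or fragment, it would end up in the file name too. Here is a minimal sketch of a slightly more defensive helper using the standard `URL` class (a global in Node.js); `fileNameFromUrl` and its `'image'` fallback are hypothetical, not part of the original script:

```javascript
// Hypothetical helper: derive a file name from an image URL.
// Parsing with URL keeps any query string or fragment out of the name.
function fileNameFromUrl(imageUrl, fallback = 'image') {
  const { pathname } = new URL(imageUrl);
  // Last non-empty path segment, e.g. "photo.jpg" from "/images/photo.jpg"
  const lastSegment = pathname.split('/').filter(Boolean).pop();
  // Fall back to a fixed name when the URL has no path segment at all
  return lastSegment || fallback;
}

console.log(fileNameFromUrl('https://example.com/images/photo.jpg?size=large')); // photo.jpg
```

In the script it could be used when writing the file, e.g. `fs.writeFile(fileNameFromUrl(landingImageUrl), ...)`.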
```javascript
// import the puppeteer package and the promise-based fs API
import puppeteer from 'puppeteer';
import * as fs from 'node:fs/promises';

(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set the viewport size, width and height
  await page.setViewport({width: 1980, height: 1080});
  // go to the product page and wait until network activity has mostly settled
  // (instead of a fixed 10-second timeout)
  await page.goto('https://www.amazon.com/Desktop-Processor-12-Thread-Unlocked-Motherboard/dp/B0972FHS7J', {waitUntil: 'networkidle2'});
  // screenshot of the product page, useful for checking what was loaded
  await page.screenshot({path: 'amazon.png', fullPage: true});
  // wait for the main product image, selected by its id
  const landingImage = await page.waitForSelector('#landingImage');
  // read the image URL inside the browser with evaluate and return it to Node.js
  const landingImageUrl = await landingImage.evaluate(img => img.src);
  // log the image URL to the terminal
  console.log(landingImageUrl);
  // navigate to the image URL; goto resolves to the main HTTP response
  const imageResponse = await page.goto(landingImageUrl);
  if (!imageResponse || !imageResponse.ok()) {
    throw new Error('Could not download ' + landingImageUrl);
  }
  // write the response body (a Buffer) to disk, named after the last part of the URL
  await fs.writeFile(landingImageUrl.split('/').pop(), await imageResponse.buffer());
  // close the browser
  await browser.close();
})();
```

## Good Examples and Resources

https://github.com/puppeteer/puppeteer/tree/main/examples
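The introduction also lists clicking a button and creating a PDF of the current page, which the four examples do not show. Here is a minimal sketch of both, assuming you already have a Puppeteer `page` set up as in the examples; the function name `clickThenSavePdf` is hypothetical, while `waitForSelector`, `click` and `pdf` are existing `Page` methods:

```javascript
// Hypothetical helper: click an element, then save the current page as a PDF.
// `page` is assumed to be a Puppeteer Page, e.g. from `await browser.newPage()`.
async function clickThenSavePdf(page, selector, pdfPath) {
  // Wait until the element exists in the DOM, then click the first match
  await page.waitForSelector(selector);
  await page.click(selector);
  // If the click starts a navigation, await page.waitForNavigation() together
  // with the click (e.g. via Promise.all) before generating the PDF.
  // Render the page to a PDF file; printBackground keeps background graphics
  await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
}
```

For instance, `await clickThenSavePdf(page, 'button[type="submit"]', 'result.pdf')` would click the first matching button and write `result.pdf`. PDF generation is meant for headless mode, which is Puppeteer's default.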