Puppeteer Tutorial: An Excellent Learning Guide (Puppeteer Tutorial 1 & 2)

Puppeteer is an open-source Node.js library written in JavaScript. It can be used as a web scraping tool and, like Selenium WebDriver, as a test automation tool for web-based applications. Its popularity for test automation is growing rapidly. The prerequisites for this Puppeteer tutorial are basic knowledge of the command line, JavaScript, OOP concepts and the HTML DOM structure. The complete tutorial is divided into the topics listed in the table of contents below.

Puppeteer Tutorial

Puppeteer Tutorial #1: Puppeteer Overview

Puppeteer Tutorial #2: Puppeteer Environment Variables

Puppeteer Tutorial #3: Puppeteer Web Scraping and Puppeteer Test Automation Overview

Puppeteer Tutorial #4: Install Puppeteer

In this article of the Puppeteer tutorial, we will cover the Puppeteer overview and the Puppeteer environment variables.

Puppeteer Overview

Puppeteer is an open-source Node.js library written in JavaScript. It provides a high-level application programming interface (API) to control the Chrome browser over the DevTools Protocol. Puppeteer can control both headful and headless Chrome browsers.

Puppeteer was introduced by Google. In terms of functionality, it is not a new concept, but it makes the work easier. Fundamentally, it bundles a set of browser automation activities into one compact package.

Puppeteer Tutorial – Puppeteer

How does Puppeteer work?

  • Puppeteer is a Node.js library.
  • The library exposes a high-level API.
  • The API controls the Chrome browser over the DevTools Protocol.
  • By default, Puppeteer works with headless Chrome, but it can also drive headful Chrome when the default configuration is changed.

Chrome DevTools Protocol:

Using the Chrome DevTools Protocol, tools like Puppeteer can instrument, inspect, debug and profile Blink-based browsers such as Chromium and Chrome.

The instrumentation of the browser is divided into a number of domains, such as DOM, Debugger and Network. Each domain defines the commands it supports and the events it generates.
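To make the domain and command structure concrete, the following sketch builds the kind of JSON message that travels over the protocol: a numeric id, a method name in the form Domain.command, and a params object. The helper names buildCommand and splitMethod are hypothetical and exist only for illustration; Puppeteer constructs these messages internally.

```javascript
// Hypothetical helpers illustrating the shape of a DevTools Protocol command.
// They are not part of Puppeteer's API.

let nextId = 0;

// Build a command message: each command carries a unique id,
// a "Domain.command" method name, and a params object.
function buildCommand(method, params = {}) {
  nextId += 1;
  return { id: nextId, method, params };
}

// Split "Page.navigate" into its domain ("Page") and command ("navigate").
function splitMethod(method) {
  const [domain, command] = method.split('.');
  return { domain, command };
}

const message = buildCommand('Page.navigate', { url: 'https://example.com' });
console.log(JSON.stringify(message));
console.log(splitMethod(message.method));
```

Here Page is the domain and navigate is one of its commands; events coming back from the browser use the same Domain.name convention.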

Features of Puppeteer:

  • Most manual actions performed in the Chrome browser can be automated.
  • It can capture screenshots of web pages and generate image or PDF files.
  • It can crawl a single-page application (SPA) and generate pre-rendered content, i.e. server-side rendering.
  • It can automate web form submission, UI testing, keyboard input, etc., with checkpoints.
  • It provides more control over the Chrome browser.
  • The default headless mode is very fast.
  • It supports web scraping.
  • It can measure rendering and load timing using the Chrome performance analysis tools.
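As an illustration of the screenshot and PDF feature, here is a minimal sketch. The function name capturePage, the URL and the file paths are placeholders chosen for this example; the Puppeteer library is passed in by the caller.

```javascript
// A minimal sketch: save a PNG screenshot and a PDF of one page.
// `puppeteer` is the library object supplied by the caller;
// the URL and output paths are placeholders.
async function capturePage(puppeteer, url, imagePath, pdfPath) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled before capturing.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: imagePath, fullPage: true });
    await page.pdf({ path: pdfPath, format: 'A4' });
    return await page.title();
  } finally {
    // Always close the browser, even if a step above throws.
    await browser.close();
  }
}

// Example call (requires `npm install puppeteer`):
// capturePage(require('puppeteer'), 'https://example.com', 'example.png', 'example.pdf')
//   .then((title) => console.log('Captured:', title));
```

The try/finally block is a design choice for this sketch: it keeps a failed navigation from leaving a browser process running in the background.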

Puppeteer vs Puppeteer-core:

Since version 1.7.0, every Puppeteer release publishes two packages:

  • puppeteer-core package
  • puppeteer package

Puppeteer-core Package:

Puppeteer-core is a JavaScript-based Node.js library that can perform any operation the DevTools Protocol supports. Puppeteer-core does not download Chromium during installation. As a library, it is driven entirely through its programmatic interface, and it ignores all PUPPETEER_* env variables. The basic command to install Puppeteer-core:

npm install puppeteer-core
# or "yarn add puppeteer-core"

When using puppeteer-core, the include statement looks like this:

const puppeteer = require('puppeteer-core')

When to use Puppeteer-Core:

  • To develop a Puppeteer project that drives an existing Chrome browser over the DevTools Protocol, where an additional Chromium download is not required.
  • To develop another end-user product or library on top of the DevTools Protocol. For example, a project may build a screenshot generator using puppeteer-core and write a custom setup.js script that downloads headless_shell instead of Chromium to save storage.
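The first use case can be sketched as follows. Because puppeteer-core does not download a browser, the caller has to say which one to run; executablePath is an existing launch option, while the function name openWithInstalledChrome and the example path are assumptions made for this sketch.

```javascript
// Sketch: drive an already-installed browser through puppeteer-core.
// The executable path is machine-specific; the value in the example
// call below is only a placeholder.
async function openWithInstalledChrome(puppeteerCore, executablePath, url) {
  const browser = await puppeteerCore.launch({ executablePath, headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close();
  }
}

// Example call (requires `npm install puppeteer-core` and a local Chrome):
// openWithInstalledChrome(require('puppeteer-core'), '/usr/bin/google-chrome', 'https://example.com')
//   .then((title) => console.log(title));
```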

Puppeteer Package:

Puppeteer is a complete product for Chrome or Chromium browser automation. During installation, it downloads a recent version of Chromium, which is then driven by puppeteer-core. As an end-user product, Puppeteer supports all the PUPPETEER_* env variables to customize its behavior. The basic command to install Puppeteer:

npm install puppeteer
# or "yarn add puppeteer"

When using Puppeteer, the include statement looks like this:

const puppeteer = require('puppeteer');

Difference between Puppeteer and Puppeteer-core:

  • Puppeteer-core does not download the Chromium browser automatically during installation.
  • Puppeteer-core ignores all PUPPETEER_* env variables.
  • Most projects use the puppeteer product package.

Headless Chrome:

Headless Chrome means that Puppeteer interacts with the Chrome browser as a background application, so the Chrome UI is not visible on the screen. By default, Puppeteer launches Chrome in headless mode. Code sample to launch headless Chrome:

In this example, we open headless Chrome, i.e., the Chrome UI will not be visible. This is done by passing the headless flag as true to the puppeteer.launch() method.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  // Specify statements for Headless Chrome operations  
  await browser.close();
})();

Headful Chrome:

Headful Chrome means that Puppeteer interacts with a Chrome browser whose UI is visible on the screen. Because Puppeteer launches Chrome in headless mode by default, headful mode has to be requested explicitly. Code sample to launch headful Chrome:

In this example, we open a Chrome window that is visible to us. This is done by passing the headless flag as false to the puppeteer.launch() method.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false});
  // Specify statements for Headful Chrome operations
  await browser.close();
})();
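In practice it is often handy to switch between the two modes without editing the script. The sketch below derives the launch options from an environment variable; the variable name HEADFUL and the helper launchOptionsFromEnv are assumptions made for this example and are not read by Puppeteer itself.

```javascript
// Hypothetical helper: choose headless or headful mode from an
// environment variable. HEADFUL is a made-up name for this sketch.
function launchOptionsFromEnv(env = process.env) {
  const headful = env.HEADFUL === '1' || env.HEADFUL === 'true';
  return {
    headless: !headful,
    // slowMo delays each Puppeteer operation (in milliseconds), which
    // makes a visible browser easier to follow while debugging.
    slowMo: headful ? 100 : 0,
  };
}

console.log(launchOptionsFromEnv({ HEADFUL: '1' }));
console.log(launchOptionsFromEnv({}));

// Usage inside a script:
// const browser = await puppeteer.launch(launchOptionsFromEnv());
```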

Puppeteer Environment Variables

Puppeteer works with predefined environment variables to support its operations. If Puppeteer does not find one of these environment variables during installation, it falls back to the lowercased variant of that variable from the npm config (the settings kept in npm's configuration files). The puppeteer-core package ignores these environment variables. The most important Puppeteer environment variables are:

  • PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: Instructs Puppeteer not to download the bundled Chromium during the installation step.
  • PUPPETEER_DOWNLOAD_HOST: Overrides the URL prefix used to download Chromium.
  • PUPPETEER_DOWNLOAD_PATH: Overrides the download folder path. The default path is "<root>/.local-chromium/", where <root> is the package root of puppeteer.
  • HTTP_PROXY, HTTPS_PROXY, NO_PROXY: Define the proxy settings used to download Chromium during installation.
  • PUPPETEER_CHROMIUM_REVISION: Defines the specific revision of Chromium to be used by Puppeteer.
  • PUPPETEER_EXECUTABLE_PATH: Specifies the executable path to be used by the puppeteer.launch() method.
  • PUPPETEER_PRODUCT: Defines which browser is to be used by Puppeteer. The value has to be either chrome or firefox.
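The fallback to the lowercased npm config variant can be sketched as below. This illustrates the lookup order described in this section and is not Puppeteer's actual installer code; the helper name resolveSetting is hypothetical, and the sketch relies on npm exposing its config keys to scripts as npm_config_<key> environment variables.

```javascript
// Sketch of the lookup order: the environment variable first, then its
// lowercased npm config variant. Illustration only.
function resolveSetting(name, env = process.env) {
  const direct = env[name];
  if (direct !== undefined && direct !== '') return direct;
  const fromNpm = env['npm_config_' + name.toLowerCase()];
  return fromNpm !== undefined && fromNpm !== '' ? fromNpm : undefined;
}

const names = [
  'PUPPETEER_SKIP_CHROMIUM_DOWNLOAD',
  'PUPPETEER_DOWNLOAD_HOST',
  'PUPPETEER_EXECUTABLE_PATH',
];

// Print what the current process would see for each setting.
for (const name of names) {
  console.log(name, '=', resolveSetting(name));
}
```

For example, a value set with PUPPETEER_DOWNLOAD_HOST in the shell takes precedence in this sketch over a puppeteer_download_host entry in the npm config.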

Conclusion:

In this introductory article of the Puppeteer tutorial, we have learned about the Puppeteer overview and the Puppeteer environment variables. In the next article, we will learn about Puppeteer web scraping and the Puppeteer test automation overview.
