Bypass CAPTCHAs Using Puppeteer and Headless Chrome

by Michael Reddy
May 5, 2023

As you likely know, CAPTCHAs are those distorted letters and numbers you often encounter on login forms, signup pages, or when posting comments. But do you know their purpose? They exist to prevent bots from accessing features intended for human users.

The issue, however, is that CAPTCHAs have become increasingly complex, making them difficult for humans to decipher, while bots have become more adept at bypassing them. There are even services dedicated to bypassing CAPTCHAs.

In this guide, we'll demonstrate how to bypass CAPTCHAs using Puppeteer and headless Chrome. Let's get started!

About CAPTCHAs

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are designed to protect websites from bots and automated scripts.

Although CAPTCHAs serve an essential security purpose, they can also be a nuisance for developers during testing and automation. This article demonstrates how to get past a simple CAPTCHA using Puppeteer, a Node.js library that controls Chrome, in either headless or visible mode.

Please note that bypassing CAPTCHAs for malicious purposes is illegal and unethical. The information provided here is for educational purposes only and should not be used for any form of hacking or unauthorized access.

Before we begin, ensure that you have the following installed on your machine:

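Node.js (a current LTS release is recommended; recent Puppeteer versions no longer run on Node.js 10)
Google Chrome (optional, since installing Puppeteer downloads a compatible browser build by default)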

Next, create a new project directory and initialize it with npm:

mkdir bypass-captcha
cd bypass-captcha
npm init -y

Now, install the required dependencies:

npm install puppeteer

Bypassing CAPTCHAs with Puppeteer and Headless Chrome

To bypass CAPTCHAs, we'll follow these steps:

Configure Puppeteer to launch headless Chrome
Navigate to the target website with a CAPTCHA
Fill out the required fields
Solve the CAPTCHA
Submit the form

Step 1: Configure Puppeteer to launch headless Chrome

First, create a new JavaScript file called index.js in your project directory. In this file, import Puppeteer and configure it to launch headless Chrome:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // show the browser window
    slowMo: 50,      // slow each operation by 50 ms
  });

  // Rest of the code will go here

  await browser.close();
})();

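Here, we've set headless to false for demonstration purposes, so you can watch the browser's actions; set it to true to run without a visible window. The slowMo option slows each Puppeteer operation by the given number of milliseconds (50 here), which makes the automation easier to follow.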
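
If you want a visible browser while debugging and a headless run everywhere else, one option is to build the launch options from an environment variable. The sketch below is only an illustration: the buildLaunchOptions helper and the DEBUG_BROWSER variable name are assumptions made for this example, not part of Puppeteer.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// DEBUG_BROWSER is an assumed environment variable name, not a Puppeteer setting.
function buildLaunchOptions(env = process.env) {
  const debug = env.DEBUG_BROWSER === '1';
  return {
    headless: !debug,       // show the browser window only when debugging
    slowMo: debug ? 50 : 0, // slow each operation by 50 ms when debugging
  };
}

// Usage inside the async function in index.js:
// const browser = await puppeteer.launch(buildLaunchOptions());
```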

Step 2: Navigate to the target website with a CAPTCHA

Next, navigate to the website containing the CAPTCHA. For this example, we'll use a mock website:

const page = await browser.newPage();
await page.goto('https://example.com/captcha');

Replace https://example.com/captcha with the actual URL of the website containing the CAPTCHA.
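
Pages that show a CAPTCHA often load extra scripts and images after the initial HTML arrives, so it can help to tell page.goto when to treat the navigation as finished. The following is a minimal sketch, assuming a 30-second timeout and the networkidle2 wait condition suit the target site; the openPage helper name is made up for this example.

```javascript
// Hypothetical helper: opens a new tab and waits until the page has mostly finished loading.
async function openPage(browser, url) {
  const page = await browser.newPage();
  try {
    // 'networkidle2' resolves once there are no more than 2 network connections for 500 ms.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
  } catch (err) {
    // Close the tab so a failed navigation does not leave it open.
    await page.close();
    throw new Error(`Could not load ${url}: ${err.message}`);
  }
  return page;
}

// Usage inside the async function in index.js:
// const page = await openPage(browser, 'https://example.com/captcha');
```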

Step 3: Fill out the required fields

Assuming the website contains a form with an input field for an email address, locate the input field using the appropriate CSS selector and type in an email address:

await page.type('#email', 'user@example.com');

Replace #email with the actual CSS selector for the email input field and user@example.com (a placeholder) with a valid email address.
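
Form fields are sometimes rendered after the page's initial load, in which case page.type fails because the selector does not match anything yet. One possible approach is to wait for the field first. The fillField helper below is an illustrative name, and the 10-second timeout is an arbitrary assumption.

```javascript
// Hypothetical helper: waits for an input to become visible, then types into it.
async function fillField(page, selector, value) {
  await page.waitForSelector(selector, { visible: true, timeout: 10000 });
  await page.type(selector, value);
}

// Usage (selector and address are placeholders):
// await fillField(page, '#email', 'user@example.com');
```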

Step 4: Solve the CAPTCHA

Solving CAPTCHAs can be challenging, as there are many different types. This example demonstrates how to bypass a simple image-based CAPTCHA by leveraging a third-party OCR (Optical Character Recognition) service.

First, install the axios package to make HTTP requests:

npm install axios

Then, import the axios package in your index.js file:

const axios = require('axios');

Next, add the following function to extract the CAPTCHA image's source:

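async function getCaptchaImageSrc(page) {
  // Look up the CAPTCHA image element; page.$ resolves to null if nothing matches.
  const captchaImage = await page.$('#captcha-image');
  if (!captchaImage) {
    throw new Error('CAPTCHA image not found');
  }
  // Read the element's src property inside the browser context.
  const captchaImageSrc = await page.evaluate((img) => img.src, captchaImage);
  return captchaImageSrc;
}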

Replace #captcha-image with the appropriate CSS selector for the CAPTCHA image element on the target website. This function takes a Puppeteer page object as an argument, locates the CAPTCHA image element, and extracts its source URL.

Now, use the extracted CAPTCHA image source to download the image and send it to the OCR service for solving:
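
A minimal sketch of such a solveCaptcha function follows. The OCR endpoint is a placeholder, and the request fields (image, apiKey) and the response field (text) are assumptions made for illustration; a real service defines its own format. The optional http parameter is also an addition here, so the function can be exercised with a stub instead of real network calls.

```javascript
// Sketch of a solver for a hypothetical OCR API. The request fields (image, apiKey)
// and the response field (text) are assumptions; check your provider's documentation.
const OCR_ENDPOINT = 'https://api.example-ocr.com/recognize';
const OCR_API_KEY = 'your-api-key';

// `http` defaults to axios; it is a parameter only so a stub can be passed in tests.
async function solveCaptcha(imageSrc, http = require('axios')) {
  // Download the CAPTCHA image as raw bytes.
  const imageResponse = await http.get(imageSrc, { responseType: 'arraybuffer' });

  // Convert the bytes to a base64-encoded string.
  const base64Image = Buffer.from(imageResponse.data).toString('base64');

  // Send the encoded image to the OCR service and return the recognized text.
  const ocrResponse = await http.post(OCR_ENDPOINT, {
    image: base64Image,
    apiKey: OCR_API_KEY,
  });
  return ocrResponse.data.text;
}
```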

Replace https://api.example-ocr.com/recognize with the actual API endpoint of the OCR service you're using, and 'your-api-key' with your API key for that service. This function downloads the CAPTCHA image, converts it to a base64-encoded string, and sends it to the OCR service for recognition. The OCR service then returns the recognized text, which we can use to fill out the CAPTCHA input field.

Step 5: Submit the form

Finally, after filling out the required fields and solving the CAPTCHA, submit the form. Locate the form's submit button using the appropriate CSS selector and click it:

await page.click('#submit-button');

Replace #submit-button with the actual CSS selector for the submit button on the target website.
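
If clicking the button triggers a full page load, the script may close the browser before the response arrives. A common Puppeteer pattern is to start waiting for the navigation before clicking, so the event is not missed. The following is a minimal sketch, assuming the form submission causes a navigation; submitForm is a hypothetical helper name.

```javascript
// Hypothetical helper: clicks the submit button and waits for the resulting page load.
async function submitForm(page, buttonSelector) {
  // Start waiting before the click so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(buttonSelector),
  ]);
  // Return the URL the browser ended up on, which is handy for checking the result.
  return page.url();
}

// Usage inside the async function in index.js:
// const finalUrl = await submitForm(page, '#submit-button');
```

If the form submits in the background without navigating, waitForNavigation would time out; in that case, waiting for a result element with page.waitForSelector is likely a better fit.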

Putting it all together

Here's the complete code for bypassing CAPTCHAs using Puppeteer and headless Chrome:

const puppeteer = require('puppeteer');
const axios = require('axios');

async function getCaptchaImageSrc(page) {
  // ...
}

async function solveCaptcha(imageSrc) {
  // ...
}

(async () => {
  // ...
  const captchaImageSrc = await getCaptchaImageSrc(page);
  const captchaText = await solveCaptcha(captchaImageSrc);
  await page.type('#captcha-input', captchaText);
  await page.click('#submit-button');
  // ...
})();

In summary, bypassing CAPTCHAs can be a valuable skill for automating tasks such as testing. This guide has shown how to get past a simple image-based CAPTCHA using Puppeteer, Chrome, and a third-party OCR service. Other CAPTCHA types may need a different approach.

We hope this information has been helpful and sets you on your path to becoming an expert automator. If you found this guide useful, please share your experiences or any issues you encounter in the comments. We'll do our best to respond as quickly as possible.

Michael Reddy is a tech enthusiast, entertainment buff, and avid traveler who loves exploring Linux and sharing unique insights with readers.