Tolerim
25 days ago

Is there a more effective approach to debugging my Puppeteer crawler in order to identify the styles/markup that caused the script to fail?

I am a senior JavaScript developer looking for specific ways to speed up debugging of a Puppeteer crawler that breaks frequently. The crawler code, shown below, used to run successfully. Recently it has started failing with errors, possibly because the website changed its styles. How can I efficiently pinpoint the style (and the selector built on it) that causes the failure, rather than checking each one by hand?

const puppeteer = require("puppeteer");
const delay = require("./delay.js")
const fs = require("fs");

//url of some event in the future
const url = ["https://www.oddsportal.com/baseball/usa/mlb/oakland-athletics-seattle-mariners-bFxLtxG9/"]

const res = [];

puppeteer.launch({ headless: false, slowMo: 1000 }).then(async (browser) => {
    var page = await browser.newPage();
    await page.goto("https://www.oddsportal.com/");
    await delay(1000);

    console.log("Getting data..");
    for (let i = 0; i < url.length; i++) {
        try {
            await page.goto(url[i]);
           
            await page.waitForSelector('[class="flex flex-col items-center justify-center gap-1 border-r border-[#E0E0E0] min-w-[60px] max-sm:min-w-[55px] max-sm:max-w-[55px]"]')
            
            await delay(500);
            var data = await page.evaluate(async () => {
                var t = document.querySelectorAll('[class="flex flex-col items-center justify-center gap-1 border-r border-[#E0E0E0] min-w-[60px] max-sm:min-w-[55px] max-sm:max-w-[55px]"]')
                var average = {
                    avgoddslt: t[0].innerText,
                    avgoddsrt: t[1].innerText
                }
                
                t = document.querySelectorAll('[class="absolute w-full text-center cursor-pointer height-content"]');
                var UserPredictions = {
                    pxtx_lt: t[0].innerText,
                    pxtx_rt: t[1].innerText
                }
                var bettingExchange = [];
                t = document.querySelectorAll('[class="height-content min-mt:!hidden text-black-main font-bold text-xs leading-[18px]"]')
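                for (let i = 0; i < t.length; i++) {
                    t[i].click();
                    // delay() lives in Node and is not defined inside page.evaluate, so wait with a timer in the page
                    await new Promise((resolve) => setTimeout(resolve, 500));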
                    if (i % 2 == 0) {
                        var exch_name = document.querySelectorAll('[class="w-[75px] bg-cover bg-no-repeat"]')[0]?.alt;
                        var odds = document.querySelectorAll('[class="flex flex-col gap-1 text-xs"]')[1]?.innerText.split("\n");
                        var volume = document.querySelectorAll('[class="flex flex-col gap-1 text-xs"]')[2]?.innerText.split("\n");
                        var opOdds = document.querySelectorAll('[class="flex gap-1"]')[1]?.innerText.split("\n");
                        bettingExchange.push({
                            exch_name,
                            left_OddsMovement: {
                                odds,
                                volume
                            },
                            OpeningOdds: opOdds
                        })
                    } else {
                        var exch_name = document.querySelectorAll('[class="w-[75px] bg-cover bg-no-repeat"]')[0]?.alt;
                        var odds = document.querySelectorAll('[class="flex flex-col gap-1 text-xs"]')[1]?.innerText.split("\n");
                        var volume = document.querySelectorAll('[class="flex flex-col gap-1 text-xs"]')[2]?.innerText.split("\n");
                        var opOdds = document.querySelectorAll('[class="flex gap-1"]')[1]?.innerText.split("\n");
                        bettingExchange.push({
                            exch_name,
                            right_OddsMovement: {
                                odds,
                                volume
                            },
                            OpeningOdds: opOdds
                        })
                    }
                }
                return ({
                    average,
                    UserPredictions,
                    bettingExchange
                })
            })
            data['url'] = page.url();

            console.log(url[i] + " =>[Success]");
            res.push(data);
        } catch (err) {
            // Log the underlying error so the failing selector or line is visible
            console.error(url[i] + " =>[ERROR]", err.message);
        }
    }
    await browser.close();
});

I am new to Puppeteer and have more experience with Python's BeautifulSoup, so I would appreciate suggestions on how to resolve this problem. Thank you in advance.

Answers(2)
Vogi
24 days ago
Verified Answer
You can use the following methods to speed up debugging of a frequently breaking Puppeteer crawler:
1. Turn off headless mode: Sometimes it's useful to see what the browser is displaying. Set the headless option to false when launching Puppeteer.
2. Slow it down: The slowMo option slows down Puppeteer operations by the specified number of milliseconds, so you can watch each step and spot where it goes wrong.
3. Capture console output: Listen for the console event using page.on('console', msg => console.log('PAGE LOG:', msg.text()));. This logs all console output from the page to your Node.js console.
4. Use the debugger in the browser: Set devtools to true when launching Puppeteer and add a debugger statement inside any code that runs in the page. For example, await page.evaluate(() => { debugger; });.
5. Use the debugger in Node.js: Run your test command through ndb, either installed globally (npm install -g ndb) or via npx. For example: ndb jest or ndb mocha (or npx ndb jest / npx ndb mocha). You can then step through your script in Chromium DevTools.
6. Enable verbose logging: Set the DEBUG environment variable to puppeteer:* before running the script to log the DevTools protocol traffic, or pass dumpio: true when launching Puppeteer to forward the browser's own output to the Node.js process.
I hope this helps you resolve your problem. Let me know if you have any other questions.
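The launch options and listeners described above can be combined into one small helper. This is only a sketch under assumptions: debugLaunchOptions and attachDebugListeners are hypothetical names, the option values are illustrative, and the availability of individual launch options (devtools in particular) should be checked against the installed Puppeteer version.

```javascript
// Hypothetical launch options for a debugging session; values are illustrative.
const debugLaunchOptions = {
  headless: false, // show the browser window
  slowMo: 250,     // delay each Puppeteer operation by 250 ms
  devtools: true,  // open DevTools so `debugger` statements in the page pause
  dumpio: true,    // pipe the browser's stdout/stderr into the Node process
};

// Forward page-side diagnostics to the Node console.
// `page` is a Puppeteer Page; `log` is injectable so the helper is easy to test.
function attachDebugListeners(page, log = console.log) {
  page.on("console", (msg) => log(`PAGE ${msg.type()}: ${msg.text()}`));
  page.on("pageerror", (err) => log(`PAGE ERROR: ${err.message}`));
  page.on("requestfailed", (req) => log(`REQUEST FAILED: ${req.url()}`));
  return page;
}

// Usage inside an async function (assumes puppeteer is installed):
// const browser = await puppeteer.launch(debugLaunchOptions);
// const page = await browser.newPage();
// attachDebugListeners(page);
```

Attaching the listeners right after browser.newPage() and before the first page.goto() means errors thrown while the page loads are captured as well.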

Tolerim
25 days ago
One way to debug this is to add a debugger statement to the code. It acts as a breakpoint: run the script in debug mode and you can step through the code, inspect variables, and see which line is causing the issue.

Another useful tool is the waitForSelector function, which waits for a specific element to appear on the page before executing the next line of code. If the selector is not found within the timeout, the function throws an error, which tells you which selector is causing the issue.

You can also take screenshots of the pages at different stages of the crawl to visually inspect any changes that may have occurred. This can be done using the page.screenshot function in Puppeteer.

Another approach is to use a visual regression testing tool like BackstopJS, which compares screenshots of web pages taken before and after a change to identify visual differences. This can help pinpoint the styling or layout change that is causing the script to fail.
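The waitForSelector and screenshot ideas can be combined so that a failed run reports which selector no longer matches. A minimal sketch, assuming a hypothetical helper named findMissingSelectors, made-up labels, and a placeholder screenshot file name:

```javascript
// Probe each selector separately so a failure names the selector that broke.
// `page` is a Puppeteer Page; `selectors` maps a label to a CSS selector.
async function findMissingSelectors(page, selectors, timeout = 5000) {
  const missing = [];
  for (const [label, selector] of Object.entries(selectors)) {
    try {
      // waitForSelector rejects if nothing matches within `timeout` ms.
      await page.waitForSelector(selector, { timeout });
    } catch (err) {
      missing.push({ label, selector, reason: err.message });
    }
  }
  if (missing.length > 0) {
    // Record what the page looked like when the selectors failed.
    // The file name is a placeholder.
    await page.screenshot({ path: "selector-failure.png", fullPage: true });
  }
  return missing;
}

// Usage inside an async function, after page.goto(url[i]):
// const missing = await findMissingSelectors(page, {
//   userPredictions: '[class="absolute w-full text-center cursor-pointer height-content"]',
//   openingOdds: '[class="flex gap-1"]',
// });
// console.table(missing);
```

Because every selector is tried even after one fails, a single run lists all broken selectors, at the cost of waiting up to one timeout per missing selector.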