Use Selenium to Scrape YouTube Comments

I was working with a friend to grab comments from YouTube. We’d initially thought of using lynx or w3m, but the comments section always showed up as “Loading…”. Next, we tried BeautifulSoup, but that didn’t work either, for the same reason: the comments are injected by JavaScript after the page loads, and a static fetch only ever sees the placeholder. Finally, we turned to Selenium, which drives a real browser and therefore executes the page’s JavaScript.
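To see why the static tools fail, here is a minimal illustration of what a parser receives for a JavaScript-rendered section. The markup is hypothetical, loosely modeled on the old YouTube layout; the stdlib HTMLParser stands in for BeautifulSoup to keep the sketch self-contained:

```python
from html.parser import HTMLParser

# What a static tool like lynx, w3m, or BeautifulSoup receives for a
# JavaScript-rendered page: only the placeholder markup, no comments.
# (Hypothetical snippet modeled on the old YouTube comment section.)
static_html = """
<div id="comment-section-renderer-items">Loading...</div>
"""

class TextCollector(HTMLParser):
    """Collect every non-whitespace text node in the document."""
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

parser = TextCollector()
parser.feed(static_html)
print(parser.text)  # ['Loading...']
```

No amount of parsing can recover comments that were never in the HTML; the browser has to run the scripts that fetch them.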

To get the comments to load, the script scrolls to the bottom of the page with window.scrollTo. A while loop then re-checks the page until the element with id "comment-section-renderer-items" appears; once it does, the for loop at the bottom collects the comments before the browser is closed.
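The retry logic can also be factored into a small reusable helper. This is just a sketch: wait_for is our name, not a Selenium API (Selenium ships WebDriverWait for exactly this job), and it accepts any zero-argument callable that raises while the target is still missing:

```python
import time

def wait_for(find, retries=10, delay=3):
    """Retry `find` until it succeeds or the attempts run out.

    `find` is any zero-argument callable that raises while the target
    is still missing -- e.g. a lambda wrapping a Selenium lookup such
    as driver.find_element(By.ID, "comment-section-renderer-items").
    Returns whatever `find` returns on its first success.
    """
    last_error = None
    for _ in range(retries):
        try:
            return find()
        except Exception as error:
            # element not there yet: wait and try again
            last_error = error
            time.sleep(delay)
    raise TimeoutError("element never appeared") from last_error
```

Unlike the bare while True loop in the script below, this version gives up after a bounded number of attempts instead of hanging forever if the element never renders.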

For macOS users, be sure to run “brew install chromedriver” first, as described in this SO post. Selenium itself can be installed with “pip install selenium”.

#!/usr/bin/env python3

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import sys
import time

# grab the url as the first command-line argument
url = sys.argv[1]

# create a Chrome browser (requires chromedriver on the PATH)
driver = webdriver.Chrome()

# open the url from the command line
driver.get(url)

# scroll to the bottom in order to trigger loading of the comments
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(3)

# wait for the comments to load
while True:
    # if the comment container is present, stop waiting
    try:
        driver.find_element(By.ID, "comment-section-renderer-items")
        break
    # otherwise, sleep for three seconds and try again
    except NoSuchElementException:
        time.sleep(3)

# print the comments, separated by a line of dashes
for item in driver.find_elements(By.CLASS_NAME, "comment-renderer"):
    print(item.text)
    print("-" * 80)

# close the browser
driver.close()