I was working with a friend to grab comments from YouTube. We’d initially thought of using lynx or w3m, but the comments section always showed up as “Loading…”. Next, we tried BeautifulSoup, but that didn’t work either, for the same underlying reason: the comments are injected by JavaScript after the initial HTML arrives, so a static fetch of the page never sees them. Finally, we turned to Selenium, because it drives a real browser and can therefore execute the JavaScript on the page.
To get the comments to load, we scroll down to the end of the page with the window.scrollTo call in the script below. The while loop then retries until the element with id "comment-section-renderer-items"
is available; once it is, the comments are collected with the for loop at the bottom, before the browser is closed.
For macOS users, be sure to have run “brew install chromedriver”, as described in this SO post. For completeness, Selenium itself can be installed with pip install selenium.
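If chromedriver does not end up on your PATH, you can also point Selenium at the binary explicitly. A minimal sketch, assuming Homebrew’s usual install location of /usr/local/bin/chromedriver (adjust the path if yours differs):

from selenium import webdriver

# assumed path: Homebrew typically links chromedriver here
driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")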
#!/usr/local/bin/python2.7
from selenium import webdriver
import sys
import time

# grab the url as the first command line argument
url = sys.argv[1]

# create a Chrome browser
driver = webdriver.Chrome()

# open the url from the command line
driver.get(url)

# scroll to the bottom in order to load the comments
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(3)

# wait for the comments to load
while True:
    # if the comments have loaded, break out of the while loop
    try:
        driver.find_element_by_id("comment-section-renderer-items")
        break
    # otherwise, sleep for three seconds and try again
    except:
        time.sleep(3)
        continue

# print the comments, separated by a line
for item in driver.find_elements_by_class_name("comment-renderer"):
    print item.text
    print "-" * 80

# close the browser
driver.close()
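Assuming the script is saved as, say, comments.py (the filename is just for illustration), it can be run with the video URL as the only argument:

python comments.py "https://www.youtube.com/watch?v=VIDEO_ID"

As an aside, Selenium also ships an explicit-wait helper that could stand in for the hand-rolled sleep-and-retry loop above. A minimal sketch, assuming a 30-second timeout is acceptable:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 30 seconds for the comment container to appear,
# then collect the comments as in the script above
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "comment-section-renderer-items"))
)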