Saturday, 24 August 2013

Using PhantomJS in webdriver, what are ways to work around and debug lack of "existing" data returned?

I have left the website's name out so as not to give anyone ideas about
scraping it. With the Firefox webdriver I get the full page back, but with
PhantomJS I get nothing: the HTML "exists" in a standard browser, but not
in the headless one. My script is in Python. Example:
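from selenium import webdriver

driver = webdriver.PhantomJS()  # headless browser
spec = "xproduct123"
base_url = "https://www.xxx.com/products/%s" % spec
driver.get(base_url)
# Grab the visible text of the whole document.
page_text = driver.find_element_by_tag_name("html").text
print(page_text)
# quit() also shuts down the browser process; close() only closes the window.
driver.quit()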
This prints no page text; the only output is "Process finished with exit code 0".
If I change the driver to driver = webdriver.Firefox(), I get the full
page (at least its text content) in my terminal. The problem does not
occur on every site, so I need to find a workaround, ideally without
sharing the site I am scraping.
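
One way to start debugging this without naming the site is to save what the headless browser actually received and compare it with what Firefox shows. The following is a minimal sketch, not a definitive fix. The helper name dump_debug_info and the output file names are hypothetical; the driver attributes it reads (current_url, title, page_source, save_screenshot) are the standard Selenium WebDriver ones.

```python
import io
import os
import tempfile


def dump_debug_info(driver, out_dir=None):
    """Save what the browser actually loaded so it can be inspected offline.

    `driver` is expected to be a Selenium WebDriver instance. This helper
    and its file names are illustrative only, not part of Selenium.
    """
    if out_dir is None:
        # Assumption: a fresh temporary directory is an acceptable default.
        out_dir = tempfile.mkdtemp(prefix="phantomjs_debug_")

    source = driver.page_source  # raw HTML as the browser currently sees it

    html_path = os.path.join(out_dir, "page.html")
    with io.open(html_path, "w", encoding="utf-8") as handle:
        handle.write(source)

    # A screenshot shows whether anything was rendered at all.
    png_path = os.path.join(out_dir, "page.png")
    driver.save_screenshot(png_path)

    return {
        "url": driver.current_url,  # reveals redirects away from the requested URL
        "title": driver.title,
        "source_length": len(source),
        "html_path": html_path,
        "png_path": png_path,
    }
```

In the script above this would be called as dump_debug_info(driver) right after driver.get(base_url). If the saved HTML is little more than an empty html/head/body skeleton, that would suggest the page never loaded in PhantomJS (for example, a failed HTTPS handshake) rather than the content being hidden or rendered late by JavaScript.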
