Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


In Excel VBA, what is the way to check if the web page is fully loaded?

To pause code until a web page is fully loaded, I've been using the method below with great success almost all of the time.

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

But occasionally, I see text content load after the method determines that the page is fully loaded, and so that content is not extracted.

However, if I step through the code via F8, the content is extracted every time. This is done about as fast as I can press the F8 key repeatedly.

So how can I check to ensure the page, and all its content, is fully loaded before the code continues to extract data?

In both cases, IE is running invisibly. However, I've tried this with IE visible and there is actually content in this specific location on the pages I'm working with.

This is being done in Excel 2016, using VBA script. The specific content request is written like:

 'get item name from page and write it to the first cell on the first empty row available
 Set itemName = objIE.document.querySelector(".the-item-name")
 Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = itemName.innerText

I've read through Excel VBA: Wait for JavaScript execution in Internet Explorer because I think that maybe the values are getting added after the document is loaded, in an effort to prevent anyone from scraping data. However, I can't seem to identify any script that may be doing that. Doesn't mean it isn't there. I just can't see it yet.

A specific example of a page with this issue is this URL:

https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html

Initially, the product-total-price div element contains a dash (-) before the price is loaded, so that's what the request returns: - / each instead of $11.29 / each.

I have a workaround, but it's not as efficient or as concise as I'd like it to be. I test the returned string for the presence of the dash. If it's there, I loop and check again; otherwise I capture the value and insert it into the worksheet.

setPriceUM:
    ' querySelector returns the element; its text is read on the next line
    Set hdPriceUM = objIE.document.querySelector(".product-total-price")
    hdPriceUMString = hdPriceUM.innerText
    ' InStr returns the match position (0 if absent), not a Boolean
    stringTest = InStr(hdPriceUMString, "-")
    If stringTest > 0 Then
        DoEvents
        GoTo setPriceUM
    Else
        Debug.Print hdPriceUMString
    End If

Thank you for taking the time to read this and consider it.


1 Reply


Web pages load their content in very different ways, so there is no single solution that fits all of them.

Regarding your example, your workaround is a valid solution. The code might look like this:

Sub TestIE()

    Dim q

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .Navigate "https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html"
        ' Wait IE
        Do While .readyState < 4 Or .Busy
            DoEvents
        Loop
        ' Wait document
        Do While .document.readyState <> "complete"
            DoEvents
        Loop
        ' Wait element
        Do
            q = .document.querySelector(".product-total-price").innerText
            If Left(q, 1) <> "-" Then Exit Do
            DoEvents
        Loop
        .Quit
    End With
    Debug.Print q

End Sub
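
The "Wait element" loop above never ends if the price never arrives, and it raises an error if the element does not exist yet. As a minimal sketch, the same polling idea could be wrapped in a helper with a timeout. The function name WaitForText and its parameters are hypothetical, not part of the original answer, and the placeholder argument is assumed to be non-empty:

```vba
' Hypothetical helper: polls an element until its text no longer starts
' with the placeholder, or gives up after timeoutSec seconds.
Function WaitForText(ByVal doc As Object, ByVal selector As String, _
                     ByVal placeholder As String, ByVal timeoutSec As Double) As String
    Dim el As Object
    Dim txt As String
    Dim started As Double
    Dim elapsed As Double

    started = Timer
    Do
        ' The element may not be in the DOM yet, so check for Nothing first
        Set el = doc.querySelector(selector)
        If Not el Is Nothing Then
            txt = Trim$(el.innerText)
            If Len(txt) > 0 And Left$(txt, Len(placeholder)) <> placeholder Then
                WaitForText = txt
                Exit Function
            End If
        End If
        DoEvents
        ' Timer restarts at midnight, so correct a negative difference
        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400
    Loop While elapsed < timeoutSec

    ' Timed out: return an empty string so the caller can decide what to do
    WaitForText = vbNullString
End Function
```

Inside the With block it might be called as q = WaitForText(.document, ".product-total-price", "-", 10), with an empty result treated as "price not loaded in time".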

That said, you should examine how the page loads, including its XHRs and DOM modifications, using the browser developer tools (F12). Doing so, you may find that one of the many XHRs returns the price in JSON format. It is logged on the Network tab right before the price appears while the page is loading. That XHR is made by one of the loaded scripts, notably after the page load event has fired, which is why the readyState checks can pass before the price is present. Try this URL (I copied it from the Network tab):

https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en

So you can reproduce that XHR yourself and extract the price by splitting the response text:

Sub TestXHR()

    Dim q

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        q = .ResponseText
    End With
    q = Replace(q, " : ", ":")
    q = Split(q, """displayPrice""", 2)(1)
    q = Split(q, """formattedValue"":""", 2)(1)
    q = Split(q, """", 2)(0)
    Debug.Print q

End Sub
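
The Replace and Split chain above depends on the exact spacing around the colons. A possible alternative, shown here only as a sketch, is a regular expression that tolerates optional whitespace. The sample JSON string is hypothetical and merely shaped like the fragment the Split calls rely on; the layout of the real response is an assumption, and the pattern assumes formattedValue sits directly inside the displayPrice object with no nested object before it:

```vba
Sub TestRegexExtract()

    Dim json As String
    Dim re As Object
    Dim matches As Object

    ' Hypothetical sample, not the real response from the site
    json = "{""displayPrice"" : {""value"" : 11.29, ""formattedValue"" : ""$11.29""}}"

    Set re = CreateObject("VBScript.RegExp")
    ' Matches "displayPrice": { ... "formattedValue": "<captured text>"
    re.Pattern = """displayPrice""\s*:\s*\{[^}]*""formattedValue""\s*:\s*""([^""]*)"""
    re.IgnoreCase = False
    re.Global = False

    Set matches = re.Execute(json)
    If matches.Count > 0 Then
        ' First capture group holds the formatted price
        Debug.Print matches(0).SubMatches(0)
    Else
        Debug.Print "displayPrice.formattedValue not found"
    End If

End Sub
```

For the sample string this prints $11.29. To use it with the live request, json would be assigned from .ResponseText instead of the literal.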

But again, there is no approach that works for every page.

You may also use a JSON parser; take a look at some examples.

