Many websites have more than just simple static content. Dynamic content that is rendered by JavaScript needs a browser before it can be scraped. This video demonstrates how to use Nightmare (a wrapper around Electron) to load a URL and scrape dynamic data.
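Here is a minimal sketch of that flow. It assumes the promise-style API of Nightmare 2.x (`goto`, `wait`, `evaluate`, `end`). The function name `scrapeDynamicText` and the example URL and selector are placeholders. The Nightmare factory is passed in as a parameter so the chain can also be exercised with a stub:

```javascript
// Minimal sketch, assuming Nightmare 2.x (Electron-based, thenable API).
// `Nightmare` is the factory from require('nightmare'), injected so it can be stubbed.
async function scrapeDynamicText(Nightmare, url, selector) {
  const nightmare = Nightmare({ show: false }); // no visible browser window
  return nightmare
    .goto(url)
    // wait until client-side JavaScript has rendered the element
    .wait(selector)
    // this function runs inside the page; `selector` is forwarded as its argument
    .evaluate((sel) => document.querySelector(sel).innerText, selector)
    // end() closes Electron; the chain resolves to evaluate's return value
    .end();
}

module.exports = { scrapeDynamicText };
```

For example: `scrapeDynamicText(require('nightmare'), 'https://example.com', 'h1').then(console.log, console.error);`. The `wait(selector)` step is what makes this work on dynamic pages. Without it, `evaluate` may run before the page's JavaScript has inserted the element.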
Nice tutorial. Unfortunately I tried to use Nightmare to crawl an AJAX site and it didn't work.
var Nightmare = require('nightmare');
new Nightmare()
  .goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
  .evaluate(function () {
    var links = document.querySelectorAll('th a');
    return links;
  }, function (links) {
    console.log(links);
  })
  .run();
It looks like Nightmare doesn't support HTTPS... or is there a way to configure it for HTTPS?
Thanks!
When googling for answers, try searching for "PhantomJS https" (the Nightmare version your snippet uses is just a wrapper around PhantomJS; newer versions are built on Electron).
So add the following option when you run your script: "--ssl-protocol=any". As a bonus, I also tossed this together (I'll be talking more about "cheerio" in future videos):
import Nightmare from "nightmare";
import cheerio from "cheerio";
new Nightmare()
  .goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
  .evaluate(function () {
    return document.documentElement.innerHTML; // pass all of the HTML back as text
  }, function (html) {
    let $ = cheerio.load(html); // use cheerio for jQuery-style selectors in Node
    let titles = $('#jobs .absolute>span>a').map(function () {
      return $(this).text();
    }).get();
    console.log(titles); // log the array of job titles
  })
  .run();
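If you'd rather skip cheerio, another option is to do the extraction inside `evaluate` and return plain strings. Values returned from `evaluate` are serialized on their way back to Node, so returning the raw `NodeList` from the question's snippet doesn't hand you usable DOM elements. The sketch below assumes Nightmare 2.x. `collectLinks` and `scrapeLinks` are hypothetical names, and the Nightmare factory is injected so it can be stubbed:

```javascript
// Runs inside the page via evaluate(), not in Node: it must be self-contained
// (no references to Node-side variables) and return plain, serializable data.
function collectLinks(selector) {
  return Array.from(document.querySelectorAll(selector), (a) => ({
    text: a.textContent.trim(),
    href: a.href,
  }));
}

// Hypothetical helper, assuming Nightmare 2.x; `Nightmare` is require('nightmare') or a stub.
async function scrapeLinks(Nightmare, url, selector) {
  return Nightmare({ show: false })
    .goto(url)
    .wait(selector) // give the AJAX request time to render the links
    .evaluate(collectLinks, selector) // extra arguments are forwarded to collectLinks
    .end(); // resolves to the array of { text, href } objects
}

module.exports = { collectLinks, scrapeLinks };
```

With the URL and `th a` selector from the question, the call would be `scrapeLinks(require('nightmare'), url, 'th a')`. Because `collectLinks` runs in the page's context, it can't call other Node-side helpers.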
How does it compare to CasperJS? I've spent a lot of time with Casper, and I'm curious if someone out there is familiar enough with both APIs to have an opinion on the two.
Thanks for the video. However, I couldn't get the current script to work. It looks like Nightmare has changed its syntax. Borrowing from their posted example, I came up with this, and it worked for me after installing 'vo'.
var Nightmare = require('nightmare');
var vo = require('vo');
// vo runs the generator, resuming it with the result of each yielded Nightmare chain
vo(function* () {
  var nightmare = Nightmare({ show: true }); // show: true opens a visible Electron window
  var temperature = yield nightmare
    .goto('http://weather.com')
    .evaluate(function () {
      // runs inside the page
      return document.querySelector('.temperature').innerText;
    });
  yield nightmare.end(); // close Electron
  return temperature;
})(function (err, result) {
  if (err) return console.log(err);
  console.log(result);
});
Thanks, Baskin, your code worked for me. These videos should be updated with clearer notes. I don't understand why vo is being used. Could you explain?
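In Nightmare 2.x, a chain such as `nightmare.goto(...).evaluate(...)` is asynchronous (it's a thenable), so it can't simply be assigned like a normal value. `vo` runs the generator. At each `yield` it waits for the yielded value to settle and passes the result back in, so the variable on the left of `yield` receives the scraped text. The `runGenerator` function below is a simplified illustration of that idea, not vo's actual implementation (the real library handles more cases):

```javascript
// Simplified stand-in for what vo does with a generator (not vo's real code).
function runGenerator(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();

    // Resume the generator with a value ('next') or an error ('throw').
    function step(method, input) {
      let result;
      try {
        result = gen[method](input);
      } catch (err) {
        reject(err); // the generator threw and didn't catch it
        return;
      }
      if (result.done) {
        resolve(result.value); // the generator's return value
        return;
      }
      // Promise.resolve adopts thenables (such as a Nightmare chain) and wraps plain values.
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err)
      );
    }

    step('next', undefined);
  });
}

module.exports = { runGenerator };
```

It is called the same way as the vo example, e.g. `runGenerator(function* () { ... }).then(console.log, console.error)`. With native `async`/`await` (Node.js 7.6 and later), neither is needed: `await` also accepts thenables, so `const text = await nightmare.goto(url).evaluate(fn);` works directly inside an `async` function.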
Hi, I need to send a custom header in goto. How can I achieve this?
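In Nightmare 2.x, `goto` takes an optional second argument: an object of extra headers to send with that request. The sketch below works under that assumption. `titleWithHeaders` and the header name in the usage line are hypothetical, and the Nightmare factory is injected so it can be stubbed:

```javascript
// Sketch assuming Nightmare 2.x, whose goto(url, headers) accepts an optional
// object of extra HTTP headers for that navigation request.
// `Nightmare` is the factory from require('nightmare'), or a stub.
async function titleWithHeaders(Nightmare, url, headers) {
  return Nightmare({ show: false })
    .goto(url, headers) // headers are sent with this page request
    .evaluate(() => document.title) // runs inside the loaded page
    .end(); // close Electron and resolve to the title
}

module.exports = { titleWithHeaders };
```

For example: `titleWithHeaders(require('nightmare'), 'https://example.com', { 'X-Custom-Header': 'my-value' }).then(console.log, console.error);`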