Sometimes, being lazy gets in the way of growth. I'm currently scraping a large archive...something like 12k files.

I could have used beautifulsoup in Python...but instead did some super hacky shit using wget + sed + sed.

I'm ashamed and proud....all at the same time.

Well, my gig internet is really paying for itself today. I'm pulling in 500GB+ of data at breakneck speed. This is going to take a week...


@Clifford Let me see the damage you've done to these tools


Literally a nasty one-liner using wget to download the raw HTML, sed to delete the markup on rows with an href, then sed one more time to convert the remainder into URLs, which I loop through to download a shit ton of files. It's still running 24 hours later.
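Reading between the lines, the pipeline probably had roughly this shape. Everything here is a guess at the structure: the listing URL and the href pattern are placeholders, not what was actually run.

```shell
# Hypothetical reconstruction; the listing URL is a placeholder.
# 1. wget pulls the raw HTML of the directory listing to stdout.
# 2. The first sed keeps only the rows that contain an href.
# 3. The second sed strips the markup, leaving a bare URL per line.
# 4. The loop downloads each file one at a time (no forking).
wget -qO- "https://archive.example.org/listing.html" \
  | sed -n '/href/p' \
  | sed 's/.*href="\([^"]*\)".*/\1/' \
  | while read -r url; do
      wget -q "$url"
    done
```

Running the downloads serially in the `while` loop is exactly what keeps this polite: one open connection at a time instead of a fork bomb aimed at someone else's servers.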

I could have forked processes to speed it up... but I don't want to overwhelm their infra. It's the Internet Archive. Slow and steady wins the race.

