
Sometimes, being lazy gets in the way of growth. I'm currently scraping a large archive... something like 12k files.

I could have used BeautifulSoup in Python... but instead did some super hacky shit using wget + sed + sed.

I'm ashamed and proud....all at the same time.

Well, my gig internet is really paying for itself today. I'm pulling in 500+ GB of data at breakneck speed. This is going to take a week...


@Clifford Let me see the damage you've done to these tools

@bitmvr

Literally a nasty one-liner using wget to download the raw HTML, sed to strip the HTML from the rows with an href, then sed one more time to convert the remainder into a URL, which I loop through to download a shit-ton of files. It's still running 24 hours later.
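Roughly the shape of it (the URL and href pattern here are placeholders, not the real thing):

    # grab the index page, keep only rows with an href,
    # strip the markup down to the link itself, then fetch each one
    wget -qO- "https://example.org/archive/" \
      | sed -n '/href=/p' \
      | sed 's/.*href="\([^"]*\)".*/\1/' \
      | while read -r link; do
          wget "https://example.org/archive/${link}"
        done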

I could have forked processes to speed it up... but I don't want to overwhelm their infra. It's the Internet Archive. Slow and steady wins the race.
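(If you want wget itself to stay polite, it has built-in throttling flags; urls.txt here is a stand-in for wherever the extracted links ended up:)

    # pause between requests and cap bandwidth
    wget --wait=1 --limit-rate=500k -i urls.txt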
