*Really* big scraping project queued up, and now I wait what may be a few days to see if it works.
-
I fought the urge to drop it into Celery last night, but depending on how long my son naps today, that might be exactly what I do. I am grabbing about a million JSON files via Python requests, which is slow and steady.
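A minimal sketch of that slow-and-steady loop (the helper names, output directory, and resume-by-filename idea are my assumptions, not from the thread):

```python
import hashlib
import json
import pathlib


def url_to_filename(url: str) -> str:
    # Deterministic name so a re-run can skip URLs it already fetched.
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json"


def fetch_all(urls, out_dir="data"):
    # requests imported here so the filename helper works without it installed
    import requests

    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    session = requests.Session()  # reuse connections across a million fetches
    for url in urls:
        target = out / url_to_filename(url)
        if target.exists():
            continue  # already fetched: safe to stop and resume
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        target.write_text(json.dumps(resp.json()))
```

Single-threaded, so throughput is bounded by one request at a time, but it is easy to reason about and restart.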
-
Over 110K JSON files pulled so far. I probably will switch to Django and Celery because that will make working with the data go quicker for me.
-
Over 160K files now.
-
Broke 670K and noticed a huge uptick after porting over to Celery yesterday.
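The uptick makes sense: Celery workers fetch in parallel instead of one URL at a time. A hedged sketch of what that task might look like (the app name, Redis broker, and retry settings are all my assumptions):

```python
from celery import Celery

# Hypothetical app/broker; the thread doesn't say which broker was used.
app = Celery("scraper", broker="redis://localhost:6379/0")


@app.task(bind=True, max_retries=3, default_retry_delay=60)
def fetch_json(self, url):
    import requests

    try:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
    except requests.RequestException as exc:
        # Re-queue flaky URLs instead of losing them.
        raise self.retry(exc=exc)
    return resp.json()


# Enqueue everything; a pool of workers drains the queue in parallel:
#   for url in urls:
#       fetch_json.delay(url)
```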
-
It has been a few days, and I have 427,473 pages to parse and new URLs to try to scrape. I feel like I should be documenting this or writing a short book on the subject, but time, friends... time...
-
After two+ weeks of using Python + dataset, I used Django's "inspectdb" and I now have a working admin. I created some high-level database models, and I am starting to process pages using BeautifulSoup4 to see what the data looks like. So far so good.
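A first-pass BeautifulSoup4 parser along those lines might be this small (the extracted fields are my guess at "seeing what the data looks like"):

```python
from bs4 import BeautifulSoup


def extract_fields(html: str) -> dict:
    # Pull a few obvious fields out to get a feel for the pages.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return {
        "title": title.get_text(strip=True) if title else None,
        "links": [a["href"] for a in soup.find_all("a", href=True)],
    }
```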
-
Now I am heavily leaning on Postgres + Django's JSONField until I figure out which fields should be more concrete: URL + data (JSONField) to start, and then I add fields as they become apparent. There's a ton of data and the fields aren't obvious up front, so why stress?
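That schema-light starting point could be as small as this sketch (model and field names are assumptions; `models.JSONField` needs Django 3.1+, while older versions used `django.contrib.postgres.fields.JSONField`):

```python
from django.db import models


class Page(models.Model):
    # Start with just URL + raw payload; promote JSON keys to real
    # columns once the important fields become apparent.
    url = models.URLField(max_length=500, unique=True)
    data = models.JSONField()
    scraped_at = models.DateTimeField(auto_now_add=True)
```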
-
Yeah, @djangoproject is awesome. My current favorite feature is LiveServerTestCase. It can start your webserver and create a test database for an integration test, and tear it all down at the end of the test. Huge time-saver. #python
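For anyone who hasn't used it, a LiveServerTestCase test can be this short (the "/" path is hypothetical):

```python
from urllib.request import urlopen

from django.test import LiveServerTestCase


class SmokeTest(LiveServerTestCase):
    def test_root_responds(self):
        # Django starts a real server (self.live_server_url) and a
        # throwaway test database, both torn down after the test.
        with urlopen(self.live_server_url + "/") as resp:
            self.assertEqual(resp.status, 200)
```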