saurabhnanda

  1. @dataspora I'm curious about Cloudbase vs. Hive too. Will keep you posted.
  2. And the non-existent storyline award for 2009 goes to Transformers: Revenge of the Idiots!
  3. Plus there's Rakhi Ka Swayamvar. Perfect entertainment for people with a negative IQ!
  4. All music channels play the same ten songs. No decent movie on. What's the use of 130+ channels?
  5. Crazy 3-year Cleartrip bash. Remember seeing the founders do an Egyptian dance on the stage. #fb
  6. 1.5 million records. 10 min to generate a report aggregated on a single GROUP BY column with #cloudbase on a 4-node #hadoop cluster.
  7. 49 million records in raw table. 55 min for filtering, cleaning, and inserting into a new table with #cloudbase on a 4-node #hadoop cluster.
  8. 5,000,000+ rows in #cloudbase table. COUNT DISTINCT with HAVING clause took 4m 05s to return. 2-node #hadoop cluster.
  9. 5,000,000+ rows in #cloudbase table. A simple COUNT took 3m 56s to return. 2-node #hadoop cluster.
  10. @hadoop where is Hadoop picking up 'master-hadoop.local' and 'master-hadoop' from? I've only specified external IP address in the conf.
  11. @hadoop Wrong FS: hdfs://master-hadoop.local/user/ct-admin/cloudbase/index/IP_ADDRESS_INDEX/index/metadata/76, expected: hdfs://master-hadoop
  12. Inserting 1.9GB access log into a #cloudbase table. 2 node #hadoop cluster. 3m 18sec
  13. Inserting 179MB access log into a #cloudbase table. 2 node #hadoop cluster. 19sec.
  14. @chaitanya_gupta yups. one weblog entry = one Apache access log line.
  15. Query to COUNT total number of visitors with 2182 weblog entries on #Pig + #Hadoop. 40m and still waiting!
  16. PARALLEL=16 returned in 1min 15sec. I guess I'll stick with PARALLEL=8 for the other weblog processing tests #hadoop
  17. PARALLEL=8 returned in 35sec! #hadoop
  18. Wow! What an increase in speed. Returned in 1min 17sec! #hadoop
  19. Finally in 13 min 19 sec. Let me try with some more parallelization (PARALLEL 4) #hadoop
  20. GROUP BY on Pig+Hadoop on 2182 tuples. 12 mins and still waiting for result! (2-node cluster, 1GB RAM desktop machines) #hadoop
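
The two access-log load timings quoted above (179MB in 19sec, 1.9GB in 3m 18sec) can be turned into a rough throughput figure. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and it assumes 1 GB = 1000 MB, which the tweets do not specify.

```python
# Rough load throughput from the timings quoted in the feed.
# Assumption: 1 GB = 1000 MB (the source does not say which convention applies).

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' timing into total seconds."""
    return minutes * 60 + seconds


def throughput_mb_per_s(size_mb: float, elapsed_s: float) -> float:
    """Average throughput in MB/s for a load of size_mb taking elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return size_mb / elapsed_s


# 179MB access log, 19sec on the 2-node cluster
small = throughput_mb_per_s(179, to_seconds(0, 19))
# 1.9GB access log, 3m 18sec on the 2-node cluster
large = throughput_mb_per_s(1.9 * 1000, to_seconds(3, 18))

print(f"179 MB load: {small:.1f} MB/s")
print(f"1.9 GB load: {large:.1f} MB/s")
```

Under that assumption both loads work out to roughly 9 to 10 MB/s, so insert time looks close to linear in input size at these volumes.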