tatemura

  1. @divyagrawal You might also be interested in a variety of open source implementations: http://bit.ly/qNDOA
  2. @divyagrawal Nevertheless, such a comparison can be a good starting point. We should start thinking about what we can offer from 40 years of experience.
  3. @divyagrawal saying RDBMS outperforms MR is like saying a C program outperforms a C shell script... apples and oranges...
  4. @divyagrawal analysis is often ad-hoc and the users don't want to carefully design a separate process like ETL.
  5. @divyagrawal see how Pig (on top of MapReduce) is used, for instance. It can handle plain files and does data extraction at execution time.
  6. @divyagrawal ... and that (2) the user is ready (or skilled enough) to specify a query and interpret the results.
  7. @divyagrawal I agree about DB people's tendency to focus only on performance, assuming that (1) the data is ready for the DBMS to process,
  8. just arrived in cloudy Providence, RI, to attend SIGMOD, missing the blue sky back home.
  9. a talk on the recent change to the MapReduce API (from the mapred package to the mapreduce package).
  10. Pig 0.3.0 is going to support multiple STOREs and GROUP-BYs in one MR job (a kind of multi-query optimization)
  11. I guess it depends on how much Hive is going to leverage the declarative aspect of SQL (e.g. optimization, data independence).
  12. Some people like SQL and some people don't :-)
  13. A question from the floor to Hive: Why not Pig?
  14. Both Hive and HBase had to struggle with Java overhead to improve performance.
  15. columnar storage was recently introduced to Hive, showing better compression.
  16. Hive is also maturing by adopting (traditional) query optimization.
  17. the upcoming HBase 0.20 will show improvement in random-access latency
  18. The room for Track 1 has been expanded, which is a good example of elasticity :-) #hadoopsummit09
  19. ... (3) Proof-of-Concept and ad-hoc work (10%), (4) development, testing, and QA (10%).
  20. 4 tiers of Hadoop deployment in Y!: (1) production systems (20%), (2) science and research (60%), ...
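The Pig 0.3.0 tweet mentions running multiple STOREs and GROUP-BYs in one MR job. The sketch below is a hypothetical, single-process simulation of that idea, not Pig's actual implementation: every map output is tagged with the index of the query it belongs to, so one scan and one shuffle serve all queries, and the reduce side routes each group back to its own output. The function names and the sample log are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
TaggedKey = Tuple[int, str]  # (query index, group-by value)


def map_phase(records: Iterable[Record], group_keys: List[str]) -> Iterator[Tuple[TaggedKey, int]]:
    # A single scan of the input feeds every query: each record is emitted
    # once per GROUP BY, tagged with the index of the query it belongs to.
    for rec in records:
        for qi, key in enumerate(group_keys):
            yield (qi, rec[key]), 1


def shuffle(pairs: Iterable[Tuple[TaggedKey, int]]) -> Dict[TaggedKey, List[int]]:
    # Stand-in for the MapReduce shuffle: collect values by tagged key.
    groups: Dict[TaggedKey, List[int]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_phase(groups: Dict[TaggedKey, List[int]], n_queries: int) -> List[Dict[str, int]]:
    # Demultiplex: route each tagged group to its own output (one per STORE).
    outputs: List[Dict[str, int]] = [{} for _ in range(n_queries)]
    for (qi, key), values in groups.items():
        outputs[qi][key] = sum(values)  # COUNT(*) per group
    return outputs


def run_multi_query(records: List[Record], group_keys: List[str]) -> List[Dict[str, int]]:
    """Answer several COUNT ... GROUP BY queries with one map/shuffle/reduce pass."""
    return reduce_phase(shuffle(map_phase(records, group_keys)), len(group_keys))


logs = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/search"},
]
by_user, by_page = run_multi_query(logs, ["user", "page"])
print(by_user)  # {'alice': 2, 'bob': 1}
print(by_page)  # {'/home': 2, '/search': 1}
```

Without this sharing, each GROUP BY would need its own job and its own scan of the input; the tag costs a few extra bytes per key but saves whole passes over the data.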
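The Hive tweet notes that columnar storage showed better compression. A rough intuition, sketched below under simplifying assumptions: in a row layout, adjacent values come from different columns and rarely repeat, while a column layout places similar values side by side. Run-length encoding (RLE) is used here only as a stand-in for a real compressor, and the number of runs as a proxy for compressed size; Hive's actual format and codecs differ. The sample rows are invented.

```python
from itertools import groupby
from typing import Any, List, Sequence, Tuple

Row = Tuple[str, str, int]


def rle(values: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def row_major(rows: List[Row]) -> List[Any]:
    # Row layout: all fields of row 0, then all fields of row 1, and so on.
    return [v for row in rows for v in row]


def column_major(rows: List[Row]) -> List[List[Any]]:
    # Column layout: each column's values stored contiguously.
    return [list(col) for col in zip(*rows)]


rows: List[Row] = [
    ("2009-06-10", "US", 200),
    ("2009-06-10", "US", 404),
    ("2009-06-10", "JP", 200),
    ("2009-06-11", "JP", 200),
]

row_runs = len(rle(row_major(rows)))
col_runs = sum(len(rle(col)) for col in column_major(rows))
print(row_runs, col_runs)  # 12 7
```

In the row layout no two adjacent values are equal, so RLE gains nothing (12 runs for 12 values); per column, the date and country values collapse into a few runs.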
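The replies to @divyagrawal contrast a separately designed ETL process with Pig reading plain files and extracting data at execution time. A minimal sketch of that "schema at query time" idea follows, in the spirit of Pig's `LOAD ... AS (...)` but not Pig's API: the raw tab-delimited lines are never transformed in advance, and each ad-hoc query supplies its own schema when it runs. The `load` helper and the log format are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A schema is a list of (field name, parser) pairs applied at read time.
Schema = List[Tuple[str, Callable[[str], object]]]


def load(lines: Iterable[str], schema: Schema, delim: str = "\t") -> Iterator[Dict[str, object]]:
    """Parse raw delimited text lazily, applying the schema only when queried."""
    for line in lines:
        fields = line.rstrip("\n").split(delim)
        # zip stops at the shorter side, so a schema may name only leading fields.
        yield {name: parse(raw) for (name, parse), raw in zip(schema, fields)}


raw_log = [
    "alice\t/home\t120\n",
    "bob\t/search\t340\n",
    "alice\t/search\t95\n",
]

# Two ad-hoc queries read the same untouched data with different schemas.
latency = [r["ms"] for r in load(raw_log, [("user", str), ("page", str), ("ms", int)])]
users = {r["user"] for r in load(raw_log, [("user", str)])}
print(latency)  # [120, 340, 95]
print(sorted(users))  # ['alice', 'bob']
```

The trade-off is the one the thread describes: no upfront design effort, at the cost of re-parsing the raw text on every query.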