Skip to main content

How to get data computing faster for java developers

Posted by Raqsoft on January 24, 2014 at 12:05 AM PST

The columnar storage is good, especially when there are lots of tabular fields (this is quite common). In querying, the data to traverse is far less than that on the row storage. Less data to traverse brings less I/O workloads and higher query speed. However, the Hadoop application consumes most time on the hard disk I/O without columnar storage.
Both Hive and Impala support columnar storage, but columnar storage only available with the basic SQL interface. As for some more complex data computing, it seems quite difficult for the MapReduce framework to do columnar storage.
The sample data is a big data file sales.txt on the HDFS. This file has twenty fields, 100 million data entries, and the file size is 14G. Let

Related Topics >>

Comments

How does this SPAM relate to Java Please stop spamming us ...

How does this SPAM relate to Java

Please stop spamming us with the Excel-Lookalike CRAP language.