Columnar storage works well, especially when a table has many fields, which is quite common. A query then traverses far less data than it would on row storage, and less data to traverse means a lighter I/O workload and faster queries. Without columnar storage, however, a Hadoop application spends most of its time on hard disk I/O.
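As a rough illustration of why fewer values are traversed, the arithmetic can be sketched in Java. The class and method names are hypothetical, and the model is deliberately simplified: it assumes fixed-size values and ignores compression, block sizes, and indexes, so it is not a description of how Hive or Impala actually read data.

```java
// Simplified model (an assumption, not a real storage engine):
// count how many field values a full scan must read.
class ColumnarSketch {
    // Row storage: every field of every row is read, even unused ones.
    static long valuesScannedRowStore(long rowCount, int fieldCount) {
        return rowCount * fieldCount;
    }

    // Columnar storage: only the columns the query needs are read.
    static long valuesScannedColumnStore(long rowCount, int fieldsNeeded) {
        return rowCount * fieldsNeeded;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        int fields = 50;       // width of the table
        int needed = 2;        // fields referenced by the query
        long rowStore = valuesScannedRowStore(rows, fields);
        long colStore = valuesScannedColumnStore(rows, needed);
        System.out.println("row store:    " + rowStore);
        System.out.println("column store: " + colStore);
        System.out.println("ratio:        " + (rowStore / colStore));
    }
}
```

Under these assumptions, a query touching 2 of 50 fields reads 25 times fewer values from a column store than from a row store.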
Both Hive and Impala support columnar storage, but...
on Jan 24, 2014
The data warehouse is essential to enterprise business intelligence, which accounts for a large part of the total enterprise cost. With the global data explosion of recent years, business data volumes have grown significantly, making it a serious challenge for the enterprise data warehouse to meet diverse and complex business demands. More data, more data warehouse applications, more concurrent...
on Dec 11, 2013
In Java, using SQL is a well-established practice for database computation. However, structured data is stored not only in databases but also in text, Excel, and XML files. How, then, should structured data from non-database files be computed? This article offers 3 solutions for your reference: implement via Java API, convert to database...
on Aug 29, 2013
According to research, most complex report development work can be simplified by performing the data source computation in advance. For example, find the clients who bought every product in a given list, and then present the details of those clients.
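The example computation above can be sketched in plain Java as one possible way to prepare the data before it reaches the report. The `Order` class, the field names, and the sample data are all hypothetical; a real report would read orders from its actual data source.

```java
import java.util.*;

class ClientsWhoBoughtAll {
    // Hypothetical order row: one entry per (client, product) purchase.
    static final class Order {
        final String client;
        final String product;
        Order(String client, String product) {
            this.client = client;
            this.product = product;
        }
    }

    // Returns, in sorted order, the clients whose purchases
    // include every product in the required list.
    static Set<String> clientsWhoBoughtAll(List<Order> orders, Set<String> required) {
        // Group the distinct products bought by each client.
        Map<String, Set<String>> boughtByClient = new HashMap<>();
        for (Order o : orders) {
            boughtByClient.computeIfAbsent(o.client, k -> new HashSet<>()).add(o.product);
        }
        // Keep only clients whose product set covers the required set.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : boughtByClient.entrySet()) {
            if (e.getValue().containsAll(required)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
            new Order("alice", "A"), new Order("alice", "B"), new Order("alice", "C"),
            new Order("bob", "A"), new Order("bob", "B"),
            new Order("carol", "C"));
        Set<String> required = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(clientsWhoBoughtAll(orders, required)); // prints [alice]
    }
}
```

With the result set computed in advance, the report itself only has to look up and display the details of the returned clients.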
In developing such reports, it is the
on Aug 20, 2013
In report development, we may need to present data from multiple databases in one report, such as data from an MSSQL database for CRM and an Oracle database for ERP. If a reporting tool such as iReport supports only a single data source, we need to consolidate the multiple data sources into one.
Crystal, BIRT, and other so-called multiple-data-source reporting tools can...
on Aug 6, 2013