Big Data for beginners
Applications have grown along with man's desire for more and more information. As more applications emerged, people became hostages of their applications.
For example, we can recognize how the relationship between man and machine has evolved:
First came many people to one machine (the mainframe age), then one person to one machine (the personal computer era), and finally one person to many machines (today we have smartphones, tablets, PCs, Google Glass, Internet-connected watches, the Web of Things, etc.).
With so much software, information is created at an exponential rate, so we need to manage data, because it is important to business rules. Data volumes have been growing by about 60% every year; for example, a company with one thousand employees produces 1,000 terabytes, and this volume will grow fiftyfold by 2020.
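To see how quickly a constant yearly growth rate compounds, here is a small Java sketch. It assumes the 60% rate stays constant every year and uses the 1,000 TB figure only as a starting point; the class DataGrowth and its methods are illustrative names, not part of any library.

```java
// Compound growth sketch: assumes a constant 60% yearly growth rate (from the text)
// applied to a starting volume of 1,000 TB. DataGrowth is a hypothetical class.
class DataGrowth {

    // Volume after `years` of compound growth at `rate` (0.60 means 60% per year).
    static double volumeAfter(double startTb, double rate, int years) {
        return startTb * Math.pow(1.0 + rate, years);
    }

    // Smallest number of whole years needed to reach `factor` times the start.
    static int yearsToMultiply(double rate, double factor) {
        int years = 0;
        double multiple = 1.0;
        while (multiple < factor) {
            multiple *= 1.0 + rate;
            years++;
        }
        return years;
    }

    public static void main(String[] args) {
        for (int year = 0; year <= 5; year++) {
            System.out.printf("year %d: %.0f TB%n", year, volumeAfter(1000, 0.60, year));
        }
        System.out.println("years to grow 50x: " + yearsToMultiply(0.60, 50));
    }
}
```

Under these assumptions the volume passes ten times its starting size after five years (about 10,486 TB) and fifty times its starting size in the ninth year.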
When the term "big data" was born, the first difficulty was defining the concept: many people give different definitions in white papers, blogs, books, etc. The most common one is "managing a large volume of data quickly." But this definition is abstract. The first problem is: what is a large volume? To person A, a large volume of data is one gigabyte, while person B thinks a large volume is one hundred gigabytes. The second question is: what is fast? Minutes, hours, seconds, etc.; the answer is different for each person.
So the challenge of big data is to manage a large volume of data quickly in order to do anything with it, such as data mining, inserts, etc. Building an application that can grow where it needs to is important, and scalability is a great strategy for this. There are two kinds: vertical scalability (adding more hardware to one machine, such as a more powerful processor and more RAM) and horizontal scalability (increasing the number of computers, which then work together). The latter is more complex and takes more time, but it is cheaper, because you can increase and decrease the number of machines as needed: during quiet periods your application uses one server, but it can have ten when you want. This is what makes an application elastic.
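Horizontal scaling is what makes an elastic application possible. The sketch below shows one simple scaling rule: compute how many servers the current load needs and keep that number between a minimum and a maximum. The class ElasticScaler, the per-server capacity of 1,000 requests per second, and the 1-to-10 server range are hypothetical values chosen to match the example above, not the API of any real cloud platform.

```java
// Hypothetical elastic scaling rule; names and numbers are assumptions for illustration.
class ElasticScaler {
    private final int minServers;
    private final int maxServers;
    private final int requestsPerServer; // assumed capacity of a single server

    ElasticScaler(int minServers, int maxServers, int requestsPerServer) {
        if (minServers < 1 || maxServers < minServers || requestsPerServer < 1) {
            throw new IllegalArgumentException("invalid scaling limits");
        }
        this.minServers = minServers;
        this.maxServers = maxServers;
        this.requestsPerServer = requestsPerServer;
    }

    // Servers needed for the current load, clamped to [minServers, maxServers].
    int serversFor(int requestsPerSecond) {
        // Ceiling division: 2,500 req/s at 1,000 req/s per server needs 3 servers.
        int needed = (requestsPerSecond + requestsPerServer - 1) / requestsPerServer;
        return Math.max(minServers, Math.min(maxServers, needed));
    }

    public static void main(String[] args) {
        ElasticScaler scaler = new ElasticScaler(1, 10, 1000);
        int[] loads = {200, 2500, 9000, 50000, 300};
        for (int rps : loads) {
            System.out.println(rps + " req/s -> " + scaler.serversFor(rps) + " server(s)");
        }
    }
}
```

In quiet periods the rule falls back to one server, and under heavy load it stops at ten. A real autoscaler would usually also smooth the load over time so that it does not add and remove machines on every short spike.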
To store this information with horizontal scalability, NoSQL databases are a good idea. To understand NoSQL better: the name means "Not only SQL." One difference between NoSQL and SQL is that NoSQL databases vary widely in their characteristics. Usually, a NoSQL database is fast at reading and writing data, but slower at searching for it, so you may need another service, such as the Lucene framework.
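The difference between reading by key and searching by content can be sketched with a toy in-memory store. `get` is a single hash lookup, while `scanSearch` has to inspect every stored value; `indexedSearch` uses an inverted index (word to ids), which is the core idea behind search libraries such as Lucene. KeyValueStore and its methods are hypothetical and do not reproduce the API of Lucene or of any NoSQL database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy key-value store with an optional inverted index (hypothetical, for illustration).
class KeyValueStore {
    private final Map<String, String> data = new HashMap<>();
    // Inverted index: word -> ids of the values containing that word.
    private final Map<String, Set<String>> index = new HashMap<>();

    void put(String id, String value) {
        String old = data.put(id, value);
        if (old != null) {
            // Remove stale index entries left by the overwritten value.
            for (String word : tokenize(old)) {
                Set<String> ids = index.get(word);
                if (ids != null) {
                    ids.remove(id);
                }
            }
        }
        for (String word : tokenize(value)) {
            Set<String> ids = index.get(word);
            if (ids == null) {
                ids = new TreeSet<>();
                index.put(word, ids);
            }
            ids.add(id);
        }
    }

    // Read by key: one hash lookup, no matter how many records are stored.
    String get(String id) {
        return data.get(id);
    }

    // Search without an index: every stored value must be examined.
    Set<String> scanSearch(String word) {
        Set<String> result = new TreeSet<>();
        String target = word.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (tokenize(e.getValue()).contains(target)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Search with the inverted index: one lookup per word.
    Set<String> indexedSearch(String word) {
        Set<String> ids = index.get(word.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return new TreeSet<>();
        }
        return new TreeSet<>(ids);
    }

    private static Set<String> tokenize(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        KeyValueStore store = new KeyValueStore();
        store.put("user:1", "Alice likes NoSQL databases");
        store.put("user:2", "Bob prefers SQL");
        store.put("user:3", "Carol uses NoSQL and Lucene");
        System.out.println(store.get("user:2"));          // Bob prefers SQL
        System.out.println(store.scanSearch("nosql"));    // [user:1, user:3]
        System.out.println(store.indexedSearch("NoSQL")); // [user:1, user:3]
    }
}
```

Both searches return the same ids, but the scan gets slower as the store grows while the index lookup does not; the price is the extra work and memory spent maintaining the index on every write.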
Whether you search for information or fetch it by id, reading from a hard disk is slower than reading from RAM, so keeping data on a fast-access device reduces the time needed to retrieve it; this is called a cache. When you think about caching, you should consider the following:
Have the information before you need it by warming up the cache. You can also warm it up in real time, so that the second request is faster than the first. However, removing data is just as significant: you should know when information is no longer necessary or should be replaced with newer information.
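A cache that is warmed up in advance, reads through to slower storage on a miss, and evicts old entries can be sketched in Java with a `LinkedHashMap` in access order, whose `removeEldestEntry` hook gives least-recently-used (LRU) eviction. LRU is only one possible eviction policy; the `slowStore` function is a hypothetical stand-in for a disk or database read, and the sketch assumes it never returns null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal read-through LRU cache sketch (hypothetical class, for illustration).
class LruCache<K, V> {
    private final int capacity;
    private final Function<K, V> slowStore; // stand-in for disk or database access
    private final LinkedHashMap<K, V> entries;
    private int misses = 0;

    LruCache(int capacity, Function<K, V> slowStore) {
        this.capacity = capacity;
        this.slowStore = slowStore;
        // accessOrder = true: iteration runs from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once capacity is exceeded.
                return size() > LruCache.this.capacity;
            }
        };
    }

    // Warm-up: load keys before anyone asks for them.
    void warmUp(Iterable<K> keys) {
        for (K key : keys) {
            entries.put(key, slowStore.apply(key));
        }
    }

    // Read-through: serve from memory; on a miss, load from the slow store and keep it.
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
            value = slowStore.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Explicit removal when the cached value is stale or no longer needed.
    void invalidate(K key) {
        entries.remove(key);
    }

    int misses() {
        return misses;
    }

    boolean contains(K key) {
        return entries.containsKey(key); // does not change the access order
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2, id -> "user-" + id);
        cache.warmUp(Arrays.asList(1, 2));
        cache.get(1); // hit, thanks to warm-up
        cache.get(3); // miss: loads 3 and evicts 2, the least recently used
        System.out.println(cache.misses());    // 1
        System.out.println(cache.contains(2)); // false
    }
}
```

Warm-up makes the first request as fast as later ones, while eviction and `invalidate` cover the other half of the advice: knowing when data should leave the cache.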
Another aspect of big data, a less popular one, is speed in development and modeling. For example, when many Twitter users started using a hashtag, Twitter improved search by hashtag within a short time. So big data is also related to fast software development. In the Java world, knowing Java EE 6 and JDK 7 well increases productivity.
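As a small illustration of hashtag search, the sketch below indexes posts by hashtag. It also uses two JDK 7 language features that reduce boilerplate: the diamond operator (`new HashMap<>()`) and try-with-resources, which closes the reader automatically. HashtagIndex is hypothetical and says nothing about how Twitter actually implemented its search; Java EE 6 features are not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hashtag index, not Twitter's actual design.
class HashtagIndex {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Diamond operator (JDK 7): type arguments are inferred from the declaration.
    private final Map<String, List<String>> postsByTag = new HashMap<>();

    // Index a post under each distinct hashtag it contains (case-insensitive).
    void add(String post) {
        Set<String> tags = new HashSet<>();
        Matcher m = HASHTAG.matcher(post);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        for (String tag : tags) {
            List<String> posts = postsByTag.get(tag);
            if (posts == null) {
                posts = new ArrayList<>();
                postsByTag.put(tag, posts);
            }
            posts.add(post);
        }
    }

    // Load one post per line; try-with-resources (JDK 7) closes the reader.
    void load(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    add(line);
                }
            }
        }
    }

    // Look up posts by tag, with or without the leading '#'.
    List<String> search(String tag) {
        String key = tag.startsWith("#") ? tag.substring(1) : tag;
        List<String> posts = postsByTag.get(key.toLowerCase(Locale.ROOT));
        if (posts == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(posts);
    }

    public static void main(String[] args) throws IOException {
        HashtagIndex index = new HashtagIndex();
        index.load("Learning #BigData today\nNoSQL or SQL? #bigdata #nosql\nCoffee time\n");
        System.out.println(index.search("#bigdata").size()); // 2
        System.out.println(index.search("NoSQL"));           // [NoSQL or SQL? #bigdata #nosql]
    }
}
```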
Finally, the concept of big data is simple, but putting it into practice is complicated, because you need to know many frameworks, methodologies, and technologies, and many kinds of data stores, such as NewSQL, SQL, NoSQL, caches, and Lucene. SQL and normalization are still the most popular approaches at universities, but they were created in the 1970s, when a server had 1 KB of RAM and 800 KB of storage; nowadays a smartphone is faster than those old servers. In other words, it is wrong to assume you should always use an SQL database.