Myths and Misconceptions Associated With Big Data Hadoop

Hadoop is open-source software. It is a framework for storing data and running applications on clusters of commodity hardware.

It can store data of almost any size and type. It offers substantial processing power and can run a very large number of concurrent tasks. Hadoop is thus a fault-tolerant, flexible, and scalable tool that still remains low in cost.

There are many common myths and misconceptions about Hadoop and big data. Some of them are listed below.

Myth: Hadoop is a single product.
Fact: Hadoop consists of multiple products.

It is generally assumed that Hadoop is a single product. However, it is a brand name for a family of open-source products. These products are incubated and administered by the Apache Software Foundation.

The Apache Hadoop family includes:

  • The Hadoop Distributed File System (HDFS)
  • MapReduce
  • Pig
  • Hive
  • HBase
  • HCatalog
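
To give a feel for what the MapReduce item in the list above does, here is a minimal sketch of the map, shuffle, and reduce pattern in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map and reduce steps would run in parallel on different machines; this sketch only shows the shape of the computation.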

Myth: Hadoop is only about data volume.
Fact: Hadoop is also about data diversity, not just data volume.

Another misconception is that Hadoop is only associated with data volume. However, the real value of this software is in its ability to handle diverse data.

HDFS is designed to store and provide access to any type of data. The only requirement is that the data is put in a file and that the file can be copied into HDFS. The primary advantage of Hadoop is the ability to analyze a large volume of such data and extract useful information from it.
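
The idea that storage only needs a file, and that the format matters only when the data is read, can be sketched in plain Python. This is an illustration under stated assumptions: a temporary local directory stands in for HDFS, and the helper names (`store`, `read_json`, `read_csv`, `demo`) and sample records are hypothetical, not part of any Hadoop API.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def store(directory, name, payload):
    """Copy raw bytes into the store without inspecting their format."""
    path = Path(directory) / name
    path.write_bytes(payload)
    return path


def read_json(path):
    """Interpret a stored file as JSON only at read time."""
    return json.loads(path.read_bytes().decode("utf-8"))


def read_csv(path):
    """Interpret a stored file as CSV only at read time."""
    text = path.read_bytes().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def demo():
    # A temporary local directory stands in for HDFS (assumption).
    with tempfile.TemporaryDirectory() as store_dir:
        events = store(store_dir, "events.json",
                       b'{"user": "a1", "action": "login"}')
        sales = store(store_dir, "sales.csv",
                      b"region,amount\nnorth,120\nsouth,80\n")
        return read_json(events), read_csv(sales)


if __name__ == "__main__":
    print(demo())
```

The `store` function never looks inside the payload, which is the point: differently structured data can sit side by side, and each reader applies its own interpretation later.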

Myth: All the components of Hadoop are open source only.
Fact: Hadoop is open source but available from proprietary vendors too.

Apache Hadoop is an open-source software library that can be downloaded for free from the Apache Software Foundation. However, vendors such as IBM, Cloudera, and EMC Greenplum also offer their own distributions.

Myth: The only answer to “Big Data” is Hadoop.
Fact: Big data does not always require Hadoop.

Even though Big Data and Hadoop have become nearly synonymous, Hadoop is not the only solution to Big Data. Other companies, such as Teradata and Vertica, have also been working on Big Data.

Myth: HDFS is the database management system of Hadoop.
Fact: HDFS is a file system, not a database management system (DBMS).

While HDFS is a distributed file system, it does not have the capabilities of a database management system (DBMS). A DBMS offers abilities such as indexing, random access to data, support for standard SQL, and query optimization.
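
The difference between reading records from a plain file and using an index, one of the DBMS abilities named above, can be sketched in plain Python. This is only an illustration: the record layout (`id,name` lines) and the names `scan_lookup` and `build_index` are assumptions made for the example, and a dictionary stands in for a real database index.

```python
def scan_lookup(lines, customer_id):
    """File-style access: read records in order until the key is found."""
    for line in lines:
        key, _, value = line.partition(",")
        if key == customer_id:
            return value
    return None


def build_index(lines):
    """DBMS-style access: build a key -> value index once, up front."""
    index = {}
    for line in lines:
        key, _, value = line.partition(",")
        index[key] = value
    return index


# Hypothetical sample data: 100,000 "id,name" records.
records = [f"c{i},customer-{i}" for i in range(100_000)]
index = build_index(records)

if __name__ == "__main__":
    print(scan_lookup(records, "c99999"))  # walks the whole list
    print(index["c99999"])                 # single dictionary lookup
```

Both calls return the same record, but the scan has to touch every line before the match, while the indexed lookup goes straight to it. A plain file system gives you only the first option unless another layer adds the index.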

Myth: Hadoop is only used for analyzing Weblogs and other Web data.
Fact: Hadoop enables many types of analytics, not just Web analytics.

Other types of analysis that Hadoop is used for include customer base segmentation, fraud detection, and risk analysis.

Myth: Hadoop is free.
Fact: Hadoop is open source, but it comes with certain costs.

Since Hadoop is open source, it is often assumed to be free. However, this is not entirely true. There are deployment costs involved, and administrative tools and support can add further costs. In addition, there are the costs of the cluster hardware, the space to house it, and the electricity to keep it running.

For more information about Hadoop, please visit the Stechies website.

