GALLERY: Hadoop

HDFS includes a so-called secondary namenode, a misleading term that some incorrectly interpret as a backup that takes over when the primary namenode goes offline. In fact, the secondary namenode regularly connects to the primary namenode and builds snapshots of the primary namenode's directory information, which the system then saves to local or remote directories. These checkpointed images can be used to restart a failed primary namenode without replaying the entire journal of file-system actions; only the edit log entries recorded since the last checkpoint need to be applied to produce an up-to-date directory structure.

Because the namenode is the single point for storage and management of metadata, it can become a bottleneck when supporting a huge number of files, especially a large number of small files. It is also a single point of failure (SPoF), and it limits scalability when metadata requests arrive in large volumes. HDFS Federation, a later addition, aims to tackle this problem to a certain extent by allowing multiple namespaces served by separate namenodes.

One advantage of using HDFS is data awareness between the job tracker and the task trackers. The job tracker schedules map or reduce tasks to task trackers with an awareness of the data location. For example, if node A contains data (a, b, c) and node X contains data (x, y, z), the job tracker schedules node A to perform map or reduce tasks on (a, b, c) and node X to perform map or reduce tasks on (x, y, z). This reduces the amount of traffic that goes over the network and prevents unnecessary data transfer. When Hadoop is used with other file systems, this advantage is not always available, which can have a significant impact on job-completion times for data-intensive jobs.
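The node A / node X example can be sketched in a few lines of Python. This is only an illustration of locality-aware assignment, not Hadoop's actual scheduler: the function name `schedule_by_locality`, the `block_locations` mapping, and the `"<remote>"` placeholder are all hypothetical, and the least-loaded tie-break is an assumption made for the sketch.

```python
def schedule_by_locality(block_locations, tasks):
    """Assign each task to a node that already stores its input block.

    block_locations: dict mapping block id -> list of node names holding it
    tasks: iterable of block ids, one task per block
    Returns a dict mapping node name -> list of block ids scheduled there.
    """
    assignments = {}
    for block in tasks:
        holders = block_locations.get(block, [])
        if not holders:
            # No node holds this block locally. A real scheduler would fall
            # back to a remote read; this sketch just records it separately.
            assignments.setdefault("<remote>", []).append(block)
            continue
        # Among the nodes holding the block, pick the least-loaded one
        # (an assumed tie-break, chosen to keep the sketch simple).
        target = min(holders, key=lambda node: len(assignments.get(node, [])))
        assignments.setdefault(target, []).append(block)
    return assignments


# Hypothetical cluster layout matching the example in the text.
block_locations = {
    "a": ["A"], "b": ["A"], "c": ["A"],
    "x": ["X"], "y": ["X"], "z": ["X"],
}
plan = schedule_by_locality(block_locations, ["a", "b", "c", "x", "y", "z"])
print(plan)  # {'A': ['a', 'b', 'c'], 'X': ['x', 'y', 'z']}
```

Every task lands on the node that already holds its data, so no block has to cross the network. With a file system that does not expose block locations, `holders` would be empty and every task would fall into the remote bucket.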
