Sunday, 4 August 2013

AccessControlException: Access denied for user hdfs. Superuser privilege is required

put: Permission denied: user=XYZ, access=WRITE, inode="/user/test":hadoopuser:hdfsusers:drwxr-xr-x

This error occurs because the permissions check fails on the directory /user/test. Whenever a file or directory in HDFS is created or modified, HDFS performs a permissions check against the client's identity. In Hadoop without security (Kerberos) installed, the identity of a client process is simply whatever the host operating system reports.
On Unix-like systems, the client identity is determined as follows:
The user name is the equivalent of `whoami`;
The group list is the equivalent of `bash -c groups`.
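
To see which identity the Hadoop client is actually picking up, you can print it from Java with UserGroupInformation (a small sketch; the class name WhoAmI is only illustrative, and the output depends on your OS account):

    import java.util.Arrays;

    import org.apache.hadoop.security.UserGroupInformation;

    public class WhoAmI {
        public static void main(String[] args) throws Exception {
            // the identity Hadoop derives from the host operating system
            UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
            System.out.println("user   : " + ugi.getUserName());
            System.out.println("groups : " + Arrays.toString(ugi.getGroupNames()));
        }
    }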

Super User
The super-user is the user with the same identity as the NameNode daemon process; that is, the username that started the NameNode daemon.

If you are getting the above error while copying a file to an HDFS directory:

                $hadoop fs -put text.txt /user/test/

You (username XYZ) are copying a file to the HDFS directory "/user/test/", which is owned by hadoopuser in the group hdfsusers.
Before running the hadoop command, export the Hadoop user name as below and then try again:
export HADOOP_USER_NAME=hadoopuser

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Access denied for user hdfs. Superuser privilege is required

If you get this kind of error in a Java program, either export the user before running the jar:
                   $export HADOOP_USER_NAME=hadoopuser

or include these lines in your Java program:

               System.setProperty("HADOOP_USER_NAME", hadoopSuperUserName);

               Configuration conf = new Configuration();
               conf.set("hadoop.job.ugi", hadoopSuperUserName);
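
For example, a minimal sketch of setting the user before the FileSystem client is created (the class name, the user name hadoopuser and the paths are only illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyAsHadoopUser {
        public static void main(String[] args) throws Exception {
            // must be set before FileSystem.get() establishes the client identity
            System.setProperty("HADOOP_USER_NAME", "hadoopuser");

            Configuration conf = new Configuration();
            conf.set("hadoop.job.ugi", "hadoopuser");

            FileSystem fs = FileSystem.get(conf);
            // this write now runs as hadoopuser, the owner of /user/test
            fs.copyFromLocalFile(new Path("text.txt"), new Path("/user/test/text.txt"));
            fs.close();
        }
    }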

Thursday, 25 July 2013

Hadoop file manipulations in Java

To work with a file or directory in HDFS from Java, first obtain a FileStatus object for its path:

            Configuration config = new Configuration();
            FileSystem hdfs = FileSystem.get(config);
            Path path = new Path(fileName);
            FileStatus fileStatus = hdfs.getFileStatus(path);


In order to get the last modification and access time of a file in the Hadoop file system:
 
            long modificationTime = fileStatus.getModificationTime();
            long accessTime = fileStatus.getAccessTime();
            


In order to get the replication and block size of a file in the Hadoop file system:
            short replica = fileStatus.getReplication();
            long blockSize = fileStatus.getBlockSize();
            


In order to get the group and owner of a file in the Hadoop file system:
            String group = fileStatus.getGroup();
            String owner = fileStatus.getOwner();


List All Files in the directory:
            if (fileStatus.isDir())
            {
                // list the immediate children of the directory
                FileStatus[] status = hdfs.listStatus(path);
                for (int i = 0; i < status.length; i++) {
                    Path cur = status[i].getPath();
                    System.out.println(cur.toUri().getPath());
                }
            } else {
                System.out.println(fileName + ": is not a directory");
            }
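
Putting the snippets above together, a minimal self-contained sketch (the class name HdfsFileInfo is only illustrative; the path is taken from the command line):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsFileInfo {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration();
            FileSystem hdfs = FileSystem.get(config);
            Path path = new Path(args[0]);
            FileStatus fileStatus = hdfs.getFileStatus(path);

            System.out.println("modification time : " + fileStatus.getModificationTime());
            System.out.println("access time       : " + fileStatus.getAccessTime());
            System.out.println("replication       : " + fileStatus.getReplication());
            System.out.println("block size        : " + fileStatus.getBlockSize());
            System.out.println("owner:group       : " + fileStatus.getOwner() + ":" + fileStatus.getGroup());

            // if the path is a directory, also list its children
            if (fileStatus.isDir()) {
                for (FileStatus child : hdfs.listStatus(path)) {
                    System.out.println(child.getPath().toUri().getPath());
                }
            }
            hdfs.close();
        }
    }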

Tuesday, 23 July 2013

List data nodes in a Hadoop cluster

Java code to list the data nodes in a Hadoop cluster:
    Configuration conf = new Configuration();
    try {
        FileSystem fs = FileSystem.get(conf);
        // the DataNode report is only available through DistributedFileSystem
        DistributedFileSystem hdfs = (DistributedFileSystem) fs;
        DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
        for (int i = 0; i < dataNodeStats.length; i++) {
            System.out.println(dataNodeStats[i].getHost());
        }
    } catch (IOException e) {
        System.err.println("IOException when listing the data nodes: " + e.toString());
    }
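
For reference, the imports this snippet relies on (package locations as of the Hadoop 1.x/CDH4 releases used elsewhere on this blog):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;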

Monday, 22 July 2013

Set replication factor in HDFS

In the command line:
For an existing file in HDFS:
To set the replication of an individual file to 4:
     hadoop fs -setrep -w 4 /hdfs/path/tofile

You can also do this recursively for a directory.
 To set the replication for a directory to 1:
       hadoop fs -setrep -R -w 1 /hdfs/path/toDirectory
 To change the replication of the entire HDFS to 2:
    hadoop fs -setrep -R -w 2 /
 To copy a new file with replication factor 2:
      hadoop fs -D dfs.replication=2 -copyFromLocal /local/path/tofile /hdfs/path/tofile

In a Java program, for a file:

 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(conf);
 Path hdfsPath = new Path("/hdfs/hdfsFile");
 short replication = 2;
 fs.setReplication(hdfsPath, replication);

setReplication() returns 'true' if successful and 'false' if the file does not exist or is a directory.
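
To check the return value and confirm the change, a small sketch continuing from the code above (reusing fs and hdfsPath):

    boolean changed = fs.setReplication(hdfsPath, replication);
    if (changed) {
        // read the replication back from the file status to verify it
        short current = fs.getFileStatus(hdfsPath).getReplication();
        System.out.println("replication is now " + current);
    } else {
        System.out.println(hdfsPath + " does not exist or is a directory");
    }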

Saturday, 20 July 2013

Hadoop MapReduce new API

The Hadoop MapReduce API changed starting with Hadoop 0.20.x.

Older API: org.apache.hadoop.mapred
Newer API: org.apache.hadoop.mapreduce

The classes in the org.apache.hadoop.mapred package have been deprecated.
The current MapReduce tutorial on the Apache Hadoop page is written for the old API.

WordCount program with new API

import java.io.IOException;
import java.lang.InterruptedException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
/**
 * The map class of WordCount.
 */
public static class TokenCounterMapper extends Mapper<Object, Text, Text, IntWritable> {
       
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException
    {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
/**
 * The reducer class of WordCount
 */
public static class TokenCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
/**
 * The main entry point.
 */
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Example Hadoop 0.20.1 WordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenCounterMapper.class);
    job.setReducerClass(TokenCounterReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}


Compile WordCount.java:
$ mkdir wordcount_classes
$ javac -cp classpath -d wordcount_classes WordCount.java

 where classpath is:
    for CDH4:          /usr/lib/hadoop/*:/usr/lib/hadoop/client-0.20/*
    for Apache Hadoop: ${HADOOP_HOME}/hadoop-core-1.1.2.jar

 Create a JAR
$jar -cvf wordcount.jar -C wordcount_classes/ .



Execute the program:
$hadoop jar wordcount.jar WordCount /wordcount/input /wordcount/output

Type mismatch in key/value from map