[Hadoop] Running Hadoop

June 23, 2020
Author: 홀쑥
1) HDFS commands ☞ shell commands for controlling HDFS

2) Viewing help ☞ hdfs dfs -help

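3) Listing files ☞ ls, lsr

- ls : prints information about the files in the specified directory.

- lsr : also prints information for subdirectories. (Deprecated in Hadoop 2 and later; use -ls -R instead.)

ex) hdfs dfs -ls [directory|file]

ex) hdfs dfs -lsr [directory|file]

If no directory or file is given, the home directory of the current account is listed.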

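4) File size ☞ du, dus

- du : shows the space used by the specified directory or file. (Output is in bytes.)

- dus : prints the total as a single sum. (Deprecated in Hadoop 2 and later; use -du -s instead.)

ex) hdfs dfs -du [directory|file]

ex) hdfs dfs -dus [directory|file]

If no directory or file is given, the home directory of the current account is used.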

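5) Viewing file contents ☞ cat, text

- cat : prints the contents of the specified file.

- text : cat prints the file as-is, so it is only useful for plain text files, whereas text also decodes compressed files (for example gzip) and SequenceFiles and prints them as text.

ex) hdfs dfs -cat [file]

ex) hdfs dfs -text [file]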

6) Creating a directory ☞ mkdir

ex) hdfs dfs -mkdir [directory]

Creating a directory that already exists raises an error.

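7) Copying files

- put / copyFromLocal : copies files and directories from the local file system to a path in HDFS.

ex) hdfs dfs -put [local directory|file] [destination directory|file]

ex) hdfs dfs -copyFromLocal [local directory|file] [destination directory|file]

- get / copyToLocal : copies data stored in HDFS to the local file system.

ex) hdfs dfs -get [source directory|file] [local directory|file]

ex) hdfs dfs -copyToLocal [source directory|file] [local directory|file]

- getmerge : concatenates the contents of all files under the source path and copies the result to the local file system as a single file.

ex) hdfs dfs -getmerge [source directory|file] [local file name]

- cp : copies a directory or file within HDFS.

ex) hdfs dfs -cp [source directory|file] [destination directory|file]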
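
To make the idea behind getmerge concrete, here is a minimal sketch that does the same kind of concatenation on the local file system with plain Java. It is only an analogy and does not use the Hadoop API; the class name LocalMerge, the file names, and the choice to merge in name order are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalMerge {
    // Concatenates every regular file directly under srcDir into target.
    // Files are merged in name order (an assumption of this sketch).
    public static void merge(Path srcDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> s = Files.list(srcDir)) {
            parts = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the target
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical part files, named like reducer output.
        Path src = Files.createTempDirectory("merge-src");
        Files.writeString(src.resolve("part-r-00000"), "a\t1\n");
        Files.writeString(src.resolve("part-r-00001"), "b\t2\n");
        Path target = Files.createTempFile("merged", ".txt");
        merge(src, target);
        System.out.print(Files.readString(target));
    }
}
```

The target is kept outside the source directory so that the merged file is not picked up as one of its own inputs.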

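8) Moving files

- mv : moves a directory or file to the destination path.

ex) hdfs dfs -mv [source directory|file] [destination directory|file]

- moveFromLocal : works like the put command, except that the source file on the local file system is deleted after it has been copied to HDFS.

ex) hdfs dfs -moveFromLocal [source directory|file] [destination directory|file]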

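9) Deleting

- rm : deletes files.

In Hadoop 2 and later, rm does not delete a directory unless -r is given; an empty directory can be removed with -rmdir.

- rmr : deletes a directory or file. (Deprecated in Hadoop 2 and later; use -rm -r instead.)

A directory is deleted even if it is not empty.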

10) Counting

- count : prints the total number of directories, the total number of files, and the total file size for the specified path.

ex) hdfs dfs -count [directory|file]
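
As a rough illustration of the three numbers that count reports, the sketch below computes the same kind of totals for a local directory with plain Java. It is a local-file-system analogy, not the Hadoop API; the class name LocalCount and the sample files are made up, and counting the root directory itself as one directory is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LocalCount {
    // Returns {directoryCount, fileCount, totalBytes} for root and everything below it.
    public static long[] count(Path root) throws IOException {
        long dirs = 0, files = 0, bytes = 0;
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (Files.isDirectory(p)) {
                    dirs++;                 // includes root itself
                } else if (Files.isRegularFile(p)) {
                    files++;
                    bytes += Files.size(p);
                }
            }
        }
        return new long[] {dirs, files, bytes};
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("count-sketch");
        Files.createDirectory(root.resolve("sub"));
        Files.writeString(root.resolve("a.txt"), "hello");                     // 5 bytes
        Files.writeString(root.resolve("sub").resolve("b.txt"), "hadoop");     // 6 bytes
        long[] r = count(root);
        // Prints: 2 2 11 followed by the temp directory path
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + root);
    }
}
```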

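11) Changing permissions

- chmod : changes the permissions of the specified path.

ex) hdfs dfs -chmod 777 sample.csv

- chown : changes the ownership of the specified files and directories.

ex) hdfs dfs -chown tester:testerGroup sample.csv

Changes the owner of sample.csv to tester and its group to testerGroup.

- chgrp : changes only the group of the specified files and directories.

ex) hdfs dfs -chgrp testerGroup sample.csv

Changes the group of sample.csv to testerGroup.

- -R : an option for chmod, chown, and chgrp that applies the change to everything under the given directory as well.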
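
The 777 in the chmod example is an octal mode: one digit each for owner, group, and others, where each digit is a sum of read (4), write (2), and execute (1). The following minimal sketch, with a made-up class name ModeSketch, converts such a mode into the rwx notation shown by ls.

```java
class ModeSketch {
    // Converts an octal mode string such as "777" or "644" into rwx notation.
    public static String toSymbolic(String octal) {
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = Character.digit(c, 8); // one octal digit = three permission bits
            if (bits < 0) {
                throw new IllegalArgumentException("not an octal digit: " + c);
            }
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("777")); // rwxrwxrwx
        System.out.println(toSymbolic("644")); // rw-r--r--
    }
}
```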

12) Viewing statistics ☞ stat

ex) hdfs dfs -stat [directory|file]

13) Emptying the trash ☞ expunge

ex) hdfs dfs -expunge

14) Creating a 0-byte file ☞ touchz

ex) hdfs dfs -touchz [file]

0. Start Hadoop on nn01

start-all.sh

1. Create a Maven project

Check "Create a simple project"

Group Id = com.kosmo

Artifact Id = lab1

2. Paste in pom.xml

Edit the groupId

3. Edit the Java files

- Change the package name of the Java files in sample3

- Edit WordCount

package sample;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Reads files, counts the occurrences of each word, and writes the result to a file.
 */
public class WordCount {

    // Mapper<input key, input value, output key, output value>
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Hadoop type; plays the role of Java's long one = 1L
        private final static LongWritable one = new LongWritable(1);
        // Used instead of String
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer st = new StringTokenizer(line, "\t\r\n\f |:;,.()<>");
            while (st.hasMoreTokens()) {
                word.set(st.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer<input key, input value, output key, output value>
    // The input types match what the mapper emits with context.write(word, one).
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private LongWritable result = new LongWritable();

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0; // long, to match LongWritable and avoid int overflow
            for (LongWritable v : values) {
                sum += v.get(); // Hadoop value -> Java value
            }
            result.set(sum); // Java value -> Hadoop value
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // Configuration: reads the Hadoop configuration files and allows changing settings
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input> <output>");
            System.exit(2);
        }

        // Attach the Configuration to the Job
        Job job = Job.getInstance(conf, "WordCount");

        // Set the classes; setJarByClass is needed because the job is run from a jar later
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Set the output key / value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Set the input and output formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wait until the job has finished and report success or failure as the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
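
To see what the mapper and reducer compute without a cluster, the flow can be imitated in plain Java. This is only a sketch under simplifying assumptions: everything runs in one process, a TreeMap stands in for the shuffle that groups and sorts by key, and the class name LocalWordCount and the sample lines are made up. The delimiter string is the one used in MyMapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {
    // Same delimiter set as the StringTokenizer in MyMapper.
    static final String DELIMS = "\t\r\n\f |:;,.()<>";

    // "Map" step: split one line into words; each word stands for a (word, 1) pair.
    public static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, DELIMS);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        return words;
    }

    // "Shuffle" + "reduce" steps: group by word and sum the 1s.
    // TreeMap keeps the keys sorted, loosely imitating sorted reducer input.
    public static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello hadoop, hello hdfs", "yarn:hadoop");
        // Prints one "word<TAB>count" line per word:
        // hadoop 2, hdfs 1, hello 2, yarn 1
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Note that the tokenizer is case-sensitive and does not strip characters outside the delimiter set, so "Hadoop" and "hadoop" would be counted as different words.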

4. Right-click the project -> Run As -> Maven install, twice (the first run fails)

5. Use WinSCP to copy the generated jar file (snapshot) to /home/hadoop/source

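6. Copy the data file into Hadoop

hdfs dfs -put /home/hadoop/temp/data.txt /input/data

/input/data must already exist as a directory (hdfs dfs -mkdir -p /input/data) so that the file ends up at /input/data/data.txt, which is the path used in the next step.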

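7. Run the job with the yarn command (Hadoop 1 used the hadoop command)

yarn jar /home/hadoop/source/lab1.jar sample.WordCount /input/data/data.txt /output/wordcount

Replace lab1.jar with the actual name of the jar file copied in step 5. The output directory /output/wordcount must not exist yet, otherwise the job fails.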

8. Check the result

hdfs dfs -cat /output/wordcount/part-r-00000
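
With TextOutputFormat, each line of the part file is a key and a value separated by a tab, so here that means a word, a tab, and its count. As a minimal sketch of reading such lines back into Java, assuming the lines have already been fetched (the class name PartFileParser and the sample lines are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartFileParser {
    // Parses "word<TAB>count" lines into a map, preserving the order of the file.
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab in line: " + line);
            }
            counts.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real input would come from the -cat output above.
        List<String> sample = List.of("hadoop\t2", "hdfs\t1", "hello\t2");
        System.out.println(parse(sample)); // {hadoop=2, hdfs=1, hello=2}
    }
}
```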