COMP5434 (Fall 2019) Big Data Computing
Individual Assignment 2 Due Date: 10:00am, 2nd December, 2019
Please submit your assignment on Blackboard and follow the requirements in Section 2.
1. Problem statement
A sample input file is given below. Each line corresponds to a point of interest (POI) and contains a keyword followed by the coordinate values x and y, separated by white space.
park 3 5
lake 2 3
mall 1 4
park 2 4
lake 9 8
mall 2 7
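As an illustration of the input format, one line could be parsed roughly as follows. This is only a sketch: the class name `Poi` and its fields are hypothetical, and it assumes that a keyword contains no white space and that every line has exactly three fields.

```java
// Hypothetical helper (not part of the assignment): one parsed POI line.
final class Poi {
    final String keyword;
    final double x;
    final double y;

    Poi(String keyword, double x, double y) {
        this.keyword = keyword;
        this.x = x;
        this.y = y;
    }

    // Parses a line of the form "<keyword> <x> <y>".
    // Assumption: fields are separated by one or more white-space characters.
    static Poi parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Poi(fields[0],
                       Double.parseDouble(fields[1]),
                       Double.parseDouble(fields[2]));
    }
}
```

For example, `Poi.parse("park 3 5")` yields the keyword "park" with x = 3.0 and y = 5.0.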
We measure the distance between two points p1=(x1,y1) and p2=(x2,y2) by:
$$\mathrm{dist}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
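The distance measure above is the ordinary Euclidean distance. A minimal Java sketch, where the class and method names are hypothetical, might look like this:

```java
// Hypothetical helper: Euclidean distance between (x1, y1) and (x2, y2).
final class Distance {
    static double dist(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        // Square root of the sum of squared coordinate differences.
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```

For the two "park" points in the sample input, (3,5) and (2,4), this evaluates to the square root of 2, or about 1.414.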
Each keyword k is associated with a group G(k) of points.
[Example] The group of “park” contains two points: (3,5) and (2,4).
There are 2 questions in this programming assignment.
You should write a MapReduce program to solve each of them.
Question Q1: Find the centroid (i.e., the mean position of points) of each group.
[Example]
Input: the sample input above
Output:
lake 5.5 5.5
mall 1.5 5.5
park 2.5 4.5
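The logic behind Q1 can be sketched in plain Java without any Hadoop dependency. This is not a MapReduce program and not a submission template: the class `CentroidSketch` and its method are hypothetical and only simulate the idea in memory. The first loop plays the role of a map step that groups points by keyword, and the second plays the role of a reduce step that averages each group.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the centroid computation (no Hadoop involved).
final class CentroidSketch {
    // Returns keyword -> {meanX, meanY}, with keywords in sorted order.
    // Assumption: each line is "<keyword> <x> <y>" separated by white space.
    static Map<String, double[]> centroids(List<String> lines) {
        // Map-like step: accumulate {sumX, sumY, count} per keyword.
        Map<String, double[]> acc = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            double[] a = acc.computeIfAbsent(f[0], k -> new double[3]);
            a[0] += Double.parseDouble(f[1]);
            a[1] += Double.parseDouble(f[2]);
            a[2] += 1;
        }
        // Reduce-like step: divide the sums by the number of points in the group.
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            result.put(e.getKey(), new double[] { a[0] / a[2], a[1] / a[2] });
        }
        return result;
    }
}
```

Because only per-keyword sums and counts are needed, this computation fits naturally into a single map and reduce pass keyed on the keyword.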
Question Q2: Find the diameter (i.e., the maximum distance between any two points inside a group) of each group.
[Example]
Input: the sample input above
Output:
lake 8.602
mall 3.162
park 1.414
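The logic behind Q2 can be sketched the same way in plain Java, again without Hadoop and with hypothetical names. The sketch compares every pair of points in a group, which is quadratic in the group size. It also assumes that a group with a single point has diameter 0, which the assignment text does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the diameter computation (no Hadoop involved).
final class DiameterSketch {
    // Returns keyword -> maximum pairwise distance, with keywords in sorted order.
    // Assumption: each line is "<keyword> <x> <y>" separated by white space.
    static Map<String, Double> diameters(List<String> lines) {
        // Map-like step: collect the points of each keyword.
        Map<String, List<double[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            groups.computeIfAbsent(f[0], k -> new ArrayList<>())
                  .add(new double[] { Double.parseDouble(f[1]), Double.parseDouble(f[2]) });
        }
        // Reduce-like step: brute-force maximum over all pairs in each group.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<double[]>> e : groups.entrySet()) {
            List<double[]> pts = e.getValue();
            double best = 0.0; // assumed diameter of a single-point group
            for (int i = 0; i < pts.size(); i++) {
                for (int j = i + 1; j < pts.size(); j++) {
                    double dx = pts.get(i)[0] - pts.get(j)[0];
                    double dy = pts.get(i)[1] - pts.get(j)[1];
                    best = Math.max(best, Math.sqrt(dx * dx + dy * dy));
                }
            }
            result.put(e.getKey(), best);
        }
        return result;
    }
}
```

Unlike the centroid, the diameter cannot be derived from running sums, so the reduce-like step needs all points of a group at once.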
2. Requirements
1. Although MapReduce supports multiple languages, you must use Java (Java 8) for the implementation in this assignment.
2. Your submission should be organized as follows:
<YourStudentID> // your folder name, [Example] 19001234g
— Q1.java // source file for question 1
— Q1.jar // jar file for question 1, compiled and archived from Q1.java
— Q2.java // source file for question 2
— Q2.jar // jar file for question 2, compiled and archived from Q2.java
3. Archive the above structure as <YourStudentID>.zip and submit this .zip file on Blackboard. [Example] 19001234g.zip
4. Make sure that your source files compile and run in pseudo-distributed mode on the latest Hadoop version (i.e., Hadoop 3.2.1).
5. Your jar files should be directly runnable on a Linux platform with the following calls:
bin/hadoop jar Q1.jar Q1 <input path> <output path>
bin/hadoop jar Q2.jar Q2 <input path> <output path>
6. Your output should preserve double precision.
7. You should use only one MapReduce round to solve each sub-question.
8. [Hint] You may use the Ubuntu image we provided for this assignment.
- Google Drive:
https://drive.google.com/file/d/1lMqmTAj2sC2gVqkVWW-MDUR24vv-a3Si/view?usp=sharing
- The Y drive in COMP Lab: Y:\Subject\COMP5434
Note: These files will expire on November 7!
3. Grading criteria
20 marks will be given if your program can be compiled.
- For each .java file, 10 marks
80 marks will be given if your program is correct. We will test the correctness of your program by using 8 test cases (4 for each sub-question).
- For each test case, 10 marks
Note that this is an individual assignment. Plagiarism will result in 0 marks!