Natural Language Engineering:
Assessed Coursework 1
Submission format: You should submit one file, which should be either a Jupyter notebook or a zip file containing a Jupyter notebook and any other files (e.g., images or Python files) that you want to include in the notebook.
Due date: Your work should be submitted on the module’s Canvas site before 4pm on Thursday 29th October. This is Thursday of week 5. The standard late penalties apply.
Return date: Marks and feedback will be provided on Canvas on Thursday November 19th for all work submitted by the due date.
Weighting: This assessment contributes 20% of the mark for the module.
Overview
For this assignment you are asked to complete a Python notebook ('NLEassignment1.ipynb'), which is provided with these guidelines. It is based on activities that you have already completed in labs during weeks 1-4 of the module. Any code you have developed during the labs can be submitted as part of your answers to the questions in the assignment. To score highly on this assignment you will need to demonstrate that you:
- understand the theory and your code;
- can write and document high-quality Python code;
- can develop code further to solve related problems;
- can carry out experiments and display results in a coherent way;
- can analyse and interpret results; and
- can draw conclusions and understand limitations of the
For this report you should submit a single Python notebook containing all of your answers to all of the questions in 'NLEassignment1.ipynb'. You may import from standard libraries and the 'sussex nltk' resources with which you have been provided. If you wish to import any other code, it must be included in a zip file with your notebook. It must be possible for the assessors to run your Python notebook.
Marking Criteria and Requirements
Your submission will be marked out of 100. The assignment is broken down into four parts. All parts should be answered, and the breakdown of marks between parts is specified in the notebook. General and part-specific criteria are given below. Please read these guidelines carefully and ask if you have any questions.
General: 20 marks available
20 marks are available for the overall quality of your assignment. When awarding these marks, the following general guidelines will be considered.
- In order to avoid misconduct, you should not talk about these coursework questions with your peers. If you are not sure what a question is asking you to do or have any other questions, please ask me or one of the Teaching Assistants.
- Your report should be no more than 2000 words in length excluding code and the content of graphs, tables and any references.
- You should state the word count of your report; the 2000-word limit is strict.
- You should use a formal writing style.
- All graphs should have a title and have each axis clearly labelled.
- In all parts, marks will be awarded for the quality of your written answers as well as your code.
- Written/textual answers MUST be included in Markdown cells. Otherwise, you may score 0 for these answers.
- Code on its own does not count as an explanation or a discussion, and neither do code comments. Code should be commented, but explanation and discussion MUST be given as text in Markdown cells (see previous point!).
- Do not include text, such as code or output, as images.
- Your code must be applied to, and your explanations must refer to, the unique set of examples generated by entering your candidate number at the top of the notebook. This must be your own candidate number; otherwise, you may score 0.
- You should submit your notebook with the code having been run (i.e., with the output displayed rather than cleared).
- It must be possible for the assessors to run your Python notebook.
Part 1: 15 marks available
Use your training data to find (a) the top 20 words which occur more frequently in book reviews than in dvd reviews and (b) the top 20 words which occur more frequently in dvd reviews than in book reviews. Discuss what pre-processing techniques you have applied (or not applied) in answering this question, and why. [15%]
The following breakdown of marks will be applied
- Clear and effective use of code in order to correctly find the top 20 words which occur more frequently in book reviews than in dvd reviews [5 marks]
- Clear and effective use of code in order to correctly find the top 20 words which occur more frequently in dvd reviews than in book reviews [5 marks]
- Selection and justification of pre-processing techniques chosen / not chosen [5 marks]
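One possible way to approach (a) and (b) is to compare each word's relative frequency in the two domains and rank words by how much more frequent they are in one domain than the other. The sketch below assumes the reviews are already tokenised and lower-cased; the helper names (`relative_frequencies`, `top_distinctive_words`), the toy reviews, and the choice of a simple difference in relative frequency are illustrative assumptions, not the required method (a smoothed frequency ratio is a common alternative).

```python
from collections import Counter


def relative_frequencies(docs):
    """Map each token to its share of all tokens in the given documents."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def top_distinctive_words(docs_a, docs_b, k=20):
    """Hypothetical helper: return up to k words whose relative frequency in
    docs_a most exceeds their relative frequency in docs_b."""
    freq_a = relative_frequencies(docs_a)
    freq_b = relative_frequencies(docs_b)
    # Words that never occur in docs_b are treated as having frequency 0 there.
    diffs = {w: f - freq_b.get(w, 0.0) for w, f in freq_a.items()}
    # Keep only words that really are more frequent in docs_a, largest gap first.
    candidates = [w for w in diffs if diffs[w] > 0]
    ranked = sorted(candidates, key=lambda w: diffs[w], reverse=True)
    return ranked[:k]


# Toy, already tokenised and lower-cased reviews (invented for illustration).
book_reviews = [["a", "great", "novel", "well", "written"],
                ["the", "author", "writes", "a", "great", "story"]]
dvd_reviews = [["great", "film", "the", "acting", "was", "superb"],
               ["the", "movie", "and", "the", "film", "score"]]

print(top_distinctive_words(book_reviews, dvd_reviews, k=3))
print(top_distinctive_words(dvd_reviews, book_reviews, k=3))
```

In this toy data the function word "a" comes out on top of the book list, which illustrates why the choice of pre-processing (for example, stopword removal or a minimum-frequency threshold) deserves discussion.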
Part 2: 25 marks available
Design, build and test a word list classifier to classify reviews as being from the book domain or from the dvd domain. Make sure you discuss (i) how you decide the lengths and contents of the word lists and (ii) the accuracy, precision and recall of your final classifier. [25%]
The following breakdown of marks will be applied
- Description of design of word list classifier [5 marks]
- Clear and effective use of code to build classifier [5 marks]
- Consideration of lengths and contents of word lists [5 marks]
- Testing including calculation of accuracy, precision and recall [5 marks]
- Discussion of results [5 marks]
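A word list classifier can be sketched as counting how many tokens of a review appear in each domain's word list and predicting the domain with the higher count. The sketch below is one minimal design under stated assumptions: reviews are tokenised, the word lists are given (in practice they might come from the Part 1 analysis), and ties default to 'book'. The class name `WordListClassifier`, the helper `evaluate`, the tie-breaking rule and the toy data are all hypothetical. Precision and recall are computed with 'book' treated as the positive class.

```python
class WordListClassifier:
    """Hypothetical word list classifier: score a review by how many of its
    tokens appear in each domain's word list."""

    def __init__(self, book_words, dvd_words):
        self.book_words = set(book_words)
        self.dvd_words = set(dvd_words)

    def classify(self, tokens):
        book_score = sum(1 for t in tokens if t in self.book_words)
        dvd_score = sum(1 for t in tokens if t in self.dvd_words)
        # Ties (including 0-0) default to 'book'; this is an arbitrary choice.
        return "dvd" if dvd_score > book_score else "book"


def evaluate(gold, predicted, positive="book"):
    """Return (accuracy, precision, recall), treating `positive` as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


classifier = WordListClassifier(["novel", "author", "read"],
                                ["film", "movie", "acting"])
test_data = [(["a", "gripping", "novel"], "book"),
             (["the", "author", "rambles"], "book"),
             (["slow", "and", "dull"], "book"),              # no list words: tie -> 'book'
             (["great", "acting"], "dvd"),
             (["read", "the", "film", "reviews"], "dvd")]    # tie -> 'book' (an error)

gold = [label for _, label in test_data]
predicted = [classifier.classify(tokens) for tokens, _ in test_data]
print(evaluate(gold, predicted))
```

Because precision and recall depend on which class is treated as positive, it may be worth reporting them for both classes (or stating clearly which one is used).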
Part 3: 15 marks available
Compare the performance of your word list classifier with a Naive Bayes classifier (e.g., from NLTK). Make sure you discuss the results. [15%]
The following breakdown of marks will be applied
- Clear and effective use of code to apply a NB classifier to the data [5 marks]
- Calculation of evaluation metrics for NB classifier [5 marks]
- Discussion of results [5 marks]
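NLTK provides `nltk.NaiveBayesClassifier.train`, which expects a list of (feature dictionary, label) pairs. To show the underlying idea without that dependency, the sketch below instead implements a small multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python; this is a different formulation from NLTK's feature-based classifier. The function names `train_nb` and `classify_nb` and the toy training data are hypothetical. Its predictions can then be scored with the same accuracy, precision and recall calculations used for the word list classifier in Part 2.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha smoothing.
    labelled_docs is a list of (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labelled_docs)
    word_counts = defaultdict(Counter)
    for tokens, label in labelled_docs:
        word_counts[label].update(tokens)
    vocab = {t for tokens, _ in labelled_docs for t in tokens}
    total_docs = len(labelled_docs)
    # log P(class), estimated from document counts.
    log_priors = {c: math.log(n / total_docs) for c, n in class_counts.items()}
    # log P(word | class), smoothed so unseen (word, class) pairs are not zero.
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom)
                              for w in vocab}
    return log_priors, log_likelihoods


def classify_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    log_priors, log_likelihoods = model
    scores = {}
    for c, prior in log_priors.items():
        # Tokens that never appeared in training are ignored.
        scores[c] = prior + sum(log_likelihoods[c][t] for t in tokens
                                if t in log_likelihoods[c])
    return max(scores, key=scores.get)


train_docs = [(["great", "novel", "author"], "book"),
              (["well", "written", "novel"], "book"),
              (["great", "film", "acting"], "dvd"),
              (["boring", "movie", "film"], "dvd")]
model = train_nb(train_docs)
print(classify_nb(model, ["novel", "plot"]), classify_nb(model, ["film", "plot"]))
```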
Part 4: 25 marks available
Design and carry out an experiment into the impact of the amount of training data on each of these classifiers. Make sure you describe the design decisions in your experiment, include a graph of your results and discuss your conclusions. [25%]
The following breakdown of marks will be applied
- Description of design of experiment [5 marks]
- Clear and effective use of code to investigate the effect of the amount of training data on one of the classifiers [5 marks]
- Clear and effective use of code to investigate the effect of the amount of training data on the second classifier [5 marks]
- Presentation of results [5 marks]
- Discussion of conclusions [5 marks]
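One common experimental design is a learning curve: train on random samples of increasing size drawn from the training set, evaluate every model on the same fixed test set, and repeat each size with several random samples to average out sampling noise. The sketch below assumes a generic `train_fn` that takes labelled data and returns a prediction function; the helper names, the sizes, the number of repeats and the toy data are hypothetical, and a trivial majority-class baseline stands in for the real classifiers. Plotting is omitted to keep the sketch dependency-free; any graph should still have a title and labelled axes, as required in the general criteria.

```python
import random
from collections import Counter
from statistics import mean


def accuracy(predict, test_data):
    """Fraction of (tokens, label) pairs in test_data that predict gets right."""
    correct = sum(1 for tokens, label in test_data if predict(tokens) == label)
    return correct / len(test_data)


def learning_curve(train_fn, train_data, test_data, sizes, repeats=5, seed=0):
    """For each training-set size, train on `repeats` random samples of that
    size and return (size, mean test accuracy) pairs (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    results = []
    for size in sizes:
        scores = []
        for _ in range(repeats):
            sample = rng.sample(train_data, size)
            predict = train_fn(sample)
            scores.append(accuracy(predict, test_data))
        results.append((size, mean(scores)))
    return results


def train_majority(sample):
    """Stand-in classifier: always predict the most common training label."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda tokens: majority


train_data = [(["novel"], "book")] * 6 + [(["film"], "dvd")] * 4
test_data = [(["plot"], "book"), (["scene"], "dvd")]
results = learning_curve(train_majority, train_data, test_data, sizes=[2, 6, 10])
for size, acc in results:
    print(size, acc)
```

Keeping the test set fixed means that differences between points on the curve reflect the amount of training data rather than changes in what is being tested.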