Data Analytics and Machine Learning - 2023 entry
MODULE TITLE | Data Analytics and Machine Learning | CREDIT VALUE | 15 |
---|---|---|---|
MODULE CODE | ECM3901 | MODULE CONVENER | Dr Saptarshi Das (Coordinator) |
DURATION: TERM | 1 | 2 | 3 |
---|---|---|---|
DURATION: WEEKS | 5 | 6 |
Number of Students Taking Module (anticipated) | 25 |
---|
Classical statistical methods were developed at a time when data collection was expensive. Recent advances in science and computing technology have resulted in an explosion of available data, in fields as diverse as medicine, finance, marketing and biology. This has led to the development of new statistical methodologies, aimed at meeting the challenges associated with processing and understanding “big data”.
In this problem-solving oriented module, you will develop hands-on skills and techniques needed to turn complex data sets into useful information, implementing techniques developed in data mining and machine learning, and learning how to apply these in various data analytics packages and their open source versions.
The module spans two terms. In the first term, it develops a basic understanding of data analytics and encourages group work; the second term moves on to more involved big data problems.
Prerequisite modules: “Scientific Computing 1” (ECM1914) and “Statistical Modelling” (ECM2907) or equivalent.
This module aims to lay the foundations for an understanding of statistical learning approaches and multivariate statistics. It also aims to provide the practical skills needed to implement these techniques and to analyse and present “big data” effectively.
On successful completion of this module you should be able to:
Module Specific Skills and Knowledge:
1. Understand the challenges associated with collecting, manipulating and interpreting “big data”;
2. Learn the fundamental concepts of predictive modelling and pattern recognition;
3. Gain knowledge and insight into the latest developments in these fields;
4. Understand and apply statistical learning techniques in a variety of applications, using Matlab and the open-source software Python/R;
Discipline Specific Skills and Knowledge:
5. Learn and apply advanced statistical methods to process complex data sets;
6. Improve computational skills, and gain a better understanding of the practical implementation of these approaches;
Personal and Key Transferable/Employment Skills and Knowledge:
7. Demonstrate key skills in data analytics, including practical implementation;
8. Understand the challenges of “big data”, and communicate reasoning and solutions effectively in writing;
9. Demonstrate appropriate use of learning resources;
10. Demonstrate self-management and time management skills.
- Heterogeneous, multimedia datasets like image, audio, video, text processing, financial time-series, bioinformatics, remote sensing and medical images; benchmark big datasets from engineering, physical/life sciences, business and social sciences; data cleaning and pre-processing, data visualization; descriptive statistics, feature extraction; introduction to statistical learning paradigms: supervised learning, unsupervised learning, semi-supervised learning, connections to statistical signal processing and information theory [3 hours];
- Big-data management and processing, parallel computing on CPUs and GPUs, algorithm scalability, signal/image filtering, wavelets, colour image processing, challenges in computer vision [3 hours];
- Multivariate analysis and dimensionality reduction: principal component analysis, independent component analysis [3 hours];
- Regression: linear and nonlinear, univariate, multiple and multivariate regression, least squares, regularization and shrinkage methods, model selection and resampling methods [3 hours];
- Introduction to Gaussian process and kernel methods; spatial and temporal random processes [3 hours];
- Classification: probabilistic and non-probabilistic classifiers, feature selection, logistic regression, discriminant analysis, k-nearest neighbour, support vector machine, decision tree, ensemble learning, generalised linear models, multilayer perceptron [3 hours];
- Clustering: k-means, mixture-model and expectation-maximisation, hierarchical and spectral clustering [3 hours];
- Recent advances in artificial intelligence and machine learning, in particular deep learning (convolutional neural networks, auto-encoders, transfer learning, recurrent neural networks, deep generative models) [6 hours];
- Fuzzy inference, single and multi-objective swarm/evolutionary optimisation algorithms, reinforcement learning; Bayesian optimisation [3 hours];
- Sampling and inference, graphical models, Bayesian machine learning, combining models [3 hours].
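As a flavour of the practical classes, the clustering topic above can be sketched in a few lines of Python. This is only an illustrative sketch of plain k-means (Lloyd's algorithm) and is not taken from the module materials: the toy data, the function name `kmeans` and the choice of the first k points as initial centroids are all assumptions made for the example.

```python
# Illustrative sketch only: plain k-means (Lloyd's algorithm) on toy 2-D points.
# The data, the function name and the initialisation are assumptions,
# not module material.
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, n_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    # Hypothetical deterministic initialisation: use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so labels are stable
        centroids = new_centroids
    return centroids, labels


# Toy data: two visibly separated groups.
data = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (0.8, 1.1), (5.1, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
print(centroids)
```

In practice a library routine with randomised or k-means++ initialisation and multiple restarts would normally be preferred; the fixed initialisation here only keeps the toy example reproducible.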
Scheduled Learning & Teaching Activities | 33 | Guided Independent Study | 117 | Placement / Study Abroad | 0 |
---|---|---|---|---|---|
Category | Hours of study time | Description |
---|---|---|
Scheduled Learning & Teaching activities | 11 | Formal lectures of new material |
Scheduled Learning & Teaching activities | 22 | Computer classes and tutorials |
Guided Independent Study | 117 | Lecture & assessment preparation, wider reading |
Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|
Fortnightly exercise | 2 x 5 hours | 1-10 | Questions marked by tutors, feedback given on all questions during tutorials |
Coursework | 100 | Written Exams | 0 | Practical Exams | 0 |
---|---|---|---|---|---|
Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|---|
1 x Coursework (mixture of dimensionality reduction, classification, regression, clustering, AI), based on the skills learned in the formative assessment and the Matlab/Python/R practical classes. Students must submit code for an in-depth analysis of a chosen big dataset, together with a detailed individual report. | 1 x 50 | Approx. 6-10 page essay (1 x 3 hours, individual report) (Term 2) | 1-10 | Written and oral |
1 x Presentation on the group reports. | 1 x 10 | 1 x 15 mins (Term 1) | 1-10 | Written and oral |
1 x Group report/poster on in-depth analysis of a medium-sized dataset. | 1 x 20 | Approx. 4-6 page essay (1 x 3 hours, group report) (Term 1) | 1-10 | Written and oral |
1 x in-class open-book test on basic understanding of methods/programming (10 marks programming and 10 marks quiz) | 1 x 20 | 90 mins (Term 1) | 1-10 | Written and oral |
Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
---|---|---|---|
All above | Coursework (100%) | All | August Ref/Def period |
If a module is normally assessed entirely by coursework, all referred/deferred assessments will normally be by assignment.
If a module is normally assessed by examination or examination plus coursework, referred and deferred assessment will normally be by examination. For referrals, only the examination will count, a mark of 40% being awarded if the examination is passed. For deferrals, candidates will be awarded the higher of the deferred examination mark or the deferred examination mark combined with the original coursework mark.
The following list indicates the type and level of information that you are expected to consult. Further guidance will be provided by the Module Convener.
Basic reading:
Reading list for this module:
Type | Author | Title | Edition | Publisher | Year | ISBN |
---|---|---|---|---|---|---|
Set | Sergios Theodoridis, Aggelos Pikrakis, Konstantinos Koutroumbas & Dionisis Cavouras | Introduction to Pattern Recognition: A Matlab Approach | 1st | Academic Press | 2010 | B008KO4GQ2 |
Set | Simon Rogers & Mark Girolami | A First Course in Machine Learning | 2nd | CRC Press | 2016 | B01N7ZEBK8 |
Set | Murphy, K. | Machine Learning: A Probabilistic Perspective | 1st | MIT Press | 2012 | 978-0-262-018029 |
Set | Wendy L. Martinez and Angel R. Martinez | Computational Statistics Handbook with MATLAB | 3rd | CRC Press | 2015 | 978-1466592735 |
Set | Wendy L. Martinez, Angel R. Martinez, Jeffrey Solka | Exploratory Data Analysis with MATLAB | 3rd | CRC Press | 2017 | 978-1498776066 |
Set | Hastie T., Tibshirani R. & Friedman J. | The Elements of Statistical Learning: Data Mining, Inference, and Prediction | 2nd | Springer | 2009 | 978-0387848587 |
Set | Christopher Bishop | Pattern Recognition and Machine Learning | | Springer | 2007 | 978-0387310732 |
Set | David Barber | Bayesian Reasoning and Machine Learning | | Cambridge University Press | 2012 | 978-0-521-51814-7 |
Set | Sebastian Raschka, Vahid Mirjalili | Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow | 2nd | Packt Publishing | 2017 | 978-1787125933 |
Set | Aurelien Geron | Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow | | O'Reilly | 2019 | 978-1492032649 |
Set | Francois Chollet | Deep Learning with Python | | Manning Publications | 2017 | 978-1617294433 |
Set | Ian Goodfellow, Yoshua Bengio, Aaron Courville, Francis Bach | Deep Learning | | MIT Press | 2017 | 978-0262035613 |
Set | Andreas C. Muller, Sarah Guido | Introduction to Machine Learning with Python | | O'Reilly Media | 2016 | B01M0LNE8C |
Set | Carl Edward Rasmussen, Christopher K. I. Williams | Gaussian Processes for Machine Learning | | MIT Press | 2006 | 978-0262182539 |
Set | Bharath Ramsundar, Reza Bosagh Zadeh | TensorFlow for Deep Learning | | O'Reilly | 2018 | 978-1491980453 |
Set | Matthew Scarpino | TensorFlow for Dummies | | John Wiley & Sons | 2018 | 978-1119466215 |
CREDIT VALUE | 15 | ECTS VALUE | 7.5 |
---|---|---|---|
PRE-REQUISITE MODULES | ECM2907, ECM1914 |
---|---|
CO-REQUISITE MODULES |
NQF LEVEL (FHEQ) | 6 | AVAILABLE AS DISTANCE LEARNING | No |
---|---|---|---|
ORIGIN DATE | Thursday 7th May 2015 | LAST REVISION DATE | Wednesday 18th January 2023 |
KEY WORDS SEARCH | Big data; machine learning; pattern recognition; multivariate analysis; classification; clustering; regression; deep learning; artificial intelligence. |
---|
Please note that all modules are subject to change. Please get in touch if you have any questions about this module.