Data Science at Scale - 2024 entry
MODULE TITLE | Data Science at Scale | CREDIT VALUE | 15 |
---|---|---|---|
MODULE CODE | COM3021 | MODULE CONVENER | Dr Hugo Barbosa (Coordinator) |
DURATION: TERM | 1 | 2 | 3 |
---|---|---|---|
DURATION: WEEKS | 11 | 0 | 0 |
Number of Students Taking Module (anticipated) | 30 |
---|---|
Data science relies on large amounts of data to be effective, and many commercial and scientific applications require the analysis of large quantities of heterogeneous, noisy data on distributed machines. This module examines how data science algorithms can be implemented for large data sets and discusses new algorithms designed specifically for large-scale data. You will also work with large-scale distributed and cloud systems for storing and computing with big data.
Through theory and practice this module aims to equip you with an understanding of the principles of distributed computing, particularly on cloud-based systems, the ways in which data can be stored and accessed to allow efficient computation, and efficient algorithms for large-scale computation.
Distributed cloud computing will provide you with the underpinning knowledge required to develop and implement machine learning and artificial intelligence algorithms on distributed high-performance computing systems.
On successful completion of this module, you should be able to:
Module Specific Skills and Knowledge:
1 Explain the common challenges encountered in large-scale data science projects;
2 Display competence in the use of a range of abstraction and programming models for large-scale data processing;
3 Analyse and use a range of data storage models for parallel query processing;
4 Understand the principles of cloud and distributed systems and use them for data processing;
5 Understand and design algorithms for machine learning on large-scale distributed systems;
Discipline Specific Skills and Knowledge:
6 Describe a number of different programming paradigms and associated data structures;
7 Learn a variety of data science methods and apply them to real problems;
Personal and Key Transferable / Employment Skills and Knowledge:
8 Plan and write a technical report;
9 Adapt existing technical knowledge to learning new methods.
• Introduction: the size of data and impediments to efficient computation;
• Data storage and retrieval: relational databases and NoSQL systems;
• Distributed systems and data: cloud computing and supercomputing; data distribution and consistency;
• The MapReduce paradigm and implementations;
• Algorithms for large-scale learning: stochastic gradient descent, large-scale linear algebra;
• Stream processing;
• Future architectures; co-design of hardware and algorithms.
Scheduled Learning & Teaching Activities | 35 | Guided Independent Study | 115 | Placement / Study Abroad | 0 |
---|---|---|---|---|---|
Category | Hours of study time | Description |
---|---|---|
Scheduled Learning and Teaching | 20 | Lectures |
Scheduled Learning and Teaching | 15 | Workshops and tutorials |
Guided Independent Study | 115 | Coursework; private study; reading |
Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|
Not Applicable |
Coursework | 30 | Written Exams | 70 | Practical Exams | 0 |
---|---|---|---|---|---|
Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|---|
Written Exam | 70 | 2 hours | 1-6 | Orally, on request |
Technical Exercise and Report | 30 | 30 hours | 2-5, 7-9 | Written |
Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
---|---|---|---|
Written Exam | Written Exam (2 hours) | 1-6 | Referral/deferral period |
Technical Exercise and Report | Technical Exercise and Report 1 | 2-5, 7-9 | Referral/deferral period |
Reassessment will be by coursework and/or written exam in the failed or deferred element only. For referred candidates, the module mark will be capped at 40%. For deferred candidates, the module mark will be uncapped.
The following list indicates the type and level of information that you are expected to consult. Further guidance will be provided by the Module Convener.
ELE
Reading list for this module:
Type | Author | Title | Edition | Publisher | Year | ISBN |
---|---|---|---|---|---|---|
Set | Kleppmann, M. | Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems | 1st | O'Reilly | 2016 | 1449373321 |
Set | White, T. | Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale | 4th | O'Reilly | 2015 | 1491901632 |
Set | Narkhede, N., Shapira, G., Palino, T. | Kafka: The Definitive Guide | 1st | O'Reilly | 2016 | 978-1491936160 |
Set | Chambers, B. | Spark: The Definitive Guide | 1st | O'Reilly | 2018 | 1491912219 |
CREDIT VALUE | 15 | ECTS VALUE | 7.5 |
---|---|---|---|
PRE-REQUISITE MODULES | ECM2419, COM2013 |
---|---|
CO-REQUISITE MODULES |
NQF LEVEL (FHEQ) | 6 | AVAILABLE AS DISTANCE LEARNING | No |
---|---|---|---|
ORIGIN DATE | Friday 12th April 2019 | LAST REVISION DATE | Monday 4th March 2024 |
KEY WORDS SEARCH | None Defined |
---|---|
Please note that all modules are subject to change. Please get in touch if you have any questions about this module.