Introduction to Big Data - CB554

Location: Canterbury
Term: Spring
Level: 5
Credits (ECTS): 15 (7.5)


Prerequisite: at least one quantitative research module covering statistical significance and basic statistical modelling (e.g. CB313 Introduction to Statistics for Business, SO410 An Introduction to Quantitative Research, or equivalent), at the discretion of the module convenor.


Available to short-term/exchange students



This module addresses key aspects and challenges of big data analytics by introducing its fundamental concepts and algorithms. It begins with methods and tools for data collection, then turns to dirty data (inconsistent, missing and redundant data) and the data preparation techniques used to handle it: data cleaning, data transformation and data integration. The module then covers methods for structured and unstructured data. Techniques for structured data include data mining, in particular parallel data mining; techniques for unstructured data include social network analysis and text mining. A further aim is to introduce software systems used for big data analytics, such as Hadoop.
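
As an illustration of the data preparation step described above, the following is a minimal sketch in Python (standard library only). It is not taken from the module materials: the record layout (`city`, `temp`), the function name `clean_records` and the choice of mean imputation are assumptions made for the example.

```python
from statistics import mean


def clean_records(records):
    """Normalise, deduplicate and impute a list of {'city', 'temp'} dicts.

    Hypothetical helper for illustration only.
    """
    # Inconsistent data: unify whitespace and capitalisation of the city field.
    normalised = [
        {"city": r["city"].strip().title(), "temp": r["temp"]} for r in records
    ]

    # Redundant data: drop exact duplicates while preserving order.
    seen, unique = set(), []
    for r in normalised:
        key = (r["city"], r["temp"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Missing data: replace None with the mean of the observed values
    # (one simple imputation strategy among many).
    observed = [r["temp"] for r in unique if r["temp"] is not None]
    fill = mean(observed) if observed else None
    return [
        {"city": r["city"], "temp": fill if r["temp"] is None else r["temp"]}
        for r in unique
    ]


raw = [
    {"city": " canterbury", "temp": 12.0},
    {"city": "Canterbury", "temp": 12.0},   # duplicate after normalisation
    {"city": "MEDWAY", "temp": None},       # missing value
    {"city": "Medway", "temp": 14.0},
]
cleaned = clean_records(raw)
```

Here the two Canterbury rows collapse into one after normalisation, and the missing Medway reading is filled with the mean of the remaining observed values. Real pipelines would usually choose the imputation method to suit the data.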

Below is the outline of the module.
  • Concept of big data
  • Data collection, cleaning, transformation, and integration
  • Streaming data analysis
  • Parallel data mining
  • Structured data analysis
  • Social network analysis
  • Text analysis

    Contact hours

    Independent Study 128
    Lectures 11
    Seminars 11

    Total Hours 150

    Method of assessment

    Exam, 40%
    Assignment 1, 30%
    Assignment 2, 30%

    Preliminary reading

    Reading will be taken from a set of specified articles to be published in the module guide. These will be a mixture of academic and non-academic sources. Such reading will provide the intellectual platform for the module beyond the lecture series.

    Recommended textbooks:
    • Marz, N., & Warren, J. (2015). Big Data: Principles and best practices of scalable realtime data systems. Manning Publications. (ISBN: 9781617290343)
    • Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt. (ISBN: 978-0-544-00269-2)
    • Zikopoulos, P., & Eaton, C. (2011). Understanding Big Data: Analytics for enterprise class Hadoop and streaming data. McGraw-Hill Osborne Media. (ISBN: 978-0-07-179053-6)

    Journal articles:
    • Wu, X., Zhu, X., Wu, G. Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97-107.
    • Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
    • Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4), 1165-1188.

    Learning outcomes

    The intended subject-specific learning outcomes.
    On successfully completing the module students will be able to:
    8.1 Demonstrate knowledge and comprehension of different types of data (e.g. structured vs. unstructured data; static vs. streaming data).
    8.2 Conceptualise and design different types of data analysis tasks (e.g. supervised, semi-supervised and unsupervised learning tasks).
    8.3 Demonstrate knowledge of different types of tools for data collection, data cleaning and integration, data visualisation, text mining, social network analysis and parallel data mining (e.g. R, Hadoop).
    8.4 Analyse and synthesise big data related challenges (e.g., privacy issues, data storage) and data analysis processes.
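
To make the idea behind parallel data mining tools such as Hadoop (outcome 8.3) more concrete, here is a small sketch of the map-reduce pattern in plain Python. It is an assumption-laden illustration rather than Hadoop's actual API: the function names `map_phase` and `reduce_phase` are hypothetical, and the chunks are processed sequentially here, whereas a cluster framework would distribute them across machines.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map: count words within one chunk of text, independently of the others."""
    return Counter(chunk.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial word counts into one."""
    return left + right


# Each string stands in for a block of a much larger file.
chunks = ["big data big ideas", "data mining", "big data analytics"]

# The map calls share no state, so they could run in parallel.
partials = [map_phase(c) for c in chunks]

# Merging is associative, so partial results can be combined in any order.
totals = reduce(reduce_phase, partials, Counter())
```

Because each map call depends only on its own chunk and the merge step is associative, the same logic scales from three short strings to data split across many nodes.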

    The intended generic learning outcomes.
    On successfully completing the module students will be able to:
    9.1 Conduct data preparation, data modelling and model evaluation.
    9.2 Conduct structured data analysis, text mining and social network analysis.
    9.3 Identify, analyse and address data analysis problems.
    9.4 Interpret the outputs of data analysis projects.
