
JOHN CABOT UNIVERSITY

COURSE CODE: "CS 212"
COURSE NAME: "Introduction to Data Science"
SEMESTER & YEAR: Summer Session I 2024
SYLLABUS

INSTRUCTOR: Carlos Theran, Yohn Parra Bautista
EMAIL: [email protected]
HOURS: MTWTH 9:00 AM - 10:50 AM
TOTAL NO. OF CONTACT HOURS: 45
CREDITS: 3
PREREQUISITES: CS 160, MA 100/101
OFFICE HOURS:

COURSE DESCRIPTION:
This course introduces students to the main concepts of data science. It combines statistics, ethics, computational learning theory, pattern recognition, and containerization to create and implement Machine Learning and Deep Learning models for classification and prediction. Such models may have a significant impact on society, as they can be used to automate procedures and extract relevant information from large amounts of data. Students will learn how to detect and correct the implicit/explicit bias often found in A.I. and Machine Learning algorithms by assessing the quality and objectivity of training data. This is important for determining the validity/veracity of information (such as that found in social media) and in threat analysis (as in cybersecurity). The course includes a critique of the inherent biases of data science itself and their societal implications. The course uses project-based learning: students will be guided through the process of formulating and carrying out data science methodology with real-world data, with a focus on open, pre-existing secondary data. Topics covered include descriptive statistics, elementary probability theory, basics of linear algebra, ethics in emerging technology, nonparametric decision-making methods such as Euclidean distance, nearest neighbor, support vector machines, and decision trees, and supervised and unsupervised learning techniques such as neural networks, kernel machines, and convolutional networks.
SUMMARY OF COURSE CONTENT:

Tentative Schedule

Week 1

    Module 1 Intro to Data Science and tools

·       Introduction to Python and its main packages for data science.

·       Git & GitHub Intro

·       Quiz #1, Homework #1

    Expected Outcome: This module will provide the main concepts of programming and the Python packages used for data science.

 

Week 2

    Module 2 Intro to Machine Learning (ML) with Python

·       Exploratory Data Analysis, Data Visualizations

·       Data Cleaning, Implementing Linear Regression

·       Quiz #2, Homework #2

    Expected Outcome: This module will provide a set of techniques and skills that help develop a classification system for various data sets and evaluate the performance of the classifier system.

    

 

Week 3

    Module 3 Intro to Machine Learning (ML) with Python

·       Implementing KNN, Logistic Regression

·       Implementing SVM

·       Standardization and Normalization

·       K-fold cross validation

·       Quiz #3, Homework #3, Midterm Exam

    Expected Outcome: This module will provide a set of techniques and skills that help develop a classification system for various data sets and evaluate the performance of the classifier system.

 

Week 4

    Module 4 Intro to Deep Learning (DL) with Python 

·       Neural Network

·       Computer Vision and Convolutional Neural Networks

·       Natural Language Processing and Long Short-Term Memory

·       Quiz #4, Homework #4

    Expected Outcome: This module will provide a set of leading DL techniques for addressing a societal problem and evaluating the performance of the classifier system.

 

Week 5

    Module 5 AI Ethics and Final Research Projects

·       Introduction to AI Ethics

·       Bias in the ML Life Cycle

·       Mitigating Bias in ML

·       Student hands-on project

·       Quiz #5, Final Exam, oral presentation, project deadline

    Expected Outcome: This module will explain how data science is a field that can change and contribute to the development of new knowledge in AI.



LEARNING OUTCOMES:


Students completing this course will be able to:

1.     Perform a literature review of data science papers by giving a presentation.

2.     Deconstruct different statistical methods by finding decision boundaries.

3.     List some data representation and transformation techniques to explain results by giving a presentation.

4.     Integrate perspectives from computational data by training an algorithm with different learning rules.

5.     Train a neural network model to solve a classification problem by explaining different scenarios.

6.     Interpret the performance metrics of a classifier system by explaining them in different scenarios.

7.     Create and execute a Dockerfile to build containers and images by working through a given example.

8.     Create a virtual environment for reproducibility and replicability by working through a given example.



TEXTBOOK:
NONE
REQUIRED RESERVED READING:
NONE

RECOMMENDED RESERVED READING:
NONE
GRADING POLICY
-ASSESSMENT METHODS:
Assignment | Guidelines | Weight
10 Assignments | Reporting, coding, debugging, and interpretation of results | 25%
5 Quizzes | Key concepts of the material covered during the week | 10%
1 Project | Submit a project proposal. This is the draft in which the student chooses from the available datasets and a real-world challenge. | 10%
1 Midterm Exam | Basic concepts of data science as a field | 25%
1 Final Exam | Oral presentation, written report, and code of the selected research project | 25%
Attendance | | 5%

-ASSESSMENT CRITERIA:
A: Work of this quality directly addresses the question or problem raised and provides a coherent argument displaying an extensive knowledge of relevant information or content. This type of work demonstrates the ability to critically evaluate concepts and theory and has an element of novelty and originality. There is clear evidence of a significant amount of reading beyond that required for the course.
B: This is a highly competent level of performance and directly addresses the question or problem raised. There is a demonstration of some ability to critically evaluate theory and concepts and relate them to practice. Discussions reflect the student’s own arguments and are not simply a repetition of standard lecture and reference material. The work does not suffer from any major errors or omissions and provides evidence of reading beyond the required assignments.
C: This is an acceptable level of performance and provides answers that are clear but limited, reflecting the information offered in the lectures and reference readings.
D: This level of performance demonstrates that the student lacks a coherent grasp of the material. Important information is omitted and irrelevant points included. In effect, the student has barely done enough to persuade the instructor that s/he should not fail.
F: This work fails to show any knowledge or understanding of the issues raised in the question. Most of the material in the answer is irrelevant.

-ATTENDANCE REQUIREMENTS:
ATTENDANCE REQUIREMENTS AND EXAMINATION POLICY
You cannot make up a major exam (midterm or final) without the permission of the Dean’s Office. The Dean’s Office will grant such permission only when the absence was caused by a serious impediment, such as a documented illness, hospitalization or death in the immediate family (in which case you must attend the funeral) or other situations of similar gravity. Absences due to other meaningful conflicts, such as job interviews, family celebrations, travel difficulties, student misunderstandings or personal convenience, will not be excused. Students who will be absent from a major exam must notify the Dean’s Office prior to that exam. Absences from class due to the observance of a religious holiday will normally be excused. Individual students who will have to miss class to observe a religious holiday should notify the instructor by the end of the Add/Drop period to make prior arrangements for making up any work that will be missed. The final exam period runs until ____________
ACADEMIC HONESTY
As stated in the university catalog, any student who commits an act of academic dishonesty will receive a failing grade on the work in which the dishonesty occurred. In addition, acts of academic dishonesty, irrespective of the weight of the assignment, may result in the student receiving a failing grade in the course. Instances of academic dishonesty will be reported to the Dean of Academic Affairs. A student who is reported twice for academic dishonesty is subject to summary dismissal from the University. In such a case, the Academic Council will then make a recommendation to the President, who will make the final decision.
STUDENTS WITH LEARNING OR OTHER DISABILITIES
John Cabot University does not discriminate on the basis of disability or handicap. Students with approved accommodations must inform their professors at the beginning of the term. Please see the website for the complete policy.
