JOHN CABOT UNIVERSITY

COURSE CODE: "EXP 1020"
COURSE NAME: "Introduction to Text Mining"
SEMESTER & YEAR: Spring 2020
SYLLABUS

INSTRUCTOR: Sathya Mellina
EMAIL: [email protected]
HOURS: F 2:00-6:00 PM [Course meets on: February 21, February 28, March 27, April 17]
TOTAL NO. OF CONTACT HOURS: 15
CREDITS: 1
PREREQUISITES:
OFFICE HOURS:

COURSE DESCRIPTION:
Grading: This course will be graded on a PASS/FAIL scale.
This course introduces students to the basic elements of text mining, which is used in various disciplines for content analysis: exploring and analysing large amounts of unstructured text and turning it into quantitative indicators and actionable information. The focus will be on basic applications of text mining, using automated computational tools and statistical techniques to convert documents into a form in which the text can be mined. Some standard functions of RStudio and Microsoft Excel will be covered.
SUMMARY OF COURSE CONTENT:
This course will cover essential techniques for mining and analysing unstructured text data by assessing the relationships and patterns among the items that make up the original text. Students will be introduced to quantitative analysis and to the text processing procedures that prepare documents for text mining applications. The course will then present basic methods for information extraction based on word frequencies and supervised techniques such as “dictionary methods” for conducting sentiment analysis on large corpora of documents.
LEARNING OUTCOMES:

The purpose of this course is to provide students with an introduction to basic quantitative methods for mining and analysing unstructured text data. After completing the course, students will be able to:

·       understand the essential concepts and aspects of text mining

·       use RStudio to program basic methods for information extraction and text mining on a large volume of documents

·       derive reliable quantitative indicators that capture different dimensions of text data.

TEXTBOOK:
Book Title: Text Mining with R
Author: Julia Silge, David Robinson
Publisher: O'Reilly Media, Inc.
ISBN number: 9781491981641
REQUIRED RESERVED READING:
NONE

RECOMMENDED RESERVED READING:
NONE
GRADING POLICY
-ASSESSMENT METHODS:
Assignment | Guidelines | Weight
Attendance and participation | Attendance of and participation in all four classes. | 28%
Project | A comprehensive take-home text analytics project, administered at the end of the course. Students must provide the instructor with their code and the original data so that their results can be replicated. | 72%

-ASSESSMENT CRITERIA:
A: Work of this quality directly addresses the question or problem raised and provides a coherent argument displaying an extensive knowledge of relevant information or content. This type of work demonstrates the ability to critically evaluate concepts and theory and has an element of novelty and originality. There is clear evidence of a significant amount of reading beyond that required for the course.
B: This is a highly competent level of performance and directly addresses the question or problem raised. There is a demonstration of some ability to critically evaluate theory and concepts and relate them to practice. Discussions reflect the student’s own arguments and are not simply a repetition of standard lecture and reference material. The work does not suffer from any major errors or omissions and provides evidence of reading beyond the required assignments.
C: This is an acceptable level of performance and provides answers that are clear but limited, reflecting the information offered in the lectures and reference readings.
D: This level of performance demonstrates that the student lacks a coherent grasp of the material. Important information is omitted and irrelevant points included. In effect, the student has barely done enough to persuade the instructor that s/he should not fail.
F: This work fails to show any knowledge or understanding of the issues raised in the question. Most of the material in the answer is irrelevant.

-ATTENDANCE REQUIREMENTS:

As this course consists of only four classes, attendance is mandatory and unexcused absences will not be tolerated. Students who miss more than one class (whether formally excused by the Associate Dean of Academics or unexcused) will fail. Participation in class discussion is strongly encouraged.

ACADEMIC HONESTY
As stated in the university catalog, any student who commits an act of academic dishonesty will receive a failing grade on the work in which the dishonesty occurred. In addition, acts of academic dishonesty, irrespective of the weight of the assignment, may result in the student receiving a failing grade in the course. Instances of academic dishonesty will be reported to the Dean of Academic Affairs. A student who is reported twice for academic dishonesty is subject to summary dismissal from the University. In such a case, the Academic Council will then make a recommendation to the President, who will make the final decision.
STUDENTS WITH LEARNING OR OTHER DISABILITIES
John Cabot University does not discriminate on the basis of disability or handicap. Students with approved accommodations must inform their professors at the beginning of the term. Please see the website for the complete policy.

SCHEDULE

Weeks Topics

Week 1: Introduction to Text Mining and RStudio
Week 2: Data Cleaning and Preparation
Week 3: Basic Text Analysis
Week 4: Sentiment Analysis