About CSCI 8360

Overview

This course aims to provide students with real-world data science experience. Students form teams, design data science pipelines from the ground up, and compete to achieve the highest accuracy on a hidden test set.


Prerequisites

Machine learning, statistics, linear algebra, and software engineering knowledge are essential. Courses that would satisfy these prerequisites include:


Grading

  • Projects: 60%
  • Final: 40%

There will be 3 team projects, each worth 20% of the total. Each project will be graded on both "theory" and "engineering" components. Teams will start with a grade of 85% (B); provided their solution adheres to a baseline standard, that is the grade they will receive. For more points, teams must go "above and beyond" the baseline.

  • The baseline theory grade will entail implementing the suggested strategy in each project handout. This will usually be a simple algorithm that, if correctly implemented, will yield reasonable test performance.
  • The baseline engineering grade will entail i) well-designed, modular code, ii) good documentation (in the code, in a README file, and on the GitHub wiki), and iii) effective team dynamics (division of labor in a CONTRIBUTORS file, good use of git commit comments, use of GitHub issue tracker).

Examples of going "above and beyond" include, but are not limited to: i) implementing a sufficiently different algorithm (cite the paper), ii) obtaining outstanding test performance, iii) using continuous integration tools, iv) designing unit tests, v) a permissive open source license (in a LICENSE file), vi) a project website, vii) using linters to adhere to style standards, viii) creating and successfully hitting milestones in the issue tracker, ix) packaging your project for distribution (e.g. through pip or conda-forge), or x) providing outstanding documentation (e.g. usage examples, install instructions, a "quickstart" guide, comparison to other similar methods, etc.).

There will also be an introductory "Project 0" that aims to familiarize students with the technical infrastructure of the course; this is required but will not be graded.
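As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes every score is on a 0-100 scale and treats the final project as a single score; how the proposal, presentation, and deliverables are weighted within the final is not stated here, and the function name is hypothetical. Project 0 is ungraded, so it does not appear.

```python
# Weights taken from the Grading section, as integer percentages.
PROJECT_WEIGHT = 20   # each of the 3 team projects
NUM_PROJECTS = 3
FINAL_WEIGHT = 40     # the final project as a whole


def course_grade(project_scores, final_score):
    """Combine project and final scores (assumed 0-100) into a course grade."""
    if len(project_scores) != NUM_PROJECTS:
        raise ValueError(f"expected {NUM_PROJECTS} project scores")
    weighted = PROJECT_WEIGHT * sum(project_scores) + FINAL_WEIGHT * final_score
    return weighted / 100


# A team that meets the baseline (85) on everything ends up with 85 overall.
print(course_grade([85, 85, 85], 85))  # 85.0
```

Going "above and beyond" on one project moves the course grade by at most 20% of the extra points earned on that project.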

There will be 1 final project. Like the "regular" projects, it will also be team-based. Unlike the other projects, however, students will be able to choose the project their team works on. The final project consists of three components:

  • Proposal: A 2-page (maximum) document, outlining the core idea of the project, expected outcomes, and how success will be evaluated. A progress update a few weeks after the proposal submission will be considered part of the proposal.
  • Presentation: A 25-minute scientific talk to the rest of the class, describing your project, your approach, and your results.
  • Deliverables: A 6-10 page NIPS-style paper that thoroughly describes the complete project. This also includes the final code for the project.


Materials

GitHub

All course materials will be posted here, and all project repositories should be created and maintained here (whether they are private or public is up to you, but they should be part of the course's GitHub organization account). You can access the organization account through this link:

https://github.com/dsp-uga

The specific repository for the Spring 2018 course materials is located here.

AutoLab

This is where you submit the output of your code for each project (except the final project). You can access it via the link:

https://autolab.cs.uga.edu

Remember: if you have problems accessing AutoLab, check that you're either on UGA's campus network or connected to it via VPN. If neither is true, you can check out EITS' instructions for getting set up with VPN. If one of them IS true, let me know and we'll see about sorting it out.

Slack

This is where I will make critical course announcements, so please ensure you are subscribed.

This is the primary point of interaction for asking for and offering help. I will answer questions when I can, but I also encourage everyone to help each other out!

I also get inundated with emails on a daily basis, so using Slack to ask questions effectively acts as a filter: I'll most likely respond to a Slack question more quickly than I would by email.

https://eds-uga-csci8360.slack.com/

If you are not in the Slack chat, contact me with your preferred email address to be invited.


Policies

Projects are due by 11:59:59pm on the noted date; after that time, AutoLab will no longer accept submissions. Furthermore, no commits to GitHub repositories after the stated time will be considered when grading.
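To make the cutoff concrete, here is a minimal sketch of the rule as stated. It assumes timestamps are naive local times in the same timezone as the deadline and that a timestamp of exactly 11:59:59pm still counts; the function name and the example due date are hypothetical, and this is not a description of how AutoLab or the grader actually checks.

```python
from datetime import date, datetime, time

# Cutoff time of day from the policy above (11:59:59pm).
DEADLINE_TIME = time(23, 59, 59)


def is_on_time(timestamp, due_date):
    """Return True if a submission or commit timestamp is at or before the cutoff."""
    deadline = datetime.combine(due_date, DEADLINE_TIME)
    return timestamp <= deadline


due = date(2018, 2, 1)  # hypothetical due date, for illustration only
print(is_on_time(datetime(2018, 2, 1, 23, 59, 59), due))  # True
print(is_on_time(datetime(2018, 2, 2, 0, 0, 0), due))     # False
```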

If you run into problems with your teammates, you first need to work with them and determine a course of action that is beneficial for everyone. If you and your teammates are still unable to reach a consensus, I am happy to help. But the bottom line is: your entire team sinks or swims together. There are no individual grades on the projects, so work with each other, not against one another.

The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment ("I did not give or receive any help on this assignment" or "I helped [person] with [specific task]."). Collaboration without full disclosure will be handled severely; except in unusual extenuating circumstances, my policy is to fail the student(s) for the entire course.

The simple version is: don't copy code or even previous solutions. Given the nature of this course and the need for ground-truth data on the verification end, chances are high you can find similar projects in the wild. Resist that urge; it will be obvious, and we'll have to have an awkward conversation that won't end well.

DO NOT COPY CODE. Don't do it.


Contact

If you need to reach me, there are multiple ways: