About CSCI 8360

Overview

This course aims to provide students with real-world data science experience. Students form teams, design data science pipelines from the ground up, and compete to achieve the highest accuracy on a hidden test set.


Prerequisites

Machine learning, statistics, linear algebra, and software engineering knowledge are essential. Courses that would satisfy these prerequisites include:


Grading

  • Projects: 75%
  • Final Project: 25%

There will be 4 team projects, each worth 18.75% of the total. Each project will be graded on both "theory" and "engineering" components. Teams will start with a grade of 85% (B); provided their solution adheres to a baseline standard, that is the grade they will receive. For more points, teams must go "above and beyond" the baseline.

  • The baseline theory grade will entail implementing the suggested strategy in each project handout. This will usually be a simple algorithm that, if correctly implemented, will confer a reasonable test performance.
  • The baseline engineering grade will entail i) well-designed, modular code, ii) good documentation (in the code, in a README file, and on the GitHub wiki), and iii) effective team dynamics (division of labor in a CONTRIBUTORS file, good use of git commit comments, use of GitHub issue tracker).

Examples of going "above and beyond" include, but are not limited to: i) implementing a sufficiently different algorithm (cite the paper), ii) obtaining outstanding test performance, iii) using continuous integration tools, iv) designing unit tests, v) a permissive open source license (in a LICENSE file), vi) a project website, vii) using linters to adhere to style standards, viii) creating and successfully hitting milestones in the issue tracker, ix) packaging your project for distribution (e.g. through pip or conda-forge), or x) providing outstanding documentation (e.g. usage examples, install instructions, a "quickstart" guide, comparison to other similar methods, etc.).

There will also be an introductory "Project 0" that aims to familiarize students with the technical infrastructure of the course; this is required but will not be graded.

There will be 1 final project. Like the "regular" projects, it will also be team-based. Unlike the other projects, however, students will be able to choose the project their team works on. The final project consists of three components:

  • Proposal: A 2-page (maximum) document, outlining the core idea of the project, expected outcomes, and how success will be evaluated. A progress update a few weeks after the proposal submission will be considered part of the proposal.
  • Presentation: A 25-minute scientific talk to the rest of the class, describing your project, your approach, and your results.
  • Deliverables: A 6-10 page NIPS-style paper that thoroughly describes the complete project. This also includes the final code for the project.
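
As a quick check of the arithmetic, the weights above (four projects at 18.75% each, plus the final project at 25%) can be combined as in the sketch below. The function name and the example scores are hypothetical and only illustrate how the percentages add up; this is not the instructor's grading code.

```python
# Weights come from the Grading section; all scores used here are hypothetical.
PROJECT_WEIGHT = 0.1875  # each of the 4 team projects
FINAL_WEIGHT = 0.25      # the final project
NUM_PROJECTS = 4


def course_grade(project_scores, final_score):
    """Return the weighted course grade on a 0-100 scale."""
    if len(project_scores) != NUM_PROJECTS:
        raise ValueError(f"expected {NUM_PROJECTS} project scores")
    projects = sum(PROJECT_WEIGHT * score for score in project_scores)
    return projects + FINAL_WEIGHT * final_score


# Hypothetical team: baseline (85) on every project and 85 on the final project.
print(course_grade([85, 85, 85, 85], 85))
```

Because the four project weights sum to 75% and the final project supplies the remaining 25%, a team that scores the same on every component ends up with exactly that score overall.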


Final Project

The final project is qualitatively similar to the other course projects, in that it is team-based and focused on some aspect of data science. However, students may form their own teams, and the time period for the final project is much longer than a course project. Therefore, with considerably more time and self-selected teams, more is expected.

There are three main deliverables of the final project (with some small caveats). They are as follows:

Proposal (due: 3/9)

The proposal officially kicks off the final projects: teams are announced and ideas are formalized in a two-page (maximum) document. This document does not need to go into great detail, but it should outline the major ideas for the project, the expected outcomes, and how success will be evaluated at the conclusion of the project. It should also include any relevant references, a note of the data / methods / tools that will be used, and the names of the team members.

Two further items are related to the proposal.

  • Feedback (due: 3/16): The instructor will provide each team with feedback after going through the proposals. This feedback will enumerate any concerns or suggestions based on the content of the proposal. The teams are strongly encouraged to incorporate the suggestions (if any) into their ongoing work.
  • Updates (due: 4/6): Teams will submit update reports to the instructor, outlining the concrete points of progress they have made relative to the various objectives mentioned in their proposals. The instructor will use these reports to gauge team dynamics; they also give the instructor one final opportunity to provide guidance to teams that may be struggling.

Presentations (4/18, 4/19, 4/24, 4/25)

Over the final class meeting periods, project teams will give 25-minute talks on their work. These presentations should follow the outline of conference talks in terms of content: motivate the problem you are addressing and provide salient background, describe your approach, detail your experiments and their results, and put them back into the context of the problem.

Teams are encouraged to be creative in how they deliver the content, however. This can include, but is certainly not limited to, live code demos or walkthroughs, web deployments, containerized solutions, interactive Jupyter slides, and others. Be creative!

Teams will sign up for presentation slots the week before. It's unfortunate how the course meetings align, but the instructor will take into consideration the fact that some teams will present a full week before the final project deadline, while others will present only a day or two before it; expectations in terms of the relative completion of the projects at the time of the presentation will be adjusted accordingly. However, even for teams presenting the first week, the projects should still be 80-90% completed, with only 1-2 final experiments remaining.

Deliverables (due: 4/27)

All final project deliverables are due by 11:59pm on Friday, April 27. This deadline is set in stone. Deliverables can be submitted through Slack, email, or any other timestamped direct electronic delivery method.

There are three items related to the deliverables.

  • Paper: Each team should submit a full 6-10 page NIPS-style paper that thoroughly describes the complete project. It should effectively summarize your complete work, and be in a format to submit directly to a conference or journal, should you wish to. Please do not throw this together at the last minute; the instructor will heavily penalize for grammar and syntax if it is unreadable, so ensure you have time to proofread before submitting.
  • Code: Each team should submit a link to their code repository. Teams are encouraged to use the DSP-UGA GitHub organization to host their repositories. They are also encouraged to open-source their code before making final submissions to ensure the code is open for grading. All the usual expectations of code quality and accompanying documentation from earlier projects will be enforced.
  • Slides: Each team should submit the final versions of their presentation slides. These can be included as PDF/PPTX (if not too large) in a separate subfolder of the GitHub repository.


Materials

GitHub

All course materials will be posted here, and all project repositories should be created and maintained here (whether they are private or public is up to you, but they should be part of the course's GitHub organization account). You can access the organization account through this link:

https://github.com/dsp-uga

The specific repository for the Spring 2018 course materials is located here.

AutoLab

This is where you submit the output of your code for each project (except the final project). You can access it via the link:

https://autolab.cs.uga.edu

Remember: if you have problems accessing AutoLab, check that you're either on UGA's campus network or connected to it via VPN. If neither of these is true, you can check out EITS' instructions for getting set up with VPN. If one of them IS true, let me know and we'll see about sorting it out.

Slack

This is where I will make critical course announcements, so please ensure you are subscribed.

This is the primary point of interaction for asking for and offering help. I will answer questions when I can, but I also encourage everyone to help each other out!

I also get inundated with emails on a daily basis, so using Slack to ask questions effectively acts as a filter: I'll most likely respond to a Slack question more quickly than I would by email.

https://eds-uga-csci8360.slack.com/

If you are not in the Slack chat, contact me with your preferred email address to be invited.


Policies

Projects are due by 11:59:59pm on the noted date; after that time, AutoLab will no longer accept submissions. Furthermore, no commits to GitHub repositories after the stated time will be considered when grading.
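
To make the cutoff concrete, here is a minimal sketch of the rule that commits after 11:59:59pm on the due date are not considered. The helper names, commit hashes, and dates are hypothetical, and timestamps are assumed to be naive local times; it illustrates the policy only and is not the tooling actually used for grading.

```python
from datetime import date, datetime


def deadline_for(due_date):
    """Return 11:59:59pm on the given due date (naive local time, an assumption)."""
    return datetime(due_date.year, due_date.month, due_date.day, 23, 59, 59)


def counted_commits(commits, due_date):
    """Keep only the (sha, timestamp) pairs made at or before the deadline."""
    cutoff = deadline_for(due_date)
    return [(sha, ts) for sha, ts in commits if ts <= cutoff]


# Hypothetical commit log for a project with a hypothetical due date.
commits = [
    ("a1b2c3d", datetime(2018, 2, 1, 21, 15, 0)),   # before the deadline
    ("d4e5f6a", datetime(2018, 2, 1, 23, 59, 59)),  # exactly at the deadline
    ("0f9e8d7", datetime(2018, 2, 2, 0, 0, 1)),     # two seconds too late
]

for sha, ts in counted_commits(commits, date(2018, 2, 1)):
    print(sha, ts.isoformat())
```

In this sketch the first two commits would count toward the grade and the third would be ignored.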

If you run into problems with your teammates, you first need to work with them and determine a course of action that is beneficial for everyone. If you and your teammates are still unable to reach a consensus, I am happy to help. But the bottom line is: your entire team sinks or swims together. There are no individual grades on the projects, so work with each other, not against one another.

The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment ("I did not give or receive any help on this assignment" or "I helped [person] with [specific task]."). Collaboration without full disclosure will be handled severely; except in unusual extenuating circumstances, my policy is to fail the student(s) for the entire course.

The simple version is: don't copy code or even previous solutions. Given the nature of this course and the need for ground-truth data on the verification end, chances are high you can find similar projects in the wild. Resist that urge; it will be obvious, and we'll have to have an awkward conversation that won't end well.

DO NOT COPY CODE. Don't do it.


Contact

If you need to reach me, there are multiple ways: