052813 VU Scientific Data Management (2017S)

6.00 ECTS (4.00 SWS), SPL 5 - Informatik und Wirtschaftsinformatik

Continuous assessment of course work

Registration/Deregistration

Note: The time of your registration within the registration period has no effect on the allocation of places (no first come, first served).

Registration is open from Mo 06.02.2017 09:00 to We 22.02.2017 23:59
Deregistration is possible until Mo 20.03.2017 23:59

Details

max. 25 participants

Language: English

Classes

    
    Thursday  02.03.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  09.03.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  16.03.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  23.03.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  30.03.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  06.04.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  27.04.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  04.05.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  11.05.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  18.05.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  01.06.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  08.06.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  22.06.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG
    Thursday  29.06.  08:00 - 11:15  Hörsaal 2, Währinger Straße 29 2.OG

Information

Aims, contents and method of the course

This course introduces central methods and approaches for organizing and analyzing large-scale and scientific data: distributed data repositories, index and access structures, and hashing and clustering techniques. In programming exercises, students learn ways to support similarity search and data mining on large data, for example parallelization with MapReduce or Apache Spark, or filter-refinement techniques.

Subject-specific goals:
- Analysis and interpretation of scientific data
- Evaluation of the results of the analysis process
- Implementation of scalable solutions for huge amounts of data
- User support and advice

Generic goals:
- Teamwork
- Improvement of programming skills
- Understanding of the interplay between Data Mining and Scientific Computing

Assessment and permitted materials

Active participation
Work on exercise sheets
Work on programming assignments in groups
Final exam

Minimum requirements and assessment criteria

For bachelor students, the mandatory prerequisite for this class is the successful completion of ISE or PC.
- ISE: Information Management & Systems Engineering
- PC: Parallel Computing

It is recommended to complete the following courses beforehand:
- Algorithmen und Datenstrukturen
- Datenbanksysteme
- Software Engineering
- Einführung in Scientific Computing
- Netzwerktechnologien

20% Exercise sheets
40% Programming exercises in teams
40% Final exam
Attendance is mandatory

Examination topics

Clustering:
- K-means and variants
- Density-based clustering
MapReduce
Apache Spark
Feature spaces
Indexing, hashing (LSH)
Network analysis

Reading list

M. Ester, J. Sander. Knowledge Discovery in Databases: Techniken und Anwendungen.
J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Datasets.
J. Han, M. Kamber, J. Pei. Data Mining: Concepts and Techniques.
I. H. Witten, E. Frank, M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques.

Association in the course directory

Module: SDM

Last modified: Mo 07.09.2020 15:30