Data structure and algorithms for data acquisition and curation

  • Lectures (CM) -
  • Integrated lectures and tutorials (CI) 21h
  • Tutorials (TD) 21h
  • Practical sessions (TP) 18h
  • Independent student work (TE) -

Language of instruction: English

Course level: C1 (autonomous, proficient user)

Course content

- Opening files in different programming languages (depending on the Master's specialty, as this is a common course) to retrieve the data they contain, reading the data while dealing with specific file formats, and writing data back in the same (or another) format, starting with CSV files.
- File management from within the different programming languages: creating and deleting files, creating links, and understanding and exploring the structure of Linux directories.
- Practicing fundamental shell tools for file processing, such as find, grep, sed and awk.
- Writing shell scripts that launch automatic processing on a set of files spanning several directories, using loops with shell variables.
- Processing these files introduces data processing techniques: moving averages, basic denoising, data dynamics (on sound, and on images by modifying brightness and contrast), filtering, and applying Fourier transforms and examining the result.
- Practical work sessions may use different programming languages depending on the students' Master specialty: C/C++ for DSAI students, Python/Java for chemists or geoscientists.
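As an illustration of the CSV reading/writing and moving-average topics listed above, here is a minimal Python sketch, not course material. It assumes a CSV file with a header row and one numeric column; the helper names `moving_average` and `smooth_csv` and the column name `signal` are hypothetical. It computes a trailing moving average with a running sum and writes the smoothed values to a new CSV file.

```python
import csv
import io


def moving_average(values, window):
    """Trailing simple moving average: one output per full window of `window` values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    result = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value                    # add the newest sample
        if i >= window:
            running_sum -= values[i - window]   # drop the sample that left the window
        if i >= window - 1:
            result.append(running_sum / window)
    return result


def smooth_csv(src, dst, column, window):
    """Read `column` from a CSV with a header row, smooth it, and write a new CSV.

    `src` and `dst` are any text file objects (open(...) handles or io.StringIO).
    """
    reader = csv.DictReader(src)
    values = [float(row[column]) for row in reader]
    smoothed = moving_average(values, window)

    writer = csv.writer(dst)
    writer.writerow(["index", column + "_ma"])
    # The first full window ends at index window - 1 of the input.
    for index, value in enumerate(smoothed, start=window - 1):
        writer.writerow([index, value])


if __name__ == "__main__":
    # In-memory example; with real files, use open(path, newline="") for both.
    src = io.StringIO("t,signal\n0,1\n1,2\n2,3\n3,4\n4,5\n")
    dst = io.StringIO()
    smooth_csv(src, dst, "signal", 3)
    print(dst.getvalue())
```

The running sum keeps the cost linear in the number of samples instead of recomputing every window. The same read, process, write pattern carries over to the C/C++ or Java versions used in the practical sessions.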

Skills to be acquired

It is not possible to be proficient in data science without understanding how files are structured and how they can be curated, processed and written, possibly in another format. Indeed, the necessary information may come from files obtained from different sources (image files, sound files, non-CSV files, ...), which requires understanding each file format and processing it with different tools. Once a file has been opened and its data retrieved, it may be necessary to curate or process the data (pass it to another program, generate curated output files, pipe data through several programs) and write the result in another format. The objective of this course is that, by its end, students are proficient at manipulating (reading and writing) formatted files and at running data processing on the data obtained (for example, using Linux shell scripts).

At the end of this course, students will be able to open, process and write files in different formats and in different programming languages, and to automate file processing using advanced loops in shell scripts as well as powerful Linux processing tools such as awk. This course is fundamental for all Master specialties and for the students' professional careers, whether in industry or academia.


Rabih Amhaz

