1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019)

Helmut Neukirchen, 22. November 2018

My experience with evaluating implementations of machine learning algorithms is that the content of many accepted research papers cannot be reproduced, in particular because the implementations used are not open source and the authors typically do not even answer emails requesting access to their implementations. This is one aspect of the

1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019)
Workshop at the SIAM International Conference on Data Mining (SDM19), May 2‑4, 2019

Description

A vital part of proposing new machine learning and data mining approaches is evaluating them empirically to allow an assessment of their capabilities. Numerous choices go into setting up such experiments: how to choose the data, how to preprocess them (or not), potential problems associated with the selection of datasets, what other techniques to compare to (if any), what metrics to evaluate, etc. and last but not least how to present and interpret the results. Learning how to make those choices on-the-job, often by copying the evaluation protocols used in the existing literature, can easily lead to the development of problematic habits. Numerous, albeit scattered, publications have called attention to those questions and have occasionally called into question published results, or the usability of published methods. At a time of intense discussions about a reproducibility crisis in natural, social, and life sciences, and conferences such as SIGMOD, KDD, and ECML/PKDD encouraging researchers to make their work as reproducible as possible, we therefore feel that it is important to bring researchers together, and discuss those issues on a fundamental level.

An issue directly related to the first choice mentioned above is the following: even the best-designed experiment carries only limited information if the underlying data are lacking. We therefore also want to discuss questions related to the availability of data, whether they are reliable, diverse, and whether they correspond to realistic and/or challenging problem settings.

Topics

In this workshop, we mainly solicit contributions that discuss those questions on a fundamental level, take stock of the state-of-the-art, offer theoretical arguments, or take well-argued positions, as well as actual evaluation papers that offer new insights, e.g. question published results, or shine the spotlight on the characteristics of existing benchmark data sets.
As such, topics include, but are not limited to:

  • Benchmark datasets for data mining tasks: are they diverse/realistic/challenging?
  • Impact of data quality (redundancy, errors, noise, bias, imbalance, ...) on qualitative evaluation
  • Propagation/amplification of data quality issues on the data mining results (also interplay between data and algorithms)
  • Evaluation of unsupervised data mining (dilemma between novelty and validity)
  • Evaluation measures
  • (Automatic) data quality evaluation tools: What are the aspects one should check before starting to apply algorithms to given data?
  • Issues around runtime evaluation (algorithm vs. implementation, dependency on hardware, algorithm parameters, dataset characteristics)
  • Design guidelines for crowd-sourced evaluations

The workshop will feature a mix of invited speakers, a number of accepted presentations with ample time for questions (since those contributions will be less technical and more philosophical in nature), and a panel discussion on the current state and the areas that most urgently need improvement, as well as recommendations to achieve those improvements. An important objective of this workshop is a document synthesizing these discussions that we intend to publish at a prominent venue.

Submission

Papers should be submitted as PDF, using the SIAM conference proceedings style, available at https://www.siam.org/Portals/0/Publications/Proceedings/soda2e_061418.zip?ver=2018-06-15-102100-887. Submissions should be limited to nine pages and submitted via Easychair at https://easychair.org/conferences/?conf=edml19.

Important dates

Submission deadline: February 15, 2019
Notification: March 15, 2019
SDM pre-registration deadline: April 2, 2019
Camera ready: April 15, 2019
Conference dates: May 2-4, 2019

Further info

Web page

11th Nordic Workshop on Multi-Core Computing (MCC2018)

Helmut Neukirchen, 19. September 2018

The objective of MCC is to bring together Nordic researchers and practitioners from academia and industry to present and discuss recent work in the area of multi-core computing. This year's edition is hosted by the Chalmers University of Technology (Gothenburg, Sweden).

The scope of the workshop is both hardware and software aspects of multi-core computing, including design and development as well as practical usage of systems. The topics of interest include, but are not limited to, the following:

  • Architecture of multi-core processors, GPUs, accelerators, heterogeneous systems, memory systems, interconnects and on-chip networks
  • Parallel programming models, languages, environments
  • Parallel algorithms and applications
  • Compiler optimizations and techniques for multi-core systems
  • Hardware/software design trade-offs in multi-core systems
  • Operating system, middleware, and run-time system support for multi-core systems
  • Correctness and performance analysis of parallel hardware and software
  • Tools and methods for development and evaluation of multi-core systems

There are two types of papers eligible for submission. The first type is original research work and the second type is work already published in 2017 or later. Participants submitting original work are asked to send an electronic version of the paper that does not exceed four pages using the ACM proceedings format, http://www.acm.org/publications/proceedings-template, to https://easychair.org/conferences/?conf=mcc2018. The same URL is to be used should you want to present an already published paper as described above. In that case, you need to clearly specify that the paper is already published and where the paper has been published.

No proceedings will be distributed. Contributions will not disqualify subsequent publications in conferences or journals. (This is a real "work"shop to facilitate discussion.)

Call for Papers (CfP).

The conference web page is https://sites.google.com/site/mccworkshop2018.

Full Paper Submission: October 8th, 2018
Author Notification: November 2nd, 2018
Registration Deadline: November 22nd, 2018
MCC Workshop: November 29th - 30th, 2018

The workshop will be held at Chalmers University of Technology, Gothenburg, Sweden.

2nd Nordic High Performance Computing & Applications Workshop, University of Iceland, Reykjavík, 13-15 June 2018

Helmut Neukirchen, 1. June 2018

Thanks to financial support from the Nordic e-Infrastructure Collaboration (NeIC) pooling competencies initiative, I am again able to organise, together with my colleagues Morris Riedel (Jülich Supercomputing Centre) and Matthias Book (University of Iceland), an HPC training workshop:

The University of Iceland is offering a free cross-national training workshop on high-performance computing (HPC) and applications at the University of Iceland in Reykjavík, Iceland, 13-15 June 2018 (noon-to-noon).

This training workshop is intended for novices (such as MSc or new PhD students) as well as for more advanced HPC users from Iceland and abroad. This time, there is some focus on data.

More information and registration on https://cs.hi.is/HPC/hpcworkshop2018.html

Note that there is another course on research software development (which is not specific to HPC), namely the CodeRefinery workshop in Reykjavik 21-23 August 2018. While it is also funded by NeIC (or part of NeIC in fact), the topics and trainers are different.

A general overview on the HPC activities of the University of Iceland's computer science department can be found here: https://cs.hi.is/HPC/html

Nordic High Performance Computing & Applications Workshop, University of Iceland, Reykjavík, 23-25 August 2017

Helmut Neukirchen, 29. June 2017

Thanks to financial support from the Nordic e-Infrastructure Collaboration (NeIC) pooling competencies initiative, I am organising, together with my colleagues Morris Riedel (Jülich Supercomputing Centre) and Matthias Book (University of Iceland), an HPC training workshop:

The University of Iceland is offering a free cross-national training workshop on high-performance computing (HPC) and applications at the University of Iceland in Reykjavík, Iceland, 23-25 August 2017 (noon-to-noon).

This training workshop is intended for novices (such as MSc or new PhD students) as well as for more advanced HPC users from Iceland and abroad.

More information and registration on https://cs.hi.is/HPC/hpcworkshop2017.html

A general overview on the HPC activities of the University of Iceland's computer science department can be found here: https://cs.hi.is/HPC/html

No teaching in autumn 2017 / Underfinancing of Icelandic universities #háskólaríhættu / How the University deals with it

Helmut Neukirchen, 28. January 2017

Update on teaching: I will teach HBV101F Software Maintenance in spring 2018, and TÖL503M Distributed Systems will very likely be taught in autumn 2017 by an external teacher.

I will not be teaching in autumn 2017, hence the courses HBV101F Software Maintenance and TÖL503M/TÖL102F Distributed Systems will not be taught by me. Due to the lack of sufficient financing of public universities by the Icelandic government, it is currently not possible to pay someone to teach these courses. If state financing for universities improves, this might change!

Students who would have needed to take the course HBV101F Software Maintenance (which is mandatory in the Software Engineering study line) can get an exemption and take another course instead.

Some background on public university financing: in Iceland, the state spends a little bit less than 1.3 million krona (at the current exchange rate: 10 660 EUR) per student and year (which covers not only salaries of all kinds of staff, but also infrastructure such as buildings and research infrastructure), whereas the average in Iceland's Nordic neighbour states is more than 2.2 million krona (at the current exchange rate: 18 000 EUR) per student and year. As a result, I am not allowed to work overtime to teach beyond my teaching obligation as I did in the past. (Well, I could work overtime, but I would not get paid, and then the state would rely on stupid professors working for free and lower the funding even further.) Typically, permanent overtime is more expensive than hiring additional staff. However, a professor has a 48% teaching obligation, a 12% administration obligation and a 40% research obligation. Hence, hiring a new professor just to add more teaching capacity does not pay off: only 48% of that salary would go into teaching. Permanent overtime by professors to ensure teaching (not talking about research -- of course, a good university needs to do both) therefore does in fact make economic sense and is thus often the norm. Reducing the funding of universities to such an extent that the only way for them to save money is to reduce overtime payments therefore leads to problems with respect to teaching offerings and teaching quality! Of course, instead of working and paying overtime, it would be best to employ further professors, because these ensure both teaching and research, which are both pillars of universities!
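The overtime argument can be sketched with a bit of arithmetic. This is only an illustration: the 48% teaching share and the per-student funding figures are taken from the text, whereas the salary unit, the overtime premium and all function names are hypothetical placeholders and not figures from the University.

```python
# Sketch of the cost argument. Only TEACHING_SHARE and the per-student
# funding figures come from the text; everything else is hypothetical.

TEACHING_SHARE = 0.48          # share of a professor's obligations that is teaching
FUNDING_ICELAND_ISK = 1.3e6    # per student and year ("a little bit less than")
FUNDING_NORDIC_ISK = 2.2e6     # per student and year ("more than")


def cost_of_teaching_via_new_hire(salary: float) -> float:
    """Salary needed to buy one full position's worth of teaching by hiring.

    Only TEACHING_SHARE of each new salary goes into teaching, so one full
    unit of teaching costs salary / TEACHING_SHARE.
    """
    return salary / TEACHING_SHARE


def cost_of_teaching_via_overtime(salary: float, overtime_premium: float) -> float:
    """Cost of the same amount of teaching paid as overtime.

    overtime_premium is a hypothetical multiplier on the regular rate;
    overtime pay goes entirely into teaching.
    """
    return salary * overtime_premium


def break_even_premium() -> float:
    """Overtime multiplier below which overtime is the cheaper way to add teaching."""
    return 1.0 / TEACHING_SHARE


if __name__ == "__main__":
    salary = 100.0  # arbitrary units, hypothetical
    print(f"new hire:           {cost_of_teaching_via_new_hire(salary):.1f}")
    print(f"overtime at 1.5x:   {cost_of_teaching_via_overtime(salary, 1.5):.1f}")
    print(f"break-even premium: {break_even_premium():.2f}")
    print(f"funding ratio:      {FUNDING_ICELAND_ISK / FUNDING_NORDIC_ISK:.2f}")
```

Under these assumptions, overtime remains the cheaper way to add teaching as long as it is paid at less than roughly twice the regular rate. The sketch deliberately ignores the research that a new professor would also deliver, which is exactly why employing further professors would still be the better option.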

If you think the underfinancing of public universities by the Icelandic government is a shame, then you have not yet read how the University administration deals with the current underfunding in the fiscal year 2017:

Our Faculty of Industrial Engineering, Mechanical Engineering and Computer Science (Icelandic: IVT) is part of the School of Engineering and Natural Sciences (SENS or Icelandic: VoN). For determining how the budget is distributed to the individual faculties, the University of Iceland applies a distribution model ("deililíkan", see the Icelandic description in Deililíkan Háskóla Íslands -- Skýrsla til rektors -- Tillögur um breytingar og úrbætur and the MPA thesis Árangursstjórnun í háskólum á Íslandi, or for English texts, section 1.5.9. of Evaluation System for Public Higher Education Institutions Description and Self‐Review -- December 9, 2016 and section 3.4 of DOI:10.13177/irpa.a.2016.12.1.9; however, the formula in the latter contains some typos). This model involves an allocation formula that takes (among others) the teaching activities (in terms of number of students) and research activities (in terms of publications and acquired funding) into account. While this is calculated individually for each faculty, the money does not go directly to each faculty; instead, SENS receives the money for all its faculties. However, this money is not forwarded by the head of SENS to the faculties according to the distribution model! Instead, we (who save money) get less and others (who do not save money) get more (in fact, they get our money):

While our IVT faculty is, together with a smaller one, the only faculty of SENS that manages to operate within the budget of the distribution model (we even use less because we do not have as many permanent positions filled as we should -- see above), the other faculties do not, but exceed their budget. Because our IVT faculty is so good at reducing costs (for example by covering teaching through cheaper overtime -- see above), our money is taken away and given instead to all the faculties that do not manage to stay within their budget. In fact, we are requested to save even more (see above: no overtime payments) while the other faculties are allowed to continue spending more than the budget distributed to them according to the distribution model allows.

TL;DR: we are forced by the University administration to cut down our budget far beyond our allocation ("earned" by us due to our performance indicators used as input for the distribution model) in a way that sacrifices our teaching offering and quality -- only to feed the other faculties that need more money than their allocation provides (they either have to improve their performance indicators, convince everyone to change the distribution model, or save money). Due to a lack of transparency, we cannot even check whether the others at least try to save money (e.g. while we cancel courses with fewer than 10 participants, we do not know whether they do this at all): the dean of SENS only gave us their overall budget need, but no justification for their budget.

I leave it up to you to decide whether this makes SENSe or not.

P.S.: In January 2017, we were not paid any overtime: this overtime payment refers to overtime worked in 2016, i.e. the word is (we were never officially informed about the reason -- see lack of transparency above) that the dean of SENS refuses to pay for work that we did back in 2016 (and even earlier in some cases), even though working overtime was not forbidden in those days, but rather ordered. This is a clear violation of the collective wage agreements (so the University administration, too, relies on stupid professors working for free). At least I get my normal fixed salary paid -- in contrast to a part-time teacher who did not get paid at all until he threatened to go on hunger strike. Maybe we should do the same...

P.P.S.: Notably, the university mastered the severe financial crash of 2008 in Iceland without the above problems. It took a new government to underfinance the university in the flourishing economy of 2016/2017. That government was elected (over the one that cleaned up the mess after the 2008 crisis and allowed indebted house owners to write off debts that were higher than 110% of their property's value) because it promised even more write-offs of housing debts (which is one of the reasons why that government has no money left to finance the universities). It has just come to light that those who benefited most from these write-offs were those with high incomes who took out high loans -- in contrast to those who wisely avoided high debts or even took on no debt at all. Does this remind you of the above faculties that do not stay within their budget and thus get even more money, and of our faculty that wisely stays within its budget...?

EGU session on eScience, ensemble methods and environmental changes in high latitudes

Helmut Neukirchen, 18. November 2016

The eSTICC project is holding a session on "eScience, ensemble methods and environmental changes in high latitudes" at the EGU (European Geosciences Union) General Assembly 2017 in Vienna, Austria, 23-28 April 2017.

Convener: Ignacio Pisso
Co-Conveners: Andreas Stohl, Michael Schulz, Torben R. Christensen, Risto Makkonen, Tuula Aalto, Helmut Neukirchen, Alberto Carrassi, Laurent Bertino.

The multiple environmental feedback processes at high latitudes involve interactions between the land, ocean, cryosphere, biosphere and atmosphere. For trustworthy computational predictions of future climate change, these interactions need to be taken into account by eScience tools. In particular, this requires: 1) Integration of existing measurement data and enhanced information flow between disciplines; 2) Representation of the current process understanding in Earth System Models (ESMs) for which computational limitations require balancing the process simplifications; and 3) Improved process understanding. eScience such as High-Performance Computing (HPC), big data or scientific workflows is central in all of these areas.
Contributions in fields related to the intersection of environmental change (such as, but not restricted to, measurements, inverse modeling, data assimilation, process parametrizations, ESMs) and eScience (such as, but not restricted to, HPC, scientific workflows, big data, ensemble methods) are welcome.

The deadline for receipt of abstracts is 11 January 2017, 13:00 CET. You are welcome to submit abstracts via the session's web page.

Errata CDK5 book on Distributed Systems 5th edition by Coulouris, Dollimore, Kindberg and Blair

Helmut Neukirchen, 1. September 2015

While the CDK5 homepage lists some errata, I found more that are listed below (I reported them, however they did not make it into the official errata):

Table in Figure 3.23: The lower bound of the 3G phone bandwidth is 0.384 Mbps, not 384 Mbps. As an update: 4G currently offers up to 300 Mbps and has latencies as low as 5 ms.
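This erratum amounts to a unit slip by a factor of 1000, since 0.384 Mbps is the same rate as 384 kbps. A minimal check, using a hypothetical helper name and decimal SI prefixes:

```python
def kbps_to_mbps(kbps: float) -> float:
    """Convert kilobits per second to megabits per second (1 Mbps = 1000 kbps)."""
    return kbps / 1000.0


# 384 is the value in kbps; expressed in Mbps it is three orders of magnitude smaller.
lower_bound_3g_mbps = kbps_to_mbps(384)
print(lower_bound_3g_mbps)
```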

The table in Figure 3.23 is at least outdated if not wrong, e.g. 10Base5 is only about the 500 m STP; the "T" standards are about twisted pair cables, hence listing coaxial cable (STP) lengths there makes no sense. 1000BaseT nowadays allows 100 m twisted pair cables. The "fibre" lines refer to the "F" standards rather than to the "T" standards that the column headings suggest. Furthermore, the mono-mode fibre length for 1000BaseF has advanced significantly. Finally, 10GBase, 40GBase and 100GBase are now available.

Further errata to come...

Word cloud

Helmut Neukirchen, 23. November 2010

Tag Cloud for Helmut Neukirchen

Update from 2013: The above tag cloud was based on the word frequency of my CV from 2009. Since then, more topics have been added to my research area such as Cloud computing, Big data/MapReduce, or UML model-transformation.