pith. sign in

arxiv: 1906.12348 · v1 · pith:YIVQTF47new · submitted 2019-06-28 · 💻 cs.LG · cs.IR· stat.ML

MLFriend: Interactive Prediction Task Recommendation for Event-Driven Time-Series Data

Pith reviewed 2026-05-25 13:31 UTC · model grok-4.3

classification 💻 cs.LG cs.IRstat.ML
keywords prediction task recommendationevent-driven time-seriesautomated task discoveryinteractive machine learningtime series predictiontask generationexpert-in-the-loop systems
0
0 comments X

The pith

MLFriend generates all possible prediction tasks in a fixed space for event-driven time-series data and recommends the useful ones after interactive context learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MLFriend as a way to move beyond manual definition of what to predict on raw event-driven time-series. It first enumerates every candidate task allowed by a predefined space, then runs a short interaction with the data scientist to learn relevant context about the data before ranking and surfacing the better tasks. A sympathetic reader would care because the step of turning raw logs into labeled prediction problems still consumes expert time even after model selection is automated. The evaluation across three datasets produced 2885 tasks and marked 722 as useful by experts, while also showing the ranking can place liked tasks in the top ten of any batch of one hundred.

Core claim

MLFriend first generates all possible prediction tasks under a predefined space, then interacts with a data scientist to learn the context of the data and recommend good prediction tasks from all the tasks in the space. Evaluation on three datasets generated 2885 tasks total, of which 722 were deemed useful by expert data scientists, and showed that the system identifies the top 10 tasks a user may like within any batch of 100 tasks.

What carries the argument

The interactive recommendation step that ranks exhaustively generated tasks after learning data context from the user session.

If this is right

  • Experts can review hundreds of candidate tasks per dataset instead of inventing them from scratch.
  • Useful tasks surface early enough that only a small batch needs inspection.
  • The same generation-plus-ranking loop applies whenever a fixed task space can be written down for the data type.
  • Automation of task definition becomes measurable by counting how many generated tasks experts accept.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the interaction reliably captures context, repeated sessions on similar data could reuse prior rankings without new queries.
  • The approach could extend to streaming settings where new events continuously expand the candidate task list.
  • A multi-user version might aggregate context across several data scientists to produce consensus rankings.

Load-bearing premise

The predefined space already contains the tasks domain experts actually want and the short interaction supplies enough context to rank them correctly.

What would settle it

On a fresh dataset the top ten tasks returned by the system receive no useful ratings from the expert while lower-ranked tasks do.

Figures

Figures reproduced from arXiv: 1906.12348 by Kalyan Veeramachaneni, Lei Xu, Shubhra Kanti Karmaker Santu.

Figure 1
Figure 1. Figure 1: The procedure of using MLFriend to discover useful prediction tasks for a new dataset. MLFriend shown here as server serves proposes prediction tasks to the user/data scientist, who can give feedback and ranking. system that can formulate useful prediction tasks automati￾cally? Formulating a prediction task precisely is non-trivial. Natural language questions are vague and usually are not grounded in avail… view at source ↗
Figure 2
Figure 2. Figure 2: An example dataset and the process of generating prediction tasks and solving them. We start with a raw event-driven [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Top: Distribution of accuracies for classification tasks on three datasets. Bottom: Distribution of [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Top rated prediction tasks on Chicago Bicycle, Flight Delay, and YouTube Trending dataset. O is the structural [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Interactive recommendation simulation results. We compare our model (LR) and uniform selection (PR) on three [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
read the original abstract

Most automation in machine learning focuses on model selection and hyper parameter tuning, and many overlook the challenge of automatically defining predictive tasks. We still heavily rely on human experts to define prediction tasks, and generate labels by aggregating raw data. In this paper, we tackle the challenge of defining useful prediction problems on event-driven time-series data. We introduce MLFriend to address this challenge. MLFriend first generates all possible prediction tasks under a predefined space, then interacts with a data scientist to learn the context of the data and recommend good prediction tasks from all the tasks in the space. We evaluate our system on three different datasets and generate a total of 2885 prediction tasks and solve them. Out of these 722 were deemed useful by expert data scientists. We also show that an automatic prediction task discovery system is able to identify top 10 tasks that a user may like within a batch of 100 tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces MLFriend, a system that generates all possible prediction tasks for event-driven time-series data under a predefined space, interacts with a data scientist to learn data context, and recommends useful tasks. On three datasets it generates 2885 tasks and solves them, with 722 labeled useful by expert data scientists; it further claims an automatic system can identify the top 10 tasks a user may like within any batch of 100.

Significance. If the central claims hold, the work addresses a genuine gap in ML automation by moving beyond model selection to task definition. The concrete counts (2885 tasks generated, 722 judged useful) supply empirical grounding that is rare in this area, and the interactive recommendation framing is a plausible direction. However, the absence of a defined task space, interaction protocol, or evaluation baseline makes it impossible to assess whether the reported usefulness or top-10 performance reflects genuine context capture rather than surface heuristics.

major comments (3)
  1. [Abstract] Abstract: the claim that the system 'generates all possible prediction tasks under a predefined space' is load-bearing for both the 2885-task count and the usefulness judgment, yet the manuscript supplies no definition of that space (allowed operators, prediction horizons, aggregations, or feature constructions). Without this, it is impossible to verify whether the space is broad enough to contain tasks domain experts actually want.
  2. [Abstract] Abstract: the top-10-in-100 claim requires an evaluation protocol (user-study design, preference metric, baseline comparator, and how user feedback is modeled), none of which is described. The reported expert judgments likewise lack stated criteria, inter-rater agreement, or comparison against a non-interactive baseline, rendering the 722/2885 usefulness ratio non-interpretable.
  3. [Abstract] Abstract: the interaction mechanism that 'learns the context of the data' is central to the recommendation claim, yet no protocol (questions asked, feedback model, or learning algorithm) is provided. This omission directly undermines the assertion that the system captures sufficient context to rank tasks meaningfully.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We agree that several key elements were insufficiently described, which hinders assessment of the claims. We will revise the manuscript to address these issues by providing the missing definitions and protocols. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the system 'generates all possible prediction tasks under a predefined space' is load-bearing for both the 2885-task count and the usefulness judgment, yet the manuscript supplies no definition of that space (allowed operators, prediction horizons, aggregations, or feature constructions). Without this, it is impossible to verify whether the space is broad enough to contain tasks domain experts actually want.

    Authors: We acknowledge this omission in the current version of the manuscript. In the revised version, we will include a formal definition of the predefined task space, specifying the allowed operators, prediction horizons, aggregations, and feature constructions. This will enable readers to assess the breadth and relevance of the generated tasks. revision: yes

  2. Referee: [Abstract] Abstract: the top-10-in-100 claim requires an evaluation protocol (user-study design, preference metric, baseline comparator, and how user feedback is modeled), none of which is described. The reported expert judgments likewise lack stated criteria, inter-rater agreement, or comparison against a non-interactive baseline, rendering the 722/2885 usefulness ratio non-interpretable.

    Authors: We agree that the evaluation details are missing. The revised manuscript will describe the user-study design, the preference metric used, any baseline comparators, how user feedback is modeled, the criteria for deeming tasks useful, inter-rater agreement statistics, and comparisons to non-interactive baselines to make the 722/2885 ratio interpretable. revision: yes

  3. Referee: [Abstract] Abstract: the interaction mechanism that 'learns the context of the data' is central to the recommendation claim, yet no protocol (questions asked, feedback model, or learning algorithm) is provided. This omission directly undermines the assertion that the system captures sufficient context to rank tasks meaningfully.

    Authors: We recognize the need for a detailed description of the interaction protocol. In the revision, we will specify the questions asked during interaction, the feedback model employed, and the learning algorithm used to capture data context and rank tasks. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained with external validation

full rationale

The paper presents a system that enumerates tasks from a fixed space and ranks them via interaction, with usefulness judged by external expert data scientists on 2885 generated tasks. No equations, fitted parameters, or derivations are described that reduce to their own inputs by construction. Evaluation metrics rely on independent human judgments rather than self-referential scores, and no self-citation chains or uniqueness theorems are invoked as load-bearing premises. The approach is therefore not circular.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on two unstated modeling choices: the construction of the initial task space and the assumption that expert binary usefulness labels are stable and representative of downstream value.

free parameters (1)
  • predefined task space definition
    The space of all possible prediction tasks is described as predefined but its construction rules, parameters, or coverage are not specified.
axioms (1)
  • domain assumption Expert data scientists provide consistent and meaningful judgments of task usefulness
    The 722 useful tasks and the top-10-in-100 result rest entirely on these judgments.

pith-pipeline@v0.9.0 · 5695 in / 1239 out tokens · 32725 ms · 2026-05-25T13:31:59.539994+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 7 internal anchors

  1. [1]

    2017] Baker, B.; Gupta, O.; Raskar, R.; and Naik, N

    [Baker et al. 2017] Baker, B.; Gupta, O.; Raskar, R.; and Naik, N

  2. [2]

    Accelerating Neural Architecture Search using Performance Prediction

    Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823. [Bengio 2012] Bengio, Y

  3. [3]

    In Neural networks: Tricks of the trade

    Practical recommendations for gradient-based training of deep architectures. In Neural networks: Tricks of the trade. Springer. 437–478. [Bergstra and Bengio 2012] Bergstra, J., and Bengio, Y

  4. [4]

    Journal of Machine Learning Research 13(Feb):281–305

    Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb):281–305. [Bergstra et al. 2011] Bergstra, J. S.; Bardenet, R.; Bengio, Y .; and K´egl, B

  5. [5]

    In Advances in neural information processing systems, 2546–2554

    Algorithms for hyper-parameter op- timization. In Advances in neural information processing systems, 2546–2554. [Bergstra, Yamins, and Cox 2013] Bergstra, J.; Yamins, D.; and Cox, D. D

  6. [6]

    [Feurer et al

    Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. [Feurer et al. 2015] Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; and Hutter, F

  7. [7]

    In Advances in Neural Information Processing Systems, 2962–2970

    Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems, 2962–2970. [Hutter, Hoos, and Leyton-Brown 2011] Hutter, F.; Hoos, H. H.; and Leyton-Brown, K

  8. [8]

    InInterna- tional Conference on Learning and Intelligent Optimization, 507–523

    Sequential model-based optimization for general algorithm configuration. InInterna- tional Conference on Learning and Intelligent Optimization, 507–523. Springer. [Jamieson and Nowak 2011] Jamieson, K. G., and Nowak, R

  9. [9]

    M., and Veeramachaneni, K

    [Kanter and Veeramachaneni 2015] Kanter, J. M., and Veeramachaneni, K

  10. [10]

    IEEE International Conference on, 1–10. IEEE. [Katz, Shin, and Song 2016] Katz, G.; Shin, E. C. R.; and Song, D

  11. [11]

    In Data Mining (ICDM), 2016 IEEE 16th In- ternational Conference on, 979–984

    Explorekit: Automatic feature generation and selection. In Data Mining (ICDM), 2016 IEEE 16th In- ternational Conference on, 979–984. IEEE. [Kaul, Maheshwary, and Pudi 2017] Kaul, A.; Maheshwary, S.; and Pudi, V

  12. [12]

    In Data Mining (ICDM), 2017 IEEE International Conference on, 217–226

    Autolearnautomated feature gener- ation and selection. In Data Mining (ICDM), 2017 IEEE International Conference on, 217–226. IEEE. [Khurana, Samulowitz, and Turaga 2017] Khurana, U.; Samulowitz, H.; and Turaga, D

  13. [13]

    Feature Engineering for Predictive Modeling using Reinforcement Learning

    Feature engineering for predictive modeling using reinforcement learning. arXiv preprint arXiv:1709.07150. [Liu et al. 2017a] Liu, C.; Zoph, B.; Shlens, J.; Hua, W.; Li, L.-J.; Fei-Fei, L.; Yuille, A.; Huang, J.; and Murphy, K. 2017a. Progressive neural architecture search. arXiv preprint arXiv:1712.00559. [Liu et al. 2017b] Liu, H.; Simonyan, K.; Vinyals...

  14. [14]

    In International Conference on Machine Learning , 2113–2122

    Gradient-based hyperparameter optimization through reversible learn- ing. In International Conference on Machine Learning , 2113–2122. [Mountantonakis and Tzitzikas 2017] Mountantonakis, M., and Tzitzikas, Y

  15. [15]

    In International Conference on Theory and Practice of Digital Libraries, 155–168

    How linked data can aid machine learning-based tasks. In International Conference on Theory and Practice of Digital Libraries, 155–168. Springer. [Pham et al. 2018] Pham, H.; Guan, M. Y .; Zoph, B.; Le, Q. V .; and Dean, J

  16. [16]

    Efficient Neural Architecture Search via Parameter Sharing

    Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268. [Real et al. 2018] Real, E.; Aggarwal, A.; Huang, Y .; and Le, Q. V

  17. [17]

    Regularized Evolution for Image Classifier Architecture Search

    Regularized evolution for image classifier ar- chitecture search. arXiv preprint arXiv:1802.01548. [Snoek, Larochelle, and Adams 2012] Snoek, J.; Larochelle, H.; and Adams, R. P

  18. [18]

    In Advances in neural information processing systems, 2951–2959

    Practical bayesian optimiza- tion of machine learning algorithms. In Advances in neural information processing systems, 2951–2959. [Swearingen et al. ] Swearingen, T.; Drevo, W.; Cyphers, B.; Cuesta-Infante, A.; Ross, A.; and Veeramachaneni, K. Atm: A distributed, collaborative, scalable system for automated machine learning. [Swersky, Snoek, and Adams 20...

  19. [19]

    In Advances in neural information processing systems , 2004–

    Multi-task bayesian optimization. In Advances in neural information processing systems , 2004–

  20. [20]

    2013] Thornton, C.; Hutter, F.; Hoos, H

    [Thornton et al. 2013] Thornton, C.; Hutter, F.; Hoos, H. H.; and Leyton-Brown, K

  21. [21]

    In Proceedings of the 19th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining, 847–855

    Auto-weka: Combined selec- tion and hyperparameter optimization of classification algo- rithms. In Proceedings of the 19th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining, 847–855. ACM. [van den Bosch 2017] van den Bosch, S

  22. [22]

    Master’s thesis, Faculty of Science, Radboud Univer- sity 3(1):3–1

    Automatic feature generation and selection in predictive analytics solu- tions. Master’s thesis, Faculty of Science, Radboud Univer- sity 3(1):3–1. [Wauthier, Jordan, and Jojic 2013] Wauthier, F.; Jordan, M.; and Jojic, N

  23. [23]

    In International Conference on Machine Learning , 109–117

    Efficient ranking from pairwise compar- isons. In International Conference on Machine Learning , 109–117. [Zhong, Xiong, and Socher 2017] Zhong, V .; Xiong, C.; and Socher, R

  24. [24]

    Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

    Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR abs/1709.00103. [Zoph and Le 2016] Zoph, B., and Le, Q. V

  25. [25]

    Neural Architecture Search with Reinforcement Learning

    Neu- ral architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. [Zoph et al. 2017] Zoph, B.; Vasudevan, V .; Shlens, J.; and Le, Q. V

  26. [26]

    Learning Transferable Architectures for Scalable Image Recognition

    Learning transferable architectures for scal- able image recognition. arXiv preprint arXiv:1707.07012 2(6). ChicagoBicycle O: <from_station_id> [all_fil(None), count_agg(None)] G: For each <from_station_id> predict the number of records H: Predict how many trips will start from this station tomorrow. O: <from_station_id> [eq_fil(<usertype>), count_agg(None)...