Novel methods for mining and learning from data streams / submitted by Ammar Shaker. Paderborn, 2017
Contents
- Contents
- List of Figures
- List of Tables
- 1 Introduction
- 1.1 Application Example
- 1.2 Learning from Data Streams
- 1.3 Incremental, Adaptive and Evolving Learning
- 1.4 Contribution and Outline of the Thesis
- 2 Background
- 2.1 Machine Learning
- 2.2 Supervised Learning from Data Streams
- 2.3 Concept Change over Time
- 2.4 Change Detection Methods
- 2.5 Adaptive Supervised Learning: Related Work
- 3 Instance-Based Classification and Regression
- 3.1 Instance-Based Learning
- 3.2 Instance-Based versus Model-Based Learning
- 3.3 Instance-Based Learning on Data Streams
- 3.4 IBLStreams
- 3.4.1 Classification
- 3.4.2 Regression
- 3.4.3 Parameter adaptation in IBLStreams
- 3.4.4 Implementation issues
- 3.5 Experiments
- 3.5.1 IBLStreams versus other instance-based methods
- 3.5.2 Evaluating the parameter adaptation schemes
- 3.5.3 IBLStreams versus state-of-the-art model-based methods
- 3.6 Discussion and Conclusion
- 4 Evolving Fuzzy Pattern Trees
- 4.1 Introduction to Fuzzy Sets
- 4.2 Data-Driven Fuzzy Modeling
- 4.3 Fuzzy Pattern Trees
- 4.4 Evolving Fuzzy Pattern Trees
- 4.4.1 Performance Monitoring and Hypothesis Testing
- 4.4.2 Summary of the Algorithm
- 4.4.3 Refinements on the Neighbor Trees Generation
- 4.5 Empirical Evaluation
- 4.5.1 Performance Comparison
- 4.5.2 Model Size
- 4.5.3 Sensitivity Towards Significance Levels and Operators Retraining
- 4.6 Summary and Conclusion
- 5 Survival Analysis on Event Streams
- 5.1 Introduction
- 5.2 Survival Analysis
- 5.2.1 Censored data
- 5.2.2 Survival Functions
- 5.2.3 Estimating the Survival Function
- 5.2.4 Prognostic Factors for Survival
- 5.3 Survival Analysis on Data Streams
- 5.4 Case Study: Earthquake Analysis
- 5.5 Case Study: Twitter Data
- 5.6 Conclusion
- 6 Recovery Analysis for Adaptive Learning
- 6.1 Introduction
- 6.2 Learning under concept drift
- 6.3 Recovery Analysis
- 6.3.1 Main idea and experimental protocol
- 6.3.2 Bounding the optimal generalization performance
- 6.3.3 Recovery measures
- 6.3.4 Defining pure streams
- 6.3.5 Further practical issues
- 6.4 A comparison of algorithms
- 6.5 Experiments and results
- 6.5.1 Binary classification
- 6.5.2 Multiclass classification
- 6.5.3 Regression
- 6.5.4 Recovery measures
- 6.5.5 Summary of the experiments
- 6.6 Conclusion
- 7 Conclusion
- A Methods
- A.1 Adaptive Hoeffding Tree
- A.2 Adaptive Model Rules
- A.3 Fast Incremental Model Trees with Drift Detection
- A.4 FLEXible Fuzzy Inference Systems
- B MOA
- C M-Tree
- D Data Sets
- D.1 Synthetic Data Sets
- D.1.1 Hyperplane data
- D.1.2 Distance to hyperplane data
- D.1.3 Random trees data
- D.1.4 Radial basis function data
- D.1.5 SEA concept functions
- D.1.6 STAGGER concept functions
- D.2 Synthetic Data Manipulation
- D.3 Real Data Sets
- D.3.1 Cover type data
- D.3.2 Mushroom data
- D.3.3 Page blocks data
- D.3.4 Letter recognition
- D.3.5 StatLog (shuttle) data
- D.3.6 Skin segmentation data
- D.3.7 MAGIC gamma telescope data
- D.3.8 Breast cancer Wisconsin
- D.3.9 Parkinson's telemonitoring data
- D.3.10 Slice localization data
- D.3.11 Bank32h
- D.3.12 Census-house
- D.4 Event Streams
- E Incremental Statistics
- Bibliography
