Machine learning is a fast-changing field. The list of ideas is practically endless: decision trees, ensemble learning, random forests, boosting, neural networks, hidden Markov models, graphical models, kernel methods, conditional random fields, sparsity, compressed sensing, budgeted learning, multi-kernel learning, transfer learning, co-training, active learning, multitask learning, deep learning, lifelong learning and many more.
The problem is, ideas come and go, and bounce back, roughly every 10-15 years. That is long enough for a grad student to learn the tricks, make a big noise, graduate while the topic is still hot and get a good academic job, IF he is lucky enough to start early in the cycle. It is also long enough that the new batch of students is unaware of the hot things of the previous wave. How familiar is "particle filtering" to you?
Popular in the late 1990s and early 2000s, particle filtering is a fast way to generate samples of the state of a dynamic system when an observation is made.

When I started my grad training in 2004, I asked one of my supervisors what hot topic I should focus on. He said: pick either graphical models or kernel methods (which meant SVMs at the time). I picked graphical models, and was then given conditional random fields (CRFs) to work on. By the time I submitted my PhD thesis in early 2008, CRFs were largely gone. SVMs had gone a couple of years before that, just around the time neural nets bounced back under a new umbrella, deep learning, in 2006. It used to be all about convex loss functions (SVMs & CRFs); now everything is non-convex. Local minima? It doesn't matter: adaptive stochastic gradient descent methods such as AdaGrad, Adam or RMSProp will find a really good one for you.
Applying machine learning is like flying commercial aircraft
Ever wanted to apply a technique to your problem? A sure way is to employ a PhD in machine learning! Packages are available, but what is the correct way to use them, let alone the best way? Think about flying a commercial aircraft. There are hundreds of knobs to tune. There is even an autopilot mode. Yet we still need two human pilots: one to turn the right knob at the right time, and the other to make sure the correct things are being done.
Want to use deep learning? You need to decide among feedforward, recurrent and convolutional nets, and any combination of the three. Will attention be used? How about memory? Which loss function? What embedding size? Which optimizer, and with what parameters? And many, many more.
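To make the point concrete, here is a minimal sketch (in PyTorch, with all sizes and choices picked arbitrarily for illustration, not taken from any real project) of how many decisions are already baked into even a tiny model and a single training step:

```python
import torch
import torch.nn as nn

n_features, n_hidden, n_classes = 64, 256, 10   # architecture knobs, all guesses
model = nn.Sequential(                          # feedforward here; recurrent? convolutional?
    nn.Linear(n_features, n_hidden),
    nn.ReLU(),                                  # activation choice
    nn.Dropout(p=0.5),                          # regularization choice
    nn.Linear(n_hidden, n_classes),
)
loss_fn = nn.CrossEntropyLoss()                 # loss function choice
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimizer choice + its parameters

x = torch.randn(32, n_features)                 # a fake batch, just to run one step
y = torch.randint(0, n_classes, (32,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

Every line above hides a choice someone has to make, and most of the choices interact.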
I work with clinicians on clinical problems. At least two of them -- young, smart and highly motivated -- insisted on coming over to observe how I do machine learning and to learn to do it themselves. They claimed they could do Stata, R and sometimes Python. My boss was cross. This is not how collaboration should work, right? You want to learn our art for free, then trash us?
But I told the boss, let them come.
They came and left, even more puzzled. I ended up doing the job I usually did and so did they.
Machine learning in three lines
I once delivered an internal talk on deep learning. My boss requested that I talk about only three things. Just three things. This bugged me a lot. But the trick actually worked.
Here I am trying to characterize the current state of machine learning in general, and it should apply to deep learning as well. Machine learning works by:
- Having good priors of the problem at hand (80-90% gain)
- Accounting for data uncertainty and model uncertainty with ensemble and Bayesian methods (1-5% gain)
- Reusing models when data/model redundancies are available (5-10% gain)
By "good priors", I meant several things:
- Features that capture all meaningful signals in the data. Getting good features is the job of feature engineering, which usually accounts for 80-90% of the total effort in a machine learning project. Once you have good features, most modern classifiers will work just fine. Deep learning succeeds partly because it solves this problem.
- Problem structures are respected. For example, sequential data would suggest the use of Hidden Markov Models (HMMs) or chain-like Conditional Random Fields (CRFs). In deep learning, this reduces to architecture engineering!
- Knowledge about the model class. E.g., will linearity be the dominant model? What are the expected complexity and nonlinearity? Will interpretability be needed? What about transparency? Is sparsity important? For neural nets, how many layers? For SVMs, will one kernel type be enough?
- Assumptions about the data manifold. One well-studied phenomenon is the intrinsic low dimensionality of data embedded in a very high-dimensional space. This is usually manifested through data redundancies. Another assumption is the separation of classes, e.g., the region at the class boundary is usually sparse, but is very dense near the class exemplars. This assumption essentially gives rise to semi-supervised learning.
- Assumptions about the data space. How many data instances are there? Will characterizing the data variance be enough? If yes, then use PCA (a small sketch follows below). Or are factors of variation the key? If so, an RBM perhaps helps.
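The low-dimensionality prior is easy to see on synthetic data. Here is a small sketch (NumPy and scikit-learn; the data and dimensions are made up for illustration) of checking whether most of the variance lives in a few directions:

```python
# If the data really lives near a low-dimensional subspace, PCA captures
# most of the variance with only a few components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))            # 500 points with 2 true degrees of freedom
mixing = rng.normal(size=(2, 50))             # embedded in a 50-dimensional space
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))   # plus a little noise

pca = PCA(n_components=10).fit(X)
print(pca.explained_variance_ratio_.round(3))  # the first two components dominate
```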
Even with a good prior, we can never be sure that our choices are correct. Model uncertainty is there and must be accounted for. A popular way is to use many (diverse) models, then employ model averaging, ensemble methods or a Bayesian approach. Deep learning has dropout, one of the best tricks invented in the past 10 years. It works like a wonder. Bayesian neural nets, which were studied in the mid-1990s, are also back!
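As one illustration of the uncertainty angle, here is a tiny sketch (PyTorch; the network and numbers are hypothetical, and "MC dropout" is one common way people use dropout for uncertainty, not necessarily what the paragraph above had in mind) of keeping dropout active at prediction time and reading the spread of repeated passes as rough uncertainty:

```python
# MC dropout sketch: keep dropout on at prediction time and look at the spread
# of many stochastic forward passes.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 1))
model.train()                                   # keeps dropout active, unlike model.eval()
x = torch.randn(1, 10)                          # a single made-up input
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])   # 100 stochastic passes
print(preds.mean().item(), preds.std().item())  # mean prediction and its spread
```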
In fact, every modern challenge seems to be won by some ensemble -- at the time of this writing, mostly gradient boosting combined with model blending. One of the best-known examples is the Netflix challenge, which was won by blending hundreds of models -- so complex that Netflix found it useless to implement in practice.
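A minimal sketch of the blending idea (scikit-learn on a synthetic dataset; the choice of base models is arbitrary) is to average the predictions of a few diverse models instead of betting everything on one:

```python
# Soft voting over diverse base models: a simple form of model blending.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

blend = VotingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",   # average predicted probabilities across the base models
)
for name, model in [("single GBM", GradientBoostingClassifier(random_state=0)), ("blend", blend)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```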
Many are easier than one
I often tell my 5-year-old daughter: do one thing at a time. But by listening to me AND playing at the same time, she is already multi-tasking. Humans seem to learn better that way. We learn by making sense of many co-occurring feedback signals.
A key idea in recent machine learning is model reuse. It has many forms:
- Domain adaptation, which is about reusing a previous model on similar domains with minimal changes.
- Transfer learning, which is about reusing models on similar tasks with minimal changes. In neural nets, this amounts to not forgetting the old trained nets when learning new concepts (see the sketch after this list).
- Multi-task learning, which is about learning more than one correlated task at a time. The idea is that models can be partly shared among tasks, leading to less training data needed and less overfitting.
- Lifelong learning, which is like a continual version of transfer learning. Just like humans, we learn to do new things from birth to death, every single day! Popularized by Sebastian Thrun in the mid-1990s, lifelong learning is now back in various forms: never-ending learning at CMU, and reinforcement learning in robotics and games at various labs.
- Multi-X, where X is substituted by view, modality, instance, label, output/outcome, target, type and so on.
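Here is a minimal sketch of the transfer learning form (PyTorch and torchvision, assuming a recent torchvision where the weights argument exists; ResNet-18 and the 5-class new task are illustrative assumptions): keep a pretrained feature extractor and retrain only a small head for the new task.

```python
# Transfer learning sketch: reuse a pretrained backbone, train only a new head.
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet
for param in backbone.parameters():
    param.requires_grad = False                     # freeze the reused knowledge

n_new_classes = 5                                   # hypothetical new task
backbone.fc = nn.Linear(backbone.fc.in_features, n_new_classes)   # new, trainable head
# Training now only updates backbone.fc, so the new task reuses the old
# representation instead of learning everything from scratch.
```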
Comment: This post is awesome. 90% of my working time is ensuring data quality, data understanding and feature selection.
Reply: So true. Data quality is the bottleneck of everything, but also the least appreciated.