
adeshpande3/Machine-Learning-Links-And-Lessons-Learned: List of all the lessons ...


Open-source project name:

adeshpande3/Machine-Learning-Links-And-Lessons-Learned

Open-source project URL:

https://github.com/adeshpande3/Machine-Learning-Links-And-Lessons-Learned

Project introduction:

Machine Learning Links and Lessons Learned

List of all the lessons learned, best practices, and links from my time studying machine learning.

Learning Machine Learning

"How do you get started with machine learning?". With AI and ML becoming such huge words in the tech industry, it's hard to go a full week without hearing something along these lines on online forums, in discussions with other students at UCLA, and even from fellow pre-meds and humanities majors. From my own experience of getting familiar with ML and from my experiences of teaching others through ACM AI, here's my best response to that question.

  1. Before getting started with any code or any technical terms, I think the best first step is to gain a big picture understanding of what machine learning is, and what it is attempting to do. When first teaching other students about ML, I've found that it's very important to give them a general understanding of the field before starting to dive into terms like gradient descent and loss function. Machine learning, as stated by Wikipedia, is a field of computer science that gives computers the ability to learn without being explicitly programmed. I might also add that machine learning is a subfield of AI, and that it is a unique approach to creating intelligent systems by making use of training data and optimization. I'd recommend the following high level links and videos to get you comfortable with the field as a whole.

    • Machine Learning Introduction: Loved this video because it starts with great definitions and introduces you to important terminology. Feel free to stop at 6:47.
    • What is Machine Learning?: Explains the 3 different subareas of machine learning: supervised learning, unsupervised learning, reinforcement learning
    • A Friendly Intro to Machine Learning: Great video with some cool illustrations, but honestly I think just watching until 5:54 is sufficient.
    • Basic Machine Learning Algorithms Overview: Don't worry about knowing exactly what every one of these terms means. Just get a sense for the different algorithms and the tasks they are used for. We'll go into way more detail later on.
    • Machine Learning from Zero to Hero: Get motivated to learn Machine Learning. Sometimes simply sitting back and watching the right videos can ignite and fan the fire to get active in ML. If you know software, this is a great starting point. Don't worry about knowing every single detail in each of the videos, but rather think about the high level goal of ML.
  2. Okay cool, so now you should have a general idea of the goal of machine learning. We want to be able to create a system that is able to perform some task and we want to train this system using a dataset that we have. Let's now jump into some of the models used in machine learning. An approach that has helped me is learning about these models one at a time, while also constantly thinking about the similarities and differences between them. For each model, I think going through the following process would be helpful:

    • Understand the high level approach that the model is taking. What type of task is the model trying to solve, and how is it going about solving it?
    • Start to go into the specifics of what goes into the model. Think about what function the model is trying to compute. What is the loss function that is commonly used? Does this model use gradient descent for its optimization procedure?
    • Think about an example where this model could be used. Is the model used for classification tasks or for regression tasks? In the particular example that you thought of, what do the inputs and outputs to the model look like? What type of dataset would you need to train this algorithm?
    • Transition to more practical exercises. If you're comfortable with coding, try to reconstruct the model in code. Think about how you would code the function, compute the loss, the gradients, etc. If that's a bit too intimidating right now (don't worry, it was initially for me too!), then just try to write down some pseudo-code and then look online to see how other people did it!
    • Finally, I think the coolest part of learning any machine learning algorithm is getting to see it in practice. The final step would be to do some sort of project or experiment where you run either the algorithm you coded up in the previous step or one imported from Scikit-Learn on a particular problem of your choice.
    • To make sure that you've really understood the model, it's helpful to just quiz yourself. Take a piece of paper, and (from memory) write down a summary of the model as if you were explaining it to a 5 year old. Then, write a technical summary that talks about what function is being computed, how the training procedure occurs, what the loss function is, etc. See if you can explain it to a friend. Think about a new problem space this model could be used for. Before jumping into the next model to learn, really ask yourself if you've firmly understood the concepts (and be honest with yourself here!) and then proceed accordingly. If the answer is still no, YouTube and Google are your friends :) With the scale of content that we have with this field, it won't be hard to find a video or tutorial that has the answers to your questions.

    Okay, that was definitely a lot of info. The tl;dr is that learning about machine learning models has a lot of steps involved. First, there's understanding the high level perspective, then there's identifying the unique characteristics of the model, then it's testing yourself to see if you can code it up, and finally it's using the model in a practical setting. Don't worry if this takes you a week or even a month with something like linear regression. Going through this process slowly and making sure that you're understanding each step is critical for retaining the information.
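As a rough illustration of the "code the function, compute the loss, the gradients" step, here is a minimal sketch of one-feature linear regression trained with batch gradient descent. It uses only the Python standard library. The toy dataset, the learning rate, the step count, and all function names are assumptions made for this example, not something prescribed above.

```python
# Minimal sketch: one-feature linear regression trained with batch gradient descent.
# The toy data and hyperparameters below are illustrative assumptions.

def predict(w, b, x):
    """The function the model computes: a line y = w * x + b."""
    return w * x + b

def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions and targets."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b, xs, ys):
    """Partial derivatives of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db

def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Start from zero parameters and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Toy dataset generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
print("w =", round(w, 3), "b =", round(b, 3), "loss =", mse_loss(w, b, xs, ys))
```

Once a version like this works, comparing it against a library implementation on the same data is a good way to check your understanding.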

For steps #3 - 7, repeat the process I talked about above with each of the following ML models. I have listed a couple of links per model. You don't have to watch all of them per se, I just wanted to include as many good resources as I could find. If you need more material or you still don't understand a topic, do a simple YouTube or Google search! You'll be surprised with how much you find.

  1. Linear Regression:

  2. Logistic Regression:

  3. K Nearest Neighbors:

  4. K-Means:

  5. Decision Trees:

Optional, but Worth Your Time: Random Forest, SVMs, Naive Bayes, Gradient Boosted Methods, PCA

So now that we have a decent understanding of some ML models, I think that we can transition into deep learning.

  1. Neural Networks: If someone wants to get started with deep learning, I think that the best approach is to first get familiar with machine learning (which you all will have done by this point) and then start with neural networks. Following the same high level understanding -> model specifics -> code -> practical example approach would be great here as well.

  2. Convolutional Neural Networks: A convolutional neural network is a special type of neural network that has been successfully used for image processing tasks.

  3. Recurrent Neural Networks: A recurrent neural network is a special type of neural network that has been successfully used for natural language processing tasks.

  4. Reinforcement Learning: While the 3 prior ML methods aren't necessarily important for understanding RL, a lot of recent progress in this field has combined elements from the deep learning camp as well as from the traditional reinforcement learning field.

Awesome, so now you should have a decent understanding of where ML and DL are in today's day and age. The world is your playground now. Read research papers, try Kaggle contests, watch ML tech talks, build cool projects, talk to others interested in ML, never stop learning, and most importantly, have fun! :) This is a great time to get into ML, and in the rush to gain knowledge as quickly as possible, it's often good to just slow down and think about the types of amazing applications and positive change we can create in this world with this technology.

Best Courses

Most Important Deep Learning Papers

Not a comprehensive list by any sense. I just thought these papers were incredibly influential in getting deep learning to where it is today. I also add a couple important general ML papers from time to time.

Cool Use Cases of ML

ML Tech Talks

Best Blogs

Data and Features

  • Normalizing inputs isn’t guaranteed to help.
    • A lot of people say you should always normalize, but TBH in practice, it doesn’t help for me all the time. This advice is very very dataset dependent, and it’s up to you to figure out if normalizing will be helpful for your particular task.
      • It's also model dependent. Normalization will likely help for linear models like linear/logistic reg, but won't be as helpful for deep learning models and neural networks.
    • In order to determine whether it will help or not, visualize your data to see what type of features you're dealing with. Do you have some features that have values around 1000 or 2000? Do you have some that are only 0.1 or 0.2? If so, normalizing those inputs is likely a good idea.
  • The process in which you convert categorical features into numerical ones is very important.
    • You have to decide whether you want to use dummy variables, use bins, etc.
  • Many different methods of normalization
    • Min-max normalization where you do (x - a)/(b-a) where x is the data point, a is the min value and b is the max value.
    • Max normalization where you do x/b where b is the max value.
    • L1 normalization where you do x/c where c is the summation of all the values.
    • L2 normalization where you do x/c where c is the square root of the summation of all the squared values.
    • Z score normalization where you do (x - d)/e where d is the mean and e is the standard deviation.
  • Something interesting you can do if you either have little data or missing data is that you can use a neural network to predict those values based on the data that you currently have.
    • I believe matrix imputation is the correct term.
  • Data augmentation comes in many different forms. Noise injection is the most general form. While working with images, the number of options increases a whole lot.
    • Translations
    • Rotations
    • Brightness
    • Shears
  • Whenever thinking about whether or not to even apply machine learning to your problem, always consider the type of data that you have to solve the problem. Let's consider a model where you want to predict whether someone will develop cancer in the next few months based on their health data. The data is your input into the model and the output is a binary result. Seems like a good place where we can apply machine learning right, given that we have a number of patients' data and cancer results? However, we should also be thinking about whether the health data that we have about the individual is enough to be able to make some conclusion about the patient. Does the health data accurately encompass the different amounts of variation there are from person to person? Are we accurately representing each input example with the most comprehensive amount of information? Because if the data doesn't have enough information/features, your model will risk just overfitting to noise or the training set, and won't really learn anything useful at all.
  • Never underestimate the power of simply increasing the size of your training dataset when you're attempting to improve your model accuracy. If you have $100 for your machine learning project, spend like 90 of those dollars on good data collection and preprocessing.
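The normalization methods listed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than a reference implementation: the function names and the sample feature values are assumptions, the L1 version sums absolute values (which equals the plain sum when every value is non-negative), and the z-score version uses the population standard deviation.

```python
import math
import statistics

def min_max(values):
    """(x - a) / (b - a), where a is the min and b is the max."""
    a, b = min(values), max(values)
    return [(x - a) / (b - a) for x in values]

def max_norm(values):
    """x / b, where b is the max value."""
    b = max(values)
    return [x / b for x in values]

def l1_norm(values):
    """x / c, where c is the sum of absolute values."""
    c = sum(abs(x) for x in values)
    return [x / c for x in values]

def l2_norm(values):
    """x / c, where c is the square root of the sum of squared values."""
    c = math.sqrt(sum(x * x for x in values))
    return [x / c for x in values]

def z_score(values):
    """(x - d) / e, where d is the mean and e is the standard deviation."""
    d = statistics.mean(values)
    e = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(x - d) / e for x in values]

# A feature on a large scale, like the "values around 1000 or 2000" case above.
feature = [1000.0, 1500.0, 2000.0]

for normalize in (min_max, max_norm, l1_norm, l2_norm, z_score):
    print(normalize.__name__, [round(v, 4) for v in normalize(feature)])
```

Note that every function here divides by a quantity that is zero for a constant (or all-zero) feature, so real code would need to guard against that case.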

Models

  • For regression problems where the goal is to output some real valued number, linear regression should always be your first choice. It gets great accuracy on a lot of datasets. Obviously, this is very dependent on the type of data you have, but you should always first try linreg, and then move on to other more complex models.
  • XGBoost has been one of the best models I've seen for a lot of regression or classification tasks.
    • Averaging multiple runs of XGBoost with different seeds helps to reduce model variance.
    • Model hyperparameter tuning is important for XGBoost
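  • Whenever your features are highly correlated with each other, the coefficient estimates from OLS will have a large variance, and therefore you should use ridge regression instead.
  • KNN does poorly with high dimensional data because distances between points become less informative as the number of dimensions grows (the curse of dimensionality). The distance metrics also get messed up when features are on very different scales.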
  • One of the themes I hear over and over again is that the effective capacity of your model should match the complexity of your task.
    • For simple house price prediction, a linear regression model might do.
    • For object segmentation, a large and specialized CNN will be a necessity.
  • Use RNNs for time series data and natural language data, since both contain dependencies between different steps in the input.
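To make the OLS versus ridge point above concrete, here is a small standard-library sketch with two almost identical features and no intercept. It solves the normal equations (XᵀX + penalty · I) w = Xᵀy by hand for the 2x2 case. The data, the penalty value of 1.0, and the function names are all assumptions chosen for illustration.

```python
import math

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

def fit_linear(rows, targets, ridge_penalty=0.0):
    """Two-feature least squares without intercept; penalty 0.0 gives plain OLS."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    t1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(rows, targets))
    return solve_2x2(s11 + ridge_penalty, s12, s12, s22 + ridge_penalty, t1, t2)

def shift(w_a, w_b):
    """Euclidean distance between two coefficient vectors."""
    return math.hypot(w_a[0] - w_b[0], w_a[1] - w_b[1])

# Two highly correlated features: the second is nearly a copy of the first.
rows = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
# Two target vectors that differ only by a small amount of "noise".
targets_a = [1.1, 1.9, 3.2, 3.9]
targets_b = [1.0, 2.0, 3.1, 4.0]

ols_a, ols_b = fit_linear(rows, targets_a), fit_linear(rows, targets_b)
ridge_a = fit_linear(rows, targets_a, ridge_penalty=1.0)
ridge_b = fit_linear(rows, targets_b, ridge_penalty=1.0)

ols_shift = shift(ols_a, ols_b)
ridge_shift = shift(ridge_a, ridge_b)
print("OLS coefficients move by  ", round(ols_shift, 3))
print("Ridge coefficients move by", round(ridge_shift, 3))
```

The tiny change in the targets moves the OLS coefficients far more than the ridge coefficients, which is the variance problem in miniature.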

Hyperparameters

  • Optimizing hyperparameters can be a bit of a pain and can be really computationally expensive, but sometimes it can make anywhere from a 1 - 5 percent improvement. In some use cases, that can be the difference between a good product and a great product.
  • Not saying that everyone should invest in a Titan X GPU to do large grid searches for optimizing hyperparameters, but it's very important to develop a structured way for how you're going to go about it.
    • The easiest and simplest way is to just write a nested for loop that goes through different values, and keeps track of the models that perform the best.
    • You can also use specialized libraries and functions that will make the code look a little nicer.
      • Scikit-Learn has a good one
  • SVM’s
  • The Deep Learning book says that the learning rate is the most important hyperparameter in most deep networks, and I do have to agree. A low LR makes training unbearably slow (and could get stuck in some minima), while a high LR can make training extremely unstable.
  • Hyperopt: A Python library for hyperparameter optimization
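The "nested for loop" approach mentioned above might look roughly like the sketch below. The `validation_score` function is a hypothetical stand-in for "train a model with these settings and return its validation accuracy"; the candidate values and the shape of the score function are assumptions made up for the example.

```python
def validation_score(learning_rate, dropout):
    """Hypothetical stand-in for training a model and measuring validation accuracy.

    In a real project this would build, train, and evaluate a model.
    """
    return 0.9 - 10 * (learning_rate - 0.01) ** 2 - (dropout - 0.5) ** 2

learning_rates = [0.001, 0.01, 0.1]
dropouts = [0.2, 0.5, 0.8]

# Try every combination and keep track of how each one performed.
results = []
for learning_rate in learning_rates:
    for dropout in dropouts:
        score = validation_score(learning_rate, dropout)
        results.append((score, learning_rate, dropout))

# Best-scoring configuration first.
results.sort(reverse=True)
best_score, best_learning_rate, best_dropout = results[0]
print("best:", best_learning_rate, best_dropout, round(best_score, 4))
```

Keeping the full `results` list, rather than only the best entry, makes it easier to see how sensitive the model is to each hyperparameter.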

Tensorflow/Deep Nets

  • Always make sure your model is able to overfit to a couple pieces of the training data. Once you know that the network is learning correctly and is able to overfit, then you can work on building a more general model.
  • Evaluation of your Tensorflow graph only happens at runtime. Your job is to first define the variables and the placeholders and the operations (which together make up the graph) and then you can execute this graph in a session (execution environment), where you feed in values into the placeholders.
  • Regularization is huuuuugely important with deep nets.
    • Weight Decay
    • Dropout
      • Seriously. Use it. It’s sooo simple, but it’s so good at helping to prevent overfitting and thus gives your test accuracy a nice boost most of the time.
    • Data Augmentation
    • Early Stopping
      • Really effective. Basically, just stop training when the validation error stops decreasing. Because if you train for any longer, you’re just going to overfit to the training dataset more, and that’s not going to help you with that test set accuracy. And honestly, this method of regularization is just really easy to implement.
    • Batch normalization (Main purpose is not necessarily to regularize, but there's a slight regularization effect)
      • In my experience, batch norm is very helpful with deep nets. The intuition is that it helps reduce the internal covariate shift inside the network. In simpler terms, let's think about the activations in layer 3 of the network. These activations are a function of the weights of the previous two layers and the inputs. The job of the 3rd layer weights is to take those activations and apply another transformation such that the resulting output is close to the target value. Now, because the activations are a function of the weight matrices of the previous two layers (which are changing a lot b/c of gradient descent), the distribution of those activations can vary a lot as the network is training. Intuitively, this makes it very difficult for the weights of the 3rd layer to figure out how to properly set themselves. Batch norm helps alleviate that problem by making sure to keep the mean and variance of the activations the same (the exact values for mean and variance aren't necessarily 0 and 1, but are rather a function of 2 learnable parameters beta and gamma). This results in the activation distributions changing a lot less, meaning that the weights of the 3rd layer are easier to adjust to the optimal values, which in turn results in faster training times.
      • Great blog post by Rohan Varma
    • Label smoothing
    • Model averaging
    • Regularizers are thought to help the generalization ability of networks, but are not the main reason that deep nets generalize so well. The idea is that you don’t use all of the above, but if you are having overfitting issues, you should use some of these to combat those issues.
  • That being said, model architecture and the structure of your layers is sometimes thought to be more important.
  • Always always always do controlled experiments. By this, I mean only change one variable at a time when you’re trying to tune your model. There is an incredible number of hyperparameters and design decisions that you can change. Here are a few examples.
    • Network architecture
      • Number of convolutional layers in a CNN
      • Number of LSTM units in an RNN
    • Choice of optimizer
    • Learning rate
    • Regularization
    • Data augmentation
    • Data preprocessing
    • Batch size

    If you try to change too many of the above variables at once, you’ll lose track of which changes really had the most impact. Once you make changes to the variables, always keep track of what impact that had on the overall accuracy or whatever other metric you’re using.
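The early stopping idea from the regularization list above can be sketched without any deep learning library. In this sketch the per-epoch validation losses are a made-up list standing in for a real training loop, and the `patience` parameter (how many epochs without improvement to tolerate before stopping) is an assumption that is common in practice but not described above.

```python
def early_stopping(validation_losses, patience=2):
    """Return (best_epoch, best_loss, stopped_epoch) for a sequence of losses.

    Training "stops" once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_loss, epoch
    # Ran out of epochs before the patience limit was reached.
    return best_epoch, best_loss, len(validation_losses) - 1

# Made-up validation losses: improving at first, then rising as overfitting sets in.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, best_loss, stopped_epoch = early_stopping(losses, patience=2)
print("best epoch:", best_epoch, "best loss:", best_loss, "stopped at:", stopped_epoch)
```

In a real training loop you would also save the model weights at the best epoch so that you can restore them after stopping.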

Deep Learning Frameworks

  • Keras - My friend and I have a joke where we say that you’ll have a greater number of lines in your code just doing Keras imports, compared to the actual code because the functions are so incredibly high level. Like seriously. You could load a pretrained network and finetune it on your own task in like 6 lines. It’s incredible. This is definitely the framework I use for hackathons and when I’m in a time crunch, but I think if you really really want to learn ML and DL, relying on Keras’s nice API might not be the best call.
  • Tensorflow - This is my go-to deep learning library nowadays. Honestly I think it has the steepest learning curve because it takes quite a while to get comfortable with the ideas of Tensorflow variables, placeholders, and building/executing graphs. One of the big plus sides to Tensorflow is the amount of Github and Stackoverflow help you can get. You can find the answer to almost any error you get in Tensorflow because someone has likely run into it before. I think that's hugely helpful.
  • Torch - 2015 was definitely the year of Torch, but unless you really want to learn Lua, PyTorch is probably the way to go now. However, there’s a lot of good documentation and tutorials associated with Torch, so that’s a good upside.
  • PyTorch - My other friend and I have this joke where we say that if you’re running into a bug in PyTorch, you could probably read the entirety of PyTorch’s documentation in less than 2 hours and you still wouldn’t find your answer LOL. But honestly, so many AI researchers have been raving about it, so it’s definitely worth giving it a shot even though it’s still pretty young. I think Tensorflow and PyTorch will be the 2 frameworks that will start to take over the DL framework space.
  • Caffe and Caffe2 - Never played around with Caffe, but this was one of the first deep learning libraries out there. Caffe2 is notable because it's the production framework that Facebook uses to serve its models. According to Soumith Chintala, researchers at Facebook will try out new models and research ideas using PyTorch and will deploy using Caffe2.

CNNs

  • Use CNNs for any image related task. It's really hard for me to think of any image processing task that hasn't been absolutely revolutionized by CNNs.
    • That being said, you might not want to use CNNs if you have latency, power, computation, or memory constraints. In a lot of different areas (IoT for example), some of the downsides of using a CNN flare up.
  • Transfer learning is harder than it looks. I found this out from firsthand experience. During a hackathon, my friend and I wanted to determine whether someone has bad posture or not (from a picture), so we spent quite a bit of time creating a 500 image dataset. We used a pretrained model, chopped off the last layer, and retrained it. The network was able to learn and got a decent accuracy on the training set, but the validation and testing accuracies weren’t up to par, signaling that overfitting might be a problem (due to our small dataset). Moral of the story is don’t think that transfer learning will come and save your image processing task. Take a good amount of time to create a solid dataset, and understand what type of model you’ll need, and what kind of modifications you’ll need to make to the pretrained network.
    • Transfer learning is also interesting because there are two different ways that you can go with it. You can use transfer learning for finetuning. This is where you take a pretrained CNN (trained on ImageNet), chop off the last fully connected layer, add an FC layer specific to your task, and then retrain on your dataset. However, there's also another interesting approach called transfer learning for feature extraction. This is where you take a pretrained CNN, pass your data through the CNN, get the output activations from the last convolutional layer, and then use that feature representation as your data for training a simpler model like an SVM or linear regression.

NLP

  • Use pretrained word embeddings whenever you can. From my experience, it’s just a lot less hassle, and quite frankly I’m not sure if you even get a performance improvement if you try to train them jointly with whatever other main task you want to solve. It’s task-dependent I guess. For something simple like sentiment analysis though, pretrained word vectors worked perfectly.

Deep Reinforcement Learning

Some cool articles and blog posts on RL.

ML Project Advice

  • Create your machine learning pipeline first and then start to worry about how you can tune and make your model better later.
    • For example, if you’re creating an image classification model, make sure you’re able to load data into your program, create test/train matrices, create a very simple model (using one of the DL libraries), create your training loop, and make sure the network is learning something and that you can get to some baseline accuracy. Only once you’ve done these steps can you start to worry about things like regularization, data augmentation, etc.
    • Too many times you can get over complicated with the model and hyperparameters when your issues may lie just in the way you’re loading in data or creating your training batches. Be sure to get those simple parts of the machine learning pipeline down.
    • Another benefit to making sure you have a minimal but working end to end pipeline is that you’re able to track performance metrics as you start to change your model and tune your hyperparameters.
  • If you’re trying to create some sort of end product, 80% of your ML project will honestly be just doing front-end and back-end work. And even within that 20% of ML work, a lot of it will probably be dataset creation or preprocessing.
  • Always divide your data into train and validation (and test if you want) sets. Checking performance on your validation set at certain points during training will help you determine whether the network is learning and when overfitting starts to happen.
  • Always shuffle your data when you're creating training batches.
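The last two points (splitting into train and validation sets, and shuffling when creating batches) can be sketched with the standard library alone. The function names, the 20% validation fraction, the batch size, and the integer "examples" are assumptions for illustration; a real pipeline would hold (input, label) pairs.

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the data once, then carve off a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def shuffled_batches(examples, batch_size, rng):
    """Yield batches in a fresh random order; call once per epoch."""
    order = list(examples)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

# Stand-in dataset: ten example ids.
dataset = list(range(10))
train, validation = train_validation_split(dataset)

rng = random.Random(42)
for epoch in range(2):
    batches = list(shuffled_batches(train, batch_size=3, rng=rng))
    print("epoch", epoch, "batches:", batches)
print("validation:", validation)
```

Fixing the seed for the split keeps the validation set the same across runs, so metrics stay comparable as you change the model.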

Math Resources

Bias in Machine Learning

Kaggle

  • Bit of a love/hate relationship with Kaggle. I think it's great for beginners in machine learning who are looking to get more practical experience. I can't tell you how much it helped me to go through the process of loading in data with Pandas, creating a model with Tensorflow/Scikit-Learn, training the model, and fine tuning to get good performance. Seeing how your model stacks up against the competition afterwards is a cool feeling as well.
  • The part of Kaggle that I don't really enjoy is how much feature engineering is required to really get into the top 10-15% of the leaderboard. You have to be really committed to data visualization and hyperparameter tuning.
