Getting Started with Machine Learning
What is Machine Learning?
Machine learning is one of the most fascinating technologies in use today. It is the field of study that gives computers the ability to learn without being explicitly programmed.
Difference between Classical Programming and Machine Learning Programming
Classical Programming: DATA (input) + RULES (logic) are fed into the machine, which applies the rules to the data to produce the output (answers).
Machine Learning Programming: During training, we feed the machine DATA (input) + ANSWERS (output), and the machine develops its own RULES (logic), which can then be assessed during testing.
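The contrast above can be sketched in a few lines of Python. This is only an illustration under assumptions: the temperature-conversion task, the sample values, and the function names `classical_convert` and `learn_rules` are made up for this example, and simple least-squares line fitting stands in for "training".

```python
# Classical programming: a person supplies the RULES.
def classical_convert(celsius):
    # Hand-written rule: the known Celsius-to-Fahrenheit formula.
    return celsius * 1.8 + 32


# Machine learning: we supply DATA and ANSWERS, and the rules are estimated.
def learn_rules(inputs, answers):
    """Fit answers ~ slope * input + intercept by ordinary least squares."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(answers) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, answers))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative training data: temperatures in Celsius and in Fahrenheit.
celsius = [-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0]
fahrenheit = [-40.0, 14.0, 32.0, 46.4, 59.0, 71.6, 100.4]

slope, intercept = learn_rules(celsius, fahrenheit)
print("learned rule: F =", round(slope, 3), "* C +", round(intercept, 3))
```

The learned slope and intercept end up very close to the 1.8 and 32 of the hand-written rule, even though the formula was never given to `learn_rules`.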
Examples:
- Scientists and researchers have built models that teach computers to recognize cancer simply by looking at photos of cell slides. Reviewing the slides by hand would take people a long time, whereas a trained model can indicate, with reasonable accuracy, whether a sample shows signs of cancer.
- Gmail uses text classification, a machine learning technique, to sort emails into categories such as social, promotions, updates, and forums.
- Google Lens uses machine learning to extract text from the images you give it.
What is the Process of Machine Learning?
Gathering Data: Collect past data in any format that can be processed. The higher the quality of the data, the better suited it is for modelling.
Data Processing: The gathered data is often in raw form and needs to be pre-processed. For machine learning or any other kind of data mining, a tuple with missing values for one or more attributes must have them filled in with suitable values. Missing values for a numerical attribute, such as the price of a house, may be replaced with the mean of that attribute, while missing values for a categorical attribute may be replaced with its mode (the most frequent value). The right choice depends on the kind of filter we employ. Data in the form of text or images must also be converted into a numerical structure such as a list, array, or matrix. Simply put, the data must be made consistent and meaningful, and transformed into a format the machine can understand.
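As a rough illustration of the missing-value step described above, here is a minimal sketch using only Python's standard library. The `houses` records and the helper `fill_missing` are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean, mode

# Hypothetical housing records; None marks a missing value.
houses = [
    {"price": 200000, "type": "flat"},
    {"price": None, "type": "house"},
    {"price": 300000, "type": None},
    {"price": 250000, "type": "flat"},
]


def fill_missing(rows, column, strategy):
    """Return a copy of rows where missing values in `column` are replaced
    by strategy(known values), e.g. the mean or the mode."""
    known = [row[column] for row in rows if row[column] is not None]
    fill_value = strategy(known)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


cleaned = fill_missing(houses, "price", mean)  # numerical attribute: mean
cleaned = fill_missing(cleaned, "type", mode)  # categorical attribute: mode

for row in cleaned:
    print(row)
```

In practice, libraries such as Pandas and Scikit-learn provide their own tools for this step; the version above only shows the idea.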
Splitting of Data: Divide the supplied data into training, cross-validation, and test sets. A commonly used ratio for the three sets is 6:2:2, although other ratios are also used.
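A 6:2:2 split can be sketched as follows. The function name `split_dataset`, the fixed seed, and the placeholder data are assumptions for illustration; shuffling before splitting is a common practice rather than something the ratio itself requires.

```python
import random


def split_dataset(rows, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, validation, test


data = list(range(100))  # placeholder for 100 examples
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))
```

Every example lands in exactly one of the three sets, so the test set stays unseen during training.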
Choosing a Suitable Algorithm and Testing: Build a model on the training set using suitable algorithms and techniques. Then test the model on data it did not see during training, and assess it with measures such as precision, recall, and F1 score.
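To make the three measures concrete, here is a small sketch that computes them for a binary classifier. The label lists are invented for the example, and returning 0.0 when a measure is undefined (no positive predictions, for instance) is a convention assumed here.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall, and F1 score for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much did we find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)
```

Here the model makes one false positive and one false negative against three true positives, so all three measures come out at 0.75.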
Data is the most crucial component of data analytics, machine learning, and artificial intelligence. Without data we cannot train any model, and the automation and research that depend on those models would be ineffective. Large corporations spend a lot of money just to obtain as much accurate data as they can.
What is Data?
Data can be any unprocessed fact, value, text, sound, or picture that has not yet been interpreted or analyzed.
Information is data that has been interpreted and transformed so that it yields inferences that are useful to users. Knowledge is the understandable combination of information, experiences, and insights; it is what produces awareness or concept development for a person or an organization.
Types of Data
- Numerical data: Any quantifiable information, such as your height, weight, or monthly phone bill, is numerical (or quantitative) data. A quick check is whether it makes sense to average the values or sort them in ascending or descending order. Discrete values are exact counts, like "26 people in a class," while continuous values can fall anywhere within a range, like "a 3.6 percent interest rate." Keep in mind that numerical data is just raw numbers.
- Categorical data: Categorical data is sorted according to its defining characteristics, such as profession, hometown, socioeconomic class, ethnicity, gender, or any number of other labels. This data type is non-numerical, so you cannot add or average it, and most categories have no meaningful order. Categorical data is well suited to grouping people or concepts that have a lot in common, which simplifies data analysis for your machine learning model.
- Time series data: Time series data consists of data points indexed at specific points in time, usually gathered at regular intervals. Once you know how to work with it, comparing data from week to week, month to month, year to year, or over any other time-based interval is straightforward. The difference from plain numerical data is that numerical data is not tied to any time period, whereas each value in a time series belongs to a specific time, so the series has a clear start and end.
- Text data: Text data is simply words, sentences, or paragraphs that can give your machine learning models some insight. Because raw words can be hard for models to interpret on their own, they are often grouped together or analyzed with techniques such as word frequency, text classification, or sentiment analysis.
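Word frequency, one of the text techniques mentioned above, can be sketched with the standard library alone. The sample sentence, the helper name `word_frequencies`, and the very simple tokenization rule (lowercase letters and apostrophes only) are assumptions for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


review = "The phone is great. The battery is great, the screen is okay."
counts = word_frequencies(review)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Counts like these turn raw text into numbers, which is the kind of numerical form a model can work with.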
Dividing Data in Machine Learning
- Training data: The portion of the data used to train our model. The model actually sees this data (both input and output) and learns from it.
- Validation data: The portion of the data used for frequent evaluation of the model while it is being trained. The model is fitted on the training data, evaluated on the validation data, and the results are used to tune its hyperparameters (parameters set before the model begins learning). This data plays its part while the model is still being trained.
- Testing data: Once the model has been fully trained, testing data provides an unbiased evaluation. We feed in the inputs from the testing data and the model predicts values without seeing the actual outputs. We then compare the predictions with the actual outputs in the testing data to assess the model's performance. In this way we measure how much the model has learned from the training data.
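The roles of the three sets can be illustrated with a toy example. Everything here is an assumption made for the sketch: the synthetic one-dimensional data, the simple k-nearest-neighbours classifier, and the candidate values of the hyperparameter `k`. The point is only the workflow: fit on training data, choose the hyperparameter on validation data, and report performance once on testing data.

```python
import random


def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, dataset, k):
    """Fraction of examples in dataset that are predicted correctly."""
    correct = sum(1 for x, label in dataset if knn_predict(train, x, k) == label)
    return correct / len(dataset)


# Synthetic data: the label is 1 when x > 5, with about 10% of labels flipped.
rng = random.Random(0)
points = []
for _ in range(200):
    x = rng.uniform(0, 10)
    label = 1 if x > 5 else 0
    if rng.random() < 0.1:
        label = 1 - label
    points.append((x, label))

# 6:2:2 split (the points are already in random order).
train, validation, test = points[:120], points[120:160], points[160:]

# Validation data: used to choose the hyperparameter k (odd values avoid ties).
candidate_ks = [1, 3, 5, 7]
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# Testing data: used once, at the end, for the final assessment.
test_accuracy = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", test_accuracy)
```

Because `k` was chosen without looking at the test set, the final accuracy is a fairer estimate of how the model behaves on new data.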
Python Libraries for Machine Learning
- NumPy
- SciPy
- Scikit-learn
- Theano
- TensorFlow
- Keras
- PyTorch
- Pandas
- Matplotlib
Next, we will cover the classifications of machine learning and other topics. Have a great day!