<h1>A Tutorial on using BERT for Text Classification with Fine Tuning</h1>
<div class="block-paragraph"><div class="rich-text"><p>In this tutorial, we will learn how to use BERT for text classification. We will begin with a brief introduction to BERT, its architecture and fine-tuning mechanism. Then we will learn how to fine-tune BERT for text classification on the following classification tasks:</p><ol><li><b>Binary Text Classification</b>: <a href="#binary-text-classification-using-bert">IMDB sentiment analysis with BERT</a> [88% accuracy].</li><li><b>Multi-class Text Classification:</b> <a href="#multi-class-text-classification-using-bert">20-Newsgroup classification with BERT</a> [90% accuracy].</li><li><b>Multi-label Text Classification:</b> <a href="#multilabel-text-classification-using-bert">Toxic-comment classification with BERT</a> [90% accuracy].</li></ol><p>We will use BERT through the keras-bert Python library, and train and test our model on GPUs provided by Google Colab with a Tensorflow backend.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="what-is-bert"></a><a class="body-link" href="#what-is-bert">What is BERT?</a></h2><p>BERT stands for Bidirectional Encoder Representations from Transformers. It is a deep learning based unsupervised language representation model developed by researchers at Google AI Language. It is the first deeply bidirectional unsupervised language model. Language models before BERT learnt from text sequences using either a left-to-right context or a shallow combination of left-to-right and right-to-left contexts; they were therefore either not bidirectional, or not bidirectional in all layers. The diagram below shows its bidirectional architecture as compared to other language models.</p><p></p><img alt="bert-bidirectional" class="richtext-image full-width img-responsive lazyload" height="340" data-src="https://pysnacks-media.s3.amazonaws.com/images/bert-bidirectional-pysnacks.width-1280.png" width="1280"><p></p><p>Deep bi-directionality in BERT (<a href="https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html" target="_blank" rel="noopener noreferrer">Source</a>)</p><p>BERT incorporates deep bi-directionality in learning representations using a novel Masked Language Model (MLM) approach, which allows it to learn a word's representation from both its left and right context. Under the hood, BERT uses the popular Attention mechanism for bidirectional training of Transformers. With this approach, BERT achieved state-of-the-art results on a series of natural language processing and understanding tasks.</p><p></p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="an-overview-of-bert-architecture"></a><a class="body-link" href="#an-overview-of-bert-architecture">An Overview of BERT Architecture</a></h2><p>Before diving into using BERT for text classification, let us take a quick overview of BERT’s architecture. BERT is a multi-layered bidirectional Transformer encoder. The diagram below shows a 12-layered BERT model (BERT-Base version).
Note that each Transformer block is based on the Attention model.</p><img alt="bert-architecture" class="richtext-image full-width img-responsive lazyload" height="720" data-src="https://pysnacks-media.s3.amazonaws.com/images/bert-architecture.width-1280.png" width="1280"><p></p></div></div> <div class="block-paragraph"><div class="rich-text"><p>There are multiple pre-trained model versions available, with varying numbers of encoder layers, attention heads and hidden size dimensions, where:</p></div></div> <div class="block-paragraph"><div class="rich-text"><p>H = The hidden size.</p><p>A = Number of self-attention heads.</p><p>L = Number of layers (Transformer blocks).</p><p>The largest model available is BERT-Large, which has 24 layers, 16 attention heads and 1024-dimensional output hidden vectors. For each model, there are also cased and uncased variants available. In this tutorial we will use BERT-Base, which has 12 encoder layers, 12 attention heads and 768-dimensional hidden representations.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="different-ways-to-use-bert"></a><a class="body-link" href="#different-ways-to-use-bert">Different Ways To Use BERT</a></h2><p>BERT can be used for text classification in three ways.</p><ol><li><b>Fine Tuning Approach</b>: In the fine tuning approach, we add a dense layer on top of the last layer of the pretrained BERT model and then train the whole model with a task-specific dataset.</li><li><b>Feature Based Approach</b>: In this approach, fixed features are extracted from the pretrained model. The activations from one or more layers are extracted without fine-tuning, and these contextual embeddings are used as input to the downstream network for specific tasks. A few strategies for feature extraction discussed in the BERT paper are as follows:<ol><li>Extracting the second-to-last hidden layer</li><li>Extracting the last hidden layer</li><li>Concatenating the last four hidden layers</li><li>Weighted sum of all 12 layers</li></ol></li><li><b>As word-embedding</b>: In this approach, the trained model is used to generate token embeddings (vector representations of words) without any fine-tuning for an end-to-end NLP task. The vector representations of tokens can then be used for specific tasks like classification, topic modeling, summarisation etc.
The following code demonstrates using BERT as word-embedding using the bert-embedding library.</li></ol></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="c1">#Source: https://pypi.org/project/bert-embedding/</span> <span class="n">pip</span> <span class="n">install</span> <span class="n">bert</span><span class="o">-</span><span class="n">embedding</span> <span class="kn">from</span> <span class="nn">bert_embedding</span> <span class="kn">import</span> <span class="n">BertEmbedding</span> <span class="n">text</span> <span class="o">=</span> <span class="s2">"A tutorial on how to generate token embeddings using BERT"</span> <span class="n">bert_embedding</span> <span class="o">=</span> <span class="n">BertEmbedding</span><span class="p">()</span> <span class="n">result</span> <span class="o">=</span> <span class="n">bert_embedding</span><span class="p">(</span><span class="n">text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">))</span> <span class="n">first_sentence</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="n">embedding</span> <span class="o">=</span> <span class="n">first_sentence</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="nb">print</span> <span class="p">(</span><span class="n">embedding</span><span class="p">)</span> <span class="c1"># array([ 0.4805648 , 0.18369392, -0.28554988, ..., -0.01961522,</span> <span class="c1"># 1.0207764 , -0.67167974], dtype=float32)</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>So which approach to choose for text classification with BERT? The answer depends on the performance requirements and the amount of effort we wish to put in, in terms of resources and time. Fine-tuning and feature-based extraction approaches require training, testing and validating on GPU or TPU and therefore are more time taking and resource intensive as compared to embedding-based approach. However, they are expected to yield better results as they benefit from the use of bidirectional contextual representation of whole sentences, tuned specifically for the task at hand.</p><p></p><p>The BERT paper recommends fine-tuning for better results. A few advantages of fine tuning BERT are as follows:</p><ol><li><b>Better Results:</b> Deeply-bidirectional learning enables it to achieve comparable or even better results than custom architecture tailored to one specific task.</li><li><b>Lesser data:</b> BERT is trained on the BooksCorpus (800M words) and Wikipedia (2,500M words). 
The pre-trained model therefore has weights that allow us to fine-tune on a specific dataset using much smaller training sets, as compared to the case where the model needs to learn its weights from scratch.</li><li><b>Lesser resources:</b> With the advantage of being able to work with less training data, fine-tuning also cuts down the excessive compute and memory resources required to train models from scratch.</li></ol></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="understanding-input-to-bert"></a><a class="body-link" href="#understanding-input-to-bert">Understanding Input to BERT</a></h2><p>So, what is the input to BERT? The input to BERT is an embedding representation derived by summing the token embedding, segment embedding and position embedding of the text.</p><p></p><img alt="bert-input" class="richtext-image full-width img-responsive lazyload" height="720" data-src="https://pysnacks-media.s3.amazonaws.com/images/bert-input.width-1280.png" width="1280"><p></p></div></div> <div class="block-paragraph"><div class="rich-text"><p>What are the token embedding, segment embedding and position embedding?</p><ol><li><b>Token Embeddings:</b> Token embeddings are the representations for the word-tokens of the text, derived by tokenizing with the WordPiece token vocabulary. For BERT-Base, the hidden size is 768, so the token embedding has a (SEQ_LEN x 768) representation. The token embedding also includes the [CLS] and [SEP] markers, which denote the class (classification category or label) and sentence separation respectively.</li><li><b>Position Embeddings:</b> The position embedding is a representation for the position of each token in the sentence. For BERT-Base it is a 2D array of size (SEQ_LEN, 768), where the Nth row is a vector representation for the Nth position.</li><li><b>Segment Embeddings:</b> The segment embedding identifies the different unique sentences in the text.</li></ol><p>Note that each of the embeddings (token, position and segment), being summed to derive the input, has dimension (SEQ_LEN x Hidden-Size). The SEQ_LEN value can be changed and is decided based on the length of the sentences in the downstream task dataset; sentences shorter than the sequence length are padded. The Hidden-Size (H) is decided by the choice of the BERT model (BERT Tiny, Small, Base, Large, etc.).</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="how-to-fine-tune-bert-for-text-classification"></a><a class="body-link" href="#how-to-fine-tune-bert-for-text-classification">How to Fine Tune BERT for Text Classification?</a></h2><p>To fine-tune BERT for text classification, take a pre-trained BERT model, apply an additional fully-connected dense layer on top of its output layer and train the entire model with the task dataset. The diagram below shows how BERT is used for text classification:</p><p></p><p></p><img alt="bert_text_classification_input.png" class="richtext-image full-width img-responsive lazyload" height="720" data-src="https://pysnacks-media.s3.amazonaws.com/images/bert-text-classification-input.width-1280.png" width="1280"><p></p><p></p></div></div> <div class="block-paragraph"><div class="rich-text"><p>Note that only the final hidden state corresponding to the class token ([CLS]) is used as the aggregate sequence representation to feed into a fully connected dense layer for classification tasks.
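In plain Keras terms, this classification head is just a dense layer applied to the [CLS] vector. The snippet below is an illustrative sketch only (it is not the keras-bert code used later in this tutorial; the input layer simply stands in for BERT’s final hidden states, and the sizes are those of BERT-Base):</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><pre>
# Illustrative sketch: a classification head on the [CLS] position of the last hidden layer.
import keras  # assumes the same standalone Keras used elsewhere in this tutorial

SEQ_LEN, HIDDEN, NUM_CLASSES = 128, 768, 2                                 # BERT-Base hidden size is 768
sequence_output = keras.layers.Input(shape=(SEQ_LEN, HIDDEN))              # stand-in for BERT's last layer
cls_vector = keras.layers.Lambda(lambda t: t[:, 0, :])(sequence_output)    # hidden state of the [CLS] token
probs = keras.layers.Dense(units=NUM_CLASSES, activation='softmax')(cls_vector)
head = keras.models.Model(sequence_output, probs)
</pre></div> <div class="block-paragraph"><div class="rich-text"><p>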
To understand it better, let us look at the last layers of BERT(BERT-Base, 12 Layers).</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Encoder-11-FeedForward-Norm <span class="o">(</span>La <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">1536</span> Encoder-11-FeedForward-Add<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ Encoder-12-MultiHeadSelfAttenti <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">2362368</span> Encoder-11-FeedForward-Norm<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ Encoder-12-MultiHeadSelfAttenti <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-12-MultiHeadSelfAttention __________________________________________________________________________________________________ Encoder-12-MultiHeadSelfAttenti <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-11-FeedForward-Norm<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> Encoder-12-MultiHeadSelfAttention __________________________________________________________________________________________________ Encoder-12-MultiHeadSelfAttenti <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">1536</span> Encoder-12-MultiHeadSelfAttention __________________________________________________________________________________________________ Encoder-12-FeedForward <span class="o">(</span>FeedFor <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">4722432</span> Encoder-12-MultiHeadSelfAttention __________________________________________________________________________________________________ Encoder-12-FeedForward-Dropout <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-12-FeedForward<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ Encoder-12-FeedForward-Add <span class="o">(</span>Add <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-12-MultiHeadSelfAttention Encoder-12-FeedForward-Dropout<span class="o">[</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ 
Encoder-12-FeedForward-Norm <span class="o">(</span>La <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">1536</span> Encoder-12-FeedForward-Add<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ Extract <span class="o">(</span>Extract<span class="o">)</span> <span class="o">(</span>None, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-12-FeedForward-Norm<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ NSP-Dense <span class="o">(</span>Dense<span class="o">)</span> <span class="o">(</span>None, <span class="m">768</span><span class="o">)</span> <span class="m">590592</span> Extract<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>For fine-tuning this model for classification tasks, we take the last layer NSP-Dense (Next Sentence Prediction-Dense) and tie its output to a new fully connected dense layer, as shown below.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="c1"># Add dense layer for classification</span> <span class="n">inputs</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span> <span class="n">dense</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">'NSP-Dense'</span><span class="p">)</span><span class="o">.</span><span class="n">output</span> <span class="n">outputs</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'softmax'</span><span class="p">)(</span><span class="n">dense</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">outputs</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>The updated model looks like this for binary text classification:</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td 
class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Encoder-12-FeedForward-Norm <span class="o">(</span>La <span class="o">(</span>None, <span class="m">128</span>, <span class="m">768</span><span class="o">)</span> <span class="m">1536</span> Encoder-12-FeedForward-Add<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ Extract <span class="o">(</span>Extract<span class="o">)</span> <span class="o">(</span>None, <span class="m">768</span><span class="o">)</span> <span class="m">0</span> Encoder-12-FeedForward-Norm<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ NSP-Dense <span class="o">(</span>Dense<span class="o">)</span> <span class="o">(</span>None, <span class="m">768</span><span class="o">)</span> <span class="m">590592</span> Extract<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> __________________________________________________________________________________________________ dense <span class="o">(</span>Dense<span class="o">)</span> <span class="o">(</span>None, <span class="m">20</span><span class="o">)</span> <span class="m">15380</span> NSP-Dense<span class="o">[</span><span class="m">0</span><span class="o">][</span><span class="m">0</span><span class="o">]</span> <span class="o">==================================================================================================</span> Total params: <span class="m">109</span>,202,708 Trainable params: <span class="m">109</span>,202,708 Non-trainable params: <span class="m">0</span> __________________________________________________________________________________________________ None </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>The size of the last fully connected dense layer is equal to the number of classification classes or labels.</p><p>So, how do we choose activation and loss function for text classification? For Binary and Multiclass text classification we use the softmax activation function with sparse categorical cross entropy loss function while for multilabel text classification, sigmoid activation function with binary cross entropy loss function is more suitable.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="recommended-fine-tuning-hyper-parameters""></a><a class="body-link" href="#recommended-fine-tuning-hyper-parameters">Recommended Fine Tuning Hyper Parameters</a></h2><p>According to the BERT paper, the following range of values are recommended:</p><ol><li>Batch size: 16, 32</li><li>Learning rate (Adam): 5e-5, 3e-5, 2e-5</li><li>Number of epochs: 2, 3, 4</li></ol></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="preparing-input-datasets""></a><a class="body-link" href="#preparing-input-datasets">Preparing Input datasets</a></h2><p>Let us take a look at working examples of binary, multiclass and multilabel text classification by fine-tuning BERT. 
We will use the Python-based keras-bert library with a Tensorflow backend and run our examples on Google Colab with GPU accelerators. Some of the code for these examples is taken from the keras-bert documentation.</p><p>One method that is common across all the tasks is the one that prepares the training, test and validation datasets. We need a method that generates these sets in the format BERT expects for text classification.</p><p></p><h3><a id="understanding-the-input-to-keras-bert"></a><a class="body-link" href="#understanding-the-input-to-keras-bert">Understanding the input to keras-bert</a></h3><p>For fine-tuning using keras-bert, the following inputs are required:</p><ol><li><b>Token Embedding:</b> Each sentence in the dataset is tokenized using the WordPiece vocabulary, [CLS] and [SEP] tokens are added, and the sequence is padded.</li><li><b>Segment Mask Embedding:</b> Generate the segment embedding. (An array of zeros for a single-sentence representation.)</li><li><b>Target Labels</b></li></ol><p>The positional embedding is derived internally and does not need to be passed explicitly.</p><p>To do the above three tasks we will use a method called <i>load_data</i>, the input to which varies depending on the dataset format; however, the processing logic and the output are the same across all tasks. The output of the <i>load_data</i> method is a tuple whose first item is a list of size two: the first element is the text’s token embedding and the second is the text’s segment embedding (an array of zeros, as we are classifying or labelling only one sentence at a time). The second item of the tuple is the target class, index-wise paired with the token and segment embeddings.</p><p></p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="binary-text-classification-using-bert"></a><a class="body-link" href="#binary-text-classification-using-bert">Binary Text Classification Using BERT</a></h2><p>To demonstrate using BERT with fine-tuning for binary text classification, we will use the <a href="https://ai.stanford.edu/~amaas/data/sentiment/" target="_blank" rel="noopener noreferrer"><i>Large Movie Review Dataset</i></a>.
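Before preparing it, here is a minimal illustration of the input format described in the previous section, using the keras-bert <i>Tokenizer</i> that the examples below also use (a sketch with a toy vocabulary; the real <i>token_dict</i> is built from the checkpoint’s vocab.txt in the code that follows):</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><pre>
# Illustrative sketch of the [token ids, segment ids] / labels format that load_data produces.
import numpy as np
from keras_bert import Tokenizer

# Toy vocabulary for illustration only; the real token_dict comes from the BERT vocab.txt.
token_dict = {'[PAD]': 0, '[UNK]': 1, '[CLS]': 2, '[SEP]': 3, 'a': 4, 'great': 5, 'movie': 6}
tokenizer = Tokenizer(token_dict)

SEQ_LEN = 8
ids, segments = tokenizer.encode('a great movie', max_len=SEQ_LEN)
print(ids)       # e.g. [2, 4, 5, 6, 3, 0, 0, 0]  -> [CLS] a great movie [SEP] + padding
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 0]      -> single-sentence segment ids

x = [np.array([ids]), np.array([segments])]   # model input: [token ids, segment ids]
y = np.array([1])                              # index-wise paired target label (e.g. positive)
</pre></div> <div class="block-paragraph"><div class="rich-text"><p>Now, back to the dataset.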
This is a dataset for binary sentiment classification and contains a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.</p><p></p><p>Let us begin with first downloading the dataset and preparing the training and test datasets.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="ch">#!wget -q https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip</span> <span class="c1">#!unzip -o uncased_L-12_H-768_A-12.zip</span> <span class="n">dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">get_file</span><span class="p">(</span> <span class="n">fname</span><span class="o">=</span><span class="s2">"aclImdb.tar.gz"</span><span class="p">,</span> <span class="n">origin</span><span class="o">=</span><span class="s2">"http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"</span><span class="p">,</span> <span class="n">extract</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="p">)</span> <span class="n">token_dict</span> <span class="o">=</span> <span class="p">{}</span> <span class="k">with</span> <span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">vocab_path</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">,</span> <span class="s1">'utf8'</span><span class="p">)</span> <span class="k">as</span> <span class="n">reader</span><span class="p">:</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span> <span class="n">token</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="n">token_dict</span><span class="p">[</span><span class="n">token</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">token_dict</span><span class="p">)</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">token_dict</span><span class="p">)</span> <span class="k">def</span> <span class="nf">load_data</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">tagset</span><span class="p">):</span> <span class="k">global</span> <span class="n">tokenizer</span> <span class="n">indices</span><span class="p">,</span> <span class="n">sentiments</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">folder</span><span class="p">,</span> <span class="n">sentiment</span> <span class="ow">in</span> <span class="n">tagset</span><span class="p">:</span> <span class="n">folder</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span 
class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">folder</span><span class="p">)</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">folder</span><span class="p">)):</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="n">name</span><span class="p">),</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">reader</span><span class="p">:</span> <span class="n">text</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="n">ids</span><span class="p">,</span> <span class="n">segments</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">max_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">)</span> <span class="n">indices</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ids</span><span class="p">)</span> <span class="n">sentiments</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sentiment</span><span class="p">)</span> <span class="n">items</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">indices</span><span class="p">,</span> <span class="n">sentiments</span><span class="p">))</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span><span class="p">,</span> <span class="n">sentiments</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">indices</span><span class="p">)</span> <span class="n">mod</span> <span class="o">=</span> <span class="n">indices</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">%</span> <span class="n">BATCH_SIZE</span> <span class="k">if</span> <span class="n">mod</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span> <span class="n">indices</span><span class="p">,</span> <span class="n">sentiments</span> <span class="o">=</span> <span class="n">indices</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">],</span> <span class="n">sentiments</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">]</span> <span class="k">return</span> <span class="p">[</span><span class="n">indices</span><span 
class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">indices</span><span class="p">)],</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">sentiments</span><span class="p">)</span> <span class="n">train_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">dataset</span><span class="p">),</span> <span class="s1">'aclImdb'</span><span class="p">,</span> <span class="s1">'train'</span><span class="p">)</span> <span class="n">test_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">dataset</span><span class="p">),</span> <span class="s1">'aclImdb'</span><span class="p">,</span> <span class="s1">'test'</span><span class="p">)</span> <span class="n">tagset</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">'neg'</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="p">(</span><span class="s1">'pos'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span> <span class="n">id_to_labels</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">:</span> <span class="s1">'negative'</span><span class="p">,</span> <span class="mi">1</span><span class="p">:</span> <span class="s1">'positive'</span><span class="p">}</span> <span class="n">train_x</span><span class="p">,</span> <span class="n">train_y</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">tagset</span><span class="p">)</span> <span class="n">test_x</span><span class="p">,</span> <span class="n">test_y</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">test_path</span><span class="p">,</span> <span class="n">tagset</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Once we have our training data ready, let us define our model training hyper-parameters. We set the batch-size as 16 and learning-rate at 2e-5 as recommended by the BERT paper. 
It's important to not set a high value for learning rate, as it could cause the training to not converge or catastrophic forgetting.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="c1"># Bert Model Constants</span> <span class="n">SEQ_LEN</span> <span class="o">=</span> <span class="mi">128</span> <span class="n">BATCH_SIZE</span> <span class="o">=</span> <span class="mi">16</span> <span class="n">EPOCHS</span> <span class="o">=</span> <span class="mi">3</span> <span class="n">LR</span> <span class="o">=</span> <span class="mf">2e-5</span> <span class="n">pretrained_path</span> <span class="o">=</span> <span class="s1">'uncased_L-12_H-768_A-12'</span> <span class="n">config_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'bert_config.json'</span><span class="p">)</span> <span class="n">checkpoint_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'bert_model.ckpt'</span><span class="p">)</span> <span class="n">vocab_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'vocab.txt'</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>The next step is to build and train the model. We first load the pre-trained BERT-Base model. Then we take its last layer (NSP-Dense) and connect it to binary classification layer. The binary classification layer is essentially a fully-connected dense layer with size 2. 
Since it is a case of binary classification, we want the probabilities of the output nodes to sum upto 1, we use the <i>softmax</i> as the activation function.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">load_trained_model_from_checkpoint</span><span class="p">(</span> <span class="n">config_path</span><span class="p">,</span> <span class="n">checkpoint_path</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">seq_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">,</span> <span class="p">)</span> <span class="n">inputs</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span> <span class="n">dense</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">'NSP-Dense'</span><span class="p">)</span><span class="o">.</span><span class="n">output</span> <span class="n">outputs</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'softmax'</span><span class="p">)(</span><span class="n">dense</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">outputs</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span> <span class="n">RAdam</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="n">LR</span><span class="p">),</span> <span class="n">loss</span><span class="o">=</span><span class="s1">'sparse_categorical_crossentropy'</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'sparse_categorical_accuracy'</span><span class="p">],</span> <span class="p">)</span> <span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span> <span class="n">train_x</span><span class="p">,</span> <span class="n">train_y</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="n">EPOCHS</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">,</span> <span class="n">validation_split</span><span 
class="o">=</span><span class="mf">0.20</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6 7</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Train on <span class="m">19993</span> samples, validate on <span class="m">4999</span> samples Epoch <span class="m">1</span>/3 <span class="m">19993</span>/19993 <span class="o">[==============================]</span> - 426s 21ms/sample - loss: <span class="m">0</span>.3789 - sparse_categorical_accuracy: <span class="m">0</span>.8250 - val_loss: <span class="m">0</span>.3106 - val_sparse_categorical_accuracy: <span class="m">0</span>.8666 Epoch <span class="m">2</span>/3 <span class="m">19993</span>/19993 <span class="o">[==============================]</span> - 410s 20ms/sample - loss: <span class="m">0</span>.2370 - sparse_categorical_accuracy: <span class="m">0</span>.9029 - val_loss: <span class="m">0</span>.2764 - val_sparse_categorical_accuracy: <span class="m">0</span>.8852 Epoch <span class="m">3</span>/3 <span class="m">19993</span>/19993 <span class="o">[==============================]</span> - 408s 20ms/sample - loss: <span class="m">0</span>.1392 - sparse_categorical_accuracy: <span class="m">0</span>.9472 - val_loss: <span class="m">0</span>.3310 - val_sparse_categorical_accuracy: <span class="m">0</span>.8898 </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>One the training is done, let us evaluate the model.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6 7</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">accuracy_score</span><span class="p">,</span> <span class="n">f1_score</span> <span class="n">predicts</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_x</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="n">accuracy</span> <span class="o">=</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">test_y</span><span class="p">,</span> <span class="n">predicts</span><span class="p">)</span> <span class="n">macro_f1</span> <span class="o">=</span> <span class="n">f1_score</span><span class="p">(</span><span class="n">test_y</span><span class="p">,</span> <span class="n">predicts</span><span class="p">,</span> <span class="n">average</span><span class="o">=</span><span class="s1">'macro'</span><span class="p">)</span> <span class="nb">print</span> <span class="p">(</span><span class="s2">"Accuracy: </span><span class="si">%s</span><span 
class="s2">"</span> <span class="o">%</span> <span class="n">accuracy</span><span class="p">)</span> <span class="nb">print</span> <span class="p">(</span><span class="s2">"macro_f1: </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">macro_f1</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Accuracy: <span class="m">0</span>.8842429577464789 macro_f1: <span class="m">0</span>.8841799318689518 </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>We could save the model with <i>model.save(modelname.h5).</i> The following code shows how to generate predictions.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="n">texts</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"It's a must watch"</span><span class="p">,</span> <span class="s2">"Can't wait for it's next part!"</span><span class="p">,</span> <span class="s1">'It fell short of expectations.'</span><span class="p">,</span> <span class="s1">'Wish there was more to it!'</span><span class="p">,</span> <span class="s1">'Just wow!'</span><span class="p">,</span> <span class="s1">'Colossial waste of time'</span><span class="p">,</span> <span class="s1">'Save youself from this 90 mins trauma!'</span> <span class="p">]</span> <span class="k">for</span> <span class="n">text</span> <span class="ow">in</span> <span class="n">texts</span><span class="p">:</span> <span class="n">ids</span><span class="p">,</span> <span class="n">segments</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">max_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">)</span> <span class="n">inpu</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">ids</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="n">SEQ_LEN</span><span class="p">])</span> <span class="n">predicted_id</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">([</span><span class="n">inpu</span><span class="p">,</span><span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">inpu</span><span class="p">)])</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="nb">print</span> <span class="p">(</span><span class="s2">"</span><span 
class="si">%s</span><span class="s2">: </span><span class="si">%s</span><span class="s2">"</span><span class="o">%</span> <span class="p">(</span><span class="n">id_to_labels</span><span class="p">[</span><span class="n">predicted_id</span><span class="p">],</span> <span class="n">text</span><span class="p">))</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6 7</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>positive: It<span class="s1">'s a must watch</span> <span class="s1">positive: Can'</span>t <span class="nb">wait</span> <span class="k">for</span> it<span class="err">'</span>s next part! negative: It fell short of expectations. positive: Wish there was more to it! positive: Just wow! negative: Colossial waste of <span class="nb">time</span> negative: Save youself from this <span class="m">90</span> mins trauma! </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p><a href="https://colab.research.google.com/drive/14b2rbIgwhQ1BI-zkyiMjQv-jV85xj9tf" target="_blank" rel="noopener noreferrer"><b>Google Colab</b></a> <b>for IMDB sentiment analysis with BERT fine tuning.</b></p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="multi-class-text-classification-using-bert""></a><a class="body-link" href="#multi-class-text-classification-using-bert">Multi-class Text Classification Using BERT</a></h2><p>To demonstrate multi-class text classification we will use the <a href="http://qwone.com/~jason/20Newsgroups/" target="_blank" rel="noopener noreferrer">20-Newsgroup dataset</a>. 
It is a collection of about 20,000 newsgroup documents, spread evenly across 20 different newsgroups.</p><p></p><p>Let us first prepare the training and test datasets.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="n">dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">get_file</span><span class="p">(</span> <span class="n">fname</span><span class="o">=</span><span class="s2">"20news-18828.tar.gz"</span><span class="p">,</span> <span class="n">origin</span><span class="o">=</span><span class="s2">"http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz"</span><span class="p">,</span> <span class="n">extract</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="p">)</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">token_dict</span><span class="p">)</span> <span class="k">def</span> <span class="nf">load_data</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">tagset</span><span class="p">):</span> <span class="k">global</span> <span class="n">tokenizer</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">folder</span><span class="p">,</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">tagset</span><span class="p">:</span> <span class="n">folder</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">folder</span><span class="p">)</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">folder</span><span class="p">)):</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">folder</span><span class="p">,</span> <span class="n">name</span><span class="p">),</span> <span class="s1">'r'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s2">"utf-8"</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="s1">'ignore'</span><span class="p">)</span> <span class="k">as</span> <span class="n">reader</span><span class="p">:</span> <span class="n">text</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span 
class="n">ids</span><span class="p">,</span> <span class="n">segments</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">max_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">)</span> <span class="n">indices</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ids</span><span class="p">)</span> <span class="n">labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">label</span><span class="p">)</span> <span class="n">items</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">indices</span><span class="p">,</span> <span class="n">labels</span><span class="p">))</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">indices</span><span class="p">)</span> <span class="n">mod</span> <span class="o">=</span> <span class="n">indices</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">%</span> <span class="n">BATCH_SIZE</span> <span class="k">if</span> <span class="n">mod</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">indices</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">],</span> <span class="n">labels</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">]</span> <span class="k">return</span> <span class="p">[</span><span class="n">indices</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">indices</span><span class="p">)],</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span> <span class="n">path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">dataset</span><span class="p">),</span> <span class="s1">'20news-18828'</span><span class="p">)</span> <span class="n">tagset</span> <span class="o">=</span> <span class="p">[(</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span 
class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="n">path</span><span class="p">))]</span> <span class="n">id_to_labels</span> <span class="o">=</span> <span class="p">{</span><span class="n">id_</span><span class="p">:</span> <span class="n">label</span> <span class="k">for</span> <span class="n">label</span><span class="p">,</span> <span class="n">id_</span> <span class="ow">in</span> <span class="n">tagset</span><span class="p">}</span> <span class="c1"># Load data, split 80-20 for triaing/testing.</span> <span class="n">all_x</span><span class="p">,</span> <span class="n">all_y</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">tagset</span><span class="p">)</span> <span class="n">train_perc</span> <span class="o">=</span> <span class="mf">0.8</span> <span class="n">total</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">all_y</span><span class="p">)</span> <span class="n">n_train</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">train_perc</span> <span class="o">*</span> <span class="n">total</span><span class="p">)</span> <span class="n">n_test</span> <span class="o">=</span> <span class="p">(</span><span class="n">total</span> <span class="o">-</span> <span class="n">n_train</span><span class="p">)</span> <span class="n">test_x</span> <span class="o">=</span> <span class="p">[</span><span class="n">all_x</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">n_train</span><span class="p">:],</span> <span class="n">all_x</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="n">n_train</span><span class="p">:]]</span> <span class="n">train_x</span> <span class="o">=</span> <span class="p">[</span><span class="n">all_x</span><span class="p">[</span><span class="mi">0</span><span class="p">][:</span><span class="n">n_train</span><span class="p">],</span> <span class="n">all_x</span><span class="p">[</span><span class="mi">1</span><span class="p">][:</span><span class="n">n_train</span><span class="p">]]</span> <span class="n">train_y</span><span class="p">,</span> <span class="n">test_y</span> <span class="o">=</span> <span class="n">all_y</span><span class="p">[:</span><span class="n">n_train</span><span class="p">],</span> <span class="n">all_y</span><span class="p">[</span><span class="n">n_train</span><span class="p">:]</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"# Total: </span><span class="si">%s</span><span class="s2">, # Train: </span><span class="si">%s</span><span class="s2">, # Test: </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">total</span><span class="p">,</span> <span class="n">n_train</span><span class="p">,</span> <span class="n">n_test</span><span class="p">))</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span><span 
class="c1"># Total: 18816, # Train: 15052, # Test: 3764</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Next, we build and train our model. We use the recommended BERT fine-tuning parameters and train our model for 4 epochs. The classification layer added on top of pre-trained BERT model is a fully-connected dense layer of size 20 (as 20 output classes) .</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="c1">#pip install -q keras-bert keras-rectified-adam</span> <span class="c1"># Bert Model Constants</span> <span class="n">SEQ_LEN</span> <span class="o">=</span> <span class="mi">128</span> <span class="n">BATCH_SIZE</span> <span class="o">=</span> <span class="mi">16</span> <span class="n">EPOCHS</span> <span class="o">=</span> <span class="mi">4</span> <span class="n">LR</span> <span class="o">=</span> <span class="mf">2e-5</span> <span class="n">pretrained_path</span> <span class="o">=</span> <span class="s1">'uncased_L-12_H-768_A-12'</span> <span class="n">config_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'bert_config.json'</span><span class="p">)</span> <span class="n">checkpoint_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'bert_model.ckpt'</span><span class="p">)</span> <span class="n">vocab_path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">pretrained_path</span><span class="p">,</span> <span class="s1">'vocab.txt'</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">load_trained_model_from_checkpoint</span><span class="p">(</span> <span class="n">config_path</span><span class="p">,</span> <span class="n">checkpoint_path</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">seq_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">,</span> <span class="p">)</span> <span class="c1"># Add dense layer for classification</span> <span class="n">inputs</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span> <span class="n">dense</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">'NSP-Dense'</span><span class="p">)</span><span class="o">.</span><span 
class="n">output</span> <span class="n">outputs</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'softmax'</span><span class="p">)(</span><span class="n">dense</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">outputs</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span> <span class="n">RAdam</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="n">LR</span><span class="p">),</span> <span class="n">loss</span><span class="o">=</span><span class="s1">'sparse_categorical_crossentropy'</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'sparse_categorical_accuracy'</span><span class="p">],</span> <span class="p">)</span> <span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span> <span class="n">train_x</span><span class="p">,</span> <span class="n">train_y</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="n">EPOCHS</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">,</span> <span class="n">validation_split</span><span class="o">=</span><span class="mf">0.20</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6 7 8 9</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Train on <span class="m">12041</span> samples, validate on <span class="m">3011</span> samples Epoch <span class="m">1</span>/4 <span class="m">12041</span>/12041 <span class="o">[==============================]</span> - 765s 64ms/sample - loss: <span class="m">1</span>.6826 - sparse_categorical_accuracy: <span class="m">0</span>.5052 - val_loss: <span class="m">0</span>.6773 - val_sparse_categorical_accuracy: <span class="m">0</span>.7948 Epoch <span class="m">2</span>/4 <span class="m">12041</span>/12041 <span class="o">[==============================]</span> - 749s 62ms/sample - loss: <span class="m">0</span>.4951 - sparse_categorical_accuracy: <span class="m">0</span>.8481 - val_loss: <span class="m">0</span>.4421 - val_sparse_categorical_accuracy: <span class="m">0</span>.8698 Epoch <span class="m">3</span>/4 <span class="m">12041</span>/12041 <span class="o">[==============================]</span> - 748s 62ms/sample - loss: <span class="m">0</span>.2534 - sparse_categorical_accuracy: <span class="m">0</span>.9239 - val_loss: <span 
class="m">0</span>.3752 - val_sparse_categorical_accuracy: <span class="m">0</span>.8947 Epoch <span class="m">4</span>/4 <span class="m">12041</span>/12041 <span class="o">[==============================]</span> - 746s 62ms/sample - loss: <span class="m">0</span>.1386 - sparse_categorical_accuracy: <span class="m">0</span>.9588 - val_loss: <span class="m">0</span>.3471 - val_sparse_categorical_accuracy: <span class="m">0</span>.9083 </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Once we have our model train, let us evaluate and use for muti-class labelling.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">accuracy_score</span><span class="p">,</span> <span class="n">f1_score</span> <span class="n">predicts</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_x</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="n">accuracy</span> <span class="o">=</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">test_y</span><span class="p">,</span> <span class="n">predicts</span><span class="p">)</span> <span class="n">macro_f1</span> <span class="o">=</span> <span class="n">f1_score</span><span class="p">(</span><span class="n">test_y</span><span class="p">,</span> <span class="n">predicts</span><span class="p">,</span> <span class="n">average</span><span class="o">=</span><span class="s1">'macro'</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Accuracy: <span class="m">0</span>.9024973432518597 macro_f1: <span class="m">0</span>.9001928370898599 </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Predict newsgroup labels with the trained model.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="n">texts</span> <span class="o">=</span> <span class="p">[</span> <span class="s1">'Who scored the maximum goals?'</span><span class="p">,</span> <span class="s1">'Mars might have water and dragons!'</span><span class="p">,</span> <span class="s1">'CPU is over-clocked, causing it to heating too much!'</span><span class="p">,</span> <span class="s1">'I need to buy new prescriptions.'</span><span 
class="p">,</span> <span class="s1">'This is just government propaganda.'</span> <span class="p">]</span> <span class="k">for</span> <span class="n">text</span> <span class="ow">in</span> <span class="n">texts</span><span class="p">:</span> <span class="n">ids</span><span class="p">,</span> <span class="n">segments</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="n">max_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">)</span> <span class="n">inpu</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">ids</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="n">SEQ_LEN</span><span class="p">])</span> <span class="n">predicted_id</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">([</span><span class="n">inpu</span><span class="p">,</span><span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">inpu</span><span class="p">)])</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="nb">print</span> <span class="p">(</span><span class="s2">"</span><span class="si">%s</span><span class="s2">: </span><span class="si">%s</span><span class="s2">"</span><span class="o">%</span> <span class="p">(</span><span class="n">id_to_labels</span><span class="p">[</span><span class="n">predicted_id</span><span class="p">],</span> <span class="n">text</span><span class="p">))</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>rec.sport.hockey: Who scored the maximum goals? sci.space: Mars might have water and dragons! comp.sys.ibm.pc.hardware: CPU is over-clocked, causing it to heating too much! sci.med: I need to buy new prescriptions. talk.politics.misc: This is just government propaganda. talk.politics.misc: This is just government propaganda. </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p><a href="https://colab.research.google.com/drive/1VuPv_SInihZIO9gwy1p0YqYQy76bwBuS" target="_blank" rel="noopener noreferrer"><b>Google Colab</b></a> <b>for 20 Newsgroup Multi-class Text Classification using BERT</b></p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="multilabel-text-classification-using-bert""></a><a class="body-link" href="#multilabel-text-classification-using-bert">Multilabel Text Classification Using BERT</a></h2><p>To demonstrate multi-label text classification we will use <a href="https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge" target="_blank" rel="noopener noreferrer">Toxic Comment Classification dataset</a>. 
It is a dataset on Kaggle, with Wikipedia comments which have been labeled by human raters for toxic behaviour. The different types o toxicity are: toxic, severe_toxic, obscene, threat, insult and identity_hate. Each comment can have either none or one or more type of toxicity. The dataset has over 100,000 labelled data, but for this tutorial we will use 25% of it to keep training memory and time requirements manageable.</p><p></p><p>Let us first build the training and test datasets.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="kn">from</span> <span class="nn">google.colab</span> <span class="kn">import</span> <span class="n">drive</span> <span class="n">drive</span><span class="o">.</span><span class="n">mount</span><span class="p">(</span><span class="s1">'/content/gdrive'</span><span class="p">)</span> <span class="n">RESOUCE_DIR</span> <span class="o">=</span> <span class="s2">"/content/gdrive/My\ Drive/resources"</span> <span class="c1"># Train/test Files</span> <span class="n">datasets_dir</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">/datasets/jigsaw-toxic-comment-classification-challenge"</span> <span class="o">%</span> <span class="p">(</span><span class="n">RESOUCE_DIR</span><span class="p">)</span> <span class="n">test_datapath</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">/test.csv"</span> <span class="o">%</span> <span class="p">(</span><span class="n">datasets_dir</span><span class="p">)</span> <span class="n">test_labels</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">/test_labels.csv"</span> <span class="o">%</span> <span class="p">(</span><span class="n">datasets_dir</span><span class="p">)</span> <span class="n">train_datapath</span> <span class="o">=</span> <span class="s2">"</span><span class="si">%s</span><span class="s2">/train.csv"</span> <span class="o">%</span> <span class="p">(</span><span class="n">datasets_dir</span><span class="p">)</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">token_dict</span><span class="p">)</span> <span class="k">def</span> <span class="nf">load_data</span><span class="p">(</span><span class="n">comments</span><span class="p">,</span> <span class="n">comment_labels</span><span class="p">):</span> <span class="k">global</span> <span class="n">tokenizer</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">comments</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):</span> <span class="n">ids</span><span class="p">,</span> <span class="n">segments</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span 
class="n">encode</span><span class="p">(</span><span class="n">comments</span><span class="p">[</span><span class="n">x</span><span class="p">],</span> <span class="n">max_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">)</span> <span class="n">indices</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ids</span><span class="p">)</span> <span class="n">labels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">comment_labels</span><span class="p">[</span><span class="n">x</span><span class="p">])</span> <span class="n">items</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">indices</span><span class="p">,</span> <span class="n">labels</span><span class="p">))</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">items</span><span class="p">)</span> <span class="n">indices</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">indices</span><span class="p">)</span> <span class="n">mod</span> <span class="o">=</span> <span class="n">indices</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">%</span> <span class="n">BATCH_SIZE</span> <span class="k">if</span> <span class="n">mod</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span> <span class="n">indices</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">indices</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">],</span> <span class="n">labels</span><span class="p">[:</span><span class="o">-</span><span class="n">mod</span><span class="p">]</span> <span class="k">return</span> <span class="p">[</span><span class="n">indices</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros_like</span><span class="p">(</span><span class="n">indices</span><span class="p">)],</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span> <span class="n">train_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">train_datapath</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">,</span> <span class="s1">''</span><span class="p">))</span> <span class="n">train_df</span> <span class="o">=</span> <span class="n">train_df</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">frac</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span><span class="n">random_state</span> <span class="o">=</span> <span class="mi">42</span><span 
class="p">)</span> <span class="n">train_lines</span> <span class="o">=</span> <span class="n">train_df</span><span class="p">[</span><span class="s1">'comment_text'</span><span class="p">]</span><span class="o">.</span><span class="n">values</span> <span class="n">labels_ordered</span> <span class="o">=</span> <span class="p">[</span> <span class="s1">'toxic'</span><span class="p">,</span> <span class="s1">'severe_toxic'</span><span class="p">,</span> <span class="s1">'obscene'</span><span class="p">,</span> <span class="s1">'threat'</span><span class="p">,</span> <span class="s1">'insult'</span><span class="p">,</span> <span class="s1">'identity_hate'</span> <span class="p">]</span> <span class="n">train_labels</span> <span class="o">=</span> <span class="n">train_df</span><span class="p">[</span><span class="n">labels_ordered</span><span class="p">]</span><span class="o">.</span><span class="n">values</span> <span class="n">train_x</span><span class="p">,</span> <span class="n">train_y</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">train_lines</span><span class="p">,</span> <span class="n">train_labels</span><span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Next we build model and train it. The multi-label classification layer is a fully-connected dense layer of size 6 (6 possible labels), and we use sigmoid activation function to get independent probabilities of each class.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">load_trained_model_from_checkpoint</span><span class="p">(</span> <span class="n">config_path</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">,</span> <span class="s1">''</span><span class="p">),</span> <span class="n">checkpoint_path</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">,</span> <span class="s1">''</span><span class="p">),</span> <span class="n">training</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">seq_len</span><span class="o">=</span><span class="n">SEQ_LEN</span><span class="p">,</span> <span class="p">)</span> <span class="c1"># Add dense layer for classification</span> <span class="n">inputs</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span> <span class="n">dense</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">get_layer</span><span class="p">(</span><span class="s1">'NSP-Dense'</span><span class="p">)</span><span class="o">.</span><span class="n">output</span> <span class="n">outputs</span> <span class="o">=</span> 
<span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span> <span class="n">units</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">labels_ordered</span><span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s1">'sigmoid'</span><span class="p">,</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Toxic-Categories-Dense'</span> <span class="p">)(</span><span class="n">dense</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">outputs</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span> <span class="n">RAdam</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="n">LR</span><span class="p">),</span> <span class="n">loss</span><span class="o">=</span><span class="s1">'binary_crossentropy'</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s1">'accuracy'</span><span class="p">],</span> <span class="p">)</span> <span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span> <span class="n">train_x</span><span class="p">,</span> <span class="n">train_y</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="n">EPOCHS</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">,</span> <span class="n">validation_split</span><span class="o">=</span><span class="mf">0.33</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="p">)</span> </pre></div> </td></tr></table></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-syntax responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5</pre></div></td><td class="code"><div class="highlight-syntax responsive-table codehighlight "><pre><span></span>Train on <span class="m">26724</span> samples, validate on <span class="m">13164</span> samples Epoch <span class="m">1</span>/2 <span class="m">26724</span>/26724 <span class="o">[==============================]</span> - 1251s 47ms/sample - loss: <span class="m">0</span>.0858 - acc: <span class="m">0</span>.9660 - val_loss: <span class="m">0</span>.0450 - val_acc: <span class="m">0</span>.9822 Epoch <span class="m">2</span>/2 <span class="m">26724</span>/26724 <span class="o">[==============================]</span> - 1235s 46ms/sample - loss: <span class="m">0</span>.0404 - acc: <span class="m">0</span>.9845 - val_loss: <span class="m">0</span>.0431 - val_acc: <span class="m">0</span>.9827 </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>We see that in just 2 epoch, our model achieved a 98% accuracy on the validation set. 
<div class="block-paragraph"><div class="rich-text"><p>We can save this model and use it to generate labels as follows:</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><pre>
texts = [
    'You are an idiot!',
    'You are a drug addict!',
    'I will kill you!',
    'I want to goto London',
]

for text in texts:
    ids, segments = tokenizer.encode(text, max_len=SEQ_LEN)
    inpu = np.array(ids).reshape([1, SEQ_LEN])
    predicted = (model.predict([inpu, np.zeros_like(inpu)]) >= 0.5).astype(int)
    labels = [
        label for i, label in enumerate(labels_ordered)
        if predicted[0][i]
    ]
    print("%s: %s" % (text, labels))
</pre></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
You are an idiot!: ['toxic', 'obscene', 'insult']
You are a drug addict!: ['toxic']
I will kill you!: ['toxic', 'threat']
I want to goto London: []
</pre></div>
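<div class="block-paragraph"><div class="rich-text"><p>The original Colab does not show the saving step itself, so here is a minimal sketch of one way to persist the fine-tuned weights with plain Keras calls; the file name is arbitrary and our own.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><pre>
# Save only the fine-tuned weights; the architecture can be rebuilt later from the
# BERT checkpoint plus the same 6-unit sigmoid layer defined above.
model.save_weights('bert_toxic_weights.h5')

# Later / in a new session: rebuild the identical model with
# load_trained_model_from_checkpoint + the dense layer, then restore the weights:
# model.load_weights('bert_toxic_weights.h5')
</pre></div>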
codehighlight "><pre><span></span>You are an idiot!: <span class="o">[</span><span class="s1">'toxic'</span>, <span class="s1">'obscene'</span>, <span class="s1">'insult'</span><span class="o">]</span> You are a drug addict!: <span class="o">[</span><span class="s1">'toxic'</span><span class="o">]</span> I will <span class="nb">kill</span> you!: <span class="o">[</span><span class="s1">'toxic'</span>, <span class="s1">'threat'</span><span class="o">]</span> I want to goto London: <span class="o">[]</span> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p><a href="https://colab.research.google.com/drive/1UEnLAFs1Hrr1NCCQ2Apu5CtRZSK4DHDi" target="_blank" rel="noopener noreferrer"><b>Google Colab</b></a> <b>for Toxic Comment Classification with BERT fine tuning.</b></p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="conclusion""></a><a class="body-link" href="#conclusion">Conclusion</a></h2><p>In this tutorial, we learnt how to use BERT with fine tuning for text classification. We saw that how using the pre-trained BERT model and just one additional classification layer, we can achieve high classification accuracy for different text classification tasks. BERT proves to be a very powerful language model and can be of immense value for text classification tasks.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="resources-amp-references""></a><a class="body-link" href="#resources-amp-references">Resources & References</a></h2><ol><li><a href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener noreferrer">Paper on BERT</a></li><li><a href="https://pypi.org/project/keras-bert/" target="_blank" rel="noopener noreferrer">keras-bert</a></li><li><a href="https://colab.research.google.com/drive/14b2rbIgwhQ1BI-zkyiMjQv-jV85xj9tf" target="_blank" rel="noopener noreferrer">Google Colab for IMDB sentiment analysis with BERT fine tuning</a></li><li><a href="https://colab.research.google.com/drive/1VuPv_SInihZIO9gwy1p0YqYQy76bwBuS" target="_blank" rel="noopener noreferrer">Google Colab For 20 Newsgroup Multi-class Text Classification using BERT</a></li><li><a href="https://colab.research.google.com/drive/1UEnLAFs1Hrr1NCCQ2Apu5CtRZSK4DHDi" target="_blank" rel="noopener noreferrer">Google Colab for Toxic Comment Classification with BERT fine tuning.</a></li></ol><p></p></div></div>How to Reverse Python Lists | In-place, slicing & reversed()2020-05-18T17:55:01.148382+00:002020-05-18T18:24:22.321324+00:00https://example.com/python-tutorials/how-reverse-python-lists-in-place-slicing-reversed/Foo<div class="block-paragraph"><div class="rich-text"><p>Python lists can be reversed using built-in methods reverse(), reversed() or by [::-1] list slicing technique. The reverse() built-in method reverses the list in place while the slicing technique creates a copy of the original list. The reversed() method simply returns a list iterator that returns elements in reverse order.</p></div></div> <div class="block-paragraph"><div class="rich-text"><p>Below are the three built-in, common method used for reversing Python lists.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>1. 
Reversing lists in-place using reverse()</b></h4></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span>>>> <span class="nv">nums</span> <span class="o">=</span> <span class="o">[</span><span class="m">1</span>,2,3,4,5,6,7,8<span class="o">]</span> >>> type<span class="o">(</span>nums.reverse<span class="o">())</span> <<span class="nb">type</span> <span class="s1">'NoneType'</span>> >>> nums <span class="o">[</span><span class="m">8</span>, <span class="m">7</span>, <span class="m">6</span>, <span class="m">5</span>, <span class="m">4</span>, <span class="m">3</span>, <span class="m">2</span>, <span class="m">1</span><span class="o">]</span> >>> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><h4><b>2. Reversing lists using slicing (creates a new copy)</b></h4></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre> 1 2 3 4 5 6 7 8 9 10 11</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span>>>> <span class="nv">nums</span> <span class="o">=</span> <span class="o">[</span><span class="m">1</span>,2,3,4,5,6,7,8<span class="o">]</span> >>> >>> <span class="nv">nums_reversed</span> <span class="o">=</span> nums<span class="o">[</span>::-1<span class="o">]</span> >>> nums_reversed <span class="o">[</span><span class="m">8</span>, <span class="m">7</span>, <span class="m">6</span>, <span class="m">5</span>, <span class="m">4</span>, <span class="m">3</span>, <span class="m">2</span>, <span class="m">1</span><span class="o">]</span> >>> type<span class="o">(</span>nums_reversed<span class="o">)</span> <<span class="nb">type</span> <span class="s1">'list'</span>> >>> >>> nums <span class="o">[</span><span class="m">1</span>, <span class="m">2</span>, <span class="m">3</span>, <span class="m">4</span>, <span class="m">5</span>, <span class="m">6</span>, <span class="m">7</span>, <span class="m">8</span><span class="o">]</span> >>> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><h4><b>3. 
Reversing lists using reversed</b></h4></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6 7</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span>>>> <span class="nv">nums</span> <span class="o">=</span> <span class="o">[</span><span class="m">1</span>,2,3,4,5,6,7,8<span class="o">]</span> >>> reversed<span class="o">(</span>nums<span class="o">)</span> <listreverseiterator object at 0x10fced990> >>> >>> <span class="o">[</span>n <span class="k">for</span> n in reversed<span class="o">(</span>nums<span class="o">)]</span> <span class="o">[</span><span class="m">8</span>, <span class="m">7</span>, <span class="m">6</span>, <span class="m">5</span>, <span class="m">4</span>, <span class="m">3</span>, <span class="m">2</span>, <span class="m">1</span><span class="o">]</span> >>> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><p>Let us look at each in detail to understand pros, cons and when to use a particular method.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="using-reverse-for-in-place-list-reversal""></a><a class="body-link" href="#using-reverse-for-in-place-list-reversal">Using reverse() for In-Place List Reversal</a></h2></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><table class="highlight-xcode responsive-table codehighlight table"><tr><td class="linenos"><div class="linenodiv"><pre>1 2 3 4 5 6</pre></div></td><td class="code"><div class="highlight-xcode responsive-table codehighlight "><pre><span></span>>>> <span class="nv">nums</span> <span class="o">=</span> <span class="o">[</span><span class="m">1</span>,2,3,4,5,6,7,8<span class="o">]</span> >>> type<span class="o">(</span>nums.reverse<span class="o">())</span> <<span class="nb">type</span> <span class="s1">'NoneType'</span>> >>> nums <span class="o">[</span><span class="m">8</span>, <span class="m">7</span>, <span class="m">6</span>, <span class="m">5</span>, <span class="m">4</span>, <span class="m">3</span>, <span class="m">2</span>, <span class="m">1</span><span class="o">]</span> >>> </pre></div> </td></tr></table></div> <div class="block-paragraph"><div class="rich-text"><h4><b>Time and Space Complexity of Python List reverse()</b></h4><p>The reverse() method works in O(n) time complexity and with O(1) space. Internally, when reverse() is called it operates by swapping i-th element with (n-i)th element. Therefore, the first element is replaced with the last element, the second element is replaced with the second last element and so on. Thus, a total of N/2 swap operations are required for list reversal. 
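<div class="block-paragraph"><div class="rich-text"><p>To make the N/2-swap argument concrete, here is a small pure-Python sketch of the same in-place swapping strategy (the built-in reverse() is implemented in C, so this is purely illustrative):</p></div></div> <div class="block-code"><div class='row codeblock-header'>Python</div><pre>
def reverse_in_place(nums):
    # Swap the i-th element with its mirror element from the end,
    # moving the two pointers towards the middle: N/2 swaps in total.
    left, right = 0, len(nums) - 1
    while left < right:
        nums[left], nums[right] = nums[right], nums[left]
        left += 1
        right -= 1

nums = [1, 2, 3, 4, 5, 6, 7, 8]
reverse_in_place(nums)
print(nums)   # [8, 7, 6, 5, 4, 3, 2, 1]
</pre></div>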
<div class="block-paragraph"><div class="rich-text"><h4><b>Pros of the reverse() method:</b></h4><ul><li>In-place: no extra memory is required.</li><li>Intuitive and easy to understand; it upholds code readability.</li></ul><h4><b>Cons of the reverse() method:</b></h4><ul><li>The order of elements in the original list is changed.</li></ul><h4><b>When to use the reverse() method?</b></h4><p>Scenarios where the order of elements in the original list may be altered and a low memory footprint is desired.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="using-slicing-for-python-list-reversal"></a><a class="body-link" href="#using-slicing-for-python-list-reversal">Using Slicing For Python List Reversal</a></h2></div></div> <div class="block-paragraph"><div class="rich-text"><p>Python lists can be reversed using the [::-1] slicing suffix. It creates and returns a new, reversed copy of the list without altering the actual list.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> nums = [1,2,3,4,5,6,7,8]
>>>
>>> nums_reversed = nums[::-1]
>>> nums_reversed
[8, 7, 6, 5, 4, 3, 2, 1]
>>> type(nums_reversed)
<type 'list'>
>>>
>>> nums
[1, 2, 3, 4, 5, 6, 7, 8]
>>>
</pre></div> <div class="block-paragraph"><div class="rich-text"><p>What does the [::-1] notation mean? The list slicing notation is [start:end:step]. Here start and end are left empty and step is -1, which means: walk through the list with a stride of negative one, i.e., in reverse order. With a negative step, the omitted start and end default to the last element and to one position before the first element, so the whole list is traversed from back to front.</p></div></div> <div class="block-paragraph"><div class="rich-text"><p>What are the pros and cons of using slicing for list reversal, and when should we prefer slicing over reverse() or reversed()?</p></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>Pros of list slicing for list reversal:</b></h4><ul><li>The original list is not altered. The order of elements in the original list is maintained before and after the slicing operation.</li></ul></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>Cons:</b></h4><ul><li>Takes extra space by creating a new list of the same size.</li><li>While the [::-1] notation is shorter, it is cryptic and requires more attention to understand compared to the English-word syntax of reverse() or reversed(). In short, not the best for code readability.</li></ul></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>When to use slicing for Python list reversal:</b></h4><ul><li>If it is a requirement to preserve the order of elements in the original list.</li><li>If it is fine to allocate extra memory for the copy of the list.</li></ul></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="using-reversed-for-python-list-reversal"></a><a class="body-link" href="#using-reversed-for-python-list-reversal">Using reversed() for Python list reversal</a></h2><p>Python lists can also be reversed using the built-in reversed() method. The reversed() method neither reverses the list in place nor creates a copy of the full list. It instead returns a list iterator (<i>listreverseiterator</i>) that generates the elements of the list in reverse order.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> nums = [1,2,3,4,5,6,7,8]
>>> reversed(nums)
<listreverseiterator object at 0x10fced990>
>>>
>>> [n for n in reversed(nums)]
[8, 7, 6, 5, 4, 3, 2, 1]
>>>
>>>
>>> def reverse_python_list(nums):
...     for num in reversed(nums):
...         yield num
...
>>>
>>> list(reverse_python_list([1,2,3,4,5,6,7,8]))
[8, 7, 6, 5, 4, 3, 2, 1]
>>>
</pre></div> <div class="block-paragraph"><div class="rich-text"><p>Note that calling reversed(<i>nums</i>) simply returns an iterator object. We can see in the example above that the <i>reverse_python_list</i> method, which simply wraps the reversed() method, does not modify the original list or create a copy of it.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>Pros of reversed() for list reversal:</b></h4><ul><li>No extra space is required.</li><li>The original list remains unchanged.</li><li>The syntax aids code readability.</li></ul><h4><b>Cons:</b></h4><ul><li>None really. Just that extra caution needs to be exercised with iterators: the returned iterator can be used only once (it gets exhausted after being looped over once). So, if the reversed sequence needs to be accessed multiple times, we need to create a copy of the list or call reversed() again, as demonstrated below.</li></ul></div></div>
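<div class="block-paragraph"><div class="rich-text"><p>The snippet below demonstrates that exhaustion: the second pass over the same iterator yields nothing, so reversed() has to be called again (or its result copied into a list) for repeated use.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> nums = [1, 2, 3, 4, 5, 6, 7, 8]
>>> rev = reversed(nums)
>>> list(rev)
[8, 7, 6, 5, 4, 3, 2, 1]
>>> list(rev)              # the iterator is now exhausted
[]
>>> list(reversed(nums))   # call reversed() again for another pass
[8, 7, 6, 5, 4, 3, 2, 1]
>>>
</pre></div>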
<div class="block-paragraph"><div class="rich-text"><h2><a id="common-list-reversal-problems"></a><a class="body-link" href="#common-list-reversal-problems">Common List Reversal Problems</a></h2><p>Let us take a look at a few other common Python list reversal problems.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h3><a id="how-to-reverse-a-list-in-python-using-for-loop"></a><a class="body-link" href="#how-to-reverse-a-list-in-python-using-for-loop">How to reverse a list in python using for loop ?</a></h3><p>To reverse a list of size n using a for loop, iterate over the indices from the (n-1)-th element down to the first element and yield each element.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> def reverse_list_using_for(nums):
...     # Traverse the indices [n-1, -1), i.e. in the opposite direction.
...     for i in range(len(nums)-1, -1, -1):
...         yield nums[i]
...
>>>
>>> print list(reverse_list_using_for([1,2,3,4,5,6,7]))
[7, 6, 5, 4, 3, 2, 1]
>>>
</pre></div> <div class="block-paragraph"><div class="rich-text"><h3><a id="how-to-reverse-python-list-using-recursion"></a><a class="body-link" href="#how-to-reverse-python-list-using-recursion">How to reverse python list using recursion ?</a></h3><p>To reverse a list using recursion, we define a method that returns the concatenation of two lists: the first containing just the last element (selected with the -1 index), and the second being the reverse of the rest of the list up to the last element (selected with :-1). The base condition is met when all the elements are exhausted and the list is empty, upon which we return an empty list. Note that each recursive call slices off a new copy of the list, so this approach uses extra memory and is limited by the recursion depth for long lists. Below is the working code.</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> def reverse_list_using_recursion(nums):
...     if not nums:
...         return []
...     return [nums[-1]] + reverse_list_using_recursion(nums[:-1])
...
>>>
>>> print reverse_list_using_recursion([1,2,3,4,5,6,7])
[7, 6, 5, 4, 3, 2, 1]
>>>
</pre></div> <div class="block-paragraph"><div class="rich-text"><h3><a id="how-to-reverse-partsubset-or-slice-of-python-list"></a><a class="body-link" href="#how-to-reverse-partsubset-or-slice-of-python-list">How to reverse part (subset or slice) of a Python list?</a></h3><p>To reverse a part of a list, the built-in reverse(), reversed() or slicing methods can be used on the subset identified by slicing. The following code shows all three approaches:</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> # Method 1: Using slicing
>>> nums[3:8][::-1]
[8, 7, 6, 5, 4]
>>>
>>> # Method 2: Using reverse()
>>> nums_subset = nums[3:8]
>>> nums_subset.reverse()
>>> nums_subset
[8, 7, 6, 5, 4]
>>>
>>> # Method 3: Using reversed()
>>> list(reversed(nums[3:8]))
[8, 7, 6, 5, 4]
>>>
</pre></div>
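<div class="block-paragraph"><div class="rich-text"><p>Note that all three approaches above reverse a copy of the slice and leave nums itself unchanged. If the goal is to reverse that segment inside the original list, slice assignment does it in place (a small addition of ours):</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> nums[3:8] = nums[3:8][::-1]   # reverse elements at positions 3..7 within nums
>>> nums
[1, 2, 3, 8, 7, 6, 5, 4, 9, 10]
>>>
</pre></div>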
<div class="block-paragraph"><div class="rich-text"><h3><a id="how-to-reverse-python-numpy-array"></a><a class="body-link" href="#how-to-reverse-python-numpy-array">How to reverse a Python Numpy Array?</a></h3><p>Numpy arrays can be reversed using the slicing technique (the [::-1] slice descriptor) or by using numpy’s flipud method. Note that both return a reversed view of the original array rather than a copy. The following code shows the usage of both:</p></div></div> <div class="block-code"><div class='row codeblock-header'>Bash</div><pre>
>>> np_array = np.array([1, 2, 3, 4, 5, 6])
>>> # Method 1: Using slicing
>>> np_array[::-1]
array([6, 5, 4, 3, 2, 1])
>>> type(np_array[::-1])
<type 'numpy.ndarray'>
>>>
>>> # Method 2: Using flipud
>>> np.flipud(np_array)
array([6, 5, 4, 3, 2, 1])
>>> type(np.flipud(np_array))
<type 'numpy.ndarray'>
>>>
</pre></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="summary"></a><a class="body-link" href="#summary">Summary</a></h2><p>In this tutorial we learnt the three techniques for Python list reversal, viz. reverse(), reversed() and slicing. We also looked at the pros and cons of each method.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h4><b>So, which is the best way to reverse a list in Python?</b></h4><p>The answer depends on the requirements. If the requirement is to maintain the order of elements in the original list, then reversed() or the slicing technique should be used. If the requirement is a minimal memory footprint, reverse() or reversed() are better suited. If both are required, i.e., a minimal memory footprint along with preserving the order of elements in the original list, reversed() should be used. In general, if there is no such preference, reverse() or reversed() can be preferred over the slicing technique as they aid code readability.</p></div></div>Web Scraping at scale using Python Multithreading2020-01-23T12:08:35.287220+00:002020-01-25T22:49:22.548566+00:00https://example.com/web-scraping-scale-using-python-multithreading/Foo<div class="block-heading">Web Scraping at scale using Python Multithreading</div>About2020-01-23T12:12:55.392826+00:002020-05-13T07:00:06.182066+00:00https://example.com/about/Foo<div class="block-two_columns"><div class="row"> <div class="col m6"> <section class="block-paragraph"> <div class="rich-text"><h2><a id="welcome-to-pysnacks"></a><a class="body-link" href="#welcome-to-pysnacks">Welcome to PySnacks!</a></h2><p><b>PySnacks brings quality Python tutorials on Data Structures, Machine Learning, Web and Backend development.</b></p><p></p><p>Hi There! My name is Kundan Kumar and I am the founder, publisher and the gatekeeper of PySnacks. 
I believe learning should never stop. I created PySnacks to share what I learn, with the hope that it may help others with similar interests.</p><p>I am a software engineer. I started in the software industry in 2011, and have worked with Samsung R&D, Ittiam Systems and LeadSift.</p><p>In 2017, I moved to Canada to pursue a Master's in Computer Science. I currently work at LeadSift in the field of data mining/machine learning, data pipelines and web application backends.</p><p></p><p>It would be my pleasure to get to know my readers, and I would love to add you to my <a href="https://www.linkedin.com/in/kundan-linkedin/" target="_blank" rel="noopener noreferrer">LinkedIn</a> network. You can also like and follow us on our social networks:</p><ol><li>PySnacks <a href="https://www.facebook.com/pysnacks/" target="_blank" rel="noopener noreferrer">Facebook Page</a></li><li>PySnacks <a href="https://twitter.com/pysnacks" target="_blank" rel="noopener noreferrer">Twitter</a></li></ol><p></p></div> </section> </div> <div class="col m6"> <section class="block-paragraph"> <div class="rich-text"><p></p><img alt="kundan-kumar" class="richtext-image full-width img-responsive lazyload" height="770" data-src="https://pysnacks-media.s3.amazonaws.com/images/WhatsApp_Image_2020-05-03_at_9.04.12_AM.width-1280.jpg" width="1024"><p>Me with my wife, Amalfi Coast, Italy, Feb-2019</p></div> </section> </div> </div> </div>Contact2020-01-23T12:14:04.676620+00:002020-01-23T12:14:33.079887+00:00https://example.com/contact/Foo<div class="block-heading">Contact</div>Privacy Policy2020-05-03T08:31:56.144946+00:002020-05-03T08:52:33.447702+00:00https://example.com/privacy-policy/Foo<div class="block-paragraph"><div class="rich-text"><h2><a id="welcome-to-our-privacy-policy""></a><a class="body-link" href="#welcome-to-our-privacy-policy">Welcome to our Privacy Policy</a></h2><h3><a id="your-privacy-is-important-to-us""></a><a class="body-link" href="#your-privacy-is-important-to-us">Your privacy is important to us.</a></h3><p>PySnacks is located at:</p><p>PySnacks, North End Halifax, B3K 5X5 - Nova Scotia, Canada</p><p>It is PySnacks's policy to respect your privacy regarding any information we may collect while operating our website. This Privacy Policy applies to <a href="https://www.pysnacks.com" target="_blank" rel="noopener noreferrer">https://www.pysnacks.com</a> (hereinafter, "us", "we", or "https://www.pysnacks.com"). We respect your privacy and are committed to protecting personally identifiable information you may provide us through the Website. We have adopted this privacy policy ("Privacy Policy") to explain what information may be collected on our Website, how we use this information, and under what circumstances we may disclose the information to third parties. This Privacy Policy applies only to information we collect through the Website and does not apply to our collection of information from other sources.</p><p>This Privacy Policy, together with the Terms and Conditions posted on our Website, sets forth the general rules and policies governing your use of our Website. 
Depending on your activities when visiting our Website, you may be required to agree to additional terms and conditions.</p></div></div> <div class="block-paragraph"><div class="rich-text"><h2><a id="website-visitors""></a><a class="body-link" href="#website-visitors">Website Visitors</a></h2><p>Like most website operators, PySnacks collects non-personally-identifying information of the sort that web browsers and servers typically make available, such as the browser type, language preference, referring site, and the date and time of each visitor request. PySnacks's purpose in collecting non-personally identifying information is to better understand how PySnacks's visitors use its website. From time to time, PySnacks may release non-personally-identifying information in the aggregate, e.g., by publishing a report on trends in the usage of its website.</p><p>PySnacks also collects potentially personally-identifying information like Internet Protocol (IP) addresses for logged in users and for users leaving comments on https://www.pysnacks.com blog posts. PySnacks only discloses logged in user and commenter IP addresses under the same circumstances that it uses and discloses personally-identifying information as described below.</p><h2><a id="gathering-of-personally-identifying-information""></a><a class="body-link" href="#gathering-of-personally-identifying-information">Gathering of Personally-Identifying Information</a></h2><p>Certain visitors to PySnacks's websites choose to interact with PySnacks in ways that require PySnacks to gather personally-identifying information. The amount and type of information that PySnacks gathers depends on the nature of the interaction. For example, we ask visitors who sign up for a blog at https://www.pysnacks.com to provide a username and email address.</p><h2><a id="security""></a><a class="body-link" href="#security">Security</a></h2><p>The security of your Personal Information is important to us, but remember that no method of transmission over the Internet, or method of electronic storage is 100% secure. While we strive to use commercially acceptable means to protect your Personal Information, we cannot guarantee its absolute security.</p><h2><a id="advertisements""></a><a class="body-link" href="#advertisements">Advertisements</a></h2><p>Ads appearing on our website may be delivered to users by advertising partners, who may set cookies. These cookies allow the ad server to recognize your computer each time they send you an online advertisement to compile information about you or others who use your computer. This information allows ad networks to, among other things, deliver targeted advertisements that they believe will be of most interest to you. This Privacy Policy covers the use of cookies by PySnacks and does not cover the use of cookies by any advertisers.</p><h2><a id="links-to-external-sites""></a><a class="body-link" href="#links-to-external-sites">Links To External Sites</a></h2><p>Our Service may contain links to external sites that are not operated by us. If you click on a third party link, you will be directed to that third party's site. 
We strongly advise you to review the Privacy Policy and terms and conditions of every site you visit.</p><p>We have no control over, and assume no responsibility for the content, privacy policies or practices of any third party sites, products or services.</p><p></p><h2><a id="httpswwwpysnackscom-uses-google-adwords-for-remarketing""></a><a class="body-link" href="#httpswwwpysnackscom-uses-google-adwords-for-remarketing">Https://www.pysnacks.com uses Google AdWords for remarketing</a></h2><p>Https://www.pysnacks.com uses the remarketing services to advertise on third party websites (including Google) to previous visitors to our site. It could mean that we advertise to previous visitors who haven't completed a task on our site, for example using the contact form to make an enquiry. This could be in the form of an advertisement on the Google search results page, or a site in the Google Display Network. Third-party vendors, including Google, use cookies to serve ads based on someone's past visits. Of course, any data collected will be used in accordance with our own privacy policy and Google's privacy policy.</p><p>You can set preferences for how Google advertises to you using the Google Ad Preferences page, and if you want to you can opt out of interest-based advertising entirely by cookie settings or permanently using a browser plugin.</p><h2><a id="protection-of-certain-personally-identifying-information""></a><a class="body-link" href="#protection-of-certain-personally-identifying-information">Protection of Certain Personally-Identifying Information</a></h2><p>PySnacks discloses potentially personally-identifying and personally-identifying information only to those of its employees, contractors and affiliated organizations that (i) need to know that information in order to process it on PySnacks's behalf or to provide services available at PySnacks's website, and (ii) that have agreed not to disclose it to others. Some of those employees, contractors and affiliated organizations may be located outside of your home country; by using PySnacks's website, you consent to the transfer of such information to them. PySnacks will not rent or sell potentially personally-identifying and personally-identifying information to anyone. Other than to its employees, contractors and affiliated organizations, as described above, PySnacks discloses potentially personally-identifying and personally-identifying information only in response to a subpoena, court order or other governmental request, or when PySnacks believes in good faith that disclosure is reasonably necessary to protect the property or rights of PySnacks, third parties or the public at large.</p><p>If you are a registered user of https://www.pysnacks.com and have supplied your email address, PySnacks may occasionally send you an email to tell you about new features, solicit your feedback, or just keep you up to date with what's going on with PySnacks and our products. We primarily use our blog to communicate this type of information, so we expect to keep this type of email to a minimum. If you send us a request (for example via a support email or via one of our feedback mechanisms), we reserve the right to publish it in order to help us clarify or respond to your request or to help us support other users. 
PySnacks takes all measures reasonably necessary to protect against the unauthorized access, use, alteration or destruction of potentially personally-identifying and personally-identifying information.</p><h2><a id="aggregated-statistics""></a><a class="body-link" href="#aggregated-statistics">Aggregated Statistics</a></h2><p>PySnacks may collect statistics about the behavior of visitors to its website. PySnacks may display this information publicly or provide it to others. However, PySnacks does not disclose your personally-identifying information.</p><h2><a id="cookies""></a><a class="body-link" href="#cookies">Cookies</a></h2><p>To enrich and perfect your online experience, PySnacks uses "Cookies", similar technologies and services provided by others to display personalized content, appropriate advertising and store your preferences on your computer.</p><p>A cookie is a string of information that a website stores on a visitor's computer, and that the visitor's browser provides to the website each time the visitor returns. PySnacks uses cookies to help PySnacks identify and track visitors, their usage of https://www.pysnacks.com, and their website access preferences. PySnacks visitors who do not wish to have cookies placed on their computers should set their browsers to refuse cookies before using PySnacks's websites, with the drawback that certain features of PySnacks's websites may not function properly without the aid of cookies.</p><p>By continuing to navigate our website without changing your cookie settings, you hereby acknowledge and agree to PySnacks's use of cookies.</p><h2><a id="privacy-policy-changes""></a><a class="body-link" href="#privacy-policy-changes">Privacy Policy Changes</a></h2><p>Although most changes are likely to be minor, PySnacks may change its Privacy Policy from time to time, and in PySnacks's sole discretion. PySnacks encourages visitors to frequently check this page for any changes to its Privacy Policy. Your continued use of this site after any change in this Privacy Policy will constitute your acceptance of such change.</p><p></p><h2><a id="contact-information""></a><a class="body-link" href="#contact-information">Contact Information</a></h2><p>If you have any questions about this Privacy Policy, please contact us via <a href="mailto:hello@pysnacks.com">hello@pysnacks.com</a></p></div></div>