Document (#38866)

Author
Karpathy, A.
Title
The unreasonable effectiveness of recurrent neural networks
Source
https://karpathy.github.io/2015/05/21/rnn-effectiveness/
Year
2015
Abstract
There's something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I've in fact reached the opposite conclusion). Fast forward about a year: I'm training RNNs all the time and I've witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you. By the way, together with this post I am also releasing code on Github (https://github.com/karpathy/char-rnn) that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. You can also use it to reproduce my experiments below. But we're getting ahead of ourselves; What are RNNs anyway?

Similar documents (content)

  1. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.23
    0.23343934 = sum of:
      0.23343934 = product of:
        0.8337119 = sum of:
          0.010076349 = weight(abstract_txt:with in 2868) [ClassicSimilarity], result of:
            0.010076349 = score(doc=2868,freq=1.0), product of:
              0.05167069 = queryWeight, product of:
                1.0623134 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019485999 = queryNorm
              0.19501092 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.061870273 = weight(abstract_txt:networks in 2868) [ClassicSimilarity], result of:
            0.061870273 = score(doc=2868,freq=2.0), product of:
              0.10914658 = queryWeight, product of:
                1.0917435 = boost
                5.1305847 = idf(docFreq=713, maxDocs=44421)
                0.019485999 = queryNorm
              0.5668549 = fieldWeight in 2868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1305847 = idf(docFreq=713, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.029174373 = weight(abstract_txt:about in 2868) [ClassicSimilarity], result of:
            0.029174373 = score(doc=2868,freq=1.0), product of:
              0.09536673 = queryWeight, product of:
                1.2498549 = boost
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.019485999 = queryNorm
              0.3059177 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.021423684 = weight(abstract_txt:that in 2868) [ClassicSimilarity], result of:
            0.021423684 = score(doc=2868,freq=4.0), product of:
              0.05797689 = queryWeight, product of:
                1.2580937 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019485999 = queryNorm
              0.3695211 = fieldWeight in 2868, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.07096061 = weight(abstract_txt:generate in 2868) [ClassicSimilarity], result of:
            0.07096061 = score(doc=2868,freq=1.0), product of:
              0.15067586 = queryWeight, product of:
                1.2827363 = boost
                6.0281444 = idf(docFreq=290, maxDocs=44421)
                0.019485999 = queryNorm
              0.4709488 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0281444 = idf(docFreq=290, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.18679874 = weight(abstract_txt:neural in 2868) [ClassicSimilarity], result of:
            0.18679874 = score(doc=2868,freq=3.0), product of:
              0.19917783 = queryWeight, product of:
                1.4748099 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.019485999 = queryNorm
              0.93784904 = fieldWeight in 2868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.45340788 = weight(abstract_txt:recurrent in 2868) [ClassicSimilarity], result of:
            0.45340788 = score(doc=2868,freq=2.0), product of:
              0.47138807 = queryWeight, product of:
                2.7787564 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.019485999 = queryNorm
              0.9618569 = fieldWeight in 2868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
        0.28 = coord(7/25)
    
  2. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention is all you need (2017) 0.19
    0.1925091 = sum of:
      0.1925091 = product of:
        0.60159093 = sum of:
          0.011400087 = weight(abstract_txt:with in 1972) [ClassicSimilarity], result of:
            0.011400087 = score(doc=1972,freq=2.0), product of:
              0.05167069 = queryWeight, product of:
                1.0623134 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019485999 = queryNorm
              0.22062966 = fieldWeight in 1972, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.05969408 = weight(abstract_txt:training in 1972) [ClassicSimilarity], result of:
            0.05969408 = score(doc=1972,freq=3.0), product of:
              0.10803203 = queryWeight, product of:
                1.086155 = boost
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.019485999 = queryNorm
              0.5525591 = fieldWeight in 1972, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.03499911 = weight(abstract_txt:networks in 1972) [ClassicSimilarity], result of:
            0.03499911 = score(doc=1972,freq=1.0), product of:
              0.10914658 = queryWeight, product of:
                1.0917435 = boost
                5.1305847 = idf(docFreq=713, maxDocs=44421)
                0.019485999 = queryNorm
              0.32066154 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1305847 = idf(docFreq=713, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.008569474 = weight(abstract_txt:that in 1972) [ClassicSimilarity], result of:
            0.008569474 = score(doc=1972,freq=1.0), product of:
              0.05797689 = queryWeight, product of:
                1.2580937 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019485999 = queryNorm
              0.14780845 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.02775783 = weight(abstract_txt:time in 1972) [ClassicSimilarity], result of:
            0.02775783 = score(doc=1972,freq=1.0), product of:
              0.10705154 = queryWeight, product of:
                1.3242123 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.019485999 = queryNorm
              0.2592941 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.08627864 = weight(abstract_txt:neural in 1972) [ClassicSimilarity], result of:
            0.08627864 = score(doc=1972,freq=1.0), product of:
              0.19917783 = queryWeight, product of:
                1.4748099 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.019485999 = queryNorm
              0.43317392 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.116405465 = weight(abstract_txt:train in 1972) [ClassicSimilarity], result of:
            0.116405465 = score(doc=1972,freq=1.0), product of:
              0.24319485 = queryWeight, product of:
                1.6296439 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019485999 = queryNorm
              0.47865102 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
          0.25648624 = weight(abstract_txt:recurrent in 1972) [ClassicSimilarity], result of:
            0.25648624 = score(doc=1972,freq=1.0), product of:
              0.47138807 = queryWeight, product of:
                2.7787564 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.019485999 = queryNorm
              0.54410845 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=1972)
        0.32 = coord(8/25)
    
  3. Haynes, M.: Your Google algorithm cheat sheet : Panda, Penguin, and Hummingbird (2013) 0.18
    0.18080154 = sum of:
      0.18080154 = product of:
        0.64571977 = sum of:
          0.009026315 = weight(abstract_txt:this in 3542) [ClassicSimilarity], result of:
            0.009026315 = score(doc=3542,freq=1.0), product of:
              0.04801561 = queryWeight, product of:
                1.0240514 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.019485999 = queryNorm
              0.18798709 = fieldWeight in 3542, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.010076349 = weight(abstract_txt:with in 3542) [ClassicSimilarity], result of:
            0.010076349 = score(doc=3542,freq=1.0), product of:
              0.05167069 = queryWeight, product of:
                1.0623134 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019485999 = queryNorm
              0.19501092 = fieldWeight in 3542, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.04462938 = weight(abstract_txt:still in 3542) [ClassicSimilarity], result of:
            0.04462938 = score(doc=3542,freq=1.0), product of:
              0.11060617 = queryWeight, product of:
                1.099019 = boost
                5.164776 = idf(docFreq=689, maxDocs=44421)
                0.019485999 = queryNorm
              0.4034981 = fieldWeight in 3542, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.164776 = idf(docFreq=689, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.05053149 = weight(abstract_txt:about in 3542) [ClassicSimilarity], result of:
            0.05053149 = score(doc=3542,freq=3.0), product of:
              0.09536673 = queryWeight, product of:
                1.2498549 = boost
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.019485999 = queryNorm
              0.52986497 = fieldWeight in 3542, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.021423684 = weight(abstract_txt:that in 3542) [ClassicSimilarity], result of:
            0.021423684 = score(doc=3542,freq=4.0), product of:
              0.05797689 = queryWeight, product of:
                1.2580937 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019485999 = queryNorm
              0.3695211 = fieldWeight in 3542, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.075439416 = weight(abstract_txt:post in 3542) [ClassicSimilarity], result of:
            0.075439416 = score(doc=3542,freq=1.0), product of:
              0.15695108 = queryWeight, product of:
                1.309175 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.019485999 = queryNorm
              0.48065558 = fieldWeight in 3542, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.078125 = fieldNorm(doc=3542)
          0.43459317 = weight(title_txt:your in 3542) [ClassicSimilarity], result of:
            0.43459317 = score(doc=3542,freq=1.0), product of:
              0.20016158 = queryWeight, product of:
                1.4784474 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.019485999 = queryNorm
              2.1712117 = fieldWeight in 3542, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.3125 = fieldNorm(doc=3542)
        0.28 = coord(7/25)
    
  4. Kelley, D.: Relevance feedback : getting to know your user (2008) 0.15
    0.150788 = sum of:
      0.150788 = product of:
        0.6282833 = sum of:
          0.012507228 = weight(abstract_txt:this in 2924) [ClassicSimilarity], result of:
            0.012507228 = score(doc=2924,freq=3.0), product of:
              0.04801561 = queryWeight, product of:
                1.0240514 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.019485999 = queryNorm
              0.26048255 = fieldWeight in 2924, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=2924)
          0.013962199 = weight(abstract_txt:with in 2924) [ClassicSimilarity], result of:
            0.013962199 = score(doc=2924,freq=3.0), product of:
              0.05167069 = queryWeight, product of:
                1.0623134 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019485999 = queryNorm
              0.27021506 = fieldWeight in 2924, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=2924)
          0.040425193 = weight(abstract_txt:about in 2924) [ClassicSimilarity], result of:
            0.040425193 = score(doc=2924,freq=3.0), product of:
              0.09536673 = queryWeight, product of:
                1.2498549 = boost
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.019485999 = queryNorm
              0.423892 = fieldWeight in 2924, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.0625 = fieldNorm(doc=2924)
          0.012119067 = weight(abstract_txt:that in 2924) [ClassicSimilarity], result of:
            0.012119067 = score(doc=2924,freq=2.0), product of:
              0.05797689 = queryWeight, product of:
                1.2580937 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019485999 = queryNorm
              0.20903271 = fieldWeight in 2924, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2924)
          0.02775783 = weight(abstract_txt:time in 2924) [ClassicSimilarity], result of:
            0.02775783 = score(doc=2924,freq=1.0), product of:
              0.10705154 = queryWeight, product of:
                1.3242123 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.019485999 = queryNorm
              0.2592941 = fieldWeight in 2924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=2924)
          0.5215118 = weight(title_txt:your in 2924) [ClassicSimilarity], result of:
            0.5215118 = score(doc=2924,freq=1.0), product of:
              0.20016158 = queryWeight, product of:
                1.4784474 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.019485999 = queryNorm
              2.605454 = fieldWeight in 2924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.375 = fieldNorm(doc=2924)
        0.24 = coord(6/25)
    
  5. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.15
    0.14996825 = sum of:
      0.14996825 = product of:
        0.5356009 = sum of:
          0.014442103 = weight(abstract_txt:this in 675) [ClassicSimilarity], result of:
            0.014442103 = score(doc=675,freq=4.0), product of:
              0.04801561 = queryWeight, product of:
                1.0240514 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.019485999 = queryNorm
              0.30077934 = fieldWeight in 675, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.011400087 = weight(abstract_txt:with in 675) [ClassicSimilarity], result of:
            0.011400087 = score(doc=675,freq=2.0), product of:
              0.05167069 = queryWeight, product of:
                1.0623134 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019485999 = queryNorm
              0.22062966 = fieldWeight in 675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.014842764 = weight(abstract_txt:that in 675) [ClassicSimilarity], result of:
            0.014842764 = score(doc=675,freq=3.0), product of:
              0.05797689 = queryWeight, product of:
                1.2580937 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019485999 = queryNorm
              0.25601172 = fieldWeight in 675, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.02775783 = weight(abstract_txt:time in 675) [ClassicSimilarity], result of:
            0.02775783 = score(doc=675,freq=1.0), product of:
              0.10705154 = queryWeight, product of:
                1.3242123 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.019485999 = queryNorm
              0.2592941 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.12439321 = weight(abstract_txt:character in 675) [ClassicSimilarity], result of:
            0.12439321 = score(doc=675,freq=3.0), product of:
              0.17625016 = queryWeight, product of:
                1.3873316 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.019485999 = queryNorm
              0.70577645 = fieldWeight in 675, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.08627864 = weight(abstract_txt:neural in 675) [ClassicSimilarity], result of:
            0.08627864 = score(doc=675,freq=1.0), product of:
              0.19917783 = queryWeight, product of:
                1.4748099 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.019485999 = queryNorm
              0.43317392 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.25648624 = weight(abstract_txt:recurrent in 675) [ClassicSimilarity], result of:
            0.25648624 = score(doc=675,freq=1.0), product of:
              0.47138807 = queryWeight, product of:
                2.7787564 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.019485999 = queryNorm
              0.54410845 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
        0.28 = coord(7/25)
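
The score trees above follow the arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term score is queryWeight times fieldWeight, and the document score is the sum of term scores scaled by the coord factor. As a minimal sketch, the Python below recomputes the "recurrent" term score and the total for entry 1 (doc 2868) from the numbers listed in its tree. The function names are illustrative and not part of any Lucene API; boost, queryNorm, and fieldNorm are copied from the listing rather than derived.

```python
import math

def tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the abstract_txt:recurrent branch of entry 1 (doc 2868).
recurrent = term_score(
    freq=2.0,
    doc_freq=19,
    max_docs=44421,
    boost=2.7787564,
    query_norm=0.019485999,
    field_norm=0.078125,
)

# Per-term scores for doc 2868 as listed: with, networks, about, that,
# generate, neural, recurrent.
term_scores_2868 = [
    0.010076349,
    0.061870273,
    0.029174373,
    0.021423684,
    0.07096061,
    0.18679874,
    0.45340788,
]

# coord(7/25): 7 of the 25 query terms matched this document.
total = sum(term_scores_2868) * (7 / 25)

print(recurrent, total)
```

Up to floating-point rounding (Lucene computes in 32-bit floats), the two printed values should agree with the 0.45340788 and 0.23343934 shown for entry 1. The same functions can be applied to any other branch by substituting its freq, docFreq, boost, and fieldNorm.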