Document (#38872)

Author
Kiros, R.
Salakhutdinov, R.
Zemel, R.S.
Title
Unifying visual-semantic embeddings with multimodal neural language models
Source
http://arxiv.org/pdf/1411.2539v1.pdf
Year
2014
Abstract
Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a) a multimodal joint embedding space with images and text and (b) a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore, we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g. *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
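The vector-arithmetic regularity mentioned in the abstract can be illustrated with a toy sketch. The three-dimensional vectors, the dictionary keys, and the helper names below are hypothetical and chosen by hand for illustration; they are not taken from the paper, whose embeddings are learned and much higher-dimensional.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical hand-made embeddings; axes are roughly [car, blue, red].
word_vecs = {
    "blue": [0.0, 1.0, 0.0],
    "red": [0.0, 0.0, 1.0],
}
image_vecs = {
    "blue_car": [1.0, 0.9, 0.1],
    "red_car": [1.0, 0.1, 0.9],
    "blue_sky": [0.0, 1.0, 0.0],
}

# query = image of a blue car - "blue" + "red"
query = [
    img - blue + red
    for img, blue, red in zip(
        image_vecs["blue_car"], word_vecs["blue"], word_vecs["red"]
    )
]

# Rank all images by cosine similarity to the query vector.
ranked = sorted(
    image_vecs, key=lambda name: cosine(query, image_vecs[name]), reverse=True
)
nearest = ranked[0]
print(nearest)
```

With these toy vectors the nearest image is the red car, which mirrors the behaviour the abstract describes for the learned space.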
Content
See also: https://news.ycombinator.com/item?id=8621658.
Theme
Automatic indexing
Form
Images

Similar documents (content)

  1. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.37
    0.37400562 = sum of:
      0.37400562 = product of:
        1.0389044 = sum of:
          0.007110103 = weight(abstract_txt:with in 2868) [ClassicSimilarity], result of:
            0.007110103 = score(doc=2868,freq=1.0), product of:
              0.036460027 = queryWeight, product of:
                1.0731962 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.013610341 = queryNorm
              0.19501092 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.061491117 = weight(abstract_txt:image in 2868) [ClassicSimilarity], result of:
            0.061491117 = score(doc=2868,freq=3.0), product of:
              0.084539086 = queryWeight, product of:
                1.1555384 = boost
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.013610341 = queryNorm
              0.72736907 = fieldWeight in 2868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.054414406 = weight(abstract_txt:novel in 2868) [ClassicSimilarity], result of:
            0.054414406 = score(doc=2868,freq=2.0), product of:
              0.08919811 = queryWeight, product of:
                1.1869528 = boost
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.013610341 = queryNorm
              0.6100399 = fieldWeight in 2868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.110925674 = weight(abstract_txt:sentences in 2868) [ClassicSimilarity], result of:
            0.110925674 = score(doc=2868,freq=2.0), product of:
              0.14340615 = queryWeight, product of:
                1.5050105 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.013610341 = queryNorm
              0.77350706 = fieldWeight in 2868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.033173095 = weight(abstract_txt:language in 2868) [ClassicSimilarity], result of:
            0.033173095 = score(doc=2868,freq=1.0), product of:
              0.10180217 = queryWeight, product of:
                1.7932843 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.013610341 = queryNorm
              0.3258584 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.19771427 = weight(abstract_txt:neural in 2868) [ClassicSimilarity], result of:
            0.19771427 = score(doc=2868,freq=3.0), product of:
              0.21081673 = queryWeight, product of:
                2.2348778 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.013610341 = queryNorm
              0.93784904 = fieldWeight in 2868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.07313589 = weight(abstract_txt:images in 2868) [ClassicSimilarity], result of:
            0.07313589 = score(doc=2868,freq=1.0), product of:
              0.17244612 = queryWeight, product of:
                2.333981 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.013610341 = queryNorm
              0.42410865 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.16594283 = weight(abstract_txt:embedding in 2868) [ClassicSimilarity], result of:
            0.16594283 = score(doc=2868,freq=1.0), product of:
              0.27053645 = queryWeight, product of:
                2.531711 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.013610341 = queryNorm
              0.61338437 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
          0.33499694 = weight(abstract_txt:multimodal in 2868) [ClassicSimilarity], result of:
            0.33499694 = score(doc=2868,freq=1.0), product of:
              0.5123463 = queryWeight, product of:
                4.4978757 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.013610341 = queryNorm
              0.65384865 = fieldWeight in 2868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=2868)
        0.36 = coord(9/25)
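
Each term entry in the explanation above is a product of a query weight and a field weight. As a minimal sketch, the figures listed for the term "multimodal" in document 2868 can be recombined as follows; the function and parameter names are hypothetical, and the relation tf = sqrt(freq) is inferred from the listed tf/freq pairs.

```python
import math

def term_score(boost, idf, query_norm, freq, field_norm):
    """Recombine the factors shown in one term block of the explanation."""
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # tf is the square root of the raw term frequency (1.0 -> 1.0, 3.0 -> 1.7320508)
    tf = math.sqrt(freq)
    # fieldWeight = tf * idf * fieldNorm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Values copied from the "multimodal in 2868" block.
score = term_score(
    boost=4.4978757,
    idf=8.369263,
    query_norm=0.013610341,
    freq=1.0,
    field_norm=0.078125,
)
print(score)
```

The result should agree with the listed 0.33499694 up to single-precision rounding, since the listed factors appear to be 32-bit floats.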
    
  2. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.32
    0.3207887 = sum of:
      0.3207887 = product of:
        1.0024648 = sum of:
          0.007110103 = weight(abstract_txt:with in 2557) [ClassicSimilarity], result of:
            0.007110103 = score(doc=2557,freq=1.0), product of:
              0.036460027 = queryWeight, product of:
                1.0731962 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.013610341 = queryNorm
              0.19501092 = fieldWeight in 2557, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.050207287 = weight(abstract_txt:image in 2557) [ClassicSimilarity], result of:
            0.050207287 = score(doc=2557,freq=2.0), product of:
              0.084539086 = queryWeight, product of:
                1.1555384 = boost
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.013610341 = queryNorm
              0.59389436 = fieldWeight in 2557, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.038476795 = weight(abstract_txt:novel in 2557) [ClassicSimilarity], result of:
            0.038476795 = score(doc=2557,freq=1.0), product of:
              0.08919811 = queryWeight, product of:
                1.1869528 = boost
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.013610341 = queryNorm
              0.43136334 = fieldWeight in 2557, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.033879556 = weight(abstract_txt:models in 2557) [ClassicSimilarity], result of:
            0.033879556 = score(doc=2557,freq=1.0), product of:
              0.09380197 = queryWeight, product of:
                1.4907583 = boost
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.013610341 = queryNorm
              0.36118174 = fieldWeight in 2557, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.110925674 = weight(abstract_txt:sentences in 2557) [ClassicSimilarity], result of:
            0.110925674 = score(doc=2557,freq=2.0), product of:
              0.14340615 = queryWeight, product of:
                1.5050105 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.013610341 = queryNorm
              0.77350706 = fieldWeight in 2557, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.16143303 = weight(abstract_txt:neural in 2557) [ClassicSimilarity], result of:
            0.16143303 = score(doc=2557,freq=2.0), product of:
              0.21081673 = queryWeight, product of:
                2.2348778 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.013610341 = queryNorm
              0.7657505 = fieldWeight in 2557, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.12667507 = weight(abstract_txt:images in 2557) [ClassicSimilarity], result of:
            0.12667507 = score(doc=2557,freq=3.0), product of:
              0.17244612 = queryWeight, product of:
                2.333981 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.013610341 = queryNorm
              0.7345777 = fieldWeight in 2557, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
          0.4737572 = weight(abstract_txt:multimodal in 2557) [ClassicSimilarity], result of:
            0.4737572 = score(doc=2557,freq=2.0), product of:
              0.5123463 = queryWeight, product of:
                4.4978757 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.013610341 = queryNorm
              0.92468154 = fieldWeight in 2557, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=2557)
        0.32 = coord(8/25)
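
The idf values in these explanations can be reproduced from the docFreq and maxDocs figures printed next to them. The sketch below assumes the formula 1 + ln(maxDocs / (docFreq + 1)), which matches the listed numbers; the function name is hypothetical.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# (docFreq, maxDocs) pairs taken from the listing above.
idf_multimodal = classic_idf(27, 44421)   # listed as 8.369263
idf_neural = classic_idf(117, 44421)      # listed as 6.930783
idf_with = classic_idf(9949, 44421)       # listed as 2.4961398
print(idf_multimodal, idf_neural, idf_with)
```

Rare terms such as "multimodal" (27 of 44421 documents) therefore weigh far more than common ones such as "with" (9949 documents).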
    
  3. Névéol, A.; Deserno, T.M.; Darmoni, S.J.; Güld, M.O.; Aronson, A.R.: Natural language processing versus content-based image analysis for medical document retrieval (2009) 0.17
    0.17360595 = sum of:
      0.17360595 = product of:
        0.72335815 = sum of:
          0.011331044 = weight(abstract_txt:using in 3702) [ClassicSimilarity], result of:
            0.011331044 = score(doc=3702,freq=1.0), product of:
              0.052445322 = queryWeight, product of:
                1.1146914 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.013610341 = queryNorm
              0.21605442 = fieldWeight in 3702, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
          0.08520459 = weight(abstract_txt:image in 3702) [ClassicSimilarity], result of:
            0.08520459 = score(doc=3702,freq=9.0), product of:
              0.084539086 = queryWeight, product of:
                1.1555384 = boost
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.013610341 = queryNorm
              1.0078721 = fieldWeight in 3702, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.375318 = idf(docFreq=558, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
          0.053354803 = weight(abstract_txt:joint in 3702) [ClassicSimilarity], result of:
            0.053354803 = score(doc=3702,freq=1.0), product of:
              0.12870994 = queryWeight, product of:
                1.42581 = boost
                6.6325636 = idf(docFreq=158, maxDocs=44421)
                0.013610341 = queryNorm
              0.41453522 = fieldWeight in 3702, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6325636 = idf(docFreq=158, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
          0.026538474 = weight(abstract_txt:language in 3702) [ClassicSimilarity], result of:
            0.026538474 = score(doc=3702,freq=1.0), product of:
              0.10180217 = queryWeight, product of:
                1.7932843 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.013610341 = queryNorm
              0.26068673 = fieldWeight in 3702, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
          0.082743816 = weight(abstract_txt:images in 3702) [ClassicSimilarity], result of:
            0.082743816 = score(doc=3702,freq=2.0), product of:
              0.17244612 = queryWeight, product of:
                2.333981 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.013610341 = queryNorm
              0.47982416 = fieldWeight in 3702, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
          0.4641854 = weight(abstract_txt:multimodal in 3702) [ClassicSimilarity], result of:
            0.4641854 = score(doc=3702,freq=3.0), product of:
              0.5123463 = queryWeight, product of:
                4.4978757 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.013610341 = queryNorm
              0.90599924 = fieldWeight in 3702, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.0625 = fieldNorm(doc=3702)
        0.24 = coord(6/25)
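
The document-level score is the sum of the matching term scores scaled by the coordination factor, i.e. the fraction of query terms that matched. A minimal sketch using the six term scores listed for document 3702 (variable names are hypothetical):

```python
# Term scores copied from the explanation for document 3702.
term_scores = [
    0.011331044,  # using
    0.08520459,   # image
    0.053354803,  # joint
    0.026538474,  # language
    0.082743816,  # images
    0.4641854,    # multimodal
]

matched_terms = 6   # query terms found in the document
query_terms = 25    # total terms in the query

# coord(6/25) = 0.24
coord = matched_terms / query_terms
doc_score = sum(term_scores) * coord
print(sum(term_scores), coord, doc_score)
```

The sum should come out close to the listed 0.72335815 and the final score close to 0.17360595, again up to single-precision rounding.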
    
  4. Li, W.; Zheng, Y.; Zhan, Y.; Feng, R.; Zhang, T.; Fan, W.: Cross-modal retrieval with dual multi-angle self-attention (2021) 0.14
    0.14356627 = sum of:
      0.14356627 = product of:
        0.5981928 = sum of:
          0.030781439 = weight(abstract_txt:novel in 1068) [ClassicSimilarity], result of:
            0.030781439 = score(doc=1068,freq=1.0), product of:
              0.08919811 = queryWeight, product of:
                1.1869528 = boost
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.013610341 = queryNorm
              0.3450907 = fieldWeight in 1068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.521451 = idf(docFreq=482, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
          0.026538474 = weight(abstract_txt:language in 1068) [ClassicSimilarity], result of:
            0.026538474 = score(doc=1068,freq=1.0), product of:
              0.10180217 = queryWeight, product of:
                1.7932843 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.013610341 = queryNorm
              0.26068673 = fieldWeight in 1068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
          0.057377245 = weight(abstract_txt:space in 1068) [ClassicSimilarity], result of:
            0.057377245 = score(doc=1068,freq=1.0), product of:
              0.17021567 = queryWeight, product of:
                2.318838 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.013610341 = queryNorm
              0.33708557 = fieldWeight in 1068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
          0.082743816 = weight(abstract_txt:images in 1068) [ClassicSimilarity], result of:
            0.082743816 = score(doc=1068,freq=2.0), product of:
              0.17244612 = queryWeight, product of:
                2.333981 = boost
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.013610341 = queryNorm
              0.47982416 = fieldWeight in 1068, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.428591 = idf(docFreq=529, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
          0.13275427 = weight(abstract_txt:embedding in 1068) [ClassicSimilarity], result of:
            0.13275427 = score(doc=1068,freq=1.0), product of:
              0.27053645 = queryWeight, product of:
                2.531711 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.013610341 = queryNorm
              0.4907075 = fieldWeight in 1068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
          0.26799756 = weight(abstract_txt:multimodal in 1068) [ClassicSimilarity], result of:
            0.26799756 = score(doc=1068,freq=1.0), product of:
              0.5123463 = queryWeight, product of:
                4.4978757 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.013610341 = queryNorm
              0.5230789 = fieldWeight in 1068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.0625 = fieldNorm(doc=1068)
        0.24 = coord(6/25)
    
  5. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.14
    0.1387787 = sum of:
      0.1387787 = product of:
        0.8673669 = sum of:
          0.06269653 = weight(abstract_txt:introduce in 1774) [ClassicSimilarity], result of:
            0.06269653 = score(doc=1774,freq=1.0), product of:
              0.10937832 = queryWeight, product of:
                1.3143809 = boost
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.013610341 = queryNorm
              0.57320803 = fieldWeight in 1774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.09375 = fieldNorm(doc=1774)
          0.34556928 = weight(abstract_txt:decoder in 1774) [ClassicSimilarity], result of:
            0.34556928 = score(doc=1774,freq=2.0), product of:
              0.27088335 = queryWeight, product of:
                2.0684583 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.013610341 = queryNorm
              1.2757125 = fieldWeight in 1774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=1774)
          0.13698046 = weight(abstract_txt:neural in 1774) [ClassicSimilarity], result of:
            0.13698046 = score(doc=1774,freq=1.0), product of:
              0.21081673 = queryWeight, product of:
                2.2348778 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.013610341 = queryNorm
              0.6497609 = fieldWeight in 1774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.09375 = fieldNorm(doc=1774)
          0.32212064 = weight(abstract_txt:encoder in 1774) [ClassicSimilarity], result of:
            0.32212064 = score(doc=1774,freq=1.0), product of:
              0.3728021 = queryWeight, product of:
                2.9719427 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013610341 = queryNorm
              0.86405265 = fieldWeight in 1774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.09375 = fieldNorm(doc=1774)
        0.16 = coord(4/25)