I read your post with great interest. My concern is the same as during the talks by @Evanmassenhove and @_atoral at MTSummit: it seems that no error evaluation is done on the MT results. That means MT systems can produce apparent richness that is actually just wrong translations...
or made-up words that seem to add to richness but are actually unacceptable, incomprehensible, or plain wrong. Admittedly, this might not happen THAT frequently, but I believe it is important to take into account.
New conversation
I agree with @BramVanroy: unless you get professional human translators to look at and analyse the MT output (and to translate the references in the first place), there is no credibility. At a quick glance, I can tell you that I really doubt the Finnish MT, for example, is -
That is all fine, but I am not a fan of beliefs :) The scripts and data are there; people can prove or disprove it, but I would demand proof. The other question is whether the automatic measures of lexical diversity (LD) are worth anything. That would be a good point of attack, I think.
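To make the "attack point" concrete, here is a minimal sketch of one simple automatic LD measure, the type-token ratio (TTR): distinct tokens divided by total tokens. This is an illustrative assumption, not necessarily what the scripts in the repo compute; the lowercased whitespace tokenization and the example sentences are hypothetical. It shows how a made-up word can raise an LD score even though the output got worse.

```python
def type_token_ratio(text: str) -> float:
    """Type-token ratio: number of distinct tokens (types) divided by all tokens.

    Simplifying assumption for illustration: lowercased whitespace tokenization.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the blog post or the repo.
    fluent = "the cat sat on the mat"
    garbled = "the cat sat on blorf mat"  # a made-up word replaces the repeated "the"
    print(round(type_token_ratio(fluent), 3))   # 0.833
    print(round(type_token_ratio(garbled), 3))  # 1.0
```

The garbled sentence scores higher than the fluent one, which is exactly why error evaluation matters alongside LD numbers. Raw TTR is also sensitive to text length, which is one reason length-corrected measures such as MTLD exist.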
New conversation
This reminded me of work on GANs showing that GAN-generated images have low diversity: http://www.offconvex.org/2017/07/06/GANs3/ …
There's a lot of work showing that for generation from language models. However, MT is not the same as LMs: MT has access to the full source as a guide, which in my opinion changes everything. GANs and LMs used for generation are on their own when generating content.
New conversation
Also, here is the repo for full reproducibility: https://github.com/emjotde/diversity … Just run make -j
Corrections corner: I wrote that Vanmassenhove et al. (@Evanmassenhove, @DShterionov) used 100K training sentences. That's wrong; they used 1M sentences. Apologies for that. Unfortunately, I only saw @DShterionov's tweet about it now. It's corrected in the post as well.
Not yet. Just general data cleaning and trying to find ways to weed out horrible stuff from production data. LD seems like an interesting path, and to make sure I knew what I was doing, I computed a couple of numbers. Then, after looking at them: "Huh, that's odd. Let's blog!"