Zijian Wang
@zijianwang30
Applied Scientist at AWS AI Labs. LLM for code and more. Past: M.S. at & . #NLProc
Palo Alto, CA · nlp.stanford.edu/~zijwang · Joined August 2017

Zijian Wang’s Tweets

📢We have extended the deadline to *February 10th 11:59 PM AoE* Call for papers: dl4c.github.io/callforpapers/ Submission Link: openreview.net/group?id=ICLR.
Quote Tweet
📢 The deadline to submit to the DL4C Workshop is less than a month away! Link: openreview.net/group?id=ICLR. Submit by February 3rd
Excited to co-organize the 2nd workshop w/ . We are looking for contributions (due Feb. 3) and reviewers. See dl4c.github.io/callforpapers/ for details. Workshop sponsored by
Quote Tweet
Excited to share that the second installment of the Deep Learning for Code workshop is back at ICLR 2023! Consider submitting your work by Feb 3rd, 2023. Details: dl4c.github.io/callforpapers/ @tscholak @GOrlanski @DL4Code
Thrilled to share MBXP, our new execution-based multilingual code generation benchmark ⬇️
Quote Tweet
We present multi-lingual code evaluation benchmarks MBXP and multi-lingual HumanEval, available in 10+ programming languages. With these datasets, we evaluate many facets of language models' code generation abilities (arxiv.org/abs/2210.14868) @AmazonScience
Summary: Our paper proposes a language framework for execution-based code evaluation datasets that converts prompts and test cases. Using this framework, we converted the Python datasets MBPP and HumanEval to 10+ additional programming languages. To synthesize canonical solutions, we use code language models to generate many versions of function bodies and keep only those that pass the converted tests in the respective language. These multi-lingual execution-based datasets support many code evaluation purposes, for instance, evaluating code translation, code insertion, code summarization, and robustness to prompt perturbation.
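A minimal sketch (not the released MBXP code) of the execution-based filtering step described above: sample many candidate function bodies from a code LM, run each candidate against the converted test cases in the target language, and keep only the ones that pass. The sample_bodies helper and run_cmd argument are hypothetical placeholders.

import os
import subprocess
import tempfile

def passes_tests(program: str, run_cmd: list, timeout_s: int = 10) -> bool:
    """Write a candidate program (prompt + generated body + converted tests)
    to a temp file and check that it runs to completion without errors."""
    with tempfile.NamedTemporaryFile("w", suffix=".src", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(run_cmd + [path], capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)

def synthesize_canonical_solutions(prompt, tests, sample_bodies, run_cmd, n_samples=100):
    """Sample many function bodies from a code LM and keep only those that
    pass the converted test cases in the target language."""
    kept = []
    for body in sample_bodies(prompt, n_samples):  # sample_bodies wraps a code LM (hypothetical)
        program = prompt + body + "\n\n" + tests
        if passes_tests(program, run_cmd):
            kept.append(body)
    return kept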
We hope that this work paves the way to large-scale studies of how readers form judgments of guilt in crime reporting, and encourages the development of systems that provide guidance on the presentation of these reports. 7/7
We showed that SuspectGuilt can form a basis for building tools that predict guilt. We trained BERT models and showed that they were improved by genre-specific pretraining (w/ 470k+ unlabeled crime news articles) and token-level supervision from the highlighting. 6/7
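A rough sketch, using Hugging Face Transformers, of the kind of two-headed BERT model the tweet describes: one head regresses the story-level guilt rating while a token-level head is supervised by the annotator highlights. The layer sizes, loss weighting, and head design are assumptions for illustration, not the paper's exact setup.

import torch.nn as nn
from transformers import BertModel

class GuiltModel(nn.Module):
    """BERT encoder with a story-level rating head and a token-level
    highlight head (supervised by annotator highlighting)."""
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.rating_head = nn.Linear(hidden, 1)     # guilt-rating regression
        self.highlight_head = nn.Linear(hidden, 2)  # per-token highlighted / not

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        rating = self.rating_head(out.pooler_output).squeeze(-1)
        highlight_logits = self.highlight_head(out.last_hidden_state)
        return rating, highlight_logits

def combined_loss(rating, highlight_logits, rating_target, highlight_labels, alpha=0.5):
    # Story-level regression loss plus token-level highlight supervision.
    reg = nn.functional.mse_loss(rating, rating_target)
    tok = nn.functional.cross_entropy(
        highlight_logits.view(-1, 2), highlight_labels.view(-1), ignore_index=-100)
    return reg + alpha * tok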
However, the number of *highlighted* hedges is a significant predictor of the author belief rating, suggesting that the less likely annotators thought the author believed the suspect was guilty, the more hedges they highlighted to justify their rating. 5/7
SuspectGuilt provides a rich picture of how linguistic choices affect subjective guilt judgments. For example, looking at the rating vs the number of hedges, we found that the number of hedges has no effect on the guilt assessments. 4/7
Annotators were asked to answer two questions and highlight in the story why they gave their response: 1. Reader perception: How likely is it that the main suspect(s) is (are) guilty? 2. Author belief: How much does the author believe that the main suspect(s) is (are) guilty? 3/7
Crime reporting has the power to shape public perceptions and social policies. How does the language of such reports act on readers? To address this question, we created SuspectGuilt, a corpus of 1800+ annotated crime stories from 1000+ local newspapers in the U.S. 2/7
The NLP community has made great progress on open-domain question answering, but our systems still struggle to answer complex questions using lots of text. Read our latest blog post, by , about enabling multi-step reasoning in these systems!
Check out our #emnlp2019 paper “Answering Complex Open-domain Questions Through Iterative Query Generation” at arxiv.org/pdf/1910.07000
Quote Tweet
"What's the Aquaman actor's next movie?" Complex questions are common in daily comms, but current open-domain QA systems struggle with finding all supporting facts needed. We present a system in #emnlp2019 paper that answers them efficiently & explainably: qipeng.me/blog/answering
2) a labeled Reddit dataset of condescending linguistic acts in context. We show the importance of context, motivate techniques to deal with low rates of condescension, and use our model to estimate condescension rates in various subreddits. Data & model:
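One common way (an assumption for illustration, not necessarily the paper's technique) to deal with a low rate of condescending examples is to weight the rare positive class more heavily in the classification loss, as in this short PyTorch sketch.

import torch
import torch.nn as nn

def make_weighted_loss(labels: torch.Tensor) -> nn.CrossEntropyLoss:
    """Build a cross-entropy loss whose class weights are inversely
    proportional to class frequency in the training labels."""
    counts = torch.bincount(labels, minlength=2).float()
    weights = counts.sum() / (2 * counts)  # the rare class gets a larger weight
    return nn.CrossEntropyLoss(weight=weights)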