"\"ORNIS, s. m. toile des Indes, (Comm.) sortes de\\ntoiles de coton ou de mousseline, qui se font a Brampour ville de l'Indoustan, entre Surate & Agra. Ces\\ntoiles sont par bandes, moitié coton & moitié or &\\nargent. Il y en a depuis quinze jusqu'à vingt aunes.\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[column_text].tolist()[0]"
]
},
{
"attachments": {},
"cell_type": "markdown",
...
...
@@ -610,58 +583,50 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The hypothesis with the highest score is: \"Commerce\" with a probability of 70.05%\n"
]
}
],
"outputs": [],
"source": [
"# pose the sequence as an NLI premise and the label (politics) as a hypothesis\n",
"ORNIS, s. m. toile des Indes, (Comm.) sortes de\ntoiles de coton ou de mousseline, qui se font a Brampour ville de l'Indoustan, entre Surate & Agra. Ces\ntoiles sont par bandes, moitié coton & moitié or &\nargent. Il y en a depuis quinze jusqu'à vingt aunes."
%% Cell type:markdown id: tags:
## 3. Classification
%% Cell type:markdown id: tags:
The approach, proposed by [Yin et al. (2019)](https://arxiv.org/abs/1909.00161), uses a pre-trained MNLI sequence-pair classifier as an out-of-the-box zero-shot text classifier, and it works quite well in practice. NLI (natural language inference) models take a pair of sentences and predict whether the first entails, contradicts, or is neutral toward the second. The idea is to take the sequence we want to label as the "premise" and to turn each candidate label into a "hypothesis". If the NLI model predicts that the premise "entails" the hypothesis, we take the label to be true. The code snippet below demonstrates how easily this can be done with 🤗 Transformers.
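Independently of any particular library, the premise/hypothesis mechanics can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: the helper names (`build_hypotheses`, `entailment_probability`, `pick_label`), the hypothesis template, and the example logits are all assumptions made for the sketch. It also assumes the NLI model returns three logits per pair in the order contradiction, neutral, entailment, which should be checked against the model actually used.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence.

    The template is an assumption for this sketch; any sentence with
    one placeholder for the label would work the same way.
    """
    return [template.format(label) for label in labels]


def entailment_probability(logits, contradiction_idx=0, entailment_idx=2):
    """Probability that the premise entails the hypothesis.

    Assumes logits are ordered [contradiction, neutral, entailment].
    The neutral logit is dropped and a softmax is taken over the
    remaining two, so the result lies between 0 and 1.
    """
    c = logits[contradiction_idx]
    e = logits[entailment_idx]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c = math.exp(c - m)
    exp_e = math.exp(e - m)
    return exp_e / (exp_c + exp_e)


def pick_label(labels, logits_per_label):
    """Score every label and return the best one with all scores."""
    scores = {
        label: entailment_probability(logits)
        for label, logits in zip(labels, logits_per_label)
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up logits standing in for real NLI model output (hypothetical values).
labels = ["Commerce", "Geography"]
fake_logits = [[-1.0, 0.2, 2.0], [1.5, 0.3, -0.5]]

print(build_hypotheses(labels))
best, scores = pick_label(labels, fake_logits)
print(f"Best label: {best} ({scores[best]:.2%})")
```

In a real run, the logits would come from feeding each (premise, hypothesis) pair through the NLI model; only the scoring logic is shown here.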