BIG-bench Papers - 专知

BIG-bench

BIG-Bench Extra Hard

arXiv

0+ reads · February 26

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

arXiv

0+ reads · July 22, 2024

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

arXiv

0+ reads · July 29, 2024

Reliable Reasoning Beyond Natural Language

arXiv

0+ reads · July 16, 2024

LiveBench: A Challenging, Contamination-Free LLM Benchmark

arXiv

0+ reads · June 27, 2024

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

arXiv

0+ reads · May 12, 2024

Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks

arXiv

0+ reads · April 25, 2024

Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model

arXiv

0+ reads · March 18, 2024

How predictable is language model benchmark performance?

arXiv

0+ reads · January 9, 2024

LLMs cannot find reasoning errors, but can correct them!

arXiv

0+ reads · January 9, 2024

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

arXiv

0+ reads · October 31, 2023

Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations

arXiv

0+ reads · October 23, 2023

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

arXiv

0+ reads · October 23, 2023

Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

arXiv

0+ reads · October 10, 2023

Large Language Models as Optimizers

arXiv

0+ reads · September 7, 2023
