Benchmarking Papers - 专知 (Zhuanzhi)

Benchmarking

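Benchmarking means designing sound test methods, test tools, and test systems to measure a given performance metric of a class of test subjects in a quantitative and comparable way.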

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Arxiv

0+ reads · May 1, 2023

LST-Bench: Benchmarking Log-Structured Tables in the Cloud

Arxiv

0+ reads · May 1, 2023

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Arxiv

0+ reads · May 2, 2023

Differentially Private In-Context Learning

Arxiv

0+ reads · May 2, 2023

A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images

Arxiv

0+ reads · April 30, 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Arxiv

0+ reads · April 21, 2023

Balancing Simulation-based Inference for Conservative Posteriors

Arxiv

0+ reads · April 21, 2023

RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models

Arxiv

0+ reads · April 21, 2023

Towards a Benchmark for Scientific Understanding in Humans and Machines

Arxiv

2+ reads · April 21, 2023

Jedi: Entropy-based Localization and Removal of Adversarial Patches

Arxiv

0+ reads · April 20, 2023

Power Law Trends in Speedrunning and Machine Learning

Arxiv

0+ reads · April 19, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

Arxiv

0+ reads · April 20, 2023

Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4

Arxiv

2+ reads · April 20, 2023

Test-driving RISC-V Vector hardware for HPC

Arxiv

0+ reads · April 20, 2023

From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification

Arxiv

0+ reads · April 19, 2023
