
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Authors: George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri

Abstract: We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10945 [pdf, other]

Blockchain Governance: An Empirical Analysis of User Engagement on DAOs

Authors: Brett Falk, Tasneem Pathan, Andrew Rigas, Gerry Tsoukalas

Abstract: In this note, we examine voting on four major blockchain DAOs: Aave, Compound, Lido and Uniswap. Using data directly collected from the Ethereum blockchain, we examine voter activity. We find that in most votes, the "minimal quorum," i.e., the smallest number of active voters who could swing the vote, is quite small. To understand who is actually driving these DAOs, we use data from the Ethereum Name Service (ENS), Sybil.org, and Compound to divide voters into different categories.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2311.18717 [pdf, other]

NFT Wash Trading: Direct vs. Indirect Estimation

Authors: Brett Hemenway Falk, Gerry Tsoukalas, Niuniu Zhang

Abstract: Recent studies estimate around 70% of traded value on off-chain crypto exchanges like Binance is wash trading. This paper turns to NFT markets, where the on-chain nature of transactions, a key tenet of Web3 innovation, enables more direct estimation methods to be applied. Focusing on three of the largest NFT marketplaces, we find 30-40% of NFT volume and 25-95% of traded value involve wash trading. We leverage this direct approach to critically evaluate recent indirect estimation methods suggested in the literature, revealing major differences in effectiveness, with some failing altogether. Trade-roundedness filters, as suggested in Cong et al. (2023), emerge as the most accurate indirect estimation method. In fact, we show how direct and indirect approaches can be closely aligned via hyper-parameter fine-tuning. Our findings underscore the crucial role of technological innovation in detecting and regulating financial misconduct in digital finance.

Submitted 5 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.04353 [pdf, other]

An In-Context Learning Agent for Formal Theorem-Proving

Authors: Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, Swarat Chaudhuri

Abstract: We present an in-context learning agent for formal theorem-proving in environments like Lean and Coq. Current state-of-the-art models for the problem are finetuned on environment-specific proof data. By contrast, our approach, called COPRA, repeatedly asks a high-capacity, general-purpose large language model (GPT-4) to propose tactic applications from within a stateful backtracking search. Proposed tactics are executed in the underlying proof environment. Feedback from the execution is used to build the prompt for the next model query, along with selected information from the search history and lemmas retrieved from an external database. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms few-shot invocations of GPT-4. It also compares favorably against finetuning-based approaches, outperforming ReProver, a state-of-the-art finetuned approach for Lean, in terms of the pass@1 metric. Our code and data are available at https://github.com/trishullab/copra.

Submitted 8 August, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2212.00292 [pdf, other]

Economics of NFTs: The Value of Creator Royalties

Authors: Brett Hemenway Falk, Gerry Tsoukalas, Niuniu Zhang

Abstract: Non-Fungible Tokens (NFTs) promise to revolutionize how content creators (e.g., artists) price and sell their work. One core feature of NFTs is the option to embed creator royalties which earmark a percentage of future sale proceeds to creators, each time their NFTs change hands. As popular as this feature is in practice, its utility is often questioned because buyers, the argument goes, simply "price it in at the time of purchase". As intuitive as this argument sounds, it is incomplete. We find royalties can add value to creators in at least three distinct ways. (i) Risk sharing: when creators and buyers are risk sensitive, royalties can improve trade by splitting the risks associated with future price volatility; (ii) Dynamic pricing: in the presence of information asymmetry, royalties can extract more revenues from better-informed speculators over time, mimicking the benefits of "dynamic pricing"; (iii) Price discrimination: when creators sell multi-unit NFT collections, royalties can better capture value from heterogeneous buyers. Our results suggest creator royalties play an important and sometimes overlooked role in the economics of NFTs.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2110.08673 [pdf, other]

Scaling Blockchains: Can Committee-Based Consensus Help?

Authors: Alon Benhaim, Brett Hemenway Falk, Gerry Tsoukalas

Abstract: In the high-stakes race to develop more scalable blockchains, some platforms (Binance, Cosmos, EOS, TRON, etc.) have adopted committee-based consensus (CBC) protocols, whereby the blockchain's record-keeping rights are entrusted to a committee of elected block producers. In theory, the smaller the committee, the faster the blockchain can reach consensus and the more it can scale. What's less clear is whether such protocols ensure that honest committees can be consistently elected, given blockchain users typically have limited information on who to vote for. We show that the approval voting mechanism underlying most CBC protocols is complex and can lead to intractable optimal voting strategies. We empirically characterize some simpler intuitive voting strategies that users tend to resort to in practice and prove that these nonetheless converge to optimality exponentially quickly in the number of voters. Exponential convergence ensures that despite its complexity, CBC exhibits robustness and has some efficiency advantages over more popular staked-weighted lottery protocols currently underlying many prominent blockchains such as Ethereum.

Submitted 1 December, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

arXiv:0805.4680 [pdf, ps, other]

Telex: Principled System Support for Write-Sharing in Collaborative Applications

Authors: Lamia Benmouffok, Jean-Michel Busca, Joan Manuel Marquès, Marc Shapiro, Pierre Sutra, Georgios Tsoukalas

Abstract: The Telex system is designed for sharing mutable data in a distributed environment, particularly for collaborative applications. Users operate on their local, persistent replica of shared documents; they can work disconnected and suffer no network latency. The Telex approach to detect and correct conflicts is application independent, based on an action-constraint graph (ACG) that summarises the concurrency semantics of applications. The ACG is stored efficiently in a multilog structure that eliminates contention and is optimised for locality. Telex supports multiple applications and multi-document updates. The Telex system clearly separates system logic (which includes replication, views, undo, security, consistency, conflicts, and commitment) from application logic. An example application is a shared calendar for managing multi-user meetings; the system detects meeting conflicts and resolves them consistently.

Submitted 10 June, 2008; v1 submitted 30 May, 2008; originally announced May 2008.

Report number: RR-6546

Showing 1–7 of 7 results for author: Tsoukalas, G