Showing 1–3 of 3 results for author: Nain, A K

Searching in archive cs.
  1. arXiv:2408.04242

    cs.LG cs.AI cs.NE

    The Ungrounded Alignment Problem

    Authors: Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones

    Abstract: Modern machine learning systems have demonstrated substantial abilities with methods that either embrace or ignore human-provided knowledge, but combining benefits of both styles remains a challenge. One particular challenge involves designing learning systems that exhibit built-in responses to specific abstract stimulus patterns, yet are still plastic enough to be agnostic about the modality and…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 7 pages, plus references and appendix

  2. arXiv:2407.09298

    cs.CL

    Transformer Layers as Painters

    Authors: Qi Sun, Marc Pickett, Aakash Kumar Nain, Llion Jones

    Abstract: Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the layers of a pretrained transformer. Such an understanding could both yield better usage of existing models as well as to make architectural improvements to produce new variants…

    Submitted 5 August, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: 12 pages total, including references and appendices

  3. arXiv:2308.08061

    cs.CL cs.LG cs.SE

    The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models

    Authors: Abi Aryan, Aakash Kumar Nain, Andrew McMahon, Lucas Augusto Meyer, Harpreet Singh Sahota

    Abstract: When deploying machine learning models in production for any product/application, there are three properties that are commonly desired. First, the models should be generalizable, in that we can extend it to further use cases as our knowledge of the domain area develops. Second they should be evaluable, so that there are clear metrics for performance and the calculation of those metrics in producti…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 11 pages