Integer (computer science): Difference between revisions

Revision as of 10:26, 10 May 2023 by Fragrant Peony: Reverting edit(s) by 195.224.170.238 to rev. 1148952249 by Johnuniq: non-constructive (RW 16.1)
Revision as of 11:33, 30 June 2024 by Jruderman: →Syntax
(17 intermediate revisions by 13 users not shown)
Line 9:
The ''internal representation'' of this datum is the way the value is stored in the computer's memory. Unlike mathematical integers, a typical datum in a computer has some minimum and maximum possible value. The most common representation of a positive integer is a string of [[bit]]s, using the [[binary numeral system]]. The order of the memory [[byte]]s storing the bits varies; see [[endianness]]. The ''width'', ''precision'', or ''bitness''<ref>{{Cite book \|last=Barr \|first=Adam \|url=https://books.google.com/books?id=BxdxDwAAQBAJ&newbks=0&printsec=frontcover&pg=PA268&dq=%22bitness%22&hl=en \|title=The Problem with Software: Why Smart Engineers Write Bad Code \|date=2018-10-23 \|publisher=MIT Press \|isbn=978-0-262-34821-8 \|language=en}}</ref> of an integral type is the number of bits in its representation. An integral type with ''n'' bits can encode 2<sup>''n''</sup> numbers; for example, an unsigned type typically represents the non-negative values 0 through 2<sup>''n''</sup>−1. Other encodings of integer values to bit patterns are sometimes used, for example [[binary-coded decimal]] or [[Gray code]], or as printed character codes such as [[ASCII]].

There are four well-known [[signed number representations\|ways to represent signed numbers]] in a binary computing system. The most common is [[two's complement]], which allows a signed integral type with ''n'' bits to represent numbers from −2<sup>(''n''−1)</sup> through 2<sup>(''n''−1)</sup>−1. Two's complement arithmetic is convenient because there is a perfect [[bijection\|one-to-one correspondence]] between representations and values (in particular, no separate +0 and −0), and because [[addition]], [[subtraction]] and [[multiplication]] do not need to distinguish between signed and unsigned types. Other possibilities include [[offset binary]], [[sign-magnitude]], and [[ones' complement]].
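The two's complement rules above can be checked with a short sketch. It is written in Python only for brevity, and the helper names <code>twos_complement_range</code>, <code>encode</code>, and <code>decode</code> are hypothetical, introduced here for illustration rather than taken from any library. Bit patterns are modelled as non-negative integers below 2<sup>''n''</sup>.

```python
def twos_complement_range(n: int) -> tuple[int, int]:
    """Smallest and largest values of an n-bit two's complement type."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


def encode(value: int, n: int) -> int:
    """Return the n-bit pattern (as a non-negative int) that stores value."""
    low, high = twos_complement_range(n)
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {n} bits")
    return value & ((1 << n) - 1)


def decode(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    if not 0 <= pattern < (1 << n):
        raise ValueError(f"{pattern:#x} is not an {n}-bit pattern")
    # A set top bit means the pattern stands for pattern - 2**n.
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


print(twos_complement_range(8))   # (-128, 127)
print(hex(encode(-1, 8)))         # 0xff
print(decode(0x80, 8))            # -128

# Addition is the same bit operation for signed and unsigned operands:
print((encode(-3, 8) + encode(5, 8)) & 0xFF == encode(2, 8))   # True

# Byte order is a separate choice from the bit encoding (see endianness):
print((0x1234).to_bytes(2, "big"), (0x1234).to_bytes(2, "little"))
```

Every 8-bit pattern decodes to exactly one value and encodes back to itself, which is the one-to-one correspondence mentioned above.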
Some computer languages define integer sizes in a machine-independent way; others have varying definitions depending on the underlying processor word size. Not all language implementations define variables of all integer sizes, and defined sizes may not even be distinct in a particular implementation. An integer in one [[programming language]] may be a different size in a different language, on a different processor, or in an execution context of different bitness; see {{Section link\|\|Words}}.

Some [[Decimal computer\|older computer architectures]] used decimal representations of integers, stored in [[binary-coded decimal\|binary-coded decimal (BCD)]] or another format. These values generally require data sizes of 4 bits per decimal digit (sometimes called a [[nibble]]), usually with additional bits for a sign. Many modern CPUs provide limited support for decimal integers as an extended datatype, providing instructions for converting such values to and from binary values. Depending on the architecture, decimal integers may have fixed sizes (e.g., 7 decimal digits plus a sign fit into a 32-bit word), or may be variable-length (up to some maximum digit size), typically occupying two digits per byte (octet).

Line 129:
\|-
\| rowspan="2" style="text-align:right;"\| 64
\| rowspan="2" \| word, doubleword, longword, long, long long, quad, quadword, qword, int64, i64, u64
\|''Signed:'' From −9,223,372,036,854,775,808 to [[9223372036854775807\|9,223,372,036,854,775,807]], from −(2<sup>63</sup>) to 2<sup>63</sup> − 1
\| style="text-align:right;" \| {{#expr:63ln2/ln10 round 2}}

Line 192:
The table above lists integral type widths that are supported in hardware by common processors. High-level programming languages provide more possibilities. It is common to have a 'double width' integral type that has twice as many bits as the biggest hardware-supported type.
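One reason a double-width type is useful is that the full product of two ''n''-bit values always fits in 2''n'' bits. The following is a minimal sketch of that idea, assuming a 64-bit machine word; the names <code>WORD_BITS</code>, <code>WORD_MASK</code>, and <code>widening_mul</code> are hypothetical and chosen only for this illustration.

```python
WORD_BITS = 64                       # assumed widest hardware-supported width
WORD_MASK = (1 << WORD_BITS) - 1     # largest unsigned one-word value


def widening_mul(a: int, b: int) -> tuple[int, int]:
    """Multiply two unsigned one-word values; return (high word, low word)."""
    if not (0 <= a <= WORD_MASK and 0 <= b <= WORD_MASK):
        raise ValueError("operands must each fit in one word")
    product = a * b                  # needs at most 2 * WORD_BITS bits
    return product >> WORD_BITS, product & WORD_MASK


high, low = widening_mul(WORD_MASK, WORD_MASK)
print(hex(high), hex(low))
# Recombining the two words recovers the exact double-width product.
print((high << WORD_BITS) | low == WORD_MASK * WORD_MASK)   # True
```

Python integers are arbitrary precision, so the masking here only models what a fixed-width language would store in each word.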
Many languages also have ''bit-field'' types (a specified number of bits, usually constrained to be less than the maximum hardware-supported width) and ''range'' types (which can represent only the integers in a specified range).

Some languages, such as [[Lisp (programming language)\|Lisp]], [[Smalltalk]], [[REXX]], [[Haskell]], [[Python (programming language)\|Python]], and [[Raku (programming language)\|Raku]], support ''arbitrary precision'' integers (also known as ''infinite precision integers'' or ''[[bignum]]s''). Other languages that do not support this concept as a top-level construct may have libraries available to represent very large numbers using arrays of smaller variables, such as Java's {{mono\|BigInteger}} class or [[Perl]]'s "{{mono\|bigint}}" package.<ref>{{cite web \|url=http://download.oracle.com/javase/6/docs/api/java/math/BigInteger.html \|title=BigInteger (Java Platform SE 6) \|publisher=Oracle \|access-date=2011-09-11 }}</ref> These use as much of the computer's memory as is necessary to store the numbers; however, a computer has only a finite amount of storage, so they, too, can only represent a finite subset of the mathematical integers. These schemes support very large numbers; for example, one kilobyte of memory could be used to store numbers up to 2466 decimal digits long.

A [[Boolean datatype\|Boolean]] or [[Flag (computing)\|Flag]] type is a type that can represent only two values: 0 and 1, usually identified with ''false'' and ''true'' respectively. <!-- Pascal has them the other way around --> This type can be stored in memory using a single bit, but is often given a full byte for convenience of addressing and speed of access.

Line 213:
One important cause of non-portability of software is the incorrect assumption that all computers have the same word size as the computer used by the programmer.
For example, if a programmer using the C language incorrectly declares as {{mono\|int}} a variable that will be used to store values greater than 2<sup>15</sup>−1, the program will fail on computers with 16-bit integers. That variable should have been declared as {{mono\|long}}, which has at least 32 bits on any computer. Programmers may also incorrectly assume that a pointer can be converted to an integer without loss of information, which may work on (some) 32-bit computers, but fail on 64-bit computers with 64-bit pointers and 32-bit integers. C99 addresses this with {{code\|intptr_t}} in [[stdint.h]], an integer type to which a pointer can be converted and back without loss of information.

The ''bitness'' of a program may refer to the word size (or bitness) of the processor on which it runs, or it may refer to the width of a memory address or pointer, which can differ between execution modes or contexts. For example, 64-bit versions of [[Microsoft Windows]] support existing 32-bit binaries, and programs compiled for Linux's [[x32 ABI]] run in 64-bit mode yet use 32-bit memory addresses.<ref>{{cite news \|author=Thorsten Leemhuis \|date=2011-09-13 \|title=Kernel Log: x32 ABI gets around 64-bit drawbacks \|url=http://www.h-online.com/open/features/Kernel-Log-x32-ABI-gets-around-64-bit-drawbacks-1342061.html \|archive-url=https://web.archive.org/web/20111028081253/http://www.h-online.com/open/features/Kernel-Log-x32-ABI-gets-around-64-bit-drawbacks-1342061.html \|archive-date=28 October 2011 \|access-date=2011-11-01 \|publisher=www.h-online.com}}</ref>

===Standard integer===
The standard integer size is platform-dependent.

In [[C (programming language)\|C]], it is denoted by {{mono\|int}} and required to be at least 16 bits. Windows and Unix systems have 32-bit {{mono\|int}}s on both 32-bit and 64-bit architectures.

===Short integer===

Line 280 ⟶ 287:
! [[Programming language]] ! Approval Type ! [[Computing platform\|Platform]]s ! Data type name ! 
Storage in [[bytes]] Line 288 ⟶ 295: \| [[C (programming)\|C]] ISO/ANSI C99 \| International Standard \| [[Unix]], 16/32-bit systems<ref name="agnerfog" /><br>[[Windows]], 16/32/64-bit systems<ref name="agnerfog" /> \| {{mono\|long}} \| {{mono\|long}}{{efn\|name=cross1\|The terms {{mono\|long}} and {{mono\|int}} are equivalent<ref>{{cite web \|url=http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf \|title=ISO/IEC 9899:201x \|access-date=2013-03-27 \|publisher=open-std.org}}</ref>}} \| 4<br>(minimum requirement 4) \| −2,147,483,647 to +2,147,483,647 Line 297 ⟶ 304: \| International Standard \| [[Unix]],<br>64-bit systems<ref name="agnerfog" /><ref name="drdobbsinteger" /> \| {{mono\|long~~}}{{efn\|name=cross1~~}} \| 8<br>(minimum requirement 4) \| −9,223,372,036,854,775,807 to +9,223,372,036,854,775,807 Line 305 ⟶ 312: \| International Standard \| [[Unix]], [[Windows]],<br>16/32-bit system \| {{mono\|long~~}}{{efn\|name=cross1~~}} \| 4 <ref>{{cite web \| title=Fundamental types in C++\|url=http://cppreference.com/wiki/language/types\|publisher=cppreference.com\|access-date=5 December 2010}}</ref><br>(minimum requirement 4) Line 314 ⟶ 321: \| International Standard<br>[[ECMA-372]] \| [[Unix]], [[Windows]],<br>16/32-bit systems \| {{mono\|long~~}}{{efn\|name=cross1~~}} \| 4 <ref>{{cite web\| url=http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-372.pdf \|title=Chapter 8.6.2 on page 12\|publisher=ecma-international.org}}</ref><br>(minimum requirement 4) \| −2,147,483,648 to +2,147,483,647<br> Line 369 ⟶ 376: ===Long long=== {{~~redirect-distinguish~~Redirect\|long long\|~~long~~\|Long (disambiguation)~~{{!}}long\|Long, Long, Long~~}} In the [[C99]] version of the [[C (programming language)\|C programming language]] and the [[C++11]] version of [[C++]], a <CODE>long long</CODE> type is supported that has double the minimum capacity of the standard <CODE>long</CODE>. 
This type is not supported by compilers that require C code to be compliant with the previous C++ standard, C++03, because the {{mono\|long long}} type did not exist in C++03. For an ANSI/ISO-compliant compiler, the type must cover at least the ranges −(2<sup>63</sup>−1)<ref name="c-std-6.2.6.2p2" /> to 2<sup>63</sup>−1 for signed and 0 to 2<sup>64</sup>−1 for unsigned;<ref name="c-std-5.2.4.2.1" /> however, extending this range is permitted.<ref>{{cite web\| url=http://www.ericgiguere.com/articles/ansi-c-summary.html\|title=The ANSI Standard: A Summary for the C Programmer\|first=Eric\|last=Giguere\|date=December 18, 1987\|access-date=2010-09-04}}</ref><ref>{{cite web\|url=http://flash-gordon.me.uk/ansi.c.txt\|title=American National Standard Programming Language C specifies the syntax and semantics of programs written in the C programming language.\|access-date=2010-09-04\|url-status=dead\|archive-url=https://web.archive.org/web/20100822072551/http://flash-gordon.me.uk/ansi.c.txt\|archive-date=2010-08-22}}</ref> This can be an issue when exchanging code and data between platforms, or doing direct hardware access. Thus, there are several sets of headers providing platform-independent exact-width types. The C [[standard library]] provides ''[[stdint.h]]''; this was introduced in C99 and C++11.

== Syntax ==
{{main\|Integer literal}}
[[Integer literal]]s can be written as regular [[Arabic numerals]], consisting of a sequence of digits and with negation indicated by a [[hyphen-minus\|minus sign]] before the value. However, most programming languages disallow use of commas or spaces for [[digit grouping]]. Examples of integer literals are:
<code>42</code>

Line 383 ⟶ 391:
* Many programming languages, especially those influenced by [[C (programming language)\|C]], prefix an integer literal with <code>0X</code> or <code>0x</code> to represent a [[hexadecimal]] value, e.g. <code>0xDEADBEEF</code>. 
Other languages may use a different notation, e.g. some [[assembly language]]s append an <code>H</code> or <code>h</code> to the end of a hexadecimal value.
* [[Perl]], [[Ruby (programming language)\|Ruby]], [[Java (programming language)\|Java]], [[Julia (programming language)\|Julia]], [[D (programming language)\|D]], [[Go (programming language)\|Go]], [[Rust (programming language)\|Rust]] and [[Python (programming language)\|Python]] (starting from version 3.6) allow embedded [[underscore]]s for clarity, e.g. <code>10_000_000</code>, and fixed-form [[Fortran]] ignores embedded spaces in integer literals. C (starting from [[C23 (C standard revision)\|C23]]) and C++ use single quotes for this purpose.
* In [[C (programming language)\|C]] and [[C++]], a leading zero indicates an [[octal]] value, e.g. <code>0755</code>. This was primarily intended to be used with [[Modes (Unix)\|Unix modes]]; however, it has been criticized because normal integers may also lead with zero.<ref>ECMAScript 6th Edition draft: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-literals-numeric-literals {{webarchive\|url=https://web.archive.org/web/20131216202526/https://people.mozilla.org/~jorendorff/es6-draft.html \|date=2013-12-16 }}</ref> As such, [[Python (programming language)\|Python]], [[Ruby (programming language)\|Ruby]], [[Haskell]], and [[OCaml]] prefix octal values with <code>0O</code> or <code>0o</code>, following the layout used by hexadecimal values.
* Several languages, including [[Java (programming language)\|Java]], [[C Sharp (programming language)\|C#]], [[Scala (programming language)\|Scala]], [[Python (programming language)\|Python]], [[Ruby (programming language)\|Ruby]], [[OCaml]], C (starting from C23) and C++ can represent binary values by prefixing a number with <code>0B</code> or <code>0b</code>.
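The notations listed above can be tried in Python, one of the languages named in the list. The snippet below is only an illustrative sketch: the variable names and the helper <code>accepts_c_style_octal</code> are hypothetical, and other languages differ in detail (for instance, C and C++ group digits with single quotes rather than underscores).

```python
# Each literal below is standard Python 3 syntax.
decimal = 42
hexadecimal = 0xDEADBEEF      # 0x or 0X prefix
grouped = 10_000_000          # embedded underscores, Python 3.6 and later
octal = 0o755                 # 0o or 0O prefix
binary = 0b101010             # 0b or 0B prefix

print(hexadecimal, grouped, octal, binary)   # 3735928559 10000000 493 42


def accepts_c_style_octal() -> bool:
    """Report whether the C-style literal 0755 compiles as Python source."""
    try:
        compile("0755", "<literal>", "eval")
    except SyntaxError:
        return False
    return True


print(accepts_c_style_octal())               # False

# Converting digit strings in an explicit base, and formatting back.
print(int("755", 8), int("DEADBEEF", 16))    # 493 3735928559
print(hex(255), oct(493), bin(42))           # 0xff 0o755 0b101010
```

Python 3 rejects a bare leading zero in a decimal literal, which avoids the ambiguity between <code>0755</code> as octal and as a zero-padded decimal number.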