2012-12-03

Announcing intalg: integer and number theory algorithms in pure Python 2.x

This is the initial release announcement for pybasealgo (fast implementation of basic algorithms in pure Python 2.x, minimum version:) and the first set of algorithms, intalg (integer and number theory algorithms). The algorithm implementations can be used in Project Euler and programming contest problem solving. These are free software under the GPL. Click on the links above to download the source.

intalg contains the following algorithms:

  • integer bit count (bit_count)
  • square root rounding down (sqrt_floor)
  • kth root rounding down (root_floor)
  • bounded prime list builder using the sieve of Eratosthenes (primes_upto, first_primes_moremem, first_primes)
  • bounded prime generator using the sieve of Eratosthenes (yield_primes_upto, yield_first_primes)
  • unbounded prime generator using the sieve of Eratosthenes (yield_primes, yield_composites)
  • high precision upper bound for natural and base 2 logarithms (log2_256_more, log_more)
  • greatest common divisor using Euclidean algorithm (gcd)
  • primality test using the Rabin-Miller test with fast parameters (is_prime)
  • finding the next prime (next_prime)
  • computation of binary coefficients (choose)
  • integer factorization to prime divisors (factorize, yield_slow_factorize, finder_slow_factorize)
  • computation of the number of divisors (divisor_count)
  • computation of the sum of divisors (divisor_sum)
  • random number generation using the Mersenne twister MT19937, without using floating point numbers (MiniIntRandom)
  • Pollard's Rho algorithm aiding integer factorization (pollard)
  • Brent's algorithm aiding integer factorization (pollard)
  • simplification of fractions (simplify_nonneg)
  • RLE: run-length encoding (rle)
  • conversion of fraction of very large integers to float (fraction_to_float)

Design principles:

  • Pure Python: Provides full functionality using only built-in Python modules.
  • Fast: Fast implementations of fast algorithms, e.g. Newton iteration for square root and kth root, sieve of Eratosthenes for prime number generation, Brent's algorithm for integer factorization. Typically faster than most of the sample code and other libraries found online. (If you find something faster, please let me know by posting a comment.) Not as fast as C code (e.g. pygmp).
  • Correct: Many Python code examples (e.g. Brent's algorithm) found on the net are buggy (especially in corner cases) or are too slow. pybasealgo strives to be correct. Production use (i.e. I use this library for solving Project Euler problems) and unit tests help this.
  • Simple: Maintains a balance between speed and implementation simplicity. It implements sophisticated and fast algorithms in a simple way, but it doesn't contain the fastest possible algorithm if it's very complex (e.g. for integer factorization, pyecm is 1472 lines long, our intalg contains Brent's algorithm, which is slower but much simpler).
  • No-float: Doesn't use floating point numbers, not even for temporary computations (e.g. math.log(...)). This is to avoid rounding errors, which make the correctness of an algorithm hard to prove and they make some computations nondeterministic on some architectures and systems. And also to support arbitrarily large numbers: floats have a maximum (e.g. all IEEE 754 64-bit are less than 2e308), so they cannot be used in calculations with very large numbers. Python uses temporary floating point numbers for random integer generation, so pybasealgo provides an integer-only replacement.
  • Arbitrarily large integers: All algorithms work with small integers (of type int) and arbitrarily large integers (of type long). There is no automatic truncation of the results.
  • Deterministic: Every function returns the same value for the same input, and it runs at the same speed. (Except if some return values are cached in memory, then subsequent calls in the same process will be faster.) This makes debugging and benchmarking a lot easier. Determinism is very important in a programming contest situation: if your program succeeds for you, you don't want it to fail or time out when the judges rerun it. Even those algorithms which use random numbers (e.g. Pollard's Rho algorithm and Brent's algorithm for integer factorization) are deterministic by default, because they use pseudorandom numbers with a seed which depends only on the input.

Have fun programming and Python and I wish you great success in Project Euler and in programming contests.

1 comment:

zsbana said...

Do you know about libtommath? It's a bigint library with clear code and a detailed book explaining all the algorithms. I know there are other good sources explaining Rabin-Miller (including the Cormen-Leiserson-Rivest-Stein book), but it's still worth a look IMO. http://libtom.org/?page=features&newsitems=5&whatfile=ltm