Friday, October 10, 2014

Convert Numbers to Binary String

One application uses binary matrix to define user rights. That is very smart move, which is easy to implement, and it saves space and bust performance.

However, when you look at the value in SQL table it does not make ANY sense at all.

In order to reverse engineer you have to convert numbers to binaries in order to determine which bits are responsible for certain rights/operations.

At first I tried to do it manually, using calculator, but was quickly tired of it. So many copy-paste and switching modes between Decimal and Binary in addition to inability to process really big numbers bigger than "9223372036854775807", which is just NUMERIC(19) and you limited to see only 64 bits.

The best I've found in the Internet was post of Mark S. Rasmussen: http://improve.dk/converting-between-base-2-10-and-16-in-t-sql/

There he presented very simple conversion functions. The problem is - I do not like functions.

So I decided to reinvent the wheel, hoping my code would be better.

My approach is basing on preparing binary quartets from 0 to 15 (see the chart).
Decimal NumberBinary Quartet
00000
10001
20010
30011
40100
50101
60110
70111
81000
91001
101010
111011
121100
131101
141110
151111

The very first query was little ugly using While loop:


DECLARE @TheNumber NUMERIC(32) = 56346543654365645465463543439347;
DECLARE @Quartets TINYINT = DATALENGTH(@TheNumber)*2;
DECLARE @BitCode VARCHAR(136)='';
WHILE @Quartets > 0
BEGIN
SELECT @Quartets -=1, @BitCode = CASE @TheNumber % 16
              WHEN 0 THEN '0000' WHEN 1 THEN '0001'
              WHEN 2 THEN '0010' WHEN 3 THEN '0011'
              WHEN 4 THEN '0100' WHEN 5 THEN '0101'
              WHEN 6 THEN '0110' WHEN 7 THEN '0111'
              WHEN 8 THEN '1000' WHEN 9 THEN '1001'
              WHEN 10 THEN '1010' WHEN 11 THEN '1011'
              WHEN 12 THEN '1100' WHEN 13 THEN '1101'
              WHEN 14 THEN '1110' WHEN 15 THEN '1111'
       END + @BitCode, @TheNumber = FLOOR(@TheNumber/16.);
END
PRINT @BitCode;



My Second try was based on recursion:


DECLARE @TheNumber NUMERIC(32) = 56346543654365645465463543439347;
DECLARE @Quartets TINYINT = DATALENGTH(@TheNumber)*2;

;WITH TheNumber AS (
       SELECT 0 as Level, @TheNumber % 16 as Digit,
              FLOOR(@TheNumber/16) as TheNumber
       UNION ALL
       SELECT Level + 1, TheNumber % 16,  FLOOR(TheNumber/16.)
       FROM TheNumber WHERE Level < @Quartets
), Quartet AS (
       SELECT 0 as Digit, '0000' as Quartet UNION ALL SELECT 1, '0001'
              UNION ALL SELECT 2, '0010' UNION ALL SELECT 3, '0011'
              UNION ALL SELECT 4,'0100' UNION ALL SELECT 5, '0101'
              UNION ALL SELECT 6, '0110' UNION ALL SELECT 7, '0111'
              UNION ALL SELECT 8, '1000' UNION ALL SELECT 9, '1001'
              UNION ALL SELECT 10, '1010' UNION ALL SELECT 11, '1011'
              UNION ALL SELECT 12, '1100' UNION ALL SELECT 13, '1101'
              UNION ALL SELECT 14, '1110' UNION ALL SELECT 15, '1111'
)
SELECT (  
       SELECT q.Quartet + ''
       FROM TheNumber as n
       INNER JOIN Quartet as q 
              ON n.Digit = q.Digit
       ORDER BY n.Level DESC
       FOR XML PATH('')
) as BNumber;
I'd expect it to be little faster, but couldn't notice any difference between these two queries.

The problem was still not solved. How to massively decode numbers without calling functions?
And here is the final script I came up with:

;WITH  Quartet AS (
       SELECT 0 as Digit, '0000' as Quartet UNION ALL SELECT 1, '0001'
              UNION ALL SELECT 2, '0010' UNION ALL SELECT 3, '0011'
              UNION ALL SELECT 4,'0100' UNION ALL SELECT 5, '0101'
              UNION ALL SELECT 6, '0110' UNION ALL SELECT 7, '0111'
              UNION ALL SELECT 8, '1000' UNION ALL SELECT 9, '1001'
              UNION ALL SELECT 10, '1010' UNION ALL SELECT 11, '1011'
              UNION ALL SELECT 12, '1100' UNION ALL SELECT 13, '1101'
              UNION ALL SELECT 14, '1110' UNION ALL SELECT 15, '1111'
)
SELECT  TOP 10000 m.message_id, BinaryCode
              = q7.Quartet + q6.Quartet + q5.Quartet + q4.Quartet
              + q3.Quartet + q2.Quartet + q1.Quartet + q0.Quartet
FROM sys.messages as m
INNER JOIN Quartet as q0 ON q0.Digit = m.message_id % 16
INNER JOIN Quartet as q1 ON q1.Digit = FLOOR(m.message_id/POWER(16,1)) % 16
INNER JOIN Quartet as q2 ON q2.Digit = FLOOR(m.message_id/POWER(16,2)) % 16
INNER JOIN Quartet as q3 ON q3.Digit = FLOOR(m.message_id/POWER(16,3)) % 16
INNER JOIN Quartet as q4 ON q4.Digit = FLOOR(m.message_id/POWER(16,4)) % 16
INNER JOIN Quartet as q5 ON q5.Digit = FLOOR(m.message_id/POWER(16,5)) % 16
INNER JOIN Quartet as q6 ON q6.Digit = FLOOR(m.message_id/POWER(16,6)) % 16
INNER JOIN Quartet as q7 ON q7.Digit = FLOOR(m.message_id/POWER(16,7)) % 16

That query looks even uglier and supports only data type "INT", but it can do inline conversions without using a function!

Later I tested that query against Mark S. Rasmussen's query within a function:
* I modified it little bit to produce the similar result and accept INT as a parameter.


CREATE FUNCTION [dbo].[DecimalToBinary]
(
       @Input int
)
RETURNS varchar(32)
AS
BEGIN
       DECLARE @Output varchar(32) = ''

       WHILE @Input > 0 BEGIN
              SET @Output = @Output + CAST((@Input % 2) AS varchar)
              SET @Input = @Input / 2
       END
       RETURN RIGHT(REPLICATE('0',32) + REVERSE(@Output), 32)
END
GO
SELECT  TOP 10000 message_id, dbo.DecimalToBinary(message_id)
FROM sys.messages

My testing showed that usage of function was 3.5 times slower than just inline query.

The very old battle: Ugly code vs Slow code.





No comments:

Post a Comment