Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū

Marek Knápek@programming.dev · 18 days ago

Instead of 32 bits they could use the entire cache line. It is loaded from RAM to CPU anyway. It is 64 bytes / 512 bits.

Marek Knápek@programming.dev · 2 months ago

Another link: https://github.com/vitaut/zmij/releases/tag/v1.0

Marek Knápek@programming.dev · 2 months ago

Żmij 1.0 released: a C++ double-to-string library delivering shortest correctly-rounded decimals ~2.8–4× faster than Ryū

Marek Knápek@programming.dev · 2 months ago

Yes, that is correct. We were using C++ and COM and didn’t protect ourselves against accidental exception being thrown across the COM boundary. But it never happened. Except the one case CString::Load (or whatever the function name was) failed because of Office visual style had bug in it.

Marek Knápek@programming.dev · 2 months ago

The main language of the app was English. The user could change it to German, French, Russian, Chinese, Spanish, etc. But sometimes the translation was not complete. Sometimes some plug-in was not translated. We used string IDs to identify various texts. Sometimes the ID pointed to the translated text, sometimes not, thus English version of that text was used instead.

Also I reported the bug in the visual style to Microsoft. They refused to fix it and said something like “don’t do that”.

Marek Knápek@programming.dev · 2 months ago

Win32 MFC desktop application, should be able to run continuously unsupervised for days or weeks. When I log in via RDP it crashed. Yes, just in that moment. In the end I figured it out.

MFC supports multiple visual styles, such as classic Windows, few Office versions and few more. We used feature for loading (possibly translated) strings (resource) from various DLLs. First from a currently active plug-in DLL, if the string was not there, then from a sub-system DLL, if it was not there then from main EXE. Plus there was some logic about translation and fallback to English in case of missing or not up-to-date translation. The cause was in one of the MFC visual styles, it changed a global variable related to that string loading thing for a short duration of time, then changed it back. It was only in one of the styles and there was no need to change it, the code could just load the value and use it as a parameter to a function that needed it. Our code in a different thread failed to load a string and threw an exception. Cross a COM interface boundary. Every COM method is decorated by a bunch of macros our senior engineer didn’t understand and was not willing to learn. One of the macros was noexcept. Thus an exception being thrown cross an noexcept boundary crashed our app.

Marek Knápek@programming.dev · 4 months ago

No! One code point could be encoded by up to 4 UTF-8 code units, not glyph. Glyphs do not map to code points one to one. One glyph could be encoded by more than one code point (and each code point could be encoded by more than one code unit). Code points are Unicode thing, code units are Unicode encoding thing, glyphs are font+Unicode thing. For example the glyph á might be single code point or two code points. Single code point because this is common letter in some languages, and was used in computers before Unicode was invented, two code points because this might be the base letter a followed by an diacritic combining mark. Not all diacritic letters have single code point variant. Also emojis, they are single glyph but multiple code points, for example skin tone modifier for various faces emojis, or male+female characters combined into single glyph forming a family glyph. Also country flags are single glyph, but multiple code points. Unicode is BIG, there are A LOT of stuff in it. For example sorting based on users language, conversion to upper/lower case is also not trivial (google the turkish i).

Marek Knápek@programming.dev · 2 years ago

You can always take a look how for example Windows 3.11 and earlier did it for their *.rtf file format and their “write.exe” editor / viewer / renderer (if you want to call it that way).

Marek Knápek@programming.dev · 2 years ago

You have stack buffer out of bounds write. On line 52 you declare h an array of 70 unsigned ints. On line 57 you store reference to such array. Later, on line 35 you write out of bounds, one element past end of the array. The _SPR_history[i] writes to _SPR_history[70]. Created an issue: https://github.com/X64X2/sh/issues/1

Marek Knápek@programming.dev · 2 years ago

Doesn’t depend on programming language but something with visual debugger. You know that stuff when you can see current line of your source code highlighted, press a key to step into, step over and so on. You can see values inside your variables. You can also change your variables mid-run right form the debugger.

Because you spend 20% of your time writing bugs and the other 80% debugging them. At least make it pleasant experience (no printf-style debugging).

Back in the day I was using Turbo Pascal, Delphi, Visual Basic, C#, Java, PHP with Zend, Java Script, today I’m using Visual C++.

Marek Knápek@programming.dev · 2 years ago

32 MiB Working Sets on a 64 GiB machine

Marek Knápek@programming.dev · 2 years ago

Yes, I know this. It took me long time to figure this out. My entire life I focused on technical skills / programming / math / logic. As I deemed them most important for the job. I was like: “Hey, if you cannot program, why do you work as programmer (you stupido)?” Only few years ago I realized that even as programmer (as opposed to sales man) you really need those “meh” soft skills. And that they are really important and I should not call them “meh”. I’m very good at solving problems, improving product’s performance, memory consumption, discovering and fixings bugs, security vulnerabilities. But I’m very very bad at communicating my skills and communicating with people in general. I’m not able to politely tell people that theirs idea is bad, I just say “that’s stupid”. And I’m mostly/sometimes right (if I’m not 100% sure, I don’t say anything), but the damage caused by the way I say it is often inreversible. That post of mine about the job interview and CV was half joke and half reality. I just freeze/stutter when I’m asked something that is obvious because it is written I my CV. I’m immediately thinking “Did he not received the CV?” or “Did he not read it?” “Why the fuck is he not prepared for the call? Why are we wasting time asking me what should be obvious because I sent it in advance?” I’m more robot than human. Put me in front of problem and forget to tell me that it is impossible to solve … and I will solve it. But easy small talk … disaster. Communicating what the problem really was … disaster. Communicating how I solved it … disaster. “It was not working before and now it works fine, what the hell do you want from me now?” Yes, I’m very bad in team, in collective. I didn’t know the reason why, but since few years ago I know the root of the problem. It’s not that everybody around me is stupid and don’t know basic stuff (what I consider basic), but me unable to communicate with other humans.

Marek Knápek@programming.dev · 2 years ago

The interview starts … the interviewer asks me “Tell me about yourself.” … I respond “Did you receive my CV? I put all important details about me … right there. What questions do you have about my past jobs?” The interviewer encourages me again to tell him about myself, my past projects, etc. … Me: Awkward silence. … Me to myself: Dafuq? Should I read the CV from top to bottom OR WHAT?

Marek Knápek@programming.dev · 3 years ago

Yes, but (there is always a but) it does not apply if you implement off-line encryption. Meaning no on-line service encrypting / decrypting attacker provided data (such as SSL / TLS / HTTPS). Meaning if you are running the cipher on your own computer with your own keys / plaintexts / ciphertexts. There is nobody to snoop time differences or power usage differences when using different key / different ciphertext. Then I would suggest this is fine. The only one who can attack you is yourself. In fact, I implemented AES from scratch in C89 language, this source code is at the same time compatible with C++14 constexpr evaluation mode. I also implemented the Serpent cipher, Serpent was an AES candidate back then when there were no AES and Rijndael was not AES yet. The code is on my GitHub page.

Marek Knápek@programming.dev · 3 years ago

The Little Things: The Missing Performance in std::vector

Marek Knápek@programming.dev · 3 years ago

std::vector::reserve + std::vector::push_back in loop is sub-optimal, because push_back needs to check for re-allocation, but that never comes.

std::vector::resize + std::vector::operator[] in loop is also sub-optimal, because resize default-initializes all elements only to be overwritten soon anyway.

This article’s author suggests push_back_unchecked.

I suggest std::vector::insert with pair of random access iterators with custom dereference operator that does the “transform element” or “generate element” functionality. The standard will have resize_and_overwrite hopefully soon.

Moar discussion:

https://codingnest.com/the-little-things-the-missing-performance-in-std-vector/

https://twitter.com/horenmar_ctu/status/1695823724673466532

https://twitter.com/horenmar_ctu/status/1695331079165489161

https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/

https://www.reddit.com/r/cpp/comments/162tohr/the_little_things_the_missing_performance_in/jy21hgd/

https://twitter.com/basit_ayantunde/status/1644895468399337473

https://twitter.com/MarekKnapek/status/1645272474517422081

https://www.reddit.com/r/cpp/comments/cno9ep/improving_stdvector/

Marek Knápek@programming.dev · 3 years ago

The Little Things: The Missing Performance in std::vector

Marek Knápek@programming.dev · 3 years ago

Another alternative to C++ exceptions (instead of return code) is to use global (or thread local) variable. This is exactly what errno that C and POSIX are using or GetLastError what Windows is using. Of course, this has its own pros and cons.

Marek Knápek@programming.dev · 3 years ago

Makes sense, how would you represent floor(1e42) or ceil(1e120) as integer? It would not fit into 32bit (unsigned) or 31bit (signed) integer. Not even into 64bit integer.

Marek Knápek@programming.dev · 3 years ago

AES encryption in C from scratch on web.

Marek Knápek@programming.dev · 3 years ago

It is trade-off between convenience and security. With my approach stolen cookies are not usable from different computer / IP, the attacker needs additional work, he needs the victim computer to do the harm, his computer cannot do any harm. The downside is the user needs another log-in in case of his external IP changes. How often is it? Switch between mobile/WiFi. Otherwise … almost never … maybe 1x per day? I’m not proposing to log-out the user after IP change, I’m proposing to keep multiple sessions (on server) / auth cookies (on client) for each IPv4 or IPv6 prefix (let’s say /56).

Marek Knápek@programming.dev · 3 years ago

And that JavaScript has access to cookies, that’s just a basic part of how web browsers work. Lemmy can’t do anything to prevent that.

Yes and No. Cookies could be accessed by JS on the client. BUT. When the cookie is sent by the server with additional HttpOnly header, then the cookie cannot be accessed from JS. Look at Lemmy GitHub issue, they discuss exactly this. Lemmy server absolutely has power to prevent this.

Again, Lemmy can’t do anything about that. Once there’s a vulnerability that allows an attacker to inject arbitrary JS into the site, Lemmy can’t do anything to prevent that JS from making requests.

I believe they can. But I’m not sure about this one. The server could send a response preventing the web browser to request content from other domains. Banks are using this. There was an attack years ago when attacker created a web page with i-frame in it. The i-frame was full screen to confuse the victim it is actually using the Banks site and not the attacker site. The bank web site was inside the inner i-frame, the code in the outer frame then had access to sensitive data in the inner frame. I believe there are HTTP response headers that instruct the web browser to not allow this. But I’m not sure I remember how exactly this works.

completely independent backend

Yes, it would be more costly, but more secure. It is trade-off, which one is more important to you? In case of chat/blog/forum app such as Lemmy I prefer cheap, in case of my Bank website I prefer secure.

Marek Knápek@programming.dev · 3 years ago

Oh I forgot another line of defense / basic security mitigation. If a server produces an access token (such as JWT or any other old school cookie / session ID), pair it with an IP address. So in case of cookie theft, the attacker cannot use this cookie from his computer (IP address). If the IP changes (mobile / WiFi / ADSL / whatever), the legitimate user should log-in again, now storing two auth cookies. In case of another IP change, no problemo, one of the stored cookies will work. Of course limit validity of the cookie in time (lets, say, keep it valid only for a day or for a week or so).

Marek Knápek@programming.dev · 3 years ago

So what happened:

Someone posted a post.
The post contained some instruction to display custom emoji.
So far so good.
There is a bug in JavaScript (TypeScript) that runs on client’s machine (arbitrary code execution?).
The attacker leveraged the bug to grab victim’s JWT (cookie) when the victim visited the page with that post.
The attacker used the grabbed JWTs to log-in as victim (some of them were admins) and do bad stuff on the server.

Am I right?

I’m old-school developer/programmer and it seems that web is peace of sheet. Basic security stuff violated:

User provided content (post using custom emojis) caused havoc when processing (doesn’t matter if on server or on client). This is lack of sanitization of user-provided-data.
JavaScript (TypeScript) has access to cookies (and thus JWT). This should be handled by web browser, not JS. In case of log-in, in HTTPS POST request and in case of response of successful log-in, in HTTPS POST response. Then, in case of requesting web page, again, it should be handled in HTTPS GET request. This is lack of using least permissions as possible, JS should not have access to cookies.
How the attacker got those JWTs? JavaScript sent them to him? Web browser sent them to him when requesting resources form his server? This is lack of site isolation, one web page should not have access to other domains, requesting data form them or sending data to them.
The attacker logged-in as admin and caused havoc. Again, this should not be possible, admins should have normal level of access to the site, exactly the same as normal users do. Then, if they want to administer something, they should log-in using separate username + password into separate log-in form and display completely different web page, not allowing them to do the actions normal users can do. You know, separate UI/applications for users and for admins.

Am I right? Correct me if I’m wrong.

Again, web is peace of sheet. This would never happen in desktop/server application. Any of the bullet points above would prevent this from happening. Even if the previous bullet point failed to do its job. Am I too naïve? Maybe.

Marek.

Marek Knápek@programming.dev · 3 years ago

Cryptographic hash functions calculator on-line.

Marek Knápek@programming.dev · 3 years ago

Cryptographic hash functions calculator on-line.

Marek Knápek