Operation cancelling and std::fstream

Posted on May 23, 2019 by Andrzej Krzemieński

In the previous post we sketched out the view that error handling is about expressing the success dependency between operations. I also indicated the guideline “destructors only for releasing resources”. In this post we are going to see what it means in practice. We will try to save some data to a file using std::ofstream.

Here’s the situation. We are writing a text processor application. We are going to write function save() that saves the user’s text document to disk. That is, we say, “this function saves data to disk”, but we really mean, “this function either saves data to disk or it reports failure.” Failure to save work in the word processor is not that big of a deal provided that the program informs the user about it. If I push the “save” button and the program says, “no space on disk”, I know that I cannot trust the program and have to take action myself: clean up my disk, or copy the contents to the clipboard and send it over mail. The worst thing that can happen is when I push the “save” button and the program behaves as if it saved my work whereas nothing was saved. I will now believe that everything is fine and continue working, and even more of my work will likely get lost.

Let’s start with the most natural thing to do:

void save()
{
  std::ofstream f {"work.txt"};
  f << provideA();
  f << provideB();
} // flush in destructor

We did not put any explicit error checking code because we assume that the exception handling mechanism will do just the right thing. But it will not. What happens if the file for some reason cannot be opened? No exception will be thrown, because IO streams by default do not throw exceptions. In fact, IO streams were designed before we had exceptions in C++. So, in order to have our ofstream throw exceptions, we have to instruct it to do so before we even open the file:

void save()
{
  std::ofstream f;
  f.exceptions(std::ios_base::failbit | std::ios_base::badbit);

  f.open("work.txt");
  f << provideA();
  f << provideB();
} // flush in destructor
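The difference between the two versions can be checked with a small experiment. The sketch below is not part of the original program: the helper names open_fails_silently and open_fails_loudly are made up for illustration, and the path passed in is assumed to point into a directory that does not exist.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Default behaviour: a failed open only sets failbit; nothing is thrown.
// (Hypothetical helper, named for illustration only.)
bool open_fails_silently(const std::string& path)
{
    std::ofstream f{path};
    return !f.is_open() && f.fail();
}

// With the exception mask set first, the same failed open throws.
// (Hypothetical helper, named for illustration only.)
bool open_fails_loudly(const std::string& path)
{
    std::ofstream f;
    f.exceptions(std::ios_base::failbit | std::ios_base::badbit);
    try {
        f.open(path);
    } catch (const std::ios_base::failure&) {
        return true;   // the failure was reported as an exception
    }
    return false;
}
```

Called with a path such as "no_such_dir/work.txt", both helpers should return true, provided the directory really is missing.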

In this version we start by default-constructing an ofstream that is not yet associated with any file; this cannot fail. We then tell it which conditions to report as exceptions. Only then do we open the file, and if opening fails, an exception is thrown.

Now, what happens if buffering B in the second write instruction fails? We will have written only partial data, and the destructor will flush it to disk. Saving half of the data may be worse than not saving data at all. But that is not the biggest concern. As long as we are just using << and the data fits in the stream’s internal buffer, we are only writing the contents into that buffer. The real write to the disk is going to be performed in the destructor. It is this write to the disk that is the most likely operation to fail, because now we are really touching the filesystem.

What happens if this write fails? The answer is: nothing. This happens in the destructor, and the Standard Library is very careful not to throw exceptions from destructors. So whatever fails, the library will keep it secret. You will not be informed by any means. Function save() will return normally, and you will be led to believe that it succeeded, but no data will really be stored. This is what happens when one follows the rule “do not throw from destructors” too literally.
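The gap between buffering and the real write can be made visible without a full disk. The following is a minimal sketch, assuming a hypothetical stream buffer FullDiskBuf (not a Standard Library type) that accepts characters into its buffer but always fails when asked to synchronize with the underlying device. It uses a plain std::ostream rather than std::ofstream, so it only illustrates where the failure surfaces, not what the ofstream destructor does.

```cpp
#include <array>
#include <ios>
#include <ostream>
#include <streambuf>

// Hypothetical stand-in for a file on a full disk: buffering works,
// but pushing the buffer to the "device" always fails.
class FullDiskBuf : public std::streambuf {
public:
    FullDiskBuf() { setp(buffer_.data(), buffer_.data() + buffer_.size()); }

protected:
    int sync() override { return -1; }   // the "real write" always fails
    int_type overflow(int_type) override { return traits_type::eof(); }

private:
    std::array<char, 64> buffer_{};
};

// Returns true if every << succeeded and only flush() reported the failure.
bool failure_surfaces_only_at_flush()
{
    FullDiskBuf buf;
    std::ostream out(&buf);
    out.exceptions(std::ios_base::failbit | std::ios_base::badbit);

    out << "part A";                     // lands in the buffer, no exception
    out << "part B";                     // lands in the buffer, no exception
    const bool buffered_ok = out.good();

    try {
        out.flush();                     // sync() fails, badbit is set, exception
    } catch (const std::ios_base::failure&) {
        return buffered_ok;
    }
    return false;
}
```

If flush() is never called explicitly, nothing in this sketch ever asks the buffer to write, so there is no point at which the failure could be reported.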

But I am not saying that you should start throwing from destructors. I am saying, you should design and use your types in such a way that destructors never need to signal failure. Our code could be rewritten like this:

void save()
{
  std::ofstream f;
  f.exceptions(std::ios_base::failbit | std::ios_base::badbit);

  f.open("work.txt");
  f << provideA();
  f << provideB();
  f.flush(); // write data (can throw)
} // only close

Now, function flush() performs the write to disk. Even though it is obvious that the last operation on the ofstream is to flush data to disk, we still want to write it down explicitly. This way everyone can see that the write is here. We can see that it can be canceled if preceding operations failed. We can see that it can fail, and cause subsequent operations to be canceled.

Now the destructor does not have to write anything. It only needs to release the file handle to the operating system. Can this operation fail? Yes. But such a failure does not require our callers to be canceled, because we have done our job: all data has been written to the file in the previous operation. We may be leaking the resource now, but this is not what our callers rely on: they can move on.
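Seen from the caller, this is what the success dependency looks like. The following is a minimal sketch with several assumptions that are not in the original code: provideA() and provideB() are given placeholder bodies, save() takes the file path as a parameter, and the caller save_and_report() is made up for illustration.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Placeholder data sources standing in for the post's provideA()/provideB().
std::string provideA() { return "first part\n"; }
std::string provideB() { return "second part\n"; }

// Either all data is handed over to the file, or an exception is thrown.
void save(const std::string& path)
{
    std::ofstream f;
    f.exceptions(std::ios_base::failbit | std::ios_base::badbit);

    f.open(path);      // throws if the file cannot be opened
    f << provideA();   // throws if writing to the stream fails
    f << provideB();
    f.flush();         // pushes buffered data out; throws on failure
}                      // destructor only closes the file

// Hypothetical caller: "saved" is reported only if save() did not throw.
std::string save_and_report(const std::string& path)
{
    try {
        save(path);
        return "saved";            // skipped when save() throws
    } catch (const std::ios_base::failure&) {
        return "could not save";   // the user is told, not misled
    }
}
```

The point of the design is that the "saved" message depends on flush() having succeeded, which would not be expressible if the write happened in the destructor.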

Our code is not as short as it could have been, but it is correct. There is a cost beyond length: the programmer is now exposed to the fact that writing to a file is done in stages: first buffer, then flush. But I think this is in fact desirable. In C++, which is performance sensitive, a design strategy such as buffering is part of the contract.

Note: In the above example, I call function flush() even though function close() seems more appropriate. Function close() provides additional useful information: our intention is not to write to this file again (in this function). We will explore this in detail in the next post.

11 Responses to Operation cancelling and std::fstream

Dmitrii Volosnykh says:

May 24, 2019 at 6:53 am

Nice article, thanks!

One question, though. Assume that provideA() and provideB() generate data longer than the ofstream’s underlying buffer. I guess ofstream will implicitly flush the buffer. Thus, we are back in the same boat of partially saved contents. Do we have to go further and use ostringstream as intermediate storage for the user’s data? What do you think?

- Andrzej Krzemieński says:
  
  May 24, 2019 at 7:14 am
  
  Indeed. If it is our contract: “either entire big data is stored or nothing is stored”, then we may simply conclude that ofstream is not our tool. (I am not sure, though: maybe it has a way to manually set the size of the buffer.) ostringstream might not cut it either. Then we will have to devise a custom solution: create a temporary file and use ofstream to write to it. If the write was partial remove the temporary file (or just leave it) and report failure to write (but not the failure to remove the temporary file). If the write is complete, just rename the file to what you need.
  
  - Dmitrii Volosnykh says:
    
    May 24, 2019 at 7:20 am
    
    Actually, it does, by means of ofstream::rdbuf()->pubsetbuf(data, length), yet this solution is still limited by RAM: the text processor already stores the full document, and this way we make our application store another copy inside the buffer. Bottom line: your suggestion with a temporary file is indeed expected to work better.
    
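The temporary-file approach discussed in the exchange above could be sketched as follows. This is only an illustration under stated assumptions: it requires C++17 for std::filesystem, the function name save_atomically and the ".tmp" suffix are made up, and whether the final rename replaces the target atomically depends on the platform and filesystem.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `content` to a temporary sibling file, then renames it over `target`.
// The function name and the ".tmp" suffix are illustrative assumptions.
void save_atomically(const fs::path& target, const std::string& content)
{
    fs::path tmp = target;
    tmp += ".tmp";

    try {
        std::ofstream f;
        f.exceptions(std::ios_base::failbit | std::ios_base::badbit);
        f.open(tmp, std::ios_base::out | std::ios_base::trunc);
        f << content;
        f.close();                  // flushes and closes; throws on failure
        fs::rename(tmp, target);    // reached only after a complete write
    } catch (...) {
        std::error_code ignored;
        fs::remove(tmp, ignored);   // best effort; failure here is not reported
        throw;                      // report the original failure to the caller
    }
}
```

If any step before the rename fails, the previous contents of the target file are left untouched and the caller sees the exception from the failed write.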
Gast says:

May 27, 2019 at 10:06 am

You’re contradicting yourself. On the one hand you call flush(), because it may throw, but not close(), which may also throw. Ignoring errors is anything but a good example of proper error handling or correctness of code.
“we have done our job” is wrong, because the creator of the file has to close it with error handling.
So there is no excuse not to write f.close(); With void save(std::ofstream& f) you would be right.

- Andrzej Krzemieński says:
  
  May 27, 2019 at 10:37 am
  
  Thank you for pointing this out. It is important. However, I must admit I cannot think of a situation in the environment that would return positive response from flush() but a negative response from fclose(). If you know of one, I think it would benefit all readers.
  
  - Gast says:
    
    May 28, 2019 at 9:39 am
    
    Hmm, you call flush() to be really sure that every bit has taken its seat, but you only hope that close()/CloseHandle()/protocol/server does no more than throw the handle away?
    I know one: every write to a file. For example, but not limited to this: creating a file on a network, maybe checked or not. Write, write, write, every byte to cache, no check. Then CloseHandle() is the only point where an error can be returned. FlushFileBuffers() does not help, or helps only to a limited extent, because it writes only buffers, not metadata (which is not illogical behaviour). flush() is not close().
    Since the Atari ST I have looked for good error handling and presentation (there in C). Later I did a little test. Windows 95 versus Windows NT 4: write one file and fill a floppy disk with data; the program used only system calls, handled all errors and enabled the user to retry the failing function call. While the program runs, unload the disk (for younger readers, this means pressing the mechanical eject button).
    Windows 95: Blue Screen, crash. It had written all data to cache, returned OK on every function including close, and then could not flush. Retrying, cancelling or doing anything else was impossible.
    Windows NT: Did all checks and let the program know. Put floppy in again, hit Retry, continue writing data. Everything OK, every byte filled, data written well.*
    
    Since this test I check EVERY single return code, and I accept only one place to generally ignore errors: Close/Destroy in a destructor, because it may be called while handling an error. I tried that, but more is not possible without breaking your fingers. And I enable retrying of all functions that can usefully be retried.
    So you may pardon me for keeping an eagle eye on writings about proper error handling. It’s possible, it’s not hard, but it’s some work that no one does (widespread libraries like std-glibc++ and Qt included).
    
    * Amiga has solved this by itself with the system message box “You have to put in floppy ‘namexy’!!!!!”
    
    - Andrzej Krzemieński says:
      
      May 29, 2019 at 7:02 am
      
      I agree with the desire to have the application prepared for every corner-case situation that can occur. However, before engaging in that, I want to be sure that I am actually covering real situations (rare or exceptional is also real in my book). You mentioned that close() could write some meta information to a file that flush() does not. Do you know which system makes this distinction? Is there any documentation or article you could point me to?
      
      On the other hand, I observed in my experiments that there are situations where a write is obviously interrupted, but the standard library functions do not report it by any means. For instance, when I request a big file to be copied onto a pendrive with the std::filesystem interface, and during the copy I unplug the pendrive, the system function returns success through all channels (it has three), even though it is obvious to the system that the copying failed: it is supposed to last a minute, but it stops the second I unplug the pendrive. The library vendors say that such an implementation is conformant.
      
      Anyway, by holding this conversation, I am not questioning your goal of handling all corner-case situations. I just want to explore the problem and learn from you as much as I can. I have put a note in the post reflecting your disagreement. I want to explore this further.
Yankes says:

May 28, 2019 at 8:00 pm

I think the error is that `std::fstream` saves anything in its destructor at all. What if an error is thrown during the save?
This should work like transactions: you need to explicitly “commit” to save the state.

Gast says:

May 31, 2019 at 10:49 am

Aargh, no Reply button, that’s an error.
Linux close(): “EIO An I/O error occurred.”
https://docs.microsoft.com/en-us/windows/desktop/api/handleapi/nf-handleapi-closehandle does not limit the possible return codes. A quick test shows that Windows updates access times (metadata) with write, not with close (or not every time). But Windows knows asynchronous file functions (called Overlapped). CloseHandle() is synchronous, has to wait and the async function may return an error… According to https://stackoverflow.com/questions/8885748/closehandle-returns-before-the-serial-port-is-actually-closed , a thread can start an operation while another one is in CloseHandle().
But, no, I have no examples of this happening today in real life. My tests of errors in reality were 20 years ago, when I saw that modern OSes handle (hardware) errors well. So I concentrate on handling in my code.

> the application prepared for every corner-case situation that can occur.
I don’t see errors on close and destroy as a corner case in general, because I can handle them like every other operation on the object. The corner case is error handling during error handling (and quitting the program/thread after a certain point), because that is not possible. Only then do I close() and hope, … and know, that one error is handled. But how often do these two things appear together? Really rarely, I would say.

> the system function returns success through all channels
The OS or std::filesystem? The latter, I bet.

- Andrzej Krzemieński says:
  
  May 31, 2019 at 10:54 am
  
  Thanks! Let me consume this information.
  
  > > the system function returns success through all channels
  > The OS or std::filesystem? The latter, I bet.
  Yes, my test uses std::filesystem.
  
  - Gast says:
    
    June 3, 2019 at 7:04 am
    
    That is my experience. If I want to find out how a foreign library deals with errors, I first look at how it frees resources. 10 minutes in Qt … it had CloseHandle() without assignment (“No, that can’t be, you’re wrong, it’s used by the best known companies.”). On the WWW I found only one question about that mistake. That shows how low the priority of consistent error handling is, and how widespread the equation FreeResource() == always success.
    