SFTP download of CSV file does not seem to be UTF-8 encoded - is there an option for this?

I don’t believe this actually is a fix. This works fine now IF I can specify that the CSV file is UTF-8 encoded (which I can do in some of my batch processes that create CSV files), but when a CSV file is created as an ANSI file (or a vanilla CSV file by Excel or some other process), it is not being processed correctly (as it was before). I am getting the same error as before with the new build:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

Also, is there a way to specify when creating a text file in Automise that it is UTF-8 encoded? I am running into the issue with a CSV file I am creating in Automise itself.

I figured out how to unset the “Write Byte Order Mark”, which is now required. However, for files I am getting from other sources, there is no way to change the file in such a way that the CSV iterator does not fail. I get a CSV file, start an iterator on that file, and it fails with:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 54
Char : 3
The iterator could not be initialised.

Please let me know what I can do. Several batch processes have started failing over the last 24 hours.

I’m not able to reproduce this here. Are you able to send one of the failing CSV files to support@finalbuilder.com so we can test with it? Failing that, can you open the offending file in Notepad++ and see which encoding it reports?

The file that failed last night in our batch process was a text file (.txt) that was encoded in ANSI according to Notepad++. I have no control (that I know of) over the encoding of the file, but when I try to process it with a CSV Iterator, I get the same message:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

This was something that has worked for months, if not years, before now.

Another example. I create a file called CaseData.csv within Automise. I encode it as ANSI and I uncheck “Write byte order mark”. When I add text to the file, I use the same encoding and uncheck “Write byte order mark”. When I then try to use a CSV Iterator on the file, I get the error:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

According to Notepad, the file I am creating is coded in ANSI (as expected).

I have found a file that is generating the error with one line of data. I will send that file to support@finalbuilder.com

Hi Jonathon

I see the same error here with the file you sent. However, when I open it in Notepad++ and select “View menu\Show Symbol\Show All Characters”, it shows that there is an invalid byte on line 2, position 1223, with the value 0xA0, which is not a valid ANSI character.
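For anyone hitting the same error, the offending bytes can also be located with a short script instead of an editor. The following is only a sketch in Python, not anything Automise provides; the function name `find_invalid_utf8` is made up for illustration. It reports every byte that fails strict UTF-8 decoding, with a 1-based line number and a 1-based byte column.

```python
def find_invalid_utf8(data: bytes):
    """Return (line, byte_column, byte_value) for each byte that breaks strict UTF-8 decoding.

    Hypothetical helper for illustration only. Columns are counted in bytes, not characters.
    """
    problems = []
    offset = 0
    while offset < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing more to report.
            data[offset:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = offset + err.start                    # absolute index of the bad byte
            line = data.count(b"\n", 0, bad) + 1        # 1-based line number
            line_start = data.rfind(b"\n", 0, bad) + 1  # index of the first byte on that line
            problems.append((line, bad - line_start + 1, data[bad]))
            offset = bad + 1                            # skip the bad byte and keep scanning
    return problems
```

Called on the raw bytes of a file (for example `find_invalid_utf8(open(path, "rb").read())`, where `path` is whatever file is failing), it returns an empty list when the file is clean UTF-8.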

I guess what I am wondering is: When I am creating a text file in Automise, what settings should I use? Should I encode as UTF-8 and check the “write byte order mark” box? Also, how do I handle files received from others where I have no control over how the file was originally created/saved? This used to not be an issue, but now it is, and I am confused.

It seems to me that the CSV Iterator action should have some way for the user to indicate the encoding that should be used to process the file. Does that make sense at all?

I found a workaround of sorts by using a PowerShell script to change the encoding on all files to UTF-8 before processing them, but it is not scalable, and right now it means that my Automise projects need to run under a different ID than normal. I am hanging around tonight in the hopes that we can have more of a conversation than one post per day :)
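The PowerShell script itself is not shown in this thread. As a rough illustration of that kind of pre-processing step, here is a sketch in Python. It assumes the incoming files are Windows-1252 (`cp1252`) whenever they are not already valid UTF-8; that code page is an assumption, and `reencode_to_utf8` is a hypothetical name.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite a file as UTF-8 in place if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left alone.
    source_encoding is an assumption about where the file came from.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (with or without a BOM)
    except UnicodeDecodeError:
        # Work on bytes throughout so that line endings are preserved exactly.
        text = raw.decode(source_encoding)
        Path(path).write_bytes(text.encode("utf-8"))
        return True
```

A file that is not valid in the assumed source encoding either will still raise `UnicodeDecodeError`, so the guess about the source encoding matters.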

Hi Jonathan

This whole encoding issue stems from the fact that:

a) Until recently, the CSV iterator action only supported ANSI or UTF-16. That was due to us using the Scripting.FileSystem COM object, which is provided by Windows but no longer updated. It was unable to deal with files that have a Byte Order Mark - this was an issue that you reported in the post “CSV Iterator and IF Statement bug”.

b) In order to handle files with different encodings, we check for a byte order mark (which indicates the encoding). If that doesn’t exist we fall back to UTF-8, which is a superset of ANSI so will handle ANSI files just fine.

c) Your files have no BOM (so the encoding cannot be determined) and contain invalid 0xA0 bytes, which are neither valid ANSI nor valid UTF-8. This is what causes the error you are seeing. I loaded your sample file in a bunch of editors: some loaded it without complaining (until I tried to save), and some complained about the encoding (either invalid ASCII or an invalid UTF-8 continuation).
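The BOM check described in (b) can be sketched roughly as follows. This is a Python illustration only, not the code Automise uses; `detect_encoding` and the `BOMS` table are hypothetical names. It picks a codec from the byte order mark if one is present and otherwise returns the fallback.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Return a codec name based on the BOM, or the fallback when there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name  # these codecs consume the BOM themselves when decoding
    return fallback
```

With no BOM and a stray 0xA0 byte, `detect_encoding` returns the UTF-8 fallback and the subsequent strict decode fails, which corresponds to the situation described in (c).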

I guess the Scripting.FileSystem COM object was more lenient when it came to handling invalid characters. However, it barfed on files with BOMs - we can’t win.

I’m currently exploring other options - I did get it to work by not specifying a default encoding - however I now need to test with other encodings to see how it handles them.

We’re testing another option now that will try and handle the encoding error and retry with a different encoding.

I really appreciate your research and your willingness to find a workaround. If I need to specify encoding with files I create, I can do that. My issue is that sometimes I am given files that I did not create, and unless I use the PowerShell script, I have no control over what the encoding is once it hits the iterator. I am wondering: is there a way to specify encoding in the CSV iterator action? Then I can tell it what encoding to try with the particular file I am giving to that action.

We did consider that; however, it would be painful for some users when working with filesets etc.

I will have a build uploaded shortly that will behave a lot better. The trick was to try opening with the fallback (for no-BOM scenarios) as UTF-8, and if that fails, to try again with the fallback being Encoding.Default. That worked for all my test files. I’m not sure this will be the end of this issue for some users though, as Encoding.Default can be different on different machines.
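The retry idea can be sketched like this. It is a Python illustration under stated assumptions, not the actual Automise implementation: `decode_with_fallback` is a hypothetical name, and `locale.getpreferredencoding(False)` is used only as a rough stand-in for Encoding.Default, since both depend on the machine.

```python
import locale


def decode_with_fallback(data: bytes, second_choice=None) -> str:
    """Decode as UTF-8 first; on failure, retry once with a machine-dependent encoding.

    second_choice lets the caller pin the retry encoding (for example "cp1252").
    When it is omitted, the locale's preferred encoding is used, which is an
    assumption standing in for Encoding.Default and varies between machines.
    """
    try:
        # "utf-8-sig" decodes plain UTF-8 and also strips a UTF-8 BOM if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        fallback = second_choice or locale.getpreferredencoding(False)
        return data.decode(fallback)
```

For example, `decode_with_fallback(b"a\xa0b", "cp1252")` succeeds on the retry because Windows-1252 maps 0xA0 to a non-breaking space. If the retry encoding is itself UTF-8, as it may be on a non-Windows machine, the second attempt fails in the same way as the first.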

That will allow me to turn off the Powershell script, if it works. That would be great.

Please try this build

Much appreciated. I am off tomorrow, but I will work on getting this in next week. My workarounds will hold for the weekend, I think. I am glad I stayed up to converse with you.

We may not be done with this just yet. The mitigation I put in place has flaws that a colleague just pointed out. More investigation is needed.

OK, this build should do the trick:

https://downloads.finalbuilder.com/downloads/automise/500/AT500_1372.exe
https://downloads.finalbuilder.com/downloads/automise/500/ATCMD500_1372.exe
https://downloads.finalbuilder.com/downloads/automise/500/AutomiseRunner500_1372.exe

I have installed this, and reverted my action lists to my previous logic. We’ll see tonight if things succeed.

Looks like it worked. Some of the batch processes will not be run until later in the week, but there were no issues that I can see with existing nightly processes. Thank you for working with me to turn this around.