SFTP download of CSV file does not seem to be UTF-8 encoded - is there an option for this?

jonathanmeltzer · March 21, 2022, 2:24pm

We have a project which connects via SFTP and downloads a CSV file. The file downloads without a problem, but I then run several Text Replace actions against the downloaded file. Those actions all fail with the message:
“An error occurred:
No mapping for the Unicode character exists in the target multi-byte code page”

This error seems to indicate that the file does not have UTF-8 encoding, and may have one or more characters that cannot be interpreted correctly. Is there an option, either in the SFTP download or in the Text Replace action, to get past this?

Weirdly, I was working with the same project last week with a different file (of the same types of records) and did not see this issue.

Using Automise 5.0.0.1358.

jonathanmeltzer · March 21, 2022, 2:32pm

Right after posting this, I hit a similar issue with a CSV Iterator on a standard CSV file:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

I am assuming this is related, but again, this did not happen before today. I was running build 1327 and installed 1358 in the hope that it would get past the problem, but it did not. At this point, I am considering this a bug. Please let me know what I can do.

EDIT: Since it gives a line number and a character position, I looked at that line of the file, but there is nothing I can see on that line that would cause a problem:

Line 58: "LBOW-S",,"0013300001j0TuQAAU",,, (Name of client account follows)

jonathanmeltzer · March 21, 2022, 3:13pm

Related to my first post above: If I manually save the CSV file as an ANSI file, the action list which modifies the file succeeds. However, I cannot see a way to automatically save the file as an ANSI file when it is downloaded using SFTP.

jonathanmeltzer · March 21, 2022, 3:34pm

This workaround does not help with the issue in my second post. That file is already ANSI-encoded and still runs into the problem.

jonathanmeltzer · March 21, 2022, 8:56pm

Making sure the file was ANSI-encoded before it reached me seemed to solve the first issue. For the second one, making sure the file was UTF-8-encoded seemed to solve it. These are very strange issues to appear all of a sudden, though.

jonathanmeltzer · March 21, 2022, 9:17pm

However, when I end up with a CSV file that is UTF-8 encoded, processing that file does not work correctly. The headers are associated with the correct fields, but the row data is not. I would very much like to do a screenshare/call tomorrow (Tuesday, US Eastern time) between 8 and 5 so I can show you what is happening. If there is a good way to get you the log without sending the data file (which I cannot share), please let me know.

Vincent · March 21, 2022, 10:50pm

Hi Jonathan

This is a regression in 1356 (which was supposed to fix the encoding issue). We backported a stream reader class from a newer version of Delphi (the version we use for AT6 didn't have it); however, we didn't realise that class relied on an internal change in the later version: string indexes are zero-based rather than one-based. I don't know how I missed this when testing.

The other issue I found was that the stream reader defaulted to ASCII encoding when no BOM was found. That was a bit short-sighted, so I have changed it to default to UTF-8 when no BOM is found (this will be safe for ASCII files too).
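The detection order described above (use the BOM if there is one, otherwise assume UTF-8) can be sketched in Python. This is not Automise's or Delphi's code; `detect_encoding` and `decode_with_bom` are hypothetical helpers, and the BOM table is built from the constants in Python's `codecs` module.

```python
import codecs
from typing import Tuple

# Known byte order marks, checked longest-first: the UTF-32 LE BOM (FF FE 00 00)
# starts with the UTF-16 LE BOM (FF FE), so it must be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes, default: str = "utf-8") -> Tuple[str, int]:
    """Return (encoding, bom_length), falling back to `default` when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode `data` using its BOM if present (the BOM itself is dropped), else `default`."""
    encoding, bom_length = detect_encoding(data, default)
    # Raises UnicodeDecodeError if the bytes are not valid in the chosen encoding.
    return data[bom_length:].decode(encoding)
```

With no BOM and a UTF-8 default, a file containing a lone 0xA0 byte makes `decode` raise, which is roughly the Python counterpart of the "No mapping for the Unicode character" error in this thread.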

A build is running now, I will post a link to it shortly.

Vincent · March 21, 2022, 11:07pm

This build has the fix.

https://downloads.finalbuilder.com/downloads/automise/500/AT500_1365.exe
https://downloads.finalbuilder.com/downloads/automise/500/ATCMD500_1365.exe
https://downloads.finalbuilder.com/downloads/automise/500/AutomiseRunner500_1365.exe

jonathanmeltzer · March 22, 2022, 12:56pm

I don't believe this is actually a fix. It works fine now IF I can specify that the CSV file is UTF-8 encoded (which I can do in some of my batch processes that create CSV files), but a CSV file created as ANSI (or a vanilla CSV file from Excel or some other process) is not processed correctly, as it was before. I get the same error as before with the new build:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

Also, is there a way to specify that a text file created in Automise is UTF-8 encoded? I am running into the issue with a CSV file I create in Automise itself.

jonathanmeltzer · March 22, 2022, 8:48pm

I figured out how to uncheck the “Write Byte Order Mark” option, which is now required. However, for files I get from other sources, I have no way to change them so that the CSV Iterator does not fail. I get a CSV file, start an iterator on it, and it fails with:
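At the byte level, a "write byte order mark" setting decides whether the three bytes EF BB BF are prepended to UTF-8 output. A minimal Python sketch of the two variants, assuming UTF-8 output; `encode_for_file` and `write_csv` are hypothetical helpers rather than Automise settings, and Python's `utf-8-sig` codec is what adds the BOM.

```python
import codecs


def encode_for_file(text: str, write_bom: bool) -> bytes:
    """Encode `text` as UTF-8, optionally prefixed with the UTF-8 byte order mark."""
    # "utf-8-sig" emits EF BB BF before the data; plain "utf-8" emits no BOM.
    return text.encode("utf-8-sig" if write_bom else "utf-8")


def write_csv(path: str, text: str, write_bom: bool = False) -> None:
    """Write `text` to `path` as UTF-8; binary mode keeps line endings exactly as given."""
    with open(path, "wb") as f:
        f.write(encode_for_file(text, write_bom))


with_bom = encode_for_file("name,qty\r\n", write_bom=True)
without_bom = encode_for_file("name,qty\r\n", write_bom=False)
print(with_bom.startswith(codecs.BOM_UTF8), without_bom.startswith(codecs.BOM_UTF8))  # True False
```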

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 54
Char : 3
The iterator could not be initialised.

Please let me know what I can do. Several batch processes have started failing over the last 24 hours.

Vincent · March 22, 2022, 9:23pm

I'm not able to reproduce this here. Are you able to send one of the failing CSV files to support@finalbuilder.com so we can test with it? Failing that, can you open the offending file in Notepad++ and see which encoding it reports?

jonathanmeltzer · March 23, 2022, 2:31pm

The file that failed last night in our batch process was a text file (.txt) that was encoded in ANSI according to Notepad++. I have no control (that I know of) over the encoding of the file, but when I try to process it with a CSV Iterator, I get the same message:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

This has worked for months, if not years, until now.

jonathanmeltzer · March 23, 2022, 3:19pm

Another example. I create a file called CaseData.csv within Automise. I encode it as ANSI and I uncheck “Write byte order mark”. When I add text to the file, I use the same encoding and uncheck “Write byte order mark”. When I then try to use a CSV Iterator on the file, I get the error:

Error Executing script : OnInitialiseIterator
FB80ActionCtx.FileReaderImpl
No mapping for the Unicode character exists in the target multi-byte code page
Line: 58
Char : 5
The iterator could not be initialised.

According to Notepad, the file I am creating is encoded in ANSI (as expected).

jonathanmeltzer · March 23, 2022, 3:56pm

I have found a file that generates the error with only one line of data. I will send it to support@finalbuilder.com.

Vincent · March 23, 2022, 10:54pm

Hi Jonathan

I see the same error here with the file you sent. However, when I open it in Notepad++ and select “View menu\Show Symbol\Show All Characters”, it shows an invalid byte on line 2, position 1223, with the value 0xA0, which is not a valid ANSI character.
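Finding an offending byte such as 0xA0 can also be scripted rather than done by eye. A rough Python sketch, assuming UTF-8 is the encoding being checked; `find_invalid_utf8` is a hypothetical helper, and because its column counts bytes it may not match the character position Automise reports. For reference, in the Windows-1252 code page 0xA0 decodes to a non-breaking space, while in UTF-8 it is a continuation byte and is invalid on its own.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line, column, byte_value) for each byte that breaks UTF-8 decoding.

    Line and column are 1-based; the column counts bytes from the start of the line.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            line = data.count(b"\n", 0, bad) + 1
            line_start = data.rfind(b"\n", 0, bad) + 1
            problems.append((line, bad - line_start + 1, data[bad]))
            pos = bad + 1  # skip the offending byte and keep scanning
    return problems


sample = b'"name","notes"\r\n"LBOW-S","a\xa0b"\r\n'
for line, column, value in find_invalid_utf8(sample):
    print(f"line {line}, byte {column}: 0x{value:02X}")  # line 2, byte 12: 0xA0
```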

jonathanmeltzer · March 24, 2022, 11:51am

I guess what I am wondering is: when I create a text file in Automise, what settings should I use? Should I encode as UTF-8 and check the “Write byte order mark” box? Also, how do I handle files received from others, where I have no control over how the file was originally created or saved? This was never an issue before, but now it is, and I am confused.

It seems to me that the CSV Iterator action should have some way for the user to indicate the encoding that should be used to process the file. Does that make sense at all?
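The request above, an iterator that takes the encoding as a parameter instead of guessing it, looks roughly like this in Python. `iter_csv_rows` is a hypothetical function, not an Automise action, and `cp1252` is assumed to be the "ANSI" code page, which is typical on Western-locale Windows but not universal.

```python
import csv
import io
from typing import Iterator, List


def iter_csv_rows(data: bytes, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield CSV rows after decoding `data` with a caller-chosen encoding."""
    # Useful choices: "utf-8", "utf-8-sig" (also strips a leading BOM), "cp1252" (Western "ANSI").
    text = data.decode(encoding)
    # newline="" lets the csv module handle \r\n and quoted line breaks itself.
    yield from csv.reader(io.StringIO(text, newline=""))


ansi_bytes = b'"LBOW-S",,"a\xa0b"\r\n'
print(list(iter_csv_rows(ansi_bytes, encoding="cp1252")))
```

The same bytes passed with the default `encoding="utf-8"` raise `UnicodeDecodeError`, so the caller, who knows where the file came from, decides how it is read.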

jonathanmeltzer · March 24, 2022, 11:02pm

I found a workaround of sorts: using a PowerShell script to change the encoding of all files to UTF-8 before processing them. It is not scalable, though, and right now it means my Automise projects need to run under a different ID than normal. I am hanging around tonight in the hope that we can have more of a conversation than one post per day.
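The PowerShell script itself isn't shown here, but an equivalent pre-processing step can be sketched in Python. The sketch assumes every input file is Windows-1252 ("ANSI") unless it already decodes as UTF-8, and writes UTF-8 without a BOM; `to_utf8` and `convert_files` are hypothetical names. Python's `cp1252` codec still rejects the few byte values Windows-1252 leaves undefined, such as 0x81.

```python
from pathlib import Path
from typing import Iterable, List


def to_utf8(data: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Return `data` as UTF-8; bytes that already decode as UTF-8 are returned unchanged."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (pure ASCII included)
    except UnicodeDecodeError:
        return data.decode(legacy_encoding).encode("utf-8")


def convert_files(paths: Iterable[str], legacy_encoding: str = "cp1252") -> List[str]:
    """Rewrite each file as UTF-8 (no BOM) in place; return the paths that changed."""
    changed = []
    for name in paths:
        path = Path(name)
        original = path.read_bytes()
        converted = to_utf8(original, legacy_encoding)
        if converted != original:
            path.write_bytes(converted)
            changed.append(name)
    return changed
```

Because `convert_files` rewrites files in place, trying it on copies first is prudent.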

Vincent · March 24, 2022, 11:44pm

Hi Jonathan

This whole encoding issue stems from the following:

a) Until recently, the CSV Iterator action only supported ANSI or UTF-16. That was due to our use of the Scripting.FileSystemObject COM object, which is provided by Windows but no longer updated. It was unable to deal with files that have a Byte Order Mark; this was the issue you reported in the post “CSV Iterator and IF Statement bug”.

b) In order to handle files with different encodings, we check for a byte order mark (which indicates the encoding). If that doesn't exist, we fall back to UTF-8, which is a superset of ANSI, so it will handle ANSI files just fine.

c) Your files have no BOM (so the encoding cannot be determined) and contain invalid 0xA0 bytes, which are neither valid ANSI nor valid UTF-8. This is what causes the error you are seeing. I loaded your sample file in a bunch of editors: some loaded it without complaining (until I tried to save), and some complained about the encoding (either invalid ASCII or an invalid UTF-8 continuation).
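Point (b) can be checked directly: UTF-8 and the Windows-1252 "ANSI" code page agree on every byte from 0x00 to 0x7F (plain ASCII), but not on bytes from 0x80 to 0xFF, which is where a file containing 0xA0 runs into trouble. A small Python check; `decodes_same` is a hypothetical helper and `cp1252` is assumed to be the relevant ANSI code page.

```python
def decodes_same(data: bytes) -> bool:
    """True if `data` is valid UTF-8 and decodes to the same text as it does in cp1252."""
    try:
        return data.decode("utf-8") == data.decode("cp1252")
    except UnicodeDecodeError:
        return False


# Every single ASCII byte decodes identically in both encodings.
print(all(decodes_same(bytes([b])) for b in range(0x80)))  # True
print(decodes_same(b"caf\xe9"))  # False: 0xE9 is "é" in cp1252 but invalid UTF-8 here
print(decodes_same("café".encode("utf-8")))  # False: cp1252 reads the two UTF-8 bytes as "Ã©"
```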

I guess the Scripting.FileSystemObject COM object was more lenient when it comes to handling invalid characters; however, it barfed on files with BOMs. We can't win.

I'm currently exploring other options. I did get it to work by not specifying a default encoding; however, I now need to test with other encodings to see how it handles them.

Vincent · March 25, 2022, 12:20am

We're testing another option now that will try to handle the encoding error and retry with a different encoding.
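A retry-on-error strategy of this kind can be sketched in Python. This is an illustration only, not the actual Automise change; `decode_with_fallback` is a hypothetical helper and the candidate list (UTF-8, then cp1252) is an assumption.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = ("utf-8", "cp1252")
) -> Tuple[str, str]:
    """Try each encoding in order and return (text, encoding_used)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    failures = []
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError as err:
            failures.append(f"{encoding}: {err.reason} at byte {err.start}")
    raise ValueError("could not decode data; " + "; ".join(failures))


text, used = decode_with_fallback(b'"LBOW-S",,"a\xa0b"')
print(used)  # cp1252
```

The order matters: Windows-1252 accepts almost every byte value, so trying it first would "succeed" on genuine UTF-8 files and silently turn characters like é into Ã©. Trying the strict encoding first and the permissive one last avoids that.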

jonathanmeltzer · March 25, 2022, 12:26am

I really appreciate your research and your willingness to find a workaround. If I need to specify encoding with files I create, I can do that. My issue is that sometimes I am given files that I did not create, and unless I use the Powershell script, I have no control over what the encoding is once it hits the iterator. I am wondering - is there a way to specify encoding in the CSV iterator action? Then I can tell it what encoding to try with the particular file I am giving to that action.
