
Help With TABS in VB.Net

I am writing an app in VB.Net and have encountered a problem. I have a text file with data extracted from a database, and I need to replace the TAB chars with spaces. When I open the file in a text editor (such as Programmer's File Editor), I can see the length of a TAB in characters: I can see the column number where the TAB begins and where it ends, which lets me calculate its length. The problem is determining, in VB.Net, the actual number of characters that a TAB occupies (the text file contains TABs of varying lengths: some are 5 characters long, some 7, others 8, etc.) and replacing it with the same number of spaces. For example,

If I find a TAB that is 7 characters long, I must replace it with a string of 7 spaces.

I know how to locate, remove, and replace the TAB; the problem is calculating exactly how many characters the TAB spans. I have tried using the Length property in VB.Net, but it counts the TAB as a single character.

Any help would be gratefully appreciated.
By Seelen at 2007-11-11 9:58:37
# 1 Re: Help With TABS in VB.Net
A tab is one character. The "size" of the displayed tab may depend on the font you're using or on the tab stops in the application displaying it. To replace the tabs with a space, you can use the Replace function and search for Chr(9), which is the tab character.
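To put numbers on the tab-stop point: the string always holds a single tab character, and its on-screen width comes only from where it sits in the line. A minimal sketch, assuming the editor places tab stops every `tabSize` columns counted from column 0 (8 is a common default, but the actual setting of the editor in question is an assumption); `TabMath` and `SpacesForTab` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports Microsoft.VisualBasic

Public Module TabMath
    ' Number of columns a tab fills when it starts at the given zero-based
    ' column, assuming evenly spaced tab stops every tabSize columns.
    Public Function SpacesForTab(column As Integer, tabSize As Integer) As Integer
        If tabSize <= 0 Then Throw New ArgumentOutOfRangeException("tabSize")
        ' Distance from the current column to the next multiple of tabSize.
        Return tabSize - (column Mod tabSize)
    End Function
End Module
```

For example, `("abcd" & vbTab & "e").Length` is 6 because the tab counts as one character, while `SpacesForTab(4, 8)` is 4, the width an editor with 8-column tab stops draws for that same tab.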

If you need to keep the characters alignment, why are you removing the tabs?
joewmaki at 2007-11-11 21:45:10
# 2 Re: Help With TABS in VB.Net
I do understand what you mean, but I think you misunderstand my question. The TABs are in a standard ASCII text file, so there is only one font being used (MS Sans Serif, I think). I need to remove the TABs because each line in the text file represents a record in which specific data starts at certain "columns" in the line (e.g. data in "columns" 1 to 15 may represent the person's social security number, "columns" 16 to 25 their first name, etc.).

I know a better way would be to use a file of records to hold the data, but the company I am working for gets the data from its clients in this format (the clients extract the data from their database and send it over; this is not going to change anytime soon, so I just have to find a solution to this text file problem). The text file must be free of TABs so that the data it contains can be loaded into another database, but some lines have one or more TABs whereas other lines have none.

When I remove the TABs using Chr(9) or vbTab, each TAB is replaced by a single character and the alignment in the text file goes completely off.

To repeat: when I open the text file in a text editor such as Programmer's File Editor, I can see at which "column" the TAB starts and at which "column" it ends, which lets me calculate its "size/length". I have found TABs of differing sizes/lengths, such as 4 characters, 7 characters, 8 characters, etc.

I have attached an example file of such a TAB problem. You can have a look and see what I mean. When I open up the file in Programmer's File Editor, I can see that the TAB in the example file starts at "column" 19 and ends at "column" 25 giving it a size/length of 6 "characters".

Now if I search for Chr(9) and replace the TAB, it is replaced by a single character, whereas I need to replace it with as many spaces as its size/length in order to maintain the formatting.
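The reported numbers are consistent with fixed tab stops every 8 columns: a tab starting at column 19 has 18 characters before it (columns 1 to 18), the next multiple of 8 is 24, so the tab fills 24 - 18 = 6 columns and the next character lands at column 25. The observed sizes (2 to 8 characters) also never exceed 8. This is the usual tab-expansion rule. A minimal sketch under that assumption; the tab size of 8 is inferred from the example rather than read from the editor's settings, and `TabExpander` and `ExpandTabs` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Public Module TabExpander
    ' Replaces every tab with the number of spaces needed to reach the next
    ' tab stop, so the text keeps the alignment an editor displays.
    ' Assumption: tab stops every tabSize columns, starting from column 0,
    ' and the string is a single line (columns restart on each line).
    Public Function ExpandTabs(line As String, tabSize As Integer) As String
        Dim sb As New StringBuilder(line.Length + 16)
        For Each ch As Char In line
            If ch = ControlChars.Tab Then
                ' sb.Length is the zero-based output column where the tab starts.
                Dim spaces As Integer = tabSize - (sb.Length Mod tabSize)
                sb.Append(" "c, spaces)
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Applied to a line with 18 characters followed by a tab, `ExpandTabs(line, 8)` inserts 6 spaces, so the next character sits at column 25 just as the editor shows. If the editor turns out to use a different tab width, pass that value instead.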

Got any other suggestions?
Seelen at 2007-11-11 21:46:10
# 3 Re: Help With TABS in VB.Net
Thank you for the clarification. I now see the issue.

The only way I can think of is to recreate the file line by line: read a line, check it for tab characters, and if there are any, parse out each column of data, strip the tabs from the problem columns, and then rewrite the line with each column padded appropriately with spaces. Of course, this assumes you know the structure of the data in the file. Hopefully someone else can come up with a more efficient solution.
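The "parse out each column" step can be sketched with `Substring` over known column widths. A minimal sketch, assuming the widths come from the file's documented layout and that character positions already match the displayed columns (for instance because the line has no tabs, or its tabs were expanded first); any tab left inside a field is simply dropped. `FixedWidthParser` and `SplitFixedWidth` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Public Module FixedWidthParser
    ' Splits one record into fields using known column widths.
    ' Assumption (hypothetical layout): widths match the file's documented
    ' columns, e.g. {15, 10, ...} for SSN, first name, and so on.
    Public Function SplitFixedWidth(line As String, widths As Integer()) As String()
        Dim fields As New List(Of String)
        Dim start As Integer = 0
        For Each width As Integer In widths
            If start >= line.Length Then
                ' Short line: the remaining fields are missing.
                fields.Add(String.Empty)
            Else
                Dim take As Integer = Math.Min(width, line.Length - start)
                ' Remove stray tabs inside the field, then surrounding padding.
                fields.Add(line.Substring(start, take).Replace(vbTab, "").Trim())
            End If
            start += width
        Next
        Return fields.ToArray()
    End Function
End Module
```

Returning empty strings for a short line, rather than throwing, matches the remark that some records arrive with data missing.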

The correct place to do this would be during the process that parses the text file and imports the data into your database. But I suspect you don't control this process.
joewmaki at 2007-11-11 21:47:15
# 4 Re: Help With TABS in VB.Net
OK, I feel ashamed to even mention this. I don't know if you've fixed this yet; if so, I'd like to know how. If this file only comes in, say, once a week, a workaround could be as follows. If there's going to be a file every day, this isn't even an option.

You can always open the file in Excel, choose Delimited with Tab and Space as the delimiters, then save it. Open a new database in Access, or whichever existing database you please, import the .xls file you created, then export the imported data back out to a file.

Again, this is a crappy way to do it, but at least it will get the file into a format you can work with while trying to figure out how to fix this.

Another thought:

If you know the exact layout of the file, you could always open it, loop through each line/record, and call .Replace(" ", "") on it. This removes every space character from the string; tab characters are left untouched.

example:

Dim str As String = "THIS IS A TEST"
' Replace removes only the space character " ", not tabs.
str = str.Replace(" ", "")   ' str is now "THISISATEST"
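If every kind of whitespace, tabs included, really should go, a regular expression is one option. A minimal sketch using `Regex.Replace` with the `\s` class, which in .NET matches spaces, tabs, and line breaks among other whitespace; `WhitespaceStripper` and `RemoveAllWhitespace` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Public Module WhitespaceStripper
    ' Removes every whitespace character (spaces, tabs, line breaks, ...).
    Public Function RemoveAllWhitespace(text As String) As String
        Return Regex.Replace(text, "\s", String.Empty)
    End Function
End Module
```

Like the `Replace(" ", "")` idea, this throws away the fixed-width alignment, so it only helps if each record's fields can be located again from the known layout afterwards.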

Might work, might not... Hope this helps a little :)

jb
jcb1269 at 2007-11-11 21:48:13
# 5 Re: Help With TABS in VB.Net
Hi guys. Unfortunately I still have not found a solution to my problem. As for your solution, jcb, I did think of doing it that way, but we get anywhere between 30 and 50 files a day, each with between 1 and 2 million lines, so as you can see the removal is a really processor-intensive process (all special characters and TABs have to be removed). I have optimized this cleaning process to run at around 1 MB per second, but now I just need to add in the TAB removal and replacement.
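With files of 1 to 2 million lines, one option is to fold the tab expansion into a single streaming pass over each file, so no whole file is held in memory. A minimal sketch, assuming 8-column tab stops (passed as `tabSize`) and assuming "special characters" means control characters other than the tab (the actual cleaning rules aren't given here); `LineCleaner` and `CleanStream` are hypothetical names. In practice the reader and writer would be a `StreamReader` and `StreamWriter` on the input and output files.

```vbnet
Option Strict On

Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Public Module LineCleaner
    ' Copies reader to writer line by line, expanding each tab to the next
    ' tab stop and dropping other control characters.
    ' Assumption: "special characters" = control characters other than tab.
    Public Sub CleanStream(reader As TextReader, writer As TextWriter, tabSize As Integer)
        Dim sb As New StringBuilder()
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            sb.Clear()
            For Each ch As Char In line
                If ch = ControlChars.Tab Then
                    ' Pad with spaces up to the next tab stop of the output line.
                    sb.Append(" "c, tabSize - (sb.Length Mod tabSize))
                ElseIf Not Char.IsControl(ch) Then
                    sb.Append(ch)
                End If
            Next
            writer.WriteLine(sb.ToString())
            line = reader.ReadLine()
        End While
    End Sub
End Module
```

Reusing one `StringBuilder` per file avoids allocating a new buffer for every line; whether this keeps up with the existing 1 MB per second cleaner would need measuring.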

I thought that maybe using a template file would help (which I could compare against) but in the files we receive, sometimes data is missing, so again, if it is compared, then the alignment goes off.

I just had a thought. What if I get the actual byte length of the string before and after removing the TAB? I could compare them and, if there is a difference, fill in spaces where the TAB was removed so that the byte lengths match again. Do you think this could possibly work?

Let me know if you think of any other possible solutions (no matter how ridiculous they may seem, because at this point I am left scratching my head, not knowing how to solve this problem). Also, don't worry about it being fast or anything like that. I can always optimize my code later on; the main thing is that it works.

Again any help/thoughts on solving this problem will be highly appreciated!!!
Seelen at 2007-11-11 21:49:18
# 6 Re: Help With TABS in VB.Net
Hi again guys,

I just tried using the byte length method by placing TABs of differing lengths (TAB sizes with "character" lengths of 2, 3, 4, 5, 6, 7, 8) and they all reported a byte size of 3 bytes. So no matter what the "character" length of a TAB, all are reported to be 3 bytes in size, so that doesn't work. Another idea blown to pieces!!! :eek:
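The display width cannot be recovered from the bytes either: in ASCII and UTF-8 a tab is always the single byte 0x09, however wide an editor draws it, which is why every tab size measured the same. A minimal check using the standard `System.Text.Encoding` classes; `TabBytes` and `EncodeTab` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Public Module TabBytes
    ' Returns the encoded bytes of a single tab character.
    ' The result does not depend on where the tab sits in a line.
    Public Function EncodeTab(enc As Encoding) As Byte()
        Return enc.GetBytes(vbTab)
    End Function
End Module
```

Both `EncodeTab(Encoding.ASCII)` and `EncodeTab(Encoding.UTF8)` return the one-byte array `{9}`, so only the tab's column position, not its encoding, determines how many spaces it stands for.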

Well I'm out of ideas for now - I've tried everything I could think of but still no success. You guys got any other ideas?
Seelen at 2007-11-11 21:50:17
# 7 Re: Help With TABS in VB.Net
Do you know the structure of the files? Can you parse the data into the required columns for import into your databases? Since you're trying to duplicate the existing structure (without special characters), I can't assume you know it. I still think your only option is to parse the data into the required columns, strip characters from each column, and recreate the file correctly. This can run pretty quickly using a StringBuilder.
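The "recreate the file" step can be sketched with a `StringBuilder`: pad each cleaned field to its column width and append it. A minimal sketch, assuming the column widths come from the file's documented layout (the 15- and 10-character widths in the usage note echo the SSN and first-name example and are illustrative only); `FixedWidthWriter` and `BuildRecord` are hypothetical names.

```vbnet
Option Strict On

Imports System
Imports System.Text

Public Module FixedWidthWriter
    ' Rebuilds one fixed-width record from cleaned field values.
    ' Each value is padded with spaces to its column width, or cut off
    ' if it is too long, so every record has the same total length.
    Public Function BuildRecord(fields As String(), widths As Integer()) As String
        If fields.Length <> widths.Length Then
            Throw New ArgumentException("Each field needs a width.")
        End If
        Dim sb As New StringBuilder()
        For i As Integer = 0 To fields.Length - 1
            ' Treat a missing (Nothing) field as empty.
            Dim value As String = If(fields(i), String.Empty)
            If value.Length > widths(i) Then value = value.Substring(0, widths(i))
            sb.Append(value.PadRight(widths(i)))
        Next
        Return sb.ToString()
    End Function
End Module
```

For example, `BuildRecord(New String() {"123456789", "John"}, New Integer() {15, 10})` returns a 25-character string with the first name starting at column 16. Truncating over-long values is a design choice; rejecting the record may suit the import better.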
joewmaki at 2007-11-11 21:51:16