Word VBA >> !! Reading character by character from a string and check it

by U3RldmU » Sun, 25 Jul 2004 03:11:33 GMT

hi folks,

i have a table with 11 rows in a Word doc.
i need to read names and dates from a sentence.
Names are all in UPPERCASE, e.g. STEVE.
Dates are in different formats.
i need a routine that reads character by character and checks whether each character is uppercase (using UCase), lowercase (using LCase) or some other character.
Using the same routine i need to recognise a date in the sentence.
These are then displayed in text boxes in Visual Basic.
The table contains the following characters: text, integers, spaces " " and forward slashes /.

any comment appreciated. i need to start from somewhere!

Regards,
Steve



by Jezebel » Sun, 25 Jul 2004 07:24:49 GMT


First, UCase and LCase don't check characters, they convert them.

You can do this sort of checking by direct comparison

If char >= "A" and char <= "Z" then ...

or using the Like operator

If char Like "[A-Z]" then ...


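But better still to use wildcard searches (Word's version of regular
expressions) and the Find function. Experiment using the Find dialog.
Check 'Use wildcards' ... e.g. look for <[A-Z]{1,}> to find words entirely
in upper case. (In Word's wildcard syntax * matches any run of characters,
so <[A-Z]*> only guarantees that the first letter is upper case.)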
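
As a rough illustration of the checks described above, here is a minimal sketch in Python rather than VBA. The function names classify_chars and upper_case_words are made up for this example. It walks a string character by character, labels each character, and then uses a regular expression to pull out words written entirely in upper case, which is how the names appear in the original question.

```python
import re

def classify_chars(text):
    """Return a list of (character, kind) pairs for every character in text."""
    kinds = []
    for ch in text:
        if "A" <= ch <= "Z":
            kind = "upper"
        elif "a" <= ch <= "z":
            kind = "lower"
        elif ch.isdigit():
            kind = "digit"
        elif ch == " ":
            kind = "space"
        else:
            kind = "other"   # e.g. the forward slash in a date
        kinds.append((ch, kind))
    return kinds

def upper_case_words(text):
    """Return the words made up only of the letters A-Z (two or more letters)."""
    return re.findall(r"\b[A-Z]{2,}\b", text)

sample = "Met STEVE on 25/07/2004"
print(classify_chars("Ab 1/"))
print(upper_case_words(sample))
```

The two-letter minimum is an assumption that keeps a lone capital such as "A" or "I" from being reported as a name; drop it if single-letter names are possible.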


by Chad DeMeyer » Tue, 27 Jul 2004 01:13:30 GMT

You could also use something like

    If sText = UCase(sText) Then
        'sText is upper case
    End If
    If sText = LCase(sText) Then
        'sText is lower case
    End If

Note that a string with no letters in it (digits, spaces, "/") passes both
tests. This can be performed on whole words rather than just characters,
since the Range and Selection objects have a .Words property which returns a
collection of ranges, each member being what MS Word recognizes as a word.

The date part is harder. There is an IsDate function in VBA which tests the
argument supplied and returns True if the expression can be converted to a
date, but how do you know the start and end points of the range to test?
Perhaps you could test each occurrence of three consecutive words with
IsDate, assuming that whatever the format of your dates they always contain
month, day, and year.
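
The idea of testing runs of consecutive words can be sketched as follows, in Python rather than VBA. Here datetime.strptime stands in for VBA's IsDate, and the list of accepted formats is an assumption that would have to match the dates actually used in the document. The names parse_date and find_dates are made up for this example.

```python
from datetime import datetime

# Assumed date layouts; extend this list to match the real data.
DATE_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%B %d %Y", "%d %b %Y"]

def parse_date(candidate):
    """Return a datetime if candidate matches one of DATE_FORMATS, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    return None

def find_dates(sentence, max_words=3):
    """Scan the sentence, trying the longest window of words first at each position."""
    words = sentence.replace(",", " ").split()
    found = []
    i = 0
    while i < len(words):
        for size in range(max_words, 0, -1):
            if i + size > len(words):
                continue  # window would run past the end of the sentence
            candidate = " ".join(words[i:i + size])
            parsed = parse_date(candidate)
            if parsed is not None:
                found.append((candidate, parsed.date()))
                i += size - 1  # skip the words that were consumed
                break
        i += 1
    return found

print(find_dates("STEVE was born on 25 July 2004 and left 01/02/2005"))
```

Trying windows of three, two and then one word means a date written as a single token (25/07/2004) is caught by the same loop as one spread over three words.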

Regards,
Chad


by Trevor Lowing » Tue, 27 Jul 2004 01:39:46 GMT

Steve,

It sure sounds like a good time to use Regular Expressions. Below is an
example I used to do something similar. You can find additional
information here:

http://msdn.microsoft.com/library/default.asp?URL=/library/en-us/dnclinic/html/scripting051099.asp

And for example patterns people have already built:

http://www.regexlib.com




Private Function FindPattern(strSearch As String) As String
'----------------------------------------------------------
' Name        : FindPattern
' Scope       : Private
' Type        : Function
' Input(s)    : strSearch - the string to search
' Returns     : Pattern if found (upper case), otherwise ""
' Description : Searches input string for Pattern and returns the
'               first occurrence.
'----------------------------------------------------------
    On Error GoTo Err_Init
    Dim colMatches As Object
    Dim objRegExp As Object

    ' Create regular expression. Late binding to prevent version errors
    Set objRegExp = CreateObject("VBScript.RegExp")

    With objRegExp
        .Pattern = "([a-zA-Z])+-?\d{6}-?(\d)*"    ' Set pattern
        .IgnoreCase = True                        ' Set case insensitivity
        .Global = True                            ' Set global applicability
    End With
    Set colMatches = objRegExp.Execute(strSearch) ' Execute search

    ' Load the first match (the Tasker number)
    If colMatches.Count > 0 Then
        FindPattern = UCase(colMatches(0).Value)
    End If

Exit_Init:
    Set colMatches = Nothing
    Set objRegExp = Nothing
    Exit Function

Err_Init:
    'HandleError CurrentModule, "FindPattern", Err.Number, Err.Description
    Err.Clear
    Resume Exit_Init
End Function


Similar Threads

1. read file character by character using winapi

2. count number of digits, characters, whitespace characters and words in a string

use a regular expression to count number of digits, characters,
whitespace characters and words in a string


so far i got this and it won't work



use strict;
use warnings;

my $string1 = "hello there my id is 2 104503";
my $string2 = "today is a nice day";

number1 ($string1 );
number1 ($string2 );
number2 ($string1 );
number2 ($string2 );

sub number1
{
	my $string = shift();


	if ($string =~ / \d/ ) {
		print "'$string' has a digit.\n";
}
	else {
		print "'$string' has no  digits.\n";

}
}

sub number2
{
	my $string = shift();


	if ($string =~ /\w/ ) {
		print "'$string' has a digit.\n";
}
	else {
		print "'$string' has no  digits.\n";

}
}


$in = <STDIN>;
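
One way to get the four counts the question asks for, sketched in Python rather than Perl. The sketch assumes that "characters" means the letters A-Z and a-z, and that a word is any run of non-whitespace characters. The function name count_parts is made up for this example.

```python
import re

def count_parts(text):
    """Count digits, letters, whitespace characters and words in text."""
    return {
        "digits": len(re.findall(r"\d", text)),
        "letters": len(re.findall(r"[A-Za-z]", text)),
        "whitespace": len(re.findall(r"\s", text)),
        "words": len(re.findall(r"\S+", text)),
    }

for s in ("hello there my id is 2 104503", "today is a nice day"):
    print(repr(s), count_parts(s))
```

Counting all matches of a pattern, instead of testing whether a single match exists as the two subs above do, is what turns the check into a count.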

3. Checking for invalid characters within a string - VB.Net

4. Check the first 2 characters of string

Hi Group,
How can I check that the first 2 characters of a string are the percent (%) 
sign?
string name = strSurname

Regards 
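
A minimal sketch of this check, written in Python rather than the language of the question. The function name starts_with_double_percent is made up for this example, and it reads the question as asking whether both of the first two characters are percent signs.

```python
def starts_with_double_percent(name):
    """True when the first two characters of name are both '%'."""
    return name.startswith("%%")

print(starts_with_double_percent("%%Smith"))
print(starts_with_double_percent("%Smith"))
```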

5. reading through a text file checking the first character on each line

6. check characters in string

Hi I have a userform with a textbox where the user will enter an email 
address.

What I would like to do is have some code that checks that the text entered 
is a valid email address format (like website forms do).

so what I was thinking was having code that:

checked that there were no spaces (or other characters like / etc) in the 
string
check that the string contained an @ character
and checked that after the @ character there is a .

Is anyone able to point me in the right direction?

Thanks. 
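
The three checks listed above can be sketched like this, in Python rather than VBA. It is only a rough format check, not full email validation. Requiring exactly one @ and something in front of it are assumptions added here, and the function name looks_like_email is made up for this example.

```python
def looks_like_email(text):
    """Rough format check: no whitespace or '/', one '@', and a '.' after the '@'."""
    # Check 1: no spaces (or other unwanted characters such as '/').
    if any(ch.isspace() or ch == "/" for ch in text):
        return False
    # Check 2: the string contains an '@' (assumed here: exactly one).
    if text.count("@") != 1:
        return False
    local, domain = text.split("@")
    if not local:
        return False
    # Check 3: a '.' somewhere after the '@', but not first or last in the domain.
    return "." in domain and not domain.startswith(".") and not domain.endswith(".")

print(looks_like_email("steve@example.com"))
print(looks_like_email("steve example.com"))
```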


7. how to check for similar words in two character string variables

8. how to check for similar words in two character string

I won't comment on fuzzy matching/etc. because i'm not an expert there, and
a search of the L will come up with all sorts of results.  I will comment on
the general practice of finding words from var1 in var2.

This will iterate through VAR1 and find that word in VAR2.  It's entirely
literal, and certainly isn't up to the task you ask; not only do you need to
add fuzzy matching, but INDEXW is probably a bit too literal for your needs
regardless, and you almost certainly should exclude trivial strings [&, 'W',
etc.] from your search.

data var1;
infile datalines truncover;
input
@1 obsnum 6.
@7 var1 $50.
;
datalines;
    1 CATARACT EXTRACTION WITH IOL-RIGHT
    2 CATARACT EXTRACTION WITH IOL-LEFT
    3 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
    4 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
    5 KNEE ARTHROPLASTY TOTAL UNILATERAL
    6 PHACOEMULSIFICATION W IOL
    7 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
    8 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
    9 EYE-EXTRACTION CATARACT IOL
   10 EYE-EXTRACTION CATARACT IOL
;;;;
run;
data var2;
infile datalines truncover;
input
@1 obsnum 6.
@7 var2 $80.
;
datalines;
    1 Excision total, lens extracapsular phakoemulsification technique w
    2 Excision total, lens extracapsular phakoemulsification technique w
    3 Installation of external appliance, circulatory system NEC extraco
    4 Fusion, spinal vertebrae open posterior approach [posterolateral a
    5 Implantation of internal device, knee joint with combined sources
    6 Excision total, lens extracapsular phakoemulsification technique w
    7 Excision partial, veins of leg NEC without use of tissue open appr
    8 Destruction, skin of leg using device NEC [electrocautery]
    9 Excision total, lens extracapsular phakoemulsification technique w
   10 Excision total, lens extracapsular phakoemulsification technique w
   ;;;;;;;
run;

data allvars;
merge var1 var2;
by obsnum;
run;

data want;
set allvars;
found=0;
format comp_found $100.;
do _n_ = 1 to countc(trim(compbl(var1)),' -')+1;
 _var1wd= scan(var1,_n_,' -');
 if indexw(upcase(var2),upcase(_var1wd))>0 then do;
    found=found+1;
    comp_found = catx('|',comp_found,_var1wd);
 end;
end;
run;

If you want to compare each word in var1 to each word in var2 iteratively
[not using indexw], you would do:

data want;
set allvars;
found=0;
format comp_found $100.;
do _n_ = 1 to countc(trim(compbl(var1)),' -')+1;
 _var1wd= scan(var1,_n_,' -');
 do _t = 1 to countc(trim(compbl(var2)),' -')+1;
   _var2wd= scan(var2,_t,' -');
*   put _var1wd= _var2wd=;
   if upcase(_var1wd) = upcase(_var2wd) then do;
    found=found+1;
    comp_found = catx('|',comp_found,_var1wd);
   end;
 end;
end;
run;


You can then do whatever you want in terms of fuzzy matching instead of the
tenth line [the equality]. SOUNDEX is generally not very good, from what I
recall of previous discussions, but whichever method floats your boat would
fit in here.  I also have no idea what you want to do with these matches, so
I count them for you and concatenate them together.  Of course drop the
temporary variables, in the real execution.  Also, the SCAN and COUNTC
should have the appropriate word delimiters if things other than dash and
space are potential word delimiters.  If words with dashes included need to
be checked [so, IOL-RIGHT], you may have to play with the data some to get
it to behave [since IOL-RIGHT will not be checked in its entirety, just the
individual words].

Note that this is NOT very efficient; much more efficient would be using a
hash table, but I doubt that I have the knowledge of hash to come up with
the solution [though I may well try if I have a bit more time tonight].
Temporary arrays might also be faster, and not assigning the scanned
portions to variables might be faster, though I'm not really sure there and
i doubt it's much of a difference unless you're doing this on a godawfully
huge dataset.
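
To show what a hash-based lookup could look like, here is a minimal sketch in Python rather than SAS. It is not the SAS hash object. Unlike the data steps above, it splits on every non-alphanumeric character, so "Fusion," and "IOL-RIGHT" are broken into plain words; that choice is an assumption. The names split_words and common_words are made up for this example.

```python
import re

def split_words(text):
    """Upper-case the text and split it on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", text.upper()) if w]

def common_words(var1, var2):
    """Return the words of var1 that also occur in var2, in var1 order."""
    lookup = set(split_words(var2))  # hash-based set, fast membership test
    return [w for w in split_words(var1) if w in lookup]

print(common_words(
    "SPINE THORACO LUMBAR POSTERIOR FUSION SILO",
    "Fusion, spinal vertebrae open posterior approach [posterolateral a",
))
```

Each word of var2 is hashed once, so every word of var1 is checked with a single lookup instead of a scan over all the words of var2.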

-Joe

On Fri, Jul 17, 2009 at 6:01 PM, Cornel Lencar
< XXXX@XXXXX.COM > wrote:

> Hi,
>
> I would like to have the Text Miner application available, but I don't.
>
> I have a dataset with two character string variables. Each variable can
> have from one to many (30-40) individual, distinct words in it. I would
> like to check if any of the words in VARIABLE1 can be found in VARIABLE2.
> It would be nice to see if there are more than one of the VARIABLE1 words
> found in VARIABLE2.
>
> An example set:
>
>   Obs VARIABLE1
>
>     1 CATARACT EXTRACTION WITH IOL-RIGHT
>     2 CATARACT EXTRACTION WITH IOL-LEFT
>     3 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
>     4 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
>     5 KNEE ARTHROPLASTY TOTAL UNILATERAL
>     6 PHACOEMULSIFICATION W IOL
>     7 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
>     8 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
>     9 EYE-EXTRACTION CATARACT IOL
>    10 EYE-EXTRACTION CATARACT IOL
>
>   Obs VARIABLE2
>
>     1 Excision total, lens extracapsular phakoemulsification technique w
>     2 Excision total, lens extracapsular phakoemulsification technique w
>     3 Installation of external appliance, circulatory system NEC extraco
>     4 Fusion, spinal vertebrae open posterior approach [posterolateral a
>     5 Implantation of internal device, knee joint with combined sources
>     6 Excision total, lens extracapsular phakoemulsification technique w
>     7 Excision partial, veins of leg NEC without use of tissue open appr
>     8 Destruction, skin of leg using device NEC [electrocautery]
>     9 Excision total, lens extracapsular phakoemulsification technique w
>    10 Excision total, lens extracapsular phakoemulsification technique w
>
> Observations 4, 5, 6, 7, and 8 have some common words between VARIABLE1
> and VARIABLE2 although there are differences in the case type, in the fact
> that in VAR2 some words are composite, and also some words differ slightly:
>     6 PHACOEMULSIFICATION phakoemulsification
>
> I imagine that the first variable needs to be split in the separate words
> and each word needs to be checked against each of the VARIABLE2 words,
> maybe with soundex?
>
> Any suggestions are more than welcomed.
>
> Sincerely,
>
> Cornel Lencar
>